Translating text using generated visual representations and artificial intelligence

US12499332B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12499332-B2
Application number  US-202217954845-A
Country             US
Kind code           B2
Filing date         Sep 28, 2022
Priority date       Sep 28, 2022
Publication date    Dec 16, 2025
Grant date          Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificial intelligence techniques; generating a tokenized form of at least a portion of the at least one visual representation; and generating an output including a translated version of the input text into at least a second language by processing, using a second set of artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one visual representation.
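
The four steps in the abstract can be read as a simple data flow. The sketch below is only an illustration of that flow, not the patented implementation: every class, function, and type name is hypothetical, and the toy stand-ins at the bottom replace the real models (the abstract names no specific model or API).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases; none of these names come from the patent.
Image = List[List[int]]
TextTokenizer = Callable[[str], List[int]]
ImageGenerator = Callable[[str], Image]               # "first set of AI techniques"
ImageTokenizer = Callable[[Image], List[int]]
Translator = Callable[[List[int], List[int]], str]    # "second set of AI techniques"


@dataclass
class VisualTranslationPipeline:
    """Wires the four abstract-level steps together in order."""
    tokenize_text: TextTokenizer
    generate_image: ImageGenerator
    tokenize_image: ImageTokenizer
    translate_tokens: Translator

    def translate(self, source_text: str) -> str:
        text_tokens = self.tokenize_text(source_text)    # 1. tokenize the input text
        image = self.generate_image(source_text)         # 2. generate a visual representation
        image_tokens = self.tokenize_image(image)        # 3. tokenize the visual representation
        # 4. translate from both token streams together
        return self.translate_tokens(text_tokens, image_tokens)


# Toy stand-ins (assumptions for illustration only) so the sketch runs end to end.
def toy_text_tokenizer(text: str) -> List[int]:
    return [len(word) for word in text.split()]


def toy_image_generator(text: str) -> Image:
    return [[ord(ch) % 256 for ch in word] for word in text.split()]


def toy_image_tokenizer(image: Image) -> List[int]:
    return [sum(row) % 16 for row in image]


def toy_translator(text_tokens: List[int], image_tokens: List[int]) -> str:
    return f"<{len(text_tokens)} text tokens, {len(image_tokens)} image tokens>"


pipeline = VisualTranslationPipeline(
    toy_text_tokenizer, toy_image_generator, toy_image_tokenizer, toy_translator
)
print(pipeline.translate("a red bird"))
```

Keeping each stage behind a plain callable mirrors the abstract's separation between the first set of techniques (image generation) and the second set (translation), so either stage could be swapped or retrained independently.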

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:

    generating a tokenized form of at least a portion of input text, wherein the input text is in a first language;

    generating, as output from utilizing a first set of one or more artificial intelligence techniques, at least one image representation of at least a portion of the input text by mapping portions of stored image data, the portions of the stored image data selected in connection with processing the input text using the first set of one or more artificial intelligence techniques, into portions of the at least one image representation of the at least a portion of the input text, wherein the first set of one or more artificial intelligence techniques comprises at least one neural network-based autoregressive transformer trained on the stored image data;

    generating a tokenized form of at least a portion of the at least one image representation;

    generating an output comprising a translated version of the input text into at least a second language by processing, using a second set of one or more artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one image representation, wherein the second set of one or more artificial intelligence techniques comprises at least one neural network-based multimodal translation transformer;

    automatically training, using feedback related to the generated output, at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer; and

    automatically executing, subsequent to the automatic training, one or more machine translation operations using the at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer;

    wherein the method is carried out by at least one computing device.

2. The computer-implemented method of claim 1, further comprising: automatically training the first set of one or more artificial intelligence techniques by processing a training set of text data and processing image data corresponding to the training set of text data.

3. The computer-implemented method of claim 2, wherein automatically training the first set of one or more artificial intelligence techniques comprises using visual representation loss-related techniques.

4. The computer-implemented method of claim 2, further comprising: automatically training the second set of one or more artificial intelligence techniques using (i) a tokenized combination of a visual representation of the training set of text data and the training set of text data, and (ii) a tokenized combination of the image data and the training set of text data.

5. The computer-implemented method of claim 4, wherein automatically training the second set of one or more artificial intelligence techniques comprises using translation loss-related techniques and consistency loss-related techniques.

6. The computer-implemented method of claim 1, wherein generating the output comprises mapping, using the second set of one or more artificial intelligence techniques, one or more portions of the tokenized form of at least a portion of input text to one or more portions of the tokenized form of at least a portion of the at least one image representation.

7. The computer-implemented method of claim 1, wherein software implementing the method is provided as a service in a cloud environment.

8. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to:

    generate a tokenized form of at least a portion of input text, wherein the input text is in a first language;

    generate, as output from utilizing a first set of one or more artificial intelligence techniques, at least one image representation of at least a portion of the input text by mapping portions of stored image data, the portions of the stored image data selected in connection with processing the input text using the first set of one or more artificial intelligence techniques, into portions of the at least one image representation of the at least a portion of the input text, wherein the first set of one or more artificial intelligence techniques comprises at least one neural network-based autoregressive transformer trained on the stored image data;

    generate a tokenized form of at least a portion of the at least one image representation;

    generate an output comprising a translated version of the input text into at least a second language by processing, using a second set of one or more artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one image representation, wherein the second set of one or more artificial intelligence techniques comprises at least one neural network-based multimodal translation transformer;

    automatically train, using feedback related to the generated output, at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer; and

    automatically execute, subsequent to the automatic training, one or more machine translation operations using the at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer.

9. The computer program product of claim 8, wherein the program instructions is further executable by the computing device to cause the computing device to: automatically train the first set of one or more artificial intelligence techniques by processing a training set of text data and processing image data corresponding to the training set of text data.

10. The computer program product of claim 9, wherein the program instructions is further executable by the computing device to cause the computing device to: automatically train the second set of one or more artificial intelligence techniques using (i) a tokenized combination of a visual representation of the training set of text data and the training set of text data, and (ii) a tokenized combination of the image data and the training set of text data.

11. The computer program product of claim 9, wherein automatically training the first set of one or more artificial intelligence techniques comprises using visual representation loss-related techniques.

12. The computer program product of claim 10, wherein automatically training the second set of one or more artificial intelligence techniques comprises using translation loss-related techniques and consistency loss-related techniques.

13. The computer program product of claim 8, wherein generating the output comprises mapping, using the second set of one or more artificial intelligence techniques, one or more portions of the tokenized form of at least a portion of input text to one or more portions of the tokenized form of at least a portion of the at least one image representation.

14. A system comprising: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to: generate a tokenized form of at least a portion of input text, wherein the input text is in a first language;
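
Claims 4 and 5 describe training the translation model on two input paths (text combined with a generated visual representation, and text combined with the corresponding real image data) using translation loss-related and consistency loss-related techniques. The claims do not define either loss, so the sketch below rests on assumptions: translation loss is taken to be token-level cross-entropy against a reference translation, consistency loss is taken to be a symmetric KL divergence between the predictions of the two paths, and all function names and the `consistency_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence

# One probability distribution over a target vocabulary per output position.
# Assumed strictly positive (for example, softmax outputs).
Distribution = Sequence[float]


def cross_entropy(predicted: List[Distribution], reference: List[int]) -> float:
    """Mean negative log-likelihood of the reference token ids (assumed translation loss)."""
    total = -sum(math.log(step[token]) for step, token in zip(predicted, reference))
    return total / len(reference)


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency(pred_a: List[Distribution], pred_b: List[Distribution]) -> float:
    """Mean symmetric KL between the two paths' predictions (assumed consistency loss)."""
    total = sum(kl_divergence(a, b) + kl_divergence(b, a) for a, b in zip(pred_a, pred_b))
    return total / (2 * len(pred_a))


def combined_loss(
    pred_generated: List[Distribution],  # path (i): text + generated visual representation
    pred_real: List[Distribution],       # path (ii): text + real image data
    reference: List[int],
    consistency_weight: float = 1.0,     # hypothetical weighting, not from the claims
) -> float:
    translation = cross_entropy(pred_generated, reference) + cross_entropy(pred_real, reference)
    return translation + consistency_weight * consistency(pred_generated, pred_real)


# Toy example: two output positions over a three-token vocabulary.
generated_path = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
real_path = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
print(combined_loss(generated_path, real_path, reference=[0, 1]))
```

Under these assumptions the consistency term is zero when both paths predict identical distributions, so it only penalizes disagreement between the generated-image path and the real-image path.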

Assignees

IBM, Massachusetts Inst Technology

Inventors

Classifications

  • Lexical analysis, e.g. tokenisation or collocates

  • G06F40/58 (Primary)

    Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

  • Learning methods

  • Recurrent networks, e.g. Hopfield networks

  • G06N3/045 (Primary)

    Combinations of networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499332B2 cover?
Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificia…
Who is the assignee on this patent?
IBM, Massachusetts Inst Technology
What technology area does this patent fall under?
Primary CPC classification G06F40/58. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).