Who is the assignee on this patent?

IBM, Massachusetts Inst Technology

What technology area does this patent fall under?

Primary CPC classification G06F40/58. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 16, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Translating text using generated visual representations and artificial intelligence

US12499332B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12499332-B2
Application number	US-202217954845-A
Country	US
Kind code	B2
Filing date	Sep 28, 2022
Priority date	Sep 28, 2022
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificial intelligence techniques; generating a tokenized form of at least a portion of the at least one visual representation; and generating an output including a translated version of the input text into at least a second language by processing, using a second set of artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one visual representation.
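The abstract describes a four-step data flow. The sketch below traces that flow with toy stand-ins so the intermediate token sequences are visible. Everything in it is a hypothetical illustration, not the patented implementation. `tokenize_text` is a whitespace tokenizer. `toy_generate_image` replaces the trained autoregressive transformer of claim 1 with a deterministic pixel grid. `tokenize_image` averages and quantizes image patches as a crude stand-in for a learned visual codebook. `fuse` concatenates the two token streams with assumed separator ids. The patent does not say how the two tokenized forms are combined before they reach the multimodal translation transformer, and the translation model itself is not sketched.

```python
from typing import Dict, List

# Hypothetical special-token ids marking where each modality starts.
BOS, SEP_TEXT, SEP_IMAGE = 0, 1, 2
TEXT_OFFSET = 3        # assumed: text ids start after the special ids
IMAGE_OFFSET = 10_000  # assumed: image-token ids live in a separate id range
CODEBOOK_SIZE = 16     # assumed number of discrete visual codes


def tokenize_text(text: str, vocab: Dict[str, int]) -> List[int]:
    """Step 1: tokenized form of the source-language text (toy whitespace tokenizer)."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = TEXT_OFFSET + len(vocab)
        ids.append(vocab[word])
    return ids


def toy_generate_image(text_ids: List[int], size: int = 4) -> List[List[float]]:
    """Step 2 stand-in: a deterministic size x size grayscale grid in [0, 1].

    The patent uses a trained autoregressive transformer here; this toy only
    derives pixel values from the text ids so the pipeline runs end to end.
    """
    seed = sum(text_ids) + 1
    return [[((seed * (r * size + c + 1)) % 256) / 255.0 for c in range(size)]
            for r in range(size)]


def tokenize_image(image: List[List[float]], patch: int = 2) -> List[int]:
    """Step 3: one token per patch (patch mean, scalar-quantized into the codebook).

    Assumes the image side length is divisible by `patch`.
    """
    size = len(image)
    tokens = []
    for r0 in range(0, size, patch):
        for c0 in range(0, size, patch):
            block = [image[r][c] for r in range(r0, r0 + patch)
                     for c in range(c0, c0 + patch)]
            mean = sum(block) / len(block)
            code = min(int(mean * CODEBOOK_SIZE), CODEBOOK_SIZE - 1)
            tokens.append(IMAGE_OFFSET + code)
    return tokens


def fuse(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Step 4 input: a single sequence a multimodal transformer could consume."""
    return [BOS, SEP_TEXT] + text_tokens + [SEP_IMAGE] + image_tokens


vocab: Dict[str, int] = {}
text_tokens = tokenize_text("the cat sits on the mat", vocab)
image_tokens = tokenize_image(toy_generate_image(text_tokens))
model_input = fuse(text_tokens, image_tokens)
print(model_input)
```

Concatenation with separator tokens is only one way to present two modalities to a single transformer. Claim 6's "mapping" of text-token portions to image-token portions could equally be realized by attention between separate encoders.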

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: generating a tokenized form of at least a portion of input text, wherein the input text is in a first language; generating, as output from utilizing a first set of one or more artificial intelligence techniques, at least one image representation of at least a portion of the input text by mapping portions of stored image data, the portions of the stored image data selected in connection with processing the input text using the first set of one or more artificial intelligence techniques, into portions of the at least one image representation of the at least a portion of the input text, wherein the first set of one or more artificial intelligence techniques comprises at least one neural network-based autoregressive transformer trained on the stored image data; generating a tokenized form of at least a portion of the at least one image representation; generating an output comprising a translated version of the input text into at least a second language by processing, using a second set of one or more artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one image representation, wherein the second set of one or more artificial intelligence techniques comprises at least one neural network-based multimodal translation transformer; automatically training, using feedback related to the generated output, at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer; and automatically executing, subsequent to the automatic training, one or more machine translation operations using the at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer; wherein the method is carried out by at least one computing device.

2. The computer-implemented method of claim 1, further comprising: automatically training the first set of one or more artificial intelligence techniques by processing a training set of text data and processing image data corresponding to the training set of text data.

3. The computer-implemented method of claim 2, wherein automatically training the first set of one or more artificial intelligence techniques comprises using visual representation loss-related techniques.

4. The computer-implemented method of claim 2, further comprising: automatically training the second set of one or more artificial intelligence techniques using (i) a tokenized combination of a visual representation of the training set of text data and the training set of text data, and (ii) a tokenized combination of the image data and the training set of text data.

5. The computer-implemented method of claim 4, wherein automatically training the second set of one or more artificial intelligence techniques comprises using translation loss-related techniques and consistency loss-related techniques.

6. The computer-implemented method of claim 1, wherein generating the output comprises mapping, using the second set of one or more artificial intelligence techniques, one or more portions of the tokenized form of at least a portion of input text to one or more portions of the tokenized form of at least a portion of the at least one image representation.

7. The computer-implemented method of claim 1, wherein software implementing the method is provided as a service in a cloud environment.

8. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to: generate a tokenized form of at least a portion of input text, wherein the input text is in a first language; generate, as output from utilizing a first set of one or more artificial intelligence techniques, at least one image representation of at least a portion of the input text by mapping portions of stored image data, the portions of the stored image data selected in connection with processing the input text using the first set of one or more artificial intelligence techniques, into portions of the at least one image representation of the at least a portion of the input text, wherein the first set of one or more artificial intelligence techniques comprises at least one neural network-based autoregressive transformer trained on the stored image data; generate a tokenized form of at least a portion of the at least one image representation; generate an output comprising a translated version of the input text into at least a second language by processing, using a second set of one or more artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one image representation, wherein the second set of one or more artificial intelligence techniques comprises at least one neural network-based multimodal translation transformer; automatically train, using feedback related to the generated output, at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer; and automatically execute, subsequent to the automatic training, one or more machine translation operations using the at least one of the at least one neural network-based autoregressive transformer and the at least one neural network-based multimodal translation transformer.

9. The computer program product of claim 8, wherein the program instructions is further executable by the computing device to cause the computing device to: automatically train the first set of one or more artificial intelligence techniques by processing a training set of text data and processing image data corresponding to the training set of text data.

10. The computer program product of claim 9, wherein the program instructions is further executable by the computing device to cause the computing device to: automatically train the second set of one or more artificial intelligence techniques using (i) a tokenized combination of a visual representation of the training set of text data and the training set of text data, and (ii) a tokenized combination of the image data and the training set of text data.

11. The computer program product of claim 9, wherein automatically training the first set of one or more artificial intelligence techniques comprises using visual representation loss-related techniques.

12. The computer program product of claim 10, wherein automatically training the second set of one or more artificial intelligence techniques comprises using translation loss-related techniques and consistency loss-related techniques.

13. The computer program product of claim 8, wherein generating the output comprises mapping, using the second set of one or more artificial intelligence techniques, one or more portions of the tokenized form of at least a portion of input text to one or more portions of the tokenized form of at least a portion of the at least one image representation.

14. A system comprising: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to: generate a tokenized form of at least a portion of input text, wherein the input text is in a first language;
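Claims 4 and 5 (mirrored by claims 10 and 12) say the multimodal translation transformer is trained on two tokenized combinations, text paired with its generated visual representation and text paired with corresponding image data, using "translation loss-related" and "consistency loss-related" techniques. The claims do not define either loss. The sketch below is one plausible reading under stated assumptions: token-level cross-entropy as the translation loss, a symmetric KL divergence between the two variants' output distributions as the consistency loss, and a hypothetical weight `lam` on the consistency term. All function names are illustrative; a real model would compute these quantities on tensors with automatic differentiation.

```python
import math
from typing import List

Dist = List[float]  # probabilities over a (toy) target vocabulary
EPS = 1e-12         # guards against log(0)


def translation_loss(pred: List[Dist], target_ids: List[int]) -> float:
    """Mean token-level cross-entropy against the reference translation."""
    nll = [-math.log(p[t] + EPS) for p, t in zip(pred, target_ids)]
    return sum(nll) / len(nll)


def kl_divergence(p: Dist, q: Dist) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def consistency_loss(pred_a: List[Dist], pred_b: List[Dist]) -> float:
    """Symmetric KL between per-position predictions from the two input variants."""
    per_pos = [0.5 * (kl_divergence(a, b) + kl_divergence(b, a))
               for a, b in zip(pred_a, pred_b)]
    return sum(per_pos) / len(per_pos)


def combined_loss(pred_generated: List[Dist], pred_real: List[Dist],
                  target_ids: List[int], lam: float = 1.0) -> float:
    """Translation loss on both variants plus a weighted consistency term.

    pred_generated: outputs for text + generated visual representation (claim 4, item (i))
    pred_real:      outputs for text + corresponding image data (claim 4, item (ii))
    lam:            hypothetical weight on the consistency term
    """
    trans = (translation_loss(pred_generated, target_ids)
             + translation_loss(pred_real, target_ids))
    return trans + lam * consistency_loss(pred_generated, pred_real)


# Toy example: a 3-token target vocabulary and a 2-token reference translation.
target = [0, 1]
pred_gen = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
pred_img = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
print(round(combined_loss(pred_gen, pred_img, target), 4))
```

The consistency term is zero when both input variants yield identical predictions and grows as they diverge. This is the intuition behind pushing the model to translate the same way whether it sees a generated visual representation or a real image.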

Assignees

IBM, Massachusetts Inst Technology

Classifications

G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F40/58 (Primary)
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
G06N3/08
Learning methods · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/045 (Primary)
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 90626463

Related patents

Photorealistic Text Inpainting for Augmented Reality Using Generative Models

Method and server for training a neural network to generate a textual output sequence

Depicting Humans in Text-Defined Outfits

Mobile supplementation, extraction, and analysis of health records

Panel translation service

Intelligent online personal assistant with image text localization