US12468896B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12468896-B2 |
| Application number | US-202218077693-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 8, 2022 |
| Priority date | Dec 8, 2022 |
| Publication date | Nov 11, 2025 |
| Grant date | Nov 11, 2025 |
Official abstract text for this publication.
A method performed by at least one processor includes receiving a first input stream of a task and a second input stream of a solution. The method further includes selecting the first input stream or the second input stream. The method further includes providing the selected input stream to an image conversion model and a language model. The method further includes creating, based on the selected input stream, a model ensemble of the conversion model and the language model. The method further includes outputting a prediction based on the model ensemble. The method may further include generating an image corresponding to text, converting a textual task into a multimodal task, and solving the multimodal task.
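The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: `image_model` and `language_model` are hypothetical stand-ins that return a score per candidate label, and both the selection flag (`use_task`) and the unweighted averaging are assumptions, since the abstract specifies neither.

```python
from typing import Callable, Dict

# Hypothetical model type: maps an input stream (text) to per-label scores.
Model = Callable[[str], Dict[str, float]]


def predict(task_stream: str, solution_stream: str, use_task: bool,
            image_model: Model, language_model: Model) -> str:
    # Select one of the two input streams (the selection rule is an assumption).
    selected = task_stream if use_task else solution_stream

    # Provide the selected stream to both models.
    image_scores = image_model(selected)
    language_scores = language_model(selected)

    # Ensemble the outputs: average the two scores per label (assumed rule).
    labels = set(image_scores) | set(language_scores)
    combined = {
        label: (image_scores.get(label, 0.0) + language_scores.get(label, 0.0)) / 2
        for label in labels
    }

    # Output the label with the highest combined score.
    return max(combined, key=combined.get)
```

Passing the models in as plain callables keeps the sketch independent of any particular image conversion or language model.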
Opening claim text (preview).
What is claimed is:

1. A method performed by at least one processor for processing language, the method comprising: receiving a first input stream of a task; receiving a second input stream of a solution; selecting the first input stream or the second input stream; providing the selected input stream to an image conversion model and a language model; creating, based on the selected input stream, a model ensemble from outputs of the image conversion model and from outputs of the language model; scoring a first plurality of candidate solutions obtained from the second input stream via the language model; selecting, from the first plurality of candidate solutions, a second plurality of candidate solutions with scores exceeding a threshold; and outputting a prediction based on the model ensemble and the second plurality of candidate solutions.

2. The method of claim 1, wherein the language model uses a prompt based approach, and wherein the language model is a Generative Pre-Trained Transformer (GPT) model.

3. The method of claim 1, wherein the task is at least one of word sense disambiguation, science question answering, or text classification, wherein the prediction comprises at least one possible word sense of a target word based on the task being the word sense disambiguation; the prediction comprises an answer of a question based on the task being the science question answering, and the prediction comprises a category of text based on the task being the text classification.

4. The method of claim 1, wherein the language model uses a Bidirectional Encoder Representations from Transformers (BERT).

5. The method of claim 4, wherein the language model uses a natural language inference approach.

6. The method of claim 4, wherein the language model uses a latent embedding approach.

7. The method of claim 1, wherein the image conversion model uses a combined approach of recall and synthesis.

8. The method of claim 7, wherein the synthesis includes a text to image generation model.

9. The method of claim 7, wherein the synthesis includes a generative adversarial network.

10. The method of claim 1, wherein the model ensemble weights constituent models of the image conversion model and the language model based on a relative size of each constituent model.

11. An apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising: receiving code configured to cause the at least one processor to receive a first input stream of a task and a second input stream of a solution; selecting code configured to cause the at least one processor to select the first input stream or the second input stream; providing code configured to cause the at least one processor to provide the selected input stream to an image conversion model and a language model; ensembling code configured to cause the at least one processor to create, based on the selected input stream, a model ensemble from outputs of the image conversion model and from outputs of the language model; scoring code configured to cause the at least one processor to score a first plurality of candidate solutions obtained from the second input stream via the language model; selecting code configured to cause the at least one processor to select, from the first plurality of candidate solutions, a second plurality of candidate solutions with scores exceeding a threshold; and outputting code configured to cause the at least one processor to output a prediction based on the model ensemble and the second plurality of candidate solutions.

12. The apparatus of claim 11, wherein the language model uses a prompt based approach, and wherein the language model is a Generative Pre-Trained Transformer (GPT) model.

13. The apparatus of claim 11, wherein the task is at least one of word sense disambiguation, science question answering, or text classification, wherein the prediction comprises at least one possible word sense of a target word based on the task being the word sense disambiguation; the prediction comprises an answer of a question based on the task being the science question answering, and the prediction comprises a category of text based on the task being the text classification.

14. The apparatus of claim 11, wherein the language model uses a Bidirectional Encoder Representations from Transformers (BERT).

15. The apparatus of claim 14, wherein the language model uses a natural language inference approach or a latent embedding approach.

16. The apparatus of claim 11, wherein the image conversion model uses a combined approach of recall and synthesis.

17. The apparatus of claim 16, wherein the synthesis includes a text to image generation model.

18. The apparatus of claim 16, wherein the synthesis includes a generative adversarial network.

19. The apparatus of claim 11, wherein the model ensemble weights constituent models of the image conversion model and the language model based on a relative size of each constituent model.

20. A non-transitory computer readable medium having instructions stored therein, which when executed by a processor cause the processor to execute a method comprising: receiving a first input stream of a task; receiving a second input stream of a solution; selecting the first input stream or the second input stream; providing the selected input stream to an image conversion model and a language model; creating, based on the selected input stream, a model ensemble from outputs of the image conversion model and from outputs of the language model; scoring a first plurality of candidate solutions obtained from the second input stream via the language model; selecting, from the first plurality of candidate solutions, a second plurality of candidate solutions with scores exceeding a threshold; and outputting a prediction based on the model ensemble and the second plurality of candidate solutions.
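Two steps in the claims lend themselves to a short illustration: keeping the candidate solutions whose scores exceed a threshold (claim 1) and weighting constituent models by relative size (claim 10). The sketch below is one possible reading under stated assumptions: scores are plain floats, "size" is taken to mean parameter count, each weight is a model's share of the total, and the final prediction is the surviving candidate with the highest weighted score. The claims fix none of these details, and every function and variable name here is hypothetical.

```python
from typing import Dict, List


def filter_candidates(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the candidates whose score strictly exceeds the threshold."""
    return [cand for cand, score in scores.items() if score > threshold]


def size_weights(param_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each model by its share of the total parameter count (assumed)."""
    total = sum(param_counts.values())
    return {name: count / total for name, count in param_counts.items()}


def ensemble_predict(model_scores: Dict[str, Dict[str, float]],
                     weights: Dict[str, float],
                     candidates: List[str]) -> str:
    """Return the candidate with the highest weighted sum of model scores.

    Raises ValueError if no candidate survived the threshold.
    """
    def weighted(cand: str) -> float:
        return sum(weights[m] * model_scores[m].get(cand, 0.0)
                   for m in model_scores)

    return max(candidates, key=weighted)
```

In a word sense disambiguation setting, for example, `candidates` would hold the word senses that passed the language-model threshold, and `model_scores` would hold each constituent model's score for those senses.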
Combinations of networks · CPC title
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Semantic analysis · CPC title
Natural language generation · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title