Fact correction of natural language sentences using data tables

US11880655B2 · US · B2

Patent metadata
Publication number: US-11880655-B2
Application number: US-202217724349-A
Country: US
Kind code: B2
Filing date: Apr 19, 2022
Priority date: Apr 19, 2022
Publication date: Jan 23, 2024
Grant date: Jan 23, 2024

Abstract

Official abstract text for this publication.

Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
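The abstract describes a detect, mask, and fill pipeline. The Python sketch below illustrates only that control flow, under loudly stated assumptions: the data table is passed in directly (table identification is not modeled), a whitespace tokenizer is used, and simple rules stand in for the machine learning models. The names `tokenize`, `find_row`, `error_probabilities`, `predict_value`, and `correct`, the 0.5 threshold, and the toy table are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional

MASK = "[MASK]"
Row = Dict[str, str]


def tokenize(sentence: str) -> List[str]:
    # Whitespace tokenizer (assumption); a real system would likely use
    # the tokenizer of its language model.
    return sentence.split()


def find_row(tokens: List[str], table: List[Row]) -> Optional[Row]:
    # Hypothetical: use the first row whose first-column value is a token.
    for row in table:
        key = next(iter(row.values()), None)
        if key is not None and key in tokens:
            return row
    return None


def error_probabilities(tokens: List[str], row: Row) -> List[float]:
    # Rule-based stand-in for the detection model: a numeric token that does
    # not appear in the matched row gets a high error probability.
    values = set(row.values())
    return [0.9 if t.isdigit() and t not in values else 0.05 for t in tokens]


def predict_value(masked: List[str], row: Row) -> Optional[str]:
    # Rule-based stand-in for the fill-in model: return the cell whose
    # column name is mentioned in the masked sentence.
    words = {t.lower() for t in masked}
    for column, value in row.items():
        if column.lower() in words:
            return value
    return None


def correct(sentence: str, table: List[Row], threshold: float = 0.5) -> str:
    tokens = tokenize(sentence)
    row = find_row(tokens, table)
    if row is None:
        return sentence  # no supporting row, so leave the sentence unchanged
    probs = error_probabilities(tokens, row)
    # Mask the single most suspicious token if it exceeds the threshold.
    worst = max(range(len(tokens)), key=lambda i: probs[i])
    if probs[worst] <= threshold:
        return sentence  # nothing renders the sentence false
    masked = tokens.copy()
    masked[worst] = MASK
    value = predict_value(masked, row)
    if value is None:
        return sentence
    masked[worst] = value
    return " ".join(masked)


# Made-up toy table for illustration only.
table = [
    {"team": "Lions", "wins": "11", "losses": "6"},
    {"team": "Bears", "wins": "5", "losses": "12"},
]
print(correct("Lions recorded 9 wins this season", table))
# prints: Lions recorded 11 wins this season
```

A sentence that already agrees with the table, such as "Bears recorded 5 wins this season", has no token above the threshold and is returned unchanged, which mirrors the no-correction case in claim 8.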

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method comprising: receiving an input sentence; tokenizing elements of the input sentence; generating, by a sentence transformer, a sentence embedding representing the input sentence; identifying, by comparator, a data table correlated to content of the input sentence by comparing the sentence embedding representing the input sentence with stored table view embeddings for a plurality of data tables, each stored table view embedding including a plurality of row embeddings generated from text representations of each row of the plurality of data tables; identifying, by a first machine learning model, a tokenized element of the input sentence that renders the input sentence false based on the identified data table; masking the tokenized element of the input sentence that renders the input sentence false; predicting, by a second machine learning model, a new value for the masked tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table; and providing an output including a modified input sentence with the new value.

2. The computer-implemented method of claim 1, wherein identifying the data table correlated to the content of the input sentence comprises: retrieving the stored table view embeddings for the plurality of data tables; computing a similarity value between the sentence embedding and each of the stored table view embeddings; and identifying the data table from the plurality of data tables having a table view embedding with a highest computed similarity value to the sentence embedding.

3. The computer-implemented method of claim 2, further comprising: generating the stored table view embeddings for each data table of the plurality of data tables by: generating linearized representations of each row of a data table, and generating a plurality of row embeddings using the linearized representations of each row of the data table, wherein each row embedding of the plurality of row embeddings is associated with a single row of the data table.

4. The computer-implemented method of claim 2, wherein masking the tokenized element of the input sentence that renders the input sentence false comprises: determining, by the second machine learning model, an error probability for each of the tokenized elements of the input sentence based on the table view embeddings for the identified data table; identifying the tokenized element having an error probability above a threshold value as being the tokenized element that renders the input sentence false; and masking the identified tokenized element.

5. The computer-implemented method of claim 4, wherein predicting the new value for the tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table comprises: receiving, by the second machine learning model, the input sentence with the masked tokenized element and the table view embeddings for the identified data table; evaluating the masked tokenized elements and the table view embeddings to predict the new value for the masked tokenized element that renders the input sentence true; and generating the modified input sentence using the new value in place of the masked tokenized element.

6. The computer-implemented method of claim 1, wherein the second machine learning model is trained by: receiving, by the second machine learning model, a training input, the training input including a training sentence, a tokenized training sentence, and training data tables; determining an error probability for each tokenized element of the training sentence based on table view embeddings for the training data tables; identifying a tokenized element having an error probability above a threshold value as being the tokenized element that renders the training sentence false; and training the second machine learning model using the identified tokenized element and a ground truth tokenized element for a ground truth masked sentence.

7. The computer-implemented method of claim 1, wherein the second machine learning model is trained by: receiving, by the second machine learning model, a training input, the training input including a masked training sentence and training data tables; evaluating tokenized elements of the masked training sentence and table view embeddings for the training data tables to predict a new training value for the tokenized elements of the masked training sentence that renders the masked training sentence true; and training the second machine learning model using the new training value for the masked training sentence and a ground truth value for a ground truth correct sentence.

8. The computer-implemented method of claim 1, further comprising: receiving a second input sentence; tokenizing second elements of the second input sentence; generating, by the sentence transformer, a second sentence embedding representing the second input sentence; identifying, by the comparator, a second data table correlated to content of the second input sentence by comparing the second sentence embedding representing the second input sentence with the stored table view embeddings for the plurality of data tables; determining, by the first machine learning model, that no tokenized second elements of the second input sentence render the second input sentence false based on the second data table; and providing a second output including the second input sentence.

9. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to: receive an input sentence; tokenize elements of the input sentence; generate, by a sentence transformer, a sentence embedding representing the input sentence; identify, by a comparator, a data table correlated to content of the input sentence by comparing the sentence embedding representing the input sentence with stored table view embeddings for a plurality of data tables, each stored table view embedding including a plurality of row embeddings generated from text representations of each row of the plurality of data tables; identify, by a first machine learning model, a tokenized element of the input sentence that renders the input sentence false based on the identified data table; mask the tokenized element of the input sentence that renders the input sentence false; predict, by a second machine learning model, a new value for the masked tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table; and provide an output including a modified input sentence with the new value.

10. The non-transitory computer-readable storage medium of claim 9, wherein to identify the data table associated with the input sentence, the instructions, when executed, further cause the at least one processor to: retrieve the stored table view embeddings for the plurality of data tables; compute a similarity value between the sentence embedding and each of the stored table view embeddings; and identify the data table from the plurality of data tables having a table view embedding with a highest computed similarity value to the sentence embedding.

11. The non-transitory computer-readable storage medium of claim 10, wherein the instructions, when executed, further cause the at least one processor to: generate the stored table view embeddings for each data table of the plurality of data tables by: generating linearized representations of each row of a data table, and
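Claims 2, 3, and 10 describe choosing the data table by comparing a sentence embedding against stored table view embeddings built from per-row embeddings of linearized rows. The Python sketch below illustrates that retrieval step under several assumptions the claims do not specify: a normalized bag-of-words vector stands in for the sentence transformer, rows are linearized as "column is value" pairs, cosine similarity is the similarity value, and a table's score is the best score over its rows. All function names and both toy tables are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]
Vector = Dict[str, float]


def linearize_row(row: Row) -> str:
    # Hypothetical linearization format: "column is value ; column is value".
    return " ; ".join(f"{col} is {val}" for col, val in row.items())


def embed(text: str) -> Vector:
    # Stand-in for a sentence transformer: an L2-normalized bag of words.
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length (or empty), so the dot product is the cosine.
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def table_view_embedding(table: List[Row]) -> List[Vector]:
    # One embedding per row, each built from that row's linearized text.
    return [embed(linearize_row(row)) for row in table]


def table_score(query: Vector, rows: List[Vector]) -> float:
    # Assumption: a table is as similar as its best-matching row.
    return max((cosine(query, r) for r in rows), default=0.0)


def retrieve_table(sentence: str, views: Dict[str, List[Vector]]) -> str:
    # Return the name of the table with the highest similarity score.
    query = embed(sentence)
    return max(views, key=lambda name: table_score(query, views[name]))


# Made-up toy tables; the view embeddings would be precomputed and stored.
tables = {
    "standings": [
        {"team": "Lions", "wins": "11", "losses": "6"},
        {"team": "Bears", "wins": "5", "losses": "12"},
    ],
    "capitals": [
        {"country": "France", "capital": "Paris"},
        {"country": "Japan", "capital": "Tokyo"},
    ],
}
views = {name: table_view_embedding(rows) for name, rows in tables.items()}
print(retrieve_table("The capital of Japan is Tokyo", views))
# prints: capitals
```

Taking the maximum over rows lets one strongly matching row decide the table; averaging row scores would be another plausible reading of "similarity value" in claim 2, but it dilutes the signal for large tables.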

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11880655B2 cover?
Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further …
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Published Jan 23, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).