Method and system for detection of misinformation
US-2022382795-A1 · Dec 1, 2022 · US
US11880655B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11880655-B2 |
| Application number | US-202217724349-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 19, 2022 |
| Priority date | Apr 19, 2022 |
| Publication date | Jan 23, 2024 |
| Grant date | Jan 23, 2024 |
Official abstract text for this publication.
Embodiments are disclosed for performing fact correction of natural language sentences using data tables. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input sentence, tokenizing elements of the input sentence, and identifying, by a first machine learning model, a data table associated with the input sentence. The systems and methods further comprise a second machine learning model identifying a tokenized element of the input sentence that renders the input sentence false based on the data table and masking the tokenized element of the tokenized input sentence that renders the input sentence false. The systems and methods further include a third machine learning model predicting a new value for the masked tokenized element based on the input sentence with the masked tokenized element and the identified data table and providing an output including a modified input sentence with the new value.
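The detect, mask, and predict flow described in the abstract can be sketched with rule-based stand-ins for the learned models. This is a minimal illustration under stated assumptions, not the patented implementation: the table contents, the helper names (`anchor_row`, `error_scores`, `correct`), the `[MASK]` token, and the 0.5 threshold are all hypothetical, exact string matching takes the place of the machine learning models, and the data table is assumed to have been identified already.

```python
import re

MASK = "[MASK]"

# Hypothetical data table; values are illustrative placeholders only.
TABLE = [
    {"city": "Paris", "country": "France", "population_millions": "2.1"},
    {"city": "Berlin", "country": "Germany", "population_millions": "3.6"},
]


def tokenize(sentence):
    """Toy tokenizer that keeps decimals such as '2.1' as a single token."""
    return re.findall(r"\d+(?:\.\d+)?|\w+", sentence)


def anchor_row(tokens, table, key="city"):
    """Pick the first row whose key cell is mentioned in the sentence."""
    for row in table:
        if row[key] in tokens:
            return row
    return None


def error_scores(tokens, row, table):
    """Stand-in for the detection model: one (score, column) pair per token.

    A token scores 1.0 when it is a known value of some column but
    disagrees with the anchored row's value for that column.
    """
    scores = []
    for tok in tokens:
        score, col = 0.0, None
        for column in row:
            column_values = {r[column] for r in table}
            if tok in column_values and tok != row[column]:
                score, col = 1.0, column
        scores.append((score, col))
    return scores


def correct(sentence, table, threshold=0.5):
    """Mask tokens whose error score exceeds the threshold, then refill them."""
    tokens = tokenize(sentence)
    row = anchor_row(tokens, table)
    if row is None:
        return sentence  # no supporting row found; leave the input unchanged
    scores = error_scores(tokens, row, table)
    masked = [MASK if s > threshold else t for t, (s, _) in zip(tokens, scores)]
    # Stand-in for the prediction model: fill each mask from the anchored row.
    filled = [row[c] if m == MASK else m for m, (_, c) in zip(masked, scores)]
    return " ".join(filled)


print(correct("Berlin has a population of 2.1 million", TABLE))
```

In this toy version a sentence that already agrees with the table passes through unchanged, which mirrors the case where no token is found to render the sentence false. A false value that appears nowhere in the table would go undetected here; a learned detector is what the disclosure relies on for that.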
Opening claim text (preview).
We claim:
1. A computer-implemented method comprising: receiving an input sentence; tokenizing elements of the input sentence; generating, by a sentence transformer, a sentence embedding representing the input sentence; identifying, by comparator, a data table correlated to content of the input sentence by comparing the sentence embedding representing the input sentence with stored table view embeddings for a plurality of data tables, each stored table view embedding including a plurality of row embeddings generated from text representations of each row of the plurality of data tables; identifying, by a first machine learning model, a tokenized element of the input sentence that renders the input sentence false based on the identified data table; masking the tokenized element of the input sentence that renders the input sentence false; predicting, by a second machine learning model, a new value for the masked tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table; and providing an output including a modified input sentence with the new value.
2. The computer-implemented method of claim 1, wherein identifying the data table correlated to the content of the input sentence comprises: retrieving the stored table view embeddings for the plurality of data tables; computing a similarity value between the sentence embedding and each of the stored table view embeddings; and identifying the data table from the plurality of data tables having a table view embedding with a highest computed similarity value to the sentence embedding.
3. The computer-implemented method of claim 2, further comprising: generating the stored table view embeddings for each data table of the plurality of data tables by: generating linearized representations of each row of a data table, and generating a plurality of row embeddings using the linearized representations of each row of the data table, wherein each row embedding of the plurality of row embeddings is associated with a single row of the data table.
4. The computer-implemented method of claim 2, wherein masking the tokenized element of the input sentence that renders the input sentence false comprises: determining, by the second machine learning model, an error probability for each of the tokenized elements of the input sentence based on the table view embeddings for the identified data table; identifying the tokenized element having an error probability above a threshold value as being the tokenized element that renders the input sentence false; and masking the identified tokenized element.
5. The computer-implemented method of claim 4, wherein predicting the new value for the tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table comprises: receiving, by the second machine learning model, the input sentence with the masked tokenized element and the table view embeddings for the identified data table; evaluating the masked tokenized elements and the table view embeddings to predict the new value for the masked tokenized element that renders the input sentence true; and generating the modified input sentence using the new value in place of the masked tokenized element.
6. The computer-implemented method of claim 1, wherein the second machine learning model is trained by: receiving, by the second machine learning model, a training input, the training input including a training sentence, a tokenized training sentence, and training data tables; determining an error probability for each tokenized element of the training sentence based on table view embeddings for the training data tables; identifying a tokenized element having an error probability above a threshold value as being the tokenized element that renders the training sentence false; and training the second machine learning model using the identified tokenized element and a ground truth tokenized element for a ground truth masked sentence.
7. The computer-implemented method of claim 1, wherein the second machine learning model is trained by: receiving, by the second machine learning model, a training input, the training input including a masked training sentence and training data tables; evaluating tokenized elements of the masked training sentence and table view embeddings for the training data tables to predict a new training value for the tokenized elements of the masked training sentence that renders the masked training sentence true; and training the second machine learning model using the new training value for the masked training sentence and a ground truth value for a ground truth correct sentence.
8. The computer-implemented method of claim 1, further comprising: receiving a second input sentence; tokenizing second elements of the second input sentence; generating, by the sentence transformer, a second sentence embedding representing the second input sentence; identifying, by the comparator, a second data table correlated to content of the second input sentence by comparing the second sentence embedding representing the second input sentence with the stored table view embeddings for the plurality of data tables; determining, by the first machine learning model, that no tokenized second elements of the second input sentence render the second input sentence false based on the second data table; and providing a second output including the second input sentence.
9. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to: receive an input sentence; tokenize elements of the input sentence; generate, by a sentence transformer, a sentence embedding representing the input sentence; identify, by a comparator, a data table correlated to content of the input sentence by comparing the sentence embedding representing the input sentence with stored table view embeddings for a plurality of data tables, each stored table view embedding including a plurality of row embeddings generated from text representations of each row of the plurality of data tables; identify, by a first machine learning model, a tokenized element of the input sentence that renders the input sentence false based on the identified data table; mask the tokenized element of the input sentence that renders the input sentence false; predict, by a second machine learning model, a new value for the masked tokenized element that renders the input sentence false based on the input sentence with the masked tokenized element and the identified data table; and provide an output including a modified input sentence with the new value.
10. The non-transitory computer-readable storage medium of claim 9, wherein to identify the data table associated with the input sentence, the instructions, when executed, further cause the at least one processor to: retrieve the stored table view embeddings for the plurality of data tables; compute a similarity value between the sentence embedding and each of the stored table view embeddings; and identify the data table from the plurality of data tables having a table view embedding with a highest computed similarity value to the sentence embedding.
11. The non-transitory computer-readable storage medium of claim 10, wherein the instructions, when executed, further cause the at least one processor to: generate the stored table view embeddings for each data table of the plurality of data tables by: generating linearized representations of each row of a data table, and
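The table retrieval recited in claims 2 and 3 (linearize each row, embed each row, then pick the table whose table view embedding is most similar to the sentence embedding) can be sketched as follows. This is a rough illustration only: a bag-of-words count vector stands in for the sentence transformer, the `col is value` linearization format is an assumption, and taking the maximum cosine similarity over a table's row embeddings is one assumed way to score a table, since the claims do not say how row embeddings are aggregated. All function names and table contents are hypothetical.

```python
import math
import re
from collections import Counter


def linearize_row(row):
    """Text representation of one row, e.g. 'city is Paris ; country is France'."""
    return " ; ".join(f"{col} is {val}" for col, val in row.items())


def embed(text):
    """Stand-in for a sentence transformer: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def table_view_embedding(table):
    """One embedding per row, each built from that row's linearized text."""
    return [embed(linearize_row(row)) for row in table]


def retrieve_table(sentence, stored_views):
    """Return the best-scoring table name and all scores.

    Assumption: a table's score is the maximum similarity over its rows.
    """
    query = embed(sentence)
    scores = {
        name: max(cosine(query, row_emb) for row_emb in rows)
        for name, rows in stored_views.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical tables; contents are illustrative placeholders only.
tables = {
    "cities": [
        {"city": "Paris", "country": "France"},
        {"city": "Berlin", "country": "Germany"},
    ],
    "planets": [
        {"planet": "Mars", "moons": "2"},
        {"planet": "Jupiter", "moons": "95"},
    ],
}

# Table view embeddings are computed once and stored, then reused per query.
stored_views = {name: table_view_embedding(t) for name, t in tables.items()}
best, scores = retrieve_table("Berlin is the capital city of Germany", stored_views)
print(best)
```

Precomputing `stored_views` reflects the claim language about stored table view embeddings: only the sentence embedding needs to be generated at query time.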
Lexical analysis, e.g. tokenisation or collocates · CPC title
of sub-queries or views · CPC title
Validation · CPC title
Ensemble learning · CPC title
Semantic analysis · CPC title