Machine learning systems and methods for many-hop fact extraction and claim verification

US12406150B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12406150-B2
Application number: US-202117534899-A
Country: US
Kind code: B2
Filing date: Nov 24, 2021
Priority date: Nov 25, 2020
Publication date: Sep 2, 2025
Grant date: Sep 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Machine learning (ML) systems and methods for fact extraction and claim verification are provided. The system receives a claim and retrieves a document from a dataset. The document has a first relatedness score higher than a first threshold, which indicates that ML models of the system determine that the document is most likely to be relevant to the claim. The dataset includes supporting documents and claims including a first group of claims supported by facts from more than two supporting documents and a second group of claims not supported by the supporting documents. The system selects a set of sentences from the document. The set of sentences have second relatedness scores higher than a second threshold, which indicate that the ML models determine that the set of sentences are most likely to be relevant to the claim. The system determines whether the claim includes facts from the set of sentences.
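
The three stages described in the abstract (document retrieval above a first threshold, sentence selection above a second threshold, and a final verification decision) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the patent relies on machine learning models for scoring, whereas `token_overlap` is a toy stand-in, and the names `verify_claim` and `Verdict` as well as the threshold values are hypothetical and do not come from the patent.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical scorer type: (claim, text) -> relatedness score in [0, 1].
Scorer = Callable[[str, str], float]


def token_overlap(claim: str, text: str) -> float:
    """Toy stand-in for an ML relatedness model (assumption, not the patent's model).

    Returns the fraction of distinct claim tokens that also occur in the text.
    """
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    text_tokens = set(re.findall(r"\w+", text.lower()))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & text_tokens) / len(claim_tokens)


@dataclass
class Verdict:
    document: Optional[str]  # retrieved document, or None if none passed the threshold
    sentences: List[str]     # sentences selected from that document
    supported: bool          # final verification decision


def verify_claim(
    claim: str,
    documents: List[str],
    score: Scorer = token_overlap,
    doc_threshold: float = 0.3,      # "first threshold" (illustrative value)
    sent_threshold: float = 0.3,     # "second threshold" (illustrative value)
    support_threshold: float = 0.8,  # hypothetical cut-off for the toy decision
) -> Verdict:
    # Stage 1: keep documents scoring above the first threshold, take the best one.
    candidates = [(score(claim, d), d) for d in documents]
    candidates = [(s, d) for s, d in candidates if s > doc_threshold]
    if not candidates:
        return Verdict(None, [], False)
    best_doc = max(candidates, key=lambda pair: pair[0])[1]

    # Stage 2: keep sentences scoring above the second threshold.
    sentences = [s.strip() for s in best_doc.split(".") if s.strip()]
    selected = [s for s in sentences if score(claim, s) > sent_threshold]

    # Stage 3: toy decision. A real system would use an inference model here.
    supported = bool(selected) and score(claim, " ".join(selected)) >= support_threshold
    return Verdict(best_doc, selected, supported)


docs = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany.",
]
result = verify_claim("Paris is the capital of France", docs)
```

Passing the scorer as a parameter keeps the staged structure separate from the scoring model, so a trained relatedness model could be swapped in for `token_overlap` without changing the pipeline.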

First claim

Opening claim text (preview).

What is desired to be protected by Letters Patent is set forth in the following claims:

1. A machine learning system for fact extraction and claim verification, comprising: a memory; and a processor in communication with the memory, the processor: receiving a claim comprising one or more sentences; retrieving, based at least in part on one or more machine learning models, a document from a dataset, the document having a first relatedness score higher than a first threshold, wherein the first relatedness score indicates that the one or more machine learning models determines that the document is most likely to be relevant to the claim, wherein the dataset comprises a plurality of supporting documents and a plurality of claims, the plurality of claims comprising a first group of claims supported by facts from more than two supporting documents from the plurality of supporting documents and a second group of claims not supported by the plurality of supporting documents; selecting, based at least in part on the one or more machine learning models, a set of sentences from the document, the set of sentences having second relatedness scores higher than a second threshold, wherein the second relatedness scores indicate that the one or more machine learning models determine that the set of sentences are most likely to be relevant to the claim; and determining, based at least in part on the one or more machine learning models, whether the claim includes one or more facts from the set of sentences.

2. The system of claim 1, wherein the first group of claims comprise an n-hop claim created at least by a valid (n-1)-hop claim supported by one or more facts from (n-1) supporting documents of the plurality of supporting documents, wherein n is an integer number equal to or greater than 2, wherein one or more entities of the valid (n-1)-hop claim are substituted by information from an additional supporting document of the plurality of supporting documents, the information describing the one or more entities.

3. The system of claim 2, wherein the additional supporting document comprises a hyperlink of the one or more entities in a text body of the additional supporting document, and a title of the additional supporting document is mentioned in a text body of a supporting document of the valid (n-1)-hop claim.

4. The system of claim 3, wherein the one or more entities comprise a title of the supporting document, or the one more entities are part of a text body of a supporting document of the valid (n-1)-hop claim.

5. The system of claim 1, wherein the second group of claims comprise claims having information that is not in the first group of claims, or claims having less information than the first group of claims.

6. The system of claim 1, wherein the processor automatically substitutes one or more words of at least one claim of the first group of claims with one or more new words predicted by a machine learning model to form at least one claim of the second group of claims.

7. The system of claim 1, wherein the processor automatically substitutes one or more entities of at least one claim of the first group of claims with one or more new entities that are not titles of any supporting documents of the at least one claim to form at least one claim of the second group of claims.

8. The system of claim 1, wherein at least one claim of the second group of claims is created by removing or adding one or more negation words, or substituting a phrase with its antonyms in at least one claim of the first group of claims.

9. The system of claim 1, wherein the first group of claims are labeled as supported claims, and the second group of claims are labeled as not-supported claims.

10. The system of claim 1, wherein the one or more machine learning models comprise one or more pre-trained language representations models and one or more natural language inference models.

11. The system of claim 1, wherein the processor retrieves, based at least in part on the one or more machine learning models, a plurality of documents from the plurality of supporting documents in response to a query associated with the claim, wherein the document is retrieved from the plurality of documents.

12. The system of claim 1, wherein the processor determines an accuracy of the one or more machine learning models by comparing the determinations of the one or more machine learning models with ground truth provided by the dataset.

13. The system of claim 1, wherein the dataset provides reasoning graphs of diverse shapes showing relationships between the first group of claims and the plurality of supporting documents.

14. A machine learning method for fact extraction and claim verification, comprising: receiving a claim comprising one or more sentences; retrieving, based at least in part on one or more machine learning models, a document from a dataset, the document having a first relatedness score higher than a first threshold, wherein the first relatedness score indicates that the one or more machine learning models determines that the document is most likely to be relevant to the claim, wherein the dataset comprises a plurality of supporting documents and a plurality of claims, the plurality of claims comprising a first group of claims supported by facts from more than two supporting documents from the plurality of supporting documents and a second group of claims not supported by the plurality of supporting documents; selecting, based at least in part on the one or more machine learning models, a set of sentences from the document, the set of sentences having second relatedness scores higher than a second threshold, wherein the second relatedness scores indicate that the one or more machine learning models determine that the set of sentences are most likely to be relevant to the claim; and determining, based at least in part on the one or more machine learning models, whether the claim includes one or more facts from the set of sentences.

15. The method of claim 14, wherein the first group of claims comprise an n-hop claim created at least by a valid (n-1)-hop claim supported by one or more facts from (n-1) supporting documents of the plurality of supporting documents, wherein n is an integer number equal to or greater than 2, wherein one or more entities of the valid (n-1)-hop claim are substituted by information from an additional supporting document of the plurality of supporting documents, the information describing the one or more entities.

16. The method of claim 15, wherein the additional supporting document comprises a hyperlink of the one or more entities in a text body of the additional supporting document, and a title of the additional supporting document is mentioned in a text body of a supporting document of the valid (n-1)-hop claim.

17. The method of claim 16, wherein the one or more entities comprise a title of the supporting document, or the one more entities are part of a text body of a supporting document of the valid (n-1)-hop claim.

18. The method of claim 14, wherein the second group of claims comprise claims having information that is not in the first group of claims, or claims having less information than the first group of claims.

19. The method of claim 14, further comprising automatically substituting one or more words of at least one claim of the first group of claims with one or more new words predicted by a machine learning model to form at least one claim of the second group of claims.

20. The method of claim 14, further compris
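
Two of the dataset-construction ideas in the claims lend themselves to a small illustration: building an n-hop claim by substituting an entity of a valid (n-1)-hop claim with a description drawn from an additional supporting document, and forming a not-supported claim by adding or removing a negation word. The Python sketch below is a toy under stated assumptions. The function names `extend_claim` and `toggle_negation`, the example sentences, and the label strings are hypothetical; the patent does not specify this code.

```python
import re


def extend_claim(claim: str, entity: str, description: str) -> str:
    """Replace the first mention of `entity` with `description`.

    Verifying the result needs one more document: the one that links
    `description` back to `entity`. Hypothetical helper for illustration.
    """
    if entity not in claim:
        raise ValueError(f"entity {entity!r} does not occur in the claim")
    return claim.replace(entity, description, 1)


def toggle_negation(claim: str) -> str:
    """Toy mutation: remove the first "is not"/"was not", else insert "not".

    Only the verbs "is" and "was" are handled (a simplifying assumption);
    a claim containing neither verb is returned unchanged.
    """
    if re.search(r"\b(is|was) not\b", claim):
        return re.sub(r"\b(is|was) not\b", r"\1", claim, count=1)
    return re.sub(r"\b(is|was)\b", r"\1 not", claim, count=1)


# A 1-hop claim, supported by a single (hypothetical) document about the tower.
one_hop = "The Eiffel Tower is located in Paris."

# Substituting "Paris" with a description taken from a second document
# yields a 2-hop claim.
two_hop = extend_claim(one_hop, "Paris", "the capital of France")

# Adding a negation word produces a candidate for the not-supported group.
# The label strings below are illustrative only.
labeled = [
    (two_hop, "SUPPORTED"),
    (toggle_negation(two_hop), "NOT_SUPPORTED"),
]
```

Applying `extend_claim` repeatedly, each time with a description from a further document, grows the hop count by one per step, mirroring the recursive n-hop wording of claims 2 and 15.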

Assignees

Insurance Services Office Inc

Classifications

Primary CPC classification: G06F40/40

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12406150B2 cover?
Machine learning (ML) systems and methods for fact extraction and claim verification are provided. The system receives a claim and retrieves a document from a dataset. The document has a first relatedness score higher than a first threshold, which indicates that ML models of the system determine that the document is most likely to be relevant to the claim.
Who is the assignee on this patent?
Insurance Services Office Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 2, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).