What technology area does this patent fall under?

Primary CPC classification G06F16/355. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 13, 2020 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Verifying textual claims with a document corpus

US2020050621A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2020050621-A1
Application number	US-201916522727-A
Country	US
Kind code	A1
Filing date	Jul 26, 2019
Priority date	Aug 9, 2018
Publication date	Feb 13, 2020
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system verifies textual claims using a document corpus. The system includes a memory for storing program code and a processor device for running the code to retrieve documents from the corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims. The processor extracts named entities and capitalized phrases from the textual claims. The processor retrieves documents from the corpus with titles matching any of the extracted named entities and capitalized phrases. The processor extracts premise sentences from the retrieved documents. The processor classifies the premise sentences together with sources of the premises sentences against the textual claims to obtain classifications from among possible classifications including a supported, an unverified, or a contradicted classification. The processor aggregates the classifications over the premise sentences to selectively output, for each textual claim, an overall decision of the supported classification, the unverified classification, or the contradicted classification.
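The two retrieval steps in the abstract (TF-IDF similarity to the claim, plus title matching on capitalized phrases) can be illustrated with a minimal standard-library sketch. It is not the patented implementation. The corpus layout (a dict mapping title to body text), the smoothed IDF formula, the function names, and the regular expression that stands in for a real named-entity extractor are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus (assumed formula)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def retrieve_by_tfidf(claim, corpus, limit):
    """Titles of up to `limit` documents with non-zero similarity, best match first."""
    idf = build_idf(list(corpus.values()))
    claim_vec = tfidf_vector(claim, idf)
    scored = [(cosine(claim_vec, tfidf_vector(body, idf)), title) for title, body in corpus.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [title for _, title in ranked[:limit]]


def capitalized_phrases(claim):
    """Maximal runs of capitalized words: a crude stand-in for named-entity extraction."""
    return re.findall(r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*", claim)


def retrieve_by_title(claim, corpus, limit):
    """Titles that equal any extracted phrase (case-insensitive), capped at `limit`."""
    wanted = {phrase.lower() for phrase in capitalized_phrases(claim)}
    return [title for title in corpus if title.lower() in wanted][:limit]


def retrieve(claim, corpus, limit=5):
    """Union of both retrieval steps, with duplicates removed and order preserved."""
    hits = retrieve_by_tfidf(claim, corpus, limit) + retrieve_by_title(claim, corpus, limit)
    return list(dict.fromkeys(hits))
```

Calling `retrieve(claim, corpus)` returns a list of titles whose documents would then be split into premise sentences. Note that the regular expression also picks up ordinary sentence-initial words such as "The", so a real system would likely need a proper entity extractor.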

First claim

Opening claim text (preview).

What is claimed is:
1. A system for verifying textual claims using a document corpus, comprising: a memory for storing program code; and a processor device for running the program code to retrieve documents from the document corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims; extract named entities and capitalized phrases from the textual claims; retrieve documents from the document corpus with titles matching any of the extracted named entities and capitalized phrases; extract premise sentences from the retrieved documents; classify the premise sentences together with sources of the premises sentences against the textual claims to obtain classifications from among possible classifications including a supported classification, an unverified classification, or a contradicted classification; and aggregate the classifications over the premise sentences to selectively output, for each of the textual claims, an overall decision of the supported classification, the unverified classification, or the contradicted classification.
2. The system of claim 1, wherein the processor device further outputs supporting statements for the overall decision from the document corpus.
3. The system of claim 1, wherein the processor adds a title of a source document to each of the premise sentences.
4. The system of claim 1, wherein the title of the source document is added before a corresponding one of the premise statements.
5. The system of claim 1, wherein the system is comprised in a news summarization system.
6. The system of claim 1, wherein the processor retrieves the documents from the document corpus using a threshold on an overall number of the documents retrieved from the document corpus to limit a number of processed document from the document corpus for each of multiple retrievals.
7. The system of claim 1, wherein each of the premise sentences is individually classified together with a corresponding one of the sources against the textual claims to obtain the classifications for each of the textual claims.
8. The system of claim 1, wherein concatenated sets of the premise statements are classified together with corresponding ones of the sources against the textual claims to obtain the classifications for each of the textual claims.
9. The system of claim 1, wherein the processor device biases the overall decision by resolving conflicts between supporting and refuting information in favor of the supporting information.
10. A computer-implemented method for verifying textual claims using a document corpus, comprising: retrieving, by a processor device, documents from the document corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims; extracting, by the processor device, named entities and capitalized phrases from the textual claims; retrieving, by the processor device, documents from the document corpus with titles matching any of the extracted named entities and capitalized phrases; extracting, by the processor device, premise sentences from the retrieved documents; classifying, by the processor device, the premise sentences together with sources of the premises sentences against the textual claims to obtain classifications from among possible classifications including a supported classification, an unverified classification, or a contradicted classification; and aggregating, by the processor device, the classifications over the premise sentences to selectively output, for each of the textual claims, an overall decision of the supported classification, the unverified classification, or the contradicted classification.
11. The computer-implemented method of claim 10, wherein the processor device further outputs supporting statements for the overall decision from the document corpus.
12. The computer-implemented method of claim 10, wherein the processor adds a title of a source document to each of the premise sentences.
13. The computer-implemented method of claim 10, wherein the title of the source document is added before a corresponding one of the premise statements.
15. The computer-implemented method of claim 10, wherein the system is comprised in a news summarization system.
16. The computer-implemented method of claim 10, wherein each of the retrieving steps involve a respective threshold on an overall number of the documents retrieved from the document corpus to limit a number of processed document from the document corpus.
17. The computer-implemented method of claim 10, wherein each of the premise sentences is individually classified together with a corresponding one of the sources against the textual claims to obtain the classifications for each of the textual claims.
18. The computer-implemented method of claim 10, wherein concatenated sets of the premise statements are classified together with corresponding ones of the sources against the textual claims to obtain the classifications for each of the textual claims.
19. The computer-implemented method of claim 10, further comprising biasing the overall decision by resolving conflicts between supporting and refuting information in favor of the supporting information.
20. A computer program product for verifying textual claims using a document corpus, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: retrieving, by a processor device, documents from the document corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims; extracting, by the processor device, named entities and capitalized phrases from the textual claims; retrieving, by the processor device, documents from the document corpus with titles matching any of the extracted named entities and capitalized phrases; extracting, by the processor device, premise sentences from the retrieved documents; classifying, by the processor device, the premise sentences together with sources of the premises sentences against the textual claims to obtain classifications from among possible classifications including a supported classification, an unverified classification, or a contradicted classification; and aggregating, by the processor device, the classifications over the premise sentences to selectively output, for each of the textual claims, an overall decision of the supported classification, the unverified classification, or the contradicted classification.
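As a rough illustration of the classification and aggregation steps, the sketch below prepends the source title to each premise sentence (claims 3 and 4), classifies each premise individually (claim 7), and resolves conflicts in favor of supporting information (claim 9). The `Label` enum, the function names, the `". "` separator between title and sentence, and the `classify` callable are hypothetical. The claims do not specify a classifier on this page, so any function mapping a premise and a claim to a label can be plugged in.

```python
from enum import Enum


class Label(Enum):
    SUPPORTED = "supported"
    UNVERIFIED = "unverified"
    CONTRADICTED = "contradicted"


def with_source_title(title, sentence):
    """Put the source document title before the premise sentence (assumed separator)."""
    return f"{title}. {sentence}"


def aggregate(labels):
    """Overall decision for one claim, biased toward support when evidence conflicts."""
    labels = list(labels)
    if Label.SUPPORTED in labels:
        return Label.SUPPORTED
    if Label.CONTRADICTED in labels:
        return Label.CONTRADICTED
    return Label.UNVERIFIED


def verify(claim, premises, classify):
    """Classify each (title, sentence) premise against the claim, then aggregate.

    `classify` is a hypothetical callable: classify(premise_text, claim) -> Label.
    Returns the overall decision and the premises that carry that decision.
    """
    texts = [with_source_title(title, sentence) for title, sentence in premises]
    labels = [classify(text, claim) for text in texts]
    decision = aggregate(labels)
    evidence = [text for text, label in zip(texts, labels) if label is decision]
    return decision, evidence
```

Returning the premises that share the overall label is one plausible way to provide the supporting statements mentioned in claims 2 and 11. Checking for `SUPPORTED` first is what implements the bias: a single supporting premise outweighs any number of contradicting ones.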

Assignees

NEC Lab America Inc

Inventors

Malon Christopher

Classifications

G06F16/345
Summarisation for human users · CPC title
G06F16/355 · Primary
Creation or modification of classes or clusters · CPC title
G06F16/93
Document management systems · CPC title

Patent family

Related publications grouped by family.

View patent family 69405968

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020050621A1 cover?: A system verifies textual claims using a document corpus. The system includes a memory for storing program code and a processor device for running the code to retrieve documents from the corpus based on Term Frequency Inverse Document Frequency (TFIDF) similarity to a set of textual claims. The processor extracts named entities and capitalized phrases from the textual claims. The processor retr…
Who is the assignee on this patent?: NEC Lab America Inc