Automatic detection of claims with respect to a topic

US10013470B2 · US · B2

Patent metadata
Publication number: US-10013470-B2
Application number: US-201514697658-A
Country: US
Kind code: B2
Filing date: Apr 28, 2015
Priority date: Jun 19, 2014
Publication date: Jul 3, 2018
Grant date: Jul 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and content relevant to the TUC; detecting one or more claims relevant to the TUC in the content, based on detection of boundaries of the claims in the content; and outputting a list of said detected one or more claims.
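
The receive, detect, and output flow in the abstract can be illustrated with a minimal Python sketch. It works at sentence granularity only and leaves out boundary detection. The marker phrases in `CLAIM_MARKERS`, the word-overlap score in `relevance_score`, and every function name are assumptions made for illustration; the patent relies on trained classifiers, which this sketch does not reproduce.

```python
import re
from dataclasses import dataclass

# Hypothetical marker phrases; this page does not list the patent's actual phrases.
CLAIM_MARKERS = ("argue that", "claim that", "believe that", "shows that")


@dataclass
class DetectedClaim:
    text: str
    score: float


def split_sentences(content: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", content.strip())
    return [p for p in parts if p]


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance_score(sentence: str, tuc: str) -> float:
    """Stand-in for a trained classifier: Jaccard overlap of word sets."""
    a, b = tokens(sentence), tokens(tuc)
    return len(a & b) / len(a | b) if a | b else 0.0


def detect_claims(tuc: str, content: str, top_k: int = 3) -> list[DetectedClaim]:
    """Receive a TUC and content, keep marker-bearing sentences, rank by score."""
    candidates = [
        s for s in split_sentences(content)
        if any(marker in s.lower() for marker in CLAIM_MARKERS)
    ]
    scored = [DetectedClaim(s, relevance_score(s, tuc)) for s in candidates]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    topic = "Violent video games should be banned"
    text = (
        "Video games were invented decades ago. "
        "Critics argue that violent video games increase aggression. "
        "Some researchers believe that exercise improves mood."
    )
    for claim in detect_claims(topic, text):
        print(f"{claim.score:.2f}  {claim.text}")
```

Word overlap is used here only because it needs no training data; any scorer with the same signature could be substituted.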

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized text analytics method comprising using at least one hardware processor for: training a context-dependent claim (CDC) machine learning classifier and a boundaries machine learning classifier, based on a training set that is obtained by: (a) providing, to multiple human labelers: a topic under consideration (TUC) that is embodied as a digital text sentence, and content relevant to the TUC, wherein the content is embodied as multiple digital text paragraphs, and (b) receiving, from the multiple human labelers, portions of the content labeled as context-dependent claims (CDCs), wherein each of the CDCs is a concise statement that directly supports or contests the TUC, wherein the concise statement is embodied as a digital text sentence; receiving a test TUC embodied as a digital text sentence; receiving test content relevant to the test TUC, wherein the test content is embodied as multiple digital text paragraphs each comprising multiple sentences; identifying sentences in the test content that include claims, by detecting features selected from the group consisting of: predetermined phrases that commonly characterize claims, and predetermined phrases that are commonly found in titles of sections that contain claims; applying the CDC machine learning classifier to the identified sentences and to the test TUC, to associate, with each of the identified sentences, a score that is indicative of a probability that the respective sentence is a claim relevant to the test TUC, wherein the claim relevant to the test TUC is a concise statement that directly supports or contests the test TUC; detecting one or more claims relevant to the test TUC in one or more of the identified sentences, by: applying the boundaries machine learning classifier to some of the identified sentences that have a score higher than other identified sentences, applying, to said some of the identified sentences, a boundaries coarse filter that is based on a Maximum Likelihood (ML) probabilistic model, to produce sub-sentences, and applying a boundaries fine-grained filter to the sub-sentences; and outputting a list of said detected one or more claims relevant to the test TUC.

2. The method of claim 1, further comprising using said at least one hardware processor for receiving a background related to the TUC.

3. The method of claim 1, further comprising detecting features of a type selected from the group consisting of: features that characterize a claim in its general sense, features that assess the relevancy of a sentence to the TUC and features that are a mix of features that characterize a claim in its general sense and features that assess the relevancy of a sentence to the TUC.

4. The method of claim 1, further comprising using said at least one hardware processor for detecting sections in the test content which are highly probable to comprise claims relevant to the test TUC, wherein said detection of one or more claims relevant to the test TUC is performed in the detected sections based on detection of boundaries of the claims in the detected sections.

5. The method of claim 4, wherein said identifying of the sentences is performed in said detected sections.

6. The method of claim 1, further comprising using said at least one hardware processor for phrasing the detected one or more claims.

7. The method of claim 1, further comprising using said at least one hardware processor for classifying the detected one or more claims with respect to the TUC, wherein said classifying comprises characterizing said one or more claims according to predefined types of claims.

8. The method of claim 1, further comprising using said at least one hardware processor for calculating a claim score for each of said one or more detected claims and ranking each of said one or more detected claims based on its claim score.

9. The method of claim 1, further comprising using said at least one hardware processor for applying said detecting of one or more claims recursively on previously detected claims.

10. A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to: train a context-dependent claim (CDC) machine learning classifier and a boundaries machine learning classifier, based on a training set that is obtained by: (a) providing, to multiple human labelers: a topic under consideration (TUC) that is embodied as a digital text sentence, and content relevant to the TUC, wherein the content is embodied as multiple digital text paragraphs, and (b) receiving, from the multiple human labelers, portions of the content labeled as context-dependent claims (CDCs), wherein each of the CDCs is a concise statement that directly supports or contests the TUC, wherein the concise statement is embodied as a digital text sentence; receive a test TUC embodied as a digital text sentence; receive content relevant to the test TUC, wherein the test content is embodied as multiple digital text paragraphs each comprising multiple sentences; identify sentences in the test content that include claims, by detecting features selected from the group consisting of: predetermined phrases that commonly characterize claims, and predetermined phrases that are commonly found in titles of sections that contain claims; apply the CDC machine learning classifier to the identified sentences and to the test TUC, to associate, with each of the identified sentences, a score that is indicative of a probability that the respective sentence is a claim relevant to the test TUC, wherein the claim relevant to the test TUC is a concise statement that directly supports or contests the test TUC; detect one or more claims relevant to the test TUC in one or more of the identified sentences, by: applying the boundaries machine learning classifier to some of the identified sentences that have a score higher than other identified sentences, applying, to said some of the identified sentences, a boundaries coarse filter that is based on a Maximum Likelihood (ML) probabilistic model, to produce sub-sentences, and applying a boundaries fine grained filter to the sub-sentences; and output a list of said detected one or more claims relevant to the test TUC.

11. The computer program product of claim 10, wherein said program code is further executable by the at least one hardware processor to detect features of a type selected from the group consisting of: features that characterize a claim in its general sense, features that assess the relevancy of a sentence to the TUC and features that are a mix of features that characterize a claim in its general sense and features that assess the relevancy of a sentence to the TUC.

12. The computer program product of claim 10, wherein said program code is further executable by said at least one hardware processor to detect sections in the test content which are highly probable to comprise claims relevant to the TUC, wherein said detection of one or more claims relevant to the test TUC is performed in the detected sections based on detection of boundaries of the claims in the detected sections.

13. The computer program product of claim 12, wherein said identifying of the sentences is performed in said detected sections.

14. The computer program product of claim 10, wherein said program code is further executable by said at least one hardware processor to calculate a claim score for each of said one or more detected claims and rank each of said one or more detected claims based on its claim score.

15.
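
Claim 1 recites a boundaries coarse filter, based on a Maximum Likelihood probabilistic model, that produces sub-sentences. The Python sketch below shows one way such a filter might look; it is an illustration under stated assumptions, not the patented model. It assumes that only the first and last tokens of a labeled claim are modeled, that their probabilities are estimated as relative frequencies (the maximum-likelihood estimate for a categorical distribution), and that unseen tokens receive a small floor probability. The names `fit_boundary_model` and `coarse_filter`, and the defaults `top_k`, `min_len`, and `floor`, are hypothetical.

```python
from collections import Counter


def fit_boundary_model(labeled):
    """Estimate boundary-token probabilities by relative frequency.

    labeled: list of (tokens, (start, end)) pairs, where start and end are the
    inclusive token indices of a human-labeled claim inside the sentence.
    """
    start_counts, end_counts = Counter(), Counter()
    for toks, (start, end) in labeled:
        start_counts[toks[start].lower()] += 1
        end_counts[toks[end].lower()] += 1
    n = len(labeled)
    p_start = {t: c / n for t, c in start_counts.items()}
    p_end = {t: c / n for t, c in end_counts.items()}
    return p_start, p_end


def coarse_filter(toks, p_start, p_end, top_k=10, min_len=3, floor=1e-6):
    """Return the top_k sub-sentences ranked by P(first token) * P(last token)."""
    spans = []
    for i in range(len(toks)):
        for j in range(i + min_len - 1, len(toks)):
            score = (p_start.get(toks[i].lower(), floor)
                     * p_end.get(toks[j].lower(), floor))
            spans.append((score, i, j))
    # Highest score first; ties broken by earliest start, then shortest span.
    spans.sort(key=lambda s: (-s[0], s[1], s[2]))
    return [(" ".join(toks[i:j + 1]), score) for score, i, j in spans[:top_k]]


if __name__ == "__main__":
    # Toy training data, invented for this example.
    training = [
        ("Critics argue that violent games increase aggression".split(), (3, 6)),
        ("Some say that taxes reduce growth".split(), (3, 5)),
        ("Studies show that violent media harms children".split(), (3, 6)),
    ]
    p_start, p_end = fit_boundary_model(training)
    sentence = "Opponents claim that violent films cause aggression".split()
    for text, score in coarse_filter(sentence, p_start, p_end, top_k=3):
        print(f"{score:.6f}  {text}")
```

In the claimed method the surviving sub-sentences would then go through a fine-grained boundaries filter, which this sketch omits.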

Assignees

Inventors

Classifications

  • Selection or weighting of terms for indexing · CPC title

  • Clustering or classification · CPC title

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

  • Clustering; Classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013470B2 cover?
A method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and content relevant to the TUC; detecting one or more claims relevant to the TUC in the content, based on detection of boundaries of the claims in the content; and outputting a list of said detected one or more claims.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 3, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).