Semantic merge of arguments

US10614100B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10614100-B2
Application number: US-201514698854-A
Country: US
Kind code: B2
Filing date: Apr 29, 2015
Priority date: Jun 19, 2014
Publication date: Apr 7, 2020
Grant date: Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and a set of claims referring to the TUC; identifying semantic similarity relations between claims of the set of claims; clustering the claims into a plurality of claim clusters based on the identified semantic similarity relations, wherein said claim clusters represent semantically different claims of the set of claims; and generating a list of non-redundant claims comprising said semantically different claims.
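The clustering step summarized above can be sketched roughly as follows, using the graph formulation that claim 5 describes: pairs whose probability falls below a threshold lose their edge, and the remaining connected components become the clusters. This is a minimal sketch, not the patented implementation. The function names, the 0.5 threshold, the sample probabilities, and the "highest total similarity" reading of a cluster centroid are all illustrative assumptions.

```python
def cluster_claims(n_claims, pair_prob, threshold=0.5):
    """Group claim indices 0..n_claims-1 into clusters.

    pair_prob maps (i, j) with i < j to an assumed classifier probability
    that claims i and j convey the same idea.
    """
    parent = list(range(n_claims))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), p in pair_prob.items():
        if p >= threshold:  # edges below the threshold are dropped
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n_claims):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def representative(cluster, pair_prob):
    """Pick the member most similar to the rest of its cluster (assumed centroid rule)."""
    def sim(i, j):
        return pair_prob.get((min(i, j), max(i, j)), 0.0)

    return max(cluster, key=lambda i: sum(sim(i, j) for j in cluster if j != i))


# Hypothetical probabilities for four claims.
probs = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.3,
         (0, 3): 0.1, (1, 3): 0.05, (2, 3): 0.2}
clusters = cluster_claims(4, probs)
print(clusters)                                      # [[0, 1, 2], [3]]
print([representative(c, probs) for c in clusters])  # [0, 3]
```

Note that connected components merge claims 1 and 2 even though their direct probability is low, because both are linked to claim 0. That transitive behavior follows from the graph approach itself.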

First claim

Opening claim text (preview).

What is claimed is:

1. A computerized text analytics method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and a set of arguments, each including a respective claim referring to the TUC and evidence supporting the claim, wherein each of the claims is a textual statement that directly supports or contests the TUC; calculating for each pair of claims in the set of claims, values of a plurality of features selected from a group consisting of: a geometric mean of an n-gram overlap between the pair of claims, a geometric mean of an n-gram overlap between the pair of claims, in which synonyms are considered as equivalent, an arithmetic mean of the n-gram overlap between the pair of claims, a number of edits needed in order to move from one of the claims of the pair to the other claim of the pair, where differences in the word order are discounted, an overlap between the claims of the pair, when each claim is considered as a bag of words, a part of speech word error rate, a distance between parse trees of the claims of the pair, and a WordNet based similarity measure of the claims of the pair; applying a trained binary classifier to the calculated values of the plurality of features for each pair of claims, so as to provide for each pair of claims of the set of claims, a probability that the claims in the pair convey a same idea; clustering the claims into a plurality of claim clusters based on the probabilities provided by the binary classifier for the pairs of claims in the set of claims, wherein said claim clusters represent semantically-different claims of the set of claims; clustering the arguments into a plurality of argument clusters corresponding to the claim clusters, wherein said argument clusters represent semantically different arguments of the set of arguments; clustering the evidence included in the arguments of each of the plurality of argument clusters, into evidence sub-clusters; generating a list of non-redundant claims comprising a representative claim for each of the claim clusters; generating a non-redundant list of evidence, wherein each piece of evidence in the list of evidence represents an evidence sub-cluster; and outputting the generated lists to a human user.

2. The method of claim 1, further comprising generating a list of non-redundant arguments comprising said semantically different arguments.

3. The method of claim 2, wherein said clustering of the arguments comprises: constructing a control flow graph comprising nodes and edges, wherein the nodes represent said set of arguments and the edges are weights representing the probability that said arguments are equivalent, sparsifying said control flow graph by deleting edges of said edges having weights below a predefined threshold, and applying a connected components algorithm to said control flow graph to receive said clusters of said arguments.

4. The method of claim 1, wherein said binary classifier is a logistic regression classifier.

5. The method of claim 1, wherein said clustering of the claims comprises: constructing a control flow graph comprising nodes and edges, wherein the nodes represent said set of claims and the edges are weights representing the probability that said claims are equivalent, sparsifying said control flow graph by deleting edges of said edges having weights below a predefined threshold, and applying a connected components algorithm to said control flow graph to receive said clusters of said claims.

6. The method of claim 1, wherein the binary classifier is trained by: receiving a large dataset of claims; receiving indications provided by human annotators of similarity between each pair of the claims, in the received dataset of claims; calculating for each of the pair of the claims, values of the plurality of features; and training the binary classifier based on the received dataset of claims, the received similarity indications and the calculated values of the plurality of features.

7. The method of claim 1, wherein the probability provided by the binary classifier comprises a binary similarity score.

8. The method of claim 1, wherein generating the list of non-redundant claims comprises selecting for each cluster a claim which is a centroid of the cluster.

9. The method of claim 1, wherein generating the list of non-redundant claims comprises selecting for each cluster a claim based on quality scores of the claims in the cluster.

10. The method of claim 1, wherein clustering the evidence included in the arguments of each of the plurality of argument clusters, into evidence sub-clusters comprises clustering based on semantic similarity relations.

11. The method of claim 1, wherein clustering the evidence included in the arguments of each of the plurality of argument clusters, into evidence sub-clusters comprises clustering based on the evidence source or the evidence date.

12. A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to: receive a topic under consideration (TUC) and a set of arguments, each including a respective claim referring to the TUC and evidence supporting the claim, wherein each of the claims is a textual statement that directly supports or contests the TUC; calculate for each pair of claims in the set of claims, values of a plurality of features selected from a group consisting of: a geometric mean of an n-gram overlap between the pair of claims, a geometric mean of an n-gram overlap between the pair of claims, in which synonyms are considered as equivalent, an arithmetic mean of the n-gram overlap between the pair of claims, a number of edits needed in order to move from one of the claims of the pair to the other claim of the pair, where differences in the word order are discounted, an overlap between the claims of the pair, when each claim is considered as a bag of words, a part of speech word error rate, a distance between parse trees of the claims of the pair, and a WordNet based similarity measure of the claims of the pair; apply a trained binary classifier to the calculated values of the plurality of features for each pair of claims, so as to provide for each pair of claims of the set of claims, a probability that the claims in the pair convey a same idea; cluster the claims into a plurality of claim clusters based on the probabilities provided by the binary classifier for the pairs of claims in the set of claims, wherein said claim clusters represent semantically different claims of the set of claims; cluster the arguments into a plurality of argument clusters corresponding to the claim clusters; cluster the evidence included in the arguments of each of the plurality of argument clusters, into evidence sub-clusters; generate a list of non-redundant claims comprising a representative claim for each of the claim clusters; generate a non-redundant list of evidence, wherein each piece of evidence in the list of evidence represents an evidence sub-cluster; and output the generated lists to a human user.

13. The computer program product of claim 12, wherein said program code is further executable by said at least one hardware processor to: generate a list of non-redundant arguments comprising said semantically different arguments.

14. The computer program product of claim 13, wherein said clustering of the arguments comprises: constructing a control flow graph comprising nodes and edges, wherein the nodes represent said set of arguments and the edg
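Three of the pairwise features listed in the independent claims (geometric and arithmetic means of the n-gram overlap, and the bag-of-words overlap) can be illustrated with a short sketch. The claim text shown here does not define how the overlap is computed, so the Dice-style n-gram overlap, the Jaccard bag-of-words overlap, whitespace tokenization, and the default of n = 1..2 are all assumptions made for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(a, b, n):
    """Dice-style share of n-grams common to both claims (assumed definition)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2.0 * shared / total


def geometric_mean_overlap(a, b, max_n=2):
    """Geometric mean of the n-gram overlaps for n = 1..max_n."""
    vals = [ngram_overlap(a, b, n) for n in range(1, max_n + 1)]
    if any(v == 0.0 for v in vals):
        return 0.0  # a single zero overlap forces the geometric mean to zero
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def arithmetic_mean_overlap(a, b, max_n=2):
    """Arithmetic mean of the n-gram overlaps for n = 1..max_n."""
    vals = [ngram_overlap(a, b, n) for n in range(1, max_n + 1)]
    return sum(vals) / len(vals)


def bag_of_words_overlap(a, b):
    """Jaccard overlap of the two claims' word sets (assumed definition)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical pair of claims, tokenized on whitespace.
c1 = "violent games increase aggression".split()
c2 = "violent games increase youth aggression".split()
features = [
    geometric_mean_overlap(c1, c2),
    arithmetic_mean_overlap(c1, c2),
    bag_of_words_overlap(c1, c2),
]
print(features)
```

A feature vector like this, computed for every pair of claims, is what the claimed binary classifier (a logistic regression classifier in claim 4) would take as input to produce the per-pair probability.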

Assignees

IBM

Inventors

Classifications

  • G06F16/285 · Primary

    Clustering or classification · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Machine learning · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614100B2 cover?
A method comprising using at least one hardware processor for: receiving a topic under consideration (TUC) and a set of claims referring to the TUC; identifying semantic similarity relations between claims of the set of claims; clustering the claims into a plurality of claim clusters based on the identified semantic similarity relations, wherein said claim clusters represent semantically differ…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).