Unsupervised ontology-based graph extraction from texts

US10169454B2 · US · B2

Patent metadata
Publication number: US-10169454-B2
Application number: US-201615156623-A
Country: US
Kind code: B2
Filing date: May 17, 2016
Priority date: May 17, 2016
Publication date: Jan 1, 2019
Grant date: Jan 1, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for extracting a relations graph uses an ontology graph in which nodes represent entity classes or concepts and edges represent properties of the classes. A property is associated with a constraint which defines a range of values that can be taken without incurring a cost. Input text in which entity and concept mentions are identified is received. An optimal set of alignments between a subgraph of the ontology graph and the identified mentions is identified by optimizing a function of constraint costs incurred by the alignments and a distance measure computed over the set of alignments. The relations graph is generated, based on the optimal set of alignments. The relations graph represents a linked set of relations instantiating a subgraph of the ontology. The relations graph can include relations involving implicit mentions corresponding to subgraph nodes that are not aligned to any of the concept or entity mentions.
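The optimization described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the patentee's implementation: a two-property ontology (`Person` with `hasAge` and `hasWeight`), hypothetical value ranges, an additive objective (pairwise token gap plus ontology path length, plus a fixed `VIOLATION_COST` per out-of-range value), exhaustive search standing in for whatever optimizer the patent actually uses, and a rule that each ontology node receives at most one mention. Subgraph nodes that no mention aligns to are emitted as implicit mentions, and the output takes the <S,P,O> triple form described in claim 10.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology: (subject class, property, object class) edges.
EDGES = [("Person", "hasAge", "Age"), ("Person", "hasWeight", "Weight")]
# Hypothetical range constraints on each property's value class: (low, high).
RANGES = {"Age": (0, 120), "Weight": (0, 300)}
VIOLATION_COST = 10.0  # assumed cost for one out-of-range value


def shortest_path(a, b):
    """Breadth-first search between two classes, ignoring edge direction."""
    prev, queue = {a: None}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for s, _, o in EDGES:
            for u, v in ((s, o), (o, s)):
                if u == node and v not in prev:
                    prev[v] = node
                    queue.append(v)
    return None


def constraint_cost(alignment):
    """Charge VIOLATION_COST for each numeric value outside its class's range."""
    cost = 0.0
    for (text, _), cls in alignment:
        # Assumption: only numeric mentions are aligned to value classes.
        if cls in RANGES:
            low, high = RANGES[cls]
            if not low <= float(text) <= high:
                cost += VIOLATION_COST
    return cost


def distance(alignment):
    """Sum over alignment pairs of scaled token gap plus ontology path length."""
    total = 0.0
    for i in range(len(alignment)):
        for j in range(i + 1, len(alignment)):
            (_, pos_i), cls_i = alignment[i]
            (_, pos_j), cls_j = alignment[j]
            path = shortest_path(cls_i, cls_j)
            hops = len(path) - 1 if path else 99  # assumed penalty if unlinked
            total += 0.1 * abs(pos_i - pos_j) + hops
    return total


def best_alignment(mentions, candidates):
    """Exhaustive search for the alignment minimizing distance + constraint cost."""
    best, best_score = None, float("inf")
    for classes in product(*(candidates[m] for m in mentions)):
        if len(set(classes)) < len(classes):
            continue  # assumption: each ontology node takes at most one mention
        alignment = list(zip(mentions, classes))
        score = distance(alignment) + constraint_cost(alignment)
        if score < best_score:
            best, best_score = alignment, score
    return best


def relations_graph(alignment):
    """Emit <S,P,O> triples over the ontology subgraph linking aligned classes."""
    aligned = {cls: text for (text, _), cls in alignment}
    classes = list(aligned)
    nodes = set(classes)
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            nodes.update(shortest_path(classes[i], classes[j]) or [])
    # Subgraph nodes with no aligned mention become implicit mentions.
    label = {n: aligned.get(n, f"_:implicit_{n}") for n in nodes}
    return sorted((label[s], p, label[o]) for s, p, o in EDGES
                  if s in nodes and o in nodes)


if __name__ == "__main__":
    # Mentions are (text, token position); both numbers are ambiguous.
    mentions = [("40", 3), ("150", 7)]
    candidates = {m: ["Age", "Weight"] for m in mentions}
    print(relations_graph(best_alignment(mentions, candidates)))
```

With "40" and "150" both ambiguous between `Age` and `Weight`, the range constraint on `Age` rules out aligning "150" to it, so the sketch prints `[('_:implicit_Person', 'hasAge', '40'), ('_:implicit_Person', 'hasWeight', '150')]`: the `Person` node that links the two values has no mention of its own and surfaces as an implicit subject.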

First claim

Opening claim text (preview).

What is claimed is:

1. A method for extracting a relations graph, comprising: providing an ontology of elements in the form of a graph in which nodes represent entity classes and edges connecting the nodes represent properties of the entity classes, at least one of the properties being associated with a constraint which defines a range of values for the respective property to take without incurring a constraint cost; receiving input text in which entity mentions are identified; identifying an optimal set of alignments between a subgraph of the ontology graph and the identified entity mentions in the input text, each of the alignments in the optimal set of alignments mapping one of identified entity mentions to a node in the subgraph, the optimal set of alignments being identified by optimizing a function of the constraint costs incurred by alignments in the optimal set of alignments for which the respective property does not take a value within the range of values and a distance measure computed over the set of alignments, the optimizing of the function comprising identifying a subgraph of the ontology and a graph linking mentions in the text for which a combination of the distance measure and the constraint costs is minimal; and generating the relations graph based on the optimal set of alignments, the relations graph representing a linked set of relations instantiating the subgraph of the ontology, and where the subgraph of the ontology includes a node that is not aligned to any of the entity mentions, including an implicit mention in the relations graph corresponding to that subgraph node; at least one of the identifying of the optimal set of alignments and the generating of the relations graph is performed with a processor.

2. The method of claim 1, wherein the method comprises identifying the entity mentions in the input text.

3. The method of claim 2, wherein the identifying of the entity mentions in the input text comprises matching text strings in the input text to entities in a domain-specific resource, each of the entities in the domain-specific resource being associated with at least one sense, entity mentions matching entities in the domain-specific resource that are associated with more than one sense being labeled with more than one sense.

4. The method of claim 3, wherein the identifying of the optimal set of alignments comprises, for an entity mention which matches an entity in the domain-specific resource that is associated with more than one sense, aligning the entity mention with an entity class corresponding to a single one of the senses.

5. The method of claim 1, wherein the identifying of the optimal set of alignments comprises computing a similarity function between the entity mentions in the text and the entity classes in the ontology.

6. The method of claim 1, wherein the optimizing of the function comprises optimizing a dual objective function which is function of a combination of: a probability of generating an aligned subgraph instance given the identified set of mentions in the input text and the ontology graph; and a probability that the constraints are valid in the subgraph instance.

7. The method of claim 1, wherein the method further includes learning parameters of the function on a training set of text samples.

8. The method of claim 1, further comprising outputting the relations graph or information based thereon.

9. The method of claim 1, wherein the method further comprises for each of the identified entity mentions, computing similarity scores between labels of senses of the entity mention and labels of the ontology classes and properties in the ontology, and where none of the similarity scores for the entity mention meets a threshold, aligning the entity mention to an artificial node in the ontology.

10. The method of claim 1 wherein the relations graph comprises a set of triples of the form <S,P,O>, where S is the subject of the triple and refers to an entity, P is a property that represents an attribute or a relation, and O is the object of the triple, which is the value of the attribute or the entity to which the subject of the triple is linked.

11. A computer program product comprising a non-transitory recording medium storing instructions, which when executed on a computer, perform the method of claim 1.

12. A method for extracting a relations graph, comprising: providing an ontology of elements in the form of a graph in which nodes represent entity classes and edges connecting the nodes represent properties of the entity classes, at least one of the properties being associated with a constraint which defines a range of values for the respective property to take without incurring a constraint cost; receiving input text in which entity mentions are identified; identifying an optimal set of alignments between a subgraph of the ontology graph and the identified entity mentions in the input text, the optimal set of alignments optimizing a function of the constraint costs incurred by the alignments in the set when the at least one property does not take a value in the range of values and a distance measure for the set of alignments, the distance measure for the set of alignments being computed by: for each of a plurality of pairs of consistent alignments in the set of alignments, computing a distance measure for the pair of consistent alignments, the distance measure being a multidimensional distance which is a combination of distance components of at least two dimensions; and combining the computed distance measures to generate a distance measure for the set of alignments; and generating the relations graph based on the optimal set of alignments, the relations graph representing a linked set of relations instantiating the subgraph of the ontology, and where the subgraph of the ontology includes a node that is not aligned to any of the entity mentions, including an implicit mention in the relations graph corresponding to that subgraph node; at least one of the identifying of the optimal set of alignments and the generating of the relations graph is performed with a processor.

13. The method of claim 12, wherein the distance component dimensions are selected from: a linguistic dimension based on the entity mentions in the pair of alignments; an ontological dimension based on the ontology elements in the pair of alignments; a document structure dimension which includes distances between nodes in the document structure; a linguistic-ontological dimension based on one of the entity mentions in the pair of alignments and the ontology element in the same alignment; and combinations thereof.

14. The method of claim 13, wherein the ontological dimension comprises a length of a path between the two ontological elements.

15. The method of claim 12, wherein multidimensional distance is a combination of distance components of at least three dimensions.

16. A system for extracting a relations graph, comprising: memory which stores an ontology of elements in the form of a graph in which nodes represent entity classes and edges connecting the nodes represent properties of the entity classes, at least one of the properties being associated with a constraint which defines a range of values for the respective property to take without incurring a constraint cost; a preprocessor which identifies entity mentions in an input text; a graphing component which generates the relations graph based on the identified entity mentions and the ontology, the graphing component comprising: an alignment component which generates alignments between entity mentions …
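Claims 12 to 14 build the set-level distance from pairwise multidimensional distances. The sketch below is one hedged reading of that structure, not the claimed implementation. The `Alignment` record, the precomputed `PATH_LENGTH` table, the `WEIGHTS`, the use of a weighted sum for both combination steps, and sentence indices as the document-structure dimension are all assumptions. It treats every pair as consistent, because this excerpt does not define consistency, and it uses `difflib.SequenceMatcher` string similarity as a stand-in for the linguistic-ontological dimension.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record: one entity mention aligned to one ontology class."""
    mention: str
    token_index: int     # linguistic dimension: position of the mention in the text
    sentence_index: int  # document-structure dimension (assumed: sentence number)
    ontology_class: str


# Hypothetical ontological path lengths between classes (claim 14).
PATH_LENGTH = {
    frozenset({"Person", "Age"}): 1,
    frozenset({"Person", "Weight"}): 1,
    frozenset({"Age", "Weight"}): 2,
}

# Assumed per-dimension weights; claim 7 suggests such parameters could be learned.
WEIGHTS = {"linguistic": 0.1, "ontological": 1.0, "structure": 0.5, "ling_onto": 1.0}


def ontological(a: Alignment, b: Alignment) -> int:
    """Path length between the two aligned classes (0 if they are the same)."""
    if a.ontology_class == b.ontology_class:
        return 0
    return PATH_LENGTH[frozenset({a.ontology_class, b.ontology_class})]


def ling_onto(x: Alignment) -> float:
    """1 minus string similarity between a mention and its class label."""
    return 1.0 - SequenceMatcher(None, x.mention.lower(), x.ontology_class.lower()).ratio()


def pair_distance(a: Alignment, b: Alignment) -> float:
    """Weighted sum of four distance components for one pair of alignments."""
    components = {
        "linguistic": abs(a.token_index - b.token_index),
        "ontological": ontological(a, b),
        "structure": abs(a.sentence_index - b.sentence_index),
        # Assumption: take the worse of the two mention-to-class mismatches.
        "ling_onto": max(ling_onto(a), ling_onto(b)),
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


def set_distance(alignments: list[Alignment]) -> float:
    """Combine pairwise distances by summation (assumed combination rule)."""
    return sum(pair_distance(a, b) for a, b in combinations(alignments, 2))


if __name__ == "__main__":
    person = Alignment("person", 0, 0, "Person")
    age = Alignment("age", 2, 0, "Age")
    weight = Alignment("weight", 5, 1, "Weight")
    print(set_distance([person, age, weight]))
```

With these example weights the three pairwise distances are about 1.2, 2.0 and 2.8, so the set distance comes to roughly 6.0. A weighted sum keeps each dimension's contribution easy to inspect; a norm or a learned combination would fit the claim language equally well.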

Assignees

Xerox Corp

Inventors

Classifications

  • Knowledge representation; Symbolic representation · CPC title

  • G06F16/367 (primary)

    Ontology · CPC title

  • using natural language analysis · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Creation of semantic tools, e.g. ontology or thesauri · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10169454B2 cover?
A method for extracting a relations graph uses an ontology graph in which nodes represent entity classes or concepts and edges represent properties of the classes. A property is associated with a constraint which defines a range of values that can be taken without incurring a cost. Input text in which entity and concept mentions are identified is received. An optimal set of alignments between a…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/367. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 1, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).