Extracting complex entities and relationships from unstructured data

US9569733B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9569733-B2
Application number	US-201514627430-A
Country	US
Kind code	B2
Filing date	Feb 20, 2015
Priority date	Feb 20, 2015
Publication date	Feb 14, 2017
Grant date	Feb 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

To extract relationships between complex entities from unstructured data, a parser parses, using an existing language model, the unstructured data to generate a parse tree. From the parse tree, a set of tokens is created. A token in the set of tokens includes a set of words found in the unstructured data. The set of tokens is inserted in the existing language model to form an enhanced language model. The unstructured data is re-parsed using the enhanced language model to create a knowledge graph. From the knowledge graph, a relationship between a subset of the set of tokens is extracted.
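The parse, tokenize, enhance, and re-parse loop described in the abstract can be illustrated with a deliberately small sketch. It is not the patented implementation: here the "language model" is assumed to be nothing more than a set of known multi-word entries, tokens are assumed to be adjacent word pairs that recur in the text, and the names `parse`, `create_tokens`, `build_graph`, and `PREDICATES` are hypothetical.

```python
from collections import Counter

# Hypothetical predicate vocabulary; the patent does not enumerate one.
PREDICATES = {"treat", "causes", "reduce"}


def parse(text, model):
    """Greedy parse: merge two adjacent words when the pair is in the model.

    With an empty model this degenerates to word-by-word parsing.
    """
    words = text.lower().replace(".", " .").split()
    parsed, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in model:
            parsed.append(pair)
            i += 2
        else:
            parsed.append(words[i])
            i += 1
    return parsed


def create_tokens(parsed, min_count=2):
    """Assumption: a multi-word token is an adjacent word pair seen repeatedly."""
    pairs = Counter(zip(parsed, parsed[1:]))
    return {
        " ".join(pair)
        for pair, count in pairs.items()
        if count >= min_count and "." not in pair and not PREDICATES & set(pair)
    }


def build_graph(parsed):
    """Toy knowledge graph: (subject, predicate, object) triples."""
    return [
        (parsed[i - 1], word, parsed[i + 1])
        for i, word in enumerate(parsed)
        if word in PREDICATES and 0 < i < len(parsed) - 1
    ]


text = ("Beta blockers treat heart failure. Heart failure causes fatigue. "
        "Beta blockers reduce fatigue.")

existing_model = set()                       # no multi-word entries yet
first_pass = parse(text, existing_model)     # single-word entities only
tokens = create_tokens(first_pass)           # multi-word tokens from the parse
enhanced_model = existing_model | tokens     # insert tokens into the model
second_pass = parse(text, enhanced_model)    # re-parse with the enhanced model
graph = build_graph(second_pass)
```

In this toy run the first pass sees "heart" and "failure" as separate words, while the second pass treats "heart failure" as a single construct, so the extracted triples relate whole concepts rather than individual words.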

First claim

Opening claim text (preview).

What is claimed is:
1. A method for extracting relationships between complex entities from unstructured data, the method comprising: parsing, using a parser application executing using a processor and a memory, using an existing language model, the unstructured data to generate a parse tree; creating, from the parse tree, a set of tokens, wherein a token in the set of tokens comprises a set of words found in the unstructured data; inserting the set of tokens in the existing language model to form an enhanced language model; re-parsing the unstructured data using the enhanced language model to create a knowledge graph; and extracting, from the knowledge graph, a relationship between a subset of the set of tokens.
2. The method of claim 1, wherein the relationship is an expressed relationship, further comprising: identifying, as a branch in the knowledge graph a set of edges between the tokens in the subset, each edge in the set of edges using a corresponding predicate in a set of predicates; collapsing the branch of the knowledge graph such that the subset of tokens become related by a single edge representing the set of predicates; and concluding, as a part of the extracting, that tokens in the subset of tokens are related in the expressed relationship by the set of predicates.
3. The method of claim 2, further comprising: concluding that a first token in the subset of tokens and a second token in a second subset of tokens are related in an inferred relationship, wherein tokens in the second subset are in a second expressed relationship according to collapsing a second branch in the knowledge graph; identifying a common token, wherein the branch leads from the common token to the first token and the second branch leads from the common token to the second token; and making the common token a condition of the inferred relationship.
4. The method of claim 3, further comprising: determining that tokens in the second subset of tokens are related in the second expressed relationship by a second set of predicates.
5. The method of claim 1, further comprising: using, as a part of creating the set of tokens, a knowledge repository, wherein the knowledge repository is related to a subject matter of the unstructured data.
6. The method of claim 1, further comprising: using, as a part of creating the set of tokens, contents of the unstructured data.
7. The method of claim 1, further comprising: using, as a part of creating the set of tokens, contents of a different unstructured data, wherein the unstructured data and the different unstructured data are related to a subject matter.
8. The method of claim 1, wherein the token can be recognized as a single construct according to the enhanced language model during the re-parsing.
9. The method of claim 1, wherein the words in the set of words appear together and refer to a concept identified in a subject matter of the unstructured data.
10. The method of claim 1, wherein the parsing comprises a word-by-word parsing, and wherein the parse tree comprises single word entities related by single predicate edges.
11. The method of claim 1, wherein the existing language model comprises a previously enhanced language model, further comprising: forming the previously enhanced language model by inserting in an original language model a previous set of tokens.
12. The method of claim 11, further comprising: creating the previous set of tokens from parsing a different unstructured data.
13. The method of claim 1, wherein the method is embodied in a computer program product comprising one or more computer-readable storage devices and computer-readable program instructions which are stored on the one or more computer-readable tangible storage devices and executed by one or more processors.
14. The method of claim 1, wherein the method is embodied in a computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices and program instructions which are stored on the one or more computer-readable storage devices for execution by the one or more processors via the one or more memories and executed by the one or more processors.
15. A computer program product for extracting relationships between complex entities from unstructured data, the computer program product comprising: one or more computer-readable tangible storage devices; program instructions, stored on at least one of the one or more storage devices, to parse, using a parser application executing using a processor and a memory, using an existing language model, the unstructured data to generate a parse tree; program instructions, stored on at least one of the one or more storage devices, to create, from the parse tree, a set of tokens, wherein a token in the set of tokens comprises a set of words found in the unstructured data; program instructions, stored on at least one of the one or more storage devices, to insert the set of tokens in the existing language model to form an enhanced language model; program instructions, stored on at least one of the one or more storage devices, to re-parse the unstructured data using the enhanced language model to create a knowledge graph; and program instructions, stored on at least one of the one or more storage devices, to extract, from the knowledge graph, a relationship between a subset of the set of tokens.
16. The computer program product of claim 15, wherein the relationship is an expressed relationship, further comprising: program instructions, stored on at least one of the one or more storage devices, to identify, as a branch in the knowledge graph a set of edges between the tokens in the subset, each edge in the set of edges using a corresponding predicate in a set of predicates; program instructions, stored on at least one of the one or more storage devices, to collapse the branch of the knowledge graph such that the subset of tokens become related by a single edge representing the set of predicates; and program instructions, stored on at least one of the one or more storage devices, to conclude, as a part of the extracting, that tokens in the subset of tokens are related in the expressed relationship by the set of predicates.
17. The computer program product of claim 16, further comprising: program instructions, stored on at least one of the one or more storage devices, to conclude that a first token in the subset of tokens and a second token in a second subset of tokens are related in an inferred relationship, wherein tokens in the second subset are in a second expressed relationship according to collapsing a second branch in the knowledge graph; program instructions, stored on at least one of the one or more storage devices, to identify a common token, wherein the branch leads from the common token to the first token and the second branch leads from the common token to the second token; and program instructions, stored on at least one of the one or more storage devices, to make the common token a condition of the inferred relationship.
18. The computer program product of claim 17, further comprising: program instructions, stored on at least one of the one or more storage devices, to determine that tokens in the second subset of tokens are related in the second expressed relationship by a second set of predicates.
19. The computer program product of claim 15, further comprising: program instructions, stored on at least one of the one or more storage devices, to use, as a part of cr…
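
The branch collapsing of claims 2 and 16 and the inferred relationship of claims 3 and 17 can be pictured with a short sketch. It assumes, purely for illustration, that a branch is an ordered chain of (source, predicate, target) edges; the function names, the dictionary layout of the result, and the example data are hypothetical and are not taken from the patent.

```python
def collapse_branch(edges):
    """Collapse a connected chain of edges into one edge.

    The single resulting edge carries the whole set of predicates,
    in the order they appear along the branch.
    """
    if not edges:
        raise ValueError("a branch needs at least one edge")
    for (_, _, target), (source, _, _) in zip(edges, edges[1:]):
        if target != source:
            raise ValueError("edges do not form a connected branch")
    predicates = tuple(predicate for _, predicate, _ in edges)
    return (edges[0][0], predicates, edges[-1][2])


def infer_relationship(branch, second_branch):
    """Relate the end tokens of two branches that start at a common token.

    The common token becomes the condition of the inferred relationship.
    Returns None when the branches do not share a starting token.
    """
    common, _, first = collapse_branch(branch)
    second_common, _, second = collapse_branch(second_branch)
    if common != second_common:
        return None
    return {"first": first, "second": second, "condition": common}


# Made-up example branches, both leading away from "heart failure".
branch = [
    ("heart failure", "causes", "fluid retention"),
    ("fluid retention", "leads to", "swelling"),
]
second_branch = [("heart failure", "is treated with", "beta blockers")]

expressed = collapse_branch(branch)
inferred = infer_relationship(branch, second_branch)
```

Here `expressed` links "heart failure" to "swelling" through the predicate set ("causes", "leads to"), and `inferred` relates "swelling" to "beta blockers" on the condition "heart failure".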

Assignees

IBM

Inventors

Classifications

G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F40/40
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
G06F40/205
Parsing · CPC title
G06F16/316
Indexing structures · CPC title
G06F16/36
Creation of semantic tools, e.g. ontology or thesauri · CPC title

Patent family

Related publications grouped by family.

View patent family 56693197

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9569733B2 cover?: To extract relationships between complex entities from unstructured data, a parser parses, using an existing language model, the unstructured data to generate a parse tree. From the parse tree, a set of tokens is created. A token in the set of tokens includes a set of words found in the unstructured data. The set of tokens is inserted in the existing language model to form an enhanced language …
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?: Publication date Feb 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).