Augmenting semantic models based on morphological rules
US-2015378984-A1 · Dec 31, 2015 · US
US9569733B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9569733-B2 |
| Application number | US-201514627430-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2015 |
| Priority date | Feb 20, 2015 |
| Publication date | Feb 14, 2017 |
| Grant date | Feb 14, 2017 |
Official abstract text for this publication.
To extract relationships between complex entities from unstructured data, a parser parses, using an existing language model, the unstructured data to generate a parse tree. From the parse tree, a set of tokens is created. A token in the set of tokens includes a set of words found in the unstructured data. The set of tokens is inserted in the existing language model to form an enhanced language model. The unstructured data is re-parsed using the enhanced language model to create a knowledge graph. From the knowledge graph, a relationship between a subset of the set of tokens is extracted.
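The steps in the abstract can be illustrated with a toy sketch. It is not the patented implementation: the "language model" here is assumed to be just a set of multi-word tokens, the parser is a greedy longest-match tokenizer, and the names `parse`, `create_tokens`, `build_graph`, `PREDICATES`, and `STOPWORDS`, along with the sample sentences, are hypothetical choices made for illustration.

```python
from collections import Counter

# Toy stand-ins (assumptions): a "language model" is a set of multi-word
# tokens; predicates and stopwords are small hand-picked word lists.
PREDICATES = {"treats", "causes"}
STOPWORDS = {"the", "a", "an", "of"}

def parse(text, model):
    """Parse each sentence, greedily matching the longest token in the model."""
    max_len = max((len(t) for t in model), default=1)
    sentences = []
    for sentence in text.lower().split("."):
        words, out, i = sentence.split(), [], 0
        while i < len(words):
            for n in range(max_len, 0, -1):
                cand = tuple(words[i:i + n])
                # A single word always parses; longer spans must be in the model.
                if len(cand) == 1 or cand in model:
                    out.append(" ".join(cand))
                    i += len(cand)
                    break
        if out:
            sentences.append(out)
    return sentences

def create_tokens(parsed, min_count=2):
    """Propose adjacent word pairs that recur and contain no function words."""
    skip = STOPWORDS | PREDICATES
    pairs = Counter(
        (a, b)
        for sent in parsed
        for a, b in zip(sent, sent[1:])
        if a not in skip and b not in skip
    )
    return {pair for pair, count in pairs.items() if count >= min_count}

def build_graph(parsed):
    """Emit (subject, predicate, object) edges around each predicate."""
    edges = []
    for sent in parsed:
        content = [t for t in sent if t not in STOPWORDS]
        for i, tok in enumerate(content):
            if tok in PREDICATES and 0 < i < len(content) - 1:
                edges.append((content[i - 1], tok, content[i + 1]))
    return edges

text = ("Heart failure causes fatigue. The beta blocker treats heart failure. "
        "A beta blocker causes fatigue.")
base_model = set()                                # existing model: no multi-word tokens
first_pass = parse(text, base_model)              # word-by-word parse
enhanced_model = base_model | create_tokens(first_pass)
graph = build_graph(parse(text, enhanced_model))  # re-parse with the enhanced model

# Keep only relationships whose two ends are both newly created tokens.
names = {" ".join(t) for t in enhanced_model}
relations = [e for e in graph if e[0] in names and e[2] in names]
print(relations)
```

In this sketch the first pass sees "heart" and "failure" as separate words. Only after the recurring pair is inserted into the model does the re-parse treat "heart failure" as a single construct that can sit at one end of an edge.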
Opening claim text (preview).
What is claimed is:
1. A method for extracting relationships between complex entities from unstructured data, the method comprising: parsing, using a parser application executing using a processor and a memory, using an existing language model, the unstructured data to generate a parse tree; creating, from the parse tree, a set of tokens, wherein a token in the set of tokens comprises a set of words found in the unstructured data; inserting the set of tokens in the existing language model to form an enhanced language model; re-parsing the unstructured data using the enhanced language model to create a knowledge graph; and extracting, from the knowledge graph, a relationship between a subset of the set of tokens.
2. The method of claim 1, wherein the relationship is an expressed relationship, further comprising: identifying, as a branch in the knowledge graph a set of edges between the tokens in the subset, each edge in the set of edges using a corresponding predicate in a set of predicates; collapsing the branch of the knowledge graph such that the subset of tokens become related by a single edge representing the set of predicates; and concluding, as a part of the extracting, that tokens in the subset of tokens are related in the expressed relationship by the set of predicates.
3. The method of claim 2, further comprising: concluding that a first token in the subset of tokens and a second token in a second subset of tokens are related in an inferred relationship, wherein tokens in the second subset are in a second expressed relationship according to collapsing a second branch in the knowledge graph; identifying a common token, wherein the branch leads from the common token to the first token and the second branch leads from the common token to the second token; and making the common token a condition of the inferred relationship.
4. The method of claim 3, further comprising: determining that tokens in the second subset of tokens are related in the second expressed relationship by a second set of predicates.
5. The method of claim 1, further comprising: using, as a part of creating the set of tokens, a knowledge repository, wherein the knowledge repository is related to a subject matter of the unstructured data.
6. The method of claim 1, further comprising: using, as a part of creating the set of tokens, contents of the unstructured data.
7. The method of claim 1, further comprising: using, as a part of creating the set of tokens, contents of a different unstructured data, wherein the unstructured data and the different unstructured data are related to a subject matter.
8. The method of claim 1, wherein the token can be recognized as a single construct according to the enhanced language model during the re-parsing.
9. The method of claim 1, wherein the words in the set of words appear together and refer to a concept identified in a subject matter of the unstructured data.
10. The method of claim 1, wherein the parsing comprises a word-by-word parsing, and wherein the parse tree comprises single word entities related by single predicate edges.
11. The method of claim 1, wherein the existing language model comprises a previously enhanced language model, further comprising: forming the previously enhanced language model by inserting in an original language model a previous set of tokens.
12. The method of claim 11, further comprising: creating the previous set of tokens from parsing a different unstructured data.
13. The method of claim 1, wherein the method is embodied in a computer program product comprising one or more computer-readable storage devices and computer-readable program instructions which are stored on the one or more computer-readable tangible storage devices and executed by one or more processors.
14. The method of claim 1, wherein the method is embodied in a computer system comprising one or more processors, one or more computer-readable memories, one or more computer-readable storage devices and program instructions which are stored on the one or more computer-readable storage devices for execution by the one or more processors via the one or more memories and executed by the one or more processors.
15. A computer program product for extracting relationships between complex entities from unstructured data, the computer program product comprising: one or more computer-readable tangible storage devices; program instructions, stored on at least one of the one or more storage devices, to parse, using a parser application executing using a processor and a memory, using an existing language model, the unstructured data to generate a parse tree; program instructions, stored on at least one of the one or more storage devices, to create, from the parse tree, a set of tokens, wherein a token in the set of tokens comprises a set of words found in the unstructured data; program instructions, stored on at least one of the one or more storage devices, to insert the set of tokens in the existing language model to form an enhanced language model; program instructions, stored on at least one of the one or more storage devices, to re-parse the unstructured data using the enhanced language model to create a knowledge graph; and program instructions, stored on at least one of the one or more storage devices, to extract, from the knowledge graph, a relationship between a subset of the set of tokens.
16. The computer program product of claim 15, wherein the relationship is an expressed relationship, further comprising: program instructions, stored on at least one of the one or more storage devices, to identify, as a branch in the knowledge graph a set of edges between the tokens in the subset, each edge in the set of edges using a corresponding predicate in a set of predicates; program instructions, stored on at least one of the one or more storage devices, to collapse the branch of the knowledge graph such that the subset of tokens become related by a single edge representing the set of predicates; and program instructions, stored on at least one of the one or more storage devices, to conclude, as a part of the extracting, that tokens in the subset of tokens are related in the expressed relationship by the set of predicates.
17. The computer program product of claim 16, further comprising: program instructions, stored on at least one of the one or more storage devices, to conclude that a first token in the subset of tokens and a second token in a second subset of tokens are related in an inferred relationship, wherein tokens in the second subset are in a second expressed relationship according to collapsing a second branch in the knowledge graph; program instructions, stored on at least one of the one or more storage devices, to identify a common token, wherein the branch leads from the common token to the first token and the second branch leads from the common token to the second token; and program instructions, stored on at least one of the one or more storage devices, to make the common token a condition of the inferred relationship.
18. The computer program product of claim 17, further comprising: program instructions, stored on at least one of the one or more storage devices, to determine that tokens in the second subset of tokens are related in the second expressed relationship by a second set of predicates.
19. The computer program product of claim 15, further comprising: program instructions, stored on at least one of the one or more storage devices, to use, as a part of cr
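Claims 2 and 3 describe collapsing a branch of the knowledge graph into a single edge and deriving an inferred relationship conditioned on a common token. The sketch below shows one possible reading of those steps. It assumes the graph is an adjacency dictionary, treats a branch as a linear chain of edges leaving the common token, and pairs the end tokens of two branches. The function names and the sample graph are hypothetical and do not come from the patent.

```python
from itertools import combinations

def collapse_branches(graph, root):
    """Collapse each chain leaving `root` into (root, predicates, end_token)."""
    collapsed = []
    for first_predicate, node in graph.get(root, []):
        predicates = [first_predicate]
        seen = {root, node}
        # Walk the chain while it stays linear (exactly one outgoing edge).
        while len(graph.get(node, [])) == 1:
            next_predicate, next_node = graph[node][0]
            if next_node in seen:  # guard against cycles
                break
            predicates.append(next_predicate)
            node = next_node
            seen.add(node)
        # The single collapsed edge carries the whole set of predicates.
        collapsed.append((root, tuple(predicates), node))
    return collapsed

def infer_relationships(collapsed):
    """Relate the end tokens of two branches; the shared root is the condition."""
    return [
        {"first": a[2], "second": b[2], "condition": a[0]}
        for a, b in combinations(collapsed, 2)
    ]

# Illustrative data only: node -> list of (predicate, target) edges.
graph = {
    "heart failure": [("is treated by", "beta blocker"),
                      ("leads to", "fluid retention")],
    "beta blocker": [("may cause", "fatigue")],
    "fluid retention": [("is relieved by", "diuretic")],
}

collapsed = collapse_branches(graph, "heart failure")
inferred = infer_relationships(collapsed)
print(collapsed)
print(inferred)
```

Each collapsed edge stands for an expressed relationship. The inferred relationship links tokens that never share an edge directly, and it records the common token as the condition under which the link holds.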
Lexical analysis, e.g. tokenisation or collocates · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Parsing · CPC title
Indexing structures · CPC title
Creation of semantic tools, e.g. ontology or thesauri · CPC title