Extracting information from unstructured text using generalized extraction patterns
US-9043197-B1 · May 26, 2015 · US
US9547640B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9547640-B2 |
| Application number | US-201314055185-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 16, 2013 |
| Priority date | Oct 16, 2013 |
| Publication date | Jan 17, 2017 |
| Grant date | Jan 17, 2017 |
Official abstract text for this publication.
An approach for determining a combination of terms that represents subject matter of a natural language sentence is provided. Numbers of words from a beginning of the sentence to terms in the sentence that match terms in the combination of terms are determined. The sentence is divided into natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase. Based in part on (a) the numbers of words from the beginning of the sentence to the terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, how well the combination of terms represents the subject matter is determined.
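The two inputs named in the abstract, word offsets and phrase containment, can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`tokenize`, `word_offsets`, `containment`), the example sentence, and the phrase split are all hypothetical. The sketch also assumes that "number of words from the beginning" means the count of words preceding the matched term, and that "contained in the first and/or second simple phrases" means every term appears somewhere in the union of the simple phrases.

```python
def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def word_offsets(sentence, terms):
    """Map each matched term to the number of words that precede it.

    Assumption: the offset is the zero-based index of the first occurrence.
    Terms that do not occur in the sentence are omitted from the result.
    """
    words = tokenize(sentence)
    offsets = {}
    for term in terms:
        key = term.lower()
        if key in words:
            offsets[term] = words.index(key)
    return offsets


def containment(terms, simple_phrases, complex_phrase):
    """Classify where the whole combination of terms is found.

    Returns "simple" if every term occurs in the union of the simple phrases,
    "complex" if every term occurs in the complex phrase but not all of them
    occur in the simple phrases, and "none" otherwise.
    """
    wanted = {t.lower() for t in terms}
    simple_words = set()
    for phrase in simple_phrases:
        simple_words.update(tokenize(phrase))
    if wanted <= simple_words:
        return "simple"
    if wanted <= set(tokenize(complex_phrase)):
        return "complex"
    return "none"


if __name__ == "__main__":
    # Hypothetical sentence and phrase split, for illustration only.
    sentence = "The pump is leaking oil because the seal is worn."
    combo = ["pump", "leaking"]
    print(word_offsets(sentence, combo))
    print(containment(combo, ["the pump", "is leaking oil"], "the pump is leaking oil"))
```

A real system would obtain the simple and complex phrases from a parser; here they are passed in by hand so the containment test stays visible.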
Opening claim text (preview).
What is claimed is:

1. A method of determining a combination of terms that represents subject matter of a natural language sentence, the method comprising the steps of:
a computer determining respective numbers of words from a beginning of the sentence to respective terms in the sentence that match terms in the combination of terms;
the computer dividing the sentence in a multiplicity of natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase, the complex phrase being less than an entirety of the sentence;
based in part on (a) the respective numbers of words from the beginning of the sentence to respective terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, the computer determining a confidence level indicating how well the combination of terms represents a condition or problem which is the subject matter of the sentence;
the computer generating a table having a top row and other rows, the top row including entries that include respective words in the sentence that match terms in the combination, the other rows including entries that include the multiplicity of natural language phrases, the other rows including first and second rows, the first row including the first and second simple phrases, and the second row including the complex phrase;
the computer determining respective numbers of rows from the words in the top row that match the terms in the combination to the first row if all terms in the combination are contained in the first and/or second simple phrases, or to the second row if all terms in the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is further based in part on the numbers of rows from the words in the top row to the first or second row;
the computer determining that the confidence level exceeds a threshold;
in response to the step of determining that the confidence level exceeds the threshold, the computer retrieving contextual information from a knowledge base, the contextual information being related to the subject matter of the sentence; and
based on the confidence level exceeding the threshold, the computer determining that the contextual information retrieved from the knowledge base is a cause of the condition or problem which is the subject matter of the sentence.

2. The method of claim 1, further comprising the step of:
the computer determining whether a negation is included in a phrase in the sentence, the phrase including a term included in the terms of the sentence that match the terms in the combination of terms, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is based in part on the negation being included in the phrase in the sentence.

3. The method of claim 1, wherein the step of determining the respective numbers of words includes:
determining a first number of words from the beginning of the sentence to a first term in the sentence that matches a first term in the combination of terms;
determining a second number of words from the beginning of the sentence to a second term in the sentence that matches a second term in the combination of terms; and
determining a difference between the first and second numbers of words, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is further based in part on the difference between the first and second numbers of words.

4. The method of claim 3, further comprising the steps of:
the computer determining whether the difference between the first and second numbers of words exceeds a threshold;
in response to determining the difference between the first and second numbers of words exceeds the threshold, the computer determining a first amount by which the difference exceeds the threshold;
the computer determining a second amount by multiplying the first amount by a factor; and
the computer adjusting the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence by subtracting the second amount from the confidence level.

5. The method of claim 1, further comprising the steps of:
the computer receiving an ontology that includes rules; and
the computer forming the combination of terms based on the rules included in the ontology.

6. The method of claim 1, wherein the step of determining the respective numbers of rows includes:
the computer determining a difference between first and second numbers of rows included in the numbers of rows;
the computer determining an amount by multiplying the difference by a factor; and
the computer adjusting the confidence level by subtracting the amount from the confidence level.

7. The method of claim 1, further comprising the step of:
the computer receiving the sentence as a transcription of speech that is input as a user query to an expert system by a user, wherein the steps of receiving the sentence, determining the respective numbers of words, dividing the sentence, determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence, generating the table, determining the respective numbers of rows, determining that the confidence level exceeds the threshold, retrieving the contextual information, and determining that the contextual information is the cause of the condition or problem are performed by one or more processors of the computer, the one or more processors executing program instructions via at least one memory of the computer.

8. A computer program product for determining a combination of terms that represents subject matter of a natural language sentence, the computer program product comprising:
one or more computer-readable storage devices and program instructions stored on the one or more storage devices, the program instructions comprising:
program instructions to determine respective numbers of words from a beginning of the sentence to respective terms in the sentence that match terms in the combination of terms;
program instructions to divide the sentence in a multiplicity of natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase, the complex phrase being less than an entirety of the sentence; and
program instructions to determine, based in part on (a) the respective numbers of words from the beginning of the sentence to respective terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, a confidence level indicating how well the combination of terms represents a condition or problem which is the subject matter of the sentence;
program instructions to generate a table having a top row and other rows, the top row including entries that include respective words in the sentence that match terms in the combination, the other r
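The confidence adjustments recited in claims 4 and 6, and the threshold test in claim 1, reduce to simple arithmetic. The sketch below is one hedged reading of them, not the patented implementation. The function names, the integer confidence scale, the numeric values, and the dictionary standing in for a knowledge base are all hypothetical. The claims speak of a "difference"; taking its absolute value is an assumption made here.

```python
def apply_distance_penalty(confidence, first_offset, second_offset, max_gap, factor):
    """Claim 4 reading: penalize only the part of the word gap above a threshold.

    Assumption: the difference is taken as an absolute value.
    """
    gap = abs(first_offset - second_offset)
    if gap > max_gap:
        first_amount = gap - max_gap           # amount by which the gap exceeds the threshold
        second_amount = first_amount * factor  # scaled penalty
        confidence -= second_amount
    return confidence


def apply_row_penalty(confidence, first_rows, second_rows, factor):
    """Claim 6 reading: subtract the row-count difference multiplied by a factor."""
    return confidence - abs(first_rows - second_rows) * factor


def retrieve_context(confidence, threshold, knowledge_base, subject):
    """Claim 1 reading: consult the knowledge base only above the threshold.

    `knowledge_base` is a plain dict here, a stand-in for a real store.
    Returns None when the confidence does not exceed the threshold.
    """
    if confidence > threshold:
        return knowledge_base.get(subject)
    return None


if __name__ == "__main__":
    # Hypothetical numbers on a 0-100 scale, chosen only to show the arithmetic.
    score = apply_distance_penalty(100, first_offset=1, second_offset=9, max_gap=4, factor=5)
    score = apply_row_penalty(score, first_rows=1, second_rows=2, factor=10)
    kb = {"pump leaking": "worn seal"}  # hypothetical knowledge base entry
    print(score, retrieve_context(score, 60, kb, "pump leaking"))
```

Integer scores keep the example free of floating-point rounding. The claims do not fix a scale, so a fractional scale would work the same way.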
Selection or weighting of terms from queries, including natural language queries · CPC title
Parsing · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title