Ontology-driven annotation confidence levels for natural language processing

US9547640B2 · US · B2

Patent metadata
Publication number: US-9547640-B2
Application number: US-201314055185-A
Country: US
Kind code: B2
Filing date: Oct 16, 2013
Priority date: Oct 16, 2013
Publication date: Jan 17, 2017
Grant date: Jan 17, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach for determining a combination of terms that represents subject matter of a natural language sentence is provided. Numbers of words from a beginning of the sentence to terms in the sentence that match terms in the combination of terms are determined. The sentence is divided into natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase. Based in part on (a) the numbers of words from the beginning of the sentence to the terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, how well the combination of terms represents the subject matter is determined.
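
The two inputs named in the abstract, word counts from the beginning of the sentence and phrase containment, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the whitespace tokenizer, the case-insensitive exact matching, the zero-based count (the number of words preceding the term), and the hand-written phrases, which stand in for the output of a real phrase parser. Reading "first and/or second simple phrases" as the union of the two simple phrases is also an assumption, not something the abstract states.

```python
def _tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped (assumed tokenizer)."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def word_offsets(sentence, terms):
    """Map each matched term to the number of words that precede it in the sentence."""
    words = _tokens(sentence)
    offsets = {}
    for term in terms:
        key = term.lower()
        if key in words:
            offsets[term] = words.index(key)  # first occurrence only
    return offsets


def containment(terms, simple_phrases, complex_phrase):
    """Return 'simple', 'complex', or 'none' for the whole combination of terms."""
    wanted = {t.lower() for t in terms}
    # (b) all terms found in the first and/or second simple phrases (taken together)
    simple_words = set()
    for phrase in simple_phrases:
        simple_words.update(_tokens(phrase))
    if wanted <= simple_words:
        return "simple"
    # (c) all terms found in the complex phrase but not in the simple phrases
    if wanted <= set(_tokens(complex_phrase)):
        return "complex"
    return "none"


# Hypothetical example sentence and hand-written phrases.
SENTENCE = "The database crashed when the disk on the server filled up."
SIMPLE_PHRASES = ["the disk", "the server"]
COMPLEX_PHRASE = "the disk on the server filled up"

print(word_offsets(SENTENCE, ["disk", "server"]))
print(containment(["disk", "server"], SIMPLE_PHRASES, COMPLEX_PHRASE))
print(containment(["disk", "filled"], SIMPLE_PHRASES, COMPLEX_PHRASE))
```

In this sketch the complex phrase is shorter than the full sentence, and the combination ("disk", "filled") is only covered by the complex phrase because "filled" lies outside both simple phrases.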

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining a combination of terms that represents subject matter of a natural language sentence, the method comprising the steps of: a computer determining respective numbers of words from a beginning of the sentence to respective terms in the sentence that match terms in the combination of terms; the computer dividing the sentence in a multiplicity of natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase, the complex phrase being less than an entirety of the sentence; based in part on (a) the respective numbers of words from the beginning of the sentence to respective terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, the computer determining a confidence level indicating how well the combination of terms represents a condition or problem which is the subject matter of the sentence; the computer generating a table having a top row and other rows, the top row including entries that include respective words in the sentence that match terms in the combination, the other rows including entries that include the multiplicity of natural language phrases, the other rows including first and second rows, the first row including the first and second simple phrases, and the second row including the complex phrase; the computer determining respective numbers of rows from the words in the top row that match the terms in the combination to the first row if all terms in the combination are contained in the first and/or second simple phrases, or to the second row if all terms in the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is further based in part on the numbers of rows from the words in the top row to the first or second row; the computer determining that the confidence level exceeds a threshold; in response to the step of determining that the confidence level exceeds the threshold, the computer retrieving contextual information from a knowledge base, the contextual information being related to the subject matter of the sentence; and based on the confidence level exceeding the threshold, the computer determining that the contextual information retrieved from the knowledge base is a cause of the condition or problem which is the subject matter of the sentence.

2. The method of claim 1, further comprising the step of: the computer determining whether a negation is included in a phrase in the sentence, the phrase including a term included in the terms of the sentence that match the terms in the combination of terms, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is based in part on the negation being included in the phrase in the sentence.

3. The method of claim 1, wherein the step of determining the respective numbers of words includes: determining a first number of words from the beginning of the sentence to a first term in the sentence that matches a first term in the combination of terms; determining a second number of words from the beginning of the sentence to a second term in the sentence that matches a second term in the combination of terms; and determining a difference between the first and second numbers of words, wherein the step of determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence is further based in part on the difference between the first and second numbers of words.

4. The method of claim 3, further comprising the steps of: the computer determining whether the difference between the first and second numbers of words exceeds a threshold; in response to determining the difference between the first and second numbers of words exceeds the threshold, the computer determining a first amount by which the difference exceeds the threshold; the computer determining a second amount by multiplying the first amount by a factor; and the computer adjusting the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence by subtracting the second amount from the confidence level.

5. The method of claim 1, further comprising the steps of: the computer receiving an ontology that includes rules; and the computer forming the combination of terms based on the rules included in the ontology.

6. The method of claim 1, wherein the step of determining the respective numbers of rows includes: the computer determining a difference between first and second numbers of rows included in the numbers of rows; the computer determining an amount by multiplying the difference by a factor; and the computer adjusting the confidence level by subtracting the amount from the confidence level.

7. The method of claim 1, further comprising the step of: the computer receiving the sentence as a transcription of speech that is input as a user query to an expert system by a user, wherein the steps of receiving the sentence, determining the respective numbers of words, dividing the sentence, determining the confidence level indicating how well the combination of terms represents the condition or problem which is the subject matter of the sentence, generating the table, determining the respective numbers of rows, determining that the confidence level exceeds the threshold, retrieving the contextual information, and determining that the contextual information is the cause of the condition or problem are performed by one or more processors of the computer, the one or more processors executing program instructions via at least one memory of the computer.

8. A computer program product for determining a combination of terms that represents subject matter of a natural language sentence, the computer program product comprising: one or more computer-readable storage devices and program instructions stored on the one or more storage devices, the program instructions comprising: program instructions to determine respective numbers of words from a beginning of the sentence to respective terms in the sentence that match terms in the combination of terms; program instructions to divide the sentence in a multiplicity of natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase, the complex phrase being less than an entirety of the sentence; and program instructions to determine, based in part on (a) the respective numbers of words from the beginning of the sentence to respective terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, a confidence level indicating how well the combination of terms represents a condition or problem which is the subject matter of the sentence; program instructions to generate a table having a top row and other rows, the top row including entries that include respective words in the sentence that match terms in the combination, the other r…
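
The confidence adjustments recited in claims 4 and 6, followed by the threshold test from claim 1, reduce to simple arithmetic. The sketch below is one hedged reading of those steps. The function names, the use of an absolute difference, and every numeric value (starting confidence, thresholds, factors) are hypothetical; the claims do not specify them.

```python
def word_distance_penalty(first_count, second_count, threshold, factor):
    """Claim 4: penalize only the amount by which the word-count difference exceeds a threshold."""
    difference = abs(first_count - second_count)  # absolute value is an assumption
    if difference <= threshold:
        return 0.0
    first_amount = difference - threshold  # amount by which the difference exceeds the threshold
    return first_amount * factor           # the "second amount" that is subtracted


def row_distance_penalty(first_rows, second_rows, factor):
    """Claim 6: penalize the difference between two row counts, scaled by a factor."""
    return abs(first_rows - second_rows) * factor


def adjusted_confidence(base, word_counts, row_counts,
                        word_threshold, word_factor, row_factor):
    """Subtract both penalties from a starting confidence level."""
    penalty = word_distance_penalty(*word_counts, word_threshold, word_factor)
    penalty += row_distance_penalty(*row_counts, row_factor)
    return base - penalty


# Hypothetical inputs: two matched terms at word counts 5 and 9, with row counts 1 and 2.
confidence = adjusted_confidence(
    base=100.0,
    word_counts=(5, 9),
    row_counts=(1, 2),
    word_threshold=3,
    word_factor=2.0,
    row_factor=5.0,
)

CONFIDENCE_THRESHOLD = 90.0  # hypothetical cut-off for the claim 1 threshold test
should_query_knowledge_base = confidence > CONFIDENCE_THRESHOLD
print(confidence, should_query_knowledge_base)
```

With these assumed values the word-count difference is 4, which exceeds the threshold of 3 by 1, so the word penalty is 2.0; the row penalty is 5.0; and the adjusted confidence is 93.0, which passes the assumed cut-off.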

Assignees

Inventors

Classifications

  • Selection or weighting of terms from queries, including natural language queries · CPC title

  • Parsing · CPC title

  • G06F40/289 (Primary)

    Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9547640B2 cover?
An approach for determining a combination of terms that represents subject matter of a natural language sentence is provided. Numbers of words from a beginning of the sentence to terms in the sentence that match terms in the combination of terms are determined. The sentence is divided into natural language phrases including a complex phrase and first and second simple phrases extracted from the…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/3334. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 17, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).