System and method for providing answers to questions

US9703861B2 · US · B2

Patent metadata
Publication number: US-9703861-B2
Application number: US-201414283929-A
Country: US
Kind code: B2
Filing date: May 21, 2014
Priority date: May 14, 2008
Publication date: Jul 11, 2017
Grant date: Jul 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

System and method for providing answers to questions based on any corpus of data implements a method that generates a number of candidate passages from the corpus that answer an input query, and finds the correct resulting answer by collecting supporting evidence from the multiple passages. By analyzing all retrieved passages and that passage's metadata in parallel, an output plurality of data structures is generated including candidate answers based upon the analyzing. Then, supporting passage retrieval operations are performed upon the set of candidate answers, and for each candidate answer, the data corpus is traversed to find those passages having candidate answer in addition to query terms. All candidate answers are automatically scored by a plurality of scoring modules, each producing a module score. The modules scores are processed to determine one or more query answers; and, a query response is generated based on the one or more query answers.
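
The flow described in the abstract (candidate generation, supporting passage retrieval, several scoring modules, combined scores) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the toy corpus, the stopword list, the capitalized-word heuristic for candidates, the two scorers `term_match` and `proximity`, and the use of plain averaging to combine module scores. The sketch also runs sequentially, whereas the abstract describes parallel analysis.

```python
from typing import Callable, List, Tuple

# Hypothetical toy corpus and stopword list, for illustration only.
CORPUS = [
    "Paris is the capital of France.",
    "The capital of France, Paris, lies on the Seine.",
    "Lyon is a large city in France.",
]
STOPWORDS = {"what", "is", "the", "of", "a", "in", "on"}

def words(text: str) -> List[str]:
    """Split on whitespace and strip surrounding punctuation, keeping case."""
    stripped = (w.strip(".,?!") for w in text.split())
    return [w for w in stripped if w]

def query_terms(question: str) -> List[str]:
    """Lowercased question words that are not stopwords."""
    return [w.lower() for w in words(question) if w.lower() not in STOPWORDS]

def retrieve(corpus: List[str], terms: List[str]) -> List[str]:
    """First search: passages sharing at least one query term."""
    return [p for p in corpus if set(terms) & {w.lower() for w in words(p)}]

def candidates(passages: List[str], terms: List[str]) -> List[str]:
    """Toy candidate generation: capitalized words that are not query terms."""
    found: List[str] = []
    for p in passages:
        for w in words(p):
            lw = w.lower()
            if w[0].isupper() and lw not in terms and lw not in STOPWORDS and w not in found:
                found.append(w)
    return found

def supporting(corpus: List[str], candidate: str, terms: List[str]) -> List[str]:
    """Second search: passages containing the candidate plus a query term."""
    out = []
    for p in corpus:
        lowered = [w.lower() for w in words(p)]
        if candidate.lower() in lowered and set(terms) & set(lowered):
            out.append(p)
    return out

def term_match(terms: List[str], candidate: str, passage: str) -> float:
    """Scoring module 1: fraction of query terms found in the passage."""
    lowered = {w.lower() for w in words(passage)}
    return sum(t in lowered for t in terms) / len(terms)

def proximity(terms: List[str], candidate: str, passage: str) -> float:
    """Scoring module 2: closer candidate/query-term pairs score higher."""
    lowered = [w.lower() for w in words(passage)]
    cand_pos = [i for i, w in enumerate(lowered) if w == candidate.lower()]
    term_pos = [i for i, w in enumerate(lowered) if w in terms]
    if not cand_pos or not term_pos:
        return 0.0
    gap = min(abs(c - t) for c in cand_pos for t in term_pos)
    return 1.0 / (1.0 + gap)

Scorer = Callable[[List[str], str, str], float]

def answer(question: str, corpus: List[str], scorers: List[Scorer]) -> List[Tuple[str, float]]:
    """Score every candidate with every module and rank by the mean score."""
    terms = query_terms(question)
    ranked = {}
    for cand in candidates(retrieve(corpus, terms), terms):
        passages = supporting(corpus, cand, terms)
        # One module score per scorer: its mean over the supporting passages.
        module_scores = [
            sum(score(terms, cand, p) for p in passages) / len(passages)
            for score in scorers
        ]
        ranked[cand] = sum(module_scores) / len(module_scores)
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for cand, score in answer("What is the capital of France?", CORPUS, [term_match, proximity]):
        print(f"{cand}: {score:.3f}")
```

Averaging is only one way to merge module scores; the claims also mention a trained model for ranking, which this sketch does not attempt.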

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method of generating answers to natural language questions, said method comprising: receiving said natural language question; processing said question using a plurality of natural language processing techniques to obtain a plurality of searchable components; conducting a first search in a corpus based on said plurality of searchable components to obtain a plurality of relevant documents; analyzing said plurality documents to generate a plurality of candidate answers; after said candidate answers are generated, retrieving a plurality of supporting passages from said corpus based on said candidate answers; analyzing said candidate answers based on grammatical and semantic structures in said query and in said retrieved supporting passages; generating a confidence score for each analyzed candidate answer, said confidence score based on a ratio of the number of query terms present in a supporting passage with a total number of the searchable components of the received question; and outputting said candidate answers and associated confidence scores.

2. The method as claimed in claim 1, wherein said processing said question further comprises: determining one or more predicate argument structures of said query.

3. The method as claimed in claim 1, wherein said processing said question further comprises: determining one or more lexical answer types for each input query.

4. The method as claimed in claim 1, wherein said processing said question further comprises: extending one or more searchable components using a function for term weighting and query expansion.

5. The method as claimed in claim 1, wherein said retrieving said plurality of supporting passages and generating a confidence score for each candidate answer based on said supporting passages occurs in parallel operations.

6. The method as claimed in claim 5, wherein said confidence score comprises a term match score, said method further comprising: counting a number of query terms in a passage; and determining if said number matches a number of terms in a candidate answer.

7. The method as claimed in claim 5, wherein said confidence score comprises a textual alignment score, said method comprising: determining if a placement of words in said passages are in alignment with a placement of words of said candidate answers.

8. The method as claimed in claim 7, wherein said determining if placement of words in said supporting passages are in alignment includes: determining whether said words in said passages are one of: a same order, a similar order, or with a similar distance between them.

9. The method as claimed in claim 5, wherein said confidence score comprises a deeper analysis score, said method further comprising: determining a meaning of the passage and the input query; and computing a degree of satisfying a lexical or a semantic relation in the passage.

10. The method as claimed in claim 1, further comprising: ranking said candidate answers based on their scores to determine one or more query answers, wherein an output candidate answer includes one of: a single query answer, or a ranked list of query answers.

11. The method as claimed in claim 10, further comprising: generating an elaboration question for delivery to a user, and receiving information from a user responsive to said elaboration question for use in determining said query answer.

12. The method as claimed in claim 10, further comprising: determining if a query answer or ranked list of query answers is above a threshold rank level, and if below said threshold rank level, delivering a response to a user comprising one or more clarification questions, and receiving information from a user responsive to said one or more clarification questions, said received user information being added to said query.

13. The method as claimed in claim 10, further comprising: producing, using machine learning techniques, a trained model component from prior data, said prior data encoding on one or more of: features of candidate answers, features of passages having the candidate answers, the candidate answer scores and whether the candidate answer was correct or not; and utilizing said trained model component for ranking said candidate answers.

14. The method as claimed in claim 13, further comprising: producing a prediction function from said trained model component; and applying said prediction function for answer ranking, said applying implementing methods to weight said candidate answers.

15. The method as claimed in claim 10, further comprising: determining a final candidate answer by: collecting results across all supporting passages having said scored candidate answers, normalizing and merging candidate answers produced by a same answer scorer across multiple instances of the candidate answer, and aggregating the results; and, applying a candidate answer scoring function to produce said final candidate answer.

16. The method as claimed in claim 15, wherein said applying a candidate answer scoring function comprises one of: performing context independent scoring where the answer is scored independently of the supporting passage; or performing context dependent scoring where the answer score depends on the supporting passage content.

17. A system for generating answers to natural language questions comprising: a memory storage device; a hardware processor in communication with said memory storage device and configured to: receive said natural language question; process said question using a plurality of natural language processing techniques to obtain a plurality of searchable components; conduct a first search in a corpus based on said plurality of searchable components to obtain a plurality of relevant documents; analyze said plurality documents to generate a plurality of candidate answers; after said candidate answers are generated, retrieve a plurality of supporting passages from said corpus based on said candidate answers; analyzing said candidate answers based on grammatical and semantic structures in said query and in said supporting passages; generate a confidence score for each analyzed candidate answer, said confidence score based on a ratio of the number of query terms present in a supporting passage with a total number of the searchable components of the received question; and output said candidate answers and associated confidence scores.

18. The system as claimed in claim 17, wherein said natural language question and said output candidate answers is provided in accordance with one or more of multiple modalities including text, audio, image, video, tactile or gesture.

19. The system as claimed in claim 18, wherein said hardware processor is further configured to: determine, from said natural language question, one or more predicate argument structures; and, determine, from said natural language question, one or more lexical answer types.

20. The system as claimed in claim 18, wherein said retrieving said plurality of supporting passages and generating a confidence score for each candidate answer based on said supporting passages occurs in parallel operations.

21. The system as claimed in claim 20, wherein said hardware processor is further configured to conduct, in parallel, one or more analyses in a plurality of scoring modules to each produce a confidence score, wherein said confidence score comprises one or more of: conduct, in parallel, one or more anal
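
The confidence ratio recited in claims 1 and 17, together with the ranking and clarification steps of claims 10 and 12, might be sketched as below. This is an illustrative reading, not the patented implementation: the function names, the whitespace tokenizer, the clarification wording, and the choice to compare the top score against a numeric threshold (the claim speaks of a "threshold rank level") are all assumptions.

```python
from typing import Dict, List, Tuple

def confidence(searchable_components: List[str], passage: str) -> float:
    """Ratio of query terms present in the passage to the total number of
    searchable components of the question (one reading of claim 1)."""
    if not searchable_components:
        return 0.0
    # Assumed tokenizer: whitespace split, punctuation stripped, lowercased.
    passage_words = {w.strip(".,;:?!").lower() for w in passage.split()}
    present = sum(1 for c in searchable_components if c.lower() in passage_words)
    return present / len(searchable_components)

def rank(scored: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order candidate answers from highest to lowest score (claim 10)."""
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

def respond(ranked: List[Tuple[str, float]], threshold: float):
    """Return the ranked answers, or ask for clarification when the best
    score falls below the threshold (an assumed reading of claim 12)."""
    if not ranked or ranked[0][1] < threshold:
        return ("clarify", "Could you add more detail to the question?")
    return ("answer", ranked)

if __name__ == "__main__":
    components = ["capital", "france", "river"]  # hypothetical searchable components
    passage = "The capital of France, Paris, lies on the Seine."
    scored = {"Paris": confidence(components, passage), "Lyon": 0.2}
    print(respond(rank(scored), threshold=0.5))
```

In a fuller system the information returned by the user would be added to the query and the search repeated, as claim 12 describes; the sketch stops at producing the clarification request.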

Assignees

IBM

Inventors

Classifications

  • G06N5/02 (Primary)

    Knowledge representation; Symbolic representation · CPC title

  • Natural language query formulation · CPC title

  • Query execution (filtering based on additional data G06F16/335) · CPC title

  • using natural language analysis · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9703861B2 cover?
System and method for providing answers to questions based on any corpus of data implements a method that generates a number of candidate passages from the corpus that answer an input query, and finds the correct resulting answer by collecting supporting evidence from the multiple passages. By analyzing all retrieved passages and that passage's metadata in parallel, an output plurality of data …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 11, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).