Dynamic Concept Based Query Expansion

US2015379010A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2015379010-A1
Application number: US-201414315118-A
Country: US
Kind code: A1
Filing date: Jun 25, 2014
Priority date: Jun 25, 2014
Publication date: Dec 31, 2015
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided to expand queries processed by a question/answer (QA) system. In the approach, concepts are extracted from documents using natural language processing to identify the concepts included in passages found in the documents. The approach generates child level categories in a category hierarchy from the concepts and groups the child level categories into sets based on related concepts. The process creates parent categories from the sets and divides a corpus used by the QA system into a number of sub-corpora, with each of the sub-corpora corresponding to one of the child level categories. The approach answers questions posed to the QA system by identifying a child level category related to the question and searching the sub-corpora corresponding to the child level category.
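
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the keyword table `RELATED_CONCEPT`, the helper names, and the sample documents are all hypothetical, and simple keyword matching stands in for the natural language processing step.

```python
from collections import defaultdict

# Hypothetical concept vocabulary: concept -> related (broader) concept.
# The patent extracts concepts with NLP; a fixed keyword table stands in here.
RELATED_CONCEPT = {
    "diabetes": "endocrine",
    "thyroid": "endocrine",
    "asthma": "respiratory",
}

def extract_concepts(text):
    """Toy stand-in for NLP concept extraction: keyword matching."""
    words = set(text.lower().replace("?", " ").replace(".", " ").split())
    return sorted(c for c in RELATED_CONCEPT if c in words)

def build_hierarchy(documents):
    """Return (parent -> child categories, child category -> sub-corpus)."""
    sub_corpora = defaultdict(list)   # one sub-corpus per child level category
    parents = defaultdict(set)        # parent category -> set of children
    for doc in documents:
        for concept in extract_concepts(doc):
            sub_corpora[concept].append(doc)
            parents[RELATED_CONCEPT[concept]].add(concept)
    return dict(parents), dict(sub_corpora)

def answer(question, sub_corpora):
    """Search only the sub-corpus of the child category matching the question."""
    for concept in extract_concepts(question):
        if concept in sub_corpora:
            return concept, sub_corpora[concept]
    return None, []

# Illustrative sample data (not from the patent).
DOCS = [
    "Insulin therapy is a common treatment for diabetes.",
    "Thyroid hormone levels are checked with a blood test.",
    "Inhalers relieve asthma symptoms.",
]
PARENTS, SUB_CORPORA = build_hierarchy(DOCS)
```

Calling `answer("How is diabetes treated?", SUB_CORPORA)` in this sketch searches only the "diabetes" sub-corpus rather than all three documents, which is the narrowing effect the abstract describes.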

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in an information handling system comprising a processor and a memory to expand queries processed by a question/answer (QA) system, the method comprising: extracting, by at least one of the processors, a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents, and wherein the concepts are stored in the memory; generating, by at least one of the processors, a plurality of child level categories in a category hierarchy from the plurality of concepts, and storing the generated child level categories in the memory; grouping, by at least one of the processors, the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets; creating, by at least one of the processors, a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets, and storing the parent categories in the memory; dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories, wherein each of the sub-corpora is stored in the memory; and answering, by at least one of the processors, a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.

2. The method of claim 1 further comprising: indexing each of the sub-corpora separately; and associating each of the sub-corpora to the parent category of the child level category that corresponds to the sub-corpora.

3. The method of claim 1 wherein a plurality of parent category levels are created, and wherein higher level parent categories are associated with a group of related parent level categories at a lower level.

4. The method of claim 1 wherein the answering of the question further comprises: analyzing the question by utilizing the NLP, the analysis resulting in an identification of a question concept; identify a child level category that matches the question concept; searching the sub-corpora associated with the identified child level category for one or more supporting passages from the natural language passages; utilizing the supporting passages to generate one or more candidate answers; scoring the candidate answers; and answering the question using one or more of the scored candidate answers.

5. The method of claim 4 further comprising: detecting a lack of supporting passages resulting from the searching; in response to detecting the lack of supporting passages: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is of child level categories previously associated with the identified parent category.

6. The method of claim 4 further comprising: detecting that the scored candidate answers have insufficient scores; in response to detecting the insufficient scores of the scored candidate answers: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is associated with one of the child level categories included in the set of child level categories previously associated with the identified parent category.

7. The method of claim 4 further comprising: retrieving a profile corresponding to a requestor of the question, wherein the question concept is identified based on the analysis of the question and the retrieved profile.

8. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of instructions stored in the memory and executed by at least one of the processors to expand queries processed by a question/answer (QA) system, wherein the set of instructions perform actions of: extracting a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents; generating a plurality of child level categories in a category hierarchy from the plurality of concepts; grouping the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets; creating a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets; dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories; and answering a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.

9. The information handling system of claim 8 wherein the actions further comprise: indexing each of the sub-corpora separately; and associating each of the sub-corpora to the parent category of the child level category that corresponds to the sub-corpora.

10. The information handling system of claim 8 wherein a plurality of parent category levels are created, and wherein higher level parent categories are associated with a group of related parent level categories at a lower level.

11. The information handling system of claim 8 wherein the answering of the question further comprises: analyzing the question by utilizing the NLP, the analysis resulting in an identification of a question concept; identify a child level category that matches the question concept; searching the sub-corpora associated with the identified child level category for one or more supporting passages from the natural language passages; utilizing the supporting passages to generate one or more candidate answers; scoring the candidate answers; and answering the question using one or more of the scored candidate answers.

12. The information handling system of claim 11 wherein the actions further comprise: detecting a lack of supporting passages resulting from the searching; in response to detecting the lack of supporting passages: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is associated with one of the child level categories included in the set of child level categories previously associated with the identified parent category.

13. The information handling system of claim 11 wherein the actions further comprise: detecting that the scored candidate answers have insufficient scores; in response to detecting the insufficient scores of the scored candidate answers: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-cor
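
The fallback behavior recited in claims 5 and 6 (widening the search to the parent category when the child category yields no supporting passages or only low-scoring candidates) might look something like the sketch below. Everything here is assumed for illustration: the hierarchy tables, the sample passages, the word-overlap scorer, and the `MIN_SCORE` threshold are hypothetical and are not taken from the patent.

```python
# Hypothetical hierarchy and sub-corpora; all names and data are illustrative.
PARENT_OF = {"diabetes": "endocrine", "thyroid": "endocrine"}
CHILDREN_OF = {"endocrine": ["diabetes", "thyroid"]}
SUB_CORPORA = {
    "diabetes": ["Insulin therapy is a common treatment for diabetes."],
    "thyroid": ["Fatigue and weight gain can indicate low thyroid hormone."],
}
MIN_SCORE = 0.5  # assumed threshold for an "insufficient" score

def score(question, passage):
    """Toy scorer: fraction of question words that appear in the passage."""
    q = set(question.lower().strip("?").split())
    p = set(passage.lower().strip(".").split())
    return len(q & p) / len(q) if q else 0.0

def search(question, categories):
    """Score every passage in the sub-corpora of the given child categories."""
    passages = [p for c in categories for p in SUB_CORPORA.get(c, [])]
    return sorted(((score(question, p), p) for p in passages), reverse=True)

def answer_with_fallback(question, child_category):
    """Search the child sub-corpus first, then widen to the parent if needed."""
    scored = search(question, [child_category])
    # Widen the search when there are no supporting passages (claim 5)
    # or the best candidate scores too low (claim 6).
    if not scored or scored[0][0] < MIN_SCORE:
        parent = PARENT_OF.get(child_category)
        if parent is not None:
            scored = search(question, CHILDREN_OF[parent])
    return scored[0][1] if scored else None
```

In this sketch a question initially routed to the "diabetes" category but better answered by the "thyroid" sub-corpus is recovered by searching all children of the shared "endocrine" parent.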

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2015379010A1 cover?
An approach is provided to expand queries processed by a question/answer (QA) system. In the approach, concepts are extracted from documents using natural language processing to identify the concepts included in passages found in the documents. The approach generates child level categories in a category hierarchy from the concepts and groups the child level categories into sets based on related co…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/3043. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 31, 2015 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).