Expanding high level queries
US-9208194-B2 · Dec 8, 2015 · US
US2015379010A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2015379010-A1 |
| Application number | US-201414315118-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 25, 2014 |
| Priority date | Jun 25, 2014 |
| Publication date | Dec 31, 2015 |
| Grant date | — |
Official abstract text for this publication.
An approach is provided to expand queries processed by a question/answer (QA) system. In the approach, concepts are extracted from documents using natural language processing to identify the concepts included in passages found in the documents. The approach generates child level categories in a category hierarchy from the concepts and groups the child level categories into sets based on related concepts. The approach then creates parent categories from the sets and divides a corpus used by the QA system into a number of sub-corpora, each corresponding to one of the child level categories. The approach answers questions posed to the QA system by identifying a child level category related to the question and searching the sub-corpus corresponding to that child level category.
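The indexing half of the abstract (concepts, then child categories, then parent sets, then one sub-corpus per child) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: concept extraction is a toy vocabulary lookup standing in for NLP, and the `RELATED_CONCEPT` mapping used to group children into parent sets is hypothetical, as are the function names and sample documents.

```python
from collections import defaultdict

# Hypothetical stand-in for NLP concept extraction: a fixed concept vocabulary.
CONCEPT_VOCAB = {"insulin", "diabetes", "aspirin", "ibuprofen"}

# Hypothetical "related concept" per child category, used to group children into sets.
RELATED_CONCEPT = {
    "insulin": "endocrinology",
    "diabetes": "endocrinology",
    "aspirin": "analgesics",
    "ibuprofen": "analgesics",
}


def extract_concepts(passage):
    """Toy extractor: words of the passage that appear in CONCEPT_VOCAB."""
    words = {t.strip(".,;:!?").lower() for t in passage.split()}
    return words & CONCEPT_VOCAB


def build_hierarchy(documents):
    """documents: doc_id -> list of passages. Returns (children, parents, sub_corpora)."""
    # One sub-corpus per child level category, holding (doc_id, passage) pairs.
    sub_corpora = defaultdict(list)
    for doc_id, passages in documents.items():
        for passage in passages:
            for concept in extract_concepts(passage):
                sub_corpora[concept].append((doc_id, passage))
    children = set(sub_corpora)
    # Children sharing a related concept form one set; each set becomes a parent category.
    parents = defaultdict(set)
    for child in children:
        parents[RELATED_CONCEPT.get(child, "uncategorized")].add(child)
    return children, dict(parents), dict(sub_corpora)


docs = {
    "d1": ["Insulin regulates blood sugar in diabetes.", "Aspirin thins the blood."],
    "d2": ["Ibuprofen reduces inflammation."],
}
children, parents, sub_corpora = build_hierarchy(docs)
print(sorted(parents["analgesics"]))  # ['aspirin', 'ibuprofen']
```

Because a passage can mention several concepts, it lands in more than one sub-corpus in this sketch; the abstract does not say how such overlaps are handled. The higher parent levels of claim 3 could be built by grouping the parent categories the same way.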
Opening claim text (preview).
What is claimed is:
1. A method, in an information handling system comprising a processor and a memory to expand queries processed by a question/answer (QA) system, the method comprising: extracting, by at least one of the processors, a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents, and wherein the concepts are stored in the memory; generating, by at least one of the processors, a plurality of child level categories in a category hierarchy from the plurality of concepts, and storing the generated child level categories in the memory; grouping, by at least one of the processors, the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets; creating, by at least one of the processors, a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets, and storing the parent categories in the memory; dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories, wherein each of the sub-corpora is stored in the memory; and answering, by at least one of the processors, a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
2. The method of claim 1 further comprising: indexing each of the sub-corpora separately; and associating each of the sub-corpora to the parent category of the child level category that corresponds to the sub-corpora.
3. The method of claim 1 wherein a plurality of parent category levels are created, and wherein higher level parent categories are associated with a group of related parent level categories at a lower level.
4. The method of claim 1 wherein the answering of the question further comprises: analyzing the question by utilizing the NLP, the analysis resulting in an identification of a question concept; identify a child level category that matches the question concept; searching the sub-corpora associated with the identified child level category for one or more supporting passages from the natural language passages; utilizing the supporting passages to generate one or more candidate answers; scoring the candidate answers; and answering the question using one or more of the scored candidate answers.
5. The method of claim 4 further comprising: detecting a lack of supporting passages resulting from the searching; in response to detecting the lack of supporting passages: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is of child level categories previously associated with the identified parent category.
6. The method of claim 4 further comprising: detecting that the scored candidate answers have insufficient scores; in response to detecting the insufficient scores of the scored candidate answers: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is associated with one of the child level categories included in the set of child level categories previously associated with the identified parent category.
7. The method of claim 4 further comprising: retrieving a profile corresponding to a requestor of the question, wherein the question concept is identified based on the analysis of the question and the retrieved profile.
8. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of instructions stored in the memory and executed by at least one of the processors to expand queries processed by a question/answer (QA) system, wherein the set of instructions perform actions of: extracting a plurality of concepts from a plurality of documents, wherein the extracting includes utilizing natural language processing (NLP) to identify the concepts included in natural language passages found in the documents; generating a plurality of child level categories in a category hierarchy from the plurality of concepts; grouping the child level categories into a plurality of sets based on a related concept identified for each of the child level categories included in each of the sets; creating a plurality of parent categories, wherein each of the parent categories corresponds to a plurality of child level categories included in one of the plurality of sets; dividing a corpus utilized by the QA system into a plurality of sub-corpora, wherein each of the sub-corpora corresponds to one of the child level categories; and answering a question posed to the QA system by identifying one of the child level categories related to the question and searching the sub-corpora corresponding to the identified child level category.
9. The information handling system of claim 8 wherein the actions further comprise: indexing each of the sub-corpora separately; and associating each of the sub-corpora to the parent category of the child level category that corresponds to the sub-corpora.
10. The information handling system of claim 8 wherein a plurality of parent category levels are created, and wherein higher level parent categories are associated with a group of related parent level categories at a lower level.
11. The information handling system of claim 8 wherein the answering of the question further comprises: analyzing the question by utilizing the NLP, the analysis resulting in an identification of a question concept; identify a child level category that matches the question concept; searching the sub-corpora associated with the identified child level category for one or more supporting passages from the natural language passages; utilizing the supporting passages to generate one or more candidate answers; scoring the candidate answers; and answering the question using one or more of the scored candidate answers.
12. The information handling system of claim 11 wherein the actions further comprise: detecting a lack of supporting passages resulting from the searching; in response to detecting the lack of supporting passages: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-corpora is associated with one of the child level categories included in the set of child level categories previously associated with the identified parent category.
13. The information handling system of claim 11 wherein the actions further comprise: detecting that the scored candidate answers have insufficient scores; in response to detecting the insufficient scores of the scored candidate answers: identifying one of the parent categories at a higher level in the hierarchy than the identified child level category; and searching a plurality of sub-corpora associated with the identified parent category, wherein each of the plurality of sub-cor…
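The answering flow of claims 4 through 6, including the widening to a parent category when a child sub-corpus yields no supporting passages or only low-scoring candidates, might look roughly like the Python below. Everything here is hypothetical: the toy hierarchy and passages, the word-overlap scorer standing in for NLP analysis and candidate scoring, the `MIN_SCORE` threshold, and the simplification of treating each supporting passage as its own candidate answer.

```python
from typing import Optional

# Hypothetical toy hierarchy: parent category -> child level categories.
PARENTS = {
    "analgesics": {"aspirin", "ibuprofen"},
    "endocrinology": {"insulin", "diabetes"},
}
# Hypothetical sub-corpora, one per child level category.
SUB_CORPORA = {
    "aspirin": ["Aspirin thins the blood."],
    "ibuprofen": ["Ibuprofen reduces inflammation and fever."],
    "insulin": ["Insulin regulates blood sugar."],
    "diabetes": [],  # deliberately empty, to trigger the parent fallback
}
MIN_SCORE = 0.25  # hypothetical threshold for a "sufficient" score
ALL_CHILDREN = set().union(*PARENTS.values())


def tokens(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {t.strip(".,;:!?").lower() for t in text.split()}


def question_concept(question: str) -> Optional[str]:
    """Toy stand-in for NLP question analysis: first word naming a child category."""
    for t in question.split():
        word = t.strip(".,;:!?").lower()
        if word in ALL_CHILDREN:
            return word
    return None


def score(question: str, passage: str) -> float:
    """Fraction of question words that also appear in the passage."""
    q = tokens(question)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def search(question: str, children: set) -> list:
    """Score every passage in the given children's sub-corpora, best first."""
    scored = [(score(question, p), p)
              for child in sorted(children)
              for p in SUB_CORPORA.get(child, [])]
    return sorted(scored, reverse=True)


def parent_of(child: str) -> Optional[str]:
    for parent, kids in PARENTS.items():
        if child in kids:
            return parent
    return None


def answer(question: str) -> Optional[str]:
    child = question_concept(question)
    if child is None:
        return None
    candidates = search(question, {child})
    # Claims 5 and 6: no supporting passages, or best score too low, so widen to the parent.
    if not candidates or candidates[0][0] < MIN_SCORE:
        parent = parent_of(child)
        if parent is not None:
            candidates = search(question, PARENTS[parent])
    if candidates and candidates[0][0] >= MIN_SCORE:
        return candidates[0][1]
    return None


print(answer("How does aspirin affect the blood?"))     # Aspirin thins the blood.
print(answer("What does diabetes do to blood sugar?"))  # Insulin regulates blood sugar.
```

The diabetes question finds nothing in the empty `diabetes` sub-corpus, so the search widens to the sibling `insulin` sub-corpus under the same parent, which is the behavior claim 5 describes. Claim 7's requestor profile is not modeled here.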
Physics · mapped topic
Query expansion · CPC title