Question answering device, question answering method, and question answering program
US-8983977-B2 · Mar 17, 2015 · US
US10339453B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10339453-B2 |
| Application number | US-201314139589-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2013 |
| Priority date | Dec 23, 2013 |
| Publication date | Jul 2, 2019 |
| Grant date | Jul 2, 2019 |
Official abstract text for this publication.
A mechanism is provided in a data processing system for automatically generating question and answer pairs for training a question answering system for a given domain. The mechanism identifies a set of patterns of components in passages within a corpus of documents for the given domain. The mechanism identifies a set of rules that correspond to the set of patterns for generating question and answer pairs from the passages within the corpus of documents. The mechanism applies the set of rules to the passages to generate the question and answer pairs.
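The flow the abstract describes (patterns mapped to rules, rules applied to matching passages) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: patterns are modeled as regular expressions, a rule is a plain function from a match to a question and answer pair, and the names `PATTERN_RULES`, `role_of_rule`, and `generate_qa_pairs` are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

QAPair = Tuple[str, str]            # (question, answer)
Rule = Callable[[re.Match], QAPair]  # a rule turns a pattern match into a QA pair


def role_of_rule(m: re.Match) -> QAPair:
    """Hypothetical rule for passages of the form 'X is the ROLE of Y.'"""
    return (f"What is the {m.group('role')} of {m.group('obj')}?", m.group("subj"))


# Hypothetical pattern-rules mapping: each pattern (a regex here, by assumption)
# is stored together with the rules that generate QA pairs from its matches.
PATTERN_RULES: Dict[str, List[Rule]] = {
    r"^(?P<subj>[A-Z][\w ]*?) is the (?P<role>[a-z ]+?) of (?P<obj>[A-Z][\w ]*?)\.$": [
        role_of_rule
    ],
}


def generate_qa_pairs(
    passages: List[str], pattern_rules: Dict[str, List[Rule]] = PATTERN_RULES
) -> List[QAPair]:
    """Apply every rule whose pattern matches a passage."""
    pairs: List[QAPair] = []
    for passage in passages:
        for pattern, rules in pattern_rules.items():
            m = re.match(pattern, passage.strip())
            if m is None:
                continue  # passage does not fit this pattern
            pairs.extend(rule(m) for rule in rules)
    return pairs
```

For example, given the passages "Paris is the capital of France." and "It rained all day.", only the first matches the pattern, so the sketch yields the single pair ("What is the capital of France?", "Paris"). The claims describe patterns over components such as part-of-speech tags and named entities; a regex over raw text is only a stand-in for that.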
Opening claim text (preview).
What is claimed is:

1. A method, in a data processing system configured with a computer readable program that causes the data processing system to implement a question and answer creation system executing on a processor of the data processing system for automatically generating question and answer pairs for training a question answering system for a given domain, the method comprising:
automatically identifying, by the question and answer creation system executing on the processor of the data processing system, a set of most frequently occurring patterns of components in passages within a corpus of documents for the given domain using an unsupervised technique;
automatically filtering the set of most frequently occurring patterns to remove frequently occurring patterns that are unlikely to result in meaningful questions based on a domain dictionary to form a filtered set of patterns;
identifying, by the question and answer creation system, a set of rules that correspond to the filtered set of patterns for generating question and answer pairs from the passages within the corpus of documents;
storing, by the question and answer creation system, the filtered set of patterns in association with the set of rules in a pattern-rules mapping storage;
identifying, by the question and answer creation system, an identified set of passages in the corpus that match the filtered set of patterns in the pattern-rules mapping storage;
performing, by the question and answer creation system, pre-processing on the set of passages to select a subset of the passages in the identified set of passages to be used for generating question and answer pairs to form a selected set of passages, wherein the pre-processing collects metadata attributes of the identified set of passages to select the selected set of passages;
applying, by the question and answer creation system, the set of rules in the pattern-rules mapping storage to the selected set of passages to generate a set of question and answer pairs;
performing, by the question and answer creation system, post-processing on the set of question and answer pairs using the metadata attributes to form a final set of question and answer pairs, wherein performing post-processing comprises ordering questions by similarity; merging similar questions with the same answer; scoring similar questions with different answers; and applying an analytic algorithm to the similar questions to resolve conflicts and generate new questions; and
training a question answering system using the final set of question and answer pairs.

2. The method of claim 1, wherein performing pre-processing comprises collecting the metadata attributes based on syntactic and semantic clues from a document in which each given passage in the set of passages occurs.

3. The method of claim 1, wherein the components of the patterns are selected from a group consisting of: words, part-of-speech tags, named entities, or subject-predicate relations.

4. The method of claim 1, wherein identifying the set of rules utilizes techniques selected from a group consisting of: pronoun disambiguation, anaphora resolution, language linguistics, sentence relationships, frequency, or lexical databases.

5. The method of claim 1, further comprising ranking the generated question and answer pairs and using a high ranked subset of question and answer pairs to train the question answering system.

6. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein a computing device configured with the computer readable program implements a question and answer creation system executing on a processor of the computing device for automatically generating question and answer pairs for training a question answering system for a given domain, wherein the computer readable program causes the computing device to:
automatically identify, by the question and answer creation system executing on the computing device, a set of most frequently occurring patterns of components in passages within a corpus of documents for the given domain using an unsupervised technique;
automatically filtering the set of most frequently occurring patterns to remove frequently occurring patterns that are unlikely to result in meaningful questions based on a domain dictionary to form a filtered set of patterns;
identify, by the question and answer creation system, a set of rules that correspond to the filtered set of patterns for generating question and answer pairs from the passages within the corpus of documents;
store, by the question and answer creation system, the filtered set of patterns in association with the set of rules in a pattern-rules mapping storage;
identify, by the question and answer creation system, an identified set of passages in the corpus that match the filtered set of patterns in the pattern-rules mapping storage;
perform, by the question and answer creation system, pre-processing on the set of passages to select a subset of the passages in the identified set of passages to be used for generating question and answer pairs to form a selected set of passages, wherein the pre-processing collects metadata attributes of the identified set of passages to select the selected set of passages;
apply, by the question and answer creation system, the set of rules in the pattern-rules mapping storage to the selected set of passages to generate a set of question and answer pairs;
perform, by the question and answer creation system, post-processing on the set of question and answer pairs using the metadata attributes to form a final set of question and answer pairs, wherein performing post-processing comprises ordering questions by similarity; merging similar questions with the same answer; scoring similar questions with different answers; and applying an analytic algorithm to the similar questions to resolve conflicts and generate new questions; and
train a question answering system using the final set of question and answer pairs.

7. The computer program product of claim 6, wherein performing pre-processing comprises collecting the metadata attributes based on syntactic and semantic clues from a document in which each given passage in the set of passages occurs.

8. The computer program product of claim 6, wherein the components of the patterns are selected from a group consisting of: words, part-of-speech tags, named entities, or subject-predicate relations.

9. The computer program product of claim 6, wherein identifying the set of rules utilizes techniques selected from a group consisting of: pronoun disambiguation, anaphora resolution, language linguistics, sentence relationships, frequency, or lexical databases.

10. The computer program product of claim 6, wherein the computer readable program further causes the computing device to rank the generated question and answer pairs and using a high ranked subset of question and answer pairs to train the question answering system.

11. An apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises a computer readable program, wherein the apparatus configured with the computer readable program implements a question and answer creation system executing on the processor for automatically generating question and answer pairs for training a question answering system for a given domain, wherein the computer readable program causes the processor to:
automatically identify, by the question and answer creation system, a set of most frequently occurring patterns of components in passages within a corpus of documents for the given domain using an unsupervised technique;
automatically fi
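
Part of the post-processing recited in claim 1 (merging similar questions with the same answer and scoring similar questions with different answers) can be sketched roughly as below. The claims do not say how similarity or scoring is computed, so the choices here are assumptions: similarity is a character-level ratio from Python's `difflib.SequenceMatcher` with an arbitrary 0.8 threshold, a lexicographic sort is a crude stand-in for "ordering questions by similarity", and the score of a conflicting answer is simply its share of the group. The names `similar` and `post_process` are hypothetical, and the "analytic algorithm" that resolves conflicts and generates new questions is not modeled.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumed similarity test: case-insensitive character-level ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def post_process(
    pairs: List[QAPair], threshold: float = 0.8
) -> Tuple[List[QAPair], List[Tuple[str, Dict[str, float]]]]:
    """Group similar questions, merge agreeing groups, score conflicting ones."""
    groups: List[List[QAPair]] = []
    # Sorting is only a rough proxy for ordering by similarity (an assumption).
    for question, answer in sorted(pairs):
        for group in groups:
            if similar(question, group[0][0], threshold):
                group.append((question, answer))
                break
        else:
            groups.append([(question, answer)])

    merged: List[QAPair] = []
    conflicts: List[Tuple[str, Dict[str, float]]] = []
    for group in groups:
        answers = Counter(answer for _, answer in group)
        if len(answers) == 1:
            # All similar questions agree: keep one representative pair.
            merged.append(group[0])
        else:
            # Similar questions disagree: score each answer by its share.
            total = sum(answers.values())
            conflicts.append(
                (group[0][0], {a: n / total for a, n in answers.items()})
            )
    return merged, conflicts
```

Keeping conflicting groups separate from merged ones leaves room for a later resolution step, which is where the claimed analytic algorithm would apply.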
Extracting rules from data · CPC title