Automatic Generation of Training Cases and Answer Key from Historical Corpus

US2016148114A1 · US · A1

Patent metadata
Publication number: US-2016148114-A1
Application number: US-201414552948-A
Country: US
Kind code: A1
Filing date: Nov 25, 2014
Priority date: Nov 25, 2014
Publication date: May 26, 2016
Grant date: (not shown)

Abstract

Official abstract text for this publication.

Mechanisms are provided for training and operating a Question and Answer (QA) system pipeline. A corpus of information is received which comprises historical data to which one or more filter criteria are applied to extract filtered historical data relevant to a training objective for training the QA system pipeline. Attribute data, action data, and temporal characteristic data are captured from the filtered historical data. An answer key entry is automatically generated in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data. The correct answer associated with the answer key entry is an action specified by the action data. The temporal characteristic data provides a historical context for the answer key entry. The QA system pipeline is trained using the automatically generated training answer key data structure.
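The flow described in the abstract (filter the historical data, capture attribute, action, and temporal data, then emit answer key entries) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the record layout, the names `HistoricalRecord`, `AnswerKeyEntry`, `passes_filters`, and `build_answer_key`, the sample records, and the reading of the temporal filter as a simple cutoff date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HistoricalRecord:
    """One hypothetical entry in the historical corpus."""
    source: str
    recorded_on: date
    confidence: float
    attributes: dict
    action: str


@dataclass
class AnswerKeyEntry:
    """Hypothetical answer key entry: attributes, correct answer, historical context."""
    attributes: dict
    correct_answer: str
    as_of: date


def passes_filters(record, allowed_sources, earliest, min_confidence):
    # Assumed filter criteria: source, recency (as a cutoff date), and confidence.
    return (
        record.source in allowed_sources
        and record.recorded_on >= earliest
        and record.confidence >= min_confidence
    )


def build_answer_key(records, allowed_sources, earliest, min_confidence):
    # The recorded action becomes the correct answer; the record date
    # supplies the temporal characteristic of the entry.
    return [
        AnswerKeyEntry(dict(r.attributes), r.action, r.recorded_on)
        for r in records
        if passes_filters(r, allowed_sources, earliest, min_confidence)
    ]


# Made-up sample data, for illustration only.
records = [
    HistoricalRecord("fund_reports", date(2013, 5, 1), 0.9, {"sector": "energy"}, "buy"),
    HistoricalRecord("forum_posts", date(2013, 6, 1), 0.95, {"sector": "energy"}, "sell"),
    HistoricalRecord("fund_reports", date(2005, 1, 1), 0.9, {"sector": "retail"}, "hold"),
    HistoricalRecord("fund_reports", date(2014, 1, 1), 0.2, {"sector": "retail"}, "sell"),
]
answer_key = build_answer_key(records, {"fund_reports"}, date(2010, 1, 1), 0.8)
```

With these sample values only the first record survives all three filters, so `answer_key` holds a single entry whose correct answer is the action taken on that date.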

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system having a processor and a memory configured with logic for implementing a Question and Answer (QA) system pipeline, the method comprising: receiving, by the data processing system, a corpus of information comprising historical data; automatically applying, by the data processing system, one or more filter criteria to the historical data to extract filtered historical data relevant to a training objective for training the QA system pipeline; automatically capturing, by the data processing system, attribute data, action data, and temporal characteristic data from the filtered historical data; automatically generating, by the data processing system, an answer key entry in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data; and training, by the data processing system, the QA system pipeline using the automatically generated training answer key data structure.

2. The method of claim 1, wherein a correct answer associated with the answer key entry is an action specified by the action data, and wherein the temporal characteristic data provides a historical context for the answer key entry.

3. The method of claim 1, wherein the one or more filter criteria comprises at least one of a source filter criterion used to select historical data associated with specific sources of information in the corpus of information, a temporal filter criterion that is used to select information in the corpus of information that is more contemporary, or a confidence filter criterion that is used to select historical data with at least a specified level of confidence associated with the historical data.

4. The method of claim 1, wherein the action data of the answer key entry specifies a correct answer to a question, the attribute data of the answer key entry comprises question features for correlating the question to the correct answer, and the temporal characteristic data specifies a historical date or time at which the answer was considered correct for the question.

5. The method of claim 4, wherein training the QA system pipeline using the automatically generated training answer key data structure comprises: receiving, by the QA system pipeline, a training case comprising a training question for processing by the QA system pipeline and a training case temporal characteristic; filtering, by the QA system pipeline, the corpus of information based on the training case temporal characteristic to thereby generate a temporally filtered sub-corpus; and processing, by the QA system pipeline, the training question based on the temporally filtered sub-corpus to generate an answer to the training question.

6. The method of claim 5, wherein training the QA system pipeline further comprises: comparing, by the QA system pipeline, the answer to the training question with a correct answer in a corresponding training answer key entry of the training answer key data structure; and modifying, by the QA system pipeline, an operation of the QA system pipeline based on results of the comparing.

7. The method of claim 6, wherein the corresponding training answer key entry of the training answer key data structure is a training answer key entry having attributes matching attributes of the training question and a temporal characteristic matching the training case temporal characteristic.

8. The method of claim 5, wherein training the QA system pipeline comprises generating, through a machine learning process, a trained model, having weights to be applied to annotation logic of the QA system pipeline, based on a degree of correspondence between the answer to the training question and a correct answer specified in a training answer key entry corresponding to the training question and training case temporal characteristic of the training case.

9. The method of claim 1, wherein the corpus of information comprises financial investment information, and wherein the filtered historical data comprises entries in the financial investment information directed to investment decisions made by financial investors on or prior to a date or time specified in the temporal characteristic.

10. The method of claim 1, wherein the corpus of information comprises patient medical records, and wherein the filtered historical data comprises entries in patient medical records directed to healthcare services provided to patients on or prior to a date or time specified in the temporal characteristic.

11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: receive a corpus of information comprising historical data; automatically apply one or more filter criteria to the historical data to extract filtered historical data relevant to a training objective for training a Question and Answer (QA) system pipeline; automatically capture attribute data, action data, and temporal characteristic data from the filtered historical data; automatically generate an answer key entry in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data; and train the QA system pipeline using the automatically generated training answer key data structure.

12. The computer program product of claim 11, wherein a correct answer associated with the answer key entry is an action specified by the action data, and wherein the temporal characteristic data provides a historical context for the answer key entry.

13. The computer program product of claim 11, wherein the one or more filter criteria comprises at least one of a source filter criterion used to select historical data associated with specific sources of information in the corpus of information, a temporal filter criterion that is used to select information in the corpus of information that is more contemporary, or a confidence filter criterion that is used to select historical data with at least a specified level of confidence associated with the historical data.

14. The computer program product of claim 11, wherein the action data of the answer key entry specifies a correct answer to a question, the attribute data of the answer key entry comprises question features for correlating the question to the correct answer, and the temporal characteristic data specifies a historical date or time at which the answer was considered correct for the question.

15. The computer program product of claim 14, wherein the computer readable program further causes the computing device to train the QA system pipeline using the automatically generated training answer key data structure at least by: receiving, by the QA system pipeline, a training case comprising a training question for processing by the QA system pipeline and a training case temporal characteristic; filtering, by the QA system pipeline, the corpus of information based on the training case temporal characteristic to thereby generate a temporally filtered sub-corpus; and processing, by the QA system pipeline, the training question based on the temporally filtered sub-corpus to generate an answer to the training question.

16. The computer program product of claim 15, wherein the computer readable program further causes the computing device to train the QA system pipeline at least by: comparing, by the QA system pipeline, the answer to the tr
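The training steps recited in claims 5 through 7 (restrict the corpus by the training case's temporal characteristic, answer the question from that sub-corpus, then compare against the matching answer key entry) might look like the sketch below. Everything named here is hypothetical: `Document`, `TrainingCase`, `toy_pipeline`, and the sample data are stand-ins, and treating the temporal filter as "documents dated on or before the case date" is an assumption borrowed from the wording of claims 9 and 10. The step of modifying the pipeline based on the comparison is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    published_on: date
    text: str


@dataclass
class AnswerKeyEntry:
    attributes: frozenset
    correct_answer: str
    as_of: date


@dataclass
class TrainingCase:
    question: str
    attributes: frozenset
    as_of: date  # training case temporal characteristic


def temporal_sub_corpus(corpus, as_of):
    # Assumption: keep only documents dated on or before the case date.
    return [doc for doc in corpus if doc.published_on <= as_of]


def find_key_entry(answer_key, case):
    # Match on both attributes and temporal characteristic, as in claim 7.
    for entry in answer_key:
        if entry.attributes == case.attributes and entry.as_of == case.as_of:
            return entry
    return None


def toy_pipeline(question, sub_corpus):
    # Stand-in for a real QA pipeline: echo the most recent visible document.
    if not sub_corpus:
        return ""
    return max(sub_corpus, key=lambda doc: doc.published_on).text


def score_case(pipeline, corpus, answer_key, case):
    # Returns True/False for a match, or None when no key entry applies.
    answer = pipeline(case.question, temporal_sub_corpus(corpus, case.as_of))
    entry = find_key_entry(answer_key, case)
    return None if entry is None else answer == entry.correct_answer


# Made-up sample data, for illustration only.
corpus = [Document(date(2012, 3, 1), "hold"), Document(date(2014, 3, 1), "sell")]
attrs = frozenset({("sector", "energy")})
answer_key = [AnswerKeyEntry(attrs, "hold", date(2013, 1, 1))]
case = TrainingCase("What should be done with energy holdings?", attrs, date(2013, 1, 1))
result = score_case(toy_pipeline, corpus, answer_key, case)
```

Because the 2014 document is hidden from a case dated 2013, the toy pipeline only sees the 2012 document, which is the point of the temporal filter: the pipeline is judged against what was knowable at the time.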

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16H10/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 26, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).