Automatic Generation of Training Cases and Answer Key from Historical Corpus

US2019325347A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2019325347-A1
Application number: US-201916460368-A
Country: US
Kind code: A1
Filing date: Jul 2, 2019
Priority date: Nov 25, 2014
Publication date: Oct 24, 2019
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Mechanisms are provided for training and operating a Question and Answer (QA) system pipeline. A corpus of information is received which comprises historical data to which one or more filter criteria are applied to extract filtered historical data relevant to a training objective for training the QA system pipeline. Attribute data, action data, and temporal characteristic data are captured from the filtered historical data. An answer key entry is automatically generated in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data. The correct answer associated with the answer key entry is an action specified by the action data. The temporal characteristic data provides a historical context for the answer key entry. The QA system pipeline is trained using the automatically generated training answer key data structure.
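The flow in the abstract (filter the historical data, capture attribute, action, and temporal data, then emit answer key entries) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `HistoricalRecord` and `AnswerKeyEntry` types, the `filter_records` and `build_answer_key` helpers, and the sample records are all hypothetical. The three filter parameters loosely mirror the source, temporal, and confidence criteria named in the claims.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HistoricalRecord:
    """One hypothetical entry in the historical corpus."""
    source: str
    recorded_on: date
    confidence: float
    attributes: dict   # features of the situation (assumed shape)
    action: str        # the action that was actually taken


@dataclass(frozen=True)
class AnswerKeyEntry:
    """Attributes act as question features; the action is the correct answer."""
    attributes: tuple
    correct_answer: str
    as_of: date        # historical context for the entry


def filter_records(records, sources=None, not_before=None, min_confidence=0.0):
    """Apply source, temporal, and confidence filter criteria."""
    kept = []
    for r in records:
        if sources is not None and r.source not in sources:
            continue
        if not_before is not None and r.recorded_on < not_before:
            continue
        if r.confidence < min_confidence:
            continue
        kept.append(r)
    return kept


def build_answer_key(records):
    """Capture attribute, action, and temporal data into answer key entries."""
    return [
        AnswerKeyEntry(tuple(sorted(r.attributes.items())), r.action, r.recorded_on)
        for r in records
    ]


# Hypothetical sample data; only the first record passes all three filters.
records = [
    HistoricalRecord("clinic_a", date(2012, 3, 1), 0.90, {"condition": "x"}, "treatment_a"),
    HistoricalRecord("clinic_b", date(2012, 5, 1), 0.95, {"condition": "x"}, "treatment_b"),
    HistoricalRecord("clinic_a", date(2005, 1, 1), 0.90, {"condition": "x"}, "treatment_c"),
    HistoricalRecord("clinic_a", date(2013, 1, 1), 0.40, {"condition": "x"}, "treatment_d"),
]

filtered = filter_records(
    records, sources={"clinic_a"}, not_before=date(2010, 1, 1), min_confidence=0.8
)
answer_key = build_answer_key(filtered)
```

Keeping the date on each entry is what lets a later training step ask what was considered correct at that point in time, rather than what is considered correct today.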

First claim

Opening claim text (preview).

1. A method, in a data processing system having a processor and a memory, wherein the memory comprises instructions which are executed by the processor to cause the processor to implement a training engine for generating training cases and an answer key from a historical corpus for training a Question and Answer (QA) system, the method comprising: receiving, by corpus ingestion logic executing within the training engine, a corpus of information comprising historical data; automatically applying, by filtering logic executing within the training engine, one or more filter criteria to the historical data to extract filtered historical data relevant to a training objective for training the QA system pipeline; automatically capturing, by system answer key and training case generation logic executing within the training engine, attribute data, action data, and temporal characteristic data from the filtered historical data; automatically generating, by the answer key and training case generation logic, an answer key entry in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data; and training, by the training engine, a QA system pipeline for the QA system using the automatically generated training answer key data structure.

2. The method of claim 1, wherein a correct answer associated with the answer key entry is an action specified by the action data, and wherein the temporal characteristic data provides a historical context for the answer key entry.

3. The method of claim 1, wherein the one or more filter criteria comprises at least one of a source filter criterion used to select historical data associated with specific sources of information in the corpus of information, a temporal filter criterion that is used to select information in the corpus of information that is more contemporary, or a confidence filter criterion that is used to select historical data with at least a specified level of confidence associated with the historical data.

4. The method of claim 1, wherein the action data of the answer key entry specifies a correct answer to a question, the attribute data of the answer key entry comprises question features for correlating the question to the correct answer, and the temporal characteristic data specifies a historical date or time at which the answer was considered correct for the question.

5. The method of claim 4, wherein training the QA system pipeline using the automatically generated training answer key data structure comprises: receiving, by the QA system pipeline, a training case comprising a training question for processing by the QA system pipeline and a training case temporal characteristic; filtering, by the QA system pipeline, the corpus of information based on the training case temporal characteristic to thereby generate a temporally filtered sub-corpus; and processing, by the QA system pipeline, the training question based on the temporally filtered sub-corpus to generate an answer to the training question.

6. The method of claim 5, wherein training the QA system pipeline further comprises: comparing, by the QA system pipeline, the answer to the training question with a correct answer in a corresponding training answer key entry of the training answer key data structure; and modifying, by the QA system pipeline, an operation of the QA system pipeline based on results of the comparing.

7. The method of claim 6, wherein the corresponding training answer key entry of the training answer key data structure is a training answer key entry having attributes matching attributes of the training question and a temporal characteristic matching the training case temporal characteristic.

8. The method of claim 5, wherein training the QA system pipeline comprises generating, through a machine learning process, a trained model, having weights to be applied to annotation logic of the QA system pipeline, based on a degree of correspondence between the answer to the training question and a correct answer specified in a training answer key entry corresponding to the training question and training case temporal characteristic of the training case.

9. The method of claim 1, wherein the corpus of information comprises financial investment information, and wherein the filtered historical data comprises entries in the financial investment information directed to investment decisions made by financial investors on or prior to a date or time specified in the temporal characteristic.

10. The method of claim 1, wherein the corpus of information comprises patient medical records, and wherein the filtered historical data comprises entries in patient medical records directed to healthcare services provided to patients on or prior to a date or time specified in the temporal characteristic.

11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to implement a training engine for generating training cases and an answer key from a historical corpus for training a Question and Answer (QA) system, wherein the computer readable program further causes the computing device to: receive, by corpus ingestion logic executing within the training engine, a corpus of information comprising historical data; automatically apply, by filtering logic executing within the training engine, one or more filter criteria to the historical data to extract filtered historical data relevant to a training objective for training a Question and Answer (QA) system pipeline; automatically capture, by answer key and training case generation logic executing within the training engine, attribute data, action data, and temporal characteristic data from the filtered historical data; automatically generate, by the answer key and training case generation logic, an answer key entry in an automatically generated training answer key data structure based on the attribute data, action data, and temporal characteristic data; and train, by the training engine, a QA system pipeline for the QA system using the automatically generated training answer key data structure.

12. The computer program product of claim 11, wherein a correct answer associated with the answer key entry is an action specified by the action data, and wherein the temporal characteristic data provides a historical context for the answer key entry.

13. The computer program product of claim 11, wherein the one or more filter criteria comprises at least one of a source filter criterion used to select historical data associated with specific sources of information in the corpus of information, a temporal filter criterion that is used to select information in the corpus of information that is more contemporary, or a confidence filter criterion that is used to select historical data with at least a specified level of confidence associated with the historical data.

14. The computer program product of claim 11, wherein the action data of the answer key entry specifies a correct answer to a question, the attribute data of the answer key entry comprises question features for correlating the question to the correct answer, and the temporal characteristic data specifies a historical date or time at which the answer was considered correct for the question.

15. The computer program product of claim 14, wherein the computer readable program further causes the computing device to train the QA system pipeline using the aut…
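The training loop described in claims 5 to 8 (restrict the corpus to what existed at the training case's date, answer the question, compare with the answer key, then adjust the pipeline) might look roughly like the sketch below. Everything here is an assumption made for illustration: the dictionary shapes, the function names, the sample data, and in particular the per-feature weights with a simple perceptron-style update, which stand in for the "weights to be applied to annotation logic" without claiming to reproduce them.

```python
from datetime import date


def temporal_subcorpus(corpus, as_of):
    """Keep only documents dated on or before the training case's date."""
    return [doc for doc in corpus if doc["date"] <= as_of]


def answer_question(features, subcorpus, weights):
    """Score candidate answers by the weighted features they share with the question."""
    scores = {}
    for doc in subcorpus:
        for f in features & doc["features"]:
            scores[doc["answer"]] = scores.get(doc["answer"], 0.0) + weights.get(f, 1.0)
    return max(scores, key=scores.get) if scores else None


def train_step(case, corpus, answer_key, weights, lr=1.0):
    """Answer one training case and nudge the weights if the answer is wrong."""
    sub = temporal_subcorpus(corpus, case["as_of"])
    predicted = answer_question(case["features"], sub, weights)
    # Look up the entry whose attributes and date both match the training case.
    correct = answer_key[(frozenset(case["features"]), case["as_of"])]
    if predicted != correct:
        for doc in sub:
            delta = lr if doc["answer"] == correct else -lr
            for f in case["features"] & doc["features"]:
                weights[f] = weights.get(f, 1.0) + delta
    return predicted, correct


# Hypothetical corpus; the 2015 document postdates the case and is filtered out.
corpus = [
    {"date": date(2010, 6, 1), "features": {"fever", "cough"}, "answer": "treatment_a"},
    {"date": date(2011, 2, 1), "features": {"fever", "rash"}, "answer": "treatment_b"},
    {"date": date(2015, 1, 1), "features": {"fever", "cough"}, "answer": "treatment_c"},
]
case = {"features": {"fever", "cough"}, "as_of": date(2012, 1, 1)}
answer_key = {(frozenset({"fever", "cough"}), date(2012, 1, 1)): "treatment_a"}

weights = {"cough": -0.5}   # deliberately poor starting weight
history = []
for _ in range(5):
    predicted, correct = train_step(case, corpus, answer_key, weights)
    history.append(predicted)
    if predicted == correct:
        break
```

With these sample values the first pass answers `treatment_b`, the weights are adjusted, and the second pass answers `treatment_a`. The temporal filter matters because the 2015 document would otherwise leak information that was not available on the training case's date.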

Assignees

Inventors

Classifications

  • G16H10/60 · Primary

    for patient-specific data, e.g. for electronic patient records · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • G06N5/02 · Primary

    Knowledge representation; Symbolic representation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019325347A1 cover?
Mechanisms are provided for training and operating a Question and Answer (QA) system pipeline. A corpus of information is received which comprises historical data to which one or more filter criteria are applied to extract filtered historical data relevant to a training objective for training the QA system pipeline. Attribute data, action data, and temporal characteristic data are captured from…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16H10/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2019 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).