Training a Question/Answer System Using Answer Keys Based on Forum Content

US2016171373A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016171373-A1
Application number  US-201414570305-A
Country             US
Kind code           A1
Filing date         Dec 15, 2014
Priority date       Dec 15, 2014
Publication date    Jun 16, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided to train a question answering (QA) system using answer keys based on forum content. In the approach, a question is selected from a post in a threaded discussion. An answer to the selected question is automatically identified from crowd-based sources, with the identified answer having a confidence level greater than a threshold. An answer key is built using the selected question and the identified answer. The QA system is automatically trained using the answer key.
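The answer-key construction summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `Post` and `CandidateAnswer` types, the `select_question` heuristic (first post ending in a question mark), the `find_answer` callback standing in for the crowd-based lookup, and the default threshold of 0.8 are all assumptions introduced here. The final step, training the QA system with the answer key, is left out because the publication data shown on this page does not describe a training interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Post:
    """One post in a threaded discussion (hypothetical type)."""
    text: str


@dataclass
class CandidateAnswer:
    """An answer found in a crowd-based source (hypothetical type)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def select_question(posts: List[Post]) -> Optional[str]:
    """Pick the first post that looks like a question.

    Assumption: a trailing '?' marks a question. A real system would
    use natural language processing instead of this heuristic.
    """
    for post in posts:
        stripped = post.text.strip()
        if stripped.endswith("?"):
            return stripped
    return None


def build_answer_key(
    threads: List[List[Post]],
    find_answer: Callable[[str], Optional[CandidateAnswer]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Build (question, answer) pairs, one per thread at most.

    An answer is kept only when its confidence is strictly greater
    than the threshold, mirroring the condition in the abstract.
    """
    answer_key: List[Tuple[str, str]] = []
    for posts in threads:
        question = select_question(posts)
        if question is None:
            continue
        candidate = find_answer(question)
        if candidate is not None and candidate.confidence > threshold:
            answer_key.append((question, candidate.text))
    return answer_key
```

The resulting list of pairs is what would then be handed to the QA system's training step. Using a strict comparison means an answer whose confidence equals the threshold is rejected.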

First claim

Opening claim text (preview).

1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: selecting a question from a post in a threaded discussion; automatically identifying an answer to the selected question from one or more crowd-based sources, wherein the identified answer has a confidence level greater than a threshold; building an answer key with the selected question and the identified answer; and automatically training a question answering (QA) system using the answer key.
2. The method of claim 1 further comprising: revising the selected question based on a focus and a lexical answer type (LAT) of the selected question given a noun, a subject, a verb and a context identified in the selected question, wherein the revised question is used in the answer key.
3. The method of claim 1 wherein the selection of the question further comprises: analyzing a plurality of posts included in one or more threads of the threaded discussion, wherein the analyzing further comprises: identifying a term in a parent post of the threaded discussion; detecting that an anaphor in a child post of the threaded discussion references the identified term; and resolving the anaphor found in the child post with the identified term; storing the parent post with the identified term and the child post with the resolved anaphor in a forum tree; and selecting the parent post as the selected question.
4. The method of claim 1 wherein automatically identifying the answer further comprises: identifying one or more question keywords and a context in the selected question using natural language processing (NLP); mining a plurality of crowd sourced data sets for crowd sourced information, wherein the mining is based on the identified question keywords and context, and wherein the crowd sourced data sets have stored therein a collective opinion of a crowd of individuals; evaluating the mined crowd sourced information based on a social support attribute included in a crowd sourced metadata, wherein the evaluating results in a most likely answer that is scored based on the social support attribute; and identifying the answer as the resulting most likely answer.
5. The method of claim 4 wherein a knowledge base comprises the crowd sourced information and the crowd sourced metadata, and wherein the method further comprises: identifying one or more of the crowd sourced metadata based on the identified question keywords and the identified question context; and creating a social search criteria based on the identified crowd sourced metadata.
6. The method of claim 5 further comprising: searching the crowd sourced information using the social search criteria, the result of the searching being a plurality of candidate answers.
7. The method of claim 6 further comprising: identifying the crowd metadata associated with the plurality of candidate answers; determining a metadata strength of the identified crowd metadata, wherein the metadata strength is based on one or more factors, and wherein at least one of the factors relates to a social support of an opinion; and scoring the plurality of candidate answers based on an association of the identified crowd metadata to each of the candidate answers.
8. An information handling system comprising: one or more processors; one or more data stores accessible by at least one of the processors; a memory coupled to at least one of the processors; and a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: selecting a question from a post in a threaded discussion; automatically identifying an answer to the selected question from one or more crowd-based sources, wherein the identified answer has a confidence level greater than a threshold; building an answer key with the selected question and the identified answer; and automatically training a question answering (QA) system using the answer key.
9. The information handling system of claim 8 wherein the actions further comprise: revising the selected question based on a focus and a lexical answer type (LAT) of the selected question given a noun, a subject, a verb and a context identified in the selected question, wherein the revised question is used in the answer key.
10. The information handling system of claim 8 wherein the selection of the question further comprises actions of: analyzing a plurality of posts included in one or more threads of the threaded discussion, wherein the analyzing further comprises: identifying a term in a parent post of the threaded discussion; detecting that an anaphor in a child post of the threaded discussion references the identified term; and resolving the anaphor found in the child post with the identified term; storing the parent post with the identified term and the child post with the resolved anaphor in a forum tree; and selecting the parent post as the selected question.
11. The information handling system of claim 8 wherein automatically identifying the answer further comprises actions of: identifying one or more question keywords and a context in the selected question using natural language processing (NLP); mining a plurality of crowd sourced data sets for crowd sourced information, wherein the mining is based on the identified question keywords and context, and wherein the crowd sourced data sets have stored therein a collective opinion of a crowd of individuals; evaluating the mined crowd sourced information based on a social support attribute included in a crowd sourced metadata, wherein the evaluating results in a most likely answer that is scored based on the social support attribute; and identifying the answer as the resulting most likely answer.
12. The information handling system of claim 11 wherein a knowledge base comprises the crowd sourced information and the crowd sourced metadata, and wherein the actions further comprise: identifying one or more of the crowd sourced metadata based on the identified question keywords and the identified question context; and creating a social search criteria based on the identified crowd sourced metadata.
13. The information handling system of claim 12 wherein the actions further comprise: searching the crowd sourced information using the social search criteria, the result of the searching being a plurality of candidate answers.
14. The information handling system of claim 13 wherein the actions further comprise: identifying the crowd metadata associated with the plurality of candidate answers; determining a metadata strength of the identified crowd metadata, wherein the metadata strength is based on one or more factors, and wherein at least one of the factors relates to a social support of an opinion; and scoring the plurality of candidate answers based on an association of the identified crowd metadata to each of the candidate answers.
15. A computer program product stored in a computer readable storage medium, comprising computer program code that, when executed by an information handling system, causes the information handling system to perform actions comprising: selecting a question from a post in a threaded discussion; automatically identifying an answer to the selected question from one or more crowd-based sources, wherein the identified answer has a confidence level greater than a threshold; building an answer key with the selected question and the identified answer; and automatically training a question answering (QA) system using the answer ke…
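The candidate-scoring steps recited in claims 4 through 7 (a metadata strength driven by social support, then selection of a most likely answer) might look something like the sketch below. The claims do not specify a formula, so everything concrete here is an assumption made for illustration: the `Candidate` fields (`upvotes`, `downvotes`, `accepted`), the smoothed vote share used as the strength, the `accepted_bonus` value, and the function names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate answer with crowd metadata (hypothetical fields)."""
    text: str
    upvotes: int
    downvotes: int
    accepted: bool = False


def metadata_strength(c: Candidate, accepted_bonus: float = 0.2) -> float:
    """Assumed strength measure, not taken from the patent.

    Uses the share of positive votes with add-one smoothing, so a
    candidate with no votes scores 0.5, plus a bonus when the answer
    was marked as accepted. The result is capped at 1.0.
    """
    share = (c.upvotes + 1) / (c.upvotes + c.downvotes + 2)
    bonus = accepted_bonus if c.accepted else 0.0
    return min(1.0, share + bonus)


def most_likely_answer(
    candidates: List[Candidate], threshold: float
) -> Optional[Tuple[str, float]]:
    """Score every candidate and return the best one as (text, score).

    Returns None when there are no candidates or when the best score
    is not strictly greater than the threshold.
    """
    if not candidates:
        return None
    best = max(candidates, key=metadata_strength)
    score = metadata_strength(best)
    return (best.text, score) if score > threshold else None
```

For example, under these assumptions a candidate with 18 upvotes, 2 downvotes and an accepted flag would outrank one with 3 upvotes and 3 downvotes, and only the former would clear a threshold of 0.8.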

Assignees

Inventors

Classifications

  • Inference or reasoning models

  • Translation of natural language queries to structured queries

  • Lexical analysis, e.g. tokenisation or collocates

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

  • Computing arrangements using knowledge-based models

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016171373A1 cover?
An approach is provided to train a question answering (QA) system using answer keys based on forum content. In the approach, a question is selected from a post in a threaded discussion. An answer to the selected question is automatically identified from crowd-based sources, with the identified answer having a confidence level greater than a threshold. An answer key is built using the selected q…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/022. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 16, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).