Realtime ingestion via multi-corpus knowledge base with weighting

US9690862B2 · US · B2

Patent metadata
Publication number: US-9690862-B2
Application number: US-201414517813-A
Country: US
Kind code: B2
Filing date: Oct 18, 2014
Priority date: Oct 18, 2014
Publication date: Jun 27, 2017
Grant date: Jun 27, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach is provided for updating corpora in a Question and Answer (QA) system. Ingestion of a first set of sources into a first corpus and a second set of sources into a second corpus with the second set of sources including updates to the first set of sources. A question is received. The system identifies candidate answers to the question using the sources included in the corpuses. Each candidate answer has a weighting. The system determines whether the first and second corpuses have an overlapping source from which two candidate answers were identified. If an overlapping source is found, the candidate answer from the overlapping source in the first corpus is assigned a lower weighting than the candidate answer from the second corpus. Likely answers are selected from the candidate answers based on the weighting and returned to the requestor.
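
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Candidate` record, the `"first"`/`"second"` corpus labels, the `penalty` factor, and the rule of capping the older answer below the newer one are all assumptions made for illustration, and weights are assumed to be positive.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate-answer record (not defined in the patent)."""
    answer: str
    source_id: str
    corpus: str    # "first" (base corpus) or "second" (update corpus)
    weight: float  # assumed positive


def downweight_overlaps(candidates, penalty=0.5):
    """Lower the weight of first-corpus answers whose source also
    produced an answer from the second (update) corpus."""
    # Lowest second-corpus weight seen for each overlapping source.
    newest = {}
    for c in candidates:
        if c.corpus == "second":
            newest[c.source_id] = min(c.weight, newest.get(c.source_id, c.weight))

    adjusted = []
    for c in candidates:
        weight = c.weight
        if c.corpus == "first" and c.source_id in newest:
            # Assumed rule: cap the older answer strictly below the newer one.
            weight = min(weight, penalty * newest[c.source_id])
        adjusted.append(replace(c, weight=weight))
    return adjusted


def select_likely(candidates, top_n=1):
    """Return the top_n candidates by weight, highest first."""
    return sorted(candidates, key=lambda c: c.weight, reverse=True)[:top_n]


candidates = [
    Candidate("42 members", "gazette", "first", 0.9),
    Candidate("45 members", "gazette", "second", 0.6),
    Candidate("40 members", "almanac", "first", 0.5),
]
likely = select_likely(downweight_overlaps(candidates))
```

In this example the "gazette" source appears in both corpora, so its first-corpus answer is pushed below the second-corpus answer, and the updated answer is the one selected. The "almanac" answer has no overlap and keeps its weight.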

First claim

Opening claim text (preview).

What is claimed is:

1. An information handling system that serves as a question and answering (QA) system, the system comprising: one or more processors; a memory coupled to at least one of the processors; a network adapter that connects the information handling system to a computer network; and a set of instructions stored in the memory and executed by at least one of the processors, wherein the set of instructions perform actions of: ingesting a first set of sources into a first corpus; ingesting a second set of sources into a second corpus, wherein the second set of sources are a subset of the first set of sources, and wherein the second set of sources include updates to the first set of sources; receiving a question from a requestor; identifying a plurality of candidate answers to the question using one or more sources included in the first and second corpuses, wherein a weighting is associated with each of the candidate answers; determining whether the first and second corpuses have an overlapping source from which two or more of the candidate answers were identified; in response to determining the overlapping source, assigning a first candidate answer from the overlapping source in the first corpus with a lower weighting than a second candidate answer from the overlapping source in the second corpus; selecting one or more likely answers from the plurality of candidate answers, wherein the selecting is based on the weighting associated with the respective candidate answers; and returning the selected likely answers to the requestor.

2. The information handling system of claim 1 wherein the actions further comprise: ingesting a first set of passages from the first set of sources into the first corpus; selecting a second set of passages from the second set of sources, wherein the selection is based on each of the second set of passages being an update to at least one of the first set of passages; and ingesting the selected second set of passages into the second corpus.

3. The information handling system of claim 2 wherein the ingestion into the first corpus is performed on a first ingestion cycle and the ingestion into the second corpus is performed on a second ingestion cycle, wherein the first ingestion cycle occurs less frequently than the second ingestion cycle, and wherein the actions further comprise: including unselected passages from the second set of sources in a data store; and ingesting the unselected passages into the first corpus during a next first ingestion cycle.

4. The information handling system of claim 3 wherein the actions further comprise: ingesting the second set of passages into the first corpus during the next first ingestion cycle; and clearing the second set of passages from the second corpus after ingestion of the second set of passages into the first corpus.

5. The information handling system of claim 1 wherein the actions further comprise: associating an input date corresponding to each of the first and second set of sources, wherein the assigning of the lower weighting to the overlapping source in the first source is also based on the input date of the overlapping source in the first corpus being earlier than the input date of the overlapping source in the second corpus.

6. The information handling system of claim 1 wherein the actions further comprise: ingesting, into the first corpus, a first set of one or more passages from a first source selected from the first set of sources; and identifying a second set of one or more passages from a second source selected from the second set of sources, wherein the identification is based on the second set of passages being an update to the first set of passages and the second source being the same as the first source.

7. The information handling system of claim 6 wherein the first and second sources are selected from the group consisting of a newspaper, a magazine, a journal, and a periodical.

8. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, causes the information handling system to perform actions comprising: ingesting a first set of sources into a first corpus; ingesting a second set of sources into a second corpus, wherein the second set of sources are a subset of the first set of sources, and wherein the second set of sources include updates to the first set of sources; receiving a question from a requestor; identifying a plurality of candidate answers to the question using one or more sources included in the first and second corpuses, wherein a weighting is associated with each of the candidate answers; determining whether the first and second corpuses have an overlapping source from which two or more of the candidate answers were identified; in response to determining the overlapping source, assigning a first candidate answer from the overlapping source in the first corpus with a lower weighting than a second candidate answer from the overlapping source in the second corpus; selecting one or more likely answers from the plurality of candidate answers, wherein the selecting is based on the weighting associated with the respective candidate answers; and returning the selected likely answers to the requestor.

9. The computer program product of claim 8 wherein the actions further comprise: ingesting a first set of passages from the first set of sources into the first corpus; selecting a second set of passages from the second set of sources, wherein the selection is based on each of the second set of passages being an update to at least one of the first set of passages; and ingesting the selected second set of passages into the second corpus.

10. The computer program product of claim 9 wherein the ingestion into the first corpus is performed on a first ingestion cycle and the ingestion into the second corpus is performed on a second ingestion cycle, wherein the first ingestion cycle occurs less frequently than the second ingestion cycle, and wherein the actions further comprise: including unselected passages from the second set of sources in a data store; and ingesting the unselected passages into the first corpus during a next first ingestion cycle.

11. The computer program product of claim 10 wherein the actions further comprise: ingesting the second set of passages into the first corpus during the next first ingestion cycle; and clearing the second set of passages from the second corpus after ingestion of the second set of passages into the first corpus.

12. The computer program product of claim 8 wherein the actions further comprise: associating an input date corresponding to each of the first and second set of sources, wherein the assigning of the lower weighting to the overlapping source in the first source is also based on the input date of the overlapping source in the first corpus being earlier than the input date of the overlapping source in the second corpus.

13. The computer program product of claim 8 wherein the actions further comprise: ingesting, into the first corpus, a first set of one or more passages from a first source selected from the first set of sources; and identifying a second set of one or more passages from a second source selected from the second set of sources, wherein the identification is based on the second set of passages being an update to the first set of passages and the second source being the same as the first source.
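
The two ingestion cycles in claims 2 to 4 (and 9 to 11) can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed system: the class `TwoCorpusStore`, its method names, and the use of a matching passage ID to decide that a passage is "an update" are all hypothetical.

```python
class TwoCorpusStore:
    """Hypothetical sketch of a slow base corpus plus a fast update corpus."""

    def __init__(self):
        self.first_corpus = {}   # passage_id -> text, rebuilt on the slow cycle
        self.second_corpus = {}  # passage_id -> text, refreshed on the fast cycle
        self.pending = {}        # data store for passages that are not updates

    def fast_cycle(self, passages):
        """Frequent (second) ingestion cycle."""
        for pid, text in passages.items():
            if pid in self.first_corpus:
                # Assumed test for "update": the passage ID already exists.
                self.second_corpus[pid] = text
            else:
                # Unselected passage: hold it for the next slow cycle.
                self.pending[pid] = text

    def slow_cycle(self, passages=None):
        """Infrequent (first) ingestion cycle."""
        self.first_corpus.update(passages or {})
        self.first_corpus.update(self.pending)        # claim 3: held passages
        self.first_corpus.update(self.second_corpus)  # claim 4: fold in updates
        self.pending.clear()
        self.second_corpus.clear()                    # claim 4: clear update corpus


store = TwoCorpusStore()
store.slow_cycle({"p1": "original text"})
store.fast_cycle({"p1": "revised text", "p2": "new passage"})
store.slow_cycle()
```

After the second slow cycle, the first corpus holds both the revised `p1` and the new `p2`, and the second corpus is empty again, which mirrors the merge-then-clear behaviour recited in claims 4 and 11.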

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9690862B2 cover?
An approach is provided for updating corpora in a Question and Answer (QA) system. Ingestion of a first set of sources into a first corpus and a second set of sources into a second corpus with the second set of sources including updates to the first set of sources. A question is received. The system identifies candidate answers to the question using the sources included in the corpuses. Each ca…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/23. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 27, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).