Real-time or frequent ingestion by running pipeline in order of effectiveness

US9697099B2 · US · B2

Patent metadata
Publication number: US-9697099-B2
Application number: US-201414295913-A
Country: US
Kind code: B2
Filing date: Jun 4, 2014
Priority date: Jun 4, 2014
Publication date: Jul 4, 2017
Grant date: Jul 4, 2017

Abstract

Official abstract text for this publication.

A mechanism is provided in a data processing system for partial ingestion of content. The mechanism receives new content to be ingested into a corpus of information. The mechanism applies a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness. The plurality of sub-pipelines include all annotation engines of an ingestion pipeline. Each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects. The mechanism provides access to the one or more intermediate output objects.
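
The staged ingestion described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `ingest_in_stages`, `tokens`, `length`, and the in-memory `index` dictionary are hypothetical, and each annotation engine is assumed to be a plain function that returns a dictionary of annotations. The sub-pipelines are assumed to arrive already sorted by effectiveness.

```python
from typing import Callable, Dict, List

Annotations = Dict[str, object]
# Assumption: an annotation engine is a function from text to annotations.
Engine = Callable[[str], Annotations]


def ingest_in_stages(
    doc_id: str,
    text: str,
    sub_pipelines: List[List[Engine]],
    index: Dict[str, List[Annotations]],
) -> None:
    """Run sub-pipelines in the given order, publishing each stage's output."""
    index[doc_id] = []
    for engines in sub_pipelines:
        stage_output: Annotations = {}
        for engine in engines:
            stage_output.update(engine(text))
        # The intermediate output object becomes readable as soon as the
        # stage finishes, before later sub-pipelines have run.
        index[doc_id].append(stage_output)


# Two toy engines standing in for real annotators (hypothetical).
def tokens(text: str) -> Annotations:
    return {"tokens": text.split()}


def length(text: str) -> Annotations:
    return {"length": len(text)}


index: Dict[str, List[Annotations]] = {}
ingest_in_stages("doc-1", "new content arrives", [[tokens], [length]], index)
```

After the call, `index["doc-1"]` holds one intermediate output object per sub-pipeline, so a reader that queries the index between stages would see the partially ingested content.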

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system, for partial ingestion of content, the method comprising: identifying a set of features that contribute to generating candidate answers for input questions; identifying a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generating a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receiving new content to be ingested into a corpus of information; applying the plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and providing access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially ingested new content.

2. The method of claim 1, further comprising responsive to applying all of the plurality of sub-pipelines, storing fully ingested new content in the corpus of information.

3. The method of claim 1, wherein identifying the set of features comprises examining feature score weighting from a trained model.

4. The method of claim 1, wherein generating a sub-pipeline for each set of annotation engines comprises: determining performance or runtime cost of each annotation engine in the ingestion pipeline; determining an efficiency score for each set of annotation engines based on the performance or runtime costs of the set of annotation engines and a feature score weighting of a corresponding feature; and ranking the sub-pipelines by efficiency score.

5. The method of claim 4, wherein generating a sub-pipeline for each set of annotation engines comprises for each sub-pipeline, removing annotation engines that are present in a next higher ranked sub-pipeline.

6. The method of claim 5, wherein generating a sub-pipeline for each set of annotation engines comprises combining a given sub-pipeline with a next higher or lower ranked sub-pipeline responsive to the given sub-pipeline having fewer than a predetermined number of annotation engines.

7. The method of claim 4, wherein applying the plurality of sub-pipelines of annotation engines against the new content comprises applying the plurality of sub-pipelines according to the ranking by efficiency score.

8. The method of claim 1, wherein each sub-pipeline within the plurality of sub-pipelines deletes one or more intermediate output objects generated by a previous sub-pipeline.

9. The method of claim 1, wherein providing access to the one or more intermediate output objects comprises mapping the new content being ingested to the one or more intermediate output objects.

10. The method of claim 9, further comprising: responsive to receiving an input question in a question answering system, running a question answering pipeline of software engines against available partially and fully ingested content according to the mapping; generating one or more candidate answers for the input question; ranking the one or more candidate answers; and presenting the ranked one or more candidate answers.

11. The method of claim 10, further comprising marking the given candidate answer as being based on partially ingested content responsive to determining evidence for the given candidate answer is partially ingested.

12. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: identify a set of features that contribute to generating candidate answers for input questions; identify a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generate a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receive new content to be ingested into a corpus of information; apply a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and provide access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially ingested new content.

13. The computer program product of claim 12, wherein generating a sub-pipeline for each set of annotation engines comprises: determining performance or runtime cost of each annotation engine in the ingestion pipeline; determining an efficiency score for each set of annotation engines based on the performance or runtime costs of the set of annotation engines and a feature score weighting of a corresponding feature; and ranking the sub-pipelines by efficiency score.

14. The computer program product of claim 13, wherein applying the plurality of sub-pipelines of annotation engines against the new content comprises applying the plurality of sub-pipelines according to the ranking by efficiency score.

15. The computer program product of claim 12, wherein providing access to the one or more intermediate output objects comprises mapping the new content being ingested to the one or more intermediate output objects, wherein the computer readable program further causes the computing device to: responsive to receiving an input question in a question answering system, run a question answering pipeline of software engines against available partially and fully ingested content according to the mapping; generate one or more candidate answers for the input question; rank the one or more candidate answers; and present the ranked one or more candidate answers.

16. The computer program product of claim 15, wherein the computer readable program further causes the computing device to mark the given candidate answer as being based on partially ingested content responsive to determining evidence for the given candidate answer is partially ingested.

17. An apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: identify a set of features that contribute to generating candidate answers for input questions; identify a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generate a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receive new content to be ingested into a corpus of information; apply a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and provide access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially inges
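
The sub-pipeline construction in claims 4 to 6 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not give a formula for the efficiency score, so the sketch assumes feature weight divided by total engine cost; it removes engines already scheduled in any higher-ranked sub-pipeline (a simplification of claim 5); and it merges an undersized sub-pipeline into the next higher-ranked one only (claim 6 also allows the lower-ranked neighbor). The names `SubPipeline`, `build_sub_pipelines`, and all feature and engine names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubPipeline:
    feature: str
    engines: List[str]
    score: float


def build_sub_pipelines(
    feature_weights: Dict[str, float],
    feature_engines: Dict[str, List[str]],
    engine_cost: Dict[str, float],
    min_engines: int = 1,
) -> List[SubPipeline]:
    # Assumed efficiency score: feature weight / summed runtime cost.
    # Costs are assumed to be positive.
    ranked = sorted(
        (
            SubPipeline(f, list(engs),
                        feature_weights[f] / sum(engine_cost[e] for e in engs))
            for f, engs in feature_engines.items()
        ),
        key=lambda sp: sp.score,
        reverse=True,
    )
    # Drop engines that a higher-ranked sub-pipeline already runs.
    seen = set()
    for sp in ranked:
        sp.engines = [e for e in sp.engines if e not in seen]
        seen.update(sp.engines)
    # Fold undersized sub-pipelines into the next higher-ranked one.
    merged: List[SubPipeline] = []
    for sp in ranked:
        if merged and len(sp.engines) < min_engines:
            merged[-1].engines.extend(sp.engines)
        else:
            merged.append(sp)
    return merged


weights = {"type_match": 0.6, "passage_score": 0.3}
engines = {"type_match": ["tokenizer", "ner"],
           "passage_score": ["tokenizer", "parser"]}
cost = {"tokenizer": 1.0, "ner": 2.0, "parser": 6.0}

plan = build_sub_pipelines(weights, engines, cost)
compact = build_sub_pipelines(weights, engines, cost, min_engines=2)
```

With these toy numbers, `plan` runs the cheap, heavily weighted `type_match` engines first and leaves only `parser` for the second stage, while `compact` folds that single-engine stage into the first. Every engine still appears exactly once, consistent with the requirement that the sub-pipelines together cover the whole ingestion pipeline.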

Assignees

IBM

Inventors

Classifications

  • Selection or weighting of terms for indexing · CPC title

  • using natural language analysis · CPC title

  • G06F11/302 · Primary

    where the computing system component is a software system · CPC title

  • using probabilistic model · CPC title

  • Natural language query formulation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9697099B2 cover?
A mechanism is provided in a data processing system for partial ingestion of content. The mechanism receives new content to be ingested into a corpus of information. The mechanism applies a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness. The plurality of sub-pipelines include all annotation engines of an ingestion pipeline. Each sub-pipeline w…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/302. Mapped technology areas include Physics.
When was this patent published?
Published on Jul 4, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).