Managing documents in question answering systems
US-2015347587-A1 · Dec 3, 2015 · US
US9697099B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9697099-B2 |
| Application number | US-201414295913-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 4, 2014 |
| Priority date | Jun 4, 2014 |
| Publication date | Jul 4, 2017 |
| Grant date | Jul 4, 2017 |
Abstract
A mechanism is provided in a data processing system for partial ingestion of content. The mechanism receives new content to be ingested into a corpus of information. The mechanism applies a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness. The plurality of sub-pipelines include all annotation engines of an ingestion pipeline. Each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects. The mechanism provides access to the one or more intermediate output objects.
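The abstract's flow can be sketched as follows: apply an ordered list of sub-pipelines to new content and expose an intermediate output object after each one. This is a minimal sketch under stated assumptions, not the patented implementation. The `Annotator` signature, `SubPipeline`, `IntermediateOutput`, `ingest_partially`, and the toy engines are hypothetical names. The sub-pipelines are also assumed to arrive already sorted by effectiveness.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical engine signature: takes the document text and the annotations
# produced so far, and returns new annotations keyed by name.
Annotator = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class SubPipeline:
    name: str
    engines: List[Annotator]  # run in list order


@dataclass
class IntermediateOutput:
    stage: str                   # sub-pipeline that produced this snapshot
    annotations: Dict[str, Any]  # everything annotated so far
    complete: bool               # True once the last sub-pipeline has run


def ingest_partially(text: str, sub_pipelines: List[SubPipeline]) -> Iterator[IntermediateOutput]:
    """Apply sub-pipelines in the given order (assumed pre-sorted by
    effectiveness) and yield a snapshot after each one finishes."""
    annotations: Dict[str, Any] = {}
    for i, sub in enumerate(sub_pipelines):
        for engine in sub.engines:
            annotations.update(engine(text, annotations))
        yield IntermediateOutput(sub.name, dict(annotations), i == len(sub_pipelines) - 1)


# Toy engines for illustration only.
def tokenize(text: str, ann: Dict[str, Any]) -> Dict[str, Any]:
    return {"tokens": text.split()}


def capitalized(text: str, ann: Dict[str, Any]) -> Dict[str, Any]:
    # Depends on "tokens" produced by an engine in an earlier sub-pipeline.
    return {"capitalized": [t for t in ann["tokens"] if t[:1].isupper()]}


stages = [SubPipeline("lexical", [tokenize]), SubPipeline("entities", [capitalized])]
outputs = list(ingest_partially("IBM Watson answers questions", stages))
```

The generator is lazy. A consumer iterating over `ingest_partially` can read the "lexical" snapshot as partially ingested content before the later sub-pipelines have run. The final snapshot, flagged `complete`, corresponds to fully ingested content.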
Claims (opening text, preview)
What is claimed is:

1. A method, in a data processing system, for partial ingestion of content, the method comprising: identifying a set of features that contribute to generating candidate answers for input questions; identifying a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generating a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receiving new content to be ingested into a corpus of information; applying the plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and providing access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially ingested new content.

2. The method of claim 1, further comprising responsive to applying all of the plurality of sub-pipelines, storing fully ingested new content in the corpus of information.

3. The method of claim 1, wherein identifying the set of features comprises examining feature score weighting from a trained model.

4. The method of claim 1, wherein generating a sub-pipeline for each set of annotation engines comprises: determining performance or runtime cost of each annotation engine in the ingestion pipeline; determining an efficiency score for each set of annotation engines based on the performance or runtime costs of the set of annotation engines and a feature score weighting of a corresponding feature; and ranking the sub-pipelines by efficiency score.

5. The method of claim 4, wherein generating a sub-pipeline for each set of annotation engines comprises for each sub-pipeline, removing annotation engines that are present in a next higher ranked sub-pipeline.

6. The method of claim 5, wherein generating a sub-pipeline for each set of annotation engines comprises combining a given sub-pipeline with a next higher or lower ranked sub-pipeline responsive to the given sub-pipeline having fewer than a predetermined number of annotation engines.

7. The method of claim 4, wherein applying the plurality of sub-pipelines of annotation engines against the new content comprises applying the plurality of sub-pipelines according to the ranking by efficiency score.

8. The method of claim 1, wherein each sub-pipeline within the plurality of sub-pipelines deletes one or more intermediate output objects generated by a previous sub-pipeline.

9. The method of claim 1, wherein providing access to the one or more intermediate output objects comprises mapping the new content being ingested to the one or more intermediate output objects.

10. The method of claim 9, further comprising: responsive to receiving an input question in a question answering system, running a question answering pipeline of software engines against available partially and fully ingested content according to the mapping; generating one or more candidate answers for the input question; ranking the one or more candidate answers; and presenting the ranked one or more candidate answers.

11. The method of claim 10, further comprising marking the given candidate answer as being based on partially ingested content responsive to determining evidence for the given candidate answer is partially ingested.

12. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: identify a set of features that contribute to generating candidate answers for input questions; identify a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generate a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receive new content to be ingested into a corpus of information; apply a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and provide access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially ingested new content.

13. The computer program product of claim 12, wherein generating a sub-pipeline for each set of annotation engines comprises: determining performance or runtime cost of each annotation engine in the ingestion pipeline; determining an efficiency score for each set of annotation engines based on the performance or runtime costs of the set of annotation engines and a feature score weighting of a corresponding feature; and ranking the sub-pipelines by efficiency score.

14. The computer program product of claim 13, wherein applying the plurality of sub-pipelines of annotation engines against the new content comprises applying the plurality of sub-pipelines according to the ranking by efficiency score.

15. The computer program product of claim 12, wherein providing access to the one or more intermediate output objects comprises mapping the new content being ingested to the one or more intermediate output objects, wherein the computer readable program further causes the computing device to: responsive to receiving an input question in a question answering system, run a question answering pipeline of software engines against available partially and fully ingested content according to the mapping; generate one or more candidate answers for the input question; rank the one or more candidate answers; and present the ranked one or more candidate answers.

16. The computer program product of claim 15, wherein the computer readable program further causes the computing device to mark the given candidate answer as being based on partially ingested content responsive to determining evidence for the given candidate answer is partially ingested.

17. An apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: identify a set of features that contribute to generating candidate answers for input questions; identify a set of annotation engines in the ingestion pipeline that contribute to each of the set of features and at least one annotation engine on which the one or more annotation engines depend; generate a sub-pipeline for each set of annotation engines to form a plurality of sub-pipelines of annotation engines; receive new content to be ingested into a corpus of information; apply a plurality of sub-pipelines of annotation engines against the new content in order of effectiveness, wherein the plurality of sub-pipelines include all annotation engines of an ingestion pipeline and wherein each sub-pipeline within the plurality of sub-pipelines generates one or more intermediate output objects; and provide access to the one or more intermediate output objects, wherein the one or more intermediate output objects represent the partially ingested new content.
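Claims 1 and 3 to 6 describe how the sub-pipelines are built. The process takes weighted features, the engines that feed each feature together with their dependencies, a per-engine runtime cost, an efficiency ranking, de-duplication, and merging of undersized sub-pipelines. The sketch below is a minimal illustration under several assumptions.

* The efficiency score is taken to be feature weight divided by summed runtime cost. The claims say only that the score is "based on" both quantities.
* Claim 5 removes engines present in the "next higher ranked" sub-pipeline. This sketch broadens that to any higher-ranked sub-pipeline.
* Engines that feed no feature run in a final catch-all sub-pipeline, so that all engines are covered as claim 1 requires.
* The dependency graph is acyclic and all costs are positive.

Function names, engine names, and numbers are hypothetical.

```python
from typing import Dict, List, Set


def dependency_closure(engines: Set[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return the engines plus everything they transitively depend on."""
    result: Set[str] = set()
    stack = list(engines)
    while stack:
        e = stack.pop()
        if e not in result:
            result.add(e)
            stack.extend(deps.get(e, []))
    return result


def topo_order(engines: Set[str], deps: Dict[str, List[str]]) -> List[str]:
    """Order engines so each runs after any dependency in the same set.
    Dependencies outside the set are assumed to have run in an earlier sub-pipeline."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(e: str) -> None:
        if e in seen or e not in engines:
            return
        seen.add(e)
        for d in deps.get(e, []):
            visit(d)
        ordered.append(e)

    for e in sorted(engines):
        visit(e)
    return ordered


def build_sub_pipelines(feature_weights: Dict[str, float], feature_engines: Dict[str, Set[str]],
                        deps: Dict[str, List[str]], cost: Dict[str, float],
                        min_size: int = 2) -> List[List[str]]:
    # One candidate per feature: contributing engines plus their dependencies.
    candidates = []
    for feature, weight in feature_weights.items():
        engines = dependency_closure(feature_engines[feature], deps)
        score = weight / sum(cost[e] for e in engines)  # hypothetical efficiency score
        candidates.append((score, engines))
    candidates.sort(key=lambda c: c[0], reverse=True)  # rank by efficiency

    # Drop engines already scheduled by a higher-ranked sub-pipeline.
    scheduled: Set[str] = set()
    pipelines: List[List[str]] = []
    for _, engines in candidates:
        remaining = engines - scheduled
        pipelines.append(topo_order(remaining, deps))
        scheduled |= remaining

    # Cover every engine of the ingestion pipeline: leftovers run last.
    leftovers = set(cost) - scheduled
    if leftovers:
        pipelines.append(topo_order(leftovers, deps))

    # Merge an undersized sub-pipeline into the next higher-ranked one
    # (or, for the first sub-pipeline, into the next lower-ranked one).
    merged: List[List[str]] = []
    for p in pipelines:
        if len(p) < min_size and merged:
            merged[-1].extend(p)
        else:
            merged.append(p)
    if len(merged) > 1 and len(merged[0]) < min_size:
        merged[1] = merged[0] + merged[1]
        merged.pop(0)
    return merged


weights = {"answer_type": 0.6, "passage_term_match": 0.3, "temporal": 0.1}
feature_engines = {"answer_type": {"lat_detector"}, "passage_term_match": {"term_matcher"},
                   "temporal": {"date_parser"}}
deps = {"lat_detector": ["tokenizer", "parser"], "parser": ["tokenizer"],
        "term_matcher": ["tokenizer"], "date_parser": ["tokenizer"]}
cost = {"tokenizer": 1.0, "parser": 4.0, "lat_detector": 1.0,
        "term_matcher": 1.0, "date_parser": 0.5, "coref": 3.0}
result = build_sub_pipelines(weights, feature_engines, deps, cost, min_size=2)
# -> [['tokenizer', 'term_matcher'], ['parser', 'lat_detector', 'date_parser', 'coref']]
```

In this toy setup, the cheap term-matching sub-pipeline ranks first (0.3 / 2.0) even though answer typing carries the larger weight (0.6 / 6.0). De-duplication guarantees that each engine runs exactly once. Appending a merged sub-pipeline to the end of its higher-ranked neighbour keeps every engine after its dependencies.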
Selection or weighting of terms for indexing · CPC title
using natural language analysis · CPC title
where the computing system component is a software system · CPC title
using probabilistic model · CPC title
Natural language query formulation · CPC title