Cognitive system with ingestion of natural language documents with embedded code
US-9606990-B2 · Mar 28, 2017 · US
US10169328B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10169328-B2 |
| Application number | US-201615152747-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 12, 2016 |
| Priority date | May 12, 2016 |
| Publication date | Jan 1, 2019 |
| Grant date | Jan 1, 2019 |
Official abstract text for this publication.
A mechanism is provided in a data processing system for identifying nonsense passages. The mechanism annotates an input passage with linguistic features to form an annotated passage. The mechanism counts a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts. The mechanism determines a value for a metric based on the set of feature counts and compares the value for the metric to a predetermined model threshold. The mechanism identifies whether the input passage is a nonsense passage based on a result of the comparison.
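The five steps in the abstract (annotate, count, compute a metric, compare to a threshold, decide) can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy lexicon, the `SYMBOL` feature type, the choice of a symbol-to-token ratio as the metric, the threshold of 0.3, and the rule that a value above the threshold marks a nonsense passage are all assumptions made for the example. A real system would use a trained part-of-speech annotator.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech annotator.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "is": "VERB", "returns": "VERB", "prints": "VERB",
    "function": "NOUN", "value": "NOUN", "list": "NOUN",
}

# Assumed threshold; the abstract only says it is "predetermined".
MODEL_THRESHOLD = 0.3


def annotate(passage):
    """Step 1: tag every token with a coarse feature type."""
    # Runs of letters become word tokens; every other non-space
    # character (digits, punctuation, operators) is its own token.
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", passage)
    annotated = []
    for token in tokens:
        if token.isalpha():
            tag = TOY_LEXICON.get(token.lower(), "OTHER")
        else:
            tag = "SYMBOL"
        annotated.append((token, tag))
    return annotated


def count_features(annotated):
    """Step 2: count the instances of each feature type."""
    return Counter(tag for _, tag in annotated)


def symbol_ratio(counts):
    """Step 3: derive one metric value from the feature counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["SYMBOL"] / total


def is_nonsense(passage, threshold=MODEL_THRESHOLD):
    """Steps 4 and 5: compare the metric to the threshold and decide."""
    counts = count_features(annotate(passage))
    return symbol_ratio(counts) > threshold


if __name__ == "__main__":
    for text in ("The function returns a value.",
                 "for (i = 0; i < n; i++) { x[i] = y[i]; }"):
        print(is_nonsense(text), text)
```

Under these assumptions, ordinary prose has few symbol tokens and stays below the threshold, while a fragment dominated by operators and brackets exceeds it.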
Opening claim text (preview).
What is claimed is:
1. A method, in a data processing system, for identifying nonsense passages, the method comprising: annotating, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage; counting, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts; determining, by the metric counters component, a value for a metric based on the set of feature counts; comparing, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold; determining, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison; responsive to the filter component determining the given evidence passage is a nonsense passage, sending, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and preventing the input passage from proceeding in the natural language processing pipeline; and responsive to the filter component not determining that the input passage is a nonsense passage, passing, by the filter component, the input passage to the natural language processing pipeline.
2. The method of claim 1, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
3. The method of claim 1, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
4. The method of claim 1, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
5. The method of claim wherein the metric and the predetermined model threshold are defined in a policy data structure.
6. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program comprises a natural language processing pipeline configured to execute on a data processing system to cause the data processing system to process natural language, wherein the computer readable program comprises: an annotator in a nonsense identification component with the natural language processing pipeline configured to annotate an input passage with linguistic features to form an annotated passage; a metric counters component in the nonsense identification component configured to count a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts and determine a value for a metric based on the set of feature counts; a comparator component of the nonsense identification component configured to compare the value for the metric to a predetermined model threshold; and a filter component of the nonsense identification component configured to determine whether the input passage is a nonsense passage based on a result of the comparison; wherein the filter component is configured to send the given evidence passage to a semi-structured data pipeline and to prevent the given evidence passage from proceeding in the natural language processing pipeline responsive to the filter component determining the given evidence passage is a nonsense passage; and wherein the filter component is configured to pass the given evidence passage to the natural language processing pipeline responsive to the filter component not determining that the given evidence passage is a nonsense passage.
7. The computer program product of claim 6, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
8. The computer program product of claim 6, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
9. The computer program product of claim 6, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
10. The computer program product of claim 6, wherein the metric and the predetermined model threshold are defined in a policy data structure.
11. An apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: annotate, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage; count, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts; determine, by the metric counters component of the nonsense identification component, a value for a metric based on the set of feature counts; compare, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold; determine, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison; responsive to the filter component determining the given evidence passage is a nonsense passage, send, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and prevent the input passage from proceeding in the natural language processing pipeline; and responsive to the filter component not determining that the input passage is a nonsense passage, pass, by the filter component, the input passage to the natural language processing pipeline.
12. The apparatus of claim 11, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.
13. The apparatus of claim 11, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.
14. The apparatus of claim 11, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.
15. The apparatus of claim 11, wherein the metric and the predetermined model threshold are defined in a policy data structure.
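The dependent claims mention a metric that is a ratio of one part-of-speech count to another, with the metric and threshold held in a policy data structure, and the independent claims describe routing a flagged passage to a semi-structured data pipeline. The sketch below illustrates one way those pieces might fit together. Every name in it (`Policy`, `ratio_metric`, `route`, the two list-based queues) is hypothetical, the input is assumed to be already annotated as `(token, tag)` pairs, and the "above threshold means nonsense" direction is an assumption, since the claims do not state the direction of the comparison.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: which ratio to compute and its threshold."""
    numerator_tag: str      # the "first part-of-speech"
    denominator_tag: str    # the "second part-of-speech"
    threshold: float        # the predetermined model threshold


def ratio_metric(counts, policy):
    """Ratio of first-tag instances to second-tag instances."""
    numerator = counts.get(policy.numerator_tag, 0)
    denominator = counts.get(policy.denominator_tag, 0)
    if denominator == 0:
        # Assumed convention: an undefined ratio is treated as unbounded
        # when the numerator is non-zero, and as zero otherwise.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def route(passage, tagged, policy, nlp_queue, semi_structured_queue):
    """Send the passage to exactly one of the two pipelines."""
    counts = Counter(tag for _, tag in tagged)
    value = ratio_metric(counts, policy)
    if value > policy.threshold:
        # Flagged: divert it and keep it out of the NLP pipeline.
        semi_structured_queue.append(passage)
        return "semi_structured"
    nlp_queue.append(passage)
    return "nlp"


if __name__ == "__main__":
    policy = Policy(numerator_tag="NOUN", denominator_tag="VERB", threshold=3.0)
    nlp_queue, semi_queue = [], []

    prose = [("The", "DET"), ("parser", "NOUN"), ("reads", "VERB"),
             ("the", "DET"), ("file", "NOUN")]
    noise = [("foo", "NOUN"), ("bar", "NOUN"), ("baz", "NOUN"), ("qux", "NOUN")]

    print(route("The parser reads the file", prose, policy, nlp_queue, semi_queue))
    print(route("foo bar baz qux", noise, policy, nlp_queue, semi_queue))
```

Keeping the tags and threshold in a single record means the metric can be changed without touching the counting or routing code, which is one plausible reading of why the claims place them in a policy data structure.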
Ontology · CPC title
using statistical methods · CPC title
Semantic analysis · CPC title
Physics · mapped topic