Post-processing for identifying nonsense passages in a question answering system

US10169328B2 · US · B2

Patent metadata
Publication number: US-10169328-B2
Application number: US-201615152747-A
Country: US
Kind code: B2
Filing date: May 12, 2016
Priority date: May 12, 2016
Publication date: Jan 1, 2019
Grant date: Jan 1, 2019

Abstract

Official abstract text for this publication.

A mechanism is provided in a data processing system for identifying nonsense passages. The mechanism annotates an input passage with linguistic features to form an annotated passage. The mechanism counts a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts. The mechanism determines a value for a metric based on the set of feature counts and compares the value for the metric to a predetermined model threshold. The mechanism identifies whether the input passage is a nonsense passage based on a result of the comparison.
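The abstract's four steps (annotate, count each feature type, compute a metric from the counts, compare it against a threshold) can be illustrated with a short Python sketch. It is not the patented implementation. It assumes an upstream part-of-speech tagger has already annotated the passage as (token, tag) pairs using Penn Treebank-style tags, and it uses a noun-to-verb ratio (one instance of the part-of-speech ratio in claim 3) with a hypothetical threshold of 4.0, flagging passages whose ratio exceeds it. The function names, the chosen parts of speech, the threshold value, and the comparison direction are illustrative assumptions; the text here does not specify them.

```python
from collections import Counter
from typing import Iterable, Tuple

# ASSUMPTION: the annotator has already produced (token, POS tag) pairs
# using Penn Treebank-style tags (NN, NNS, VBZ, ...).
TaggedToken = Tuple[str, str]


def count_features(annotated: Iterable[TaggedToken]) -> Counter:
    """Count instances of each feature type, using the 2-letter tag prefix
    as a coarse part of speech (NNS -> NN, VBZ -> VB)."""
    return Counter(tag[:2] for _, tag in annotated)


def pos_ratio(counts: Counter, first: str = "NN", second: str = "VB") -> float:
    """Metric: instances of the first POS divided by instances of the second."""
    num, den = counts[first], counts[second]
    if den == 0:
        # No second-POS instances: an unbounded ratio if any first-POS exist.
        return float("inf") if num > 0 else 0.0
    return num / den


def is_nonsense(annotated: Iterable[TaggedToken], threshold: float = 4.0) -> bool:
    """Compare the metric to a model threshold (HYPOTHETICAL value and
    direction: a ratio above the threshold is treated as nonsense)."""
    return pos_ratio(count_features(annotated)) > threshold


# Example inputs, tagged by hand for illustration.
sentence = [("The", "DT"), ("drug", "NN"), ("reduces", "VBZ"),
            ("blood", "NN"), ("pressure", "NN")]
table_dump = [("Dose", "NN"), ("mg", "NNS"), ("Patients", "NNS"),
              ("Weeks", "NNS"), ("Placebo", "NNP")]

if __name__ == "__main__":
    print(is_nonsense(sentence))    # False: 3 nouns / 1 verb = 3.0
    print(is_nonsense(table_dump))  # True: 5 nouns / 0 verbs = inf
```

A fluent sentence keeps the ratio low, while a verb-free run of column headers (typical of text scraped from a table) drives it up, which is the kind of contrast a part-of-speech ratio can pick up.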

Claims

What is claimed is:

1. A method, in a data processing system, for identifying nonsense passages, the method comprising:
    annotating, by an annotator in a nonsense identification component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
    counting, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
    determining, by the metric counters component, a value for a metric based on the set of feature counts;
    comparing, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
    determining, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
    responsive to the filter component determining the given evidence passage is a nonsense passage, sending, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and preventing the input passage from proceeding in the natural language processing pipeline; and
    responsive to the filter component not determining that the input passage is a nonsense passage, passing, by the filter component, the input passage to the natural language processing pipeline.

2. The method of claim 1, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.

3. The method of claim 1, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.

4. The method of claim 1, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.

5. The method of claim wherein the metric and the predetermined model threshold are defined in a policy data structure.

6. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program comprises a natural language processing pipeline configured to execute on a data processing system to cause the data processing system to process natural language, wherein the computer readable program comprises:
    an annotator in a nonsense identification component with the natural language processing pipeline configured to annotate an input passage with linguistic features to form an annotated passage;
    a metric counters component in the nonsense identification component configured to count a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts and determine a value for a metric based on the set of feature counts;
    a comparator component of the nonsense identification component configured to compare the value for the metric to a predetermined model threshold; and
    a filter component of the nonsense identification component configured to determine whether the input passage is a nonsense passage based on a result of the comparison;
    wherein the filter component is configured to send the given evidence passage to a semi-structured data pipeline and to prevent the given evidence passage from proceeding in the natural language processing pipeline responsive to the filter component determining the given evidence passage is a nonsense passage; and
    wherein the filter component is configured to pass the given evidence passage to the natural language processing pipeline responsive to the filter component not determining that the given evidence passage is a nonsense passage.

7. The computer program product of claim 6, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.

8. The computer program product of claim 6, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.

9. The computer program product of claim 6, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.

10. The computer program product of claim 6, wherein the metric and the predetermined model threshold are defined in a policy data structure.

11. An apparatus comprising:
    a processor; and
    a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
    annotate, by an annotator in a nonsense identification, component within a natural language processing pipeline configured to execute in the data processing system, an input passage with linguistic features to form an annotated passage;
    count, by metric counters component in the nonsense identification component, a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts;
    determine, by the metric counters component of the nonsense identification component, value for a metric based on the set of feature counts;
    compare, by a comparator component of the nonsense identification component, the value for the metric to a predetermined model threshold;
    determine, by a filter component of the nonsense identification component, whether the input passage is a nonsense passage based on a result of the comparison;
    responsive to the filter component determining the given evidence passage is a nonsense passage, send, by the filter component of the nonsense identification component, the input passage to a semi-structured data pipeline configured to execute in the data processing system and prevent the input passage from proceeding in the natural language processing pipeline; and
    responsive to the filter component not determining that the input passage is a nonsense passage, pass, by the filter component, the input passage to the natural language processing pipeline.

12. The apparatus of claim 11, wherein annotating the input passage comprises annotating the input passage for linguistic part-of-speech features.

13. The apparatus of claim 11, wherein the metric comprises a ratio of a number of instances of a first part-of-speech to a number of instances of a second part-of-speech in the input passage.

14. The apparatus of claim 11, wherein the input passage is a candidate evidence passage for a candidate answer in a question answering system.

15. The apparatus of claim 11, wherein the metric and the predetermined model threshold are defined in a policy data structure.
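The filter step shared by claims 1, 6, and 11, together with the policy data structure of claims 10 and 15, might be sketched as below. This is an illustrative assumption rather than the assignee's implementation: the `Policy` record and its field names are hypothetical, the two downstream pipelines are modeled as plain callables, and a metric above the threshold is assumed to mean nonsense. What it shows is the routing rule in the claims: a passage judged to be nonsense goes only to the semi-structured data pipeline and does not continue in the natural language processing pipeline, while every other passage is passed on to the natural language processing pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

TaggedToken = Tuple[str, str]
Pipeline = Callable[[List[TaggedToken]], None]


@dataclass(frozen=True)
class Policy:
    """HYPOTHETICAL policy data structure: which metric to compute and the
    predetermined model threshold to compare it against."""
    first_pos: str    # numerator POS prefix, e.g. "NN"
    second_pos: str   # denominator POS prefix, e.g. "VB"
    threshold: float  # illustrative value, not taken from the patent


def metric_value(passage: List[TaggedToken], policy: Policy) -> float:
    """Count coarse POS instances and return the policy's ratio metric."""
    counts = Counter(tag[:2] for _, tag in passage)
    num, den = counts[policy.first_pos], counts[policy.second_pos]
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


def route(passage: List[TaggedToken], policy: Policy,
          nlp_pipeline: Pipeline, semi_structured_pipeline: Pipeline) -> str:
    """Filter component: send nonsense to the semi-structured pipeline only;
    pass everything else on to the natural language processing pipeline."""
    if metric_value(passage, policy) > policy.threshold:  # ASSUMED direction
        semi_structured_pipeline(passage)
        return "semi-structured"
    nlp_pipeline(passage)
    return "nlp"


if __name__ == "__main__":
    nlp_queue: List[List[TaggedToken]] = []
    semi_queue: List[List[TaggedToken]] = []
    policy = Policy(first_pos="NN", second_pos="VB", threshold=4.0)
    print(route([("Dose", "NN"), ("mg", "NNS")], policy,
                nlp_queue.append, semi_queue.append))   # semi-structured
    print(route([("drug", "NN"), ("works", "VBZ")], policy,
                nlp_queue.append, semi_queue.append))   # nlp
```

Keeping the metric and threshold in a separate policy record, as claims 10 and 15 describe, plausibly lets them be tuned (for example, per corpus) without changing the filter logic; the claims themselves only state that both are defined in a policy data structure.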

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10169328B2 cover?
A mechanism is provided in a data processing system for identifying nonsense passages. The mechanism annotates an input passage with linguistic features to form an annotated passage. The mechanism counts a number of instances of each type of linguistic feature in the annotated passage to form a set of feature counts. The mechanism determines a value for a metric based on the set of feature coun…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 1, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).