Automated correction of natural language processing systems

US9535894B2 · US · B2

Patent metadata
Publication number: US-9535894-B2
Application number: US-201514696677-A
Country: US
Kind code: B2
Filing date: Apr 27, 2015
Priority date: Apr 27, 2015
Publication date: Jan 3, 2017
Grant date: Jan 3, 2017

Abstract

Machine logic that automatically detects natural language processing (NLP) system annotation errors and correspondingly updates NLP annotators to prevent future erroneous annotations by performing the following steps: (i) determining that a first annotation error has occurred in an annotation of a corpus by the natural language processing system; (ii) generating a candidate set of annotation correction actions, where each annotation correction action of the set is adapted to prevent an occurrence of an error similar to the first annotation error by the natural language processing system; (iii) selecting an annotation correction action from the candidate set of annotation correction actions, based, at least in part, on a set of annotation correction confidence characteristics; and (iv) automatically applying the selected annotation correction action to the natural language processing system.
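The four steps (i) to (iv) can be read as a single correction pass over an annotator. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the error detector, candidate generator, and confidence function are hypothetical callables supplied by the caller, `CorrectionAction` and `correction_pass` are illustrative names, and "selecting based on confidence characteristics" is simplified to picking the highest-scoring candidate.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CorrectionAction:
    """Hypothetical correction: a description plus a function that updates the annotator."""
    description: str
    apply: Callable[[dict], None]  # mutates the annotator's configuration in place


def correction_pass(
    annotator_config: dict,
    corpus: Sequence[str],
    detect_error: Callable[[dict, Sequence[str]], Optional[str]],
    generate_candidates: Callable[[str], list[CorrectionAction]],
    confidence: Callable[[CorrectionAction], float],
) -> Optional[CorrectionAction]:
    """Run steps (i)-(iv) once; return the applied action, or None if nothing was applied."""
    # (i) Determine whether an annotation error occurred on this corpus.
    error = detect_error(annotator_config, corpus)
    if error is None:
        return None
    # (ii) Generate candidate actions intended to prevent similar errors.
    candidates = generate_candidates(error)
    if not candidates:
        return None
    # (iii) Select a candidate; here, simply the one with the highest confidence score.
    best = max(candidates, key=confidence)
    # (iv) Apply the selected action automatically to the annotator configuration.
    best.apply(annotator_config)
    return best
```

Passing the detector and scorer in as callables keeps the loop independent of any particular annotator; a real system would plug in its own error metrics and confidence characteristics.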

Claims

What is claimed is:

1. A method comprising:
    causing, by one or more processors, a first natural language processing (NLP) annotator of an NLP system to annotate a corpus, thereby producing an annotated corpus that includes a first set of annotation(s);
    causing, by one or more processors, a second NLP annotator of the NLP system to annotate the annotated corpus that includes the first set of annotation(s), thereby producing a second set of annotation(s) that annotate the annotated corpus that includes the first set of annotation(s);
    determining, by one or more processors, based, at least in part, on the second set of annotation(s), that a first annotation error has occurred in the annotation of the corpus by the first NLP annotator;
    identifying, by one or more processors, a cause of the first annotation error;
    generating, by one or more processors, a candidate set of annotation correction actions, where each annotation correction action of the set is directed to the identified cause and is adapted to prevent an occurrence of an error similar to the first annotation error by the first NLP annotator;
    selecting, by one or more processors, an annotation correction action from the candidate set of annotation correction actions, based, at least in part, on a set of annotation correction confidence characteristics and based, at least in part, on an impact analysis performed against one or more ground truth annotations, the ground truth annotations having been performed by humans;
    automatically applying, by one or more processors, the selected annotation correction action to the first NLP annotator; and
    causing, by one or more processors, the first NLP annotator to annotate a new corpus, thereby producing new annotation(s) based on the applied annotation correction action.

2. The method of claim 1, wherein determining that the first annotation error has occurred is based, at least in part, on metrics collected during the annotation of the corpus by the first NLP annotator.

3. The method of claim 1, wherein the identified cause of the first annotation error is a missing item.

4. The method of claim 3, wherein the missing item is a contextual trigger.

5. The method of claim 1, wherein the annotation correction confidence characteristics include a degree of harmony/discord of a data point.

6. The method of claim 1, wherein the selected annotation correction action includes adding a set of word(s) to an inclusion list, or adding a set of word(s) to an exception list.

7. The method of claim 1, wherein the selected annotation correction action includes updating a natural language processing rule.

8. A computer program product comprising a computer readable storage medium having stored thereon:
    program instructions programmed to cause a first natural language processing (NLP) annotator of an NLP system to annotate a corpus, thereby producing an annotated corpus that includes a first set of annotation(s);
    program instructions programmed to cause a second NLP annotator of the NLP system to annotate the annotated corpus that includes the first set of annotation(s), thereby producing a second set of annotation(s) that annotate the annotated corpus that includes the first set of annotation(s);
    program instructions programmed to determine, based, at least in part, on the second set of annotation(s), that a first annotation error has occurred in the annotation of the corpus by the first NLP annotator;
    program instructions programmed to identify a cause of the first annotation error;
    program instructions programmed to generate a candidate set of annotation correction actions, where each annotation correction action of the set is directed to the identified cause and is adapted to prevent an occurrence of an error similar to the first annotation error by the first NLP annotator;
    program instructions programmed to select an annotation correction action from the candidate set of annotation correction actions, based, at least in part, on a set of annotation correction confidence characteristics and based, at least in part, on an impact analysis performed against one or more ground truth annotations, the ground truth annotations having been performed by humans;
    program instructions programmed to automatically apply the selected annotation correction action to the first NLP annotator; and
    program instructions programmed to cause the first NLP annotator to annotate a new corpus, thereby producing new annotation(s) based on the applied annotation correction action.

9. The computer program product of claim 8, wherein determining that the first annotation error has occurred is based, at least in part, on metrics collected during the annotation of the corpus by the first NLP annotator.

10. The computer program product of claim 8, wherein the identified cause of the first annotation error is a missing item in a dictionary.

11. The computer program product of claim 8, wherein the selected annotation correction action includes updating a natural language processing rule.

12. A computer system comprising: a processor(s) set; and a computer readable storage medium; wherein: the processor set is structured, located, connected and/or programmed to run program instructions stored on the computer readable storage medium; and the program instructions include:
    program instructions programmed to cause a first natural language processing (NLP) annotator of an NLP system to annotate a corpus, thereby producing an annotated corpus that includes a first set of annotation(s);
    program instructions programmed to cause a second NLP annotator of the NLP system to annotate the annotated corpus that includes the first set of annotation(s), thereby producing a second set of annotation(s) that annotate the annotated corpus that includes the first set of annotation(s);
    program instructions programmed to determine, based, at least in part, on the second set of annotation(s), that a first annotation error has occurred in the annotation of the corpus by the first NLP annotator;
    program instructions programmed to identify a cause of the first annotation error;
    program instructions programmed to generate a candidate set of annotation correction actions, where each annotation correction action of the set is directed to the identified cause and is adapted to prevent an occurrence of an error similar to the first annotation error by the first NLP annotator;
    program instructions programmed to select an annotation correction action from the candidate set of annotation correction actions, based, at least in part, on a set of annotation correction confidence characteristics and based, at least in part, on an impact analysis performed against one or more ground truth annotations, the ground truth annotations having been performed by humans;
    program instructions programmed to automatically apply the selected annotation correction action to the first NLP annotator; and
    program instructions programmed to cause the first NLP annotator to annotate a new corpus, thereby producing new annotation(s) based on the applied annotation correction action.

13. The computer system of claim 12, wherein determining that the first annotation error has occurred is based, at least in part, on metrics collected during the annotation of the corpus by the first NLP annotator.

14. The computer system of claim 12, wherein the identified cause of the first annotation error is a missing item in a dictionary.

15. The computer system of claim 12, wherein the selected annotation correction action includes updating a natural language processing rule.
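The selection step of claim 1, combined with the inclusion-list and exception-list actions of claim 6, can be illustrated with a small sketch. This is a hypothetical Python example under loud assumptions: the annotator is a toy token-level dictionary matcher (`DictionaryAnnotator`), the "impact analysis against human ground truth" is approximated by micro-averaged F1 on human-labelled token sets, and the "confidence characteristics" are reduced to a per-action score compared with a threshold (`min_confidence`). Error detection by the second annotator and cause identification (for example, the missing dictionary item of claims 10 and 14) are assumed to have already produced the word under consideration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DictionaryAnnotator:
    """Hypothetical token-level annotator driven by an inclusion list and an exception list."""
    inclusion: frozenset[str] = frozenset()
    exceptions: frozenset[str] = frozenset()

    def annotate(self, doc: str) -> set[tuple[int, str]]:
        # Label (token index, token) pairs that are included and not excepted.
        tokens = doc.lower().split()
        return {(i, t) for i, t in enumerate(tokens)
                if t in self.inclusion and t not in self.exceptions}


def f1(annotator: DictionaryAnnotator, docs: list[str], gold: list[set]) -> float:
    """Micro-averaged F1 of the annotator's output against human ground-truth sets."""
    tp = fp = fn = 0
    for doc, truth in zip(docs, gold):
        pred = annotator.annotate(doc)
        tp += len(pred & truth)
        fp += len(pred - truth)
        fn += len(truth - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def candidate_actions(annotator: DictionaryAnnotator, word: str) -> dict:
    """Claim 6 style candidates: add the word to the inclusion list or to the exception list."""
    return {
        "include " + word: replace(annotator, inclusion=annotator.inclusion | {word}),
        "except " + word: replace(annotator, exceptions=annotator.exceptions | {word}),
    }


def select_action(annotator, word, docs, gold, confidence, min_confidence=0.5):
    """Return (action name, updated annotator) for the confident candidate that most
    improves F1 over the current annotator, or (None, annotator) if none improves it."""
    best_name, best_annotator = None, annotator
    best_score = f1(annotator, docs, gold)  # baseline before any correction
    for name, updated in candidate_actions(annotator, word).items():
        if confidence.get(name, 0.0) < min_confidence:
            continue  # dropped by the hypothetical confidence threshold
        score = f1(updated, docs, gold)
        if score > best_score:
            best_name, best_annotator, best_score = name, updated, score
    return best_name, best_annotator
```

Because the annotator is immutable, each candidate is evaluated on its own copy, and the current annotator is kept when no candidate improves the ground-truth score. The returned annotator can then annotate a new corpus, mirroring the last step of claim 1.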

Assignees

IBM

Inventors

Classifications

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Dictionaries · CPC title

  • Recognition of textual entities · CPC title

  • G06F40/169 (primary): Annotation, e.g. comment data or footnotes · CPC title

  • Automatic justification · CPC title

Frequently asked questions

What does patent US9535894B2 cover?
Machine logic that automatically detects natural language processing (NLP) system annotation errors and correspondingly updates NLP annotators to prevent future erroneous annotations by performing the following steps: (i) determining that a first annotation error has occurred in an annotation of a corpus by the natural language processing system; (ii) generating a candidate set of annotation co…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/169. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC).