Process for identifying completion of domain adaptation dictionary activities

US10872205B2 · US · B2

Patent metadata
Publication number: US-10872205-B2
Application number: US-201916376338-A
Country: US
Kind code: B2
Filing date: Apr 5, 2019
Priority date: Jan 6, 2017
Publication date: Dec 22, 2020
Grant date: Dec 22, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus comprising a memory and a processor configured for semi-autonomous natural language processing domain adaptation related activities. The processor coupled to the memory and configured to identify a corpus of documents of an evaluation domain and generate a first lexicon based on the corpus of documents of the evaluation domain, and determine a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon. The processor is further configured to identify a corpus of documents of a client domain, generate a second lexicon based on the corpus of documents of the client domain, determine a metric associated with the corpus of documents of the client domain and the second lexicon, and determine that domain adaptation of the client domain is complete when the metric exceeds the threshold.
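The abstract does not say how a lexicon is generated or which metric is compared with the threshold. The sketch below is a minimal illustration built on stated assumptions, not the patented implementation. `build_lexicon` keeps frequent tokens that are outside a tiny, hypothetical background vocabulary (`GENERAL_VOCAB` and `min_count` are illustrative choices). `adaptation_complete` uses a hypothetical "lexicon saturation" metric: the share of the full client lexicon that was already produced without the newest documents. The threshold is passed in as a parameter rather than derived from an evaluation domain.

```python
import re
from collections import Counter
from typing import Sequence

# Hypothetical background vocabulary; a real system would filter against a
# large general-language lexicon rather than this short stoplist.
GENERAL_VOCAB = {"the", "a", "an", "of", "and", "is", "to", "in", "for", "with", "on", "by"}


def build_lexicon(documents: Sequence[str], min_count: int = 2) -> set[str]:
    """Candidate domain terms: lowercase word tokens outside GENERAL_VOCAB
    that occur at least min_count times across the documents."""
    counts = Counter(
        token
        for doc in documents
        for token in re.findall(r"[a-z][a-z0-9-]*", doc.lower())
        if token not in GENERAL_VOCAB
    )
    return {term for term, n in counts.items() if n >= min_count}


def overlap_ratio(new_lexicon: set[str], prior_lexicon: set[str]) -> float:
    """Fraction of terms in new_lexicon that were already in prior_lexicon."""
    if not new_lexicon:
        return 0.0
    return len(new_lexicon & prior_lexicon) / len(new_lexicon)


def adaptation_complete(client_docs: Sequence[str], threshold: float, holdout: int = 1) -> bool:
    """Hypothetical completion check: rebuild the client lexicon with and
    without the last `holdout` documents. If the newest documents add few new
    terms, the overlap ratio exceeds the threshold and adaptation is complete."""
    if not 1 <= holdout < len(client_docs):
        raise ValueError("holdout must be at least 1 and leave some prior documents")
    prior = build_lexicon(client_docs[:-holdout])
    full = build_lexicon(client_docs)
    return overlap_ratio(full, prior) > threshold
```

With `docs = ["tumor biopsy shows carcinoma", "carcinoma biopsy confirmed", "biopsy of tumor margin", "carcinoma tumor biopsy"]`, the last document adds no new term, so `adaptation_complete(docs, threshold=0.9)` returns `True`. Replacing the last document with `"margin shows tumor"` lifts two new terms over `min_count`, the overlap drops to 3/5 = 0.6, and the check returns `False`.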

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying, by a processor, a corpus of documents from within a domain; determining, by the processor, an evaluation question for use with a question answering system to determine an answer to the evaluation question based on content of the domain; partitioning the corpus of documents into a plurality of sub-corpora; generating a lexicon for each of the respective sub-corpora; generating a plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora; evaluating the evaluation question using the plurality of test systems to determine a plurality of evaluation results each corresponding uniquely to one of the plurality of test systems; and determining a threshold for sufficiency of domain adaptation based on at least one of the evaluation results.

2. The computer-implemented method of claim 1, wherein generating the plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora comprises combining one or more lexicons of each of the respective sub-corpora to generate the plurality of test systems.

3. The computer-implemented method of claim 1, wherein evaluating the evaluation question using the test systems to determine the plurality of evaluation results each corresponding uniquely to one of the plurality of test systems comprises determining that a first evaluation result of the plurality of evaluation results corresponding to a first test system of the plurality of test systems is greater than or equal to a second evaluation result of the plurality of evaluation results that corresponds to a second test system of the plurality of test systems.

4. The computer-implemented method of claim 3, wherein determining the threshold for sufficiency of domain adaptation based on at least one of the evaluation results comprises determining a ratio of domain terms associated with the first evaluation result and the first test system which are not associated with a third evaluation result of a third test system that is prior to the first test system to a total number of the domain terms associated with the first evaluation result and the first test system.

5. The computer-implemented method of claim 4, wherein the threshold for sufficiency of domain adaptation is determined according to $T = \frac{\left| L_n \cap \bigcup_{i=1}^{n-1} L_i \right|}{\left| L_n \right|}$, wherein $T$ is the threshold for sufficiency of domain adaptation, wherein $L_n$ is a lexicon containing the domain terms associated with the first evaluation result and the first test system, wherein ∩ denotes an intersection operation, wherein ∪ denotes a union operation, and wherein $i$ is an index beginning at 1.

6. The computer-implemented method of claim 5, wherein the lexicon containing the domain terms associated with the first evaluation result and the first test system contains domain terms associated with one or more tests systems prior to the first test system in the plurality of test systems.

7. The computer-implemented method of claim 1, wherein before identifying, by the processor, the corpus of documents from within the domain, the method further comprises: receiving a first question for processing, by the processor, according to natural language processing; and performing, by the processor, first natural language processing to determine a first answer to the first question, wherein after determining the threshold for sufficiency of domain adaptation based on the at least one of the evaluation results, the method further comprises: performing, by the processor, domain adaptation of a client domain to determine a second lexicon of the client domain; receiving, by the processor, a second question for processing according to natural language processing; and performing, by the processor, second natural language processing to determine a second answer to the second question based at least in part on the second lexicon, and wherein the second answer has a greater degree of accuracy with respect to the second question than a degree of accuracy of the first answer with respect to the first question.

8. An apparatus comprising: a memory comprising computer-readable instructions; and a processor coupled to the memory and configured to execute the instructions, which causes the processor to be configured to: identify a corpus of documents from within a domain; determine an evaluation question for use with a question answering system to determine an answer to the evaluation question based on content of the domain; partition the corpus of documents into a plurality of sub-corpora; generate a lexicon for each of the respective sub-corpora; generate a plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora; evaluate the evaluation question using the plurality of test systems to determine a plurality of evaluation results each corresponding uniquely to one of the plurality of test systems; and determine a threshold for sufficiency of domain adaptation based on at least one of the evaluation results.

9. The apparatus of claim 8, wherein generating the plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora comprises combining one or more lexicons of each of the respective sub-corpora to generate the plurality of test systems.

10. The apparatus of claim 8, wherein evaluating the evaluation question using the test systems to determine the plurality of evaluation results each corresponding uniquely to one of the plurality of test systems comprises determining that a first evaluation result of the plurality of evaluation results corresponding to a first test system of the plurality of test systems is greater than or equal to a second evaluation result of the plurality of evaluation results that corresponds to a second test system of the plurality of test systems.

11. The apparatus of claim 10, wherein determining the threshold for sufficiency of domain adaptation based on at least one of the evaluation results comprises determining a ratio of domain terms associated with the first evaluation result and the first test system which are not associated with a third evaluation result of a third test system that is prior to the first test system to a total number of the domain terms associated with the first evaluation result and the first test system.

12. The apparatus of claim 11, wherein the threshold for sufficiency of domain adaptation is determined according to T =
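The method of claims 1 to 6 can be sketched end to end as below. The claims leave open how lexicons are generated, how a test system answers questions, and which system counts as the "first test system", so every choice here is an assumption. `simple_lexicon` and `coverage_score` are hypothetical stand-ins for a real lexicon builder and a real question answering evaluation. The corpus is split into contiguous sub-corpora, and test system *j* is assumed to use the union of the first *j* sub-lexicons, which is one reading of claims 2 and 6. The chosen system is the first one whose score is greater than or equal to the next system's score, which is one reading of claim 3. The function returns the ratio from claim 5, $T = |L_n \cap \bigcup_{i=1}^{n-1} L_i| / |L_n|$, together with $1 - T$, the share of new terms that claim 4's wording ("not associated with" earlier systems) describes. The extracted claim text alone does not settle which of the two quantities is intended.

```python
import re
from typing import Callable, Sequence

Lexicon = set[str]


def simple_lexicon(docs: Sequence[str]) -> Lexicon:
    """Hypothetical lexicon builder: lowercase alphabetic tokens of 4+ letters."""
    return {tok for doc in docs for tok in re.findall(r"[a-z]{4,}", doc.lower())}


def partition(corpus: Sequence[str], k: int) -> list[list[str]]:
    """Split the corpus into k contiguous sub-corpora of near-equal size."""
    size, extra = divmod(len(corpus), k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(corpus[start:end]))
        start = end
    return parts


def coverage_score(lexicon: Lexicon, questions: Sequence[str]) -> float:
    """Hypothetical stand-in for running a QA test system: the fraction of
    evaluation-question terms that the system's lexicon contains."""
    terms = simple_lexicon(questions)
    return len(terms & lexicon) / len(terms) if terms else 0.0


def sufficiency_threshold(
    corpus: Sequence[str],
    questions: Sequence[str],
    k: int,
    lexicon_fn: Callable[[Sequence[str]], Lexicon] = simple_lexicon,
    score_fn: Callable[[Lexicon, Sequence[str]], float] = coverage_score,
) -> tuple[float, float, int, list[float]]:
    """Return (T, 1 - T, 0-based index of the chosen test system, all scores)."""
    if not 2 <= k <= len(corpus):
        raise ValueError("need 2 <= k <= number of documents")
    sub_lexicons = [lexicon_fn(part) for part in partition(corpus, k)]

    # Test system j uses the union of sub-lexicons 0..j, so each system's
    # lexicon contains the terms of every earlier system.
    cumulative: list[Lexicon] = []
    acc: Lexicon = set()
    for lex in sub_lexicons:
        acc = acc | lex
        cumulative.append(acc)
    scores = [score_fn(lex, questions) for lex in cumulative]

    # Choose the first system (never the very first) whose score is >= the
    # next system's score, i.e. where more data stops helping; else the last.
    n = next((j for j in range(1, k - 1) if scores[j] >= scores[j + 1]), k - 1)

    l_n = cumulative[n]  # lexicon of the chosen test system (L_n in claim 5)
    prior = set().union(*sub_lexicons[:n])  # terms of all earlier sub-corpora
    t = len(l_n & prior) / len(l_n) if l_n else 0.0
    return t, 1.0 - t, n, scores
```

For example, `sufficiency_threshold(["tumor biopsy", "carcinoma staging", "tumor margin", "biopsy carcinoma"], ["Which biopsy confirms carcinoma staging?"], k=4)` returns `(0.5, 0.5, 1, [0.2, 0.6, 0.6, 0.6])`: question coverage stops improving at the second test system, and half of that system's lexicon was already present in the first sub-corpus.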

Assignees

IBM

Inventors

Classifications

  • Document management systems

  • G06F40/242 (primary): Dictionaries

  • G06F40/284 (primary): Lexical analysis, e.g. tokenisation or collocates

  • Semantic analysis

  • using natural language analysis

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10872205B2 cover?
A semi-autonomous way to decide when natural language processing domain adaptation is sufficient. A corpus of documents from a domain is partitioned into sub-corpora, a lexicon is generated for each, and test systems built from those lexicons are evaluated on an evaluation question; the results set a threshold for sufficiency of domain adaptation. Domain adaptation of a client domain is considered complete when a metric computed from the client corpus and its lexicon exceeds that threshold.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/242. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 22, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).