What technology area does this patent fall under?

Primary CPC classification G06F40/242. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 22, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Process for identifying completion of domain adaptation dictionary activities

Patent metadata
Field	Value
Publication number	US-10872205-B2
Application number	US-201916376338-A
Country	US
Kind code	B2
Filing date	Apr 5, 2019
Priority date	Jan 6, 2017
Publication date	Dec 22, 2020
Grant date	Dec 22, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus comprising a memory and a processor configured for semi-autonomous natural language processing domain adaptation related activities. The processor coupled to the memory and configured to identify a corpus of documents of an evaluation domain and generate a first lexicon based on the corpus of documents of the evaluation domain, and determine a threshold that indicates a sufficiency of domain adaptation of the evaluation domain based at least in part on the first lexicon. The processor is further configured to identify a corpus of documents of a client domain, generate a second lexicon based on the corpus of documents of the client domain, determine a metric associated with the corpus of documents of the client domain and the second lexicon, and determine that domain adaptation of the client domain is complete when the metric exceeds the threshold.
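The completion check described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how the lexicon is generated or which metric is used, so everything concrete here is an assumption made for illustration: `build_lexicon` keeps terms that occur at least `min_count` times, and `coverage_metric` is the fraction of corpus tokens found in the lexicon. Only the final comparison (complete when the metric exceeds the threshold) comes from the abstract itself.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_lexicon(documents, min_count=2):
    """Hypothetical lexicon generator: keep terms seen at least min_count times."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return {term for term, n in counts.items() if n >= min_count}


def coverage_metric(documents, lexicon):
    """Hypothetical metric: fraction of corpus tokens that appear in the lexicon."""
    tokens = [tok for doc in documents for tok in tokenize(doc)]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def adaptation_complete(metric, threshold):
    """Per the abstract: adaptation is complete when the metric exceeds the threshold."""
    return metric > threshold


# Example with a made-up client corpus and a made-up threshold.
client_docs = ["loan rate loan", "rate cap"]
client_lexicon = build_lexicon(client_docs)
metric = coverage_metric(client_docs, client_lexicon)
done = adaptation_complete(metric, threshold=0.75)
```

In practice the threshold would come from the evaluation domain rather than being hard-coded, as the abstract describes.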

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying, by a processor, a corpus of documents from within a domain; determining, by the processor, an evaluation question for use with a question answering system to determine an answer to the evaluation question based on content of the domain; partitioning the corpus of documents into a plurality of sub-corpora; generating a lexicon for each of the respective sub-corpora; generating a plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora; evaluating the evaluation question using the plurality of test systems to determine a plurality of evaluation results each corresponding uniquely to one of the plurality of test systems; and determining a threshold for sufficiency of domain adaptation based on at least one of the evaluation results.

2. The computer-implemented method of claim 1, wherein generating the plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora comprises combining one or more lexicons of each of the respective sub-corpora to generate the plurality of test systems.

3. The computer-implemented method of claim 1, wherein evaluating the evaluation question using the test systems to determine the plurality of evaluation results each corresponding uniquely to one of the plurality of test systems comprises determining that a first evaluation result of the plurality of evaluation results corresponding to a first test system of the plurality of test systems is greater than or equal to a second evaluation result of the plurality of evaluation results that corresponds to a second test system of the plurality of test systems.

4. The computer-implemented method of claim 3, wherein determining the threshold for sufficiency of domain adaptation based on at least one of the evaluation results comprises determining a ratio of domain terms associated with the first evaluation result and the first test system which are not associated with a third evaluation result of a third test system that is prior to the first test system to a total number of the domain terms associated with the first evaluation result and the first test system.

5. The computer-implemented method of claim 4, wherein the threshold for sufficiency of domain adaptation is determined according to $T = \frac{\left| L_n \cap \bigcup_{i}^{n-1} L_i \right|}{\left| L_n \right|}$, wherein $T$ is the threshold for sufficiency of domain adaptation, wherein $L_n$ is a lexicon containing the domain terms associated with the first evaluation result and the first test system, wherein $\cap$ denotes an intersection operation, wherein $\cup$ denotes a union operation, and wherein $i$ is an index beginning at 1.

6. The computer-implemented method of claim 5, wherein the lexicon containing the domain terms associated with the first evaluation result and the first test system contains domain terms associated with one or more tests systems prior to the first test system in the plurality of test systems.

7. The computer-implemented method of claim 1, wherein before identifying, by the processor, the corpus of documents from within the domain, the method further comprises: receiving a first question for processing, by the processor, according to natural language processing; and performing, by the processor, first natural language processing to determine a first answer to the first question, wherein after determining the threshold for sufficiency of domain adaptation based on the at least one of the evaluation results, the method further comprises: performing, by the processor, domain adaptation of a client domain to determine a second lexicon of the client domain; receiving, by the processor, a second question for processing according to natural language processing; and performing, by the processor, second natural language processing to determine a second answer to the second question based at least in part on the second lexicon, and wherein the second answer has a greater degree of accuracy with respect to the second question than a degree of accuracy of the first answer with respect to the first question.

8. An apparatus comprising: a memory comprising computer-readable instructions; and a processor coupled to the memory and configured to execute the instructions, which causes the processor to be configured to: identify a corpus of documents from within a domain; determine an evaluation question for use with a question answering system to determine an answer to the evaluation question based on content of the domain; partition the corpus of documents into a plurality of sub-corpora; generate a lexicon for each of the respective sub-corpora; generate a plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora; evaluate the evaluation question using the plurality of test systems to determine a plurality of evaluation results each corresponding uniquely to one of the plurality of test systems; and determine a threshold for sufficiency of domain adaptation based on at least one of the evaluation results.

9. The apparatus of claim 8, wherein generating the plurality of test systems each corresponding uniquely to one of the plurality of sub-corpora comprises combining one or more lexicons of each of the respective sub-corpora to generate the plurality of test systems.

10. The apparatus of claim 8, wherein evaluating the evaluation question using the test systems to determine the plurality of evaluation results each corresponding uniquely to one of the plurality of test systems comprises determining that a first evaluation result of the plurality of evaluation results corresponding to a first test system of the plurality of test systems is greater than or equal to a second evaluation result of the plurality of evaluation results that corresponds to a second test system of the plurality of test systems.

11. The apparatus of claim 10, wherein determining the threshold for sufficiency of domain adaptation based on at least one of the evaluation results comprises determining a ratio of domain terms associated with the first evaluation result and the first test system which are not associated with a third evaluation result of a third test system that is prior to the first test system to a total number of the domain terms associated with the first evaluation result and the first test system.

12. The apparatus of claim 11, wherein the threshold for sufficiency of domain adaptation is determined according to T =
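The threshold described in claims 4 to 6 can be illustrated with a small Python sketch. It follows the formula in claim 5 as printed (size of the intersection of the n-th lexicon with the union of all earlier lexicons, divided by the size of the n-th lexicon). The function names, the use of plain Python sets as lexicons, the cumulative combination of lexicons, and the choice of the best-scoring test system as n are assumptions made for illustration, not details taken from the patent.

```python
def cumulative_lexicons(sub_lexicons):
    """Assumed reading of claims 2 and 6: each test system's lexicon also
    contains the terms of all earlier sub-corpus lexicons."""
    combined, seen = [], set()
    for lexicon in sub_lexicons:
        seen = seen | lexicon
        combined.append(set(seen))
    return combined


def best_system_index(results):
    """Assumed selection rule (claim 3): 1-based index of the first test
    system whose evaluation result is >= every other result."""
    return max(range(len(results)), key=results.__getitem__) + 1


def sufficiency_threshold(lexicons, n):
    """T = |L_n intersect (L_1 union ... union L_{n-1})| / |L_n|, n is 1-based."""
    l_n = lexicons[n - 1]
    if not l_n:
        raise ValueError("L_n must contain at least one term")
    prior = set().union(*lexicons[: n - 1])  # empty set when n == 1
    return len(l_n & prior) / len(l_n)


# Example with made-up sub-corpus lexicons and made-up evaluation results.
sub_lexicons = [{"a", "b"}, {"b", "c"}, {"a", "c", "d", "e"}]
results = [0.40, 0.55, 0.70]

lexicons = cumulative_lexicons(sub_lexicons)
n = best_system_index(results)
threshold = sufficiency_threshold(lexicons, n)
```

With cumulative lexicons the ratio reduces to the share of terms in the n-th lexicon that earlier test systems already contained, so a value near 1 indicates that the last sub-corpus contributed few new terms.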

Assignees

IBM

Inventors

Classifications

Code	CPC title
G06F16/93	Document management systems
G06F40/242 (Primary)	Dictionaries
G06F40/284 (Primary)	Lexical analysis, e.g. tokenisation or collocates
G06F40/30	Semantic analysis
G06F16/3344	using natural language analysis

Patent family

Related publications grouped by family.

View patent family 62781881

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10872205B2 cover?: An apparatus comprising a memory and a processor configured for semi-autonomous natural language processing domain adaptation related activities. The processor coupled to the memory and configured to identify a corpus of documents of an evaluation domain and generate a first lexicon based on the corpus of documents of the evaluation domain, and determine a threshold that indicates a sufficiency…
Who is the assignee on this patent?: IBM

Related patents

Hybrid technique for sentiment analysis

Data dictionary with a reduced need for rebuilding

System and method for domain adaptation in question answering

Generating a Superset of Question/Answer Action Paths Based on Dynamically Generated Type Sets

Non-factoid question-answering system and computer program