What technology area does this patent fall under?

Primary CPC classification G06F40/169. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 1, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Ranking candidate documents for human annotation task in real-time

US11263272B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11263272-B2
Application number	US-202016856577-A
Country	US
Kind code	B2
Filing date	Apr 23, 2020
Priority date	Apr 23, 2020
Publication date	Mar 1, 2022
Grant date	Mar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system configured to rank and display candidate documents for human annotation task. The system executes instructions to receive a human annotation of a first unannotated document in a list of documents from a document set; update an annotated entities and corresponding entity types set based on the human annotation of the document from the document set; perform auto-mapping of annotated entities to corresponding entity types on a remaining set of documents in the document set based on the updated annotated entities and corresponding entity types set; calculate a score for each document in the remaining set of documents based on the auto-mapping of annotated entities; and update an order of the remaining set of documents being displayed for human annotation based on the calculated score for each document in the remaining set of documents in the document set.
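The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the exact-match auto-mapping, the placeholder score (count of distinct entity types mapped), and the descending sort order are all assumptions introduced here for clarity.

```python
def auto_map(doc_tokens, entity_types):
    """Tag tokens that match an already-annotated entity (exact-match assumption)."""
    return [(tok, entity_types[tok]) for tok in doc_tokens if tok in entity_types]


def placeholder_score(mapped):
    """Hypothetical stand-in score: number of distinct entity types auto-mapped."""
    return len({etype for _, etype in mapped})


def rerank_after_annotation(remaining, entity_types, new_annotation):
    """One pass of the loop: update the entity set, auto-map, score, reorder.

    remaining:      dict of doc_id -> list of tokens (documents not yet annotated)
    entity_types:   dict of entity -> entity type, updated in place
    new_annotation: dict of entity -> entity type from the latest human annotation
    """
    entity_types.update(new_annotation)
    scores = {
        doc_id: placeholder_score(auto_map(tokens, entity_types))
        for doc_id, tokens in remaining.items()
    }
    # Assumption: highest score first; ties broken by doc_id for a stable order.
    return sorted(remaining, key=lambda d: (-scores[d], d))


remaining = {
    "doc_a": ["Paris", "is", "nice"],
    "doc_b": ["Alice", "visited", "Paris"],
    "doc_c": ["nothing", "here"],
}
entity_types = {}
order = rerank_after_annotation(
    remaining, entity_types, {"Paris": "LOCATION", "Alice": "PERSON"}
)
print(order)
```

In this toy run, the document that contains both newly annotated entities moves to the front of the list, which is the reordering effect the abstract describes.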

First claim

Opening claim text (preview).

What is claimed is:

1. A method for ranking candidate documents for human annotation task, the method comprising: retrieving a document set; displaying a list of documents from the document set for human annotation; and performing a real-time ranking candidate documents for human annotation loop comprising: receiving a human annotation of a first unannotated document in the list of documents from the document set for human annotation, wherein an unannotated document is a document that has not been annotated by a human, wherein the human annotation annotates entities in the first unannotated document with their corresponding entity types, and wherein an entity is a specific noun and an entity type is an entity category for the specific noun; updating an annotated entities and corresponding entity types set based on the human annotation of the document from the document set; performing auto-mapping of annotated entities to corresponding entity types on a remaining set of documents in the document set based on the updated annotated entities and corresponding entity types set; calculating a score for each document in the remaining set of documents in the document set based on the auto-mapping of annotated entities to corresponding entity types on the remaining set of documents in the document set; and updating an order of the remaining set of documents being displayed for human annotation based on the calculated score for each document in the remaining set of documents in the document set.

2. The method of claim 1, further comprising: pre-annotating documents in the document set using predefined dictionaries; calculating a pre-score for each document in the document set based on the pre-annotations; and wherein displaying the list of documents from the document set for human annotation comprises displaying the list of documents in an order based on the pre-score for each document in the document set.

3. The method of claim 1, further comprising: training a model using a bulk number of human annotated documents when the bulk number of human annotated documents is completed; evaluating an F-score of the model after training the model using the bulk number of human annotated documents; and repeating the real-time ranking candidate documents for human annotation loop until one of a first condition and a second condition is satisfied, the first condition is that the F-score of the model is higher than a predetermined threshold, and the second condition is that there are no unannotated documents left in the document set.

4. The method of claim 3, wherein when the first condition is satisfied, the method further comprises using the model to automatically annotate documents for natural language processing (NLP).

5. The method of claim 1, wherein the score of a remaining document is based on a target distribution of entities that should be annotated for the remaining set of documents in the document set, and on a token variety contained in the remaining document.

6. The method of claim 1, wherein the score for a document i is calculated using the formula: $\mathrm{Score}(i) = \alpha \sum_{m=0}^{k} \big( (\lambda_m - p_{\mathrm{set},m})\, p_{\mathrm{doc},m} \big) - \beta\, \mathrm{Entropy}(\mathrm{token})$, wherein $\alpha$ and $\beta$ are the coefficients, $k$ is a total number of entity types, $p_{\mathrm{set}}$ is annotated entity distribution over all document sets, $p_{\mathrm{doc}}$ is a mapped entity distribution in the document i, and $\mathrm{Entropy}(\mathrm{token})$ denotes a token entropy of the document i.

7. A system configured to rank and display candidate documents for human annotation task, the system comprising memory for storing instructions, and a processor configured to execute the instructions to: receive a human annotation of a first unannotated document in a list of documents from a document set for human annotation, wherein an unannotated document is a document that has not been annotated by a human, and wherein the human annotation annotates entities in the first unannotated document with their corresponding entity types, and wherein an entity is a specific noun and an entity type is an entity category for the specific noun; update an annotated entities and corresponding entity types set based on the human annotation of the document from the document set; perform auto-mapping of annotated entities to corresponding entity types on a remaining set of documents in the document set based on the updated annotated entities and corresponding entity types set; calculate a score for each document in the remaining set of documents in the document set based on the auto-mapping of annotated entities to corresponding entity types on the remaining set of documents in the document set; and update an order of the remaining set of documents being displayed for human annotation based on the calculated score for each document in the remaining set of documents in the document set.

8. The system of claim 7, wherein the processor is further configured to execute the instructions to pre-annotate documents in the document set using predefined dictionaries.

9. The system of claim 7, wherein the score of a remaining document is based on a target distribution of entities that should be annotated for the remaining set of documents in the document set.

10. The system of claim 9, wherein the score of a remaining document is further based on a token variety contained in the remaining document.

11. The system of claim 7, wherein the processor is further configured to train a model using a bulk number of human annotated documents when the bulk number of human annotated documents is completed.

12. The system of claim 11, wherein the processor is further configured to evaluate an F-score of the model against a predetermined threshold after training the model using the bulk number of human annotated documents.

13. The system of claim 12, wherein the processor is further configured to use the model to automatically annotate documents for natural language processing (NLP) when the F-score of the model satisfies the predetermined threshold.

14. The system of claim 7, wherein the processor is further configured to calculate the score for a document i using the formula: $\mathrm{Score}(i) = \alpha \sum_{m=0}^{k} \big( (\lambda_m - p_{\mathrm{set},m})\, p_{\mathrm{doc},m} \big) - \beta\, \mathrm{Entropy}(\mathrm{token})$, wherein $\alpha$ and $\beta$ are the coefficients, $k$ is a total number of entity types, $p_{\mathrm{set}}$ is annotated entity distribution over all document sets, $p_{\mathrm{doc}}$ is a mapped entity distribution in the document i, and $\mathrm{Entropy}(\mathrm{token})$ denotes a token entropy of the document i.

15. A computer program product for ranking candidate documents for human annotation task, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of a system to cause the system to: receive a human annotation of a first unannotated document in a list of documents from a document set for human annotation, wherein an unannotated document is a document that has not been annotated by a human, and wherein the human annotation annotates entities in the first unannotated document with their corresponding entity types, and wherein an entity is a specific noun and an entity type is an entity category for the specific noun; update an annotated entities and corresponding entity types set based on the human annotation of the document from the document set; perform auto-mapping of annotated entities to corresponding entity types on a remaining set of documents in the document set based on the updated annotated entities and corresponding entity types set; calculate a score for each document in the remaining set of documents in the document set based on the auto-mapping of annotated
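The score formula recited in claims 6 and 14 can be illustrated with a short sketch. Several points are assumptions made here and are not stated in the claim text: that λ is the target entity distribution mentioned in claim 5, that the token entropy is Shannon entropy (base 2) over token frequencies, and that α and β default to 1. The function and parameter names are hypothetical.

```python
import math
from collections import Counter


def token_entropy(tokens):
    """Shannon entropy in bits of the token frequency distribution (assumed definition)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def score(target, p_set, p_doc, tokens, alpha=1.0, beta=1.0):
    """alpha * sum_m (target[m] - p_set[m]) * p_doc[m] - beta * entropy(tokens).

    target: assumed meaning of lambda, the desired share of each entity type
    p_set:  share of each entity type among annotations made so far
    p_doc:  share of each entity type auto-mapped in this document
    """
    gap_term = sum((target[m] - p_set[m]) * p_doc[m] for m in range(len(target)))
    return alpha * gap_term - beta * token_entropy(tokens)


# Two entity types; type 1 is under-represented so far (0.2 against a target of 0.5).
target = [0.5, 0.5]
p_set = [0.8, 0.2]
tokens = ["a", "b", "a", "b"]

helps_gap = score(target, p_set, [0.0, 1.0], tokens)   # document rich in type 1
widens_gap = score(target, p_set, [1.0, 0.0], tokens)  # document rich in type 0
print(helps_gap > widens_gap)
```

Under these assumptions, the first term rewards documents whose mapped entities belong to types that are still below their target share, and the second term subtracts the token entropy weighted by β, as written in the formula.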

Assignees

IBM

Inventors

Classifications

Code	CPC title	Role
G06F40/169	Annotation, e.g. comment data or footnotes	Primary
G06F16/93	Document management systems	Primary
G06N20/00	Machine learning	
G06F40/295	Named entity recognition	
G06F40/284	Lexical analysis, e.g. tokenisation or collocates	

Patent family

Related publications grouped by family.

View patent family 78222680

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11263272B2 cover?

A system configured to rank and display candidate documents for human annotation task. The system executes instructions to receive a human annotation of a first unannotated document in a list of documents from a document set; update an annotated entities and corresponding entity types set based on the human annotation of the document from the document set; perform auto-mapping of annotated enti…

Who is the assignee on this patent?

IBM

Related patents

Rich formatting of annotated clinical documentation, and related methods and apparatus

Annotation mapping

Calculating correlations between annotations