What technology area does this patent fall under?

Primary CPC classification G06F16/215. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 20, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for the determining annotator performance in the distributed annotation of source data

US9898701B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9898701-B2
Application number	US-201615166617-A
Country	US
Kind code	B2
Filing date	May 27, 2016
Priority date	Jun 22, 2012
Publication date	Feb 20, 2018
Grant date	Feb 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for determining annotator performance in the distributed annotation of source data in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a method for clustering annotators includes obtaining a set of source data, determining a training data set representative of the set of source data, obtaining sets of annotations from a set of annotators for a portion of the training data set, for each annotator determining annotator recall metadata based on the set of annotations provided by the annotator for the training data set and determining annotator precision metadata based on the set of annotations provided by the annotator for the training data set, and grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.
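The abstract's core steps (score each annotator's recall and precision against a training set, then group annotators by those scores) can be sketched as follows. This is an illustrative reading, not the patented implementation: the `Annotation` record, the ratio-based definitions of recall and precision, the fixed `threshold` of 0.8, and the group names are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    item_id: str     # piece of source data (e.g. an image); hypothetical field
    feature_id: str  # feature within that item that the annotator marked
    label: str       # label the annotator assigned to the feature


def recall_and_precision(annotations, ground_truth):
    """Score one annotator; ground_truth maps (item_id, feature_id) -> label.

    Assumed definitions: recall is the share of ground-truth features the
    annotator labeled at all; precision is the share of the annotator's
    annotations whose label matches the ground truth.
    """
    labeled = {(a.item_id, a.feature_id) for a in annotations}
    found = ground_truth.keys() & labeled
    correct = sum(
        1 for a in annotations
        if ground_truth.get((a.item_id, a.feature_id)) == a.label
    )
    recall = len(found) / len(ground_truth) if ground_truth else 0.0
    precision = correct / len(annotations) if annotations else 0.0
    return recall, precision


def group_annotators(scores, threshold=0.8):
    """Bucket annotators by whether each score clears an assumed threshold."""
    groups = {}
    for name, (recall, precision) in scores.items():
        key = (
            "high-recall" if recall >= threshold else "low-recall",
            "high-precision" if precision >= threshold else "low-precision",
        )
        groups.setdefault(key, []).append(name)
    return groups


# Toy training set with known labels (made-up data).
ground_truth = {
    ("img1", "f1"): "cat", ("img1", "f2"): "dog",
    ("img2", "f1"): "cat", ("img2", "f2"): "bird",
}
submissions = {
    "alice": [Annotation(i, f, lbl) for (i, f), lbl in ground_truth.items()],
    "bob": [Annotation("img1", "f1", "cat"), Annotation("img1", "f2", "cat")],
    "carol": [
        Annotation("img1", "f1", "cat"), Annotation("img1", "f2", "cat"),
        Annotation("img2", "f1", "dog"), Annotation("img2", "f2", "bird"),
    ],
}
scores = {
    name: recall_and_precision(anns, ground_truth)
    for name, anns in submissions.items()
}
groups = group_annotators(scores)
```

In this toy data, one annotator labels everything correctly, one labels only half the features, and one labels every feature but gets half wrong, so the three land in different groups. A real system could replace the fixed threshold with any grouping rule over the two scores.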

First claim

Opening claim text (preview).

What is claimed is:
1. A method for clustering annotators via a distributed data annotation process, comprising: obtaining a set of source data using a distributed data annotation server system, where a piece of source data in the set of source data comprises at least one identifying feature and the source data comprises image data; determining a training data set representative of the set of source data using the distributed data annotation server system, where at least one piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes at least one feature contained in the piece of source data and a correct label associated with at least one feature; obtaining at least one set of annotations from a plurality of annotators for a portion of the training data set using the distributed data annotation server system, where an annotation identifies one or more features within a piece of source data in the training data set; for each annotator: determining annotator recall metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and determining annotator precision metadata based on the set of annotations provided by the annotator for the training data set using the distributed data annotation server system, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and grouping the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata using the distributed data annotation server system.
2. The method of claim 1, further comprising generating an annotation task comprising a portion of the set of source data using the distributed data annotation server system, where the annotation task directs an annotator to annotate one or more features within the set of source data.
3. The method of claim 2, wherein the annotation tasks involve a new set of source data and are targeted toward one or more annotator groups.
4. The method of claim 1, further comprising measuring the time taken by an annotator to provide an annotation within the sets of annotations using the distributed data annotation server system.
5. The method of claim 1, wherein the obtained sets of annotations are clustered into annotation clusters based on the features within the piece of source data identified by the annotations using the distributed data annotation server system.
6. The method of claim 5, wherein: the annotation clusters comprise annotations that are within a distance threshold from each other within the image data.
7. The method of claim 5, wherein the annotation clusters comprise annotations that are within a distance threshold from the ground truth for the feature identified by the annotations.
8. The method of claim 5, further comprising: determining an error rate for each annotator based on the annotation clusters using the distributed data annotation server system; and grouping the annotators into annotator groups based on the determined error rate for the annotators using the distributed data annotation server system.
9. A distributed data annotation server system, comprising: a processor; and a memory configured to store a data annotation application; wherein the data annotation application configures the processor to: obtain a set of source data, where a piece of source data in the set of source data comprises at least one identifying feature; determine a training data set representative of the set of source data, where at least one piece of source data in the training data set comprises source data metadata describing the ground truth for the piece of source data, where the ground truth for a piece of source data describes at least one feature contained in the piece of source data and a correct label associated with at least one feature; obtain at least one set of annotations from a set of annotators for a portion of the training data set, where an annotation identifies one or more features within a piece of source data in the training data set; for each annotator: determine annotator recall metadata based on the set of annotations provided by the annotator for the training data set, where the annotator recall metadata comprises a measure of the number of features within a piece of source data identified with a label in the set of annotations by the annotator; and determine annotator precision metadata based on the set of annotations provided by the annotator for the training data set, where the annotator precision metadata comprises a measure of the number of correct annotations associated with each piece of source data based on the ground truth for each piece of source data; and group the annotators into annotator groups based on the annotator recall metadata and the annotator precision metadata.
10. The system of claim 9, wherein the data annotation application further configures the processor to generate an annotation task comprising a portion of the set of source data, where the annotation task configures an annotator to annotate one or more features within the set of source data.
11. The system of claim 10, wherein the annotation tasks are targeted toward one or more annotator groups.
12. The system of claim 9, wherein the data annotation application further configures the processor to measure the time taken by an annotator to provide an annotation within the sets of annotations.
13. The system of claim 9, wherein the data annotation application further configures the processor to: calculate a reward based on the annotator recall metadata and the annotator precision metadata; and provide the reward to an annotator for providing one or more annotations.
14. The system of claim 13, wherein the processor is further configured to group annotators into annotator groups based on the calculated reward.
15. The system of claim 9, wherein the processor is configured to cluster the obtained sets of annotations into annotation clusters based on the features within the piece of source data identified by the annotations.
16. The system of claim 15, wherein: the set source data comprises image data; and the annotation clusters comprise annotations that are within a distance threshold from each other within the image data.
17. The system of claim 15, wherein the annotation clusters comprise annotations that are within a distance threshold from the ground truth for the feature identified by the annotations.
18. The system of claim 15, wherein the data annotation application further configures the processor to: determine an error rate for each annotator based on the annotation clusters; and group the annotators into annotator groups based on the determined error rate for the annotators.
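The dependent claims on annotation clusters (claims 5 to 8 and 15 to 18) can be illustrated with a small sketch for point annotations in an image. It follows the ground-truth variant of claims 7 and 17: an annotation joins a feature's cluster when it lies within a distance threshold of that feature's ground-truth location. The tuple layout `(annotator, x, y)`, Euclidean pixel distance, nearest-feature assignment, and the definition of error rate as the share of an annotator's annotations that match no feature are assumptions made for this example, not details taken from the patent.

```python
import math
from collections import defaultdict


def cluster_by_ground_truth(annotations, truth_points, threshold):
    """Attach each (annotator, x, y) annotation to its nearest ground-truth
    feature if that feature is within `threshold` pixels; otherwise record
    the annotation as unmatched."""
    clusters = defaultdict(list)
    unmatched = []
    for annotator, x, y in annotations:
        nearest = min(
            truth_points,
            key=lambda name: math.dist((x, y), truth_points[name]),
            default=None,
        )
        if (nearest is not None
                and math.dist((x, y), truth_points[nearest]) <= threshold):
            clusters[nearest].append((annotator, x, y))
        else:
            unmatched.append((annotator, x, y))
    return dict(clusters), unmatched


def error_rates(annotations, unmatched):
    """Assumed error rate: unmatched annotations / all annotations, per annotator."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for annotator, _, _ in annotations:
        totals[annotator] += 1
    for annotator, _, _ in unmatched:
        misses[annotator] += 1
    return {annotator: misses[annotator] / totals[annotator] for annotator in totals}


# Made-up ground-truth feature locations and annotator clicks.
truth_points = {"eye": (10.0, 10.0), "nose": (50.0, 40.0)}
annotations = [
    ("alice", 11.0, 9.0), ("alice", 49.0, 41.0),
    ("bob", 12.0, 12.0), ("bob", 90.0, 90.0),
]
clusters, unmatched = cluster_by_ground_truth(annotations, truth_points, threshold=5.0)
rates = error_rates(annotations, unmatched)
```

The resulting per-annotator rates could then feed a grouping step of the kind described in claims 8 and 18. The variant in claims 6 and 16, where annotations are clustered by their distance from each other rather than from the ground truth, would need a different clustering routine.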

Assignees

California Inst Of Techn

Inventors

Classifications

G06N20/00
Machine learning · CPC title
G06F16/215 · Primary
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
G06F18/41
Interactive pattern learning with a human teacher · CPC title
G06F18/2185
the supervisor being an automated module, e.g. intelligent oracle · CPC title
G06F40/169
Annotation, e.g. comment data or footnotes · CPC title

Patent family

Related publications grouped by family.

View patent family 49775286

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9898701B2 cover?: Systems and methods for determining annotator performance in the distributed annotation of source data in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a method for clustering annotators includes obtaining a set of source data, determining a training data set representative of the set of source data, obtaining sets of annotations from a set of annota…
Who is the assignee on this patent?: California Inst Of Techn