Systems and methods for the determining annotator performance in the distributed annotation of source data
US-9898701-B2 · Feb 20, 2018 · US
US9928278B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928278-B2 |
| Application number | US-201414198873-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 6, 2014 |
| Priority date | Oct 12, 2011 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Abstract
Systems and methods for distributed data annotation in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a distributed data annotation server system includes a processor and a storage device configured to store source data, one or more annotators, and annotation tasks, wherein a distributed data annotation application configures the processor to: receive source data including one or more pieces of source data; select one or more annotators; create one or more annotation tasks for the selected annotators and source data; request one or more annotations for the source data using the annotation tasks; receive annotations; determine source data metadata for at least one piece of source data using the received annotations; generate annotator metadata for at least one annotator using the received annotations and the source data; and estimate the ground truth for the source data using the source data metadata and the annotator metadata.
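The abstract describes a pipeline that runs from received source data to an estimated ground truth. The following is a minimal Python sketch of that data flow, assuming binary labels, a round-robin annotator selection policy and a single non-iterative pass. `AnnotationTask`, `create_tasks` and `estimate` are hypothetical names, and agreement with the majority label stands in for whatever source data and annotator metadata the patent actually computes.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AnnotationTask:
    """Hypothetical task record: ask one annotator to label one item."""
    item: str
    annotator: str


def create_tasks(items: List[str], annotators: List[str], per_item: int) -> List[AnnotationTask]:
    """Select `per_item` distinct annotators per item by round-robin (an assumed policy)."""
    per_item = min(per_item, len(annotators))
    tasks, k = [], 0
    for item in items:
        for _ in range(per_item):
            tasks.append(AnnotationTask(item, annotators[k % len(annotators)]))
            k += 1
    return tasks


def estimate(annotations: Dict[AnnotationTask, int]):
    """One non-iterative pass from received annotations to an estimated ground truth."""
    by_item = defaultdict(list)
    for task, label in annotations.items():
        by_item[task.item].append((task.annotator, label))
    # Source data metadata: majority label and agreement rate (a crude difficulty proxy).
    item_meta = {}
    for item, pairs in by_item.items():
        label, votes = Counter(l for _, l in pairs).most_common(1)[0]
        item_meta[item] = {"majority": label, "agreement": votes / len(pairs)}
    # Annotator metadata: fraction of each annotator's labels that match the majority.
    hits, total = Counter(), Counter()
    for task, label in annotations.items():
        total[task.annotator] += 1
        hits[task.annotator] += int(label == item_meta[task.item]["majority"])
    competence = {a: hits[a] / total[a] for a in total}
    # Ground truth: competence-weighted vote over each item's annotations.
    truth = {}
    for item, pairs in by_item.items():
        score = Counter()
        for annotator, label in pairs:
            score[label] += competence[annotator]
        truth[item] = score.most_common(1)[0][0]
    return item_meta, competence, truth


# Usage with simulated responses: "a" and "b" answer correctly, "c" always flips the label.
gold = {"x1": 1, "x2": 0, "x3": 1}
tasks = create_tasks(list(gold), ["a", "b", "c"], per_item=3)
received = {t: gold[t.item] if t.annotator != "c" else 1 - gold[t.item] for t in tasks}
item_meta, competence, truth = estimate(received)
```

Because `c` disagrees with the majority on every item, its competence drops to 0.0 and its votes carry no weight in the final estimate. The claims replace this single pass with an iterative estimate.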
Claims
What is claimed is:
1. A system comprising: a memory configured to store source data, at least one annotator, annotation tasks, and a distributed data annotation application; and a processor; wherein the distributed data annotation application configures the processor to: receive a set of source data, where the source data comprises at least one piece of source data; select at least one annotator for at least one piece of source data; create at least one annotation task for the selected annotators and the at least one piece of source data; request at least one annotations for the at least one piece of source data using the created annotation tasks; receive annotations for the at least one piece of source data; and generate annotator metadata for at least one annotator using an iterative maximum a posteriori estimation based on the received annotations and at least one piece of source data, wherein the annotator metadata describes annotator characteristics; determine source data metadata for at least one piece of source data using the received annotations, where the source data metadata comprises source data characteristics; and estimate a ground truth for at least one piece of source data using the source data metadata and the annotator metadata by iteratively: updating source data metadata for at least one piece of source data based on at least the previously determined source data metadata and annotator metadata; updating annotator metadata for at least one annotator based on at least the previously determined source data metadata and annotator metadata; and estimating the ground truth for at least one piece of source data using the updated source data metadata and the updated annotator metadata when a termination condition occurs.
2. The system of claim 1, wherein the iterative maximum a posteriori estimation is selected from the group consisting of gradient ascent, gradient descent, and estimation-maximization.
3. The system of claim 1, wherein the distributed data annotation application further configures the processor to determine a confidence threshold value regarding the ground truth of at least one piece of source data.
4. The system of claim 1, wherein the distributed data annotation application further configures the processor to update source data metadata for at least one piece of source data using the received annotations and the annotator metadata.
5. The system of claim 4, wherein the source data metadata includes a measure of the difficulty of describing the source data.
6. The system of claim 5, wherein the source data metadata further comprises source data characteristics selected from the group consisting of annotations applied to the piece of source data, features of the source data, and annotators who have previously annotated the piece of source data.
7. The system of claim 1, wherein the distributed data annotation application further configures the processor to update annotator metadata for at least one annotator using the received annotations and the source data metadata.
8. The system of claim 7, wherein the annotator metadata includes a measure of competence of the annotator.
9. The system of claim 8, wherein the annotator metadata further comprises annotator characteristics selected from a group consisting of an expertise of the annotator, bias of the annotator regarding mislabeling of source data, annotations previously provided by the annotator, and references to source data previously annotated by the annotator.
10. The system of claim 1, wherein selecting at least one annotator for at least one piece of source data comprises selecting at least one annotator based on at least one source data characteristic in the source data metadata.
11. The system of claim 1, wherein the distributed data annotation application further configures the processor to determine a cost for performing the annotation task.
12. The system of claim 1, wherein: the annotation task is a human intelligence task; and the distributed data annotation application further configures the processor to request at least one annotations by submitting at least one annotation task to a human intelligence task marketplace.
13. The system of claim 1, wherein: the annotation task is a machine intelligence task; and the distributed data annotation application further configures the processor to request at least one annotations by submitting at least one annotation task to an annotation device configured to perform machine intelligence tasks.
14. The system of claim 1, wherein selecting at least one annotator for at least one piece of source data comprises selecting at least one annotator based on at least one annotator characteristic in the annotator metadata describing the at least one annotator.
15. A method, comprising: receiving a set of source data using a distributed data annotation server system, where the set of source data comprises at least one pieces of source data; selecting at least one annotator for at least one piece of source data using the distributed data annotation server system; creating at least one annotation tasks for the selected annotators and the at least one piece of source data using the distributed data annotation server system; requesting at least one annotations for the at least one piece of source data using the created annotation tasks and the distributed data annotation server system; receiving annotations for the at least one piece of source data using the distributed data annotation server system; and generating annotator metadata for at least one annotator using an iterative maximum a posteriori estimation based on the received annotations, at least one piece of source data, and the distributed data annotation server system, wherein the annotator metadata describes annotator characteristics; determining source data metadata for at least one piece of source data based on the received annotations using the distributed data annotation server system, where the source data metadata comprises source data characteristics; and estimate a ground truth for at least one piece of source data based on the source data metadata and the annotator metadata using the distributed data annotation server system by iteratively: updating source data metadata for at least one piece of source data based on at least the previously determined source data metadata and annotator metadata; updating annotator metadata for at least one annotator based on at least the previously determined source data metadata and annotator metadata; and estimating the ground truth for at least one piece of source data using the updated source data metadata and the updated annotator metadata when a termination condition occurs.
16. The method of claim 15, further comprising updating source data metadata for at least one piece of source data using the received annotations, the annotator metadata, and the distributed data annotation server system.
17. The method of claim 15, further comprising updating annotator metadata for at least one annotator using the received annotations, the source data metadata, and the distributed data annotation server system.
18. The method of claim 15, further comprising determining a confidence threshold value regarding the ground truth of at least one piece of source data using the distributed data annotation server system.
19. The method of claim 15, further comprising determining a cost for performing the annotation task using the distributed data annotation server system.
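Claims 1 and 2 recite an iterative maximum a posteriori estimation (listed as gradient ascent, gradient descent or estimation-maximization) that alternates between updating source data metadata and annotator metadata until a termination condition occurs. The sketch below illustrates one such EM-style scheme. It swaps in the well-known two-coin Dawid-Skene model as an assumption and should not be read as the patented method. Annotator metadata is a per-annotator sensitivity and specificity with a Beta prior, loosely mirroring the competence and bias of claims 8 and 9. Source data metadata is each item's posterior probability of label 1. The termination condition is a posterior change below a tolerance, and a confidence threshold flags uncertain items, in the spirit of claim 3. The names `map_em_two_coin` and `ground_truth`, and the defaults for `alpha`, `beta`, `tol` and `threshold`, are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical record type (not from the patent): (item, annotator, label in {0, 1}).
Annotation = Tuple[Hashable, Hashable, int]


def map_em_two_coin(annotations: List[Annotation], alpha: float = 3.0, beta: float = 2.0,
                    tol: float = 1e-6, max_iter: int = 200):
    """EM with MAP updates for a two-coin Dawid-Skene model (an assumed model).

    Annotator metadata: (sensitivity, specificity), each with a Beta(alpha, beta) prior.
    Source data metadata: posterior P(true label = 1) per item.
    Terminates when no posterior moves by more than `tol`, or after `max_iter` rounds.
    """
    if not annotations:
        return {}, {}
    by_item = defaultdict(list)
    for item, annotator, label in annotations:
        by_item[item].append((annotator, label))
    # Initialise source data metadata with a soft majority vote.
    post = {i: sum(l for _, l in v) / len(v) for i, v in by_item.items()}
    params: Dict[Hashable, Tuple[float, float]] = {}
    for _ in range(max_iter):
        # M-step: MAP update of annotator metadata given current item posteriors.
        tp, pos = defaultdict(float), defaultdict(float)
        tn, neg = defaultdict(float), defaultdict(float)
        for item, annotator, label in annotations:
            p = post[item]
            pos[annotator] += p
            tp[annotator] += p * label
            neg[annotator] += 1 - p
            tn[annotator] += (1 - p) * (1 - label)
        params = {
            a: ((tp[a] + alpha - 1) / (pos[a] + alpha + beta - 2),
                (tn[a] + alpha - 1) / (neg[a] + alpha + beta - 2))
            for a in pos
        }
        # Class prior (plain maximum likelihood), clamped away from 0 and 1.
        prior1 = min(max(sum(post.values()) / len(post), 1e-6), 1 - 1e-6)
        # E-step: update source data metadata given annotator metadata.
        new_post = {}
        for item, pairs in by_item.items():
            log1, log0 = math.log(prior1), math.log(1 - prior1)
            for annotator, label in pairs:
                sens, spec = params[annotator]
                log1 += math.log(sens if label == 1 else 1 - sens)
                log0 += math.log(1 - spec if label == 1 else spec)
            d = log1 - log0  # numerically stable logistic of d
            new_post[item] = 1 / (1 + math.exp(-d)) if d >= 0 else math.exp(d) / (1 + math.exp(d))
        delta = max(abs(new_post[i] - post[i]) for i in post)
        post = new_post
        if delta < tol:  # termination condition
            break
    return post, params


def ground_truth(post, threshold: float = 0.9):
    """Hard labels, plus items whose confidence max(p, 1 - p) is below `threshold`."""
    labels = {i: int(p >= 0.5) for i, p in post.items()}
    uncertain = [i for i, p in post.items() if max(p, 1 - p) < threshold]
    return labels, uncertain


# Usage: "a1" and "a2" are always right, "a3" is always wrong.
truth = [1, 1, 1, 0, 0, 0]
data = [(i, a, t) for i, t in enumerate(truth) for a in ("a1", "a2")]
data += [(i, "a3", 1 - t) for i, t in enumerate(truth)]
post, params = map_em_two_coin(data)
labels, uncertain = ground_truth(post)
```

With a Beta(3, 2) prior the MAP update reduces to (x + 2) / (n + 3), which stays strictly inside (0, 1). The logarithms in the E-step therefore never see zero. The model also learns that `a3` is systematically biased rather than merely noisy, so its labels still carry information. Items reported as `uncertain` could be routed to further annotation tasks.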
Collaborative creation, e.g. joint development of products or services · CPC title
Interactive pattern learning with a human teacher · CPC title
the supervisor being an automated module, e.g. intelligent oracle · CPC title
Physics · mapped topic