Systems and methods for distributed data annotation

US9928278B2 · US · B2

Patent metadata
Publication number: US-9928278-B2
Application number: US-201414198873-A
Country: US
Kind code: B2
Filing date: Mar 6, 2014
Priority date: Oct 12, 2011
Publication date: Mar 27, 2018
Grant date: Mar 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for distributed data annotation in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a distributed data annotation server system includes a storage device configured to store source data, one or more annotators, annotation tasks and a processor, wherein a distributed data annotation application configures the processor to receive source data including one or more pieces of source data, select one or more annotators, create one or more annotation tasks for the selected annotators and source data, request one or more annotations for the source data using the annotation tasks, receive annotations, determine source data metadata for at least one piece of source data using the received annotations, generate annotator metadata for at least one annotator using the received annotations and the source data, and estimate the ground truth for the source data using the source data metadata and the annotator metadata.
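
The first steps named in the abstract (select annotators, create annotation tasks for the selected annotators and source data) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names `Annotator` and `AnnotationTask`, the `competence` field, and the rule of picking the highest-competence annotators are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass


@dataclass
class Annotator:
    # Hypothetical annotator record; `competence` stands in for annotator metadata.
    annotator_id: str
    competence: float = 0.5


@dataclass
class AnnotationTask:
    # One task pairs one piece of source data with one selected annotator.
    source_id: str
    annotator_id: str


def select_annotators(annotators, k):
    """Pick the k annotators with the highest competence (an assumed rule)."""
    return sorted(annotators, key=lambda a: a.competence, reverse=True)[:k]


def create_tasks(source_ids, selected):
    """Create one task per (source data, selected annotator) pair."""
    return [
        AnnotationTask(source_id, annotator.annotator_id)
        for source_id in source_ids
        for annotator in selected
    ]


if __name__ == "__main__":
    pool = [Annotator("a", 0.9), Annotator("b", 0.6), Annotator("c", 0.8)]
    for task in create_tasks(["img1", "img2"], select_annotators(pool, 2)):
        print(task)
```

In a real system the tasks would then be submitted to annotators and the returned annotations stored; the sketch stops at task creation.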

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a memory configured to store source data, at least one annotator, annotation tasks, and a distributed data annotation application; and a processor; wherein the distributed data annotation application configures the processor to: receive a set of source data, where the source data comprises at least one piece of source data; select at least one annotator for at least one piece of source data; create at least one annotation task for the selected annotators and the at least one piece of source data; request at least one annotations for the at least one piece of source data using the created annotation tasks; receive annotations for the at least one piece of source data; and generate annotator metadata for at least one annotator using an iterative maximum a posteriori estimation based on the received annotations and at least one piece of source data, wherein the annotator metadata describes annotator characteristics; determine source data metadata for at least one piece of source data using the received annotations, where the source data metadata comprises source data characteristics; and estimate a ground truth for at least one piece of source data using the source data metadata and the annotator metadata by iteratively: updating source data metadata for at least one piece of source data based on at least the previously determined source data metadata and annotator metadata; updating annotator metadata for at least one annotator based on at least the previously determined source data metadata and annotator metadata; and estimating the ground truth for at least one piece of source data using the updated source data metadata and the updated annotator metadata when a termination condition occurs.

2. The system of claim 1, wherein the iterative maximum a posteriori estimation is selected from the group consisting of gradient ascent, gradient descent, and estimation-maximization.

3. The system of claim 1, wherein the distributed data annotation application further configures the processor to determine a confidence threshold value regarding the ground truth of at least one piece of source data.

4. The system of claim 1, wherein the distributed data annotation application further configures the processor to update source data metadata for at least one piece of source data using the received annotations and the annotator metadata.

5. The system of claim 4, wherein the source data metadata includes a measure of the difficulty of describing the source data.

6. The system of claim 5, wherein the source data metadata further comprises source data characteristics selected from the group consisting of annotations applied to the piece of source data, features of the source data, and annotators who have previously annotated the piece of source data.

7. The system of claim 1, wherein the distributed data annotation application further configures the processor to update annotator metadata for at least one annotator using the received annotations and the source data metadata.

8. The system of claim 7, wherein the annotator metadata includes a measure of competence of the annotator.

9. The system of claim 8, wherein the annotator metadata further comprises annotator characteristics selected from a group consisting of an expertise of the annotator, bias of the annotator regarding mislabeling of source data, annotations previously provided by the annotator, and references to source data previously annotated by the annotator.

10. The system of claim 1, wherein selecting at least one annotator for at least one piece of source data comprises selecting at least one annotator based on at least one source data characteristic in the source data metadata.

11. The system of claim 1, wherein the distributed data annotation application further configures the processor to determine a cost for performing the annotation task.

12. The system of claim 1, wherein: the annotation task is a human intelligence task; and the distributed data annotation application further configures the processor to request at least one annotations by submitting at least one annotation task to a human intelligence task marketplace.

13. The system of claim 1, wherein: the annotation task is a machine intelligence task; and the distributed data annotation application further configures the processor to request at least one annotations by submitting at least one annotation task to an annotation device configured to perform machine intelligence tasks.

14. The system of claim 1, wherein selecting at least one annotator for at least one piece of source data comprises selecting at least one annotator based on at least one annotator characteristic in the annotator metadata describing the at least one annotator.

15. A method, comprising: receiving a set of source data using a distributed data annotation server system, where the set of source data comprises at least one pieces of source data; selecting at least one annotator for at least one piece of source data using the distributed data annotation server system; creating at least one annotation tasks for the selected annotators and the at least one piece of source data using the distributed data annotation server system; requesting at least one annotations for the at least one piece of source data using the created annotation tasks and the distributed data annotation server system; receiving annotations for the at least one piece of source data using the distributed data annotation server system; and generating annotator metadata for at least one annotator using an iterative maximum a posteriori estimation based on the received annotations, at least one piece of source data, and the distributed data annotation server system, wherein the annotator metadata describes annotator characteristics; determining source data metadata for at least one piece of source data based on the received annotations using the distributed data annotation server system, where the source data metadata comprises source data characteristics; and estimate a ground truth for at least one piece of source data based on the source data metadata and the annotator metadata using the distributed data annotation server system by iteratively: updating source data metadata for at least one piece of source data based on at least the previously determined source data metadata and annotator metadata; updating annotator metadata for at least one annotator based on at least the previously determined source data metadata and annotator metadata; and estimating the ground truth for at least one piece of source data using the updated source data metadata and the updated annotator metadata when a termination condition occurs.

16. The method of claim 15, further comprising updating source data metadata for at least one piece of source data using the received annotations, the annotator metadata, and the distributed data annotation server system.

17. The method of claim 15, further comprising updating annotator metadata for at least one annotator using the received annotations, the source data metadata, and the distributed data annotation server system.

18. The method of claim 15, further comprising determining a confidence threshold value regarding the ground truth of at least one piece of source data using the distributed data annotation server system.

19. The method of claim 15, further comprising determining a cost for performing the annotation task using the distributed data annotation server system.
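
The iterative estimation recited in claims 1 and 15 (alternately updating source data metadata and annotator metadata until a termination condition occurs) can be illustrated with a small expectation-maximization style sketch. This is not the patented implementation. It assumes binary labels, a single accuracy value per annotator as the annotator metadata, a per-item posterior probability as the source data metadata, and a Beta(4, 2) prior on accuracy; the function name `estimate_ground_truth` and all parameters are hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def estimate_ground_truth(annotations, prior=(4.0, 2.0), tol=1e-6, max_iters=100):
    """annotations: list of (item_id, annotator_id, label), label in {0, 1}.

    Returns (labels, posterior, accuracy). Assumes a Beta(alpha, beta) prior
    on each annotator's accuracy with alpha > 1 and beta > 1.
    """
    alpha, beta = prior
    items = sorted({i for i, _, _ in annotations})
    annotators = sorted({j for _, j, _ in annotations})
    # Annotator metadata: accuracy, initialized at the mode of the prior.
    accuracy = {j: (alpha - 1) / (alpha + beta - 2) for j in annotators}
    # Source data metadata: posterior probability that the true label is 1.
    posterior = {i: 0.5 for i in items}

    for _ in range(max_iters):
        # Update source data metadata from the current annotator metadata.
        log_odds = {i: 0.0 for i in items}
        for i, j, label in annotations:
            weight = math.log(accuracy[j] / (1.0 - accuracy[j]))
            log_odds[i] += weight if label == 1 else -weight
        new_posterior = {i: _sigmoid(v) for i, v in log_odds.items()}

        # Update annotator metadata (MAP estimate under the Beta prior).
        agree = {j: 0.0 for j in annotators}
        count = {j: 0 for j in annotators}
        for i, j, label in annotations:
            p = new_posterior[i]
            agree[j] += p if label == 1 else 1.0 - p
            count[j] += 1
        accuracy = {
            j: (alpha - 1 + agree[j]) / (alpha + beta - 2 + count[j])
            for j in annotators
        }

        # Termination condition: the posteriors have stopped changing.
        delta = max(abs(new_posterior[i] - posterior[i]) for i in items)
        posterior = new_posterior
        if delta < tol:
            break

    labels = {i: int(posterior[i] >= 0.5) for i in items}
    return labels, posterior, accuracy
```

The Beta prior keeps each accuracy estimate strictly between 0 and 1, so the log-odds weights stay finite, and it biases annotators toward being better than chance, which prevents the solution from flipping all labels.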

Assignees

California Inst Of Techn

Inventors

Classifications

  • G06Q10/101 · Primary

    Collaborative creation, e.g. joint development of products or services · CPC title

  • Interactive pattern learning with a human teacher · CPC title

  • the supervisor being an automated module, e.g. intelligent oracle · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9928278B2 cover?
Systems and methods for distributed data annotation in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a distributed data annotation server system includes a storage device configured to store source data, one or more annotators, annotation tasks and a processor, wherein a distributed data annotation application configures the processor to receive sour…
Who is the assignee on this patent?
California Inst Of Techn
What technology area does this patent fall under?
Primary CPC classification G06Q10/101. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).