Who is the assignee on this patent?

Williams Jason, Alonso Tirso, Hollister Barbara B, and 2 more

What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 3, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for crowd-sourced data labeling

US9536517B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9536517-B2
Application number	US-201113300087-A
Country	US
Kind code	B2
Filing date	Nov 18, 2011
Priority date	Nov 18, 2011
Publication date	Jan 3, 2017
Grant date	Jan 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-readable storage devices for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
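The stopping rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `collect_labels`, the parameters `n_matching` and `max_responses` (the abstract's m), and the choice to compare responses by exact match after trimming and lowercasing are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def collect_labels(
    responses: Iterable[str],
    n_matching: int,
    max_responses: int,
) -> Tuple[Optional[str], int]:
    """Consume responses one at a time until either `n_matching` of them
    agree or `max_responses` have been received, whichever comes first.

    Returns (output label, number of responses consumed). The label is
    None when no response arrived at all.
    """
    if n_matching < 1 or max_responses < 1:
        raise ValueError("n_matching and max_responses must be at least 1")

    counts: Counter = Counter()
    received = 0
    for response in responses:
        received += 1
        # Assumed normalization: two responses "match" if they are equal
        # after trimming whitespace and lowercasing.
        label = response.strip().lower()
        counts[label] += 1
        # Stop as soon as either condition holds, so no further
        # responses are requested or paid for.
        if counts[label] >= n_matching or received >= max_responses:
            break

    if not counts:
        return None, received
    # Most frequent label; ties go to the label seen first.
    return counts.most_common(1)[0][0], received
```

For example, `collect_labels(["hello world", "hello word", "Hello World "], n_matching=2, max_responses=5)` stops after the third response, because the first and third agree once normalized. Passing a generator as `responses` keeps the collection incremental: later responses are never pulled once the rule fires.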

First claim

Opening claim text (preview).

We claim:

1. A method comprising: requesting, via a processor, a respective transcription response associated with transcribing input speech, the respective transcription response being from each of a plurality of networked entities, wherein at least one entity of the plurality of networked entities comprises an automatic labeler that transcribes the input speech as its respective transcription response and at least one entity of the plurality of networked entities comprises a human crowd worker who transcribes the input speech as its respective transcription response; receiving, from each of the plurality of networked entities, the respective transcription response and, for a transcription response received from the human crowd worker, a number of times the respective human crowd worker listened to the input speech to provide the respective transcription response; determining a maximum number of transcription responses to receive from the plurality of networked entities; calculating, via the processor and using a regression model, an accuracy threshold for the transcription responses, wherein the accuracy threshold: (1) requires a number of matching responses among the transcription responses; (2) is based on a time of day when each of the transcription responses is received; and (3) is based on a number of times the respective human crowd worker listened to the input speech; incrementally receiving the transcription responses from the plurality of networked entities until one of the accuracy threshold is reached and the maximum number of transcription responses is received; and generating, via the processor, an output response to the input speech from the number of matching transcription responses, wherein the output response is a recognition candidate or a transcription of the input speech.

2. The method of claim 1, wherein two of the transcription responses are automatic speech recognition output.

3. The method of claim 1, further comprising training an automatic speech recognition engine using the output response.

4. The method of claim 1, wherein the accuracy threshold is further based on one of a content, a size, a label, a duration, a location of a plurality of workers associated with the plurality of networked entities, an identity of the plurality of workers, an attribute, a confidence score, a difficulty, and a diversity.

5. The method of claim 1, wherein the accuracy threshold is further based on a probability of correctness.

6. The method of claim 3, wherein the accuracy threshold comprises n matching responses, and wherein n is one of less than the maximum number of transcription responses and equal to the maximum number of transcription responses.

7. A system comprising: a processor configured to perform automatic speech recognition; and a computer-readable storage device having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: requesting a respective transcription response associated with transcribing input speech, the respective transcription response being from each of a plurality of networked entities, wherein at least one entity of the plurality of networked entities comprises an automatic labeler that transcribes the input speech as its respective transcription response and at least one entity of the plurality of networked entities comprises a human crowd worker who transcribes the input speech as its respective transcription response; receiving, from each of the plurality of networked entities, the respective transcription response, and, for a transcription response received from the human crowd worker, a number of times the respective human crowd worker listened to the input speech to provide the respective transcription response; determining a maximum number of transcription responses to receive from the plurality of networked entities; calculating, via the processor and using a regression model, an accuracy threshold for the transcription responses, wherein the accuracy threshold: (1) requires a number of matching responses among the transcription responses; (2) is based on a time of day when each of the transcription responses is received; and (3) is based on a number of times the respective human crowd worker listened to the input speech; incrementally receiving the transcription responses from the plurality of networked entities until one of the accuracy threshold is reached and the maximum number of transcription responses is received, and generating, via the processor, an output response to the input speech from the number of matching transcription responses, wherein the output response is a recognition candidate or a transcription of the input speech.

8. The system of claim 7, wherein the output response is used to train a video analysis algorithm.

9. The system of claim 7, wherein the maximum number of transcription responses is based on a difficulty associated with the transcription of the utterance.

10. A computer-readable storage device having instructions stored which, when executed by a computing device configured to perform automatic speech recognition, cause the computing device to perform operations comprising: requesting, a respective transcription response associated with transcribing input speech, the respective transcription response being from each of a plurality of networked entities, wherein at least one entity of the plurality of networked entities comprises an automatic labeler that transcribes the input speech as its respective transcription response and at least one entity of the plurality of networked entities comprises a human crowd worker who transcribes the input speech as its respective transcription response; receiving, from each of the plurality of networked entities, the respective transcription response, and, for a transcription response received from the human crowd worker, a number of times the respective human crowd worker listened to the input speech to provide the respective transcription response; determining a maximum number of transcription responses to receive from the plurality of networked entities; calculating, via the processor and using a regression model, an accuracy threshold for the transcription responses, wherein the accuracy threshold: (1) requires a number of matching responses among the transcription responses; (2) is based on a time of day when each of the transcription responses is received; and (3) is based on a number of times the respective human crowd worker listened to the input speech; incrementally receiving the transcription responses from the plurality of networked entities until one of the accuracy threshold is reached and the maximum number of transcription responses is received, and generating, via the processor, an output response to the input speech from the number of matching transcription responses, wherein the output response is a recognition candidate or a transcription of the input speech.

11. The computer-readable storage device of claim 10, wherein the output response is used to train a machine translation system.

12. The computer-readable storage device of claim 10, wherein the accuracy threshold comprises n matching responses, and wherein n is one of less than the maximum number of transcription responses and equal to the maximum number of transcription responses.
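The independent claims tie the accuracy threshold to a regression model over the time of day a response is received and the number of times a crowd worker listened to the audio. The claims do not give the model's form, so the sketch below is only one possible reading: the logistic form, the coefficient values, the `Response` record, and the rule "require one extra matching response when estimated reliability is low" are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Response:
    text: str
    hour_received: int  # 0-23, time of day the response arrived
    listen_count: int   # times the worker played the audio (0 for an automatic labeler)


# Hypothetical coefficients. A real system would fit these on responses
# whose correctness is already known; the patent text does not supply them.
INTERCEPT = 1.5
COEF_NIGHT = -0.8    # assumed penalty for responses received before 06:00
COEF_LISTENS = -0.3  # assumed penalty per replay beyond the first


def probability_correct(r: Response) -> float:
    """Logistic-regression estimate that a single response is correct."""
    is_night = 1.0 if r.hour_received < 6 else 0.0
    extra_listens = max(r.listen_count - 1, 0)
    z = INTERCEPT + COEF_NIGHT * is_night + COEF_LISTENS * extra_listens
    return 1.0 / (1.0 + math.exp(-z))


def required_matches(
    responses: Sequence[Response],
    base: int = 2,
    max_responses: int = 5,
    min_confidence: float = 0.75,
) -> int:
    """Number n of matching responses required before stopping.

    Starts from `base` and asks for one more match when the mean
    estimated reliability of the responses so far is below
    `min_confidence`. n never exceeds `max_responses`, in line with
    claims 6 and 12 (n is less than or equal to the maximum).
    """
    if not responses:
        return base
    mean_p = sum(probability_correct(r) for r in responses) / len(responses)
    n = base + 1 if mean_p < min_confidence else base
    return min(n, max_responses)
```

Under these assumed coefficients, a response received at 14:00 after one listen scores about 0.82 and leaves the requirement at two matches, while a response received at 03:00 after four listens scores about 0.45 and raises it to three.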

Assignees

Inventors

Classifications

G06F16/683
using metadata automatically derived from the content · CPC title
G10L2015/0638
Interactive procedures · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/063 · Primary
Training · CPC title
G06N20/00 · Primary
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 48427769

External sources


Related patents

System and method for making user generated audio content on the spoken web navigable by community tagging

Speech recognition wake-up of a handheld portable electronic device

Systems, methods, and apparatus for generating an audio-visual presentation using characteristics of audio, visual and symbolic media objects

System and methods for generation and control of story animation
