Corrective feedback loop for automated speech recognition

US9384735B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9384735-B2
Application number  US-201414341054-A
Country             US
Kind code           B2
Filing date         Jul 25, 2014
Priority date       Apr 5, 2007
Publication date    Jul 5, 2016
Grant date          Jul 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.
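The client-side steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `FeedbackClient`, the two callables standing in for the first and second remote servers, and the use of a random UUID as the audio-message identifier are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FeedbackClient:
    """Hypothetical client device; all names here are illustrative assumptions."""
    transcribe: Callable[[bytes], str]      # stands in for the first remote server (ASR)
    report: Callable[[str, str], None]      # stands in for the second remote server
    stored: Dict[str, str] = field(default_factory=dict)

    def handle_utterance(self, audio: bytes,
                         user_affirms: Callable[[str], bool]) -> str:
        # Identifier corresponding to the audio message (assumed to be a UUID).
        message_id = uuid.uuid4().hex
        # Communicate the audio message and receive the transcribed result.
        result = self.transcribe(audio)
        # Only an affirmed result is stored and forwarded.
        if user_affirms(result):
            self.stored[message_id] = result       # store result with identifier
            self.report(message_id, result)        # send stored result + identifier
        return message_id


# Usage with stub servers: the "ASR" always returns the same text.
received = []
client = FeedbackClient(
    transcribe=lambda audio: "call me later",
    report=lambda message_id, text: received.append((message_id, text)),
)
mid = client.handle_utterance(b"\x00\x01", user_affirms=lambda text: True)
```

Keeping the two servers as separate callables mirrors the abstract's distinction between the server that transcribes and the server that receives the affirmed result.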

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computing device in communication with an electronic data store, the computing device configured to: obtain audio data comprising speech from a client device; receive an identifier of an application from the client device, wherein the application is associated with an initial language model; generate a transcription of the speech using the initial language model; transmit the transcription to the client device for presentation to a user; receive feedback on the transcription from the client device; and based at least in part on the feedback, generate an updated language model, wherein the electronic data store is configured to store at least one of the initial language model and the updated language model.

2. The system of claim 1, wherein the feedback comprises at least one of an affirmation of the transcription, a disapproval of the transcription, or a correction to the transcription.

3. The system of claim 1, wherein the computing device is further configured to generate one or more alternate transcriptions of the speech using the initial language model.

4. The system of claim 2, wherein the computing device is further configured to transmit the one or more alternate transcriptions to the client device.

5. The system of claim 4, wherein the feedback comprises a selection of an alternate transcription.

6. The system of claim 4, wherein the one or more alternate transcriptions each have a transcription confidence value that satisfies a threshold.

7. The system of claim 1, wherein the electronic data store is further configured to store one or more algorithms that, when executed, implement an automatic speech recognition engine.

8. A non-transitory computer-readable medium having stored thereon a computer-executable component configured to execute in one or more processors of a computing device, the computer-executable component being further configured to: receive first audio data comprising first speech; transcribe the first speech using a first language model to generate a first transcription; provide the first transcription to a first client device; receive feedback on the first transcription from the first client device; based at least in part on the feedback on the first transcription, update the first language model; select a second language model; and based at least in part on the feedback on the transcription, update the second language model, wherein the second language model is not used to generate the first transcription.

9. The non-transitory computer-readable medium of claim 8, wherein: the first audio data comprising speech is associated with a user of the first client device; and the first language model is associated with the user of the first client device.

10. The non-transitory computer-readable medium of claim 8, wherein the computer-executable component is further configured to: receive second audio data comprising second speech; and transcribe the second speech with the updated first language model to generate a second transcription.

11. The non-transitory computer-readable medium of claim 8, wherein the first audio data comprising first speech is received from the first client device.

12. The non-transitory computer-readable medium of claim 8, wherein the first audio data comprising first speech is received from a second client device.

13. The non-transitory computer-readable medium of claim 8, wherein the feedback comprises at least one of an affirmation of the first transcription, a disapproval of the first transcription, or a correction to the first transcription.

14. A computer-implemented method comprising: under control of one or more computing devices configured with specific computer-executable instructions, receiving audio data comprising speech from a first client device; receiving an identifier of an application from the first client device, wherein a first language model is associated with the application; generating speech recognition results from the speech using the first language model; providing the speech recognition results to the first client device; receiving feedback on the speech recognition results from the first client device; and updating the first language model based at least in part on the feedback.

15. The computer-implemented method of claim 14, wherein the audio data is received from a second client device.

16. The computer-implemented method of claim 14, wherein the speech recognition results comprise a transcription of the speech.

17. The computer-implemented method of claim 16, wherein the feedback relates to at least one of a letter of the transcription, a syllable of the transcription, a word of the transcription, a phrase of the transcription, or a sentence of the transcription.

18. The computer-implemented method of claim 16, further comprising: generating a transcription identifier associated with the transcription; transmitting the identifier to the first client device with the transcription; and receiving the identifier from the first client device with the feedback on the speech recognition results.

19. The computer-implemented method of claim 14, further comprising generating one or more alternative speech recognition results using the first language model.

20. The computer-implemented method of claim 19, further comprising providing to the first client device an alternative speech recognition result from the one or more alternative speech recognition results with a confidence value that satisfies a threshold.
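As a rough illustration of the server-side loop recited in the claims (a language model selected by application identifier, a transcription returned with an identifier, alternates filtered by a confidence threshold, and a model update driven by feedback), the sketch below uses a toy unigram model. Everything in it is an assumption for the example: the names `ToyLanguageModel` and `FeedbackServer`, the use of pre-computed text hypotheses in place of real audio decoding, the smoothed unigram score as the "confidence value", and the rule that affirmations and corrections add counts while disapprovals change nothing.

```python
import itertools
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyLanguageModel:
    """Unigram counts standing in for a real language model (an assumption)."""
    counts: Counter = field(default_factory=Counter)

    def score(self, text: str) -> float:
        # Average add-one-smoothed word probability; used as a toy confidence.
        words = text.split()
        total = sum(self.counts.values())
        vocab = len(self.counts) + 1
        return sum((self.counts[w] + 1) / (total + vocab) for w in words) / max(len(words), 1)

    def update(self, text: str) -> None:
        self.counts.update(text.split())


class FeedbackServer:
    """Hypothetical server; hypotheses stand in for decoded audio."""

    def __init__(self, models: Dict[str, ToyLanguageModel], threshold: float = 0.0):
        self.models = models                            # keyed by application identifier
        self.threshold = threshold                      # minimum score for alternates
        self.pending: Dict[str, Tuple[str, str]] = {}   # transcription id -> (app id, text)
        self._ids = itertools.count(1)

    def recognize(self, app_id: str, hypotheses: List[str]) -> Tuple[str, str, List[str]]:
        model = self.models[app_id]                     # model associated with the application
        ranked = sorted(hypotheses, key=model.score, reverse=True)
        best = ranked[0]
        alternates = [h for h in ranked[1:] if model.score(h) >= self.threshold]
        tid = f"t{next(self._ids)}"                     # transcription identifier
        self.pending[tid] = (app_id, best)
        return tid, best, alternates

    def feedback(self, tid: str, kind: str, correction: Optional[str] = None) -> None:
        app_id, text = self.pending.pop(tid)
        model = self.models[app_id]
        if kind == "affirm":
            model.update(text)                          # reinforce the affirmed transcription
        elif kind == "correct" and correction:
            model.update(correction)                    # learn from the user's correction
        # "disapprove": this toy model has nothing to reinforce, so it is left unchanged.


# Usage: one application ("sms") with a small seed model.
server = FeedbackServer({"sms": ToyLanguageModel(Counter({"meet": 2, "me": 2, "at": 1, "noon": 1}))})
tid, best, alternates = server.recognize("sms", ["meat me at noon", "meet me at noon"])
server.feedback(tid, "correct", "meet me at one")
```

Returning the transcription identifier with the result and requiring it with the feedback follows the pattern of claim 18, so the server can match feedback to the transcription and model it concerns.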

Assignees

Inventors

Classifications

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules · CPC title

  • G06F3/0236 · Primary

    using selection techniques to select from displayed items · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9384735B2 cover?
A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receivin…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0236. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 5, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).