Fully managed and continuously trained automatic speech recognition service

US10490183B2 · US · B2

Patent metadata
Publication number: US-10490183-B2
Application number: US-201815922495-A
Country: US
Kind code: B2
Filing date: Mar 15, 2018
Priority date: Nov 22, 2017
Publication date: Nov 26, 2019
Grant date: Nov 26, 2019

Abstract

Techniques for automated speech recognition (ASR) are described. A user can upload an audio file to a storage location. The user then provides the ASR service with a reference to the audio file. An ASR engine analyzes the audio file, using an acoustic model to divide the audio data into words, and a language model to identify the words spoken in the audio file. The acoustic model can be trained using audio sentence data, enabling the transcription service to accurately transcribe lengthy audio data. The results are punctuated and normalized, and the resulting transcript is returned to the user.
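The abstract's flow (the acoustic model splits the audio into word candidates, the language model picks the words, then the result is punctuated and normalized) can be illustrated with a toy greedy decoder. This is a minimal sketch under stated assumptions, not the patented implementation: the per-segment candidate lists, the bigram table, `lm_weight`, `unk_logp`, and the number-word normalization rule are all hypothetical stand-ins, and production ASR engines typically search a lattice with beam search rather than choosing each word greedily.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical acoustic-model output: for each segment the acoustic model
# carved out of the audio, a list of (word, acoustic log-probability) candidates.
Segments = List[List[Tuple[str, float]]]

# Hypothetical bigram language model: log P(word | previous word).
BigramLM = Dict[Tuple[str, str], float]


def decode(segments: Segments, lm: BigramLM, lm_weight: float = 1.0,
           unk_logp: float = math.log(1e-4)) -> List[str]:
    """Greedy left-to-right decoding: for each segment, keep the candidate that
    maximizes acoustic score + lm_weight * bigram score given the previous word."""
    words: List[str] = []
    prev = "<s>"  # sentence-start token
    for candidates in segments:
        if not candidates:
            continue  # skip segments with no hypotheses
        best_word, _ = max(
            candidates,
            key=lambda c: c[1] + lm_weight * lm.get((prev, c[0]), unk_logp),
        )
        words.append(best_word)
        prev = best_word
    return words


# Hypothetical normalization table: spoken number words -> digits.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}


def punctuate_and_normalize(words: List[str]) -> str:
    """Toy post-processing: map number words to digits, capitalize the first
    letter, and end the sentence with a period."""
    text = " ".join(NUMBER_WORDS.get(w, w) for w in words)
    if not text:
        return ""
    return text[0].upper() + text[1:] + "."


if __name__ == "__main__":
    segments = [
        [("i", -0.1), ("eye", -0.2)],
        [("need", -0.5), ("knead", -0.4)],  # "knead" scores better acoustically
        [("two", -0.3), ("too", -0.3)],
        [("tickets", -0.2)],
    ]
    lm = {
        ("<s>", "i"): math.log(0.5),
        ("i", "need"): math.log(0.3),
        ("need", "two"): math.log(0.2),
        ("two", "tickets"): math.log(0.4),
    }
    print(punctuate_and_normalize(decode(segments, lm)))  # I need 2 tickets.
```

In the example, "knead" has the better acoustic score, but the language model's bigram probability for ("i", "need") outweighs it, which is the division of labor the abstract describes.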

Claims

What is claimed is:

1. A computer-implemented method comprising: receiving, by a provider network, a request to perform automated speech recognition (ASR), the request including a reference to an audio file; authorizing the request with an authorization service using account data associated with the request; provisioning a decoder instance to generate a transcript of the audio file, the decoder instance including an ASR engine, an acoustic model, and a language model, the decoder instance deployed to a private network in the provider network; providing, using the reference, a copy of the audio file to the decoder instance; generating a transcript of the audio file by the ASR engine using the acoustic model, the language model, and the copy of the audio file, wherein the acoustic model is trained using previously processed data; monitoring an output location in the provider network to determine whether the transcript is complete; deprovisioning the decoder instance after determining that the transcript is complete; and returning a reference to the transcript of the audio file in the output location.

2. The computer-implemented method of claim 1, further comprising: receiving, by the provider network, a second request to perform ASR, the request including a second reference to a second audio file; provisioning a second decoder instance to generate a transcript of the second audio file; and deprovisioning the second decoder instance after determining that the transcript is complete.

3. The computer-implemented method of claim 1, wherein authorizing the request with an authorization service using account data associated with the request comprises: determining that the account data has been white listed; and determining that a number of pending requests for the account data does not exceed a threshold.

4. A computer-implemented method comprising: receiving, by a provider network, a request to perform automated speech recognition (ASR), the request including a reference to an audio file; retrieving the audio file using the reference; provisioning an instance deployed to a private network in the provider network to generate the transcript of the audio file; generating a transcript of the audio file by an ASR engine using an acoustic model, a language model, and the audio file, the ASR engine and language model implemented by the instance, wherein the acoustic model is trained using previously processed data; monitoring an output location in the provider network to determine whether the transcript is complete; deprovisioning the instance upon determining that the transcript is complete; and returning the transcript of the audio file.

5. The computer-implemented method of claim 4, further comprising: dividing the audio file into a plurality of words using the acoustic model.

6. The computer-implemented method of claim 5, wherein the acoustic model is trained using the previously processed audio data to identify inaccuracies in the acoustic model as a function of one or more of genres, acoustic properties, or customer-specific language.

7. The computer-implemented method of claim 4, further comprising: sending the request to a control plane, the control plane orchestrating a plurality of requests; and executing a workflow, by the control plane, to perform ASR on the request.

8. The computer-implemented method of claim 4, wherein the language model is one of a plurality of language models selected based on metadata received with the request.

9. The computer-implemented method of claim 4, further comprising: authorizing the request with an authorization service to determine that a number of pending requests associated with an account associated with the request does not exceed a threshold.

10. The computer-implemented method of claim 4, wherein returning the transcript of the audio file further comprises: uploading the transcript to a storage service provider; and returning a reference to the transcript in the storage service provider.

11. The computer-implemented method of claim 4, wherein the instance is implemented as a container deployed in the private network.

12. A system comprising: a provider network implemented by a first one or more electronic devices; and an automated speech recognition (ASR) service implemented by a second one or more electronic devices in the provider network, the ASR service including instructions that upon execution cause the ASR service to: receive a request to perform automated speech recognition (ASR), the request including a reference to an audio file; retrieve the audio file using the reference; provision an instance deployed to a private network in the provider network to generate the transcript of the audio file; generate a transcript of the audio file by an ASR engine using an acoustic model, a language model, and the audio file, the ASR engine and language model implemented by the instance, wherein the acoustic model is trained using previously processed data; monitor an output location in the provider network to determine whether the transcript is complete; deprovision the instance upon determining that the transcript is complete; and return the transcript of the audio file.

13. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: divide the audio file into a plurality of words using the acoustic model.

14. The system of claim 13, wherein the acoustic model is trained using the previously processed audio data to identify inaccuracies in the acoustic model as a function of one or more of genres, acoustic properties, or customer-specific language.

15. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: send the request to a control plane, the control plane orchestrating a plurality of requests; and execute a workflow, by the control plane, to perform ASR on the request.

16. The system of claim 12, wherein the language model is one of a plurality of language models selected based on metadata received with the request.

17. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: authorize the request with an authorization service to determine that a number of pending requests associated with an account associated with the request does not exceed a threshold.

18. The system of claim 12, wherein the instructions, to return the transcript of the audio file, when executed further causes the ASR service to: upload the transcript to a storage service provider; and return a reference to the transcript in the storage service provider.
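The method of claim 1, together with the authorization check of claim 3 and the metadata-based language-model selection of claim 8, can be sketched as a single in-memory orchestration function. Everything here is hypothetical: the `Request`, `AuthorizationService`, and `DecoderInstance` classes, the `LANGUAGE_MODELS` table, the `transcripts/` output key, and the injected `engine` callable stand in for the provider network, the private-network instance, the storage service, and the ASR engine. No real cloud API is used.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical stand-ins: a dict plays the storage service, a callable plays
# the ASR engine (audio bytes, language-model name) -> transcript text.
Storage = Dict[str, object]
Engine = Callable[[bytes, str], str]


@dataclass
class Request:
    account_id: str
    audio_ref: str  # reference (storage key) to the uploaded audio file
    metadata: Dict[str, str] = field(default_factory=dict)


class AuthorizationService:
    """Claim 3: the account must be whitelisted and its number of pending
    requests must not exceed a threshold."""

    def __init__(self, whitelist: Set[str], threshold: int) -> None:
        self.whitelist = whitelist
        self.threshold = threshold
        self.pending: Dict[str, int] = {}

    def authorize(self, account_id: str) -> bool:
        return (account_id in self.whitelist
                and self.pending.get(account_id, 0) <= self.threshold)


@dataclass
class DecoderInstance:
    """Stand-in for a decoder instance deployed in a private network."""
    instance_id: str
    language_model: str
    engine: Engine

    def run(self, audio: bytes, storage: Storage, output_key: str) -> None:
        # Synchronous here; a real instance would write its result asynchronously.
        storage[output_key] = self.engine(audio, self.language_model)


# Claim 8: one of several language models, chosen from request metadata.
LANGUAGE_MODELS = {"en-US": "lm-en-general", "en-US/medical": "lm-en-medical"}
_instance_ids = itertools.count()


def select_language_model(metadata: Dict[str, str]) -> str:
    key = metadata.get("language", "en-US")
    if "domain" in metadata:
        key = f"{key}/{metadata['domain']}"
    return LANGUAGE_MODELS.get(key, LANGUAGE_MODELS["en-US"])


def handle_request(req: Request, auth: AuthorizationService, storage: Storage,
                   engine: Engine, active: Dict[str, DecoderInstance],
                   poll_interval: float = 0.01, timeout: float = 5.0) -> str:
    """Authorize, provision, transcribe, monitor, deprovision, return a reference."""
    if not auth.authorize(req.account_id):
        raise PermissionError(f"account {req.account_id!r} not authorized")
    auth.pending[req.account_id] = auth.pending.get(req.account_id, 0) + 1

    # Provision a decoder instance dedicated to this request.
    instance = DecoderInstance(f"decoder-{next(_instance_ids)}",
                               select_language_model(req.metadata), engine)
    active[instance.instance_id] = instance
    output_key = f"transcripts/{req.audio_ref}.txt"
    try:
        audio_copy = bytes(storage[req.audio_ref])  # copy fetched via the reference
        instance.run(audio_copy, storage, output_key)
        deadline = time.monotonic() + timeout
        while output_key not in storage:  # monitor the output location
            if time.monotonic() > deadline:
                raise TimeoutError("transcript not written before timeout")
            time.sleep(poll_interval)
    finally:
        del active[instance.instance_id]  # deprovision
        auth.pending[req.account_id] -= 1
    return output_key  # reference to the transcript in the output location
```

Because `run` is synchronous in this sketch, the polling loop finds the transcript on its first check; the loop is kept to mirror the claimed "monitoring an output location" step. The `finally` block also deprovisions the instance when decoding fails or times out, which is a design choice of the sketch rather than something the claims recite.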

Assignees

Amazon Tech Inc

Classifications (CPC)

  • G10L15/063 (primary): Training

  • Speech to text systems (G10L15/08 takes precedence)

  • using artificial neural networks

  • using metadata automatically derived from the content

  • Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

Frequently asked questions

Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Published November 26, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.