US10490183B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10490183-B2 |
| Application number | US-201815922495-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 15, 2018 |
| Priority date | Nov 22, 2017 |
| Publication date | Nov 26, 2019 |
| Grant date | Nov 26, 2019 |
Official abstract text for this publication.
Techniques for automated speech recognition (ASR) are described. A user can upload an audio file to a storage location. The user then provides the ASR service with a reference to the audio file. An ASR engine analyzes the audio file, using an acoustic model to divide the audio data into words, and a language model to identify the words spoken in the audio file. The acoustic model can be trained using audio sentence data, enabling the transcription service to accurately transcribe lengthy audio data. The results are punctuated and normalized, and the resulting transcript is returned to the user.
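The flow in the abstract (resolve a reference to the audio, split it into words, identify the words, then punctuate and normalize) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not the patent's implementation: `Transcriber`, the `segment` and `recognize` callables, the in-memory `storage` dict, and the toy normalization rules are assumptions chosen only to show the shape of the pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces; the patent does not specify them.
Segmenter = Callable[[bytes], List[bytes]]  # stand-in for the acoustic model
Recognizer = Callable[[bytes], str]         # stand-in for the language model


def punctuate_and_normalize(words: List[str]) -> str:
    """Toy post-processing: lowercase, map a few spelled-out numbers to
    digits, capitalize the first word, and end with a period."""
    numerals = {"one": "1", "two": "2", "three": "3"}  # illustrative only
    out = [numerals.get(w.lower(), w.lower()) for w in words if w]
    if not out:
        return ""
    out[0] = out[0].capitalize()
    return " ".join(out) + "."


@dataclass
class Transcriber:
    storage: Dict[str, bytes]  # stand-in for the storage location
    segment: Segmenter
    recognize: Recognizer

    def transcribe(self, audio_ref: str) -> str:
        audio = self.storage[audio_ref]           # resolve the reference
        segments = self.segment(audio)            # divide the audio into words
        words = [self.recognize(s) for s in segments]  # identify each word
        return punctuate_and_normalize(words)


if __name__ == "__main__":
    # Fake "audio": bytes where '|' marks word boundaries (an assumption).
    demo = Transcriber(
        storage={"bucket/clip.wav": b"meet|at|gate|two"},
        segment=lambda audio: audio.split(b"|"),
        recognize=lambda seg: seg.decode(),
    )
    print(demo.transcribe("bucket/clip.wav"))
```

A real system would replace the two callables with trained models; the sketch only shows that the caller passes a reference rather than the audio itself, and that post-processing is a separate final stage.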
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: receiving, by a provider network, a request to perform automated speech recognition (ASR), the request including a reference to an audio file; authorizing the request with an authorization service using account data associated with the request; provisioning a decoder instance to generate a transcript of the audio file, the decoder instance including an ASR engine, an acoustic model, and a language model, the decoder instance deployed to a private network in the provider network; providing, using the reference, a copy of the audio file to the decoder instance; generating a transcript of the audio file by the ASR engine using the acoustic model, the language model, and the copy of the audio file, wherein the acoustic model is trained using previously processed data; monitoring an output location in the provider network to determine whether the transcript is complete; deprovisioning the decoder instance after determining that the transcript is complete; and returning a reference to the transcript of the audio file in the output location.
2. The computer-implemented method of claim 1, further comprising: receiving, by the provider network, a second request to perform ASR, the request including a second reference to a second audio file; provisioning a second decoder instance to generate a transcript of the second audio file; and deprovisioning the second decoder instance after determining that the transcript is complete.
3. The computer-implemented method of claim 1, wherein authorizing the request with an authorization service using account data associated with the request comprises: determining that the account data has been white listed; and determining that a number of pending requests for the account data does not exceed a threshold.
4. A computer-implemented method comprising: receiving, by a provider network, a request to perform automated speech recognition (ASR), the request including a reference to an audio file; retrieving the audio file using the reference; provisioning an instance deployed to a private network in the provider network to generate the transcript of the audio file; generating a transcript of the audio file by an ASR engine using an acoustic model, a language model, and the audio file, the ASR engine and language model implemented by the instance, wherein the acoustic model is trained using previously processed data; monitoring an output location in the provider network to determine whether the transcript is complete; deprovisioning the instance upon determining that the transcript is complete; and returning the transcript of the audio file.
5. The computer-implemented method of claim 4, further comprising: dividing the audio file into a plurality of words using the acoustic model.
6. The computer-implemented method of claim 5, wherein the acoustic model is trained using the previously processed audio data to identify inaccuracies in the acoustic model as a function of one or more of genres, acoustic properties, or customer-specific language.
7. The computer-implemented method of claim 4, further comprising: sending the request to a control plane, the control plane orchestrating a plurality of requests; and executing a workflow, by the control plane, to perform ASR on the request.
8. The computer-implemented method of claim 4, wherein the language model is one of a plurality of language models selected based on metadata received with the request.
9. The computer-implemented method of claim 4, further comprising: authorizing the request with an authorization service to determine that a number of pending requests associated with an account associated with the request does not exceed a threshold.
10. The computer-implemented method of claim 4, wherein returning the transcript of the audio file further comprises: uploading the transcript to a storage service provider; and returning a reference to the transcript in the storage service provider.
11. The computer-implemented method of claim 4, wherein the instance is implemented as a container deployed in the private network.
12. A system comprising: a provider network implemented by a first one or more electronic devices; and an automated speech recognition (ASR) service implemented by a second one or more electronic devices in the provider network, the ASR service including instructions that upon execution cause the ASR service to: receive a request to perform automated speech recognition (ASR), the request including a reference to an audio file; retrieve the audio file using the reference; provision an instance deployed to a private network in the provider network to generate the transcript of the audio file; generate a transcript of the audio file by an ASR engine using an acoustic model, a language model, and the audio file, the ASR engine and language model implemented by the instance, wherein the acoustic model is trained using previously processed data; monitor an output location in the provider network to determine whether the transcript is complete; deprovision the instance upon determining that the transcript is complete; and return the transcript of the audio file.
13. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: divide the audio file into a plurality of words using the acoustic model.
14. The system of claim 13, wherein the acoustic model is trained using the previously processed audio data to identify inaccuracies in the acoustic model as a function of one or more of genres, acoustic properties, or customer-specific language.
15. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: send the request to a control plane, the control plane orchestrating a plurality of requests; and execute a workflow, by the control plane, to perform ASR on the request.
16. The system of claim 12, wherein the language model is one of a plurality of language models selected based on metadata received with the request.
17. The system of claim 12, wherein the instructions, upon execution, further cause the ASR service to: authorize the request with an authorization service to determine that a number of pending requests associated with an account associated with the request does not exceed a threshold.
18. The system of claim 12, wherein the instructions, to return the transcript of the audio file, when executed further causes the ASR service to: upload the transcript to a storage service provider; and return a reference to the transcript in the storage service provider.
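The request lifecycle recited in the independent claims (authorize, provision an instance, transcribe, monitor the output location, deprovision, return a reference) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: `AsrService`, its fields, the in-memory `storage` dict, the `decode` callable, and the `.transcript` naming scheme are all hypothetical, and the "instance" is only an identifier tracked in a set.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class AsrService:
    storage: Dict[str, bytes]        # stand-in for storage and the output location
    allowed_accounts: Set[str]       # stand-in for the account allow list
    pending_threshold: int           # limit on pending requests per account
    decode: Callable[[bytes], str]   # stand-in for ASR engine plus both models
    pending: Dict[str, int] = field(default_factory=dict)
    live_instances: Set[str] = field(default_factory=set)

    def authorize(self, account: str) -> bool:
        # Literal reading of claim 3: the account is allow-listed and its
        # current pending count does not exceed the threshold.
        return (account in self.allowed_accounts
                and self.pending.get(account, 0) <= self.pending_threshold)

    def handle(self, account: str, audio_ref: str,
               poll_interval: float = 0.0, max_polls: int = 10) -> str:
        if not self.authorize(account):
            raise PermissionError(f"request rejected for account {account!r}")
        self.pending[account] = self.pending.get(account, 0) + 1
        instance_id = uuid.uuid4().hex
        self.live_instances.add(instance_id)         # provision
        output_ref = f"{audio_ref}.transcript"       # hypothetical naming scheme
        try:
            audio_copy = bytes(self.storage[audio_ref])  # copy handed to the instance
            # In this sketch the instance runs synchronously; a real decoder
            # would write to the output location on its own schedule.
            self.storage[output_ref] = self.decode(audio_copy).encode()
            for _ in range(max_polls):               # monitor the output location
                if output_ref in self.storage:
                    return output_ref                # reference, not the transcript
                time.sleep(poll_interval)
            raise TimeoutError("transcript did not appear in the output location")
        finally:
            self.live_instances.discard(instance_id)  # deprovision
            self.pending[account] -= 1


if __name__ == "__main__":
    svc = AsrService(
        storage={"in/a.wav": b"hello world"},
        allowed_accounts={"acct-1"},
        pending_threshold=2,
        decode=lambda audio: audio.decode().upper(),  # toy decoder
    )
    ref = svc.handle("acct-1", "in/a.wav")
    print(ref, svc.storage[ref])
```

Placing the deprovision step in a `finally` block is a design choice of this sketch, so that an instance is released even when decoding fails; the claims themselves only recite deprovisioning after the transcript is determined to be complete.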
Training · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
using artificial neural networks · CPC title
using metadata automatically derived from the content · CPC title
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title