US9484018B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9484018-B2 |
| Application number | US-95282910-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 23, 2010 |
| Priority date | Nov 23, 2010 |
| Publication date | Nov 1, 2016 |
| Grant date | Nov 1, 2016 |
Official abstract text for this publication.
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.
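The round trip described in the abstract (features and a transcription go in, an acoustic model, a language model, and a log come back) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `train_models`, the single-Gaussian "acoustic model", and the maximum-likelihood bigram "language model" are toy stand-ins and are not taken from the patent, which does not specify the training algorithms at this level.

```python
from collections import Counter
from statistics import fmean, pvariance


def train_models(feature_stream, transcription):
    """Toy stand-in for the server-side training step (hypothetical name)."""
    log = []

    # Toy "acoustic model": per-dimension mean and variance over all frames.
    # This is an illustrative simplification, not the patent's method.
    dims = list(zip(*feature_stream))
    acoustic_model = {
        "mean": [fmean(d) for d in dims],
        "variance": [pvariance(d) for d in dims],
    }
    log.append(f"acoustic model: {len(feature_stream)} frames, {len(dims)} dims")

    # Toy "language model": maximum-likelihood bigram probabilities
    # estimated from the transcription, with sentence boundary markers.
    words = ["<s>"] + transcription.lower().split() + ["</s>"]
    bigrams = Counter(zip(words, words[1:]))
    history = Counter(words[:-1])
    language_model = {
        pair: count / history[pair[0]] for pair, count in bigrams.items()
    }
    log.append(f"language model: {len(language_model)} bigrams")

    # The server returns both models plus a log describing the processing.
    return acoustic_model, language_model, log


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
am, lm, log = train_models(frames, "call home call mom")
```

In this sketch the client only needs to know the input and output shapes, which mirrors the abstract's point that the client works independently of the server's internal operations.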
Opening claim text (preview).
We claim:

1. A method comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

2. The method of claim 1, wherein the inputs further comprise a set of parameter values describing settings of the feature extraction algorithm.

3. The method of claim 1, wherein the inputs further comprise a specific task for the acoustic model and the language model.

4. The method of claim 1, wherein the feature stream is further processed by the remote client using a baseline feature extraction program and a parameter value which sets feature extraction in the baseline feature extraction program.

5. The method of claim 1, further comprising processing the inputs prior to generation of the acoustic model and the language model according to an intervention from a human expert.

6. The method of claim 1, wherein the acoustic feature is specific to an individual.

7. The method of claim 1, wherein generating of the acoustic model and generating of the language model are further performed according to an algorithm for one of adapting the acoustic model, estimating the language model, generating recognizer outputs, and accuracy evaluation.

8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

9. The system of claim 8, wherein the inputs further comprise a parameter value indicating a specific task for the language model.

10. The system of claim 8, wherein the acoustic model and the language model are transmitted via a secured connection.

11. The system of claim 10, wherein the secured connection is encrypted.

12. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising establishing a contractual agreement regarding privacy of the inputs.

13. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising transmitting a log associated with the acoustic model and the language model.

14. The system of claim 13, wherein the log describes events associated with creation of the acoustic model and the language model.

15. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising processing the inputs, prior to generation of the acoustic model and the language model, according to an intervention from a human expert.

16. The system of claim 15, wherein the processing of the inputs is according to an algorithm which estimates the acoustic model and the language model.

17. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

18. The computer-readable storage device of claim 17, wherein the inputs further comprise a parameter value indicating a specific task for the language model.

19. The computer-readable storage device of claim 17, wherein the inputs further comprise a set of parameter values describing settings of the feature extraction algorithm.

20. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising establishing a contractual agreement with the remote client regarding privacy of the inputs.
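The independent claims require two inputs from the client (a feature stream and a transcription), while dependent claims add optional inputs such as feature extraction settings and a task. A client-side request could be sketched as follows. The class name `TrainingRequest`, the field names, the JSON encoding, and the `frame_ms` setting are all hypothetical; the claims do not define a wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class TrainingRequest:
    # Required inputs (independent claims 1, 8, and 17).
    feature_stream: list
    transcription: str
    # Optional inputs (dependent claims 2, 3, 9, 18, 19); names are assumptions.
    extraction_params: dict = field(default_factory=dict)
    task: Optional[str] = None

    def to_json(self) -> str:
        # Reject requests that lack either required input.
        if not self.feature_stream or not self.transcription.strip():
            raise ValueError("feature_stream and transcription are required")
        return json.dumps(asdict(self))


request = TrainingRequest(
    feature_stream=[[0.1, 0.2], [0.3, 0.4]],
    transcription="call home",
    extraction_params={"frame_ms": 25},  # hypothetical extraction setting
)
payload = json.loads(request.to_json())
```

Keeping the optional fields separate from the required ones lets a client start with the minimum inputs and add parameter values or a task later without changing the request shape.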
Training · CPC title
Adaptation · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title