System and method for building and evaluating automatic speech recognition via an application programmer interface

US9484018B2 · US · B2

Patent metadata
Publication number: US-9484018-B2
Application number: US-95282910-A
Country: US
Kind code: B2
Filing date: Nov 23, 2010
Priority date: Nov 23, 2010
Publication date: Nov 1, 2016
Grant date: Nov 1, 2016

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.
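The abstract describes a round trip. The client runs its own feature extractor and uploads the resulting feature stream, a transcription, and parameter values. The server trains an acoustic model and a language model and returns them together with a log. The Python below is a minimal, hypothetical sketch of that data flow, not the patented implementation. Every name in it (`TrainingRequest`, `TrainingResponse`, `client_extract_features`, `server_train`) is invented for illustration. The modeling choices are also assumptions picked for brevity: the "proprietary" extractor is a toy per-frame log-energy feature, the acoustic model is a single diagonal Gaussian, and the language model is an add-one-smoothed bigram model. Network transport, security, and human-expert intervention are omitted.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingRequest:
    """Hypothetical payload a client uploads to the training API."""
    features: list[list[float]]   # one feature vector per frame
    transcription: str            # text spoken in the audio
    parameters: dict[str, float]  # settings of the client's own extractor


@dataclass
class TrainingResponse:
    """Hypothetical reply: trained models plus a processing log."""
    acoustic_model: dict[str, list[float]]
    language_model: dict[tuple[str, str], float]
    log: list[str] = field(default_factory=list)


def client_extract_features(samples, frame_len=4, hop=2):
    """Toy stand-in for a client's proprietary extractor. It runs client-side,
    so the server never sees this code. Per frame: log energy, mean |amplitude|."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        frames.append([math.log(energy + 1e-10),
                       sum(abs(x) for x in frame) / frame_len])
    return frames


def server_train(req: TrainingRequest) -> TrainingResponse:
    """Server side: sees only numbers and text, never the extractor itself."""
    if not req.features:
        raise ValueError("feature stream is empty")
    log = [f"received {len(req.features)} frames, params={req.parameters}"]

    # Acoustic model (assumed): one diagonal Gaussian over all frames.
    n, dim = len(req.features), len(req.features[0])
    mean = [sum(f[d] for f in req.features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in req.features) / n
           for d in range(dim)]
    log.append(f"estimated {dim}-dim diagonal Gaussian acoustic model")

    # Language model (assumed): add-one smoothed bigrams from the transcription.
    words = ["<s>"] + req.transcription.lower().split() + ["</s>"]
    vocab = sorted(set(words))
    bigrams = Counter(zip(words, words[1:]))
    history = Counter(words[:-1])  # how often each word starts a bigram
    lm = {(a, b): (bigrams[(a, b)] + 1) / (history[a] + len(vocab))
          for a in vocab for b in vocab}
    log.append(f"estimated bigram LM over {len(vocab)} word types")
    return TrainingResponse({"mean": mean, "var": var}, lm, log)


if __name__ == "__main__":
    samples = [0.0, 0.5, -0.5, 0.25, 0.1, -0.2, 0.3, 0.0]
    params = {"frame_len": 4, "hop": 2}
    req = TrainingRequest(client_extract_features(samples, 4, 2),
                          "call home", params)
    resp = server_train(req)
    print("\n".join(resp.log))
```

The split is the point of the sketch. `client_extract_features` stays on the client, and `server_train` only consumes its numeric output plus the parameter dictionary. This mirrors the abstract's statement that the server works from inputs supplied "independent of knowledge of internal operations" on the other side.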

Claims

We claim:

1. A method comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

2. The method of claim 1, wherein the inputs further comprise a set of parameter values describing settings of the feature extraction algorithm.

3. The method of claim 1, wherein the inputs further comprise a specific task for the acoustic model and the language model.

4. The method of claim 1, wherein the feature stream is further processed by the remote client using a baseline feature extraction program and a parameter value which sets feature extraction in the baseline feature extraction program.

5. The method of claim 1, further comprising processing the inputs prior to generation of the acoustic model and the language model according to an intervention from a human expert.

6. The method of claim 1, wherein the acoustic feature is specific to an individual.

7. The method of claim 1, wherein generating of the acoustic model and generating of the language model are further performed according to an algorithm for one of adapting the acoustic model, estimating the language model, generating recognizer outputs, and accuracy evaluation.

8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

9. The system of claim 8, wherein the inputs further comprise a parameter value indicating a specific task for the language model.

10. The system of claim 8, wherein the acoustic model and the language model are transmitted via a secured connection.

11. The system of claim 10, wherein the secured connection is encrypted.

12. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising establishing a contractual agreement regarding privacy of the inputs.

13. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising transmitting a log associated with the acoustic model and the language model.

14. The system of claim 13, wherein the log describes events associated with creation of the acoustic model and the language model.

15. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising processing the inputs, prior to generation of the acoustic model and the language model, according to an intervention from a human expert.

16. The system of claim 15, wherein the processing of the inputs is according to an algorithm which estimates the acoustic model and the language model.

17. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving, at a network-based system configured to generate acoustic models and language models, inputs from a remote client via an application program interface, the inputs comprising: a feature stream of features extracted from speech processed by the remote client using a feature extraction algorithm which operates independent of the network-based system; and a transcription of the speech; generating an acoustic model according to an acoustic feature identified within the feature stream from the features extracted from the speech; generating a language model according to the transcription; and transmitting the acoustic model and the language model to the remote client.

18. The computer-readable storage device of claim 17, wherein the inputs further comprise a parameter value indicating a specific task for the language model.

19. The computer-readable storage device of claim 17, wherein the inputs further comprise a set of parameter values describing settings of the feature extraction algorithm.

20. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising establishing a contractual agreement with the remote client regarding privacy of the inputs.
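Claim 7 and the title both refer to accuracy evaluation, but the claims do not name a metric. A common choice in speech recognition is word error rate (WER). This is an assumption here, not something taken from the patent. WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcription and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference
    words, computed with a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # prev holds the previous DP row: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                         cur[j - 1] + 1,      # insertion of hyp[j-1]
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("call my home number", "call home number now"))
```

For example, comparing "call my home number" with "call home number now" aligns one deletion ("my") and one insertion ("now"), so the result is 2/4 = 0.5. A full evaluator would usually also normalize text (case, punctuation) and pool errors over a whole test set. Both steps are left out of this sketch.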

Classifications

  • G10L15/063 (primary): Training

  • Adaptation

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Frequently asked questions

Who are the inventors on this patent?
Bocchieri Enrico, Dimitriadis Dimitrios, Schroeter Horst J, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Published Nov 1, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).