Techniques to provide a standard interface to a speech recognition platform

US10089988B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10089988-B2
Application number: US-201715718596-A
Country: US
Kind code: B2
Filing date: Sep 28, 2017
Priority date: Jun 19, 2009
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (API). The speech recognition request may also include additional parameters. The technique further includes performing speech recognition on the audio according to the request and any specified parameters; and returning a speech recognition result as a hypertext protocol (HTTP) response. Other embodiments are described and claimed.
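As an illustration of the request side described in the abstract, the following is a minimal sketch of how a client might package audio and optional parameters into an HTTP POST. The endpoint URL, the parameter names (`timeout`, `confidence`), and the `audio/wav` content type are assumptions made for the example; the abstract does not specify them. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the patent text does not name a URL.
ENDPOINT = "http://speech.example.com/recognize"


def build_recognition_request(audio: bytes, **params) -> Request:
    """Build (but do not send) an HTTP POST that carries audio input.

    Optional parameters travel in the query string and the audio is the
    POST entity body. Parameter names are illustrative only.
    """
    query = urlencode(sorted(params.items()))
    url = f"{ENDPOINT}?{query}" if query else ENDPOINT
    return Request(
        url,
        data=audio,
        headers={"Content-Type": "audio/wav"},  # assumed audio format
        method="POST",
    )


# Placeholder bytes stand in for real audio data.
req = build_recognition_request(b"RIFF....", timeout=5000, confidence=0.5)
print(req.get_method(), req.full_url)
# POST http://speech.example.com/recognize?confidence=0.5&timeout=5000
```

Sending the request (for example with `urllib.request.urlopen`) would return the recognition result as the HTTP response body.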

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of an application program interface (API), comprising: processing at least one parameter for performing speech recognition, the at least one parameter corresponding to a lack of speech data; accepting a speech recognition request comprising an audio input; performing speech recognition on the audio input according to the at least one parameter; and without performing speech recognition on all of the audio input and upon observing the length of silence in the audio input, returning speech recognition data as hypertext protocol (HTTP) responses comprising a status attribute indicating an overall success or failure of speech recognition on the audio input.

2. The method of claim 1, wherein the speech recognition request is formatted as at least one of: an HTTP query string; an HTTP POST entity body; and at least one HTTP POST entity body part.

3. The method of claim 1, wherein the speech recognition request comprises parameters including at least one of: an in-line grammar; an in-line audio input; a URI link to an audio input; a timeout; a finalize timeout; a confidence level; a sensitivity level; a speed level; an accuracy level; a speaker parameter; and a recognizer specific parameter.

4. The method of claim 1, wherein the request is an inline streaming request and the audio input is in a raw format.

5. The method of claim 1, comprising returning the speech recognition results as a streamed result.

6. The method of claim 1, wherein the request is a streamed request.

7. An apparatus, comprising: at least one processing unit; and a speech recognition service, executed on the at least one processing unit, implementing an application program interface (API) operative to accept a speech recognition request comprising an audio input and parameters for performing speech recognition on the audio input, and without performing speech recognition on all of the audio input, returning a plurality of speech recognition results as hypertext protocol (HTTP) responses comprising a status attribute indicating an overall success or failure of speech recognition on the audio input.

8. The apparatus of claim 7, wherein the API comprises a wrapper for building a speech recognition request.

9. The apparatus of claim 7, wherein the speech recognition service is operative to perform at least one of: receive the speech recognition request as a streamed request; and return the recognition results as a streamed result.

10. The apparatus of claim 7, wherein the speech recognition request comprises at least one of: an in-line grammar; an in-line audio input; a URI link to an audio input; a timeout; a finalize timeout; a confidence level; a sensitivity level; a speed level; an accuracy level; a speaker parameter; and a recognizer specific parameter.

11. The apparatus of claim 7, wherein at least one of an in-line grammar or a grammar referred to by a URI link that includes a reference to another grammar.

12. The apparatus of claim 7, wherein the speech recognition request is a streamed speech recognition request.

13. The apparatus of claim 7, wherein the speech recognition service further operative to return the plurality of speech recognition results upon observing a length of silence in the audio input.

14. The apparatus of claim 7, wherein the speech recognition service further operative to convert a first portion of the audio input to a first recognition result associated with the speech recognition request, to return the first recognition result after silence is observed for a specified duration of the audio input, to convert a second portion of the audio input to a second recognition result associated with the speech recognition request, and to return the second recognition result, wherein the first and second recognition results are returned as a hypertext protocol (HTTP) responses comprising a status flag indicating an overall success or failure of the recognition result.

15. A mobile computing device comprising at least one processing unit and a memory coupled to the at least one processing unit, the memory having at least one component comprising: a component operative to communicate a speech recognition request comprising audio input and speech recognition parameters including a lack of speech data to observe and receiving a plurality of speech recognition results returned as hypertext protocol (HTTP) responses comprising a status attribute indicating an overall success or failure of speech recognition on the audio input.

16. The device of claim 15, wherein the component further operative to receive the HTTP responses in an XML document.

17. The device of claim 15, wherein the speech recognition parameters comprise at least one of: an in-line grammar; an in-line audio input; a URI link to an audio input; a timeout; a finalize timeout; a confidence level; a sensitivity level; a speed level; an accuracy level; a speaker parameter; and a recognizer specific parameter.

18. The device of claim 15, wherein the request as an inline streaming request and the audio input is in a raw format.

19. The device of claim 15, wherein the component further operative to process the received speech recognition results as a streamed result.

20. The device of claim 15, wherein the request comprises at least one of an HTTP query string; an HTTP POST entity body; and at least one HTTP POST entity body part.
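The silence-triggered behavior recited in the claims (returning a result once a specified length of silence is observed, before all of the audio has been processed, with a status attribute on each response) can be sketched roughly as follows. This is an illustrative toy and not the patented implementation: frames are plain strings, an empty string stands for a silent frame, `decode` stands in for a real recognizer, and the XML element names `result` and `text` are hypothetical.

```python
from xml.etree import ElementTree as ET


def to_xml_response(text: str) -> str:
    """Wrap a recognition result in XML with an overall status attribute."""
    root = ET.Element("result", status="success" if text else "failure")
    if text:
        ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="unicode")


def stream_results(frames, silence_frames, is_silent, decode):
    """Yield one response each time `silence_frames` silent frames occur in a row.

    Because this is a generator, the first result is produced before the
    remaining audio frames have been read.
    """
    speech, quiet = [], 0
    for frame in frames:
        if is_silent(frame):
            quiet += 1
            if quiet == silence_frames and speech:
                yield to_xml_response(decode(speech))
                speech = []
        else:
            quiet = 0
            speech.append(frame)
    if speech:  # flush trailing speech when the audio ends
        yield to_xml_response(decode(speech))


# Toy audio: words are speech frames, "" is a silent frame.
frames = iter(["call", "home", "", "", "send", "text"])
results = stream_results(frames, 2, lambda f: f == "", " ".join)

first = next(results)
unread = list(frames)  # frames the recognizer had not yet touched
print(first)   # <result status="success"><text>call home</text></result>
print(unread)  # ['send', 'text']
```

The `unread` list shows that the first response was returned while later audio was still unprocessed; iterating the generator without draining `frames` would produce a second response for the second portion of the audio.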

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • G10L15/30 · Primary

    Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Announcement of recognition results · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10089988B2 cover?
Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (API). The speech recognition request may also include additional parameters. The technique further includes performing speech …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 2, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).