System and method for performing dual mode speech recognition

US8972263B2 · US · B2

Patent metadata
Field               Value
Publication number  US-8972263-B2
Application number  US-201213530101-A
Country             US
Kind code           B2
Filing date         Jun 21, 2012
Priority date       Nov 18, 2011
Publication date    Mar 3, 2015
Grant date          Mar 3, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
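The selection rule in the abstract can be illustrated with a short sketch. This is not the patent's implementation: the `Recognition` type, the `Recognizer` callable signature, the 2-second default timeout, the use of a thread pool, and the tie-breaking choice are all assumptions made for illustration. A recognizer that returns `None`, raises, or misses the latency cutoff is treated as having failed.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recognition:
    transcription: str
    score: float  # confidence in the transcription; higher is better


# Hypothetical recognizer signature: audio bytes in, Recognition or None (failure) out.
Recognizer = Callable[[bytes], Optional[Recognition]]


def choose_final(local: Optional[Recognition],
                 remote: Optional[Recognition]) -> Optional[Recognition]:
    """Pick the higher score if both succeeded, else whichever one succeeded."""
    if local is not None and remote is not None:
        # Tie-breaking is not specified in the source; ties go to remote here (assumption).
        return local if local.score > remote.score else remote
    return local if local is not None else remote  # None means recognition failure


def recognize_dual_mode(audio: bytes, local_fn: Recognizer, remote_fn: Recognizer,
                        timeout_s: float = 2.0) -> Optional[Recognition]:
    """Run both recognizers concurrently and arbitrate once the latency cutoff expires."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = {
            "local": pool.submit(local_fn, audio),
            "remote": pool.submit(remote_fn, audio),
        }
        # Returns early if both finish; otherwise returns when the timeout elapses.
        done, _ = wait(futures.values(), timeout=timeout_s)
        results = {}
        for name, fut in futures.items():
            finished_ok = fut in done and fut.exception() is None
            results[name] = fut.result() if finished_ok else None
        return choose_final(results["local"], results["remote"])
    finally:
        # Do not block on a recognizer that missed the cutoff.
        pool.shutdown(wait=False)
```

In this sketch a late remote result is simply ignored for the current query. The abstract additionally describes updating the client vocabulary whenever the remote engine succeeds, which is left out here.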

First claim

Opening claim text (preview).

We claim:

1. A method for performing dual mode speech recognition, comprising: receiving a spoken query from a user; processing the spoken query, including sending the spoken query to a local recognition system on a mobile device; transmitting the spoken query to a remote recognition system via a communications link; and setting a latency timer to a preset timeout value; in the event that the spoken query is not recognized by either the local recognition system or the remote recognition system within the latency timer period, choosing the recognition failure as a final result; in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system; choosing the final result as the recognition result associated with the higher recognition score; in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and choosing the local recognition result as the final result; in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and choosing the remote recognition result as the final result; taking action on behalf of the user based on the final result; and in the event that the spoken query is recognized by the remote recognition system within the latency timer period, upon determining that the remote recognition result contains vocabulary information not contained within a client vocabulary maintained within the local recognition system, requesting that the remote recognition system update the client vocabulary.

2. A system for dual mode speech recognition, comprising: a local recognition system housed in a mobile device, including: a communication module configured for communicating with a user and other devices and for receiving a spoken query; a recognition module configured for recognizing and transcribing audio content; a control module; a client vocabulary configured to describe the words or phrases available to the recognition module; and a vocabulary updater module configured for updating the client vocabulary; a remote recognition system housed in a server, including: a recognition engine configured for recognizing and transcribing audio content; a vocabulary download module configured for providing updates to the vocabulary update module; wherein the control module of the local recognition system is configured to: set a latency timer to a preset timeout value; and in the event that the spoken query is not recognized by either the local recognition system or the remote recognition system within the latency timer period, choose recognition failure as a final result.

3. The system of claim 2, wherein the control module of the local recognition system is further configured to send the spoken query to the recognition module of the local recognition system and the remote recognition system.

4. The system of claim 2, wherein the control module of the local recognition system is configured to: in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score; in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and choosing the local recognition result as the final result; and in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and choosing the remote recognition result as the final result.

5. The system of claim 2, wherein the vocabulary updater module is further configured to remove one or more words from the client vocabulary.

6. The system of claim 2, wherein one or more words from the client vocabulary are assigned a priority value that indicates the word's importance, and the vocabulary updater module is further configured to remove a word selected from the client vocabulary based at least on the priority assigned the selected word.

7. The system of claim 2, wherein one or more words from the client vocabulary are assigned a frequency value that indicates how often the word is used and a recency value that indicates when the word was last used, and the client vocabulary updater module is further configured to remove a word selected from the client vocabulary based at least on the frequency value or recency value associated with the selected word.

8. The system of claim 2, wherein the recognition module of the local recognition system and the recognition module of the remote recognition system are each configured to output: a transcription that is an estimate for the text of what the spoken query said; and a score associated with the respective transcription that measures confidence in the accuracy of the associated transcription.

9. A system for dual mode speech recognition, comprising: a local recognition system housed in a mobile device, including: a communication module configured for communicating with a user and other devices and for receiving a spoken query; a recognition module configured for recognizing and transcribing audio content; a control module; a client vocabulary configured to describe the words or phrases available to the recognition module; and a vocabulary updater module configured for updating the client vocabulary; a remote recognition system housed in a server, including: a recognition engine configured for recognizing and transcribing audio content; a vocabulary download module configured for providing updates to the vocabulary update module; wherein the control module of the local recognition system is configured to: in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within a latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score; in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and choosing the local recognition result as the final result; and in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and choosing the remote recognition result as the final result.
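The vocabulary-related claims (detecting words missing from the client vocabulary, and removing words based on priority, frequency, or recency) can be sketched as a small data structure. Everything concrete here is an assumption for illustration: the `ClientVocabulary` and `VocabEntry` names, the fixed `capacity`, whitespace tokenization, and the eviction order (lowest priority first, then lowest frequency, then least recently used). The claims only say that removal is based "at least on" those values. In the claimed method the client asks the remote system for the update, while this sketch models only the local bookkeeping.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VocabEntry:
    priority: int = 0       # importance of the word
    frequency: int = 0      # how often the word has been used
    last_used: float = 0.0  # timestamp of the most recent use


class ClientVocabulary:
    """Bounded word list with priority/frequency/recency-based removal (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # hypothetical size limit for the on-device vocabulary
        self.entries: Dict[str, VocabEntry] = {}

    def missing_words(self, transcription: str) -> List[str]:
        """Words in a remote transcription that the client vocabulary lacks."""
        return [w for w in transcription.lower().split() if w not in self.entries]

    def evict_one(self) -> str:
        """Remove the entry with the lowest (priority, frequency, last_used) tuple."""
        def rank(word: str):
            e = self.entries[word]
            return (e.priority, e.frequency, e.last_used)

        victim = min(self.entries, key=rank)
        del self.entries[victim]
        return victim

    def add(self, word: str, priority: int = 0, now: Optional[float] = None) -> None:
        """Insert or refresh a word, evicting one entry first if the vocabulary is full."""
        now = time.time() if now is None else now
        if word not in self.entries and len(self.entries) >= self.capacity:
            self.evict_one()
        entry = self.entries.setdefault(word, VocabEntry(priority=priority))
        entry.frequency += 1
        entry.last_used = now
```

A caller would typically run `missing_words` on each successful remote transcription and `add` whatever comes back. Ordering eviction by priority before frequency and recency is one possible policy among several consistent with the claim language.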

Assignees

Inventors

Classifications

  • Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title

  • G10L15/30 · Primary

    Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

  • Search algorithms, e.g. Baum-Welch or Viterbi · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8972263B2 cover?
A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subje…
Who is the assignee on this patent?
Stonehocker Timothy P, Mohajer Keyvan, Mont-Reynaud Bernard, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L15/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 3, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).