Method and system for voice recognition employing multiple voice-recognition techniques

US9570076B2 · US · B2

Patent metadata
Publication number: US-9570076-B2
Application number: US-201313774398-A
Country: US
Kind code: B2
Filing date: Feb 22, 2013
Priority date: Oct 30, 2012
Publication date: Feb 14, 2017
Grant date: Feb 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for voice recognition are disclosed. In one example embodiment, the method includes receiving voice input information by way of a receiver on a mobile device and performing, by way of at least one processing device on the mobile device, first and second processing operations respectively with respect to first and second voice input portions, respectively, which respectively correspond to and are based at least indirectly upon different respective portions of the voice input information. The first processing operation includes a speech-to-text operation and the second processing operation includes an alternate processing operation. Additionally, the method includes generating recognized voice information based at least indirectly upon results from the first and second processing operations, and performing at least one action based at least in part upon the recognized voice information, where the at least one action includes outputting at least one signal by an output device.

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising:
    receiving audio data that encodes an utterance;
    obtaining, as a result of performing speech-to-text voice recognition on the audio data, a first transcription of the utterance;
    segmenting the first transcription into two or more discrete terms;
    determining that a first particular term from among the two or more discrete terms is included among a predefined set of terms that are associated with a word spotting process that involves determining whether an acoustic fingerprint of a given portion of audio data is an acoustic match with one or more given terms without performing speech-to-text voice recognition;
    determining that the two or more discrete terms other than the first particular term are included among an additional predefined set of terms that are associated with the predefined set of terms that are associated with a word spotting process;
    in response to determining that the two or more discrete terms other than the first particular term are included among the additional predefined set of terms that are associated with the predefined set of terms that are associated with the word spotting process, obtaining, as a result of performing the word spotting process on a portion of the audio data that corresponds to a second particular term from among the two or more discrete terms other than the first particular term without re-performing speech-to-text voice recognition on the portion of the audio data, an indication that an acoustic fingerprint associated with the portion of the audio data that corresponds to the second particular term is an acoustic match with one or more terms of the predefined set of terms that are associated with the word spotting process;
    obtaining, as a result of re-performing speech-to-text voice recognition on a portion of the audio data that does not correspond to the second particular term, a second transcription of the utterance using the portion of the audio data that does not correspond to the second particular term;
    generating a third transcription of the utterance based at least on (i) the second transcription of the utterance that was obtained as a result of re-performing speech-to-text voice recognition on the portion of the audio data that does not correspond to the second particular term, and (ii) the one or more terms of the predefined set of terms that are indicated, as a result of performing the word spotting process on the portion of the audio data that corresponds to the second particular term without re-performing speech-to-text voice recognition of the audio data, as an acoustic match with the portion of the audio data that corresponds to the second particular term; and
    providing the third transcription of the utterance for output.

2. The method of claim 1, wherein segmenting the first transcription into two or more discrete terms is based on a grammar structure of the first transcription.

3. The method of claim 1, wherein the predefined set of terms that are associated with the word spotting process includes terms entered by a user.

4. The method of claim 1, wherein the first transcription and the second transcription are obtained using different speech-to-text algorithms.

5. The method of claim 1, wherein providing the third transcription of the utterance for output comprises: providing the third transcription of the utterance to an application.

6. The method of claim 1, wherein generating the third transcription comprises: concatenating (i) the second transcription of the utterance that was obtained as a result of re-performing speech-to-text voice recognition on the portion of the audio data that does not correspond to the second particular term, and (ii) the one or more terms of the predefined set of terms that are indicated, as a result of performing the word spotting process on the portion of the audio data that corresponds to the second particular term without re-performing speech-to-text voice recognition of the audio data, as an acoustic match with the portion of the audio data that corresponds to the second particular term.

7. The method of claim 1, comprising: performing the word spotting process on a portion of the audio data that corresponds to the second particular term.

8. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
    receiving audio data that encodes an utterance;
    obtaining, as a result of performing speech-to-text voice recognition on the audio data, a first transcription of the utterance;
    segmenting the first transcription into two or more discrete terms;
    determining that a first particular term from among the two or more discrete terms is included among a predefined set of terms that are associated with a word spotting process that involves determining whether an acoustic fingerprint of a given portion of audio data is an acoustic match with one or more given terms without performing speech-to-text voice recognition;
    determining that the two or more discrete terms other than the first particular term are included among an additional predefined set of terms that are associated with the predefined set of terms that are associated with a word spotting process;
    in response to determining that the two or more discrete terms other than the first particular term are included among the additional predefined set of terms that are associated with the predefined set of terms that are associated with the word spotting process, obtaining, as a result of performing the word spotting process on a portion of the audio data that corresponds to a second particular term from among the two or more discrete terms other than the first particular term without re-performing speech-to-text voice recognition on the portion of the audio data, an indication that an acoustic fingerprint associated with the portion of the audio data that corresponds to the second particular term is an acoustic match with one or more terms of the predefined set of terms that are associated with the word spotting process;
    obtaining, as a result of re-performing speech-to-text voice recognition on a portion of the audio data that does not correspond to the second particular term, a second transcription of the utterance using the portion of the audio data that does not correspond to the second particular term;
    generating a third transcription of the utterance based at least on (i) the second transcription of the utterance that was obtained as a result of re-performing speech-to-text voice recognition on the portion of the audio data that does not correspond to the second particular term, and (ii) the one or more terms of the predefined set of terms that are indicated, as a result of performing the word spotting process on the portion of the audio data that corresponds to the second particular term without re-performing speech-to-text voice recognition of the audio data, as an acoustic match with the portion of the audio data that corresponds to the second particular term; and
    providing the third transcription of the utterance for output.

9. The system of claim 8, wherein segmenting the first transcription into two or more discrete terms is based on a grammar structure of the first transcription.

10. The system of claim 8, wherein the predefined set of terms that are associated with the word spotting process includes terms entered by a user.

11. The system of claim 8, wherein the first transcription and the second transcription are obtained using different speech-to-text algorithms.

12. Th
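
The flow recited in claim 1 can be illustrated with a loose Python sketch. This is not the patented implementation and does not define the claim's legal scope. All names here (`recognize`, `fake_stt`, `fake_word_spot`, the chunk labels) are hypothetical. The sketch assumes a toy audio model with one opaque chunk per spoken term, assumes the last term is the one handed to word spotting, and omits the claim's check of the remaining terms against an additional predefined set.

```python
from typing import Callable, Optional, Sequence, Set

# Toy stand-in for audio: one opaque chunk per spoken term (an assumption;
# real audio would need time alignment between terms and samples).
Audio = Sequence[str]


def recognize(
    audio: Audio,
    stt: Callable[[Audio], str],
    word_spot: Callable[[str, Set[str]], Optional[str]],
    trigger_terms: Set[str],
    spot_vocabulary: Set[str],
) -> str:
    first = stt(audio)      # first transcription of the whole utterance
    terms = first.split()   # segment it into discrete terms
    # Fall back to plain speech-to-text unless the first term is a trigger
    # term and the toy one-chunk-per-term alignment holds.
    if len(terms) != len(audio) or len(terms) < 2 or terms[0] not in trigger_terms:
        return first
    # Hypothetical rule: the last term's audio goes to word spotting,
    # which matches acoustically instead of running speech-to-text.
    spotted = word_spot(audio[-1], spot_vocabulary)
    if spotted is None:
        return first
    second = stt(audio[:-1])        # re-run speech-to-text on the remaining audio
    return f"{second} {spotted}"    # third transcription, by concatenation (claim 6)


def fake_stt(audio: Audio) -> str:
    # Hypothetical recognizer that mishears the contact name.
    guesses = {"a_call": "call", "a_zbigniew": "big"}
    return " ".join(guesses.get(chunk, "?") for chunk in audio)


def fake_word_spot(chunk: str, vocabulary: Set[str]) -> Optional[str]:
    # Hypothetical acoustic matcher: a lookup table stands in for fingerprints.
    fingerprints = {"a_zbigniew": "Zbigniew"}
    match = fingerprints.get(chunk)
    return match if match in vocabulary else None


result = recognize(
    ["a_call", "a_zbigniew"], fake_stt, fake_word_spot,
    trigger_terms={"call"}, spot_vocabulary={"Zbigniew", "Marta"},
)
print(result)
```

In this toy run the first pass mishears the name, the trigger term "call" routes the name's audio to the word-spotting stub, and the output combines the re-run transcription with the spotted term. Passing the recognizers in as callables keeps the sketch independent of any particular speech library.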

Assignees

Google Technology Holdings LLC

Inventors

Classifications

  • with voice recognition means · CPC title

  • using natural language modelling · CPC title

  • Electricity · mapped topic

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9570076B2 cover?
A method and system for voice recognition are disclosed. In one example embodiment, the method includes receiving voice input information by way of a receiver on a mobile device and performing, by way of at least one processing device on the mobile device, first and second processing operations respectively with respect to first and second voice input portions, respectively, which respectively …
Who is the assignee on this patent?
Google Technology Holdings LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).