Multiple speech locale-specific hotword classifiers for selection of a speech locale
US-9589564-B2 · Mar 7, 2017 · US
US9865265B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9865265-B2 |
| Application number | US-201514732715-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 6, 2015 |
| Priority date | Jun 6, 2015 |
| Publication date | Jan 9, 2018 |
| Grant date | Jan 9, 2018 |
Abstract
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
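The abstract's example, in which beamforming turns microphone-array output into independent streams of the utterance across several look directions, can be illustrated with a classic delay-and-sum beamformer. This is a minimal sketch, not the patent's implementation. The uniform linear array geometry, the 5 cm spacing, the 16 kHz sample rate, the integer-sample delays, and the helper names `steering_delays`, `delay_and_sum`, and `look_direction_streams` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical array parameters (assumptions, not from the patent):
# a uniform linear array with SPACING_M metres between adjacent mics.
SPEED_OF_SOUND_M_S = 343.0
SPACING_M = 0.05
SAMPLE_RATE_HZ = 16000


def steering_delays(num_mics: int, look_deg: float) -> List[int]:
    """Per-mic arrival delays (in whole samples) of a plane wave from look_deg.

    look_deg is measured from broadside (0 degrees = straight ahead).
    Delays are shifted so the smallest one is zero.
    """
    tau = SPACING_M * math.sin(math.radians(look_deg)) / SPEED_OF_SOUND_M_S
    raw = [round(m * tau * SAMPLE_RATE_HZ) for m in range(num_mics)]
    base = min(raw)
    return [d - base for d in raw]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its delay, then average across channels.

    Sound from the steered direction adds coherently; sound from other
    directions is partially cancelled.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    out: List[float] = []
    for n in range(max(length, 0)):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out


def look_direction_streams(channels: Sequence[Sequence[float]],
                           look_angles_deg: Sequence[float]) -> Dict[float, List[float]]:
    """Produce one beamformed stream (representation) per look direction."""
    return {
        angle: delay_and_sum(channels, steering_delays(len(channels), angle))
        for angle in look_angles_deg
    }
```

Under this reading, each entry of the returned dictionary would be passed to the recognition engine as one representation of the utterance. Integer delays keep the sketch short; practical beamformers usually apply fractional delays or frequency-domain weights.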
We currently claim:
1. A speech recognition system for resolving far-field utterances, comprising: an acoustic appliance comprising a processor, a memory and a communication connection to communicate with one or more spatially distributed acoustic appliances, wherein the memory stores instructions which, when executed by the processor, cause the system to concurrently receive over the communication connection a plurality of representations of an utterance observed by the one or more spatially distributed acoustic appliances; determine a highest-probability representation of the utterance based on the plurality of utterance representations; and determine a most-likely transcription corresponding to the highest-probability representation of the utterance.
2. The speech recognition system according to claim 1, wherein the plurality of representations of the utterance comprises a concatenation of utterance representations and corresponding posterior probabilities.
3. The speech recognition system according to claim 1, wherein each of the utterance representations has an associated posterior probability, and wherein the highest-probability representation of the utterance is further based in part on a combination of the plurality of posterior probabilities corresponding to the utterance representations.
4. The speech recognition system according to claim 1, wherein utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio are more heavily weighted to determine the highest-probability representation of the utterance.
5. The speech recognition system according to claim 1, wherein the memory further contains a recognition parameter store, wherein the instructions, when executed by the processor, further cause the system to combine one or more recognition parameters from the recognition parameter store with the highest-probability representations of the utterance.
6. The speech recognition system according to claim 5, wherein the plurality of representations of the utterance comprises a plurality of acoustic features, each having a corresponding posterior probability, and wherein the instructions, when executed by the processor, further cause the system to identify the acoustic features having a highest-probability of correctly representing the utterance, and to combine the one or more recognition parameters from the recognition parameter store with the highest probability acoustic features.
7. The speech recognition system of claim 6, wherein the recognition parameter store comprises an acoustic feature dictionary, a language model, or both.
8. The speech recognition system of claim 7, wherein the plurality of utterance representations comprises a plurality of phonemes and the acoustic feature dictionary comprises a phonetic dictionary.
9. The speech recognition system according to claim 1, wherein each representation of the utterance comprises one or more respective acoustic features and corresponding posterior probabilities, and wherein the instructions, when executed by the processor, further cause the system to aggregate the plurality of streams and corresponding posterior probabilities; and to select from the aggregated plurality of streams those acoustic features most likely to accurately reflect the utterance.
10. The speech recognition system according to claim 9, wherein the acoustic appliance comprises a first acoustic appliance, the system further comprising a second acoustic appliance to extract acoustic features from an acoustic signal received by the second appliance and to stream the extracted acoustic features over the communication connection to the first acoustic appliance.
11. The speech recognition system according to claim 10, wherein the first appliance and/or the second appliance comprises a near-field acoustic-feature extractor, a far-field acoustic-feature extractor, or both.
12. The speech recognition system according to claim 10, wherein the first appliance is configured to synchronize the plurality of received streams of acoustic features and associated posterior probabilities.
13. A speech-recognition method, comprising: over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio; selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance; determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and responsive to the most-likely transcription of the utterance, invoking one or more instructions.
14. The speech recognition method according to claim 13, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
15. The speech recognition method according to claim 14, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
16. The speech recognition method according to claim 13, further comprising combining one or more recognition parameters from a recognition parameter store with the highest-probability utterance representation.
17. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising: over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio; selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance; determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and responsive to the most-likely transcription of the utterance, invoking one or more instructions.
18. The non-transitory, computer-readable media according to claim 17, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.
19. The non-transitory, computer-readable media according to claim 18, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
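Claims 3, 4, 14, and 15 recite combining the posterior probabilities of several representations while weighting higher-SNR representations more heavily, but they do not give a formula. One plausible reading is sketched below as frame-level fusion of phoneme posteriors. The weighting rule (weights proportional to linear-scale SNR), the linear mixing of posteriors, the assumption that the streams are already frame-synchronized (compare claim 12), and the function names are hypothetical choices for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# One frame of one stream: phoneme label -> posterior probability.
Posterior = Dict[str, float]


def snr_weights(snr_db: Sequence[float]) -> List[float]:
    """Normalised weights proportional to linear-scale SNR (assumed rule).

    A stream at 20 dB gets 100x the weight of a stream at 0 dB.
    """
    linear = [10.0 ** (s / 10.0) for s in snr_db]
    total = sum(linear)
    return [x / total for x in linear]


def fuse_frame(frames: Sequence[Posterior], weights: Sequence[float]) -> Posterior:
    """Weighted sum of per-stream posteriors for a single time frame."""
    fused: Posterior = {}
    for post, w in zip(frames, weights):
        for label, p in post.items():
            fused[label] = fused.get(label, 0.0) + w * p
    return fused


def fuse_streams(streams: Sequence[Sequence[Posterior]],
                 snr_db: Sequence[float]) -> List[Tuple[str, float]]:
    """For each frame, keep the label with the highest fused posterior.

    Assumes frame t of every stream covers the same instant, i.e. the
    streams have already been synchronised.
    """
    weights = snr_weights(snr_db)
    num_frames = min(len(s) for s in streams)
    best: List[Tuple[str, float]] = []
    for t in range(num_frames):
        fused = fuse_frame([s[t] for s in streams], weights)
        label = max(fused, key=lambda lab: fused[lab])
        best.append((label, fused[label]))
    return best
```

For example, if a clean stream favors "k" at 0.6 and a noisy stream favors "g" at 0.9, equal SNRs let the noisy stream win ("g" at 0.65), whereas giving the clean stream 20 dB against 0 dB makes "k" the fused choice.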
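Claims 5 through 8 add a recognition parameter store, such as a phonetic dictionary and a language model, that is combined with the highest-probability acoustic features. The toy sketch below assumes those features arrive as a sequence of frame-level phoneme labels. The three-word dictionary, the unigram probabilities, the edit-distance scoring, and the `penalty` weight are all invented for illustration and are not taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical recognition parameter store: a tiny phonetic dictionary and
# a unigram language model. Real systems use far larger resources.
PHONETIC_DICTIONARY: Dict[str, Tuple[str, ...]] = {
    "call": ("k", "ao", "l"),
    "gall": ("g", "ao", "l"),
    "cold": ("k", "ow", "l", "d"),
}
UNIGRAM_LM: Dict[str, float] = {"call": 0.6, "gall": 0.1, "cold": 0.3}


def collapse(frame_labels: Sequence[str]) -> List[str]:
    """Merge runs of identical frame labels into one phoneme each.

    Simplification: this also merges genuinely repeated phonemes.
    """
    phones: List[str] = []
    for label in frame_labels:
        if not phones or phones[-1] != label:
            phones.append(label)
    return phones


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def best_word(frame_labels: Sequence[str], penalty: float = 2.0) -> str:
    """Pick the word maximising log P_LM(word) - penalty * phoneme mismatch."""
    phones = collapse(frame_labels)

    def score(word: str) -> float:
        mismatch = edit_distance(phones, PHONETIC_DICTIONARY[word])
        return math.log(UNIGRAM_LM[word]) - penalty * mismatch

    return max(PHONETIC_DICTIONARY, key=score)
```

With the default penalty, a clean /g ao l/ sequence selects "gall" despite its lower prior; halving the penalty to 1.0 lets the language model pull the decision back to "call". The `penalty` parameter thus plays the role of an acoustic-versus-language-model weight.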
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
Microphone arrays; Beamforming · CPC title
Demisyllables, biphones or triphones being the recognition units · CPC title
Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title
using artificial neural networks · CPC title