What technology area does this patent fall under?

Primary CPC classification G10L15/34. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 9, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations found in our corpus, or other publications sharing the same primary CPC).

Multi-microphone speech recognition systems and related techniques

US9865265B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9865265-B2
Application number	US-201514732715-A
Country	US
Kind code	B2
Filing date	Jun 6, 2015
Priority date	Jun 6, 2015
Publication date	Jan 9, 2018
Grant date	Jan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
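The abstract says beamforming can turn the microphone-array output into independent streams across several look directions, but it does not say which beamformer is used. Purely as an illustration, here is a minimal delay-and-sum sketch. It assumes a uniform linear array, a far-field source, whole-sample delays and a speed of sound of 343 m/s. The function names (steering_delays, delay_and_sum, beam_streams) are hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def steering_delays(num_mics: int, spacing_m: float, fs: int, look_deg: float) -> List[int]:
    """Per-mic arrival delay (whole samples) relative to mic 0 for a far-field
    plane wave from look_deg on a uniform linear array (0 deg = broadside).
    Positive angles mean the wavefront reaches higher-index mics later."""
    per_mic = spacing_m * math.sin(math.radians(look_deg)) * fs / SPEED_OF_SOUND_M_S
    return [round(m * per_mic) for m in range(num_mics)]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Advance each channel by its steering delay, then average across mics.
    Samples shifted outside a channel's range are treated as zero."""
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for x, k in zip(channels, delays):
            if 0 <= t + k < n:
                acc += x[t + k]
        out.append(acc / len(channels))
    return out


def beam_streams(channels: Sequence[Sequence[float]], spacing_m: float, fs: int,
                 look_dirs_deg: Sequence[float]) -> Dict[float, List[float]]:
    """Return one beamformed stream (one 'representation') per look direction."""
    return {
        deg: delay_and_sum(channels, steering_delays(len(channels), spacing_m, fs, deg))
        for deg in look_dirs_deg
    }


def energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


# Usage: 4 mics, 1 kHz tone arriving from +30 deg. With 4.29 cm spacing at 16 kHz
# the inter-mic delay at 30 deg rounds to exactly 1 sample.
fs, spacing = 16000, 0.0429
src = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(400)]
mics = [[src[t - m] if t >= m else 0.0 for t in range(400)] for m in range(4)]
streams = beam_streams(mics, spacing, fs, [-30.0, 0.0, 30.0])
best = max(streams, key=lambda deg: energy(streams[deg]))
print(best)  # 30.0
```

Picking the loudest beam is only a sanity check here. In the arrangement the abstract describes, every stream would be passed on to the recognizer instead of being discarded.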

First claim

Opening claim text (preview).

We currently claim:

1. A speech recognition system for resolving far-field utterances, comprising: an acoustic appliance comprising a processor, a memory and a communication connection to communicate with one or more spatially distributed acoustic appliances, wherein the memory stores instructions which, when executed by the processor, cause the system to concurrently receive over the communication connection a plurality of representations of an utterance observed by the one or more spatially distributed acoustic appliances; determine a highest-probability representation of the utterance based on the plurality of utterance representations; and determine a most-likely transcription corresponding to the highest-probability representation of the utterance.

2. The speech recognition system according to claim 1, wherein the plurality of representations of the utterance comprises a concatenation of utterance representations and corresponding posterior probabilities.

3. The speech recognition system according to claim 1, wherein each of the utterance representations has an associated posterior probability, and wherein the highest-probability representation of the utterance is further based in part on a combination of the plurality of posterior probabilities corresponding to the utterance representations.

4. The speech recognition system according to claim 1, wherein utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio are more heavily weighted to determine the highest-probability representation of the utterance.

5. The speech recognition system according to claim 1, wherein the memory further contains a recognition parameter store, wherein the instructions, when executed by the processor, further cause the system to combine one or more recognition parameters from the recognition parameter store with the highest-probability representations of the utterance.

6. The speech recognition system according to claim 5, wherein the plurality of representations of the utterance comprises a plurality of acoustic features, each having a corresponding posterior probability, and wherein the instructions, when executed by the processor, further cause the system to identify the acoustic features having a highest-probability of correctly representing the utterance, and to combine the one or more recognition parameters from the recognition parameter store with the highest probability acoustic features.

7. The speech recognition system of claim 6, wherein the recognition parameter store comprises an acoustic feature dictionary, a language model, or both.

8. The speech recognition system of claim 7, wherein the plurality of utterance representations comprises a plurality of phonemes and the acoustic feature dictionary comprises a phonetic dictionary.

9. The speech recognition system according to claim 1, wherein each representation of the utterance comprises one or more respective acoustic features and corresponding posterior probabilities, and wherein the instructions, when executed by the processor, further cause the system to aggregate the plurality of streams and corresponding posterior probabilities; and to select from the aggregated plurality of streams those acoustic features most likely to accurately reflect the utterance.

10. The speech recognition system according to claim 9, wherein the acoustic appliance comprises a first acoustic appliance, the system further comprising a second acoustic appliance to extract acoustic features from an acoustic signal received by the-second appliance and to stream the extracted acoustic features over the communication connection to the first acoustic appliance.

11. The speech recognition system according to claim 10, wherein the first appliance and/or the second appliance comprises a near-field acoustic-feature extractor, a far-field acoustic-feature extractor, or both.

12. The speech recognition system according to claim 10, wherein the first appliance is configured to synchronize the plurality of received streams of acoustic features and associated posterior probabilities.

13. A speech-recognition method, comprising: over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio; selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance; determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and responsive to the most-likely transcription of the utterance, invoking one or more instructions.

14. The speech recognition method according to claim 13, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.

15. The speech recognition method according to claim 14, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.

16. The speech recognition method according to claim 13, further comprising combining one or more recognition parameters from a recognition parameter store with the highest-probability utterance representation.

17. A non-transitory, computer-readable media containing instructions that, when executed by a processor, cause a computing environment to perform a speech recognition method comprising: over a communication connection, receiving from a plurality of spatially distributed audio appliances a corresponding plurality of representations of an utterance observed by the acoustic appliances, wherein each audio appliance has one or more microphone transducers and associated circuitry to convert observed audio to an acoustic signal representative of the audio; selecting a highest-probability representation of the utterance based on the plurality of representations of the utterance; determining a most-likely transcription of the utterance in correspondence to the highest-probability representation of the utterance; and responsive to the most-likely transcription of the utterance, invoking one or more instructions.

18. The non-transitory, computer-readable media according to claim 17, wherein each of the utterance representations has an associated posterior probability, and wherein the act of selecting the highest-probability representation comprises combining the plurality of posterior probabilities corresponding to the utterance representations.

19. The non-transitory, computer-readable media according to claim 18, wherein the act of selecting the highest-probability representation further comprises more heavily weighting utterance representations having relatively higher signal-to-noise ratio as compared to utterance representations having relatively lower signal-to-noise ratio.
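Claims 3, 4, 14 and 15 describe combining the posterior probabilities of the different utterance representations, with higher-SNR representations weighted more heavily, but they give no formula. The sketch below shows one plausible reading, offered only as an assumption: a frame-synchronous weighted average of per-class posteriors, with each stream's weight proportional to its linear-power SNR. It assumes the streams are already time-aligned (compare claim 12) and share one class inventory. The greedy per-frame argmax is a simplification that stands in for the phonetic dictionary and language model of claims 5 to 8. All function names are hypothetical.

```python
from typing import List, Sequence

Posteriors = List[List[float]]  # [frame][class] probabilities from one stream


def snr_weights(snrs_db: Sequence[float]) -> List[float]:
    """Convert per-stream SNR estimates (dB) to weights that sum to 1.
    Linear power SNR is used, so a stream 10 dB cleaner gets 10x the weight."""
    linear = [10.0 ** (snr / 10.0) for snr in snrs_db]
    total = sum(linear)
    return [v / total for v in linear]


def fuse_posteriors(streams: Sequence[Posteriors], snrs_db: Sequence[float]) -> Posteriors:
    """Frame-by-frame weighted average of the streams' class posteriors.
    Assumes all streams are time-aligned and have the same shape."""
    weights = snr_weights(snrs_db)
    num_frames = len(streams[0])
    num_classes = len(streams[0][0])
    return [
        [sum(w * s[t][k] for w, s in zip(weights, streams)) for k in range(num_classes)]
        for t in range(num_frames)
    ]


def best_path(fused: Posteriors, labels: Sequence[str]) -> List[str]:
    """Greedy per-frame argmax; a stand-in for a real decoder."""
    return [labels[max(range(len(frame)), key=frame.__getitem__)] for frame in fused]


# Usage: a 20 dB stream and a 0 dB stream that disagree on every frame
clean = [[0.6, 0.4], [0.3, 0.7], [0.6, 0.4]]
noisy = [[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
labels = ["a", "b"]
print(best_path(fuse_posteriors([clean, noisy], [20.0, 0.0]), labels))  # ['a', 'b', 'a']
print(best_path(fuse_posteriors([clean, noisy], [0.0, 0.0]), labels))   # ['b', 'a', 'b']
```

With equal weights the noisy stream's more confident posteriors decide every frame. With SNR weighting the 20 dB stream dominates, which is the behaviour claims 4 and 15 describe. A log-linear combination of posteriors would be an equally reasonable choice.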

Assignees

Apple Inc

Inventors

Classifications

G10L15/20
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
G10L2021/02166
Microphone arrays; Beamforming · CPC title
G10L2015/022
Demisyllables, biphones or triphones being the recognition units · CPC title
G10L15/34 (primary)
Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title
G10L15/16
using artificial neural networks · CPC title
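The primary class, G10L15/34, covers adapting a single recognizer for parallel processing. This matches the abstract's description of decoding every representation of the utterance concurrently and then letting a selector choose the most likely transcription. The sketch below illustrates that arrangement under stated assumptions: the same recognizer turns each representation into an n-best list of (transcription, log-likelihood) pairs, and scores from different streams are directly comparable. recognize_all, select_most_likely and fake_recognizer are hypothetical names. The thread pool stands in for whatever multi-processor or cloud setup a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (transcription, log-likelihood score)
Recognizer = Callable[[Any, int], List[Candidate]]


def recognize_all(representations: Sequence[Any], recognizer: Recognizer,
                  n_best: int = 3) -> List[List[Candidate]]:
    """Decode every representation concurrently with the same recognizer,
    returning one n-best list per representation, in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda rep: recognizer(rep, n_best), representations))


def select_most_likely(nbest_lists: Sequence[List[Candidate]]) -> Candidate:
    """Selector: the single highest-scoring candidate across all n-best lists.
    Assumes scores from different streams are on a comparable scale."""
    return max((c for nbest in nbest_lists for c in nbest), key=lambda c: c[1])


# Usage with a hypothetical recognizer that returns canned n-best lists
CANNED = {
    "beam_-30": [("turn of the lights", -12.0)],
    "beam_0": [("turn off the lights", -7.5), ("turn of the lights", -9.1)],
    "beam_30": [("turn off the light", -8.2)],
}


def fake_recognizer(rep: str, n_best: int) -> List[Candidate]:
    return CANNED[rep][:n_best]


candidates = recognize_all(list(CANNED), fake_recognizer, n_best=2)
print(select_most_likely(candidates))  # ('turn off the lights', -7.5)
```

A plain maximum over raw scores is the simplest possible selector. Normalizing scores per stream, or combining stream posteriors before decoding as the claims describe, would be natural refinements.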

Patent family

Related publications grouped by family.

Family ID 57451986

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9865265B2 cover?: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a …
Who is the assignee on this patent?: Apple Inc