Multi-Microphone Speech Recognition Systems and Related Techniques

US2016358606A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016358606-A1
Application number: US-201514732711-A
Country: US
Kind code: A1
Filing date: Jun 6, 2015
Priority date: Jun 6, 2015
Publication date: Dec 8, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
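
The abstract's last sentence mentions beamforming across several look directions. The sketch below illustrates one common way to do that, integer-sample delay-and-sum on a linear array. It is not taken from the patent: the function names, the far-field plane-wave model, the linear geometry, and the 343 m/s speed of sound are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def arrival_offsets(mic_positions_m, look_angle_deg, sample_rate_hz):
    """Arrival time of a far-field plane wave at each microphone, in whole
    samples relative to the array origin.

    Assumes a linear array along one axis; the angle is measured from
    broadside, positive toward the positive end of the axis.
    """
    theta = math.radians(look_angle_deg)
    return [
        round(-x * math.sin(theta) / SPEED_OF_SOUND_M_S * sample_rate_hz)
        for x in mic_positions_m
    ]


def delay_and_sum(channels, mic_positions_m, look_angle_deg, sample_rate_hz):
    """Steer the array toward one look direction.

    Each channel is delayed so that sound from the look direction lines up
    across microphones, then the aligned channels are averaged.
    """
    offsets = arrival_offsets(mic_positions_m, look_angle_deg, sample_rate_hz)
    latest = max(offsets)
    delays = [latest - offset for offset in offsets]  # all non-negative
    length = len(channels[0])
    output = []
    for t in range(length):
        total = 0.0
        for channel, delay in zip(channels, delays):
            if t - delay >= 0:  # samples before the start count as silence
                total += channel[t - delay]
        output.append(total / len(channels))
    return output


def beamform_streams(channels, mic_positions_m, look_angles_deg, sample_rate_hz):
    """Return one beamformed stream per look direction, keyed by angle."""
    return {
        angle: delay_and_sum(channels, mic_positions_m, angle, sample_rate_hz)
        for angle in look_angles_deg
    }
```

Each returned stream could then serve as one "representation of the utterance" handed to the recognition engine. Fractional delays, frequency-domain weighting, and adaptive beamformers are deliberately left out to keep the sketch short.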

First claim

Opening claim text (preview).

1. A speech recognition system for resolving impaired utterances, comprising: a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance; and a selector configured to determine a most-likely accurate transcription from among the plurality of highest-probability transcription.

2. The speech recognition system according to claim 1, wherein the speech recognition engine comprises a plurality of constituent recognizers, each constituent recognizer being configured to receive a respective one of the plurality of representations of the utterance and to determine one or more corresponding transcription candidates corresponding to the received representation of the utterance, and, for each transcription candidate, to determine an associated likelihood of being an accurate transcription of the utterance.

3. The speech recognition system according to claim 2, wherein each constituent recognizer generates intermediate transcription information corresponding to each respective transcription candidate determined by the respective constituent recognizer, wherein the recognition system further comprises a parameter updater configured to compare intermediate transcription information from among the constituent recognizers and to revise at least one intermediate transcription information responsive to a measure of quality of the at least one intermediate transcription information relative to a selected threshold measure of quality.

4. (canceled)

5. The speech recognition system according to claim 1, wherein each of the highest-likelihood transcription candidates has a corresponding likelihood of being an accurate transcription of the utterance, and wherein the selector is configured to determine the most-likely accurate transcription based on a comparison of the corresponding likelihoods.

6. The speech recognition system according to claim 5, wherein the selector is further configured to select a transcription candidate having a largest likelihood among the plurality of transcription candidates, a transcription candidate having a largest net likelihood among the plurality of transcription candidates, a transcription candidate having a largest frequency of being a highest-likelihood transcription candidate, or a transcription candidate having a highest cumulative rank order, wherein the rank order for each transcription candidate corresponding to a given representation corresponds to a relative likelihood of the respective transcription candidate compared to a likelihood of each other transcription candidate corresponding to the given utterance.

7. The speech recognition system according to claim 1, wherein the speech recognition engine comprises a plurality of constituent recognizers, each constituent recognizer being configured to produce a corresponding ordered list of highest-likelihood transcription candidates and a likelihood measure for each respective transcription candidate.

8. The speech recognition system according to claim 1, wherein the speech recognition engine comprises a plurality of constituent recognizers, each having a plurality of recognition stages, wherein each recognition stage is configured to extract a plurality of component candidates and intermediate transcription information corresponding to each component candidate.

9. The speech recognition system according to claim 8, further comprising an updater configured to revise one or more selected component candidates based on a comparison of the intermediate transcription information corresponding to the one or more selected component candidates in relation to the intermediate transcription information corresponding to the other component candidates.

10. (canceled)

11. (canceled)

12. (canceled)

13. (canceled)

14. A speech recognition method comprising: concurrently determining a plurality of highest-likelihood transcription candidates corresponding to each of a plurality of representations of an utterance; and selecting a most-likely accurate transcription from among the transcription candidates corresponding to the plurality of representations of the utterance.

15. The speech recognition method according to claim 14, wherein the act of concurrently determining one or more highest-likelihood transcription candidates corresponding to each of the plurality of representations of the utterance comprises concurrently processing each representation of the utterance with a respective different constituent recognizer to determine one or more corresponding highest-likelihood transcription candidates and an associated likelihood of being an accurate transcription for each transcription candidate.

16. The speech recognition method according to claim 14, wherein each of the highest-likelihood transcription candidates has an associated likelihood of being an accurate transcription of the utterance, and the act of selecting the most-likely accurate transcription comprises selecting from among the transcription candidates a transcription candidate having a largest likelihood among the plurality of transcription candidates, a transcription candidate having a largest net likelihood among the plurality of transcription candidates, a transcription candidate having a largest frequency of being a highest likelihood transcription candidate from each representation of the utterance, or a transcription candidate having a highest cumulative rank order among the representations of the utterance, wherein a rank order for each transcription candidate corresponding to a given representation corresponds to the relative likelihood of the respective transcription candidate compared to a likelihood of each of the other transcription candidates corresponding to the given utterance.

17. The speech recognition method according to claim 14, wherein the act of concurrently determining a plurality of highest-likelihood transcription candidates comprises producing with each of a plurality of constituent recognizers an ordered list of highest-likelihood transcription candidates and a corresponding likelihood measure for each respective transcription candidate.

18. The speech recognition method according to claim 17, wherein each constituent recognizer has a plurality of recognition stages, wherein each recognition stage is configured to extract a plurality of component candidates and intermediate transcription information corresponding to each component candidate, the method further comprising revising one or more selected component candidates based on a comparison of the intermediate transcription information corresponding to the one or more selected component candidates in relation to intermediate transcription information corresponding to the other component candidates.

19. The speech recognition method according to claim 14, further comprising generating the plurality of representations of the utterance from an output signal from each of a plurality of microphone transducers, wherein the plurality of representations comprises a selected one or more of a recording from each in the plurality of microphone transducers, a linear combination of the output signals from the plurality of microphone transducers, a plurality of linear combinations of the output signals from the plurality of microphone transducers, a filtered version of the output signal from at least one of the microphone tr
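
Claims 6 and 16 name four alternative selection rules. The following sketch shows one possible reading of each, operating on per-representation N-best lists of `(text, likelihood)` pairs. The function names are hypothetical, and two interpretations are assumptions rather than anything the claim text fixes: "net likelihood" is read as the sum of a candidate's likelihoods across representations, and "cumulative rank order" is read as a Borda-style rank sum.

```python
from collections import Counter, defaultdict


def select_max_likelihood(nbest_lists):
    """Candidate with the single largest likelihood in any list."""
    all_candidates = [cand for nbest in nbest_lists for cand in nbest]
    return max(all_candidates, key=lambda cand: cand[1])[0]


def select_net_likelihood(nbest_lists):
    """Candidate with the largest summed likelihood (assumed meaning of 'net')."""
    totals = defaultdict(float)
    for nbest in nbest_lists:
        for text, likelihood in nbest:
            totals[text] += likelihood
    return max(totals, key=totals.get)


def select_top1_frequency(nbest_lists):
    """Candidate that is most often the top choice of a representation."""
    votes = Counter(
        max(nbest, key=lambda cand: cand[1])[0] for nbest in nbest_lists if nbest
    )
    return votes.most_common(1)[0][0]


def select_cumulative_rank(nbest_lists):
    """Candidate with the best rank sum (Borda-style scoring, an assumption).

    In each list the top candidate earns len(list) points, the next one
    point fewer, and so on; absent candidates earn nothing.
    """
    scores = defaultdict(int)
    for nbest in nbest_lists:
        ranked = sorted(nbest, key=lambda cand: cand[1], reverse=True)
        for rank, (text, _) in enumerate(ranked):
            scores[text] += len(ranked) - rank
    return max(scores, key=scores.get)


# Made-up N-best lists for four representations of one utterance.
EXAMPLE_NBEST = [
    [("turn on the lights", 0.55), ("turn on the light", 0.30), ("turn off the lights", 0.15)],
    [("turn on the light", 0.50), ("turn on the lights", 0.45), ("burn on the lights", 0.05)],
    [("turn off the lights", 0.90), ("turn on the light", 0.06), ("turn on the lights", 0.04)],
    [("turn on the light", 0.48), ("turn on the lights", 0.47), ("turn off the lights", 0.05)],
]
```

On the made-up data, the rules disagree: the largest single likelihood picks "turn off the lights", the summed likelihood picks "turn on the lights", and both the top-choice frequency and the rank sum pick "turn on the light". The sketch does not handle ties explicitly; `max` simply returns the first candidate encountered.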

Assignees

Apple Inc

Inventors

Classifications

  • G10L15/32 (Primary)

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Demisyllables, biphones or triphones being the recognition units · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • G10L15/26 (Primary)

    Speech to text systems (G10L15/08 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016358606A1 cover?
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a …
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 8, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).