Method for microphone selection and multi-talker segmentation with ambient automated speech recognition (ASR)

US10424317B2 · US · B2

Patent metadata
Field                 Value
Publication number    US-10424317-B2
Application number    US-201715403481-A
Country               US
Kind code             B2
Filing date           Jan 11, 2017
Priority date         Sep 14, 2016
Publication date      Sep 24, 2019
Grant date            Sep 24, 2019

Abstract

Official abstract text for this publication.

Disclosed methods and systems are directed to determining a best microphone pair and segmenting sound signals. The methods and systems may include receiving a collection of sound signals comprising speech from one or more audio sources (e.g., meeting participants) and/or background noise. The methods and systems may include calculating a TDOA and determining, based on the TDOA and via robust statistics, the best pair of microphones. The methods and systems may also include segmenting sound signals from multiple sources.
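
The TDOA calculation for a single microphone pair can be illustrated with a minimal Python sketch. It uses plain time-domain cross-correlation, which is an assumption made here for illustration; the abstract does not state which estimator is used. The function name `estimate_tdoa`, the `max_lag` parameter, and the sample signals are hypothetical.

```python
def estimate_tdoa(sig_a, sig_b, sample_rate, max_lag):
    """Return the delay in seconds of sig_b relative to sig_a.

    Illustrative stand-in: picks the lag (in samples) that maximizes the
    time-domain cross-correlation of the two microphone signals.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Hypothetical example: a short burst reaches microphone B three samples after A.
burst = [0.0, 1.0, 0.5, -0.3, 0.0]
mic_a = burst + [0.0] * 11
mic_b = [0.0] * 3 + burst + [0.0] * 8
tdoa = estimate_tdoa(mic_a, mic_b, sample_rate=16000, max_lag=5)
```

A positive result means the sound reached microphone B later than microphone A. In practice one such value would be computed per short frame and per microphone pair, giving the TDOA sequences that the later steps operate on.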

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a plurality of audio signals, wherein each audio signal of the plurality of audio signals is received by one or more pairs of a plurality of microphones; determining a time delay of arrival (TDOA) for each audio signal corresponding to a difference in receipt time of the plurality of audio signals for the one or more pairs of the plurality of microphones; clustering the TDOAs to be associated with one of an audio source and interference, resulting in clustering information, wherein at least one TDOA is associated with the audio source and at least one TDOA is associated with the interference; and segmenting each audio signal of the plurality of audio signals received by the one or more pairs of the plurality of microphones using the clustering information resulting from clustering the TDOAs to identify the audio source.

2. The method of claim 1, wherein the plurality of microphones comprises at least three microphones, the method further comprising: performing the clustering for possible pairs of the at least three microphones, resulting in additional clustering information; generating, based on the additional clustering information, a confidence measure per possible pair of the at least three microphones; and selecting, based on the confidence measure, one of the possible pairs of microphones.

3. The method of claim 2, wherein the confidence measure is determined by: $CM_l = \max P(\theta_i \mid \tau_l)$, where $i = \{1, \ldots, N_{spk}\}$, and where $P(\theta_i \mid \tau_l)$ is determined based on a channel selection strategy.

4. The method of claim 1, wherein the clustering is performed using statistical models.

5. The method of claim 4 wherein the statistical models comprise a Gaussian mixture model (GMM).

6. The method of claim 5 wherein a plurality of input parameters to the GMM are determined at each small time analysis window of a plurality of small time analysis windows and wherein the GMM is determined by: $\arg\max_{\theta_v} \log \mathcal{L}(\theta_v \mid \tau_v)$, where $v = \left\{1, 2, \ldots, \left\lceil \frac{N_{TDOA} - (N_w - N_o)}{N_o} \right\rceil \right\}$, $\tau_v = \{\tau_{(v-1)\cdot(N_w)+1}, \tau_{(v-1)\cdot(N_w)+2}, \ldots, \tau_{(v-1)\cdot(N_w)+N_w}\}$, wherein $N_o$ comprises a number of overlapped frames, wherein $v$ represents one of the small time analysis windows, and wherein $N_w$ comprises a length of each small time analysis window
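
The clustering, segmentation, and pair-selection steps in claims 1 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the GMM is one-dimensional and fitted with a basic EM loop over the whole TDOA sequence (not per analysis window as in claim 6), each frame is assigned to its most probable component, and the pair with the highest confidence is selected. All function names, the initialization, the TDOA values (in samples), and the per-pair posteriors are hypothetical; the claims do not specify how $P(\theta_i \mid \tau_l)$ is obtained.

```python
import math


def posteriors(x, weights, means, variances):
    """P(component k | x) for a 1-D Gaussian mixture, computed in the log domain."""
    logs = [
        math.log(max(w, 1e-300)) - 0.5 * math.log(2.0 * math.pi * v)
        - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    probs = [math.exp(lg - top) for lg in logs]
    total = sum(probs)
    return [p / total for p in probs]


def fit_gmm_1d(values, n_components=2, n_iter=50, min_var=1e-4):
    """Fit a 1-D GMM with EM; means start evenly spread over the data range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / max(n_components - 1, 1)
    means = [lo + k * step for k in range(n_components)]
    variances = [max(((hi - lo) / (2.0 * n_components)) ** 2, min_var)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        resp = [posteriors(x, weights, means, variances) for x in values]  # E-step
        for k in range(n_components):  # M-step
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)
            weights[k] = nk / len(values)
    return weights, means, variances


def segment(labels):
    """Merge consecutive frames with the same label into (label, start, end) runs."""
    runs = []
    for i, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1] = (lab, runs[-1][1], i + 1)
        else:
            runs.append((lab, i, i + 1))
    return runs


# Hypothetical per-frame TDOAs (in samples) for one microphone pair.
tdoas = [3.0, 3.1, 2.9, 3.0, -2.0, -2.1, -1.9, 3.1, 2.9, -2.0]
weights, means, variances = fit_gmm_1d(tdoas)
labels = []
for x in tdoas:
    post = posteriors(x, weights, means, variances)
    labels.append(post.index(max(post)))
segments = segment(labels)

# Hypothetical posteriors per microphone pair; confidence is the maximum over sources.
pair_posteriors = {("m1", "m2"): [0.55, 0.45], ("m1", "m3"): [0.92, 0.08],
                   ("m2", "m3"): [0.70, 0.30]}
confidences = {pair: max(p) for pair, p in pair_posteriors.items()}
best_pair = max(confidences, key=confidences.get)
```

The mixture itself does not say which component is the audio source and which is interference; that mapping would need extra information, such as the expected direction of a talker. Choosing the pair with the largest confidence is one plausible reading of "selecting, based on the confidence measure" and is an assumption of this sketch.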

Assignees

Nuance Communications Inc

Inventors

Classifications

  • Speech recognition (G10L17/00 takes precedence)

  • for discriminating voice from noise

  • characterised by the type of extracted parameters

  • G10L17/06 (Primary)

    Decision making techniques; Pattern matching strategies

  • Hidden Markov models [HMM]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10424317B2 cover?
Disclosed methods and systems are directed to determining a best microphone pair and segmenting sound signals. The methods and systems may include receiving a collection of sound signals comprising speech from one or more audio sources (e.g., meeting participants) and/or background noise. The methods and systems may include calculating a TDOA and determining, based on the TDOA and via robust st…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 24, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).