What technology area does this patent fall under?

Primary CPC classification G10L17/06. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 24, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Method for microphone selection and multi-talker segmentation with ambient automated speech recognition (ASR)

Patent metadata
Field	Value
Publication number	US-10424317-B2
Application number	US-201715403481-A
Country	US
Kind code	B2
Filing date	Jan 11, 2017
Priority date	Sep 14, 2016
Publication date	Sep 24, 2019
Grant date	Sep 24, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed methods and systems are directed to determining a best microphone pair and segmenting sound signals. The methods and systems may include receiving a collection of sound signals comprising speech from one or more audio sources (e.g., meeting participants) and/or background noise. The methods and systems may include calculating a TDOA and determining, based on the TDOA and via robust statistics, the best pair of microphones. The methods and systems may also include segmenting sound signals from multiple sources.
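The abstract does not say how the TDOA is calculated. As a rough illustration only, the sketch below estimates the delay between two microphone signals by searching for the lag that maximizes a plain cross-correlation. The function name `estimate_tdoa`, the `max_lag` parameter, and the choice of plain cross-correlation are assumptions made for this example and are not taken from the patent (GCC-PHAT is a common alternative in practice).

```python
def estimate_tdoa(sig_a, sig_b, max_lag):
    """Return the lag, in samples, by which sig_b trails sig_a.

    Illustrative only: brute-force cross-correlation over lags in
    [-max_lag, max_lag]. A positive result means the sound reached
    microphone B later than microphone A.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += sig_a[t] * sig_b[u]
        # Keep the first lag that achieves the highest correlation score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical test signals: the same impulse, arriving 3 samples later at B.
mic_a = [0.0] * 16
mic_b = [0.0] * 16
mic_a[5] = 1.0
mic_b[8] = 1.0

lag_samples = estimate_tdoa(mic_a, mic_b, max_lag=6)  # 3
sample_rate_hz = 16000  # assumed sampling rate, for converting to seconds
tdoa_seconds = lag_samples / sample_rate_hz
```

One such delay estimate per analysis frame and per microphone pair would give the stream of TDOA values that the claims go on to cluster.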

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a plurality of audio signals, wherein each audio signal of the plurality of audio signals is received by one or more pairs of a plurality of microphones; determining a time delay of arrival (TDOA) for each audio signal corresponding to a difference in receipt time of the plurality of audio signals for the one or more pairs of the plurality of microphones; clustering the TDOAs to be associated with one of an audio source and interference, resulting in clustering information, wherein at least one TDOA is associated with the audio source and at least one TDOA is associated with the interference; and segmenting each audio signal of the plurality of audio signals received by the one or more pairs of the plurality of microphones using the clustering information resulting from clustering the TDOAs to identify the audio source.

2. The method of claim 1, wherein the plurality of microphones comprises at least three microphones, the method further comprising: performing the clustering for possible pairs of the at least three microphones, resulting in additional clustering information; generating, based on the additional clustering information, a confidence measure per possible pair of the at least three microphones; and selecting, based on the confidence measure, one of the possible pairs of microphones.

3. The method of claim 2, wherein the confidence measure is determined by: $\mathrm{CM}_l = \max P(\theta_i \mid \tau_l)$, where $i = \{1, \ldots, N_{spk}\}$, and where $P(\theta_i \mid \tau_l)$, is determined based on a channel selection strategy.

4. The method of claim 1, wherein the clustering is performed using statistical models.

5. The method of claim 4 wherein the statistical models comprise a Gaussian mixture model (GMM).

6. The method of claim 5 wherein a plurality of input parameters to the GMM are determined at each small time analysis window of a plurality of small time analysis windows and wherein the GMM is determined by:

$$\arg\max_{\theta_v} \log \mathcal{L}(\theta_v \mid \tau_v),$$

where

$$v = \left\{1, 2, \ldots, \left\lceil \frac{N_{TDOA} - (N_w - N_o)}{N_o} \right\rceil \right\},$$

$$\tau_v = \left\{\tau_{(v-1)\cdot(N_w)+1},\ \tau_{(v-1)\cdot(N_w)+2},\ \ldots,\ \tau_{(v-1)\cdot(N_w)+N_w}\right\},$$

wherein $N_o$ comprises a number of overlapped frames, wherein $v$ represents one of the small time analysis windows, and wherein $N_w$ comprises a length of each small time analysis window
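Claims 2 and 3 describe scoring each microphone pair with a confidence measure and then selecting a pair. The sketch below shows one way that step could look, under three assumptions that are not stated in the claim text: the maximum in claim 3 is taken over the candidate sources i, the posteriors P(θ_i | τ_l) are already available for each pair l, and the pair with the highest confidence measure is the one selected. The function names and the example numbers are hypothetical.

```python
def confidence_measure(posteriors):
    """CM_l for one microphone pair: the largest posterior over sources.

    `posteriors` holds one value P(theta_i | tau_l) per candidate
    source i = 1..N_spk (assumed to be computed elsewhere).
    """
    if not posteriors:
        raise ValueError("need at least one source posterior")
    return max(posteriors)


def select_pair(posteriors_by_pair):
    """Return (pair, CM) for the pair with the highest confidence measure.

    Choosing the highest CM is an assumption; the claim only says the
    selection is based on the confidence measure.
    """
    scored = {
        pair: confidence_measure(p) for pair, p in posteriors_by_pair.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Made-up posteriors for three microphones (three possible pairs)
# and two candidate sources, purely for illustration.
example = {
    ("mic0", "mic1"): [0.55, 0.45],
    ("mic0", "mic2"): [0.10, 0.90],
    ("mic1", "mic2"): [0.70, 0.30],
}
best_pair, best_cm = select_pair(example)
```

With these made-up numbers the pair ("mic0", "mic2") is selected, because its strongest source posterior (0.90) is the largest among the three pairs.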

Assignees

Nuance Communications Inc

Inventors

Classifications

G10L15/00
Speech recognition (G10L17/00 takes precedence) · CPC title
G10L25/84
for discriminating voice from noise · CPC title
G10L25/03
characterised by the type of extracted parameters · CPC title
G10L17/06 (Primary)
Decision making techniques; Pattern matching strategies · CPC title
G10L17/16
Hidden Markov models [HMM] · CPC title

Patent family

Related publications grouped by family.

View patent family 61560241

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10424317B2 cover?: Disclosed methods and systems are directed to determining a best microphone pair and segmenting sound signals. The methods and systems may include receiving a collection of sound signals comprising speech from one or more audio sources (e.g., meeting participants) and/or background noise. The methods and systems may include calculating a TDOA and determining, based on the TDOA and via robust st…
Who is the assignee on this patent?: Nuance Communications Inc