Multi-microphone audio source separation based on combined statistical angle distributions

US9131295B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9131295-B2
Application number: US-201213569092-A
Country: US
Kind code: B2
Filing date: Aug 7, 2012
Priority date: Aug 7, 2012
Publication date: Sep 8, 2015
Grant date: Sep 8, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer media for separating audio sources in a multi-microphone system are provided. A plurality of audio sample groups can be received. Each audio sample group comprises at least two samples of audio information captured by different microphones during a sample group time interval. For each audio sample group, an estimated angle between an audio source and the multi-microphone system can be estimated based on a phase difference of the samples in the group. The estimated angle can be modeled as a combined statistical distribution that is a mixture of a target audio signal statistical distribution and a noise component statistical distribution. The combined statistical distribution can be analyzed to provide an accurate characterization of each sample group as either target audio signal or noise. The target audio signal can then be resynthesized from samples identified as part of the target audio signal.
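The angle-estimation step in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes a far-field source, a single frequency bin per sample pair, an angle measured from the line perpendicular to the microphone axis, and a sign convention in which a positive angle means the sound reaches the first microphone first. The names `estimate_angle`, `simulate_pair`, and `SPEED_OF_SOUND_M_S` are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def estimate_angle(x1, x2, freq_hz, mic_spacing_m):
    """Estimate the arrival angle (radians) from the inter-microphone phase difference.

    x1, x2: complex spectral coefficients of the same frequency bin, one per microphone.
    Assumes a far-field source; the phase is unambiguous only below
    SPEED_OF_SOUND_M_S / (2 * mic_spacing_m) Hz.
    """
    # Phase of x1 relative to x2, wrapped to [-pi, pi].
    phase_diff = cmath.phase(x1 * x2.conjugate())
    # Far-field model: phase_diff = 2*pi*f*d*sin(theta) / c, solved for sin(theta).
    sin_theta = SPEED_OF_SOUND_M_S * phase_diff / (2.0 * math.pi * freq_hz * mic_spacing_m)
    # Noise can push the ratio slightly outside [-1, 1]; clamp before asin.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.asin(sin_theta)


def simulate_pair(angle_rad, freq_hz, mic_spacing_m):
    """Build an ideal, noise-free coefficient pair for a source at angle_rad."""
    # The second microphone receives the wavefront delay_s seconds after the first.
    delay_s = mic_spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND_M_S
    x1 = complex(1.0, 0.0)
    x2 = cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return x1, x2


if __name__ == "__main__":
    x1, x2 = simulate_pair(math.radians(30.0), 1000.0, 0.04)
    print(math.degrees(estimate_angle(x1, x2, 1000.0, 0.04)))
```

In a real recording the coefficients would come from a short-time transform of each microphone signal, and the per-pair angle estimates would then be fed to the statistical model described in the abstract.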

First claim

Opening claim text (preview).

We claim:

1. One or more computer-readable memory or storage devices storing instructions that, when executed by a computing device having a processor, perform a method of separating audio sources in a multi-microphone system, the method comprising: receiving audio sample groups, with an audio sample group comprising at least two samples of audio information, the at least two samples captured by different microphones during a sample group time interval; and for a plurality of audio sample groups: estimating, for the corresponding sample group time interval, an angle between a first reference line extending from an audio source to the multi-microphone system and a second reference line extending through the multi-microphone system, the estimated angle being based on a phase difference between the at least two samples in the audio sample group; modeling the estimated angle as a combined statistical distribution, the combined statistical distribution being a mixture of a target audio signal statistical distribution and a noise component statistical distribution; and determining whether the audio sample group is part of a target audio signal or a noise component based at least in part on the combined statistical distribution.

2. The one or more computer-readable memory or storage devices of claim 1, further comprising resynthesizing a target audio signal from the audio sample groups determined to be part of the target audio signal.

3. The one or more computer-readable memory or storage devices of claim 1, wherein the multi-microphone system is a two-microphone system, and wherein the audio sample groups are audio sample pairs.

4. The one or more computer-readable memory or storage devices of claim 1, wherein determining whether the audio sample group is part of the target audio signal or the noise component comprises comparing the combined statistical distribution to a fixed threshold.

5. The one or more computer-readable memory or storage devices of claim 1, wherein determining whether the audio sample group is part of the target audio signal or the noise component comprises performing statistical analysis.

6. The one or more computer-readable memory or storage devices of claim 5, wherein the statistical analysis comprises hypothesis testing.

7. The one or more computer-readable memory or storage devices of claim 6, wherein the hypothesis testing is maximum a posteriori (MAP) hypothesis testing.

8. The one or more computer-readable memory or storage devices of claim 6, wherein the hypothesis testing is maximum likelihood testing.

9. The one or more computer-readable memory or storage devices of claim 1, wherein the target audio signal statistical distribution and the noise component statistical distribution are von Mises distributions.

10. The one or more computer-readable memory or storage devices of claim 1, wherein the combined statistical distribution is represented by the equation f_T(θ) = c_0[m] f_0(θ) + c_1[m] f_1(θ), where m is a sample group index, f_0(θ) is a noise component distribution, f_1(θ) is a target audio signal distribution, c_0[m] and c_1[m] are mixture coefficients, and c_0[m] + c_1[m] = 1.

11. The one or more computer-readable memory or storage devices of claim 1, wherein parameters for the combined statistical distribution are obtained using an expectation maximization (EM) algorithm.

12. The one or more computer-readable memory or storage devices of claim 1, wherein an initial threshold for distinguishing target audio signal from noise component is a pre-determined fixed value.

13. The one or more computer-readable memory or storage devices of claim 1, wherein the second reference line is perpendicular to a third reference line extending between the first and second microphones, and wherein the first reference line and the second reference line intersect at the approximate midpoint of the third reference line.

14. The one or more computer-readable memory or storage devices of claim 1, wherein the sample group time intervals are about approximately between 50 and 125 milliseconds.

15. A multi-microphone mobile device having audio source-separation capabilities, the mobile device comprising: a first microphone; a second microphone; a processor; an angle estimator configured to, by the processor, for a sample pair time interval, estimate an angle between a first reference line extending from an audio source to the mobile device and a second reference line extending through the mobile device, the estimated angle being based on a phase difference between a first sample and a second sample in an audio sample pair captured during the sample pair time interval, wherein the first sample is captured by the first microphone and the second sample is captured by the second microphone; a combined statistical modeler configured to model the estimated angle as a combined statistical distribution, the combined statistical distribution being a mixture of a target audio signal statistical distribution and a noise component statistical distribution; and a sample classifier configured to determine whether the audio sample pair is part of a target audio signal or a noise component based at least in part on the combined statistical distribution.

16. The multi-microphone mobile device of claim 15, wherein the mobile device is a mobile phone.

17. The multi-microphone mobile device of claim 15, wherein the sample classifier is further configured to determine whether the audio sample pair is part of the target audio signal or the noise component by performing statistical analysis.

18. The multi-microphone mobile device of claim 17, wherein the statistical analysis comprises at least one of maximum a posteriori (MAP) hypothesis testing or maximum likelihood testing.

19. The multi-microphone mobile device of claim 15, wherein the sample classifier is further configured to determine whether the audio sample pair is part of the target audio signal or the noise component by comparing the combined statistical distribution to a fixed threshold.

20. The multi-microphone mobile device of claim 15, wherein the second reference line is perpendicular to a third reference line extending between the first and second microphones, and wherein the first reference line and the second reference line intersect at an approximate midpoint of the third reference line.

21. The multi-microphone mobile device of claim 15, wherein the target audio signal statistical distribution and the noise component statistical distribution are von Mises distributions, and wherein the combined statistical modeler is further configured to determine parameters for the combined statistical distribution using an expectation maximization (EM) algorithm.

22. A method of providing a target audio signal through audio source separation in a two-microphone system, the method comprising: receiving audio sample pairs, with an audio sample pair comprising a first sample of audio information captured by a first microphone during a sample pair time interval and a second sample of audio information captured by a second microphone during the sample pair time interval; for a plurality of audio sample pairs: estimating, for the corresponding sample pair time interval, an angle between a first reference line extending from an audio source to the two-microphone system and a second reference line extending through the two-microphone system, the estimated angle being based on a phase difference between the first and second sa…
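The mixture in claim 10 and the MAP and maximum likelihood tests in claims 7 and 8 can be sketched as follows. This is an illustration under assumptions, not the patented method: it uses the textbook von Mises density on the full circle (the parameterization used in the patent is not shown on this page), the parameter values are invented, and fitting them with EM (claim 11) is omitted. All names (`MixtureParams`, `bessel_i0`, `von_mises_pdf`, `mixture_pdf`, `posterior_target`, `classify_map`, `classify_ml`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MixtureParams:
    c0: float       # noise mixture coefficient, c0 + c1 = 1
    c1: float       # target mixture coefficient
    mu0: float      # noise mean angle (radians)
    kappa0: float   # noise concentration
    mu1: float      # target mean angle (radians)
    kappa1: float   # target concentration


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, by power series."""
    term, total, k = 1.0, 1.0, 0
    while term * term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) / k      # term = (x/2)^k / k!
        total += term * term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Standard von Mises density on the circle."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_pdf(theta, p):
    """f_T(theta) = c0 * f0(theta) + c1 * f1(theta), as in claim 10."""
    return (p.c0 * von_mises_pdf(theta, p.mu0, p.kappa0)
            + p.c1 * von_mises_pdf(theta, p.mu1, p.kappa1))


def posterior_target(theta, p):
    """Posterior probability that an angle estimate came from the target."""
    return p.c1 * von_mises_pdf(theta, p.mu1, p.kappa1) / mixture_pdf(theta, p)


def classify_map(theta, p):
    """MAP test: target if its weighted likelihood exceeds the noise's."""
    return posterior_target(theta, p) > 0.5


def classify_ml(theta, p):
    """Maximum likelihood test: ignores the mixture coefficients."""
    return von_mises_pdf(theta, p.mu1, p.kappa1) > von_mises_pdf(theta, p.mu0, p.kappa0)


if __name__ == "__main__":
    # Invented values: target concentrated at 0 rad, broader noise near 60 degrees.
    params = MixtureParams(c0=0.5, c1=0.5, mu0=math.radians(60.0), kappa0=2.0,
                           mu1=0.0, kappa1=8.0)
    for deg in (0.0, 20.0, 60.0):
        print(deg, classify_map(math.radians(deg), params))
```

Sample groups classified as target would form a binary mask; resynthesis (claim 2) would then rebuild the signal from the retained groups only.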

Assignees

Inventors

Classifications

  • H04R3/005 · Primary

    for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • Voice signal separating · CPC title

  • Public address systems (circuits for preventing acoustic reaction H04R3/02; circuits for distributing signals to loudspeakers H04R3/12; {monitoring or testing arrangements for public address systems H04R29/007}; amplifiers H03F) · CPC title

  • Digital PA systems using, e.g. LAN or internet · CPC title

  • Signal processing in [PA] systems to enhance the speech intelligibility · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9131295B2 cover?
Systems, methods, and computer media for separating audio sources in a multi-microphone system are provided. A plurality of audio sample groups can be received. Each audio sample group comprises at least two samples of audio information captured by different microphones during a sample group time interval. For each audio sample group, an estimated angle between an audio source and the multi-mic…
Who is the assignee on this patent?
Kim Chanwoo, Khawand Charbel, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification H04R3/005. Mapped technology areas include Electricity.
When was this patent published?
Publication date Sep 8, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).