Robust estimation of sound source localization

US10939201B2 · US · B2

Patent metadata
Publication number: US-10939201-B2
Application number: US-201313775073-A
Country: US
Kind code: B2
Filing date: Feb 22, 2013
Priority date: Feb 22, 2013
Publication date: Mar 2, 2021
Grant date: Mar 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for sound source localization in a digital system having at least two audio capture devices is provided that includes receiving audio signals from the two audio capture devices, computing a signal-to-noise ratio (SNR) for each frequency band of a plurality of frequency bands in a processing frame of the audio signals, determining a frequency band weight for each frequency band of the plurality of frequency bands based on the SNR computed for the frequency band, computing an estimated time delay of arrival (TDOA) of sound for the processing frame using the frequency band weights, and converting the estimated TDOA to an angle representing sound direction.
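The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a known noise-power estimate, a fixed SNR threshold in dB, binary band weights, a naive DFT in place of an FFT, a speed of sound of 343 m/s, and a far-field arcsine mapping from delay to angle. All function names and parameters are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def dft(x):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_weights(spec_a, spec_b, bands, noise_power, snr_threshold_db):
    """One weight per band: 1.0 if the band SNR meets the threshold, else 0.0."""
    weights = []
    for lo, hi in bands:
        # Mean power per bin, averaged over both microphones.
        power = sum(abs(spec_a[k]) ** 2 + abs(spec_b[k]) ** 2
                    for k in range(lo, hi)) / (2 * (hi - lo))
        snr_db = 10.0 * math.log10(max(power, 1e-12) / noise_power)
        weights.append(1.0 if snr_db >= snr_threshold_db else 0.0)
    return weights


def estimate_tdoa_samples(frame_a, frame_b, bands, noise_power,
                          snr_threshold_db, max_lag):
    """Delay of frame_b relative to frame_a, in samples.

    bands holds (lo, hi) bin ranges with 1 <= lo < hi <= len(frame_a) // 2.
    """
    n = len(frame_a)
    spec_a, spec_b = dft(frame_a), dft(frame_b)
    weights = band_weights(spec_a, spec_b, bands, noise_power, snr_threshold_db)
    cross = [0j] * n
    for (lo, hi), w in zip(bands, weights):
        for k in range(lo, hi):
            c = spec_b[k] * spec_a[k].conjugate()
            if w > 0.0 and abs(c) > 1e-12:
                cross[k] = w * c / abs(c)            # PHAT: keep phase only
                cross[n - k] = cross[k].conjugate()  # mirror bin keeps output real

    def objective(lag):
        # Inverse DFT of the weighted cross-spectrum, evaluated at one lag.
        return sum(cross[k] * cmath.exp(2j * math.pi * k * lag / n)
                   for k in range(n)).real

    return max(range(-max_lag, max_lag + 1), key=objective)


def tdoa_to_angle_deg(tdoa_seconds, mic_spacing_m):
    """Far-field angle from broadside, in degrees, for a two-microphone pair."""
    ratio = SPEED_OF_SOUND * tdoa_seconds / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))


# Usage: an impulse and a copy delayed by 3 samples, 16 kHz, 10 cm spacing.
frame_a = [1.0] + [0.0] * 63
frame_b = [0.0] * 3 + [1.0] + [0.0] * 60
lag = estimate_tdoa_samples(frame_a, frame_b, [(1, 8), (8, 16), (16, 32)],
                            noise_power=0.01, snr_threshold_db=10.0, max_lag=8)
angle = tdoa_to_angle_deg(lag / 16000.0, 0.1)
```

In this usage example every band passes the SNR threshold, so the estimated lag is the 3-sample delay that was applied. Bands that fail the threshold would simply contribute nothing to the objective function.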

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; setting, with the one or more processors, a value of a frequency band weight for a corresponding one of the frequency bands to one when the SNR computed for the corresponding frequency band indicates sufficient signal power in the corresponding frequency band to meet a threshold for contribution to a sound direction estimate; setting, with the one or more processors, the value of the frequency band weight for the corresponding frequency band to zero when the SNR computed for the corresponding frequency band does not indicate sufficient signal power in the corresponding frequency band to meet a threshold for contribution to a sound direction estimate; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; computing, with the one or more processors, an estimated time delay of arrival (TDOA) of sound for the processing frame using the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction.

2. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands in the processing frame of the audio signals; determining, with the one or more processors, a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; obtaining, with the one or more processors, an estimated time delay of arrival (TDOA) objective function based on the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; applying, with the one or more processors, an adaptive inter-frame filter to the TDOA objective function to obtain a filtered TDOA objective function; computing, with the one or more processors, an estimated TDOA based on the filtered TDOA objective function; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction, wherein coefficients of the adaptive inter-frame filter are respective signal powers of a plurality of processing frames preceding the processing frame.

3. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; determining, with the one or more processors, a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; determining, with the one or more processors, a time delay of arrival TDOA objective function for the processing frame of the audio signals based on the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; applying, with the one or more processors, an adaptive inter-frame filter to the TDOA objective function to obtain a filtered TDOA objective function, wherein coefficients of the adaptive inter-frame filter are respective signal powers of a plurality of processing frames preceding the processing frame; computing, with the one or more processors, an estimated TDOA based on the filtered TDOA objective function; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction.

4. A digital system comprising: two audio capture devices for capturing audio signals; means for converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; means for dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; means for computing a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; means for determining a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; means for determining a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band;
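Three steps recited in the claims (overlapping windowed blocks, up-sampling by inserting zeroes in the spectrum, and an inter-frame filter whose coefficients are frame signal powers) can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: the Hann window, the normalization by total power, and every function name are hypothetical choices, and the spectrum passed to the up-sampler is assumed to have a zero Nyquist bin.

```python
import cmath
import math


def split_into_windowed_blocks(signal, block_len, hop):
    """Overlapping blocks (hop < block_len) with an assumed Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / block_len)
              for i in range(block_len)]
    return [[signal[start + i] * window[i] for i in range(block_len)]
            for start in range(0, len(signal) - block_len + 1, hop)]


def upsample_spectrum(spectrum, factor):
    """Insert zeros between the positive- and negative-frequency halves.

    Assumes an even-length DFT spectrum whose Nyquist bin is zero.
    """
    n = len(spectrum)
    half = n // 2
    return spectrum[:half] + [0j] * (n * (factor - 1)) + spectrum[half:]


def inverse_dft_real(spectrum):
    """Naive inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def inter_frame_filter(objectives, frame_powers):
    """Power-weighted average of per-frame TDOA objective functions.

    objectives: one list of objective values per frame, all the same length.
    frame_powers: signal power of each of those frames, used as coefficients.
    Dividing by the total power is an assumption made for this sketch.
    """
    total = sum(frame_powers)
    length = len(objectives[0])
    if total <= 0.0:
        return [0.0] * length
    return [sum(p * obj[i] for p, obj in zip(frame_powers, objectives)) / total
            for i in range(length)]
```

Inserting zeroes in the middle of the spectrum and then transforming back yields a time-domain objective function sampled on a finer lag grid, so the peak location can be resolved to a fraction of the original sample period. Weighting earlier frames by their signal power lets loud, reliable frames dominate the smoothed objective function.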

Assignees

Texas Instruments Inc

Inventors

Classifications

  • H04R3/005 · Primary

    for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title

  • Conference systems · CPC title

  • Determination of the location of a subscriber · CPC title

  • Synergistic effects of band splitting and sub-band processing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10939201B2 cover?
A method for sound source localization in a digital system having at least two audio capture devices is provided that includes receiving audio signals from the two audio capture devices, computing a signal-to-noise ratio (SNR) for each frequency band of a plurality of frequency bands in a processing frame of the audio signals, determining a frequency band weight for each frequency band of the p…
Who is the assignee on this patent?
Texas Instruments Inc
What technology area does this patent fall under?
Primary CPC classification H04R3/005. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 2, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).