US10939201B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10939201-B2 |
| Application number | US-201313775073-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 22, 2013 |
| Priority date | Feb 22, 2013 |
| Publication date | Mar 2, 2021 |
| Grant date | Mar 2, 2021 |
Official abstract text for this publication.
A method for sound source localization in a digital system having at least two audio capture devices is provided that includes receiving audio signals from the two audio capture devices, computing a signal-to-noise ratio (SNR) for each frequency band of a plurality of frequency bands in a processing frame of the audio signals, determining a frequency band weight for each frequency band of the plurality of frequency bands based on the SNR computed for the frequency band, computing an estimated time delay of arrival (TDOA) of sound for the processing frame using the frequency band weights, and converting the estimated TDOA to an angle representing sound direction.
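The last step in the abstract, converting the estimated TDOA to an angle, can be illustrated with the common far-field two-microphone model, sin(θ) = c·τ / d. This is a minimal sketch under stated assumptions: the abstract does not give the conversion formula, and the function name `tdoa_to_angle_deg`, the microphone spacing parameter, and the speed-of-sound constant are all hypothetical choices made for illustration.

```python
import math

# Assumption: speed of sound in dry air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def tdoa_to_angle_deg(tdoa_s, mic_spacing_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Convert a time delay of arrival (seconds) to an angle in degrees.

    Uses the far-field model sin(theta) = c * tau / d, where theta is
    measured from broadside (the direction perpendicular to the line
    joining the two microphones).
    """
    ratio = speed_of_sound * tdoa_s / mic_spacing_m
    # Estimation noise can push the ratio slightly outside [-1, 1],
    # where asin is undefined, so clamp it first.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

With a hypothetical 10 cm spacing, a delay of half the maximum physical delay maps to 30 degrees, and any delay at or beyond the maximum saturates at 90 degrees.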
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; setting, with the one or more processors, a value of a frequency band weight for a corresponding one of the frequency bands to one when the SNR computed for the corresponding frequency band indicates sufficient signal power in the corresponding frequency band to meet a threshold for contribution to a sound direction estimate; setting, with the one or more processors, the value of the frequency band weight for the corresponding frequency band to zero when the SNR computed for the corresponding frequency band does not indicate sufficient signal power in the corresponding frequency band to meet a threshold for contribution to a sound direction estimate; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; computing, with the one or more processors, an estimated time delay of arrival (TDOA) of sound for the processing frame using the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction.
2. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands in the processing frame of the audio signals; determining, with the one or more processors, a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; obtaining, with the one or more processors, an estimated time delay of arrival (TDOA) objective function based on the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; applying, with the one or more processors, an adaptive inter-frame filter to the TDOA objective function to obtain a filtered TDOA objective function; computing, with the one or more processors, an estimated TDOA based on the filtered TDOA objective function; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction, wherein coefficients of the adaptive inter-frame filter are respective signal powers of a plurality of processing frames preceding the processing frame.
3. A method comprising: receiving, with one or more processors, audio signals from two audio capture devices; converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; computing, with the one or more processors, a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; determining, with the one or more processors, a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; determining, with the one or more processors, a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band; up-sampling, with the one or more processors, the weighted GCC-PHAT value for each of the frequency bands by inserting zeroes in a spectral representation of the weighted GCC-PHAT value for each of the frequency bands; converting, with the one or more processors, the up-sampled weighted GCC-PHAT value for each of the frequency bands into a time domain; determining, with the one or more processors, a time delay of arrival TDOA objective function for the processing frame of the audio signals based on the time domain up-sampled weighted GCC-PHAT value for each of the frequency bands; applying, with the one or more processors, an adaptive inter-frame filter to the TDOA objective function to obtain a filtered TDOA objective function, wherein coefficients of the adaptive inter-frame filter are respective signal powers of a plurality of processing frames preceding the processing frame; computing, with the one or more processors, an estimated TDOA based on the filtered TDOA objective function; and converting, with the one or more processors, the estimated TDOA to an angle representing sound direction.
4. A digital system comprising: two audio capture devices for capturing audio signals; means for converting, with the one or more processors, the audio signals into a processing frame by: splitting the audio signals into overlapping blocks; applying a windowing function to the overlapping blocks; and storing the offset between the windowed blocks as the processing frame; means for dividing the audio signals, in the processing frame, into multiple specified continuous frequency bands, each of the frequency bands including multiple frequency components; means for computing a generalized cross-correlation with phase transform (GCC-PHAT) and a signal-to-noise ratio (SNR) for each of the frequency bands; means for determining a frequency band weight for each of the frequency bands based on the SNR computed for the frequency band; means for determining a weighted GCC-PHAT value for each of the frequency bands based on the GCC-PHAT for the respective frequency band and the frequency band weight for the respective frequency band;
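The per-frame steps recited in claim 1 (band SNR, binary band weights, weighted GCC-PHAT, up-sampling by spectral zero insertion, and peak picking) can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation. The function name `weighted_gcc_phat_tdoa`, the Hann window, the equal-width band split, the 6 dB threshold, and the externally supplied per-bin noise estimate `noise_power` are all assumptions that do not appear in the claims. The sketch handles one already-extracted frame, sums the weighted bands before a single inverse transform (equivalent, by linearity, to summing per-band time-domain results), and omits the adaptive inter-frame filter of claims 2 and 3.

```python
import numpy as np


def weighted_gcc_phat_tdoa(x1, x2, fs, noise_power, max_tdoa_s,
                           n_bands=8, snr_threshold_db=6.0, upsample=4):
    """Estimate the delay (seconds) of x2 relative to x1 for one frame.

    noise_power is an assumed per-bin noise power estimate with
    len(x1) // 2 + 1 entries. Returns (tdoa_or_None, band_weights).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    n = len(x1)
    window = np.hanning(n)
    X1 = np.fft.rfft(x1 * window)
    X2 = np.fft.rfft(x2 * window)

    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross = np.conj(X1) * X2
    phat = cross / np.maximum(np.abs(cross), 1e-12)

    # Split the bins into contiguous bands and give each band a weight
    # of one or zero depending on its SNR.
    frame_power = 0.5 * (np.abs(X1) ** 2 + np.abs(X2) ** 2)
    weights = np.zeros(n_bands)
    weighted = np.zeros_like(phat)
    for b, idx in enumerate(np.array_split(np.arange(len(phat)), n_bands)):
        snr = frame_power[idx].sum() / max(noise_power[idx].sum(), 1e-12)
        snr_db = 10.0 * np.log10(snr + 1e-12)
        if snr_db >= snr_threshold_db:
            weights[b] = 1.0
            weighted[idx] = phat[idx]

    # No trusted band: this sketch reports no estimate (its own choice).
    if not weights.any():
        return None, weights

    # Up-sample by zero insertion in the spectrum: irfft pads the
    # spectrum with zeros when asked for a longer output.
    n_up = upsample * n
    cc = np.fft.irfft(weighted, n=n_up)

    # Search for the peak only within the physically possible lags.
    max_lag = int(np.ceil(max_tdoa_s * fs * upsample))
    max_lag = max(1, min(max_lag, n_up // 2 - 1))
    search = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = int(np.argmax(search)) - max_lag
    return lag / (fs * upsample), weights
```

A positive result means the second channel lags the first. Restricting the search to `max_tdoa_s` (for example, microphone spacing divided by the speed of sound) is a design choice of this sketch that keeps spurious far-off peaks out of the estimate.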
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title
Conference systems · CPC title
Determination of the location of a subscriber · CPC title
Synergistic effects of band splitting and sub-band processing · CPC title