US9916840B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9916840-B1 |
| Application number | US-201615370271-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 6, 2016 |
| Priority date | Dec 6, 2016 |
| Publication date | Mar 13, 2018 |
| Grant date | Mar 13, 2018 |
Official abstract text for this publication.
A technology for estimating a delay between a far-end audio signal and a near-end audio signal for acoustic echo cancellation is disclosed. A copy of the far-end signal is stored in a speaker buffer and organized in chunks, and a copy of the near-end signal is stored in a microphone buffer and organized in chunks. Cross correlation is performed on each pair of speaker chunks and microphone chunks based on β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”). A peak correlation value can be obtained for each pair of the chunks. Offset values corresponding to the peak correlation values are collected and clustered. A best cluster is selected and the offset value represented by the selected cluster is identified as the estimated delay. Acoustic echo cancellation can be performed on the near-end signal based on the estimated delay.
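The correlation step in the abstract can be illustrated with a small sketch. The excerpt does not give the β-PHAT formula, so the code below assumes the commonly used form in which the cross spectrum is divided by its own magnitude raised to the power β. The function names, the default `beta`, and the pure-Python FFT are illustrative choices, not taken from the patent; a practical implementation would likely use an optimized FFT library.

```python
import cmath
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two; the inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def gcc_beta_phat(speaker_chunk, mic_chunk, beta=0.8, eps=1e-12):
    """Return (offset, peak, corr) for one speaker/microphone chunk pair.

    offset is the lag in samples at which the correlation peaks; a positive
    value means the microphone chunk lags the speaker chunk.
    beta=0.8 is an assumed value (beta=0 gives plain cross correlation,
    beta=1 gives full PHAT whitening).
    """
    n = len(speaker_chunk)
    assert len(mic_chunk) == n
    # Zero-pad to a power of two >= 2n so circular correlation does not wrap.
    size = 1
    while size < 2 * n:
        size *= 2
    spk_f = fft(list(speaker_chunk) + [0.0] * (size - n))
    mic_f = fft(list(mic_chunk) + [0.0] * (size - n))
    # Cross spectrum, then the assumed beta-PHAT weighting.
    cross = [m * s.conjugate() for m, s in zip(mic_f, spk_f)]
    weighted = [c / (abs(c) ** beta + eps) for c in cross]
    corr = [v.real / size for v in fft(weighted, inverse=True)]
    # Index k is lag k in the first half and lag k - size in the second half.
    k = max(range(size), key=lambda i: corr[i])
    offset = k if k < size // 2 else k - size
    return offset, corr[k], corr


# Usage: a synthetic far-end chunk and a copy delayed by a known amount.
rng = random.Random(0)
far_end = [rng.gauss(0.0, 1.0) for _ in range(256)]
true_delay = 37
near_end = [0.0] * true_delay + far_end[:-true_delay]
estimated_offset, peak_value, _ = gcc_beta_phat(far_end, near_end)
```

Each chunk pair yields one candidate offset and its peak value; the abstract then collects these candidates across pairs and clusters them rather than trusting any single pair.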
Opening claim text (preview).
What is claimed is:
1. A system, comprising: one or more processors; at least one speaker; at least one microphone; and one or more non-transitory computer-readable storage media having instructions stored thereupon which are executable by the one or more processors and which, when executed, cause the system to: obtain a far-end audio signal to be played out through the at least one speaker, obtain a near-end audio signal captured through the at least one microphone, store the far-end audio signal in a speaker buffer and store the near-end audio signal in a microphone buffer, perform cross correlation between a chunk of data stored in the speaker buffer and a chunk of data stored in the microphone buffer using a β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”) to generate a set of cross correlation values, identify a peak correlation value in the set of cross correlation values and a corresponding offset value, add the offset value into a group of offset values, divide the group of offset values into a plurality of clusters and identify a cluster containing the most offset values as a best cluster, identify a delay value corresponding to the best cluster as an estimated delay, and cause echo cancellation to be performed on the near-end audio signal using the estimated delay.
2. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to determine that the peak correlation value is reliable by determining that the peak correlation value is higher than a peak correlation threshold, and wherein the offset value corresponding to the peak correlation value is added to the group of offset values in response to a determination that the peak correlation value is reliable.
3. The system of claim 2, wherein determining that the peak correlation value is reliable further comprises determining that a ratio of the peak correlation value to a second highest value in the set of cross correlation values is higher than a second correlation value threshold.
4. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to: determine that the best cluster is reliable by determining that a number of offset values in the cluster is higher than a size threshold, and that the highest peak value corresponding to the offset values in the cluster is higher than a quality threshold; in response to determining that the best cluster is not reliable, cause the echo cancellation to be performed on the near-end audio signal using a previously estimated delay; and in response to determining that the best cluster is reliable, identify the delay value corresponding to the best cluster.
5. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to: determine that the system has acoustic echo cancellation functionality based at least in part by determining that the best cluster is not reliable; and prevent an additional acoustic echo canceller from being implemented on the system.
6. A non-transitory computer-readable storage media having instructions stored thereupon that are executable by one or more processors and which, when executed, cause the one or more processors to: perform cross correlation between a chunk of data stored in a speaker buffer and a chunk of data stored in a microphone buffer to generate a set of cross correlation values, the speaker buffer configured to store far-end audio signals to be played back through a speaker, and the microphone buffer configured to store near-end audio signals captured through a microphone; identify a peak correlation value in the set of cross correlation values and a corresponding offset value, and add the offset value into a group of offset values; divide the group of offset values into a plurality of clusters and identify a cluster as a best cluster; identify a delay value corresponding to the best cluster as an estimated delay; and cause acoustic echo cancellation to be performed on the near-end audio signal using the estimated delay.
7. The computer-readable storage media of claim 6, wherein the cross correlation between the chunk of data stored in the speaker buffer and the chunk of data stored in the microphone buffer is performed by calculating a β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”).
8. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to determine that the peak correlation value is reliable by determining that the peak correlation value is higher than a peak correlation threshold, and wherein the offset value corresponding to the peak correlation value is added into the group of offset values in response to a determination that the peak correlation value is reliable.
9. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to: determine that the best cluster is reliable by determining that a number of offset values in the cluster is higher than a size threshold, and that the highest peak value corresponding to the offset values in the cluster is higher than a quality threshold; in response to determining that the best cluster is not reliable, cause the echo cancellation to be performed on the near-end audio signal using a previously estimated delay; and in response to determining that the best cluster is reliable, identify the delay value corresponding to the best cluster.
10. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to apply a filter on a plurality of previously estimated delays and the currently estimated delay.
11. The computer-readable storage media of claim 6, wherein the chunk of data stored in the speaker buffer comprises a segment of the far-end audio signal and a segment of a zero energy signal, and wherein the chunk of data stored in the microphone buffer comprises a segment of the near-end audio signal and a segment of the zero energy signal.
12. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to: detect that a periodic signal is present based on the set of cross correlation values; and cause the acoustic echo cancellation to be performed on the near-end audio signal using a previously estimated delay.
13. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to determine that data contained in the speaker buffer has enough activity by determining that an energy of the data in the speaker buffer is higher than an energy threshold, wherein the cross correlation is performed in response to determining that data contained in the speaker buffer has enough activity.
14. A computer-implemented method for estimating a delay between two audio signals, the method comprising: performing a cross correlation between a chunk of a first audio signal and a chunk of a second audio signal to generate a set of cross correlation values; identifying a peak correlation value in the set of cross correlation values and a corresponding offset value, and adding the offset value into a group of offset values obtained based on the first audio signal and the second audio signal; clustering the group of delay values into a plurality of clusters; selecting a c
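The reliability checks and the cluster selection recited in claims 1 to 4 (and mirrored in claims 8 and 9) can be sketched as follows. The claims do not say how the offsets are clustered or which value represents a cluster, so this sketch assumes a simple one-dimensional grouping of sorted offsets by gap and uses the cluster's lower median as its delay. All function names and threshold values here are hypothetical and chosen only for illustration.

```python
from statistics import median_low


def peak_is_reliable(corr, peak_threshold=0.2, ratio_threshold=1.5):
    """Claims 2-3: the peak must exceed a threshold and clearly dominate
    the second-highest correlation value. Threshold values are assumed."""
    ordered = sorted(corr, reverse=True)
    peak, second = ordered[0], ordered[1]
    if peak <= peak_threshold:
        return False
    # Assumption: a non-positive runner-up makes the ratio test pass trivially.
    return second <= 0 or peak / second > ratio_threshold


def cluster_offsets(candidates, tolerance=2):
    """Group (offset, peak) pairs whose sorted offsets are at most
    `tolerance` samples apart. The grouping rule is an assumption."""
    clusters = []
    for offset, peak in sorted(candidates):
        if clusters and offset - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((offset, peak))
        else:
            clusters.append([(offset, peak)])
    return clusters


def estimate_delay(candidates, previous_delay, size_threshold=3,
                   quality_threshold=0.5, tolerance=2):
    """Claims 1 and 4: pick the cluster with the most offsets; if it is too
    small or its best peak is too weak, keep the previously estimated delay."""
    if not candidates:
        return previous_delay
    best = max(cluster_offsets(candidates, tolerance), key=len)
    best_peak = max(peak for _, peak in best)
    if len(best) <= size_threshold or best_peak <= quality_threshold:
        return previous_delay
    return median_low(offset for offset, _ in best)


# Usage: (offset, peak) pairs collected from several chunk pairs.
collected = [(37, 0.9), (38, 0.8), (37, 0.7), (36, 0.6), (120, 0.95), (5, 0.3)]
delay = estimate_delay(collected, previous_delay=10)
```

Falling back to the previous delay when the best cluster is weak keeps a single spurious peak, such as the isolated offset of 120 above, from moving the estimate even though its correlation value is the highest.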
characterised by the method used for estimating noise · CPC title
the noise being echo, reverberation of the speech · CPC title
the extracted parameters being correlation coefficients · CPC title
for comparison or discrimination · CPC title
characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques · CPC title