Delay estimation for acoustic echo cancellation

US9916840B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9916840-B1
Application number  US-201615370271-A
Country             US
Kind code           B1
Filing date         Dec 6, 2016
Priority date       Dec 6, 2016
Publication date    Mar 13, 2018
Grant date          Mar 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technology for estimating a delay between a far-end audio signal and a near-end audio signal for acoustic echo cancellation is disclosed. A copy of the far-end signal is stored in a speaker buffer and organized in chunks, and a copy of the near-end signal is stored in a microphone buffer and organized in chunks. Cross correlation is performed on each pair of speaker chunks and microphone chunks based on β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”). A peak correlation value can be obtained for each pair of the chunks. Offset values corresponding to the peak correlation values are collected and clustered. A best cluster is selected and the offset value represented by the selected cluster is identified as the estimated delay. Acoustic echo cancellation can be performed on the near-end signal based on the estimated delay.
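The cross-correlation step in the abstract can be illustrated with a minimal sketch. It assumes the common β-weighted form of GCC-PHAT, in which the cross-spectrum is divided by its own magnitude raised to the power β (β = 1 gives classic PHAT, β = 0 gives plain cross correlation); the patent's exact formulation may differ. The function names, the default β of 0.8, the pure-Python FFT, and the zero padding to twice the chunk length are all illustrative choices, not taken from the patent.

```python
import cmath
import random


def fft(x, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def gcc_beta_phat(speaker_chunk, mic_chunk, beta=0.8, eps=1e-12):
    """Cross-correlate two equal-length chunks with beta-PHAT weighting.

    Returns 2*n correlation values. Indices 0..n-1 are non-negative lags
    (microphone lagging the speaker); indices n..2n-1 wrap to negative lags.
    """
    n = len(speaker_chunk)
    if len(mic_chunk) != n or n == 0 or n & (n - 1):
        raise ValueError("chunks must have equal power-of-two length")
    # Zero-pad each chunk so the circular correlation behaves like a linear one.
    padding = [0.0] * n
    spk = fft(list(speaker_chunk) + padding)
    mic = fft(list(mic_chunk) + padding)
    # Cross-spectrum, then partial whitening by |cross|**beta.
    cross = [m * s.conjugate() for s, m in zip(spk, mic)]
    weighted = [c / (abs(c) ** beta + eps) for c in cross]
    corr = fft(weighted, inverse=True)
    return [v.real / (2 * n) for v in corr]


def peak_offset(corr):
    """Return (lag, value) of the largest correlation value."""
    size = len(corr)
    idx = max(range(size), key=corr.__getitem__)
    lag = idx if idx < size // 2 else idx - size
    return lag, corr[idx]


# Toy data: the "microphone" chunk is the speaker chunk delayed by 5 samples.
random.seed(0)
speaker = [random.uniform(-1.0, 1.0) for _ in range(64)]
true_delay = 5
mic = [0.0] * true_delay + speaker[:-true_delay]
lag, peak = peak_offset(gcc_beta_phat(speaker, mic))
print(lag, peak)
```

In a real system each speaker chunk would be paired with each microphone chunk, and the per-pair peak offsets would be passed on to the clustering stage rather than used directly.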

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more processors; at least one speaker; at least one microphone; and one or more non-transitory computer-readable storage media having instructions stored thereupon which are executable by the one or more processors and which, when executed, cause the system to: obtain a far-end audio signal to be played out through the at least one speaker, obtain a near-end audio signal captured through the at least one microphone, store the far-end audio signal in a speaker buffer and store the near-end audio signal in a microphone buffer, perform cross correlation between a chunk of data stored in the speaker buffer and a chunk of data stored in the microphone buffer using a β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”) to generate a set of cross correlation values, identify a peak correlation value in the set of cross correlation values and a corresponding offset value, add the offset value into a group of offset values, divide the group of offset values into a plurality of clusters and identify a cluster containing the most offset values as a best cluster, identify a delay value corresponding to the best cluster as an estimated delay, and cause echo cancellation to be performed on the near-end audio signal using the estimated delay.

2. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to determine that the peak correlation value is reliable by determining that the peak correlation value is higher than a peak correlation threshold, and wherein the offset value corresponding to the peak correlation value is added to the group of offset values in response to a determination that the peak correlation value is reliable.

3. The system of claim 2, wherein determining that the peak correlation value is reliable further comprises determining that a ratio of the peak correlation value to a second highest value in the set of cross correlation values is higher than a second correlation value threshold.

4. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to: determine that the best cluster is reliable by determining that a number of offset values in the cluster is higher than a size threshold, and that the highest peak value corresponding to the offset values in the cluster is higher than a quality threshold; in response to determining that the best cluster is not reliable, cause the echo cancellation to be performed on the near-end audio signal using a previously estimated delay; and in response to determining that the best cluster is reliable, identify the delay value corresponding to the best cluster.

5. The system of claim 1, wherein the one or more non-transitory computer-readable storage media have further instructions stored thereupon to cause the system to: determine that the system has acoustic echo cancellation functionality based at least in part by determining that the best cluster is not reliable; and prevent an additional acoustic echo canceller from being implemented on the system.

6. A non-transitory computer-readable storage media having instructions stored thereupon that are executable by one or more processors and which, when executed, cause the one or more processors to: perform cross correlation between a chunk of data stored in a speaker buffer and a chunk of data stored in a microphone buffer to generate a set of cross correlation values, the speaker buffer configured to store far-end audio signals to be played back through a speaker, and the microphone buffer configured to store near-end audio signals captured through a microphone; identify a peak correlation value in the set of cross correlation values and a corresponding offset value, and add the offset value into a group of offset values; divide the group of offset values into a plurality of clusters and identify a cluster as a best cluster; identify a delay value corresponding to the best cluster as an estimated delay; and cause acoustic echo cancellation to be performed on the near-end audio signal using the estimated delay.

7. The computer-readable storage media of claim 6, wherein the cross correlation between the chunk of data stored in the speaker buffer and the chunk of data stored in the microphone buffer is performed by calculating a β-PHAse Transform (“PHAT”) generalized cross correlation (“GCC”).

8. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to determine that the peak correlation value is reliable by determining that the peak correlation value is higher than a peak correlation threshold, and wherein the offset value corresponding to the peak correlation value is added into the group of offset values in response to a determination that the peak correlation value is reliable.

9. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to: determine that the best cluster is reliable by determining that a number of offset values in the cluster is higher than a size threshold, and that the highest peak value corresponding to the offset values in the cluster is higher than a quality threshold; in response to determining that the best cluster is not reliable, cause the echo cancellation to be performed on the near-end audio signal using a previously estimated delay; and in response to determining that the best cluster is reliable, identify the delay value corresponding to the best cluster.

10. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to apply a filter on a plurality of previously estimated delays and the currently estimated delay.

11. The computer-readable storage media of claim 6, wherein the chunk of data stored in the speaker buffer comprises a segment of the far-end audio signal and a segment of a zero energy signal, and wherein the chunk of data stored in the microphone buffer comprises a segment of the near-end audio signal and a segment of the zero energy signal.

12. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to: detect that a periodic signal is present based on the set of cross correlation values; and cause the acoustic echo cancellation to be performed on the near-end audio signal using a previously estimated delay.

13. The computer-readable storage media of claim 6, having further instructions stored thereupon to cause the one or more processors to determine that data contained in the speaker buffer has enough activity by determining that an energy of the data in the speaker buffer is higher than an energy threshold, wherein the cross correlation is performed in response to determining that data contained in the speaker buffer has enough activity.

14. A computer-implemented method for estimating a delay between two audio signals, the method comprising: performing a cross correlation between a chunk of a first audio signal and a chunk of a second audio signal to generate a set of cross correlation values; identifying a peak correlation value in the set of cross correlation values and a corresponding offset value, and adding the offset value into a group of offset values obtained based on the first audio signal and the second audio signal; clustering the group of delay values into a plurality of clusters; selecting a c
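The reliability checks and cluster selection described in claims 1 to 4 can be sketched roughly as follows. The claims do not name a clustering algorithm or any threshold values, so the gap-based one-dimensional grouping, the use of the cluster's low median as its representative delay, and every numeric default below are assumptions made for illustration only. All function and parameter names are hypothetical.

```python
from statistics import median_low


def is_reliable_peak(corr, peak_threshold=0.3, ratio_threshold=1.5):
    """Peak must exceed a threshold and dominate the second-highest value."""
    if len(corr) < 2:
        return False
    peak_idx = max(range(len(corr)), key=corr.__getitem__)
    peak = corr[peak_idx]
    second = max(v for i, v in enumerate(corr) if i != peak_idx)
    if peak <= peak_threshold:
        return False
    # A non-positive runner-up cannot compete with a positive peak.
    return second <= 0 or peak / second > ratio_threshold


def cluster_offsets(candidates, tolerance=2):
    """Group (offset, peak) pairs; a gap larger than tolerance starts a new cluster."""
    clusters = []
    for offset, peak in sorted(candidates):
        if clusters and offset - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((offset, peak))
        else:
            clusters.append([(offset, peak)])
    return clusters


def estimate_delay(candidates, previous_delay,
                   size_threshold=3, quality_threshold=0.5, tolerance=2):
    """Pick the most populated cluster; fall back to previous_delay if it is weak."""
    clusters = cluster_offsets(candidates, tolerance)
    if not clusters:
        return previous_delay
    best = max(clusters, key=len)
    best_peak = max(peak for _, peak in best)
    if len(best) <= size_threshold or best_peak <= quality_threshold:
        return previous_delay
    # Represent the cluster by its low median so the result is an observed offset.
    return median_low(offset for offset, _ in best)


# Toy data: five consistent offsets near 40-42 samples plus two outliers.
candidates = [(40, 0.9), (41, 0.8), (40, 0.7), (42, 0.6),
              (41, 0.9), (120, 0.4), (7, 0.35)]
print(estimate_delay(candidates, previous_delay=33))
```

Falling back to the previous delay when the best cluster is small or weak mirrors claims 4 and 9; a smoothing filter over successive estimates, as in claim 10, could be layered on top of `estimate_delay`.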

Assignees

Amazon Tech Inc

Inventors

Classifications

  • characterised by the method used for estimating noise

  • the noise being echo, reverberation of the speech

  • the extracted parameters being correlation coefficients

  • for comparison or discrimination

  • characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9916840B1 cover?
A technology for estimating a delay between a far-end audio signal and a near-end audio signal for acoustic echo cancellation is disclosed. A copy of the far-end signal is stored in a speaker buffer and organized in chunks, and a copy of the near-end signal is stored in a microphone buffer and organized in chunks. Cross correlation is performed on each pair of speaker chunks and microphone chun…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/0216. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 13, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).