Delay estimation for acoustic echo cancellation
US-9916840-B1 · Mar 13, 2018 · US
US11418655B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11418655-B2 |
| Application number | US-201917260219-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 17, 2019 |
| Priority date | Jul 18, 2018 |
| Publication date | Aug 16, 2022 |
| Grant date | Aug 16, 2022 |
Official abstract text for this publication.
A method (400) includes receiving a microphone audio signal (132) and a playout audio signal (112), and determining a frequency representation (324) of the microphone audio signal and a frequency representation of the playout audio signal. For each frequency representation, the method also includes determining features (302) based on the frequency representation. Each feature corresponds to a pair of frequencies (342) of the frequency representation and a period of time between the pair of frequencies. The method also includes determining that a match (212) occurs between a first feature based on the frequency representation of the microphone audio signal and a second feature based on the frequency representation of the playout audio signal, and determining that a delay value (222) between the first feature and the second feature corresponds to an echo within the microphone audio signal.
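The steps in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes non-overlapping 256-sample blocks, keeps only the single strongest frequency bin per block as the "peak", pairs each peak with peaks up to three blocks later, and uses a hypothetical `min_votes` parameter as a stand-in for the echo threshold. All function names are invented for illustration.

```python
import numpy as np
from collections import Counter


def spectrogram(signal, block=256):
    """Split the signal into non-overlapping blocks and return the
    magnitude of a windowed real FFT of each block."""
    n_blocks = len(signal) // block
    frames = np.reshape(signal[: n_blocks * block], (n_blocks, block))
    return np.abs(np.fft.rfft(frames * np.hanning(block), axis=1))


def peak_bins(spec):
    """Assumption: one peak per block, the strongest frequency bin."""
    return np.argmax(spec, axis=1)


def features(peaks, max_gap=3):
    """Map each feature (f1, f2, dt) to the block indices where it starts.
    f1 and f2 are peak bins; dt is the number of blocks between them."""
    out = {}
    for t, f1 in enumerate(peaks):
        for dt in range(1, max_gap + 1):
            if t + dt < len(peaks):
                key = (int(f1), int(peaks[t + dt]), dt)
                out.setdefault(key, []).append(t)
    return out


def estimate_delay(mic, playout, block=256, min_votes=5):
    """Return the most frequent delay (in blocks) between matching
    microphone and playout features, or None if no delay collects
    at least min_votes matches."""
    mic_feats = features(peak_bins(spectrogram(mic, block)))
    play_feats = features(peak_bins(spectrogram(playout, block)))
    votes = Counter()
    for feat, mic_times in mic_feats.items():
        for t_play in play_feats.get(feat, []):
            for t_mic in mic_times:
                if t_mic >= t_play:  # an echo cannot precede the playout
                    votes[t_mic - t_play] += 1
    if not votes:
        return None
    delay, count = votes.most_common(1)[0]
    return delay if count >= min_votes else None
```

The returned delay is counted in blocks; multiplying it by `block / sample_rate` would give seconds. Counting votes per delay value, rather than trusting a single match, is one way to read the "count of a particular delay value" language in claim 2.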
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving, at data processing hardware, a microphone audio signal and a playout audio signal; determining, by the data processing hardware, a first frequency representation corresponding to the microphone audio signal and a second frequency representation corresponding to the playout audio signal; determining, by the data processing hardware, a first feature based on a first pair of frequencies of the first frequency representation and a period of time between the first pair of frequencies; determining, by the data processing hardware, a second feature based on a second pair of frequencies of the second frequency representation and a period of time between the second pair of frequencies; determining, by the data processing hardware, that a match occurs between the first feature based on the first frequency representation corresponding to the microphone audio signal and a second feature based on the second frequency representation corresponding to the playout audio signal; and determining, by the data processing hardware, that a delay value between the first feature and the second feature corresponds to an echo within the microphone audio signal.

2. The method of claim 1, wherein determining that the delay value corresponds to the echo comprises determining that the delay value between the first feature and the second feature satisfies an echo threshold, the echo threshold representing a count of a particular delay value predictive of a respective echo.

3. The method of claim 1, wherein the first pair of frequencies corresponds to a first peak frequency and a second peak frequency of the first frequency representation, the second peak frequency adjacent to the first peak frequency and within a threshold frequency difference from the first peak frequency, the threshold frequency difference corresponding to a frequency tolerance from the first peak frequency.

4. The method of claim 1, wherein the first frequency representation is a spectrogram.

5. The method of claim 1, wherein receiving the microphone audio signal comprises receiving the microphone audio signal as an echo reduced signal from an echo reduction device, the echo reduction device configured to reduce echo between the microphone audio signal and the playout audio signal.

6. The method of claim 1, further comprising down-sampling, by the data processing hardware, each of the received microphone audio signal and the received playout audio signal.

7. The method of claim 1, wherein determining the first frequency representation corresponding to the microphone audio signal and the second frequency representation corresponding to the playout audio signal comprises, for each audio signal of the microphone audio signal and the playout audio signal: dividing the audio signal into sample blocks; and determining coefficients of the respective frequency representation based on a frequency transformation of each sample block.

8. The method of claim 1, wherein the first pair of frequencies and the second pair of frequencies each satisfy a feature frequency threshold.

9. The method of claim 1, wherein receiving the microphone audio signal and the playout audio signal, determining the first frequency representation corresponding to the microphone audio signal and the second frequency representation corresponding to the playout audio signal, determining the first feature and the second feature for each respective frequency representation, determining that the match occurs between the first feature and the second feature, and determining that the delay value between the first feature and the second feature corresponds to the echo occur contemporaneously in real-time.

10. The method of claim 1, further comprising removing, by the data processing hardware, the received microphone audio signal and the received playout audio signal based on determining the delay value between the first feature and the second feature corresponds to the echo.

11. A method comprising: receiving, at data processing hardware in real-time, a microphone audio signal and a playout audio signal; determining, by the data processing hardware in real-time, a first set of playout features from the playout audio signal, the first set of playout features representing a predetermined block of time from the playout audio signal, each playout feature corresponding to a pair of playout audio signal frequencies and a period of time between the pair of playout audio signal frequencies; determining, by the data processing hardware in real-time, microphone features corresponding to the received microphone audio signal, each microphone feature corresponding to a pair of microphone audio signal frequencies and a period of time between the pair of microphone audio signal frequencies; determining, by the data processing hardware in real-time, whether a match occurs between a playout feature of the first set of playout features and a first microphone feature; and when no match occurs: determining, by the data processing hardware in real time, a second set of playout audio features based the playout audio signal, the second set of playout features representing the predetermined block of time adjacent to the first set of playout features from the playout audio signal; determining, by the data processing hardware in real time, that a respective playout feature from the second set of playout features matches a second microphone feature; and identifying, by the data processing hardware in real time, that the matched second microphone feature is an echo within the microphone audio signal.

12. The method of claim 11, wherein receiving the microphone audio signal further comprises receiving the microphone audio signal as an echo reduced signal from an echo reduction device, the echo reduction device configured to reduce echo between the microphone audio signal and the playout audio signal.

13. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving a microphone audio signal and a playout audio signal; determining a first frequency representation corresponding to the microphone audio signal and a second frequency representation corresponding to the playout audio signal; determining a first feature based on a first pair of frequencies of the first frequency representation and a period of time between the pair of frequencies; determining a second feature based on a second pair of frequencies of the second frequency representation and a period of time between the second pair of frequencies; matching the first feature based on the first frequency representation corresponding to the microphone audio signal and the second feature based on the second frequency representation corresponding to the playout audio signal; and determining that a delay value between the first feature and the second feature corresponds to an echo within the microphone audio signal.

14. The system of claim 13, wherein determining that the delay value corresponds to the echo further determining that the delay value between the matched first feature and the matched second feature satisfies an echo threshold, the echo threshold representing a count of a particular delay value predictive of a respective echo.

15. The system of claim 13, wherein the first pair of frequencies corresponds to a first peak frequency and a second peak frequency of the first frequency
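The block-by-block search recited in claim 11 can be sketched as follows. This is a hedged illustration only: it assumes features are already available as `(f1, f2, dt)` tuples and that the playout features have been grouped into consecutive sets, one per predetermined block of time. The function name `find_echo_block` and the data layout are hypothetical and do not come from the patent.

```python
def find_echo_block(mic_features, playout_sets):
    """Walk the playout feature sets in time order. If the current set
    shares no feature with the microphone features, move on to the
    adjacent set. Return (set_index, matched_feature) for the first
    match, or None when no set matches."""
    mic_set = set(mic_features)
    for index, playout_features in enumerate(playout_sets):
        common = mic_set.intersection(playout_features)
        if common:
            # Several features may match; pick one deterministically.
            return index, min(common)
    return None
```

For example, calling it with microphone features `[(10, 12, 1), (40, 42, 2)]` and playout sets `[[(1, 2, 1)], [(40, 42, 2), (5, 6, 1)]]` finds no match in the first set and a match on `(40, 42, 2)` in the second, adjacent set.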
using echo cancellers (echo cancellers per se H04B3/23) · CPC title
the noise being echo, reverberation of the speech · CPC title
Noise filtering · CPC title