Sound source localization using wave decomposition

US11425495B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11425495-B1
Application number  US-202117234233-A
Country             US
Kind code           B1
Filing date         Apr 19, 2021
Priority date       Apr 19, 2021
Publication date    Aug 23, 2022
Grant date          Aug 23, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system that performs sound source localization (SSL) using acoustic wave decomposition (AWD) or an approximation. When a device detects a wakeword represented in audio data, the device performs SSL processing in order to determine a position of the user relative to the device (e.g., estimate angle of the user). The device calculates noise statistics based on first audio data representing the wakeword and second audio data preceding the wakeword. Thus, upon detecting the wakeword, the device calculates the noise statistics and a signal quality metric corresponding to the wakeword. In addition, the device uses Multi-Channel Linear Prediction Coding (MCLPC) coefficients to average out the room impulse response. Using the noise statistics, the MCLPC coefficients, and the audio data, the device performs AWD processing to decompose the sound field to disjoint acoustic plane waves, enabling the device to identify the most likely direction for the line-of-sight component of speech.
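The two ideas in the abstract, a signal quality metric built from pre-wakeword noise and a search for the most likely plane-wave direction, can be illustrated with a small sketch. This is not the patented AWD or MCLPC processing: it is a single-frequency, single-plane-wave match under an assumed free-field, far-field model. The function names, the 343 m/s speed of sound, the phase sign convention, and the SNR formula are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def snr_db(wakeword_energy, noise_energy):
    """Hypothetical signal quality metric: SNR in dB, where noise_energy is
    measured on audio preceding the wakeword and wakeword_energy on the
    wakeword itself (speech plus noise)."""
    speech = max(wakeword_energy - noise_energy, 1e-12)
    return 10.0 * math.log10(speech / max(noise_energy, 1e-12))


def steering_vector(mic_xy, angle_deg, freq_hz):
    """Relative phase of a far-field plane wave at each microphone for a
    source at angle_deg (assumed convention: phase advances toward the source)."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in mic_xy]


def best_direction(observation, mic_xy, freq_hz, candidates_deg):
    """Return the candidate angle whose plane wave best matches the observed
    per-microphone complex amplitudes, scored by |a^H x|^2."""
    def score(angle_deg):
        a = steering_vector(mic_xy, angle_deg, freq_hz)
        projection = sum(ai.conjugate() * xi for ai, xi in zip(a, observation))
        return abs(projection) ** 2
    return max(candidates_deg, key=score)


# Usage: four microphones on a 4 cm circle, ideal source at 60 degrees.
mics = [(0.04, 0.0), (0.0, 0.04), (-0.04, 0.0), (0.0, -0.04)]
observed = steering_vector(mics, 60, 2000.0)
estimate = best_direction(observed, mics, 2000.0, range(0, 360, 5))
```

A full decomposition would fit several plane waves at once across many frequencies; the sketch only shows how a candidate direction is scored against the microphone signals.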

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, the method comprising: receiving first audio data, a first portion of the first audio data corresponding to a first microphone of a device and a second portion of the first audio data corresponding to a second microphone of the device; determining first coefficient data associated with the first audio data, the first coefficient data corresponding to the first microphone and the second microphone; detecting speech represented during a first period of time within the first audio data, the speech generated by a user; determining first energy data associated with a second period of time within the first audio data, the second period of time preceding the first period of time; determining, using the first audio data, first weight data; determining, using the first coefficient data, second weight data; and determining, using the first weight data, the second weight data, and the first energy data, that the user is in a first direction relative to the device.

2. The computer-implemented method of claim 1, wherein determining that the user is in the first direction further comprises: determining first signal quality metric data using the first energy data and second energy data, the second energy data associated with a first portion of the first period of time; and generating, using the first weight data and the first signal quality metric data, first data, the first data indicating that the first direction corresponds to a first local maxima of a first function.

3. The computer-implemented method of claim 2, wherein determining that the user is in the first direction further comprises: determining second signal quality metric data using the first energy data and third energy data, the third energy data associated with a second portion of the first period of time; generating, using the first weight data and the second signal quality metric data, second data, the second data indicating that a second direction corresponds to a second local maxima of a second function; and determining, based on the first data and the second data, that the user is in the first direction.

4. The computer-implemented method of claim 1, wherein determining that the user is in the first direction further comprises: determining first signal quality metric data using the first energy data and second energy data, the second energy data associated with the first period of time; generating, using the first weight data and the first signal quality metric data, first data, the first data indicating that the first direction corresponds to a first local maxima of a first function; determining first variance data corresponding to the first data; and determining, based on the first data and the first variance data, that the user is in the first direction.

5. The computer-implemented method of claim 4, wherein determining that the user is in the first direction further comprises: generating, using the second weight data and the first signal quality metric data, second data, the second data indicating that a second direction corresponds to a second local maxima of a second function; determining second variance data corresponding to the second data; and determining, using the first data, the first variance data, the second data, and the second variance data, that the user is in the first direction.

6. The computer-implemented method of claim 1, further comprises: determining that a beginning of the first period of time corresponds to a beginning of the speech; determining second energy data associated with the first period of time; and determining signal quality metric data using the first energy data and the second energy data.

7. The computer-implemented method of claim 1, further comprising: determining first signal quality metric data using the first energy data and second energy data, the second energy data associated with the first period of time; generating, using the second weight data and the first signal quality metric data, first data, the first data including a first mean value and a first variance value; determining a first signal quality metric value using the first signal quality metric data; determining that the first signal quality metric value is below a threshold value; determining a second variance value by multiplying the first variance value by a first value; and determining, based on the first mean value and the second variance value, that the user is in the first direction.

8. The computer-implemented method of claim 1, further comprising: receiving image data from a camera associated with the device; detecting an object represented in the image data, the object being in a second direction relative to the device; generating a weighting vector that associates the second direction with a first value and remaining directions with a second value; and determining, based on the first weight data, the second weight data, the first energy data, and the weighting vector, that the user is in the first direction relative to the device.

9. The computer-implemented method of claim 1, further comprising: receiving first sensor data indicating that the device is in a first orientation; determining first acoustic characteristics data corresponding to the first orientation; determining the first weight data using the first acoustic characteristics data and the first audio data, the first weight data associated with a first portion of the first period of time; receiving second sensor data indicating that the device is in a second orientation; determining second acoustic characteristics data corresponding to the second orientation; determining third weight data using the second acoustic characteristics data and the first audio data, the third weight data associated with a second portion of the first period of time; and determining, using the third weight data, that the user is in a second direction relative to the device during the second portion of the first period of time.

10. A system comprising: at least one processor; and memory including instructions operable to be executed by the at least one processor to cause the system to: receive first audio data, a first portion of the first audio data corresponding to a first microphone of a device and a second portion of the first audio data corresponding to a second microphone of the device; determine first coefficient data associated with the first audio data, the first coefficient data corresponding to the first microphone and the second microphone; detect speech represented during a first period of time within the first audio data, the speech generated by a user; determine first energy data associated with a second period of time within the first audio data, the second period of time preceding the first period of time; determine, using the first audio data, first weight data; determine, using the first coefficient data, second weight data; and determine, using the first weight data, the second weight data, and the first energy data, that the user is in a first direction relative to the device.

11. The system of claim 10, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine first signal quality metric data using the first energy data and second energy data, the second energy data associated with a first portion of the first period of time; and generate, using the first weight data and the first signal quality metric data, first data, the first data indicating that the first direction corresponds to a first local maxima of a first function.

1
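Claims 7 and 8 describe two adjustments to the direction estimate: widening the variance when the signal quality metric falls below a threshold, and applying a camera-derived weighting vector over candidate directions. A minimal sketch of both follows. The claims do not say how the weighting vector is combined with the acoustic evidence, so the elementwise multiplication here is an assumption, as are all function names, thresholds, and values.

```python
def inflate_variance(variance, quality, threshold, factor):
    """Claim 7 idea: multiply the variance by a factor when the signal
    quality metric is below the threshold; otherwise leave it unchanged."""
    return variance * factor if quality < threshold else variance


def camera_weighting(num_directions, detected_index, detected_value, other_value):
    """Claim 8 idea: one value for the direction where the camera detected
    an object, a second value for every remaining direction."""
    return [
        detected_value if i == detected_index else other_value
        for i in range(num_directions)
    ]


def pick_direction(scores, weighting):
    """Assumed combination rule: weight each direction's acoustic score
    elementwise and return the index of the largest weighted score."""
    weighted = [s * w for s, w in zip(scores, weighting)]
    return max(range(len(weighted)), key=weighted.__getitem__)


# Usage: acoustic scores slightly favour direction 1, but the camera sees
# an object in direction 2, so the weighted choice moves to direction 2.
scores = [0.2, 0.9, 0.8, 0.1]
weights = camera_weighting(4, 2, 1.0, 0.5)
chosen = pick_direction(scores, weights)
wide = inflate_variance(4.0, quality=3.0, threshold=6.0, factor=2.0)
```

Inflating the variance rather than discarding a low-quality estimate keeps the estimate available while reducing its influence on any later fusion step.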

Assignees

Inventors

Classifications

  • Hearing devices using active noise cancellation · CPC title

  • Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title

  • Communication between hearing aids and external devices via a network for data exchange · CPC title

  • H04R1/406 (Primary): microphones · CPC title

  • Use of position data from wide-area or local-area positioning systems in hearing devices, e.g. program or information selection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11425495B1 cover?
A system that performs sound source localization (SSL) using acoustic wave decomposition (AWD) or an approximation. When a device detects a wakeword represented in audio data, the device performs SSL processing in order to determine a position of the user relative to the device (e.g., estimate angle of the user). The device calculates noise statistics based on first audio data representing the …
Who is the assignee on this patent?
Amazon Tech Inc, Amazon Tech
What technology area does this patent fall under?
Primary CPC classification H04R1/406. Mapped technology areas include Electricity.
When was this patent published?
Publication date Aug 23, 2022 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).