Concept for generating an enhanced sound-field description or a modified sound field description using a depth-extended dirac technique or other techniques
US-2020154229-A1 · May 14, 2020 · US
US12490040B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12490040-B2 |
| Application number | US-202318204630-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 1, 2023 |
| Priority date | Jun 1, 2023 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
Devices and methods for determining a direction of audio arrival from Ambisonics channels using azimuth and elevation segments are described herein. A method includes generating multiple blocks of samples from Ambisonics signals for a time interval, determining an azimuth angle estimate and an elevation angle estimate for the time interval when a defined number of blocks in the multiple blocks of samples are valid, generating the azimuth angle estimate based on a maximum number of azimuth angle estimates present in an azimuth segment amongst a defined number of azimuth segments, and generating the elevation angle estimate based on a maximum number of elevation angle estimates present in an elevation segment amongst a defined number of elevation segments, where the direction of arrival of the Ambisonics signals is based on the azimuth angle estimate and the elevation angle estimate.
Opening claim text (preview).
What is claimed is:

1. A method comprising: generating multiple blocks of samples from Ambisonics signals for a time interval; determining an azimuth angle estimate and an elevation angle estimate for the time interval when a defined number of blocks in the multiple blocks of samples are valid; generating the azimuth angle estimate based on maximum number of azimuth angle estimates present in an azimuth segment amongst a defined number of azimuth segments; generating the elevation angle estimate based on maximum number of elevation angle estimates present in an elevation segment amongst a defined number of elevation segments; determining, for the azimuth angle estimate, an azimuth angle estimation confidence level based on a ratio of the maximum number of azimuth angle estimates present in the azimuth segment and a total number of azimuth angle estimates present in the defined number of azimuth segments; and determining, for the elevation angle estimation, an elevation angle estimation confidence level based on a ratio of the maximum number of elevation angle estimates present in the elevation segment and a total number of elevation angle estimates present in the defined number of elevation segments, wherein a direction of arrival of the Ambisonics signals is based on the azimuth angle estimate and the elevation angle estimate.

2. The method of claim 1, further comprising: transforming, for a block of samples, at least the block of samples into a defined number of frequency domain bins; and performing, in a subset of the defined number of frequency domain bins, a spectral analysis to determine whether the block of samples is valid.

3. The method of claim 2, wherein a frequency domain bin has a spectral value which is representative of an energy level in the frequency domain bin.

4. The method of claim 3, wherein the performing is further comprising: designating the block of samples as valid if, for a W channel in the Ambisonics signals, a total energy across the subset of the defined number of frequency domain bins exceeds a defined active signal threshold.

5. The method of claim 4, wherein the performing is further comprising: determining, for an X channel in the Ambisonics signals, an X signal threshold based on an X channel energy contribution to a W channel energy; determining, for a Y channel in the Ambisonics signals, a Y signal threshold based on an Y channel energy contribution to the W channel energy; determining, for a Z channel in the Ambisonics signals, a Z signal threshold based on an Z channel energy contribution to the W channel energy; and designating a frequency domain bin in the subset of the defined number of frequency domain bins as a key frequency domain bin if at least one of an X channel energy value exceeds the X signal threshold, a Y channel energy value exceeds the Y signal threshold, or a Z channel energy value exceeds the Z signal threshold.

6. The method of claim 1, further comprising: transforming a block of samples into a defined number of frequency domain bins; and designating the block of samples as valid if, for a W channel in the Ambisonics signals, a total energy across a subset of the defined number of frequency domain bins exceeds a defined active signal threshold.

7. The method of claim 1, wherein the defined number of azimuth segments defines a 360° azimuth space.

8. The method of claim 7, wherein the defined number of elevation segments defines an elevation space between +90° to −90°.

9. The method of claim 1, further comprising: applying rotation parameters to the Ambisonics signals prior to the determining the azimuth angle estimate and the elevation angle estimate when the method is performed separately or remote from an image capture device.

10. An image capture device, comprising: a plurality of microphones; and a processor configured to receive Ambisonics signals for a time interval from the plurality of microphones, wherein the processor is further configured to: divide the time interval into a plurality of blocks; determine an azimuth angle estimate and an elevation angle estimate for the time interval when a subset of blocks in the plurality of blocks are valid; identify the azimuth angle estimate based on a maximum number of azimuth angle estimates present in an azimuth segment from a plurality of azimuth segments; identify the elevation angle estimate based on a maximum number of elevation angle estimates present in an elevation segment from a plurality of elevation segments; report, for the azimuth angle estimate, an azimuth angle estimate confidence level based on a ratio of the maximum number of azimuth angle estimates present in the azimuth segment and a total number of azimuth angle estimates present in the plurality of azimuth segments; and report, for the elevation angle estimate, an elevation angle estimate confidence level based on a ratio of the maximum number of elevation angle estimates present in the elevation segment and a total number of elevation angle estimates present in the plurality of elevation segments, wherein a direction of arrival of the Ambisonics signals is based on the azimuth angle estimate and the elevation angle estimate.

11. The image capture device of claim 10, wherein the processor is further configured to: generate a plurality of frequency domain bins from a block; and determine whether the block is valid by performing an energy analysis in a subset of the plurality of frequency domain bins.

12. The image capture device of claim 11, wherein the processor is further configured to: determine that the block is valid if, for a W channel in the Ambisonics signals, a total energy across the subset of the plurality of frequency domain bins exceeds an active signal threshold.

13. The image capture device of claim 11, wherein the processor is further configured to: determine, for an X channel in the Ambisonics signals, an X signal threshold based on an X channel energy contribution to a W channel energy; determine, for a Y channel in the Ambisonics signals, a Y signal threshold based on an Y channel energy contribution to the W channel energy; determine, for a Z channel in the Ambisonics signals, a Z signal threshold based on an Z channel energy contribution to the W channel energy; and designate a frequency domain bin in the subset of the plurality of frequency domain bins as a key frequency domain bin if at least one of an X channel energy value exceeds the X signal threshold, a Y channel energy value exceeds the Y signal threshold, or a Z channel energy value exceeds the Z signal threshold.

14. The image capture device of claim 11, wherein the processor is further configured to: generate a plurality of frequency domain bins from a block; and determine that the block is valid if, for a W channel in the Ambisonics signals, a total energy across a subset of the plurality of frequency domain bins exceeds an active signal threshold.

15. The image capture device of claim 11, wherein the plurality of azimuth segments defines a 360° azimuth space.

16. The image capture device of claim 11, wherein the plurality of elevation segments defines an elevation space between +90° to −90°.

17. The image capture device of claim 11, wherein the processor is further configured to: forego application of rotation parameters to the Ambisonics signals prior to determination of the azimuth angle estimate and the elevation angle estimate when processing is performed on the image capture device.
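The block-validity, key-bin, and confidence steps recited in the claims can be sketched as follows. This is a hedged illustration, not the patented implementation: per-bin energies are assumed to be precomputed, and the function names, the `ratios` parameter, and the rule that each threshold is a fixed fraction of the W energy in the same bin are hypothetical stand-ins for the "energy contribution" relationship, which the claims leave unspecified.

```python
def is_block_valid(w_energy, active_threshold):
    """A block is valid when the total W-channel energy over the analysed
    subset of frequency bins exceeds the active signal threshold."""
    return sum(w_energy) > active_threshold


def key_bins(w_energy, x_energy, y_energy, z_energy, ratios=(0.5, 0.5, 0.5)):
    """Return indices of bins where X, Y, or Z energy exceeds its threshold.

    Assumption: each threshold is ratio * W energy in the same bin. The
    claims only say the threshold is based on the channel's energy
    contribution to the W channel energy.
    """
    rx, ry, rz = ratios
    selected = []
    for i, (w, x, y, z) in enumerate(zip(w_energy, x_energy, y_energy, z_energy)):
        if x > rx * w or y > ry * w or z > rz * w:
            selected.append(i)
    return selected


def confidence(segment_counts):
    """Ratio of the largest segment count to the total count across segments."""
    total = sum(segment_counts)
    return max(segment_counts) / total if total else 0.0


if __name__ == "__main__":
    w = [4.0, 1.0, 0.1]
    x = [3.0, 0.2, 0.0]
    y = [0.5, 0.1, 0.0]
    z = [0.1, 0.6, 0.0]
    print(is_block_valid(w, active_threshold=1.0))
    print(key_bins(w, x, y, z))
    print(confidence([3, 1, 1]))
```

A confidence near 1.0 means nearly all per-block estimates fell in one segment, while a value near 1 divided by the number of occupied segments indicates the estimates were spread out and the reported angle is less reliable.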
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
Aspects of sound capture and related signal processing for recording or reproduction · CPC title
Application of ambisonics in stereophonic audio systems · CPC title
Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved · CPC title
Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's · CPC title