Spatial audio capture and analysis with depth

US12501209B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12501209-B2
Application number: US-202418643040-A
Country: US
Kind code: B2
Filing date: Apr 23, 2024
Priority date: Oct 10, 2019
Publication date: Dec 16, 2025
Grant date: Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Spatial audio signals can include audio objects that can be respectively encoded and rendered at each of multiple different depths. In an example, a method for encoding a spatial audio signal can include receiving audio scene information from an audio capture source in an environment, and receiving a depth characteristic of a first object in the environment. The depth characteristic can be determined using information from a depth sensor. A correlation can be identified between at least a portion of the audio scene information and the first object. The spatial audio signal can be encoded using the portion of the audio scene and the depth characteristic of the first object.
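
The correlation step in the abstract can be illustrated with a minimal sketch. It assumes the depth sensor has already been reduced to a list of objects with a direction and a depth, and that the audio analysis yields one dominant direction per audio component. The names `DepthObject` and `match_direction_to_object`, the 30 degree cutoff, and the linear confidence falloff are all hypothetical choices made for illustration; the patent text shown here does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class DepthObject:
    """An object reported by a hypothetical depth sensor (illustrative only)."""
    name: str
    azimuth_deg: float
    elevation_deg: float
    depth_m: float


def unit_vector(azimuth_deg, elevation_deg):
    """Convert a direction in degrees to a 3D unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angle_between_deg(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def match_direction_to_object(azimuth_deg, elevation_deg, objects,
                              max_angle_deg=30.0):
    """Pair one audio component's dominant direction with the nearest object.

    Returns (object, confidence). The confidence is an assumed heuristic:
    1.0 when the directions coincide, falling linearly to 0.0 at
    max_angle_deg and beyond.
    """
    direction = unit_vector(azimuth_deg, elevation_deg)
    best, best_angle = None, None
    for obj in objects:
        angle = angle_between_deg(
            direction, unit_vector(obj.azimuth_deg, obj.elevation_deg))
        if best_angle is None or angle < best_angle:
            best, best_angle = obj, angle
    if best is None:
        return None, 0.0
    confidence = max(0.0, 1.0 - best_angle / max_angle_deg)
    return best, confidence
```

Calling `match_direction_to_object(100.0, 0.0, objects)` with an object at azimuth 90 degrees returns that object with a reduced confidence, which a later stage could use to decide how much to trust the sensor depth.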

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: receiving audio scene information from an audio capture source in an environment; parsing the audio scene information into one or more audio components; identifying dominant directions for a first physical object and a second physical object in each audio component; receiving, from a depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; associating the depth characteristic information with the dominant directions; and providing a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.

2. The method of claim 1, wherein the at least one of the one or more audio components is determined using information about signal contributions to a time-frequency representation of the received audio scene information.

3. The method of claim 1, further comprising determining a first direction and a reference depth, relative to the audio capture source, for the at least one of the one or more audio components.

4. The method of claim 3, wherein the depth characteristic information includes a measure of a confidence used to determine a correlation between the at least one of the one or more audio components and at least one of the first physical object and second physical object.

5. The method of claim 4, further comprising providing a first depth for the at least one of the one or more audio components using the measured confidence.

6. The method of claim 5, wherein providing the first depth includes: when the confidence is high, providing the first depth based on information from the depth sensor; when the confidence is low, providing the first depth as the reference depth; and when the confidence is intermediate, providing the first depth as a depth that is between the reference depth and a depth determined using the depth sensor.

7. The method of claim 4, wherein the confidence is determined based on a computer vision processor to classify objects identified in the environment and to determine whether the at least one of the one or more audio components includes, or is likely to include, audio from at least one of the first physical object or the second physical object.

8. The method of claim 4, wherein the confidence is determined based on identifying one or more data clusters in the depth characteristic information from the depth sensor, and correlating the first direction of the at least one of the one or more audio components to the identified one or more data clusters.

9. The method of claim 1, further comprising encoding a depth-extended ambisonic signal based on the audio scene information and the depth characteristic information for the at least one of the first physical object or the second physical object.

10. The method of claim 1, further comprising: determining a classification of the first physical object using an image-based object classifier.

11. A system comprising: an audio capture source configured to capture an audio scene in an environment; a depth sensor configured to provide depth characteristic information about multiple objects in the environment relative to a reference location of the depth sensor; and a processor circuit configured to: receive audio scene information from an audio capture source in an environment; parse the audio scene information into one or more audio components; identify dominant directions for a first physical object and a second physical object in each audio component; receive, from the depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; associating the depth characteristic information with the dominant directions; and providing a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.

12. The system of claim 11, wherein the audio capture source comprises one or more of a multiple-transducer microphone, a sound field microphone, a microphone array, and an ambisonic microphone, and wherein the depth sensor comprises one or more of a laser, a modulated light source, a stereoscopic camera, a depth probe, an infrared sensor, and a camera array.

13. The system of claim 11, wherein the processor circuit is configured to encode a spatial audio signal as a depth-extended ambisonic signal based on the audio scene and the depth characteristic information about the first physical object.

14. The system of claim 13, wherein the processor circuit is configured to use a weighted combination of depth information about the multiple objects to encode the spatial audio signal.

15. The system of claim 14, wherein the processor circuit is configured to determine a confidence that information from the audio scene corresponds to the first physical object from among the multiple objects in the environment, and wherein the processor circuit is configured to encode the spatial audio signal based on the determined confidence meeting or exceeding a specified confidence threshold.

16. The system of claim 15, further comprising an object classifier circuit configured to determine a classification for the first physical object and the second physical object; and wherein the processor circuit is configured to determine a correspondence between the classification of the first physical object or the second physical object and at least one audio component of the one or more audio components, and wherein the processor circuit is configured to encode the spatial audio signal based on a value of the determined correspondence in response to the determined correspondence meeting a threshold correspondence condition.

17. An audio signal encoder device comprising: a processor and a non-transitory computer-readable medium operably coupled thereto, the non-transitory computer-readable medium comprising instructions stored in associated therewith that are accessible to, and executable by, the processor, wherein the instructions comprise: instructions that, when executed, receive audio scene information from an audio capture source in an environment instructions that, when executed, parse the audio scene information into one or more audio components; instructions that, when executed identify dominant directions for a first physical object and a second physical object in each audio component; instructions that, when executed, receive, from a depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; instructions that, when executed, associate the depth characteristic information with the dominant directions; and instructions that, when executed, provide a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.

18. The audio signal encoder device of claim 17, further comprising instructions that, when executed, conditionally encode a spatial audio signal, including instructions that, when executed: encode the spatial audio signal based on depth information about the first physical object in the environment when the audio characteristic corresponds to a first audio component identified in the audio scene information; and encode the spatial audio signal based on a reference depth when the audio characteristic does not correspond to the firs
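
Claim 6 describes three confidence regimes for choosing the depth of an audio component. The sketch below is one possible reading, not the patented implementation: the function name `select_depth`, the thresholds `low` and `high`, and the linear interpolation in the intermediate regime are assumptions introduced here. The claim only requires that the intermediate result lie between the reference depth and the sensor depth.

```python
def select_depth(confidence, sensor_depth, reference_depth,
                 low=0.3, high=0.7):
    """Pick a depth for an audio component from a confidence in [0, 1].

    Assumed behavior (thresholds and blend are illustrative):
      - confidence >= high: trust the depth sensor
      - confidence <= low:  fall back to the reference depth
      - otherwise:          interpolate linearly between the two
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if not low < high:
        raise ValueError("low must be less than high")
    if confidence >= high:
        return sensor_depth
    if confidence <= low:
        return reference_depth
    # Fraction of the way from the low threshold to the high threshold.
    weight = (confidence - low) / (high - low)
    return reference_depth + weight * (sensor_depth - reference_depth)
```

With a sensor depth of 2.0 and a reference depth of 1.0, a confidence midway between the two thresholds yields a depth midway between the two values. A smoother blend, such as a sigmoid, would satisfy the same claim language; linear interpolation is used here only because it is the simplest option.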

Assignees

DTS Inc

Inventors

Classifications

  • Direction finding using differential microphone array [DMA] · CPC title

  • Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • G10L19/008 · Primary

    Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title

  • H04N13/271 · Primary

    wherein the generated image signals comprise depth maps or disparity maps · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12501209B2 cover?
Spatial audio signals can include audio objects that can be respectively encoded and rendered at each of multiple different depths. In an example, a method for encoding a spatial audio signal can include receiving audio scene information from an audio capture source in an environment, and receiving a depth characteristic of a first object in the environment. The depth characteristic can be dete…
Who is the assignee on this patent?
DTS Inc
What technology area does this patent fall under?
Primary CPC classification G10L19/008. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 16, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).