Using classified sounds and localized sound sources to operate an autonomous vehicle
US-2020241552-A1 · Jul 30, 2020 · US
US12501209B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12501209-B2 |
| Application number | US-202418643040-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 23, 2024 |
| Priority date | Oct 10, 2019 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
Spatial audio signals can include audio objects that can be respectively encoded and rendered at each of multiple different depths. In an example, a method for encoding a spatial audio signal can include receiving audio scene information from an audio capture source in an environment, and receiving a depth characteristic of a first object in the environment. The depth characteristic can be determined using information from a depth sensor. A correlation can be identified between at least a portion of the audio scene information and the first object. The spatial audio signal can be encoded using the portion of the audio scene and the depth characteristic of the first object.
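The correlation step in the abstract can be illustrated with a minimal sketch. It assumes the audio analysis has already produced a dominant azimuth for an audio component, and that the depth sensor reports objects as (label, azimuth, depth) records. The names `DepthObject`, `angular_difference`, `correlate_direction`, and the `tolerance_deg` parameter are hypothetical and do not come from the patent. The linear angular falloff used for the confidence is an illustrative choice, not the disclosed method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DepthObject:
    """Hypothetical record for an object reported by a depth sensor."""
    label: str
    azimuth_deg: float  # direction of the object relative to the capture source
    depth_m: float      # distance of the object from the capture source


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two azimuths, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def correlate_direction(
    audio_azimuth_deg: float,
    objects: List[DepthObject],
    tolerance_deg: float = 30.0,  # assumed value, chosen for illustration
) -> Tuple[Optional[DepthObject], float]:
    """Return the best-matching object and a confidence in [0, 1].

    Confidence falls linearly from 1 (same direction) to 0 (at or beyond
    the tolerance). Returns (None, 0.0) when no object is within tolerance.
    """
    best: Optional[DepthObject] = None
    best_conf = 0.0
    for obj in objects:
        diff = angular_difference(audio_azimuth_deg, obj.azimuth_deg)
        conf = max(0.0, 1.0 - diff / tolerance_deg)
        if conf > best_conf:
            best, best_conf = obj, conf
    return best, best_conf


if __name__ == "__main__":
    scene = [
        DepthObject("speaker", azimuth_deg=10.0, depth_m=2.0),
        DepthObject("door", azimuth_deg=200.0, depth_m=5.0),
    ]
    match, confidence = correlate_direction(355.0, scene)
    print(match, confidence)
```

A real system would likely work in three dimensions and combine several cues, such as the object classification recited in the claims. The sketch only shows how a direction match can yield a graded confidence rather than a yes/no decision.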
Opening claim text (preview).
The invention claimed is:
1. A method comprising: receiving audio scene information from an audio capture source in an environment; parsing the audio scene information into one or more audio components; identifying dominant directions for a first physical object and a second physical object in each audio component; receiving, from a depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; associating the depth characteristic information with the dominant directions; and providing a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.
2. The method of claim 1, wherein the at least one of the one or more audio components is determined using information about signal contributions to a time-frequency representation of the received audio scene information.
3. The method of claim 1, further comprising determining a first direction and a reference depth, relative to the audio capture source, for the at least one of the one or more audio components.
4. The method of claim 3, wherein the depth characteristic information includes a measure of a confidence used to determine a correlation between the at least one of the one or more audio components and at least one of the first physical object and second physical object.
5. The method of claim 4, further comprising providing a first depth for the at least one of the one or more audio components using the measured confidence.
6. The method of claim 5, wherein providing the first depth includes: when the confidence is high, providing the first depth based on information from the depth sensor; when the confidence is low, providing the first depth as the reference depth; and when the confidence is intermediate, providing the first depth as a depth that is between the reference depth and a depth determined using the depth sensor.
7. The method of claim 4, wherein the confidence is determined based on a computer vision processor to classify objects identified in the environment and to determine whether the at least one of the one or more audio components includes, or is likely to include, audio from at least one of the first physical object or the second physical object.
8. The method of claim 4, wherein the confidence is determined based on identifying one or more data clusters in the depth characteristic information from the depth sensor, and correlating the first direction of the at least one of the one or more audio components to the identified one or more data clusters.
9. The method of claim 1, further comprising encoding a depth-extended ambisonic signal based on the audio scene information and the depth characteristic information for the at least one of the first physical object or the second physical object.
10. The method of claim 1, further comprising: determining a classification of the first physical object using an image-based object classifier.
11. A system comprising: an audio capture source configured to capture an audio scene in an environment; a depth sensor configured to provide depth characteristic information about multiple objects in the environment relative to a reference location of the depth sensor; and a processor circuit configured to: receive audio scene information from an audio capture source in an environment; parse the audio scene information into one or more audio components; identify dominant directions for a first physical object and a second physical object in each audio component; receive, from the depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; associating the depth characteristic information with the dominant directions; and providing a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.
12. The system of claim 11, wherein the audio capture source comprises one or more of a multiple-transducer microphone, a sound field microphone, a microphone array, and an ambisonic microphone, and wherein the depth sensor comprises one or more of a laser, a modulated light source, a stereoscopic camera, a depth probe, an infrared sensor, and a camera array.
13. The system of claim 11, wherein the processor circuit is configured to encode a spatial audio signal as a depth-extended ambisonic signal based on the audio scene and the depth characteristic information about the first physical object.
14. The system of claim 13, wherein the processor circuit is configured to use a weighted combination of depth information about the multiple objects to encode the spatial audio signal.
15. The system of claim 14, wherein the processor circuit is configured to determine a confidence that information from the audio scene corresponds to the first physical object from among the multiple objects in the environment, and wherein the processor circuit is configured to encode the spatial audio signal based on the determined confidence meeting or exceeding a specified confidence threshold.
16. The system of claim 15, further comprising an object classifier circuit configured to determine a classification for the first physical object and the second physical object; and wherein the processor circuit is configured to determine a correspondence between the classification of the first physical object or the second physical object and at least one audio component of the one or more audio components, and wherein the processor circuit is configured to encode the spatial audio signal based on a value of the determined correspondence in response to the determined correspondence meeting a threshold correspondence condition.
17. An audio signal encoder device comprising: a processor and a non-transitory computer-readable medium operably coupled thereto, the non-transitory computer-readable medium comprising instructions stored in associated therewith that are accessible to, and executable by, the processor, wherein the instructions comprise: instructions that, when executed, receive audio scene information from an audio capture source in an environment instructions that, when executed, parse the audio scene information into one or more audio components; instructions that, when executed identify dominant directions for a first physical object and a second physical object in each audio component; instructions that, when executed, receive, from a depth sensor, depth characteristic information about the first physical object and the second physical object in the environment; instructions that, when executed, associate the depth characteristic information with the dominant directions; and instructions that, when executed, provide a confidence indication that the audio scene information corresponds to at least one of the first physical object or the second physical object.
18. The audio signal encoder device of claim 17, further comprising instructions that, when executed, conditionally encode a spatial audio signal, including instructions that, when executed: encode the spatial audio signal based on depth information about the first physical object in the environment when the audio characteristic corresponds to a first audio component identified in the audio scene information; and encode the spatial audio signal based on a reference depth when the audio characteristic does not correspond to the firs
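Claims 5 and 6 describe choosing a depth for an audio component from the measured confidence: the sensor depth when confidence is high, the reference depth when it is low, and a value in between otherwise. The sketch below is one possible reading of that rule. The function name `select_depth`, the `low` and `high` thresholds, and the linear interpolation are assumptions made for illustration; the claims do not specify threshold values or how the intermediate depth is computed.

```python
def select_depth(
    confidence: float,
    sensor_depth: float,
    reference_depth: float,
    low: float = 0.3,   # assumed threshold, not from the patent
    high: float = 0.7,  # assumed threshold, not from the patent
) -> float:
    """Pick a depth for an audio component from a confidence in [0, 1].

    high confidence         -> depth reported by the depth sensor
    low confidence          -> reference depth
    intermediate confidence -> linear blend between the two (an assumption)
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if not low < high:
        raise ValueError("low threshold must be below high threshold")
    if confidence >= high:
        return sensor_depth
    if confidence <= low:
        return reference_depth
    # Map confidence from (low, high) onto a blend weight in (0, 1).
    weight = (confidence - low) / (high - low)
    return reference_depth + weight * (sensor_depth - reference_depth)


if __name__ == "__main__":
    for c in (0.1, 0.5, 0.9):
        print(c, select_depth(c, sensor_depth=4.0, reference_depth=1.0))
```

Blending rather than switching abruptly avoids a sudden jump in rendered depth when the confidence hovers near a threshold, which is one plausible motivation for the intermediate case in claim 6.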
Direction finding using differential microphone array [DMA] · CPC title
Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups · CPC title
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title
wherein the generated image signals comprise depth maps or disparity maps · CPC title