Audio-driven viewport selection
US-2019005986-A1 · Jan 3, 2019 · US
US10469968B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10469968-B2 |
| Application number | US-201715782252-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2017 |
| Priority date | Oct 12, 2017 |
| Publication date | Nov 5, 2019 |
| Grant date | Nov 5, 2019 |
Official abstract text for this publication.
In general, techniques are described for adapting higher order ambisonic audio data to include three degrees of freedom plus effects. An example device configured to perform the techniques includes a memory, and a processor coupled to the memory. The memory may be configured to store higher order ambisonic audio data representative of a soundfield. The processor may be configured to obtain a translational distance representative of a translational head movement of a user interfacing with the device. The processor may further be configured to adapt, based on the translational distance, the higher order ambisonic audio data to provide three degrees of freedom plus effects that adapt the soundfield to account for the translational head movement, and generate speaker feeds based on the adapted higher order ambisonic audio data.
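The abstract's final step, generating speaker feeds from higher order ambisonic (HOA) data, amounts to multiplying one frame of HOA coefficients by a rendering matrix. The following is a minimal sketch of that unadapted (head-rotation-only) rendering path, which the described technique then modifies for translation. It assumes first-order HOA (an order-N signal has (N + 1)^2 coefficients, so 4 here), ACN channel order, N3D normalization, a basic sampling renderer, and a hypothetical four-speaker square layout. None of these choices or function names come from the patent.

```python
import math

# Hypothetical sketch: first-order HOA, ACN order (W, Y, Z, X), N3D normalization.
# These conventions are assumptions for illustration, not taken from the patent.


def sh_first_order(azimuth, elevation):
    """Real first-order N3D spherical harmonics for a direction given in radians."""
    x = math.cos(azimuth) * math.cos(elevation)
    y = math.sin(azimuth) * math.cos(elevation)
    z = math.sin(elevation)
    s3 = math.sqrt(3.0)
    return [1.0, s3 * y, s3 * z, s3 * x]


def encode(sample, azimuth, elevation):
    """Encode one mono sample as a plane wave into a 4-coefficient HOA frame."""
    return [sample * c for c in sh_first_order(azimuth, elevation)]


def sampling_renderer(speaker_dirs):
    """Basic sampling rendering matrix: one row per speaker, scaled by 1 / L."""
    count = len(speaker_dirs)
    return [[c / count for c in sh_first_order(az, el)] for az, el in speaker_dirs]


def speaker_feeds(renderer, hoa_frame):
    """Speaker feeds = rendering matrix (L x 4) times one HOA frame (4)."""
    return [sum(r * a for r, a in zip(row, hoa_frame)) for row in renderer]


# Hypothetical layout: four horizontal speakers at 0, 90, 180 and 270 degrees.
SPEAKERS = [(math.radians(deg), 0.0) for deg in (0, 90, 180, 270)]
feeds = speaker_feeds(sampling_renderer(SPEAKERS), encode(1.0, 0.0, 0.0))
```

With the source encoded at 0 degrees, the front speaker receives a feed of about 1.0, the two side speakers about 0.25, and the rear speaker about -0.5. The negative rear lobe is a known property of basic first-order decoding; practical renderers typically apply decoder weighting (for example, in-phase weighting) to suppress it.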
Opening claim text (preview).
What is claimed is:
1. A device comprising: a memory configured to store higher order ambisonic audio data representative of a soundfield; and one or more processors coupled to the memory, and configured to: receive an original reference distance; determine reference points, wherein the reference points are anchor points, positioned at the original reference distance relative to the head of a user prior to a translational movement of the user interfacing with the device; obtain a translational distance representative of a translational head movement of a user interfacing with the device; determine, after the translational distance has been obtained, updated distances between each of the anchor points, positioned at the original reference distance, and the head of the user; determine, prior to application of a renderer, based on each of the updated distances, an effects matrix; generate higher order ambisonic audio data, based on the effects matrix, to adapt the soundfield to account for the translational distance obtained; and generate speaker feeds based on the adapted generated higher order ambisonic audio data.
2. The device of claim 1, wherein the translational distance representative of the translational head movement is defined in either a two-dimensional spatial coordinate space or a three-dimensional spatial coordinate space.
3. The device of claim 1, wherein the one or more processors are further configured to multiply the effects matrix by a rendering matrix to obtain an updated rendering matrix, wherein the one or more processors are configured to apply the updated rendering matrix to the higher order ambisonic audio data to 1) provide one or more effects that adapt the soundfield to account for the translational distance and 2) generate the speaker feeds.
4. The device of claim 1, wherein the plurality of anchor points are uniformly distributed on a surface of a sphere having a radius equal to the original reference distance.
5. The device of claim 1, wherein three-dimensional video data including a depth map is associated with the higher order ambisonic audio data, and wherein the one or more processors are configured to determine the original reference distance of each of the anchor points based on the depth map prior to determining each of the updated distances between each of the anchor points, positioned at the original reference distance, and the head of the user.
6. The device of claim 1, wherein three-dimensional video data including a depth map is associated with the higher order ambisonic audio data, and wherein the one or more processors are configured to obtain, based on the depth map, the updated distances between each of the anchor points, positioned at the original reference distance, and the head of the user.
7. The device of claim 1, wherein the one or more processors are coupled to speakers of the device, wherein the device is a wearable device, wherein the one or more processors are configured to apply the renderer, wherein the renderer is a binaural renderer, to the adapted higher order ambisonic audio data to generate the speaker feeds, and wherein the one or more processors are further configured to output the speaker feeds to the speakers.
8. The device of claim 7, wherein the wearable device is a watch, glasses, headphones, an augmented reality (AR) headset, a virtual reality (VR) headset, or an extended reality (XR) headset.
9. The device of claim 1, wherein the one or more processors are further configured to obtain a rotation indication indicative of a rotational head movement of the user interfacing with the device, and wherein the one or more processors are configured to, based on the translational distance and the rotation indication, adapt the soundfield to account for the translational distance and the rotational indication.
10. The device of claim 1, wherein the one or more processors are configured to obtain the translational distance using motion sensors that sense the translational head movement.
11. The device of claim 1, wherein the one or more processors are configured to obtain the translational distance based on images captured by a camera coupled to the one or more processors.
12. The device of claim 1, wherein the higher order ambisonic audio data comprises higher order ambisonic coefficients associated with spherical basis functions having an order of one or less, higher order ambisonic coefficients associated with spherical basis functions having a mixed order and suborder, or higher order ambisonic coefficients associated with spherical basis functions having an order greater than one.
13. The device of claim 1, wherein the receive an original reference distance is configurable for playback of higher order ambisonic audio data, and, is based on a user input.
14. The device of claim 1, wherein the original reference distance is static.
15. The device of claim 1, wherein the original reference distance is dynamic.
16. The device of claim 1, wherein the receive an original reference distance is received as a syntax element of a bitstream, to be decoded by a playback device.
17. A method comprising: receiving an original reference distance; determining reference points, wherein the reference points are anchor points, positioned at the original reference distance relative to the head of a user prior to a translational movement of the user interfacing with a device; obtaining a translational distance representative of a translational head movement of the user interfacing with the device; determining, after the translational distance has been obtained, updated distances between each of the anchor points, positioned at the original reference distance, and the head of the user; determining, prior to application of a renderer, based on each of the updated distances, an effects matrix; generating higher order ambisonic audio data, based on the effects matrix, to adapt a soundfield represented by the higher order ambisonic audio data to account for the translational distance obtained; and generating speaker feeds based on the generated higher order ambisonic audio data.
18. The method of claim 17, wherein the translational distance representative of the translational head movement is defined in either a two-dimensional spatial coordinate space or a three-dimensional spatial coordinate space.
19. The method of claim 17, further comprising multiplying the effects matrix by a rendering matrix to obtain an updated rendering matrix, generating the speaker feeds comprises applying the updated rendering matrix to the higher order ambisonic audio data to 1) provide one or more effects that adapt the soundfield to account for the translational distance and 2) generate the speaker feeds.
20. The method of claim 17, wherein the plurality of anchor points are uniformly distributed on a surface of a sphere having a radius equal to the original reference distance.
21. The method of claim 17, wherein three-dimensional video data including a depth map is associated with the higher order ambisonic audio data, and wherein the determining the original reference distance of each of the anchor points is based on the depth map prior to determining each of the updated distances between each of the anchor points, positioned at the original reference distance, and the head of the user.
22. The method of claim 17, wherein three-dimensional video data including a depth map is associated with the
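Claims 1, 3 and 4 describe a pipeline: anchor points on a sphere whose radius is the original reference distance, updated anchor-to-head distances after a translation, an effects matrix derived from those distances, and an updated rendering matrix formed by multiplying the effects matrix into a rendering matrix. The claims do not state how the effects matrix is computed, so this sketch fills that in with one plausible construction, which is an assumption and not the patent's method: decode the HOA frame to signals at the anchor points, then re-encode each anchor from its direction as seen from the translated head with an inverse-distance gain r0 / d. It also assumes first-order HOA (ACN order, N3D normalization) and a six-point octahedral anchor layout, chosen because its spherical-harmonic matrix has a closed-form pseudo-inverse. All function names are hypothetical.

```python
import math

# Hypothetical sketch of claims 1, 3 and 4. Assumptions (not from the patent):
# first-order HOA in ACN order (W, Y, Z, X), N3D normalization, an octahedral
# anchor layout, and a 1/r gain when re-encoding anchors from the new head position.
UNIT_ANCHORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sh_first_order(x, y, z):
    """Real first-order N3D spherical harmonics for a unit vector."""
    s3 = math.sqrt(3.0)
    return [1.0, s3 * y, s3 * z, s3 * x]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def anchor_points(r0):
    """Anchor points uniformly placed on a sphere of radius r0 (claim 4)."""
    return [(r0 * x, r0 * y, r0 * z) for x, y, z in UNIT_ANCHORS]


def updated_distances(anchors, head):
    """Distance from each fixed anchor point to the translated head position."""
    return [math.dist(a, head) for a in anchors]


def effects_matrix(r0, head):
    """4 x 4 effects matrix: decode to anchors, re-encode from the translated head."""
    anchors = anchor_points(r0)
    dists = updated_distances(anchors, head)
    if min(dists) <= 0.0:
        raise ValueError("head must not coincide with an anchor point")
    # For this layout Y @ Y.T = 6 * I, so pinv(Y) = Y.T / 6 (anchor decoder, M x 4).
    decoder = [[c / len(UNIT_ANCHORS) for c in sh_first_order(*u)] for u in UNIT_ANCHORS]
    # Re-encode each anchor from its direction relative to the head, with gain r0 / d.
    columns = []
    for a, d in zip(anchors, dists):
        direction = [(a[i] - head[i]) / d for i in range(3)]
        columns.append([(r0 / d) * c for c in sh_first_order(*direction)])
    return matmul(transpose(columns), decoder)


def updated_rendering_matrix(renderer, effects):
    """Claim 3: fold the effects matrix into an existing (L x 4) rendering matrix."""
    return matmul(renderer, effects)
```

With no translation, every anchor stays at distance r0 and the effects matrix reduces to the identity (up to rounding), so the updated rendering matrix equals the original one. As the head moves toward an anchor, that anchor's distance shrinks and its gain r0 / d grows. Folding the effects matrix into the renderer once per head-pose update means each audio frame still costs a single matrix-vector product, which is consistent with claim 3's description of applying one updated rendering matrix to the HOA data.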
Tracking of listener position or orientation · CPC title
Application of parametric coding in stereophonic audio systems · CPC title
Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title
Aspects of sound capture and related signal processing for recording or reproduction · CPC title
Head tracking input arrangements · CPC title