Method and apparatus for compressing and decompressing a higher order ambisonics representation
US-2016088415-A1 · Mar 24, 2016 · US
US9412385B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9412385-B2 |
| Application number | US-201414288219-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 27, 2014 |
| Priority date | May 28, 2013 |
| Publication date | Aug 9, 2016 |
| Grant date | Aug 9, 2016 |
Abstract
In general, techniques are described by which to perform spatial masking with respect to spherical harmonic coefficients. As one example, an audio encoding device comprising a processor may perform various aspects of the techniques. The processor may be configured to perform spatial analysis based on the spherical harmonic coefficients describing a three-dimensional sound field to identify a spatial masking threshold. The processor may further be configured to render the multi-channel audio data from the plurality of spherical harmonic coefficients, and compress the multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.
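The three steps in the abstract (spatial analysis, rendering from spherical harmonic coefficients, masking-driven compression) can be illustrated with a small NumPy sketch. It is not the patented method. It assumes first-order coefficients (4 per sample, ACN order, N3D normalization) and renders to 32 virtual speakers on a Fibonacci lattice, which stands in for the 32-point T-design named in the claims. The masking model (Gaussian angular spreading, `alpha`, `width`) and the bit rule `0.5 * log2(energy / threshold)` are hypothetical, and all function names are illustrative.

```python
import numpy as np


def fibonacci_sphere(n: int) -> np.ndarray:
    """Return n roughly uniform unit vectors as an (n, 3) array.

    Stand-in for the 32-point T-design named in the claims: a Fibonacci
    lattice spreads points evenly but is NOT a true spherical t-design.
    """
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                     # evenly spaced heights in (-1, 1)
    r = np.sqrt(1.0 - z**2)                   # radius of each height circle
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i    # golden-angle azimuth steps
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def sh_order1(dirs: np.ndarray) -> np.ndarray:
    """Real first-order spherical harmonics (ACN order, N3D) as a (4, n) matrix."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    s3 = np.sqrt(3.0)
    return np.stack([np.ones_like(x), s3 * y, s3 * z, s3 * x])


def render(shc: np.ndarray, speaker_dirs: np.ndarray) -> np.ndarray:
    """Mode-matching render: (4, T) coefficients -> (L, T) speaker feeds."""
    Y = sh_order1(speaker_dirs)               # (4, L) encoding matrix
    D = np.linalg.pinv(Y)                     # (L, 4) minimum-norm decoder
    return D @ shc


def toy_spatial_masking_threshold(feeds, speaker_dirs, alpha=0.1, width=0.5):
    """HYPOTHETICAL model: each channel is masked by the other channels' energy,
    weighted by a Gaussian in the angle (radians) between speaker directions."""
    energy = np.mean(feeds**2, axis=1)                           # (L,)
    cosang = np.clip(speaker_dirs @ speaker_dirs.T, -1.0, 1.0)
    spread = np.exp(-np.arccos(cosang) ** 2 / (2.0 * width**2))  # (L, L)
    np.fill_diagonal(spread, 0.0)                                # no self-masking
    return alpha * (spread @ energy) + 1e-12                     # keep threshold > 0


def allocate_bits(feeds, threshold):
    """Bits per sample per channel: max(0, 0.5 * log2(energy / threshold))."""
    energy = np.maximum(np.mean(feeds**2, axis=1), 1e-12)
    return np.maximum(0.0, 0.5 * np.log2(energy / threshold))


# Demo: one first-order source at +x, rendered to 32 virtual speakers.
rng = np.random.default_rng(0)
dirs = fibonacci_sphere(32)
signal = rng.standard_normal(1024)
shc = sh_order1(np.array([[1.0, 0.0, 0.0]])) @ signal[None, :]  # (4, 1024)
feeds = render(shc, dirs)
thr = toy_spatial_masking_threshold(feeds, dirs)
bits = allocate_bits(feeds, thr)
```

Because the decoder is the pseudo-inverse of a full-rank 4 x 32 matrix, re-encoding the 32 feeds with `sh_order1(dirs)` returns the original coefficients. That is what lets the dense rendering serve as an intermediate representation for masking analysis in this sketch.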
Claims (opening text, preview)
The invention claimed is:

1. A method of compressing multi-channel audio data comprising: performing a spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify a spatial masking threshold; rendering multi-channel audio data from the plurality of spherical harmonic coefficients, wherein the multi-channel audio data is rendered for a dense speaker geometry such that the multi-channel audio data has a number of channels greater than a number of channels for playback via one or more speakers; and compressing the rendered multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.

2. The method of claim 1, further comprising determining a target bitrate for the bitstream, wherein compressing the rendered multi-channel audio data comprises performing, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.

3. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises: determining that the target bitrate is below a threshold bitrate; and in response to determining that the target bitrate is below the threshold bitrate, performing the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.

4. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises: determining that the target bitrate is below a threshold bitrate; and in response to determining that the target bitrate is below the threshold bitrate, performing the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and performing the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.

5. The method of claim 1, wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data for 32 speakers in the dense speaker geometry from the spherical harmonic coefficients.

6. The method of claim 1, wherein the dense speaker geometry comprises a dense T-design speaker geometry, and wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in the dense T-design speaker geometry from the spherical harmonic coefficients.

7. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.

8. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold and a temporal masking threshold.

9. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises performing entropy encoding based on the identified spatial masking threshold.

10. The method of claim 1, further comprising transforming the plurality of spherical harmonic coefficients from the time domain to the frequency domain so as to generate a transformed plurality of spherical harmonic coefficients, wherein rendering the multi-channel audio data comprises rendering the multi-channel audio data from the transformed plurality of spherical harmonic coefficients.

11. An audio encoding device comprising: one or more processors configured to perform a spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify spatial masking thresholds, render multi-channel audio data from the plurality of spherical harmonic coefficients, wherein the multi-channel audio data is rendered for a dense speaker geometry such that the multi-channel audio data has a number of channels greater than a number of channels for playback via one or more speakers, and compress the rendered multi-channel audio data based on the identified spatial masking thresholds to generate a bitstream.

12. The audio encoding device of claim 11, wherein the one or more processors are further configured to determine a target bitrate for the bitstream, and wherein the one or more processors are configured to perform, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.

13. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.

14. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and performing the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.

15. The audio encoding device of claim 11, wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data for 32 speakers arranged in the dense speaker geometry from the spherical harmonic coefficients.

16. The audio encoding device of claim 11, wherein the dense speaker geometry comprises a dense T-design speaker geometry, and wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in the dense T-design from the spherical harmonic coefficients.

17. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.

18. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of
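Claims 2 to 4 (mirrored by claims 12 to 14) describe a bitrate-driven choice between two coding paths. The following is a minimal standard-library sketch of that decision, not the patentee's implementation. It makes three assumptions. First, `threshold_bitrate` is a hypothetical tuning value, since the claims give no number. Second, the high-rate branch falls back to option ii) of claim 2. Third, claim 4 is read as "spatially mask the base channels and carry the remaining channels as parametric side information". `CodingMode`, `CodingPlan`, and `plan_compression` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class CodingMode(Enum):
    PARAMETRIC_PLUS_MASKING = "parametric inter-channel coding + spatial masking"
    MASKING_ONLY = "spatial masking only"


@dataclass
class CodingPlan:
    mode: CodingMode
    masked_channels: list[int]      # channels coded with spatial masking
    parametric_channels: list[int]  # channels carried as parametric side info


def plan_compression(num_channels: int, base_channels: list[int],
                     target_bitrate: int, threshold_bitrate: int) -> CodingPlan:
    """Pick option i) or ii) of claim 2 from the target bitrate.

    threshold_bitrate is a HYPOTHETICAL tuning value; the claims give none.
    """
    if target_bitrate < threshold_bitrate:
        # Low rate (claims 3-4): spatially mask the base channels and describe
        # the remaining channels parametrically relative to them.
        base = sorted(set(base_channels))
        others = [c for c in range(num_channels) if c not in base]
        return CodingPlan(CodingMode.PARAMETRIC_PLUS_MASKING, base, others)
    # Assumed high-rate fallback (option ii): mask every channel, no parametric coding.
    return CodingPlan(CodingMode.MASKING_ONLY, list(range(num_channels)), [])


low = plan_compression(32, [0, 1, 2, 3], target_bitrate=64_000, threshold_bitrate=128_000)
high = plan_compression(32, [0, 1, 2, 3], target_bitrate=256_000, threshold_bitrate=128_000)
print(low.mode.name, len(low.parametric_channels))   # PARAMETRIC_PLUS_MASKING 28
print(high.mode.name, len(high.masked_channels))     # MASKING_ONLY 32
```

With 32 rendered channels and four hypothetical base channels, the low-rate plan spatially masks only the base channels and leaves 28 channels to the parametric path. The high-rate plan applies spatial masking to all 32 channels.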
using orthogonal transformation · CPC title
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title