Performing spatial masking with respect to spherical harmonic coefficients

US9412385B2 · US · B2

Patent metadata

Publication number: US-9412385-B2
Application number: US-201414288219-A
Country: US
Kind code: B2
Filing date: May 27, 2014
Priority date: May 28, 2013
Publication date: Aug 9, 2016
Grant date: Aug 9, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In general, techniques are described by which to perform spatial masking with respect to spherical harmonic coefficients. As one example, an audio encoding device comprising a processor may perform various aspects of the techniques. The processor may be configured to perform spatial analysis based on the spherical harmonic coefficients describing a three-dimensional sound field to identify a spatial masking threshold. The processor may further be configured to render the multi-channel audio data from the plurality of spherical harmonic coefficients, and compress the multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.
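The three stages named in the abstract (spatial analysis, rendering, threshold-driven compression) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the first-order harmonic convention, the four-speaker horizontal layout (the claims describe a dense 32-channel geometry), the simple projection renderer, the masking model with its `offset_db` and `slope_db_per_rad` parameters, and the `compress` stand-in that merely drops masked channels instead of running a real codec. Deriving the thresholds from the rendered feeds is also a simplification; the abstract only says the analysis is based on the coefficients.

```python
import math

def sh_first_order(azimuth, elevation):
    """Real first-order spherical harmonics in W, Y, Z, X order (assumed convention)."""
    ce = math.cos(elevation)
    return [1.0, math.sin(azimuth) * ce, math.sin(elevation), math.cos(azimuth) * ce]

def unit_vector(azimuth, elevation):
    """Cartesian unit vector for a direction given in radians."""
    ce = math.cos(elevation)
    return (math.cos(azimuth) * ce, math.sin(azimuth) * ce, math.sin(elevation))

def render(shc, directions):
    """Project one frame of coefficients onto each speaker direction (simple sampling decoder)."""
    n = len(directions)
    return [sum(c * y for c, y in zip(shc, sh_first_order(az, el))) / n
            for az, el in directions]

def level_db(sample, floor=1e-12):
    """Energy of a single sample in dB, with a small floor to avoid log(0)."""
    return 10.0 * math.log10(sample * sample + floor)

def spatial_masking_thresholds(feeds, directions, offset_db=20.0, slope_db_per_rad=10.0):
    """Toy model: each channel is masked by its loudest neighbour, attenuated by a
    fixed offset plus a penalty that grows with angular separation."""
    vecs = [unit_vector(az, el) for az, el in directions]
    thresholds = []
    for i, vi in enumerate(vecs):
        best = -math.inf
        for j, vj in enumerate(vecs):
            if i == j:
                continue
            dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(vi, vj))))
            angle = math.acos(dot)
            best = max(best, level_db(feeds[j]) - offset_db - slope_db_per_rad * angle)
        thresholds.append(best)
    return thresholds

def compress(feeds, thresholds):
    """Stand-in for the codec: keep only channels whose level exceeds the threshold."""
    return {i: s for i, (s, t) in enumerate(zip(feeds, thresholds)) if level_db(s) > t}

# Hypothetical layout: four horizontal speakers at 0, 90, 180 and 270 degrees.
directions = [(math.radians(a), 0.0) for a in (0, 90, 180, 270)]
shc = sh_first_order(0.0, 0.0)  # unit plane wave arriving from the front
feeds = render(shc, directions)
kept = compress(feeds, spatial_masking_thresholds(feeds, directions))
```

In this setup the rear speaker receives essentially no signal, so its level falls below the threshold set by its louder neighbours and it is dropped. A real encoder would instead use the threshold to decide how many bits each channel or frequency bin receives.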

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of compressing multi-channel audio data comprising: performing a spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify a spatial masking threshold; rendering multi-channel audio data from the plurality of spherical harmonic coefficients, wherein the multi-channel audio data is rendered for a dense speaker geometry such that the multi-channel audio data has a number of channels greater than a number of channels for playback via one or more speakers; and compressing the rendered multi-channel audio data based on the identified spatial masking threshold to generate a bitstream.

2. The method of claim 1, further comprising determining a target bitrate for the bitstream, wherein compressing the rendered multi-channel audio data comprises performing, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.

3. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises: determining that the target bitrate is below a threshold bitrate; and in response to determining that the target bitrate is below the threshold bitrate, performing the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.

4. The method of claim 2, wherein performing either i) the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding comprises: determining that the target bitrate is below a threshold bitrate; and in response to determining that the target bitrate is below the threshold bitrate, performing the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and performing the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.

5. The method of claim 1, wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data for 32 speakers in the dense speaker geometry from the spherical harmonic coefficients.

6. The method of claim 1, wherein the dense speaker geometry comprises a dense T-design speaker geometry, and wherein rendering the multi-channel audio data from the spherical harmonic coefficients comprises rendering 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in the dense T-design speaker geometry from the spherical harmonic coefficients.

7. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.

8. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises allocating bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold and a temporal masking threshold.

9. The method of claim 1, wherein compressing the rendered multi-channel audio data comprises performing entropy encoding based on the identified spatial masking threshold.

10. The method of claim 1, further comprising transforming the plurality of spherical harmonic coefficients from the time domain to the frequency domain so as to generate a transformed plurality of spherical harmonic coefficients, wherein rendering the multi-channel audio data comprises rendering the multi-channel audio data from the transformed plurality of spherical harmonic coefficients.

11. An audio encoding device comprising: one or more processors configured to perform a spatial analysis based on a plurality of spherical harmonic coefficients that describe a three-dimensional sound field to identify spatial masking thresholds, render multi-channel audio data from the plurality of spherical harmonic coefficients, wherein the multi-channel audio data is rendered for a dense speaker geometry such that the multi-channel audio data has a number of channels greater than a number of channels for playback via one or more speakers, and compress the rendered multi-channel audio data based on the identified spatial masking thresholds to generate a bitstream.

12. The audio encoding device of claim 11, wherein the one or more processors are further configured to determine a target bitrate for the bitstream, and wherein the one or more processors are configured to perform, based on the target bitrate, either i) parametric inter-channel audio encoding and spatial masking using the spatial masking threshold or ii) the spatial masking using the spatial masking threshold without performing the parametric inter-channel audio encoding to generate a bitstream representative of the compressed audio data.

13. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the parametric inter-channel audio encoding and the spatial masking using the spatial masking threshold to generate the bitstream.

14. The audio encoding device of claim 12, wherein the one or more processors are configured to determine that the target bitrate is below a threshold bitrate, and in response to determining that the target bitrate is below the threshold bitrate, perform the spatial masking using the spatial masking threshold with respect to one or more base channels of the multi-channel audio data and performing the parametric inter-channel audio encoding with respect to the multi-channel audio data to generate the bitstream.

15. The audio encoding device of claim 11, wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data for 32 speakers arranged in the dense speaker geometry from the spherical harmonic coefficients.

16. The audio encoding device of claim 11, wherein the dense speaker geometry comprises a dense T-design speaker geometry, and wherein the one or more processors are further configured to render 32 channels of the multi-channel audio data corresponding to 32 speakers arranged in the dense T-design from the spherical harmonic coefficients.

17. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of the multi-channel audio data or a frequency-based representation of the multi-channel audio data based on the spatial masking threshold.

18. The audio encoding device of claim 11, wherein the one or more processors are further configured to allocate bits in the bitstream for either a time-based representation of
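Claims 2, 3, 12 and 13 describe a choice driven by the target bitrate, which can be sketched as a small selector. This is only an illustration: the names `CompressionMode` and `select_mode` and the 256 kbps cutoff are hypothetical (the claims give no number), and mapping "not below the threshold" to spatial masking alone is a reading of option ii) in claim 2 that the claims do not spell out.

```python
from enum import Enum

class CompressionMode(Enum):
    PARAMETRIC_AND_SPATIAL_MASKING = "parametric inter-channel encoding + spatial masking"
    SPATIAL_MASKING_ONLY = "spatial masking only"

# Hypothetical cutoff in bits per second; the claims do not specify a value.
THRESHOLD_BITRATE_BPS = 256_000

def select_mode(target_bitrate_bps, threshold_bitrate_bps=THRESHOLD_BITRATE_BPS):
    """Pick the compression path from the target bitrate.

    Below the threshold, parametric inter-channel encoding is combined with
    spatial masking (claims 3 and 13). Otherwise this sketch assumes spatial
    masking is used on its own (option ii of claims 2 and 12).
    """
    if target_bitrate_bps < threshold_bitrate_bps:
        return CompressionMode.PARAMETRIC_AND_SPATIAL_MASKING
    return CompressionMode.SPATIAL_MASKING_ONLY

low = select_mode(128_000)
high = select_mode(512_000)
```

Under these assumptions `low` selects the combined path and `high` selects spatial masking alone.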

Assignees

Qualcomm Inc

Inventors

Classifications

  • using orthogonal transformation · CPC title

  • G10L19/008 · Primary

    Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9412385B2 cover?
In general, techniques are described by which to perform spatial masking with respect to spherical harmonic coefficients. As one example, an audio encoding device comprising a processor may perform various aspects of the techniques. The processor may be configured to perform spatial analysis based on the spherical harmonic coefficients describing a three-dimensional sound field to identify a sp…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G10L19/008. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 9, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).