Systems, methods, apparatus, and computer-readable media for backward-compatible audio coding
US-9788133-B2 · Oct 10, 2017 · US
US9397771B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9397771-B2 |
| Application number | US-201113333461-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2011 |
| Priority date | Dec 21, 2010 |
| Publication date | Jul 19, 2016 |
| Grant date | Jul 19, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Representations of spatial audio scenes using higher-order Ambisonics HOA technology typically require a large number of coefficients per time instant. This data rate is too high for most practical applications that require real-time transmission of audio signals. According to the invention, the compression is carried out in spatial domain instead of HOA domain. The (N+1) 2 input HOA coefficients are transformed into (N+1) 2 equivalent signals in spatial domain, and the resulting (N+1) 2 time-domain signals are input to a bank of parallel perceptual codecs. At decoder side, the individual spatial-domain signals are decoded, and the spatial-domain coefficients are transformed back into HOA domain in order to recover the original HOA representation.
Opening claim text (preview).
The invention claimed is: 1. A method for carrying out an encoding on received successive frames of a higher-order Ambisonics representation of a 2- or 3-dimensional sound field, denoted as_HOA coefficients, said method comprising: transforming a number of O=(N+1) 2 input HOA coefficients of a frame into a number of O spatial domain signals representing a regular distribution of reference points on a sphere, wherein N is an order of said input HOA coefficients and is greater or equal to 3, and each one of said O spatial domain signals represents a set of plane waves which come from associated directions in space; encoding each one of said O spatial domain signals using perceptual compression encoding steps or stages, thereby using encoding parameters selected such that a coding error is inaudible; and multiplexing the encoded spatial domain signals of the frame into a joint bit stream for providing improved lossy compression of HOA representations of audio scenes. 2. The method according to claim 1 , wherein a masking used in said perceptual compression encoding is a psycho-acoustic masking and is a combination of time-frequency masking and spatial masking. 3. The method according to claim 1 , wherein said transforming into O spatial domain signals is plane wave decomposition. 4. The method according to claim 1 , wherein said encoding of each of said O spatial domain signals corresponds to the MPEG-1 Audio Layer III or AAC or Dolby AC-3 standard. 5. An apparatus for carrying out an encoding on received successive frames of a higher order Ambisonics representation of a 2- or 3-dimensional sound field, denoted as HOA coefficients, said apparatus comprising: a transformer configured to transform a number O=(N+1) 2 input HOA coefficients of a frame into a number of O spatial domain signals representing a regular distribution of reference points on a sphere, wherein N is an order of said input HOA coefficients and is greater or equal to 3, and each one of said spatial domain signals represents a set of plane waves which come from associated directions in space; encoders configured to encode each one of said O spatial domain signals using perceptual compression encoding steps or stages, thereby using encoding parameters selected such that a coding error is inaudible; and a hardware multiplexer configured to multiplex the encoded spatial domain signals of the frame into a joint bit stream for providing improved lossy compression of HOA representations of audio scenes. 6. The apparatus according to claim 5 , wherein a masking used in said perceptual compression encoding is a psycho-acoustic masking and is a combination of time-frequency masking and spatial masking. 7. The apparatus according to claim 5 , wherein said transformation is a plane wave decomposition. 8. The apparatus according to claim 5 , wherein said perceptual encoding corresponds to the MPEG-1 Audio Layer III or AAC or Dolby AC-3 standard. 9. A method for decoding received successive frames of a perceptual compression encoded higher-order Ambisonics representation of a 2- or 3-dimensional sound field, which was encoded according to claim 1 , said decoding comprising: de-multiplexing a received joint bit stream into a number of O=(N+1) 2 perceptual compression encoded spatial domain signals; decoding each one of said O encoded spatial domain signals into a corresponding decoded spatial domain signal using perceptual compression decoding steps or stages corresponding to a selected encoding type and using decoding parameters matching the encoding parameters, wherein said O decoded spatial domain signals represent a regular distribution of reference points on a sphere; and transforming said O decoded spatial domain signals into O output HOA coefficients of a frame, wherein N is an order of said output HOA coefficients for providing improved lossy compression of HOA representations of audio scenes. 10. The method according to claim 9 , wherein said decoding of each one of said O encoded spatial domain signals corresponds to the MPEG-1 Audio Layer III or AAC or Dolby AC-3 standard. 11. An apparatus for decoding received successive frames of a perceptual compression encoded higher-order Ambisonics representation of a 2- or 3-dimensional sound field, which was encoded according to claim 1 , said apparatus comprising: a hardware demultiplexer which demultiplexes a received joint bit stream into O=(N+1) 2 perceptual compression encoded spatial domain signals; decoders which decode each one of said O encoded spatial domain signals into a corresponding decoded spatial domain signal using perceptual compression decoding steps or stages corresponding to a selected encoding type and using decoding parameters matching the encoding parameters, wherein said O decoded spatial domain signals represent a regular distribution of reference points on a sphere; and a transformer transforming said O decoded spatial domain signals into O output HOA coefficients of a frame, wherein N is an order of said output HOA coefficients for providing improved lossy compression of HOA representations of audio scenes. 12. The apparatus according to claim 11 , wherein said decoding of each one of said O encoded spatial domain signals corresponds to the MPEG-1 Audio Layer III or AAC or Dolby AC-3 standard. 13. An apparatus for carrying out an encoding on received successive frames of a higher order Ambisonics representation of a 2- or 3-dimensional sound field, denoted as HOA coefficients, said apparatus comprising: a means for transforming a number O=(N+1) 2 input HOA coefficients of a frame into a number of O spatial domain signals representing a regular distribution of reference points on a sphere, wherein N is an order of said input HOA coefficients and is greater or equal to 3, and each one of said spatial domain signals represents a set of plane waves which come from associated directions in space; a means for encoding each one of said O spatial domain signals using perceptual compression encoding steps or stages, thereby using encoding parameters selected such that a coding error is inaudible; and a means for multiplexing the encoded spatial domain signals of the frame into a joint bit stream for providing improved lossy compression of HOA representations of audio scenes. 14. The apparatus according to claim 13 , wherein a means for masking used in said perceptual compression encoding is a psycho-acoustic masking and is a combination of time-frequency masking and spatial masking. 15. The apparatus according to claim 13 , wherein said means for transforming is a plane wave decomposition. 16. The apparatus according to claim 13 , wherein a means for said perceptual compression encoding corresponds to the MPEG-1 Audio Layer III or AAC or Dolby AC-3 standard. 17. An apparatus for decoding received successive frames of a perceptual compression encoded higher-order Ambisonics representation of a 2- or 3-dimensional sound field, which was encoded according to claim 1 , said apparatus comprising: a means for demultiplexing a received joint bit stream into O=(N+1) 2 perceptual compression encoded spatial domain signals; a means for decoding each one of said O encoded spatial domain signals into a corresponding decoded spatial domain signal using perceptual compression decoding steps or stages corresponding to a selected encoding type and using decoding parameters matching the encoding parameters, wherein said O decoded spatial domain signals represent a regular distribution of reference points on a sphere; and a means for transforming said O decod
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title
using three or more audio channels, e.g. triphonic or quadraphonic · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.