Methods and devices for generating or decoding a bitstream comprising immersive audio signals

US12020718B2 · US · B2

Patent metadata
Publication number: US-12020718-B2
Application number: US-201917251940-A
Country: US
Kind code: B2
Filing date: Jul 2, 2019
Priority date: Jul 2, 2018
Publication date: Jun 25, 2024
Grant date: Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present document describes a method (500) for generating a bitstream (101), wherein the bitstream (101) comprises a sequence of superframes (400) for a sequence of frames of an immersive audio signal (111). The method (500) comprises, repeatedly for the sequence of superframes (400), inserting (501) coded audio data (206) for one or more frames of one or more downmix channel signals (203) derived from the immersive audio signal (111), into data fields (411, 421, 412, 422) of a superframe (400); and inserting (502) metadata (202, 205) for reconstructing one or more frames of the immersive audio signal (111) from the coded audio data (206), into a metadata field (403) of the superframe (400).
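
As a rough illustration of the two insertion steps in the abstract, the Python sketch below groups coded frames into superframes. The `Superframe` class, the `build_superframes` function, the default of two frames per superframe, and the byte layout (data fields followed by the metadata field) are assumptions made for this example only; the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Superframe:
    """Hypothetical container: coded-audio data fields plus one metadata field."""
    data_fields: List[bytes] = field(default_factory=list)
    metadata_field: bytes = b""

    def to_bytes(self) -> bytes:
        # Assumed layout for illustration: all data fields, then the metadata field.
        return b"".join(self.data_fields) + self.metadata_field


def build_superframes(coded_frames, metadata, frames_per_superframe=2):
    """Pack coded frames and reconstruction metadata into superframes.

    coded_frames: list of frames; each frame is a list with one coded
                  payload (bytes) per downmix channel signal.
    metadata:     one metadata payload (bytes) per superframe.
    """
    superframes = []
    for index, meta in enumerate(metadata):
        start = index * frames_per_superframe
        group = coded_frames[start:start + frames_per_superframe]
        superframe = Superframe(metadata_field=meta)
        for frame in group:
            for channel_payload in frame:
                # One data field per (frame, downmix channel) pair.
                superframe.data_fields.append(channel_payload)
        superframes.append(superframe)
    return superframes


# Example input: four frames of two downmix channels, two metadata payloads.
example_frames = [[b"a1", b"b1"], [b"a2", b"b2"], [b"a3", b"b3"], [b"a4", b"b4"]]
example_superframes = build_superframes(example_frames, [b"M1", b"M2"])
```

With two downmix channels and two frames per superframe, each superframe in this sketch carries four data fields and one metadata field.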

First claim

Opening claim text (preview).

It is claimed:

1. A method for generating a bitstream; wherein the bitstream comprises a sequence of superframes for a sequence of frames of an ambisonic immersive audio signal; wherein the method comprises, repeatedly for the sequence of superframes, inserting coded audio data for two or more frames of one or more downmix channel signals derived from the ambisonic immersive audio signal, into data fields of a superframe in the sequence of superframes in the bitstream, wherein each frame in the two or more frames in the superframe is coded with a respective data field designated to support frame level re-synchronization operations by a recipient device of the ambisonic immersive audio signal in case of bit errors in the ambisonic immersive audio signal; and inserting metadata for reconstructing one or more frames of the ambisonic immersive audio signal from the coded audio data into a metadata field of the superframe.

2. The method of claim 1, wherein
    a. the method comprises inserting a header field into the superframe; and
    b. the header field is indicative of a size of the metadata field of the superframe.

3. The method of claim 2, wherein
    a. the metadata field has a data size no greater than a maximum size;
    b. the header field is indicative of an adjustment value; and
    c. the size of the metadata field of the superframe corresponds to the maximum size minus the adjustment value.

4. The method of claim 2, wherein
    a. the header field comprises a size indicator for the size of the metadata field; and
    b. the size indicator exhibits a different resolution for different size ranges of the size of the metadata field.

5. The method of claim 4, wherein
    a. the metadata for reconstructing the one or more frames of the ambisonic immersive audio signal exhibits a statistical size distribution of the size of the metadata; and
    b. the resolution of the size indicator is dependent on the size distribution of the metadata.

6. The method of claim 1, wherein
    a. the method comprises inserting a header field into the superframe; and
    b. the header field is indicative of whether or not the superframe comprises a configuration information field, and/or
    c. the header field is indicative of the presence of a configuration information field.

7. The method of claim 1, wherein
    a. the method comprises inserting a configuration information field into the superframe; and
    b. the configuration information field is indicative of a number of downmix channel signals represented by the data fields of the superframe.

8. The method of claim 1, wherein
    a. the method comprises inserting a configuration information field into the superframe; and
    b. the configuration information field is indicative of a maximum size of the metadata field.

9. The method of claim 1, wherein
    a. the method comprises inserting a configuration information field into the superframe; and
    b. the configuration information field is indicative of an order of a soundfield representation signal comprised within the ambisonic immersive audio signal.

10. The method of claim 1, wherein
    a. the method comprises inserting a configuration information field into the superframe; and
    b. the configuration information field is indicative of a frame type and/or a coding mode used for coding each one of the one or more downmix channel signals.

11. The method of claim 1, wherein
    a. the method comprises inserting a header field into the superframe; and
    b. the header field is indicative of whether or not the superframe comprises an extension field for additional information regarding the ambisonic immersive audio signal.

12. The method of claim 1, wherein
    a. the coded audio data of a frame of a downmix channel signal is generated using a multi-mode and/or multi-rate speech or audio codec; and/or
    b. the metadata is generated using a multi-mode and/or multi-rate immersive metadata coding scheme.

13. The method of claim 1, wherein the coded audio data of a frame of a downmix channel signal is encoded using an Enhanced Voice Services encoder.

14. The method of claim 1, wherein the superframe constitutes at least a part of a data element transmitted using a transmission protocol, notably DASH, RTSP or RTP, or stored in a file according to a storage format, notably ISOBMFF.

15. The method of claim 1, wherein
    a. a header field is indicative that no configuration information field is present; and
    b. the method comprises conveying configuration information in a previous superframe of the sequence of superframes or using an out-of-band signaling scheme.

16. The method of claim 1, wherein the one or more downmix channels signals include a first downmix channel signal and a second downmix channel signal; wherein the first downmix channel signal and the second downmix channel signal are derived from the ambisonic immersive audio signal; wherein the first downmix channel signal is encoded using a first encoder, and wherein the second downmix channel signal is encoded using a second encoder; wherein the method comprises providing configuration information regarding the first encoder and the second encoder within the superframe, within a previous superframe of the sequence of superframes or using an out-of-band signaling scheme.

17. The method of claim 1, wherein the method comprises
    a. extracting one or more audio objects from the immersive audio, referred to as IA, signal; wherein an audio object comprises an object signal and object metadata indicating a position of the audio object;
    b. determining a residual signal based on the IA signal and based on the one or more audio objects;
    c. providing a downmix signal based on the IA signal, notably such that a number of downmix channel signals of the downmix signal is smaller than a number of channel signals of the IA signal;
    d. determining joint coding metadata for enabling upmixing of the downmix signal to one or more reconstructed audio object signals corresponding to the one or more audio objects and/or to a reconstructed residual signal corresponding to the residual signal;
    e. performing waveform coding of the downmix signal to provide coded audio data for a sequence of frames of the one or more downmix channel signals; and
    f. performing entropy coding of the joint coding metadata and of the object metadata of the one or more audio objects to provide the metadata to be inserted into the metadata fields of the sequence of superframes.

18. A method for deriving data regarding an ambisonic immersive audio signal from a bitstream; wherein the bitstream comprises a sequence of superframes for a sequence of frames of the ambisonic immersive audio signal; wherein the method comprises, repeatedly for the sequence of superframes,
    a. extracting coded audio data for two or more frames of one or more downmix channel signals derived from the ambisonic immersive audio signal, from data fields of a superframe in the sequence of superframes in the bitstream, wherein each frame in the two or more frames in the superframe is coded with a respective data field designated to support frame level re-synchronization operations by a recipient device of the ambisonic immersive audio signal in case of bit errors in the ambisonic immersive audio signal; and
    b. extracting metadata for reconstructing one or more frames of the immersive ambisonic audio signal from the coded audio data from a metadata field of the superframe.

19. The method of claim 18, further comprising
    a. deriving one or more reconstructed audio objects from the coded audio
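
Claims 3 and 4 describe two ways a header field can signal the metadata field size: as a maximum size minus an adjustment value, and as a size indicator whose resolution differs between size ranges. The Python sketch below illustrates both ideas. The function names, the `SIZE_RANGES` table, and all numeric values are hypothetical and chosen for illustration; the claims do not give concrete ranges, step sizes, or bit widths. Rounding the signalled size up to the next representable value (which implies padding) is likewise an assumption of this sketch.

```python
import bisect


def metadata_size(max_size: int, adjustment: int) -> int:
    """Claim 3 idea: size = maximum size minus the signalled adjustment value."""
    if not 0 <= adjustment <= max_size:
        raise ValueError("adjustment must lie between 0 and max_size")
    return max_size - adjustment


# Hypothetical ranges as (start, step, count): fine steps for small sizes,
# coarser steps for larger ones, so a 5-bit indicator covers 0..160 bytes.
SIZE_RANGES = [(0, 1, 16), (16, 4, 8), (48, 16, 8)]
SIZE_TABLE = [
    start + k * step
    for start, step, count in SIZE_RANGES
    for k in range(count)
]


def encode_size(size: int) -> int:
    """Return the indicator for the smallest representable size >= size."""
    index = bisect.bisect_left(SIZE_TABLE, size)
    if size < 0 or index == len(SIZE_TABLE):
        raise ValueError("size is outside the representable range")
    return index


def decode_size(indicator: int) -> int:
    """Map a size indicator back to the metadata field size it signals."""
    return SIZE_TABLE[indicator]
```

In this sketch a 21-byte metadata payload would be signalled as 24 bytes, since sizes between 16 and 44 are only representable in steps of 4. A table like this would plausibly be tuned to the statistical size distribution mentioned in claim 5, with the finest resolution where metadata sizes occur most often.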

Assignees

Dolby Int Ab, Dolby Laboratories Licensing Corp

Inventors

Classifications

  • Application of parametric coding in stereophonic audio systems · CPC title

  • Application of ambisonics in stereophonic audio systems · CPC title

  • G10L19/167 · Primary

    Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes · CPC title

  • G10L19/008 · Primary

    Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title

  • Dials; Mounting of dials · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12020718B2 cover?
The present document describes a method (500) for generating a bitstream (101), wherein the bitstream (101) comprises a sequence of superframes (400) for a sequence of frames of an immersive audio signal (111). The method (500) comprises, repeatedly for the sequence of superframes (400), inserting (501) coded audio data (206) for one or more frames of one or more downmix chann…
Who is the assignee on this patent?
Dolby Int Ab, Dolby Laboratories Licensing Corp
What technology area does this patent fall under?
Primary CPC classification G10L19/167. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 25, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).