Seamless scalable decoding of channels, objects, and HOA audio content

US12380904B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12380904-B2
Application number	US-202118246024-A
Country	US
Kind code	B2
Filing date	Sep 10, 2021
Priority date	Sep 25, 2020
Publication date	Aug 5, 2025
Grant date	Aug 5, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are methods and systems for decoding immersive audio content encoded by an adaptive number of scene elements for channels, audio objects, higher-order ambisonics (HOA), and/or other sound field representations. The decoded audio is rendered to the speaker configuration of a playback device. For bit streams that represent audio scenes with a different mixture of channels, objects, and/or HOA in consecutive frames, fade-in of the new frame and fade-out of the old frame may be performed. Crossfading between consecutive frames happens in the speaker layout after rendering, in the spatially decoded content type before rendering, or between the transport channels as the output of the baseline decoder but before spatial decoding and rendering. Crossfading may use an immediate fade-in and fade-out frame (IFFF) for the transition frame or may use an overlap-add synthesis technique such as time-domain aliasing cancellation (TDAC) of MDCT.
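The first crossfade option in the abstract (crossfading in the speaker layout after rendering) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `render` and `crossfade`, the gain-matrix renderer, and the linear fade ramp are all assumptions made for illustration. The point it shows is that two frames with different numbers of scene elements become directly comparable once both are rendered to the same speaker layout.

```python
def render(streams, gains):
    """Hypothetical renderer: mix decoded streams into speaker feeds.

    streams: list of sample lists, one per scene element.
    gains:   gains[s][k] is the gain of stream s into speaker k.
    """
    n_speakers = len(gains[0])
    n_samples = len(streams[0])
    out = [[0.0] * n_samples for _ in range(n_speakers)]
    for stream, row in zip(streams, gains):
        for k, g in enumerate(row):
            for t, x in enumerate(stream):
                out[k][t] += g * x
    return out


def crossfade(old, new):
    """Linear crossfade per speaker (assumed ramp): old fades out, new fades in."""
    n = len(old[0])
    mixed = []
    for old_ch, new_ch in zip(old, new):
        ch = []
        for t in range(n):
            w = t / (n - 1) if n > 1 else 1.0  # fade-in weight, 0 -> 1
            ch.append((1.0 - w) * old_ch[t] + w * new_ch[t])
        mixed.append(ch)
    return mixed


# Earlier frame: two scene elements; later frame: one scene element.
# Both are rendered to the same two-speaker layout before crossfading.
old_speakers = render([[1.0] * 5, [1.0] * 5], [[1.0, 0.0], [0.0, 1.0]])
new_speakers = render([[2.0] * 5], [[1.0, 0.0]])
transition = crossfade(old_speakers, new_speakers)
print(transition[0])  # [1.0, 1.25, 1.5, 1.75, 2.0]
print(transition[1])  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Because the crossfade operates on speaker feeds, it does not need to know how many channels, objects, or HOA components each frame carried.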

First claim

Opening claim text (preview).

What is claimed is:
1. A method of decoding audio content, the method comprising: receiving, by a decoding device, frames of the audio content, the audio content being represented by a plurality of content types, the frames containing audio streams encoding the audio content using an adaptive number of scene elements in the plurality of content types; generating decoded audio streams by processing two consecutive frames containing the audio streams encoding the audio content using a different mixture of the adaptive number of the scene elements in the plurality of content types; and generating crossfading of the decoded audio streams in the two consecutive frames based on a speaker configuration of the decoding device to drive a plurality of speakers.
2. The method of claim 1, wherein generating the decoded audio streams comprises: generating spatially decoded audio streams for the plurality of content types having at least one scene element for each of the two consecutive frames; and rendering the spatially decoded audio streams for the plurality of content types to generate speaker output signals for the plurality of content types for each of the two consecutive frames based on the speaker configuration of the decoding device; and wherein generating the crossfading of the decoded audio streams comprises: generating crossfading of the speaker output signals for the plurality of content types from an earlier frame to a later frame of the two consecutive frames; and mixing the crossfading of the speaker output signals for the plurality of content types to drive the plurality of speakers.
3. The method of claim 2, further comprising: transmitting the spatially decoded audio streams and time-synchronized metadata for the plurality of content types to a second device for rendering based on a speaker configuration of the second device.
4. The method of claim 1, wherein generating the decoded audio streams comprises: generating spatially decoded audio streams for the plurality of content types having at least one scene elements for each of the two consecutive frames, and wherein generating the crossfading of the decoded audio streams comprises: generating crossfading of the spatially decoded audio streams for the plurality of content types from an earlier frame to a later frame of the two consecutive frames; rendering the crossfading of the spatially decoded audio streams for the plurality of content types to generate speaker output signals for the plurality of content types based on the speaker configuration of the decoding device; and mixing the speaker output signals for the plurality of content types to drive the plurality of speakers.
5. The method of claim 4, further comprising: transmitting the crossfading of the spatially decoded audio streams and time-synchronized metadata for the plurality of content types to a second device for rendering based on a speaker configuration of the second device.
6. The method of claim 4, further comprising: transmitting the spatially decoded audio streams and time-synchronized metadata for the plurality of content types to a second device for crossfading and rendering based on a speaker configuration of the second device.
7. The method of claim 1, wherein a later frame of the two consecutive frames comprises an immediate fade-in and fade-out frame (IFFF) used for generating the crossfading of the decoded audio streams, wherein the IFFF contains bit streams that encode the audio content of the later frame for immediate fade-in and encode the audio content of an earlier frame of the two consecutive frames for immediate fade-out.
8. The method of claim 7, wherein generating the decoded audio streams comprises: generating decoded audio streams for the plurality of content types having at least one scene elements for each of the two consecutive frames, wherein the decoded audio streams for the two consecutive frames have a different mixture of the adaptive number of the scene elements in the plurality of content types, and wherein generating the crossfading of the decoded audio streams in the two consecutive frames comprise: generating a transition frame based on the IFFF, wherein the transition frame comprises an immediate fade-in of the decoded audio streams for the plurality of content types for the later frame and an immediate fade-out of the decoded audio streams for the plurality of content types for the earlier frame.
9. The method of claim 7, wherein the IFFF comprises a first frame of a current packet and the earlier frame comprises a last frame of a previous packet.
10. The method of claim 9, wherein the IFFF further comprises an independent frame that is decoded into the decoded audio streams for the first frame of the current packet.
11. The method of claim 9, wherein the IFFF further comprises a predictive-coding frame and one or more previous frames that enable the IFFF to be decoded into the decoded audio streams for the first frame of the current packet, wherein the one or more previous frames start with an independent frame.
12. The method of claim 9, wherein for time-domain aliasing cancellation (TDAC) of modified discrete cosine transform (MDCT), the IFFF further comprises one or more previous frames that enable the IFFF to be decoded into the decoded audio streams for the first frame of the current packet, wherein the one or more previous frames start with an independent frame.
13. The method of claim 9, wherein the IFFF further comprises a plurality of frames of the current packet and a plurality of frames of the earlier packet to enable a plurality of transition frames when generating the crossfading of the decoded audio streams.
14. The method of claim 1, wherein generating the crossfading of the decoded audio streams in the two consecutive frames comprises: performing a fade-in of the decoded audio streams for a later frame of the two consecutive frames and a fade-out of the decoded audio streams for an earlier frame of the two consecutive frames based on a windowing function associated with time-domain aliasing cancellation (TDAC) of modified discrete cosine transform (MDCT).
15. The method of claim 1, wherein generating the decoded audio streams comprises: generating baseline decoded audio streams for the plurality of content types having at least one scene elements for each of the two consecutive frames, and wherein generating the crossfading of the decoded audio streams comprises: generating crossfading of the baseline decoded audio streams for the plurality of content types from an earlier frame to a later frame of the two consecutive frames between transport channels; generating spatially decoded audio streams of the crossfading of the baseline decoded audio streams for the plurality of content types; rendering the spatially decoded audio streams for the plurality of content types to generate speaker output signals for the plurality of content types based on the speaker configuration of the decoding device; and mixing the speaker output signals for the plurality of content types to drive the plurality of speakers.
16. The method of claim 15, further comprising: transmitting the spatially decoded audio streams of the crossfading of the baseline decoded audio streams for the plurality of content types and their time-synchronized metadata to a second device for rendering based on a speaker configuration of the second device.
17. The method of claim 15, wherein generating the crossfading of the baseline decoded audio streams for the plurality of content types from
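
Claim 14 refers to a fade-in and fade-out based on a windowing function associated with TDAC of the MDCT. The claim text shown here does not name a specific window, so the following Python sketch assumes the common sine window as an example; the function names are hypothetical. It checks the Princen-Bradley condition: when the same window is applied at analysis and synthesis, the squared second half (fade-out of the earlier frame) and the squared first half (fade-in of the later frame) sum to one across the overlap.

```python
import math


def sine_window(frame_len):
    """Sine window of length 2 * frame_len (an assumed, commonly used MDCT window)."""
    length = 2 * frame_len
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def overlap_weights(frame_len):
    """Effective fade-in and fade-out weights across one overlap region.

    The window is applied twice (analysis and synthesis), hence the squares.
    """
    w = sine_window(frame_len)
    fade_in = [w[n] ** 2 for n in range(frame_len)]               # later frame
    fade_out = [w[n + frame_len] ** 2 for n in range(frame_len)]  # earlier frame
    return fade_in, fade_out


fade_in, fade_out = overlap_weights(8)
# Princen-Bradley condition: the two weights sum to 1 at every sample.
print(all(abs(a + b - 1.0) < 1e-12 for a, b in zip(fade_in, fade_out)))
```

Under this assumption the overlap-add itself behaves like a crossfade, with the earlier frame's weight falling as the later frame's weight rises.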

Assignees

Apple Inc

Inventors

Classifications

G10L19/008
Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title
H04S2420/11
Application of ambisonics in stereophonic audio systems · CPC title
H04S2420/03
Application of parametric coding in stereophonic audio systems · CPC title
G10L19/24 (Primary)
Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding · CPC title
H04S3/02
of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other · CPC title

Patent family

Related publications grouped by family.

View patent family 78087532

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12380904B2 cover?: Disclosed are methods and systems for decoding immersive audio content encoded by an adaptive number of scene elements for channels, audio objects, higher-order ambisonics (HOA), and/or other sound field representations. The decoded audio is rendered to the speaker configuration of a playback device. For bit streams that represent audio scenes with a different mixture of channels, objects, and/…
Who is the assignee on this patent?: Apple Inc
What technology area does this patent fall under?: Primary CPC classification G10L19/24. Mapped technology areas include Physics.
When was this patent published?: Publication date Aug 5, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Audio decoder, apparatus for generating encoded audio output data and methods permitting initializing a decoder

Methods, apparatus and systems for generation, transportation and processing of immediate playout frames (IPFs)

Hierarchical spatial resolution codec