What technology area does this patent fall under?

Primary CPC classification H04S3/008. Mapped technology areas include Electricity.

When was this patent published?

Publication date Mar 10, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for generating an intermediate audio format from an input multichannel audio signal

US12574696B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12574696-B2
Application number	US-202118247600-A
Country	US
Kind code	B2
Filing date	Oct 14, 2021
Priority date	Oct 17, 2020
Publication date	Mar 10, 2026
Grant date	Mar 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a method for training a machine learning algorithm. The method may comprise receiving a first input multichannel audio signal. The method may comprise generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal. The method may comprise rendering the intermediate audio signal into a first output multichannel audio signal. Further, the method may comprise improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal. Described herein are further an apparatus for generating an intermediate audio format from an input multichannel audio signal as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
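The render-and-compare loop summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the `AudioObject` class, the constant-power stereo panner in `render_stereo`, the 0-to-1 position convention, and the grid search over candidate positions, which stands in for whatever learning procedure a real system would use to improve the model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AudioObject:
    """Hypothetical audio object: a mono track plus position metadata."""
    track: List[float]
    position: float  # assumed convention: 0.0 = hard left, 1.0 = hard right


def render_stereo(objects, num_samples):
    """Render objects to 2.0 with constant-power panning (an assumed renderer)."""
    left = [0.0] * num_samples
    right = [0.0] * num_samples
    for obj in objects:
        angle = obj.position * math.pi / 2
        gain_left, gain_right = math.cos(angle), math.sin(angle)
        for i, sample in enumerate(obj.track[:num_samples]):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return [left, right]


def mse(signal_a, signal_b):
    """Mean squared error over all channels and samples."""
    total, count = 0.0, 0
    for channel_a, channel_b in zip(signal_a, signal_b):
        for x, y in zip(channel_a, channel_b):
            total += (x - y) ** 2
            count += 1
    return total / count


# A reference object rendered to stereo serves as the "first input" signal.
n = 64
track = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
first_input = render_stereo([AudioObject(track, 0.25)], n)

# Stand-in for training: try candidate positions, re-render each one, and
# keep the candidate whose rendered output differs least from the input.
candidates = [i / 20 for i in range(21)]
losses = {
    p: mse(first_input, render_stereo([AudioObject(track, p)], n))
    for p in candidates
}
best_position = min(losses, key=losses.get)
print(best_position, losses[best_position])
```

In this toy setup the candidate position that matches the reference object reproduces the input exactly, so its loss is zero. A practical system would presumably replace the grid search with gradient-based updates to a neural network, which requires the renderer to be differentiable.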

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for training a machine learning algorithm, the method comprising: receiving a first input multichannel audio signal, generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal, wherein the intermediate audio signal comprises one or more audio objects, and wherein each of the audio objects comprises an audio track and position metadata, rendering the intermediate audio signal into a first output multichannel audio signal, and improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal.

2. The method according to claim 1, wherein the receiving comprises: receiving a reference intermediate audio signal, and rendering the reference intermediate audio signal into the first input multichannel audio signal.

3. The method according to claim 2, wherein the reference intermediate audio signal has the same format as the intermediate audio signal.

4. The method according to claim 2, wherein the reference intermediate audio signal comprises one or more audio objects.

5. The method according to claim 4, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal, and wherein the number of audio objects of the reference intermediate audio signal is larger than the number of audio objects of the intermediate audio signal.

6. The method according to claim 2, further comprising: rendering a second input multichannel audio signal from the reference intermediate audio signal, rendering the intermediate audio signal into a second output multichannel audio signal, and improving the machine learning algorithm based a first difference between the first input multichannel audio signal and the first output multichannel audio signal, and based on a second difference between the second input multichannel audio signal and the second output multichannel audio signal.

7. The method according to claim 6, wherein the second input multichannel audio signal has the same format as the second output multichannel audio signal.

8. The method according to claim 1, wherein the first input multichannel audio signal has the same format as the first output multichannel audio signal.

9. The method according to claim 1, wherein improving the machine learning algorithm includes comparing the first input multichannel audio signal and the first output multichannel audio signal using a loss function.

10. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal is performed in the waveform domain or in the spectrogram domain.

11. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal involves at least one of: a mean squared error, a mean absolute error, and a mean squared logarithmic error.

12. The method according to claim 1, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal.

13. The method according to claim 12, wherein the improving further comprises minimizing a cost function term involving a correlation between audio tracks of two different audio objects and/or between the audio track of an audio object and the bed channel residual.

14. The method according to claim 1, wherein the first input multichannel audio signal comprises a 2.0, 3.1, 5.1 or 7.1 multichannel audio signal, and the first output multichannel audio signal comprises a 2.0, 3.1, 5.1, 7.1, 9.1, 5.1.2, 7.1.4, or 9.1.6 multichannel audio signal.

15. The method according to claim 1, wherein generating the intermediate audio signal using the machine learning algorithm further comprises: generating, using the machine learning algorithm, a multichannel object based on the first input multichannel audio signal, and determining, using a de-panning algorithm, position meta data of an audio object of the intermediate audio signal based on the multichannel object.

16. The method according to claim 15, wherein the de-panning algorithm is based on a further machine learning algorithm, and the method further comprises: jointly improving the de-panning algorithm and the machine learning algorithm based on the difference between the first input multichannel audio signal and the first output multichannel audio signal.

17. The method according to claim 1, wherein the machine learning algorithm comprises a deep neural network or a combination of a deep neural network and a digital signal processing algorithm.

18. The method according to claim 1, wherein the improving further comprises minimizing a cost function term involving a position, a motion, or an acceleration of an audio object.

19. Apparatus for generating an intermediate audio format from an input multichannel audio signal, wherein the apparatus includes a processor configured to perform the steps of the method according to claim 1.

20. A computer program product comprising a non-transitory computer-readable storage medium with instructions adapted to cause a device to carry out the method according to claim 1 when executed by the device having processing capability.
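The error measures named in claim 11 and the extra cost terms of claims 13 and 18 can be sketched as plain functions. This is a hedged illustration only: the function names are hypothetical, the claims do not define these formulas, and several choices are assumptions, namely applying the logarithmic error to sample magnitudes, using the absolute Pearson correlation as the correlation term, and using a squared second finite difference of the position trajectory as the acceleration term.

```python
import math


def mean_squared_error(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mean_absolute_error(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def mean_squared_log_error(x, y):
    # Assumption: the log is taken on magnitudes, since waveform samples
    # can be negative and log1p is undefined at or below -1.
    return sum(
        (math.log1p(abs(a)) - math.log1p(abs(b))) ** 2 for a, b in zip(x, y)
    ) / len(x)


def correlation_penalty(track_a, track_b):
    """Absolute Pearson correlation between two tracks (0 = uncorrelated)."""
    n = len(track_a)
    mean_a = sum(track_a) / n
    mean_b = sum(track_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(track_a, track_b))
    var_a = sum((a - mean_a) ** 2 for a in track_a)
    var_b = sum((b - mean_b) ** 2 for b in track_b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant track carries no correlation information
    return abs(cov) / math.sqrt(var_a * var_b)


def acceleration_penalty(positions):
    """Sum of squared second differences of a 1-D position trajectory."""
    return sum(
        (positions[i + 1] - 2 * positions[i] + positions[i - 1]) ** 2
        for i in range(1, len(positions) - 1)
    )


reference = [0.0, 1.0, -1.0, 0.5]
estimate = [0.0, 0.5, -1.0, 1.0]
print(mean_squared_error(reference, estimate))
print(mean_absolute_error(reference, estimate))
print(correlation_penalty([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]))
print(acceleration_penalty([0.0, 0.0, 1.0, 0.0, 0.0]))
```

In a training setup, terms like these would typically be combined into one weighted cost alongside the reconstruction error. The weighting is not specified in the claim text shown here and would be a design choice.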

Assignees

Dolby Int Ab

Inventors

Classifications

H04S2400/11
Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title
H04S2400/01
Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved · CPC title
H04S7/302
Electronic adaptation of stereophonic sound system to listener position or orientation (H04S7/301 takes precedence) · CPC title
G06N20/00
Machine learning · CPC title
H04S3/008 · Primary
in which the audio signals are in digital form, i.e. employing more than two discrete digital channels (data reduction aspects thereof based on psychoacoustics G10L19/02) · CPC title

Patent family

Related publications grouped by family.

View patent family 78087395

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12574696B2 cover?: Described herein is a method for training a machine learning algorithm. The method may comprise receiving a first input multichannel audio signal. The method may comprise generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal. The method may comprise rendering the intermediate audio signal into a first output multichanne…
Who is the assignee on this patent?: Dolby Int Ab