Method and apparatus for generating an intermediate audio format from an input multichannel audio signal

US12574696B2 · US · B2

Patent metadata
Publication number: US-12574696-B2
Application number: US-202118247600-A
Country: US
Kind code: B2
Filing date: Oct 14, 2021
Priority date: Oct 17, 2020
Publication date: Mar 10, 2026
Grant date: Mar 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a method for training a machine learning algorithm. The method may comprise receiving a first input multichannel audio signal. The method may comprise generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal. The method may comprise rendering the intermediate audio signal into a first output multichannel audio signal. Further, the method may comprise improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal. Described herein are further an apparatus for generating an intermediate audio format from an input multichannel audio signal as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
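
The round trip described in the abstract (generate an intermediate signal, render it back, compare with the input, improve) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not taken from the patent: the "machine learning algorithm" is reduced to a single learnable pan position, the renderer is a constant-power stereo panner, the difference is a mean squared error, and a finite-difference gradient stands in for backpropagation. The names `render`, `generate`, `round_trip_loss` and `train` are hypothetical.

```python
import math

def render(track, position):
    """Hypothetical renderer: constant-power pan of one audio object to stereo.

    position is in [0, 1], where 0 is fully left and 1 is fully right.
    """
    angle = position * math.pi / 2
    gain_left, gain_right = math.cos(angle), math.sin(angle)
    return [(gain_left * s, gain_right * s) for s in track]

def generate(signal, position):
    """Stand-in for the machine learning algorithm (assumption, not the patent's model).

    Produces an intermediate signal with one audio object: an audio track
    (the stereo input projected onto the pan direction) plus position metadata.
    """
    angle = position * math.pi / 2
    track = [math.cos(angle) * left + math.sin(angle) * right for left, right in signal]
    return {"track": track, "position": position}

def round_trip_loss(signal, position):
    """Mean squared error between the input and the re-rendered output."""
    obj = generate(signal, position)
    output = render(obj["track"], obj["position"])
    squared = [(a - b) ** 2 for fa, fb in zip(signal, output) for a, b in zip(fa, fb)]
    return sum(squared) / len(squared)

def train(signal, position=0.5, learning_rate=0.5, steps=100, eps=1e-4):
    """Improve the single parameter by gradient descent on the round-trip loss."""
    for _ in range(steps):
        # Central finite difference stands in for backpropagation.
        grad = (round_trip_loss(signal, position + eps)
                - round_trip_loss(signal, position - eps)) / (2 * eps)
        position = min(1.0, max(0.0, position - learning_rate * grad))
    return position

# Toy input: a sine source panned to position 0.25, rendered to stereo.
source = [math.sin(2 * math.pi * k / 16) for k in range(64)]
signal = render(source, 0.25)
learned = train(signal)
print(round(learned, 3))
```

In this toy setup the input and output share the same stereo format, so the loss can only reach zero when the learned position matches the one used to create the input. A real system would replace the single parameter with a trained model and the panner with a production renderer.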

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for training a machine learning algorithm, the method comprising: receiving a first input multichannel audio signal, generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal, wherein the intermediate audio signal comprises one or more audio objects, and wherein each of the audio objects comprises an audio track and position metadata, rendering the intermediate audio signal into a first output multichannel audio signal, and improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal.

2. The method according to claim 1, wherein the receiving comprises: receiving a reference intermediate audio signal, and rendering the reference intermediate audio signal into the first input multichannel audio signal.

3. The method according to claim 2, wherein the reference intermediate audio signal has the same format as the intermediate audio signal.

4. The method according to claim 2, wherein the reference intermediate audio signal comprises one or more audio objects.

5. The method according to claim 4, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal, and wherein the number of audio objects of the reference intermediate audio signal is larger than the number of audio objects of the intermediate audio signal.

6. The method according to claim 2, further comprising: rendering a second input multichannel audio signal from the reference intermediate audio signal, rendering the intermediate audio signal into a second output multichannel audio signal, and improving the machine learning algorithm based a first difference between the first input multichannel audio signal and the first output multichannel audio signal, and based on a second difference between the second input multichannel audio signal and the second output multichannel audio signal.

7. The method according to claim 6, wherein the second input multichannel audio signal has the same format as the second output multichannel audio signal.

8. The method according to claim 1, wherein the first input multichannel audio signal has the same format as the first output multichannel audio signal.

9. The method according to claim 1, wherein improving the machine learning algorithm includes comparing the first input multichannel audio signal and the first output multichannel audio signal using a loss function.

10. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal is performed in the waveform domain or in the spectrogram domain.

11. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal involves at least one of: a mean squared error, a mean absolute error, and a mean squared logarithmic error.

12. The method according to claim 1, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal.

13. The method according to claim 12, wherein the improving further comprises minimizing a cost function term involving a correlation between audio tracks of two different audio objects and/or between the audio track of an audio object and the bed channel residual.

14. The method according to claim 1, wherein the first input multichannel audio signal comprises a 2.0, 3.1, 5.1 or 7.1 multichannel audio signal, and the first output multichannel audio signal comprises a 2.0, 3.1, 5.1, 7.1, 9.1, 5.1.2, 7.1.4, or 9.1.6 multichannel audio signal.

15. The method according to claim 1, wherein generating the intermediate audio signal using the machine learning algorithm further comprises: generating, using the machine learning algorithm, a multichannel object based on the first input multichannel audio signal, and determining, using a de-panning algorithm, position meta data of an audio object of the intermediate audio signal based on the multichannel object.

16. The method according to claim 15, wherein the de-panning algorithm is based on a further machine learning algorithm, and the method further comprises: jointly improving the de-panning algorithm and the machine learning algorithm based on the difference between the first input multichannel audio signal and the first output multichannel audio signal.

17. The method according to claim 1, wherein the machine learning algorithm comprises a deep neural network or a combination of a deep neural network and a digital signal processing algorithm.

18. The method according to claim 1, wherein the improving further comprises minimizing a cost function term involving a position, a motion, or an acceleration of an audio object.

19. Apparatus for generating an intermediate audio format from an input multichannel audio signal, wherein the apparatus includes a processor configured to perform the steps of the method according to claim 1.

20. A computer program product comprising a non-transitory computer-readable storage medium with instructions adapted to cause a device to carry out the method according to claim 1 when executed by the device having processing capability.
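
The loss terms named in the claims (mean squared error, mean absolute error, mean squared logarithmic error, and a cost term involving correlation between audio tracks) can be written down as a short sketch. This is an illustrative assumption about how such terms might be combined, not the patent's formulation: signals are treated as flat lists of samples, the logarithmic error assumes non-negative values such as spectrogram magnitudes, and `total_loss` and its `weight` parameter are hypothetical.

```python
import math
from itertools import combinations

def mean_squared_error(reference, estimate):
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def mean_absolute_error(reference, estimate):
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

def mean_squared_log_error(reference, estimate):
    # Assumes non-negative inputs, e.g. spectrogram magnitudes.
    return sum((math.log1p(r) - math.log1p(e)) ** 2
               for r, e in zip(reference, estimate)) / len(reference)

def correlation(a, b):
    """Pearson correlation of two equal-length tracks (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def total_loss(reference, estimate, tracks, weight=0.1):
    """Hypothetical combined cost: reconstruction error plus a penalty on
    correlation between every pair of tracks (objects and/or bed residual)."""
    penalty = sum(abs(correlation(a, b)) for a, b in combinations(tracks, 2))
    return mean_squared_error(reference, estimate) + weight * penalty

# Two uncorrelated tracks and a perfect reconstruction give zero cost.
print(total_loss([1.0, 2.0], [1.0, 2.0], [[1, -1, 1, -1], [1, 1, -1, -1]]))
```

Penalizing the absolute correlation pushes the extracted tracks toward carrying distinct content, which is one plausible reading of the correlation term in claim 13.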

Assignees

Dolby Int Ab

Inventors

Classifications

  • Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title

  • Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved · CPC title

  • Electronic adaptation of stereophonic sound system to listener position or orientation (H04S7/301 takes precedence) · CPC title

  • Machine learning · CPC title

  • H04S3/008 · Primary

    in which the audio signals are in digital form, i.e. employing more than two discrete digital channels (data reduction aspects thereof based on psychoacoustics G10L19/02) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12574696B2 cover?
Described herein is a method for training a machine learning algorithm. The method may comprise receiving a first input multichannel audio signal. The method may comprise generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal. The method may comprise rendering the intermediate audio signal into a first output multichanne…
Who is the assignee on this patent?
Dolby Int Ab
What technology area does this patent fall under?
Primary CPC classification H04S3/008. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).