Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus
US-2019392802-A1 · Dec 26, 2019 · US
US12574696B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12574696-B2 |
| Application number | US-202118247600-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2021 |
| Priority date | Oct 17, 2020 |
| Publication date | Mar 10, 2026 |
| Grant date | Mar 10, 2026 |
Official abstract text for this publication.
Described herein is a method for training a machine learning algorithm. The method may comprise receiving a first input multichannel audio signal. The method may comprise generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal. The method may comprise rendering the intermediate audio signal into a first output multichannel audio signal. Further, the method may comprise improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal. Described herein are further an apparatus for generating an intermediate audio format from an input multichannel audio signal as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
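The render-and-compare loop described in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the 2.0 output format, the constant-power pan law, the one-parameter `extract` function that stands in for the machine learning algorithm, and the finite-difference gradient step that stands in for "improving" it. The names `AudioObject`, `pan_gains`, `render`, `extract`, `train` and `make_input` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Stereo = List[List[float]]  # [left, right], each a list of samples


@dataclass
class AudioObject:
    track: List[float]  # mono audio track
    pan: float          # position metadata: -1.0 (left) .. +1.0 (right)


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power pan law (an assumption, not taken from the patent)."""
    theta = (pan + 1.0) * math.pi / 4.0
    return math.cos(theta), math.sin(theta)


def render(objects: List[AudioObject], n: int) -> Stereo:
    """Render audio objects into a 2.0 multichannel signal of n samples."""
    out = [[0.0] * n, [0.0] * n]
    for obj in objects:
        gl, gr = pan_gains(obj.pan)
        for i, s in enumerate(obj.track):
            out[0][i] += gl * s
            out[1][i] += gr * s
    return out


def extract(signal: Stereo, pan: float) -> List[AudioObject]:
    """Toy stand-in for the machine learning algorithm: emits one object whose
    track is the input projected onto the pan direction. `pan` is the only
    trainable parameter."""
    gl, gr = pan_gains(pan)
    track = [gl * l + gr * r for l, r in zip(signal[0], signal[1])]
    return [AudioObject(track, pan)]


def mse(a: Stereo, b: Stereo) -> float:
    """Mean squared error over all channels and samples."""
    diffs = [(x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb)]
    return sum(diffs) / len(diffs)


def reconstruction_loss(signal: Stereo, pan: float) -> float:
    """Input -> objects -> re-rendered output -> difference to the input."""
    n = len(signal[0])
    return mse(signal, render(extract(signal, pan), n))


def train(signal: Stereo, pan: float = 0.0, steps: int = 60,
          lr: float = 2.0, eps: float = 1e-4) -> Tuple[float, List[float]]:
    """Gradient descent on `pan` using a central finite-difference gradient."""
    losses = []
    for _ in range(steps):
        losses.append(reconstruction_loss(signal, pan))
        grad = (reconstruction_loss(signal, pan + eps)
                - reconstruction_loss(signal, pan - eps)) / (2 * eps)
        pan -= lr * grad
    return pan, losses


def make_input(n: int = 200, pan: float = 0.5) -> Stereo:
    """Render a reference object to obtain the training input (cf. claim 2)."""
    tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    return render([AudioObject(tone, pan)], n)
```

Calling `train(make_input())` returns the learned pan value and the loss history. In this toy setting the loss only reaches zero when the estimated position matches the position used to render the input, which is the property the training scheme relies on: no ground-truth objects are needed, only the input signal and a renderer.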
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method for training a machine learning algorithm, the method comprising: receiving a first input multichannel audio signal, generating, using the machine learning algorithm, an intermediate audio signal based on the first input multichannel audio signal, wherein the intermediate audio signal comprises one or more audio objects, and wherein each of the audio objects comprises an audio track and position metadata, rendering the intermediate audio signal into a first output multichannel audio signal, and improving the machine learning algorithm based on a difference between the first input multichannel audio signal and the first output multichannel audio signal.
2. The method according to claim 1, wherein the receiving comprises: receiving a reference intermediate audio signal, and rendering the reference intermediate audio signal into the first input multichannel audio signal.
3. The method according to claim 2, wherein the reference intermediate audio signal has the same format as the intermediate audio signal.
4. The method according to claim 2, wherein the reference intermediate audio signal comprises one or more audio objects.
5. The method according to claim 4, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal, and wherein the number of audio objects of the reference intermediate audio signal is larger than the number of audio objects of the intermediate audio signal.
6. The method according to claim 2, further comprising: rendering a second input multichannel audio signal from the reference intermediate audio signal, rendering the intermediate audio signal into a second output multichannel audio signal, and improving the machine learning algorithm based a first difference between the first input multichannel audio signal and the first output multichannel audio signal, and based on a second difference between the second input multichannel audio signal and the second output multichannel audio signal.
7. The method according to claim 6, wherein the second input multichannel audio signal has the same format as the second output multichannel audio signal.
8. The method according to claim 1, wherein the first input multichannel audio signal has the same format as the first output multichannel audio signal.
9. The method according to claim 1, wherein improving the machine learning algorithm includes comparing the first input multichannel audio signal and the first output multichannel audio signal using a loss function.
10. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal is performed in the waveform domain or in the spectrogram domain.
11. The method according to claim 9, wherein the comparing of the first input multichannel audio signal and the first output multichannel audio signal involves at least one of: a mean squared error, a mean absolute error, and a mean squared logarithmic error.
12. The method according to claim 1, wherein the intermediate audio signal further comprises a bed channel residual, wherein the bed channel residual is a multichannel audio signal having the same format as the first input multichannel audio signal.
13. The method according to claim 12, wherein the improving further comprises minimizing a cost function term involving a correlation between audio tracks of two different audio objects and/or between the audio track of an audio object and the bed channel residual.
14. The method according to claim 1, wherein the first input multichannel audio signal comprises a 2.0, 3.1, 5.1 or 7.1 multichannel audio signal, and the first output multichannel audio signal comprises a 2.0, 3.1, 5.1, 7.1, 9.1, 5.1.2, 7.1.4, or 9.1.6 multichannel audio signal.
15. The method according to claim 1, wherein generating the intermediate audio signal using the machine learning algorithm further comprises: generating, using the machine learning algorithm, a multichannel object based on the first input multichannel audio signal, and determining, using a de-panning algorithm, position meta data of an audio object of the intermediate audio signal based on the multichannel object.
16. The method according to claim 15, wherein the de-panning algorithm is based on a further machine learning algorithm, and the method further comprises: jointly improving the de-panning algorithm and the machine learning algorithm based on the difference between the first input multichannel audio signal and the first output multichannel audio signal.
17. The method according to claim 1, wherein the machine learning algorithm comprises a deep neural network or a combination of a deep neural network and a digital signal processing algorithm.
18. The method according to claim 1, wherein the improving further comprises minimizing a cost function term involving a position, a motion, or an acceleration of an audio object.
19. Apparatus for generating an intermediate audio format from an input multichannel audio signal, wherein the apparatus includes a processor configured to perform the steps of the method according to claim 1.
20. A computer program product comprising a non-transitory computer-readable storage medium with instructions adapted to cause a device to carry out the method according to claim 1 when executed by the device having processing capability.
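The loss terms named in claims 11, 13 and 18 can be written down compactly. The sketch below uses the standard textbook definitions of mean squared error, mean absolute error and mean squared logarithmic error. The claims do not specify the exact form of the correlation and acceleration terms, so `correlation_penalty` (normalized zero-lag cross-correlation) and `acceleration_penalty` (mean squared second difference of a one-dimensional position trajectory) are hypothetical choices made here for illustration only.

```python
import math
from typing import Sequence


def mean_squared_error(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mean_absolute_error(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def mean_squared_log_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumes non-negative inputs, e.g. spectrogram magnitudes."""
    return sum((math.log1p(a) - math.log1p(b)) ** 2
               for a, b in zip(x, y)) / len(x)


def correlation_penalty(a: Sequence[float], b: Sequence[float],
                        eps: float = 1e-12) -> float:
    """Hypothetical claim-13 style term: absolute normalized cross-correlation
    at lag zero. 0 means uncorrelated tracks, 1 means identical up to gain."""
    dot = sum(p * q for p, q in zip(a, b))
    energy = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(dot) / (energy + eps)


def acceleration_penalty(positions: Sequence[float]) -> float:
    """Hypothetical claim-18 style term: mean squared second difference of a
    1-D position trajectory. Constant-velocity motion costs nothing."""
    acc = [positions[i + 1] - 2 * positions[i] + positions[i - 1]
           for i in range(1, len(positions) - 1)]
    if not acc:  # fewer than three positions: no acceleration defined
        return 0.0
    return sum(v * v for v in acc) / len(acc)
```

In a training setup these terms would typically be summed with weights, for example a reconstruction loss on the rendered output plus small multiples of the correlation and acceleration penalties. The weights are a design choice that the claims leave open.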
Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title
Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved · CPC title
Electronic adaptation of stereophonic sound system to listener position or orientation (H04S7/301 takes precedence) · CPC title
Machine learning · CPC title
in which the audio signals are in digital form, i.e. employing more than two discrete digital channels (data reduction aspects thereof based on psychoacoustics G10L19/02) · CPC title