Method and apparatus for generating 3D audio content from two-channel stereo content
US-2018270600-A1 · Sep 20, 2018 · US
| Field | Value |
|---|---|
| Publication number | US-10341802-B2 |
| Application number | US-201615768695-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 11, 2016 |
| Priority date | Nov 13, 2015 |
| Publication date | Jul 2, 2019 |
| Grant date | Jul 2, 2019 |
Official abstract text for this publication.
Currently there is no simple and satisfying way to create 3D audio from existing 2D content. The conversion from 2D to 3D sound should spatially redistribute the sound from existing channels. From a multi-channel 2D audio input signal (x(k)(t)) a 3D sound representation is generated which includes an HOA representation Formula (I) and channel object signals Formula (II) scaled from channels of the 2D audio input signal. Additional signals Formula (III) placed in the 3D space are generated by scaling (21, 222; 41, 422; Formula (IV)) channels from the 2D audio input signal and by decorrelating (24, 25; 44, 45, 451; Formula (V)) a scaled version of a mix of channels from the 2D audio input signal, whereby spatial positions for the additional signals are predetermined. The additional signals Formula (III) are converted (27; 47) to an HOA representation Formula (I).
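The signal flow in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation; several details are assumptions made only for illustration: the HOA representation is limited to first order with ACN channel order and SN3D normalization, the decorrelator is a toy filter (a decaying noise burst), and the function names, gain values, and the four elevated positions are hypothetical.

```python
import numpy as np

# Hypothetical predetermined positions (azimuth, elevation) in degrees.
# These four elevated points are an assumption, not values from the patent.
DEFAULT_POSITIONS_DEG = ((45.0, 35.0), (-45.0, 35.0), (135.0, 35.0), (-135.0, 35.0))


def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal at (azimuth, elevation), in radians, into
    first-order Ambisonics (assumed ACN order W, Y, Z, X with SN3D weights)."""
    gains = np.array([
        1.0,                                  # W (omnidirectional)
        np.sin(azimuth) * np.cos(elevation),  # Y
        np.sin(elevation),                    # Z
        np.cos(azimuth) * np.cos(elevation),  # X
    ])
    return gains[:, None] * signal[None, :]


def decorrelate(signal, seed, length=256):
    """Toy decorrelator (assumption): convolve with a unit-energy,
    exponentially decaying noise burst that differs per seed."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(length) * np.exp(-np.arange(length) / 64.0)
    h /= np.linalg.norm(h)
    return np.convolve(signal, h)[: len(signal)]


def stereo_to_3d(left, right, g_obj=0.8, g_add=0.4,
                 positions_deg=DEFAULT_POSITIONS_DEG):
    """Return (hoa, objects) for a stereo input.

    objects: each channel object is one selected input channel, scaled.
    hoa: decorrelated copies of a scaled L+R mix, each placed at a
         predetermined position and encoded to first-order Ambisonics.
    """
    objects = np.stack([g_obj * left, g_obj * right])
    mix = g_add * 0.5 * (left + right)        # scaled mix of input channels
    hoa = np.zeros((4, len(left)))
    for i, (az_deg, el_deg) in enumerate(positions_deg):
        extra = decorrelate(mix, seed=i)      # one additional signal
        hoa += encode_first_order(extra, np.deg2rad(az_deg), np.deg2rad(el_deg))
    return hoa, objects
```

Here the gains are constants for brevity. A fixed gain pair like this does not by itself preserve the loudness of the stereo input, so a practical system would need to choose the gains with that goal in mind.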
Opening claim text (preview).
The invention claimed is:
1. A method for generating from a multi-channel 2D audio input signal a 3D sound representation which includes a Higher Order Ambisonics (HOA) representation and channel object signals, wherein said 3D sound representation is suited for a presentation with loudspeakers after rendering said HOA representation and combination with said channel object signals, said method including: generating each of said channel object signals by selecting and scaling one channel signal of said multi-channel 2D audio input signal; generating additional signals in a 3D space by scaling non-selected channels from said multi-channel 2D audio input signal or by decorrelating a scaled version of a mix of channels from said multi-channel 2D audio input signal, wherein spatial positions for the additional signals are predetermined; converting the additional signals to said HOA representation using the spatial positions corresponding to the additional signals.
2. The method according to claim 1, wherein said spatial positions can vary over time and a number corresponding to the spatial positions can vary over time.
3. The method according to claim 1, wherein said scaling is carried out by applying time-varying gain factors.
4. The method according to claim 1, wherein said scaling is adjusted such that said 3D sound representation can be rendered with a loudness of said multi-channel 2D audio input signal.
5. The method according to claim 3, wherein said gain factors are applied before said decorrelating.
6. The method according to claim 1, wherein the multi-channel 2D audio input signal is replaced by multiple multi-channel 2D audio input signals, each representing one complementary component of a mixed multi-channel 2D audio input signal, and wherein each multi-channel 2D audio input signal is converted to an individual 3D sound representation signal using individual conversion parameters, and wherein the 3D sound representations are superposed to a final mixed 3D sound representation.
7. The method according to claim 1, wherein multiple decorrelated signals are generated from one channel signal, or a mix of channel signals, of the multi-channel 2D audio input signal based on frequency domain processing, for example by fast convolution using at least one of an FFT and a filter bank, and wherein a frequency analysis of a common input signal is carried out only once and said frequency domain processing and frequency synthesis is applied for each output channel separately.
8. The method of claim 1, wherein the additional signals are generated by scaling non-selected channels from said multi-channel 2D audio input signal or by de-correlating the scaled version of the mix of channels from said multi-channel 2D audio input signal.
9. An apparatus for generating from a multi-channel 2D audio input signal a 3D sound representation which includes a Higher Order Ambisonics (HOA) representation and channel object signals, wherein said 3D sound representation is suited for a presentation with loudspeakers after rendering said HOA representation and combination with said channel object signals, said apparatus comprising: a processor configured to generate each of said channel object signals by selecting and scaling one channel signal of said multi-channel 2D audio input signal; wherein the processor is further configured to generate additional signals for placing them in a 3D space by scaling non-selected channels from said multi-channel 2D audio input signal or by decorrelating a scaled version of a mix of channels from said multi-channel 2D audio input signal, wherein spatial positions for said additional signals are predetermined; wherein the processor is further configured to convert said additional signals to said HOA representation using corresponding spatial positions.
10. The apparatus of claim 9, the processor is further configured to generate the additional signals by scaling non-selected channels from said multi-channel 2D audio input signal or by de-correlating the scaled version of the mix of channels from said multi-channel 2D audio input signal.
11. The apparatus of claim 9, wherein the processor is further configured to generate additional signals for placing them in the 3D space by scaling remaining non-selected channels from said multi-channel 2D audio input signal or by de-correlating the scaled version of the mix of channels from said multi-channel 2D audio input signal, wherein spatial positions for said additional signals are predetermined.
12. The apparatus according to claim 10, wherein said spatial positions can vary over time and a number corresponding to the spatial positions can vary over time.
13. The apparatus according to claim 10, wherein said scaling is carried out by applying time-varying gain factors.
14. The apparatus according to claim 9, wherein the scaling is adjusted such that said 3D sound representation can be rendered with a loudness of said multi-channel 2D audio input signal.
15. The apparatus according to claim 9, wherein said gain factors are applied before said decorrelating.
16. The apparatus according to claim 9, wherein the multi-channel 2D audio input signal is replaced by multiple multi-channel 2D audio input signals, each representing one complementary component of a mixed multi-channel 2D audio input signal, and wherein each multi-channel 2D audio input signal is converted to an individual 3D sound representation signal using individual conversion parameters, and wherein the 3D sound representations are superposed to a final mixed 3D sound representation.
17. The apparatus according to claim 9, wherein multiple decorrelated signals are generated from one channel signal, or a mix of channel signals, of the multi-channel 2D audio input signal based on frequency domain processing, for example by fast convolution using at least an FFT and a filter bank, and a frequency analysis of a common input signal is carried out only once and said frequency domain processing and frequency synthesis is applied for each output channel separately.
18. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor, perform the method according to claim 1.
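The shared-analysis idea in claims 7 and 17 can be sketched as follows: the forward transform of the common input is computed once, while the spectral multiplication and the inverse transform are repeated for each output channel. This is an illustrative sketch only. It uses a single long FFT rather than block-wise overlap-add or a filter bank, and the function name `multi_decorrelate` and the representation of decorrelators as FIR impulse responses are assumptions, not details from the patent.

```python
import numpy as np


def multi_decorrelate(x, filters):
    """Fast convolution of one input with several FIR filters.

    The frequency analysis of x (forward FFT) is carried out once and
    shared. Frequency-domain processing (multiplication) and synthesis
    (inverse FFT) run separately for each output channel.
    """
    # FFT length: next power of two covering the longest linear convolution.
    n_full = len(x) + max(len(h) for h in filters) - 1
    n_fft = 1 << (n_full - 1).bit_length()

    X = np.fft.rfft(x, n_fft)                # analysis: once for all outputs
    outputs = []
    for h in filters:
        H = np.fft.rfft(h, n_fft)            # filter spectrum (could be cached)
        y = np.fft.irfft(X * H, n_fft)       # processing + synthesis per output
        outputs.append(y[: len(x)])          # trim to the input length
    return np.stack(outputs)
```

For N output channels this costs one forward FFT of the input plus N inverse FFTs, instead of N forward and N inverse FFTs when each output is convolved independently.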
Application of ambisonics in stereophonic audio systems · CPC title
Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title
Tracking of listener position or orientation · CPC title
in which the audio signals are in digital form, i.e. employing more than two discrete digital channels (data reduction aspects thereof based on psychoacoustics G10L19/02) · CPC title
Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved · CPC title