Automatic generation of metadata for audio dominance effects

US9552845B2 · US · B2

Patent metadata
Publication number: US-9552845-B2
Application number: US-201013501086-A
Country: US
Kind code: B2
Filing date: Oct 5, 2010
Priority date: Oct 9, 2009
Publication date: Jan 24, 2017
Grant date: Jan 24, 2017

Abstract

Official abstract text for this publication.

Metadata comprising a set of gain values for creating a dominance effect is automatically generated. Automatically generating the metadata includes receiving multiple audio streams and a dominance criterion for at least one of the audio streams. A set of gains is computed for one or more audio streams based on the dominance criterion for the at least one audio stream and metadata is generated with the set of gains.
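
As a rough illustration of the abstract, the sketch below derives a per-frame gain from a dominance criterion and stores it as metadata. It assumes the criterion is a target power ratio in dB (one of the options named in claim 4), that only the first stream is attenuated and never boosted, and that each frame is a plain list of samples. The function names and the metadata layout are hypothetical and are not taken from the patent.

```python
import math


def mean_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def dominance_gain_db(first_frame, second_frame, target_ratio_db):
    """Gain in dB (always <= 0) for the first stream.

    After the gain is applied, the second stream's power exceeds the
    first stream's power by at least target_ratio_db (assumed criterion).
    """
    p1 = mean_power(first_frame)
    p2 = mean_power(second_frame)
    if p1 == 0.0 or p2 == 0.0:
        return 0.0  # one stream is silent, so there is nothing to balance
    current_ratio_db = 10.0 * math.log10(p2 / p1)
    # Attenuate only: if the second stream already dominates, leave it alone.
    return min(0.0, current_ratio_db - target_ratio_db)


def generate_metadata(first_frames, second_frames, target_ratio_db):
    """Build one metadata record per frame (hypothetical layout)."""
    return [
        {
            "frame": i,
            "first_gain_db": dominance_gain_db(a, b, target_ratio_db),
            "second_gain_db": 0.0,
        }
        for i, (a, b) in enumerate(zip(first_frames, second_frames))
    ]
```

With two equally loud frames and a 6 dB target, the first stream is turned down by 6 dB; when the second stream is already more than 6 dB above the first, the gain stays at 0 dB.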

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: receiving a first audio stream and a second audio stream; determining whether a first energy level for the first audio stream individually meets or exceeds an energy level threshold; determining whether a second energy level for the second audio stream individually meets or exceeds the energy level threshold; in response to determining that the first energy level for the first audio stream individually meets or exceeds the energy level threshold and determining that the second energy level for the second audio stream individually meets or exceeds the energy level threshold, computing a set of gains for at least one of the first audio stream and the second audio stream to create a dominance effect of the second audio stream over the first audio stream; generating metadata comprising the set of gains for at least one of the first audio stream and the second audio stream; generating an output audio signal comprising the first audio stream, the second audio stream, and the metadata comprising the set of gains; wherein the method is performed by a computing device, which comprises a processor.

2. The method as recited in claim 1, wherein computing the set of gains includes: receiving a dominance criterion of the second audio stream over the first audio stream; wherein the set of gains is computed based on the dominance criterion of the second audio stream over the first audio stream.

3. The method as recited in claim 2, wherein the dominance criterion comprises a loudness ratio between the first audio stream and the second audio stream.

4. The method as recited in claim 2, wherein the dominance criterion comprises a power ratio between the first audio stream and the second audio stream.

5. The method as recited in claim 1, wherein the set of gains is derived under the constraint that the loudness of the combined first audio stream and second audio stream, after application of said set of gains, does not exceed the larger of the loudness values of the first audio stream and the second audio stream.

6. The method as recited in claim 1, wherein computing the set of gains includes: receiving an intelligibility criterion of the second audio; wherein the set of gains is computed such that the intelligibility of the second audio stream is urged above the intelligibility criterion.

7. The method as recited in claim 6, wherein the computation of the set of gains comprises deriving a measure of speech intelligibility of speech in the second audio stream.

8. The method as recited in claim 7, wherein the measure of speech intelligibility is a speech intelligibility index.

9. The method as recited in claim 1, further comprising: determining whether the first audio stream comprises speech; wherein the set of gains is based at least on the result of said determining.

10. The method as recited in claim 1, further comprising: determining whether the first audio stream comprises speech or non-speech content during an interval when both the first audio stream and the second audio stream are active; wherein the set of gains for the first interval is based at least on whether the first audio stream comprises speech or non-speech content during the interval.

11. The method as recited in claim 1, further comprising: determining a confidence level that the first audio stream comprises speech during an interval when both the first audio stream and the second audio stream are active; wherein the set of gains for the interval is based at least on the confidence level that the first audio stream comprises speech during the interval.

12. The method as recited in claim 1, wherein the set of gains is calculated based on frequency sub-bands, wherein a first set of gains associated with a first frequency sub-band is different than a second set of gains associated with a second frequency sub-band.

13. The method as recited in claim 12, further comprising: receiving a first dominance criterion for the first frequency sub-band, wherein the first set of gains is computed based on the first dominance criterion; receiving a second dominance criterion for the second frequency sub-band, wherein the second set of gains is computed based on the second dominance criterion.

14. The method as recited in claim 1, further comprising one or more of: transmitting the first audio stream, the second audio stream, and the metadata; or mixing the first audio stream and the second audio stream based on the metadata.

15. The method as recited in claim 1, wherein the set of gains is computed when the first audio stream and the second audio stream are active.

16. The method as recited in claim 1, further comprising detecting overlapping signal time intervals when both the first audio stream and the second audio stream are active.

17. The method as recited in claim 1, wherein the first audio stream comprises primary audio associated with media content, and wherein the second audio stream comprises descriptive audio associated with the media content.

18. The method as recited in claim 1, wherein the first audio stream comprises a first set of one or more channels in a multi-channel program, wherein the second audio stream comprises a second set of one or more channels in the multi-channel program.

19. A method for processing an encoded audio signal generated according to the method of claim 1, comprising: receiving the encoded audio signal; extracting, from the encoded audio signal, (a) the first audio stream, (b) the second audio stream, and (c) the metadata comprising the set of gains; decoding the first audio stream and the second audio stream; and mixing the first audio stream and the second audio stream based on the set of gains to create an output audio signal in which one of the first and second audio streams dominates the other of the first and second audio streams.

20. A non-transitory computer readable storage medium, comprising a set of instructions, which when executed by a processing or computing device cause, control or program the device to execute or perform a process, wherein the process comprises the steps of: receiving a first audio stream and a second audio stream; determining whether a first energy level for the first audio stream individually meets or exceeds an energy level threshold; determining whether a second energy level for the second audio stream individually meets or exceeds the energy level threshold; in response to determining that the first energy level for the first audio stream individually meets or exceeds the energy level threshold and determining that the second energy level for the second audio stream individually meets or exceeds the energy level threshold, computing a set of gains for at least one of the first audio stream and the second audio stream to create a dominance effect of the second audio stream over the first audio stream; generating metadata comprising the set of gains for at least one of the first audio stream and the second audio stream; generating an output audio signal comprising the first audio stream, the second audio stream, and the metadata comprising the set of gains.

21. An apparatus comprising: a processor; and a non-transitory computer readable storage medium, comprising a set of instructions, which when executed by the processor cause, control or program the apparatus, or the processor thereof, to perform a process that comprises the steps of: receiving a first audio stream and a
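
The flow of claims 1 and 19 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the patented implementation: frame energy is taken as mean squared amplitude, the gain applied when both streams are active is a fixed attenuation (`duck_db`, a hypothetical parameter) rather than a value computed from a dominance criterion, and the "output audio signal" is modeled as a plain dictionary with no actual encoding or decoding.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame)


def gains_for_frame(first, second, threshold, duck_db=-12.0):
    """Return (first_gain_db, second_gain_db) for one frame.

    A gain is set only when each stream individually meets or exceeds
    the energy threshold, mirroring the condition in claim 1.
    """
    both_active = (frame_energy(first) >= threshold
                   and frame_energy(second) >= threshold)
    return (duck_db, 0.0) if both_active else (0.0, 0.0)


def encode(first_frames, second_frames, threshold, duck_db=-12.0):
    """Bundle both streams with per-frame gain metadata (hypothetical layout)."""
    gains = [gains_for_frame(a, b, threshold, duck_db)
             for a, b in zip(first_frames, second_frames)]
    return {"first": first_frames, "second": second_frames, "gains_db": gains}


def db_to_linear(db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (db / 20.0)


def mix(signal):
    """Receiver side (claim 19): apply the stored gains, then sum the streams."""
    mixed = []
    for a, b, (ga, gb) in zip(signal["first"], signal["second"],
                              signal["gains_db"]):
        la, lb = db_to_linear(ga), db_to_linear(gb)
        mixed.append([la * x + lb * y for x, y in zip(a, b)])
    return mixed


if __name__ == "__main__":
    first = [[0.5, 0.5], [0.5, 0.5]]    # e.g. primary audio
    second = [[0.0, 0.0], [0.5, 0.5]]   # e.g. descriptive audio, silent in frame 0
    signal = encode(first, second, threshold=0.01, duck_db=-20.0)
    print(signal["gains_db"])
    print(mix(signal))
```

Keeping the gains as metadata next to the unmodified streams, rather than baking them into a mix, leaves the receiver free to apply or ignore them, which matches the transmit-or-mix alternatives in claim 14.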

Classifications

  • Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title

  • of operating discs · CPC title

  • by using information signals recorded by the same method as the main recording {(G11B27/22 takes precedence)} · CPC title

  • Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes · CPC title

  • used signal is digitally coded · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9552845B2 cover?
Metadata comprising a set of gain values for creating a dominance effect is automatically generated. Automatically generating the metadata includes receiving multiple audio streams and a dominance criterion for at least one of the audio streams. A set of gains is computed for one or more audio streams based on the dominance criterion for the at least one audio stream and metadata is generated w…
Who is the assignee on this patent?
Riedmiller Jeffrey C, Radhakrishnan Regunathan, Muesch Hannes, and 1 more
What technology area does this patent fall under?
Primary CPC classification G11B27/11. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 24, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).