Object clustering for rendering object-based audio content based on perceptual criteria

US9805725B2 · US · B2

Patent metadata
Publication number: US-9805725-B2
Application number: US-201314654460-A
Country: US
Kind code: B2
Filing date: Nov 25, 2013
Priority date: Dec 21, 2012
Publication date: Oct 31, 2017
Grant date: Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed to a method of rendering object-based audio comprising determining an initial spatial position of objects having object audio data and associated metadata, determining a perceptual importance of the objects, and grouping the audio objects into a number of clusters based on the determined perceptual importance of the objects, such that a spatial error caused by moving an object from an initial spatial position to a second spatial position in a cluster is minimized for objects with a relatively high perceptual importance. The perceptual importance is based at least in part on a partial loudness of an object and content semantics of the object.
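The abstract's central idea, that spatial error should weigh more heavily for perceptually important objects, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the content-type weights, the product used to combine loudness and content type, the importance-weighted squared-distance error, and the names `CONTENT_WEIGHT`, `perceptual_importance`, and `weighted_spatial_error` are all hypothetical and do not come from the patent text.

```python
import math

# Hypothetical weights: the patent names these content types but gives no numbers here.
CONTENT_WEIGHT = {
    "dialog": 1.0,
    "music": 0.7,
    "sound effects": 0.6,
    "ambiance": 0.4,
    "noise": 0.1,
}


def perceptual_importance(partial_loudness, content_type):
    """Combine loudness and content type into one score (the product is an assumption)."""
    return partial_loudness * CONTENT_WEIGHT[content_type]


def weighted_spatial_error(objects, assignment, cluster_positions):
    """Sum over objects of importance times squared distance to the assigned cluster position."""
    total = 0.0
    for obj, cluster in zip(objects, assignment):
        importance = perceptual_importance(obj["loudness"], obj["type"])
        total += importance * math.dist(obj["pos"], cluster_positions[cluster]) ** 2
    return total


objects = [
    {"pos": (0.0, 0.0, 0.0), "loudness": 0.9, "type": "dialog"},
    {"pos": (1.0, 0.0, 0.0), "loudness": 0.2, "type": "ambiance"},
]

# Compare placing a single cluster at the dialog object versus at the ambiance object.
err_at_dialog = weighted_spatial_error(objects, [0, 0], [(0.0, 0.0, 0.0)])
err_at_ambiance = weighted_spatial_error(objects, [0, 0], [(1.0, 0.0, 0.0)])
```

Under these assumed weights, placing the cluster at the loud dialog object gives a much smaller weighted error than placing it at the quiet ambiance object, which matches the behavior the abstract describes.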

First claim

Opening claim text (preview).

What is claimed is:
1. A method of compressing object-based audio data comprising: determining a perceptual importance of objects in an audio scene, wherein the objects comprise object audio data and associated metadata; combining certain audio objects into clusters of audio objects based on the determined perceptual importance of the audio objects, wherein a number of clusters is less than an original number of audio objects in the audio scene, and wherein said combining certain audio objects into clusters comprises selecting centroids for the clusters that correspond to the audio objects having the highest perceptual importance and distributing at least one of the remaining audio objects over more than one of the clusters by panning techniques.
2. The method of claim 1 wherein the perceptual importance is derived from the object audio data of the audio objects.
3. The method of claim 1 wherein the perceptual importance is a value derived from at least one of a loudness value and a content type of a respective audio object, and wherein the content type is selected from the group consisting of: dialog, music, sound effects, ambiance, and noise.
4. The method of claim 3 wherein the content type is determined by an audio classification process, and wherein the loudness value is obtained by a perceptual model.
5. The method of claim 4 wherein the perceptual model is based on a calculation of excitation levels in critical frequency bands of the input audio signal, and wherein the method further comprises: defining a centroid for a cluster around a first audio object of the audio objects; aggregating all excitations of the audio objects; and, optionally smoothing the excitation levels, the loudness or properties derived thereof based on a time constant derived by a relative perceptual importance of a grouped audio object.
6. The method of claim 3 wherein the loudness value is dependent at least in part on spatial proximity of a respective audio object to the other audio objects, and optionally wherein the spatial proximity is defined at least in part by a position metadata value of the associated metadata for the respective audio object.
7. The method of claim 1 wherein the determined perceptual importance of the audio objects depends on a relative spatial location of the audio objects in the audio scene, and wherein the step of combining comprises: determining a number of centroids, each centroid comprising a center of a cluster for grouping a plurality of audio objects, the centroid positions being dependent on the perceptual importance of one or more audio objects relative to other audio objects; and grouping the audio objects into one or more clusters by distributing audio object signals across the clusters.
8. The method of claim 1 wherein cluster metadata is determined by one or more audio objects of a high perceptual importance.
9. The method of claim 1 wherein the combining causes certain spatial errors associated with each clustered audio object, and further wherein the method further comprises clustering the audio objects such that a spatial error is minimized for audio objects of relatively high perceptual importance.
10. A non-transitory storage medium comprising a software program, which when executed on a computing device, causes the computing device to perform the method of claim 1.
11. The method of claim 1, wherein combining certain audio objects into clusters further comprises: combining waveforms embodying the audio data for constituent audio objects within the same cluster together to form a replacement audio object having a combined waveform of the constituent audio objects; and combining the metadata for the constituent audio objects within the same cluster together to form a replacement set of metadata for the constituent audio objects.
12. A method of processing object-based audio comprising: determining a first spatial location of each audio object relative to the other audio objects of the plurality of audio objects; determining a relative importance of each audio object of the plurality of audio objects, said relative importance depending on the relative spatial locations of audio objects, by at least determining a partial loudness of each audio object of the plurality of audio objects, wherein the partial loudness of an audio object is based at least in part on a masking effect of one or more other audio objects; determining a number of centroids, each centroid comprising a center of a cluster for grouping a plurality of audio objects, the centroid positions being dependent on the relative importance of one or more audio objects; combining waveforms embodying the audio data for constituent audio objects within the same cluster together to form a replacement audio object having a combined waveform of the constituent audio objects; and combining the metadata for the constituent audio objects within the same cluster together to form a replacement set of metadata for the constituent audio objects.
13. The method of claim 12 further comprising determining a content type and associated content type importance of each audio object of the plurality of audio objects.
14. The method of claim 13 further comprising combining the partial loudness and the content type of each audio object to determine the relative importance of a respective audio object, and optionally wherein the content type is selected from the group consisting of: dialog, music, sound effects, ambiance, and noise.
15. The method of claim 12 wherein the partial loudness is obtained by a perceptual model that is based on a calculation of excitation levels in critical frequency bands of the input audio signal, and wherein the method further comprises: defining a centroid for a cluster around a first audio object of the audio objects; and aggregating all excitations of the audio objects.
16. The method of claim 12 wherein grouping the audio objects causes certain spatial errors associated with each clustered audio object, and wherein the method further comprises grouping the audio objects such that a spatial error is minimized for audio objects of relatively high perceptual importance.
17. The method of claim 16 further comprising one of: selecting the audio object having the highest perceptual importance as a cluster centroid for a cluster containing the audio object having the highest perceptual importance, or selecting an audio object that has a maximum loudness as a cluster centroid for a cluster containing the audio object that has the maximum loudness.
18. A non-transitory storage medium comprising a software program, which when executed on a computing device, causes the computing device to perform the method of claim 12.
19. An apparatus for compressing object-based audio data, comprising one or more processors configured to: determine a perceptual importance of objects in an audio scene, wherein the objects comprise object audio data and associated metadata; combine certain audio objects into clusters of audio objects based on the determined perceptual importance of the audio objects, wherein a number of clusters is less than an original number of audio objects in the audio scene, and wherein said combining certain audio objects into clusters comprises selecting centroids for the clusters that correspond to the audio objects having the highest perceptual importance and distributing at least one of the remaining audio objects over more than one of the clusters by panning techniques.
20. An apparatus for proc
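
The combining step recited in claims 1 and 11 (centroids chosen from the most important objects, remaining objects distributed over several clusters by panning, waveforms summed into replacement objects) can be sketched roughly as follows. This is a minimal sketch and not the patented implementation: the inverse-distance, power-normalized panning law, the choice to reuse the centroid object's position as cluster metadata, the assumption that all waveforms have equal length, and the names `panning_gains` and `cluster_objects` are all hypothetical.

```python
import math


def panning_gains(pos, centroid_positions, eps=1e-9):
    """Inverse-distance gains, scaled so the squared gains sum to 1 (assumed panning law)."""
    raw = [1.0 / (math.dist(pos, c) + eps) for c in centroid_positions]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]


def cluster_objects(objects, num_clusters):
    """Reduce objects to num_clusters replacement objects.

    Each object is a dict with 'pos' (tuple), 'importance' (float) and
    'samples' (list of floats, all the same length by assumption).
    """
    ranked = sorted(objects, key=lambda o: o["importance"], reverse=True)
    centroids, rest = ranked[:num_clusters], ranked[num_clusters:]
    # Each cluster starts from its centroid object's position and waveform.
    clusters = [{"pos": c["pos"], "samples": list(c["samples"])} for c in centroids]
    centroid_positions = [c["pos"] for c in centroids]
    for obj in rest:
        gains = panning_gains(obj["pos"], centroid_positions)
        # Spread the object's waveform over every cluster, scaled by its gain.
        for cluster, gain in zip(clusters, gains):
            for i, sample in enumerate(obj["samples"]):
                cluster["samples"][i] += gain * sample
    return clusters


scene = [
    {"pos": (0.0, 0.0, 0.0), "importance": 0.9, "samples": [1.0, 0.0]},
    {"pos": (2.0, 0.0, 0.0), "importance": 0.8, "samples": [0.0, 1.0]},
    {"pos": (1.0, 0.0, 0.0), "importance": 0.1, "samples": [0.5, 0.5]},
]
clusters = cluster_objects(scene, num_clusters=2)
gains = panning_gains((1.0, 0.0, 0.0), [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

In this toy scene the low-importance object sits midway between the two centroids, so its waveform is split equally between both clusters, while the two high-importance objects keep their original positions and so incur no spatial error.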

Assignees

Dolby Laboratories Licensing Corp

Inventors

Classifications

  • G10L19/008 (Primary)

    Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing · CPC title

  • the extracted parameters being spectral information of each sub-band · CPC title

  • Application of parametric coding in stereophonic audio systems · CPC title

  • Aspects of volume control, not necessarily automatic, in stereophonic sound systems · CPC title

  • using spectral analysis, e.g. transform vocoders or subband vocoders · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9805725B2 cover?
Embodiments are directed to a method of rendering object-based audio comprising determining an initial spatial position of objects having object audio data and associated metadata, determining a perceptual importance of the objects, and grouping the audio objects into a number of clusters based on the determined perceptual importance of the objects, such that a spatial error caused by moving an ob…
Who is the assignee on this patent?
Dolby Laboratories Licensing Corp
What technology area does this patent fall under?
Primary CPC classification G10L19/008. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).