Enhanced chroma extraction from an audio codec

US9697840B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9697840-B2
Application number: US-201214359697-A
Country: US
Kind code: B2
Filing date: Nov 28, 2012
Priority date: Nov 30, 2011
Publication date: Jul 4, 2017
Grant date: Jul 4, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present document relates to methods and systems for music information retrieval (MIR). In particular, the present document relates to methods and systems for extracting a chroma vector from an audio signal. A method (900) for determining a chroma vector (100) for a block of samples of an audio signal (301) is described. The method (900) comprises receiving (901) a corresponding block of frequency coefficients derived from the block of samples of the audio signal (301) from a core encoder (412) of a spectral band replication based audio encoder (410) adapted to generate an encoded bitstream (305) of the audio signal (301) from the block of frequency coefficients; and determining (904) the chroma vector (100) for the block of samples of the audio signal (301) based on the received block of frequency coefficients.
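The final step of the abstract, determining a chroma vector from a block of frequency coefficients, can be illustrated with a minimal sketch. It is not taken from the patent: the function name `chroma_from_coefficients`, the MDCT-style bin-centre formula, the 440 Hz reference, the frequency limits, and the choice of tone class 0 = A are all assumptions made for illustration.

```python
import math


def chroma_from_coefficients(coeffs, sample_rate, ref_freq=440.0,
                             f_min=55.0, f_max=3520.0):
    """Accumulate coefficient energy into 12 tone classes (index 0 = A).

    Hypothetical helper: assumes `coeffs` is one block of real-valued
    transform coefficients whose bin k is centred at
    (k + 0.5) * sample_rate / (2 * len(coeffs)), as for an MDCT.
    """
    n = len(coeffs)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        freq = (k + 0.5) * sample_rate / (2.0 * n)
        if freq < f_min or freq > f_max:
            continue  # ignore bins outside the assumed musical range
        # Nearest semitone relative to the reference pitch, folded to one octave.
        semitones = round(12.0 * math.log2(freq / ref_freq))
        chroma[semitones % 12] += c * c  # cumulated energy per tone class
    return chroma


# Usage: a block whose only non-zero bin lies close to 440 Hz.
block = [0.0] * 1024
block[40] = 2.0
print(chroma_from_coefficients(block, sample_rate=22050))
```

In an encoder of the kind described, the block would come from the core encoder's time-to-frequency transform rather than being built by hand, so no separate spectral analysis would be needed for the chroma computation.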

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for processing a block of samples of an audio signal, the method being performed at a spectral band replication based audio encoder which includes a core encoder adapted to derive a block of frequency coefficients from the block of samples of the audio signal and to generate an encoded bitstream of the audio signal from the block of frequency coefficients, and the method comprising: receiving the block of frequency coefficients from the core encoder of the spectral band replication based audio encoder; determining a chroma vector for the block of samples of the audio signal based on the received block of frequency coefficients, wherein determining the chroma vector comprises applying frequency dependent psychoacoustic processing to the received block of frequency coefficients or to one or more frequency coefficients which are determined on the basis of the received block of frequency coefficients; determining melodic and/or harmonic content of the block of samples of the audio signal based on the chroma vector for the block of samples of the audio signal; and storing the melodic and/or harmonic content on media or transferring the melodic and/or harmonic content via a network.

2. The method of claim 1, wherein the block of samples of the audio signal comprises N succeeding short-blocks of M samples each, respectively; the received block of frequency coefficients comprises N corresponding short-blocks of M frequency coefficients each, respectively, and wherein the method further comprises: estimating a long-block of frequency coefficients corresponding to the block of samples of the audio signal from the N short-blocks of M frequency coefficients; wherein the estimated long-block of frequency coefficients has an increased frequency resolution compared to the N short-blocks of frequency coefficients; and determining the chroma vector for the block of samples of the audio signal based on the estimated long-block of frequency coefficients.

3. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises interleaving corresponding frequency coefficients of the N short-blocks of frequency coefficients, thereby yielding an interleaved long-block of frequency coefficients.

4. The method of claim 3, wherein estimating the long-block of frequency coefficients comprises decorrelating the N corresponding frequency coefficients of the N short-blocks of frequency coefficients by applying a transform with energy compaction property to the interleaved long-block of frequency coefficients.

5. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises: forming a plurality of sub-sets of the N short-blocks of frequency coefficients; wherein the number of short-blocks per sub-set is selected based on the audio signal; for each sub-set, interleaving corresponding frequency coefficients of the short-blocks of frequency coefficients, thereby yielding an interleaved intermediate-block of frequency coefficients of the sub-set; and for each sub-set, applying a transform with energy compaction property, e.g. a DCT-II transform, to the interleaved intermediate-block of frequency coefficients of the sub-set, thereby yielding a plurality of estimated intermediate-blocks of frequency coefficients for the plurality of sub-sets.

6. The method of claim 5, wherein the frequency dependent psychoacoustic processing is applied to one of the plurality of estimated intermediate-blocks of frequency coefficients.

7. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises applying a polyphase conversion to the N short-blocks of M frequency coefficients, wherein the polyphase conversion is based on a conversion matrix for mathematically transforming the N short-blocks of M frequency coefficients to an accurate long-block of N×M frequency coefficients; and the polyphase conversion makes use of an approximation of the conversion matrix with a fraction of conversion matrix coefficients set to zero.

8. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises: forming a plurality of sub-sets of the N short-blocks of frequency coefficients; wherein the number L of short-blocks per sub-set is selected based on the audio signal, L<N; applying an intermediate polyphase conversion to the plurality of sub-sets, thereby yielding a plurality of estimated intermediate-blocks of frequency coefficients; wherein the intermediate polyphase conversion is based on an intermediate conversion matrix for mathematically transforming L short-blocks of M frequency coefficients to an accurate intermediate-block of L×M frequency coefficients; and wherein the intermediate polyphase conversion makes use of an approximation of the intermediate conversion matrix with a fraction of intermediate conversion matrix coefficients set to zero.

9. The method of claim 2, further comprising: estimating a super long-block of frequency coefficients corresponding to a plurality of blocks of samples from a corresponding plurality of long-blocks of frequency coefficients; wherein the estimated super long-block of frequency coefficients has an increased frequency resolution compared to the plurality of long-blocks of frequency coefficients.

10. The method of claim 9, wherein the frequency dependent psychoacoustic processing is applied to the estimated super long-block of frequency coefficients.

11. The method of claim 2, wherein the frequency dependent psychoacoustic processing is applied to the estimated long-block of frequency coefficients.

12. The method of claim 1, wherein applying frequency dependent psychoacoustic processing comprises: comparing a value derived from at least one frequency coefficient of the received block of frequency coefficients or from at least one frequency coefficient being determined on the basis of the received block of frequency coefficients to a frequency dependent energy threshold; and setting the frequency coefficient to zero if the frequency coefficient is below the energy threshold.

13. The method of claim 12, wherein the derived value corresponds to an average energy derived from a plurality of frequency coefficients for a corresponding plurality of frequencies.

14. The method of claim 1, wherein determining the chroma vector comprises: classifying plural frequency coefficients of the received block of frequency coefficients or being determined on the basis of the received block of frequency coefficients to tone classes of the chroma vector; and determining cumulated energies for the tone classes of the chroma vector based on the classified frequency coefficients.

15. An audio encoder adapted to encode an audio signal, the audio encoder comprising: a core encoder adapted to encode a downsampled component of the audio signal, wherein the core encoder is adapted to encode a block of samples of the downsampled component of the audio signal by transforming the block of samples of the downsampled component of the audio signal from the time domain into the frequency domain, thereby yielding a corresponding block of frequency coefficients in the frequency domain; and a processor adapted to determine a chroma vector of the block of samples of the downsampled component of the audio signal based on the block of frequency coefficients received from the core encoder, wherein the processor is further adapted to determine the chroma vector by applying frequency dependent psychoacoustic processing to the received block of frequency coefficients or to one or more frequency coeffi
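As a rough illustration of claims 3, 4 and 12, the sketch below interleaves N short-blocks, applies a DCT-II across each group of N corresponding coefficients, and zeroes coefficients whose energy falls below a per-coefficient threshold. This is one possible reading of the claim language, not the patent's implementation: every function name is hypothetical, the orthonormal DCT-II scaling is an assumption, and using the squared coefficient as the "derived value" in the threshold test is likewise assumed.

```python
import math


def interleave_short_blocks(short_blocks):
    """Place coefficient k of every short-block next to each other (claim 3)."""
    n_blocks = len(short_blocks)
    m = len(short_blocks[0])
    return [short_blocks[n][k] for k in range(m) for n in range(n_blocks)]


def dct_ii(x):
    """Orthonormal DCT-II, a transform with an energy compaction property."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def estimate_long_block(short_blocks):
    """Interleave, then decorrelate each group of N corresponding coefficients."""
    n_blocks = len(short_blocks)
    interleaved = interleave_short_blocks(short_blocks)
    long_block = []
    for start in range(0, len(interleaved), n_blocks):
        long_block.extend(dct_ii(interleaved[start:start + n_blocks]))
    return long_block


def apply_energy_threshold(coeffs, thresholds):
    """Zero every coefficient whose energy is below its threshold (claim 12).

    `thresholds` holds one assumed energy threshold per coefficient, which
    makes the test frequency dependent.
    """
    return [c if c * c >= t else 0.0 for c, t in zip(coeffs, thresholds)]


# Usage: N = 2 short-blocks of M = 2 coefficients each.
blocks = [[1.0, 2.0], [1.0, 2.0]]
print(interleave_short_blocks(blocks))
print(estimate_long_block(blocks))
print(apply_energy_threshold([0.1, 2.0], [0.5, 0.5]))
```

Because the two short-blocks in the usage example are identical, the DCT-II concentrates each pair's energy in the first coefficient of the pair, which is the compaction behaviour that claim 4 relies on.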

Assignees

Dolby Int Ab

Inventors

Classifications

  • G10L25/54 (Primary)

    for retrieval · CPC title

  • for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental · CPC title

  • MDCT [Modified discrete cosine transform], i.e. based on a DCT of overlapping data · CPC title

  • Associated control or indicating means · CPC title

  • Details of processing therefor · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9697840B2 cover?
The present document relates to methods and systems for music information retrieval (MIR). In particular, the present document relates to methods and systems for extracting a chroma vector from an audio signal. A method (900) for determining a chroma vector (100) for a block of samples of an audio signal (301) is described. The method (900) comprises receiving (901) a corresponding bl…
Who is the assignee on this patent?
Dolby Int Ab
What technology area does this patent fall under?
Primary CPC classification G10L25/54. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 4, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).