US9697840B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9697840-B2 |
| Application number | US-201214359697-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 28, 2012 |
| Priority date | Nov 30, 2011 |
| Publication date | Jul 4, 2017 |
| Grant date | Jul 4, 2017 |
Official abstract text for this publication.
The present document relates to methods and systems for music information retrieval (MIR). In particular, the present document relates to methods and systems for extracting a chroma vector from an audio signal. A method (900) for determining a chroma vector (100) for a block of samples of an audio signal (301) is described. The method (900) comprises receiving (901) a corresponding block of frequency coefficients derived from the block of samples of the audio signal (301) from a core encoder (412) of a spectral band replication based audio encoder (410) adapted to generate an encoded bitstream (305) of the audio signal (301) from the block of frequency coefficients; and determining (904) the chroma vector (100) for the block of samples of the audio signal (301) based on the received block of frequency coefficients.
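To make the abstract's central step concrete, the following is a minimal sketch of how a chroma vector might be accumulated from one block of frequency coefficients. It is not the patent's implementation. The function name `chroma_from_coefficients`, the MDCT-style bin centre formula `(k + 0.5) * fs / (2 * N)`, the 440 Hz reference pitch, the 55 Hz to 4000 Hz analysis range, and the final normalisation are all assumptions made for illustration.

```python
import math

def chroma_from_coefficients(coeffs, sample_rate, ref_a4=440.0,
                             f_min=55.0, f_max=4000.0):
    """Accumulate coefficient energy into 12 tone classes (index 0 = C, 9 = A).

    Hypothetical helper: the bin-to-frequency mapping and the frequency
    limits are illustrative assumptions, not values taken from the patent.
    """
    n = len(coeffs)
    chroma = [0.0] * 12
    for k, c in enumerate(coeffs):
        # Assumed centre frequency of bin k for an MDCT-style transform.
        freq = (k + 0.5) * sample_rate / (2.0 * n)
        if freq < f_min or freq > f_max:
            continue
        # MIDI-style pitch number, where 69 corresponds to A4 (ref_a4).
        pitch = 69.0 + 12.0 * math.log2(freq / ref_a4)
        tone_class = int(round(pitch)) % 12
        # Cumulate the energy of the coefficient in its tone class.
        chroma[tone_class] += c * c
    total = sum(chroma)
    # Normalise to unit sum when the block carries any energy.
    return [e / total for e in chroma] if total > 0 else chroma
```

With a block of 1024 coefficients at 44.1 kHz, a single non-zero coefficient in bin 20 (about 441 Hz under the assumed mapping) puts all of the energy in tone class A.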
Opening claim text (preview).
The invention claimed is:
1. A method for processing a block of samples of an audio signal, the method being performed at a spectral band replication based audio encoder which includes a core encoder adapted to derive a block of frequency coefficients from the block of samples of the audio signal and to generate an encoded bitstream of the audio signal from the block of frequency coefficients, and the method comprising: receiving the block of frequency coefficients from the core encoder of the spectral band replication based audio encoder; determining a chroma vector for the block of samples of the audio signal based on the received block of frequency coefficients, wherein determining the chroma vector comprises applying frequency dependent psychoacoustic processing to the received block of frequency coefficients or to one or more frequency coefficients which are determined on the basis of the received block of frequency coefficients; determining melodic and/or harmonic content of the block of samples of the audio signal based on the chroma vector for the block of samples of the audio signal; and storing the melodic and/or harmonic content on media or transferring the melodic and/or harmonic content via a network.
2. The method of claim 1, wherein the block of samples of the audio signal comprises N succeeding short-blocks of M samples each, respectively; the received block of frequency coefficients comprises N corresponding short-blocks of M frequency coefficients each, respectively, and wherein the method further comprises: estimating a long-block of frequency coefficients corresponding to the block of samples of the audio signal from the N short-blocks of M frequency coefficients; wherein the estimated long-block of frequency coefficients has an increased frequency resolution compared to the N short-blocks of frequency coefficients; and determining the chroma vector for the block of samples of the audio signal based on the estimated long-block of frequency coefficients.
3. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises interleaving corresponding frequency coefficients of the N short-blocks of frequency coefficients, thereby yielding an interleaved long-block of frequency coefficients.
4. The method of claim 3, wherein estimating the long-block of frequency coefficients comprises decorrelating the N corresponding frequency coefficients of the N short-blocks of frequency coefficients by applying a transform with energy compaction property to the interleaved long-block of frequency coefficients.
5. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises: forming a plurality of sub-sets of the N short-blocks of frequency coefficients; wherein the number of short-blocks per sub-set is selected based on the audio signal; for each sub-set, interleaving corresponding frequency coefficients of the short-blocks of frequency coefficients, thereby yielding an interleaved intermediate-block of frequency coefficients of the sub-set; and for each sub-set, applying a transform with energy compaction property, e.g. a DCT-II transform, to the interleaved intermediate-block of frequency coefficients of the sub-set, thereby yielding a plurality of estimated intermediate-blocks of frequency coefficients for the plurality of sub-sets.
6. The method of claim 5, wherein the frequency dependent psychoacoustic processing is applied to one of the plurality of estimated intermediate-blocks of frequency coefficients.
7. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises applying a polyphase conversion to the N short-blocks of M frequency coefficients, wherein the polyphase conversion is based on a conversion matrix for mathematically transforming the N short-blocks of M frequency coefficients to an accurate long-block of N×M frequency coefficients; and the polyphase conversion makes use of an approximation of the conversion matrix with a fraction of conversion matrix coefficients set to zero.
8. The method of claim 2, wherein estimating the long-block of frequency coefficients comprises: forming a plurality of sub-sets of the N short-blocks of frequency coefficients; wherein the number L of short-blocks per sub-set is selected based on the audio signal, L<N; applying an intermediate polyphase conversion to the plurality of sub-sets, thereby yielding a plurality of estimated intermediate-blocks of frequency coefficients; wherein the intermediate polyphase conversion is based on an intermediate conversion matrix for mathematically transforming L short-blocks of M frequency coefficients to an accurate intermediate-block of L×M frequency coefficients; and wherein the intermediate polyphase conversion makes use of an approximation of the intermediate conversion matrix with a fraction of intermediate conversion matrix coefficients set to zero.
9. The method of claim 2, further comprising: estimating a super long-block of frequency coefficients corresponding to a plurality of blocks of samples from a corresponding plurality of long-blocks of frequency coefficients; wherein the estimated super long-block of frequency coefficients has an increased frequency resolution compared to the plurality of long-blocks of frequency coefficients.
10. The method of claim 9, wherein the frequency dependent psychoacoustic processing is applied to the estimated super long-block of frequency coefficients.
11. The method of claim 2, wherein the frequency dependent psychoacoustic processing is applied to the estimated long-block of frequency coefficients.
12. The method of claim 1, wherein applying frequency dependent psychoacoustic processing comprises: comparing a value derived from at least one frequency coefficient of the received block of frequency coefficients or from at least one frequency coefficient being determined on the basis of the received block of frequency coefficients to a frequency dependent energy threshold; and setting the frequency coefficient to zero if the frequency coefficient is below the energy threshold.
13. The method of claim 12, wherein the derived value corresponds to an average energy derived from a plurality of frequency coefficients for a corresponding plurality of frequencies.
14. The method of claim 1, wherein determining the chroma vector comprises: classifying plural frequency coefficients of the received block of frequency coefficients or being determined on the basis of the received block of frequency coefficients to tone classes of the chroma vector; and determining cumulated energies for the tone classes of the chroma vector based on the classified frequency coefficients.
15. An audio encoder adapted to encode an audio signal, the audio encoder comprising: a core encoder adapted to encode a downsampled component of the audio signal, wherein the core encoder is adapted to encode a block of samples of the downsampled component of the audio signal by transforming the block of samples of the downsampled component of the audio signal from the time domain into the frequency domain, thereby yielding a corresponding block of frequency coefficients in the frequency domain; and a processor adapted to determine a chroma vector of the block of samples of the downsampled component of the audio signal based on the block of frequency coefficients received from the core encoder, wherein the processor is further adapted to determine the chroma vector by applying frequency dependent psychoacoustic processing to the received block of frequency coefficients or to one or more frequency coeffi
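As an illustration of the long-block estimation in claims 3 and 4 and the thresholding in claim 12, the sketch below interleaves N short-blocks, applies a DCT-II to each group of N corresponding coefficients, and zeroes coefficients whose energy falls below a per-bin threshold. This is one possible reading of the claim language, not the patented implementation. All function names are hypothetical, the choice of an orthonormal DCT-II applied group by group is an assumption, and the threshold values are supplied by the caller rather than derived from any psychoacoustic model.

```python
import math

def interleave(short_blocks):
    """Interleave N short-blocks of M coefficients each.

    Output index k * N + n holds coefficient k of short-block n, so the
    N coefficients sharing a frequency index end up adjacent.
    """
    n_blocks = len(short_blocks)
    m = len(short_blocks[0])
    return [short_blocks[n][k] for k in range(m) for n in range(n_blocks)]

def dct_ii(x):
    """Orthonormal DCT-II of a short sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for j in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * j / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def estimate_long_block(short_blocks):
    """Estimate a long-block of N * M coefficients from N short-blocks.

    Assumption: the energy-compacting transform is applied separately to
    each group of N interleaved coefficients.
    """
    n_blocks = len(short_blocks)
    interleaved = interleave(short_blocks)
    long_block = []
    for start in range(0, len(interleaved), n_blocks):
        long_block.extend(dct_ii(interleaved[start:start + n_blocks]))
    return long_block

def apply_energy_threshold(coeffs, thresholds):
    """Zero each coefficient whose energy is below its per-bin threshold.

    `thresholds` is a hypothetical caller-supplied list of energy limits,
    one per frequency bin.
    """
    return [c if c * c >= t else 0.0 for c, t in zip(coeffs, thresholds)]
```

For four identical short-blocks `[1.0, 2.0]`, the interleaved groups are constant, so the DCT-II compacts each group's energy into its first coefficient while the total energy of the block is preserved.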
for retrieval · CPC title
for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental · CPC title
MDCT [Modified discrete cosine transform], i.e. based on a DCT of overlapping data · CPC title
Associated control or indicating means · CPC title
Details of processing therefor · CPC title