Audio Signal Classification Method and Apparatus
US-2016155456-A1 · Jun 2, 2016 · US
US9852745B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9852745-B1 |
| Application number | US-201615331651-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 21, 2016 |
| Priority date | Jun 24, 2016 |
| Publication date | Dec 26, 2017 |
| Grant date | Dec 26, 2017 |
Official abstract text for this publication.
Technologies are described for identifying familiar or interesting parts of music content by analyzing changes in vocal power using frequency spectrums. For example, a frequency spectrum can be generated from digitized audio. Using the frequency spectrum, the harmonic content and percussive content can be separated. The vocal content can then be separated from the harmonic and/or percussive content. The vocal content can then be processed to identify surge points in the digitized audio. In some implementations, the vocal content is included in the harmonic content during the separation procedure and is then separated from the harmonic content.
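The harmonic/percussive separation step can be illustrated with median filtering, which the claims name as one way to do it. The block below is a minimal pure-Python sketch, not the patent's implementation. It assumes a magnitude spectrogram has already been computed (for example by an STFT) and is stored as a list of rows, one row per frequency bin and one column per time frame. The function names `median_filter_1d` and `separate_harmonic_percussive`, the kernel size, and the hard either/or masking rule are illustrative assumptions.

```python
from statistics import median


def median_filter_1d(values, kernel):
    """Median-filter a sequence; the window shrinks at the edges."""
    half = kernel // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def separate_harmonic_percussive(spec, kernel=3):
    """Split a magnitude spectrogram into harmonic and percussive parts.

    spec[f][t] is the magnitude at frequency bin f and time frame t.
    Assumption: each bin is assigned wholly to one part (hard mask).
    """
    n_freq, n_time = len(spec), len(spec[0])

    # Harmonic estimate: sustained tones are smooth along time,
    # so filter each frequency row across time frames.
    harm_est = [median_filter_1d(row, kernel) for row in spec]

    # Percussive estimate: clicks are smooth along frequency,
    # so filter each time column across frequency bins.
    perc_cols = [
        median_filter_1d([spec[f][t] for f in range(n_freq)], kernel)
        for t in range(n_time)
    ]

    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            # Ties go to the harmonic part (an arbitrary choice).
            if harm_est[f][t] >= perc_cols[t][f]:
                harmonic[f][t] = spec[f][t]
            else:
                percussive[f][t] = spec[f][t]
    return harmonic, percussive
```

On a toy spectrogram containing one horizontal line (a sustained tone) and one vertical line (a click), the horizontal line ends up in `harmonic` and the rest of the vertical line in `percussive`. A practical version would more likely operate on NumPy arrays and use soft masks; this sketch favors readability over speed.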
Opening claim text (preview).
What is claimed is:

1. A computing device comprising: a processing unit; and memory; the computing device configured to perform operations for identifying surge points within audio music content, the operations comprising: generating a frequency spectrum of at least a portion of digitized audio music content; analyzing the frequency spectrum to separate harmonic content and percussive content; using results of the analysis, generating an audio track representing vocal content within the audio music content; and processing the audio track representing vocal content to identify at least one surge point within the audio music content.
2. The computing device of claim 1 wherein generating the frequency spectrum comprises: applying a short-time Fourier transform (STFT) to the audio music content.
3. The computing device of claim 1 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content.
4. The computing device of claim 1 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: in a first pass: generating the frequency spectrum with an STFT with a first frequency resolution; and performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content; and in a second pass: applying an STFT with a second frequency resolution to the harmonic content produced in the first pass; and performing median filtering to results of the STFT using the second frequency resolution to generating the audio track representing vocal content; wherein the second frequency resolution is higher than the first frequency resolution.
5. The computing device of claim 4 wherein the STFT in the first pass uses a first window size, and wherein the STFT in the second pass uses a second window size that is larger than the first window size.
6. The computing device of claim 1 wherein generating the audio track representing vocal content within the music content comprises: performing filtering on the harmonic content.
7. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a low-pass filter to the audio track that removes features that are less than the length of a bar; and identifying the at least one surge point based, at least in part, upon the low-pass filtered audio track.
8. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a band-pass filter to the audio track; and identifying the at least one surge point based, at least in part, upon the band-pass filtered audio track.
9. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point comprises: filtering the audio track using a low-pass filter or a band-pass filter; applying one or more of a depth classifier, a width classifier, a bar energy classifier, or a beat energy classifier to the filtered audio track; and using result of the one or more classifiers to identify the at least one surge point.
10. The computing device of claim 1 wherein the at least one surge point is a location within the music content where vocal power falls to a local minimum and then returns to a level higher than the vocal power was prior to the local minimum.
11. The computing device of claim 1 wherein the vocal content is a human voice or audio that has characteristics of a human voice.
12. A method, implemented by a computing device, for identifying surge points within audio music content, the method comprising: obtaining audio music content in a digitized format; generating a frequency spectrum of the music content using a short-time Fourier transform (STFT); analyzing the frequency spectrum to separate harmonic content and percussive content; using results of the analysis, generating an audio track representing vocal content within the music content; processing the audio track representing vocal content to identify at least one surge point within the music content; and outputting an indication of the at least one surge point.
13. The method of claim 12 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content.
14. The method of claim 12 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: in a first pass: generating the frequency spectrum using the STFT with a first frequency resolution; and performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content; and in a second pass: applying an STFT with a second frequency resolution to the harmonic content produced in the first pass; and performing median filtering to results of the STFT using the second frequency resolution to generating the audio track representing vocal content; wherein the second frequency resolution is higher than the first frequency resolution.
15. The method of claim 12 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a low-pass filter to the audio track that removes features that are less than the length of a bar; and identifying the at least one surge point based, at least in part, upon the low-pass filtered audio track.
16. The method of claim 12 wherein the at least one surge point is a location within the music content where vocal power falls to a local minimum and then returns to a level higher than the vocal power was prior to the local minimum.
17. A computer-readable storage medium storing computer-executable instructions for causing a computing device to perform operations for identifying surge points within audio music content, the operations comprising: generating a frequency spectrum of at least a portion of digitized audio music content, wherein the frequency spectrum is generated with a short-time Fourier transform (STFT) with a first frequency resolution; performing median filtering on the frequency spectrum to separate harmonic content and percussive content, wherein the first frequency resolution is selected so that vocal content will be included with the harmonic content when the median filtering is performed to separate the harmonic content and the percussive content; applying an STFT with a second frequency resolution to the harmonic content, wherein the second frequency resolution is higher than the first frequency resolution; performing median filtering to results of the STFT using the second frequency resolution to generating audio data representing vocal content within the audio music content; processing the audio data representing vocal content to identify at least one surge point within the audio music content; and outputting an indication of the at least one surge point.
18. The computer-readable storage medium of claim 17 wherein processing the audio data representing vocal content to identify at least one surge point within the audio music content comprises: applying a low-pass filter to the audio data that removes features that are less than the length of a ba
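
The surge-point definition in claims 10 and 16, together with the smoothing step in claims 7 and 15, can be sketched as follows. This is an illustration under stated assumptions, not the patented classifier chain. It assumes the vocal track has already been reduced to a list of power values at a fixed rate; it uses a moving average as a stand-in for the low-pass filter; and it reads "prior to the local minimum" as the nearest peak before the dip. The names `moving_average` and `find_surge_points` are hypothetical.

```python
def moving_average(power, window):
    """Crude low-pass filter: centred moving average, window shrinks at edges.

    Assumption: `window` is roughly the number of samples in one bar,
    so features shorter than a bar are smoothed away.
    """
    half = window // 2
    n = len(power)
    out = []
    for i in range(n):
        seg = power[max(0, i - half):min(n, i + half + 1)]
        out.append(sum(seg) / len(seg))
    return out


def find_surge_points(power):
    """Return indices where power dips to a local minimum and then
    climbs above the level it had at the peak before the dip."""
    surges = []
    n = len(power)
    for i in range(1, n - 1):
        # Keep only local minima (start of a plateau counts once).
        if not (power[i] < power[i - 1] and power[i] <= power[i + 1]):
            continue

        # Walk back uphill to the peak that preceded the dip.
        j = i
        while j > 0 and power[j - 1] >= power[j]:
            j -= 1
        before = power[j]

        # Walk forward uphill to the peak that follows the dip.
        k = i
        while k < n - 1 and power[k + 1] >= power[k]:
            k += 1
        after = power[k]

        if after > before:
            surges.append(i)
    return surges
```

For the envelope `[5, 4, 2, 3, 6, 7, 5, 3, 4, 4]`, `find_surge_points` returns `[2]`: the dip at index 2 is followed by a rise to 7, above the earlier 5, while the dip at index 7 only recovers to 4. In use, one would typically call `moving_average` first and pass its result to `find_surge_points`.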
characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques · CPC title
characterised by the analysis technique · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
for comparison or discrimination · CPC title
for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings · CPC title