Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G10L25/18. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 26, 2017 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Analyzing changes in vocal power within music content using frequency spectrums

US9852745B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9852745-B1
Application number	US-201615331651-A
Country	US
Kind code	B1
Filing date	Oct 21, 2016
Priority date	Jun 24, 2016
Publication date	Dec 26, 2017
Grant date	Dec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies are described for identifying familiar or interesting parts of music content by analyzing changes in vocal power using frequency spectrums. For example, a frequency spectrum can be generated from digitized audio. Using the frequency spectrum, the harmonic content and percussive content can be separated. The vocal content can then be separated from the harmonic and/or percussive content. The vocal content can then be processed to identify surge points in the digitized audio. In some implementations, the vocal content is included in the harmonic content during the separation procedure and is then separated from the harmonic content.
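The abstract's separation step, and the median filtering recited in claims 3 and 13, can be illustrated with a short sketch. This is a minimal numpy example and not the patent's implementation. It assumes a Hann-windowed magnitude STFT and the common median-filtering approach with binary masks. In that approach, smoothing across time favors sustained (harmonic) energy, and smoothing across frequency favors short broadband (percussive) energy. The function names, window size, hop, and kernel length are hypothetical choices. The sketch does not show the second, higher-resolution pass that claims 4, 14, and 17 use to pull vocal content out of the harmonic part.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude STFT with a Hann window: rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def median_along(spec, size, axis):
    """Median filter of odd length `size` along one axis (0 = frequency, 1 = time)."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (size // 2, size // 2)
    padded = np.pad(spec, pad, mode="edge")
    # sliding_window_view appends the window axis last, so take the median over it
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)


def separate_harmonic_percussive(spec, kernel=17):
    """Split a magnitude spectrogram into harmonic and percussive parts with binary masks."""
    harmonic_est = median_along(spec, kernel, axis=1)    # smooth over time: held tones survive
    percussive_est = median_along(spec, kernel, axis=0)  # smooth over frequency: hits survive
    harmonic_mask = harmonic_est >= percussive_est
    return spec * harmonic_mask, spec * ~harmonic_mask
```

Consider a synthetic spectrogram that holds one horizontal line (a held note) and one vertical line (a drum hit). The held note lands in the first output and the hit lands in the second. Because the masks are complementary, the two outputs always sum back to the input.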

First claim

Opening claim text (preview).

What is claimed is:

1. A computing device comprising: a processing unit; and memory; the computing device configured to perform operations for identifying surge points within audio music content, the operations comprising: generating a frequency spectrum of at least a portion of digitized audio music content; analyzing the frequency spectrum to separate harmonic content and percussive content; using results of the analysis, generating an audio track representing vocal content within the audio music content; and processing the audio track representing vocal content to identify at least one surge point within the audio music content.

2. The computing device of claim 1 wherein generating the frequency spectrum comprises: applying a short-time Fourier transform (STFT) to the audio music content.

3. The computing device of claim 1 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content.

4. The computing device of claim 1 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: in a first pass: generating the frequency spectrum with an STFT with a first frequency resolution; and performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content; and in a second pass: applying an STFT with a second frequency resolution to the harmonic content produced in the first pass; and performing median filtering to results of the STFT using the second frequency resolution to generating the audio track representing vocal content; wherein the second frequency resolution is higher than the first frequency resolution.

5. The computing device of claim 4 wherein the STFT in the first pass uses a first window size, and wherein the STFT in the second pass uses a second window size that is larger than the first window size.

6. The computing device of claim 1 wherein generating the audio track representing vocal content within the music content comprises: performing filtering on the harmonic content.

7. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a low-pass filter to the audio track that removes features that are less than the length of a bar; and identifying the at least one surge point based, at least in part, upon the low-pass filtered audio track.

8. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a band-pass filter to the audio track; and identifying the at least one surge point based, at least in part, upon the band-pass filtered audio track.

9. The computing device of claim 1 wherein processing the audio track representing vocal content to identify at least one surge point comprises: filtering the audio track using a low-pass filter or a band-pass filter; applying one or more of a depth classifier, a width classifier, a bar energy classifier, or a beat energy classifier to the filtered audio track; and using result of the one or more classifiers to identify the at least one surge point.

10. The computing device of claim 1 wherein the at least one surge point is a location within the music content where vocal power falls to a local minimum and then returns to a level higher than the vocal power was prior to the local minimum.

11. The computing device of claim 1 wherein the vocal content is a human voice or audio that has characteristics of a human voice.

12. A method, implemented by a computing device, for identifying surge points within audio music content, the method comprising: obtaining audio music content in a digitized format; generating a frequency spectrum of the music content using a short-time Fourier transform (STFT); analyzing the frequency spectrum to separate harmonic content and percussive content; using results of the analysis, generating an audio track representing vocal content within the music content; processing the audio track representing vocal content to identify at least one surge point within the music content; and outputting an indication of the at least one surge point.

13. The method of claim 12 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content.

14. The method of claim 12 wherein analyzing the frequency spectrum to separate harmonic content and percussive content comprises: in a first pass: generating the frequency spectrum using the STFT with a first frequency resolution; and performing median filtering on the frequency spectrum to separate the harmonic content and the percussive content; and in a second pass: applying an STFT with a second frequency resolution to the harmonic content produced in the first pass; and performing median filtering to results of the STFT using the second frequency resolution to generating the audio track representing vocal content; wherein the second frequency resolution is higher than the first frequency resolution.

15. The method of claim 12 wherein processing the audio track representing vocal content to identify at least one surge point within the music content comprises: applying a low-pass filter to the audio track that removes features that are less than the length of a bar; and identifying the at least one surge point based, at least in part, upon the low-pass filtered audio track.

16. The method of claim 12 wherein the at least one surge point is a location within the music content where vocal power falls to a local minimum and then returns to a level higher than the vocal power was prior to the local minimum.

17. A computer-readable storage medium storing computer-executable instructions for causing a computing device to perform operations for identifying surge points within audio music content, the operations comprising: generating a frequency spectrum of at least a portion of digitized audio music content, wherein the frequency spectrum is generated with a short-time Fourier transform (STFT) with a first frequency resolution; performing median filtering on the frequency spectrum to separate harmonic content and percussive content, wherein the first frequency resolution is selected so that vocal content will be included with the harmonic content when the median filtering is performed to separate the harmonic content and the percussive content; applying an STFT with a second frequency resolution to the harmonic content, wherein the second frequency resolution is higher than the first frequency resolution; performing median filtering to results of the STFT using the second frequency resolution to generating audio data representing vocal content within the audio music content; processing the audio data representing vocal content to identify at least one surge point within the audio music content; and outputting an indication of the at least one surge point.

18. The computer-readable storage medium of claim 17 wherein processing the audio data representing vocal content to identify at least one surge point within the audio music content comprises: applying a low-pass filter to the audio data that removes features that are less than the length of a ba
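Claims 7 and 15 describe low-pass filtering the vocal track to remove features shorter than a bar. Claims 10 and 16 define a surge point as a place where vocal power falls to a local minimum and then returns to a level higher than before the dip. The sketch below is one possible reading under stated assumptions and is not the claimed method:

* A moving average roughly one bar wide stands in for the low-pass filter.
* The bar length comes from an assumed fixed tempo.
* "The level prior to the local minimum" is taken as the peak of a hypothetical `lookback` window, and the rise is checked over a hypothetical `lookahead` window.

All function names are illustrative. The depth, width, bar-energy, and beat-energy classifiers of claim 9 are not modeled.

```python
import numpy as np


def vocal_power(vocal_spec):
    """Per-frame power of a vocal magnitude spectrogram (frequency x time)."""
    return np.sum(vocal_spec ** 2, axis=0)


def frames_per_bar(bpm, sample_rate, hop, beats_per_bar=4):
    """Approximate number of STFT frames spanning one bar at an assumed fixed tempo."""
    seconds_per_bar = beats_per_bar * 60.0 / bpm
    return max(1, int(round(seconds_per_bar * sample_rate / hop)))


def smooth(envelope, width):
    """Moving average used here as a simple low-pass filter over roughly one bar."""
    kernel = np.ones(width) / width
    return np.convolve(envelope, kernel, mode="same")


def find_surge_points(power, lookback, lookahead):
    """Frame indices where power dips to a local minimum and then, within
    `lookahead` frames, rises above the highest level seen in the
    `lookback` frames before the dip."""
    power = np.asarray(power, dtype=float)
    surges = []
    for i in range(1, len(power) - 1):
        if not (power[i] < power[i - 1] and power[i] <= power[i + 1]):
            continue  # not a local minimum
        before = power[max(0, i - lookback):i]
        after = power[i + 1:i + 1 + lookahead]
        if after.size and after.max() > before.max():
            surges.append(i)
    return surges
```

For example, take a vocal magnitude spectrogram `V` at 22050 Hz with a 512-sample hop and an assumed tempo of 120 BPM. The call `find_surge_points(smooth(vocal_power(V), frames_per_bar(120, 22050, 512)), lookback=172, lookahead=172)` scans for dips that are followed by a louder return within about two bars on either side.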

Assignees

Microsoft Technology Licensing Llc

Classifications

G10L21/0308
characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques · CPC title
G10L25/27
characterised by the analysis technique · CPC title
G10L25/18 (primary)
the extracted parameters being spectral information of each sub-band · CPC title
G10L25/51
for comparison or discrimination · CPC title
G10H2210/051
for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 60674386
