What technology area does this patent fall under?

Primary CPC classification G10H1/40. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (publications cited in our corpus, or others sharing the same primary CPC).

Audio processing techniques for semantic audio recognition and report generation

US9812109B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9812109-B2
Application number	US-201514885216-A
Country	US
Kind code	B2
Filing date	Oct 16, 2015
Priority date	Dec 21, 2012
Publication date	Nov 7, 2017
Grant date	Nov 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptor for the audio signal.
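The matching step in the abstract (compare extracted features against stored templates of ranges, then read off the tags of the closest template) can be sketched as follows. This is a minimal illustration, not the assignee's implementation: the `Template` class, the feature names, the example ranges and tags, and the scoring rule (sum of out-of-range distances, each divided by the range width) are all assumptions. The abstract does not say how similarity is computed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical template layout (assumption): per-feature (low, high) ranges plus semantic tags."""
    ranges: dict[str, tuple[float, float]]
    tags: dict[str, str] = field(default_factory=dict)


def range_distance(value: float, low: float, high: float) -> float:
    """0.0 inside [low, high]; otherwise the distance to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def template_score(features: dict[str, float], template: Template) -> float:
    """Lower means more similar. Each distance is divided by the range width
    (an assumed normalization) so features in Hz and in BPM contribute comparably."""
    score = 0.0
    for name, (low, high) in template.ranges.items():
        if name not in features:
            continue  # this sketch ignores features that were not extracted
        width = high - low
        d = range_distance(features[name], low, high)
        score += d / width if width > 0 else d
    return score


def semantic_tags(features: dict[str, float], templates: list[Template]) -> dict[str, str]:
    """Return the tags of the template whose ranges best fit the extracted features."""
    best = min(templates, key=lambda t: template_score(features, t))
    return best.tags


# Illustrative templates; the numbers are made up for this example.
templates = [
    Template({"spectral_centroid": (1500.0, 3000.0), "tempo_bpm": (110.0, 140.0)},
             {"genre": "rock", "emotive": "energetic"}),
    Template({"spectral_centroid": (300.0, 1200.0), "tempo_bpm": (60.0, 90.0)},
             {"genre": "ambient", "emotive": "calm"}),
]
print(semantic_tags({"spectral_centroid": 2000.0, "tempo_bpm": 128.0}, templates))
# → {'genre': 'rock', 'emotive': 'energetic'}
```

Normalizing by range width is one simple way to keep a feature measured in hertz from swamping one measured in beats per minute; a real system might weight features or use a trained classifier instead.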

First claim

Opening claim text (preview).

The invention claimed is:
1. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising: a processor to: extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature; determine a range for each of the plurality of audio features; and store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with an audio timbre range, a beat range, a loudness range and a spectral histogram range.
2. The apparatus of claim 1, wherein the tag is associated with at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
3. The apparatus of claim 1, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio, the rhythmic feature includes at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at least some of the audio, the spectral feature includes at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio, and the temporal feature includes at least one of amplitude, power, or zero crossing of at least some of the audio.
4. The apparatus of claim 1, wherein the tag is associated with timber and the set of ranges includes a range for a mean of a spectral centroid, a range for a variance of the spectral centroid, and a range of a percentage of low/high energy frames.
5. The apparatus of claim 1, wherein the tag is associated with beat and the set of ranges includes a range for an amplitude of peaks in a beat histogram, a range for periods of peaks in the beat histogram, and a range for a ratio between a peak and a sum of all peaks in the beat histogram.
6. The apparatus of claim 1, wherein the tag is associated with pitch and the set of ranges includes a range for an amplitude of prominent peaks in a pitch histogram, and a range for periods of peaks in the pitch histogram, wherein the pitch histogram is on a full semitone scale or an octave independent scale.
7. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising: a processor to: extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature; determine a range for each of the plurality of audio features; and store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with timber and includes a range for a mean of a spectral centroid, a range for a variance of the spectral centroid, and a range of a percentage of low/high energy frames.
8. The apparatus of claim 7, wherein the tag is associated with at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
9. The apparatus of claim 7, wherein the tag is further associated with beat and the set of ranges further includes a range for an amplitude of peaks in a beat histogram, a range for periods of peaks in the beat histogram, and a range for a ratio between a peak and a sum of all peaks in the beat histogram.
10. The apparatus of claim 7, wherein the tag is further associated with pitch and the set of ranges further includes a range for an amplitude of prominent peaks in a pitch histogram, a range for periods of peaks in the pitch histogram, wherein the pitch histogram is on a full semitone scale or an octave independent scale.
11. The apparatus of claim 7, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio, the rhythmic feature includes at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at least some of the audio, the spectral feature includes at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio, and the temporal feature includes at least one of amplitude, power, or zero crossing of at least some of the audio.
12. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising: a processor to: extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature; determine a range for each of the plurality of audio features; and store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with beat and includes a range for an amplitude of peaks in a beat histogram, a range for periods of peaks in the beat histogram, and a range for a ratio between a peak and a sum of all peaks in the beat histogram.
13. The apparatus of claim 12, wherein the tag is associated with at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
14. The apparatus of claim 12, wherein the tag is further associated with timber and the set of ranges further includes a range for a mean of a spectral centroid, a range for a variance of the spectral centroid, and a range of a percentage of low/high energy frames.
15. The apparatus of claim 12, wherein the tag is further associated with pitch and the set of ranges further includes a range for an amplitude of prominent peaks in a pitch histogram, and a range for periods of peaks in the pitch histogram, wherein the pitch histogram is on a full semitone scale or an octave independent scale.
16. The apparatus of claim 12, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio, the rhythmic feature includes at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at …
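Claim 1 forms a template by extracting features, determining a range for each, and storing the set of ranges; claims 4 and 7 name a timbre set made of the mean and variance of the spectral centroid plus a percentage of low/high energy frames. The sketch below is one hedged, pure-Python reading of those steps. The frame size and hop, the naive DFT, the low-energy definition (frames whose RMS falls below the clip's mean RMS), and the min/max rule that turns example clips into ranges are assumptions, not details the claims specify; `timbre_features` and `form_template` are hypothetical names.

```python
from __future__ import annotations

import cmath
import math
from statistics import mean, pvariance


def split_frames(signal: list[float], size: int, hop: int) -> list[list[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def spectral_centroid(frame: list[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency of one frame, via a naive O(n^2) DFT
    (fine for a sketch; a real system would use an FFT and a window)."""
    n = len(frame)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def rms(frame: list[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def timbre_features(signal: list[float], sample_rate: float,
                    size: int = 256, hop: int = 128) -> dict[str, float]:
    """Timbre set in the spirit of claim 4 (assumed definitions, see lead-in)."""
    frames = split_frames(signal, size, hop)
    centroids = [spectral_centroid(f, sample_rate) for f in frames]
    energies = [rms(f) for f in frames]
    avg_energy = mean(energies)
    # Assumption: a "low energy" frame has RMS below the clip's mean RMS.
    low = sum(1 for e in energies if e < avg_energy)
    return {
        "centroid_mean": mean(centroids),
        "centroid_var": pvariance(centroids),
        "low_energy_pct": 100.0 * low / len(frames),
    }


def form_template(examples: list[dict[str, float]], tag: str) -> dict:
    """Store one (min, max) range per feature across example clips sharing a tag
    (the min/max rule is an assumption)."""
    names = examples[0].keys()
    ranges = {name: (min(e[name] for e in examples), max(e[name] for e in examples))
              for name in names}
    return {"tag": tag, "ranges": ranges}
```

For instance, a steady 1 kHz sine sampled at 8 kHz should give a centroid mean close to 1000 Hz and a variance near zero, since every frame holds a whole number of cycles.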
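Claims 5, 9 and 12 describe a beat set: the amplitude and period of peaks in a beat histogram and the ratio between a peak and the sum of all peaks. The sketch below assumes the beat histogram is the autocorrelation of an onset-strength envelope (a common construction that the claims do not specify), treats local maxima as peaks, and reports only the strongest one. `beat_histogram`, `beat_peak_features` and the peak-picking rule are hypothetical.

```python
from __future__ import annotations


def beat_histogram(envelope: list[float], max_lag: int) -> list[float]:
    """Autocorrelation of an onset-strength envelope; index = lag in envelope frames.
    Lag 0 is left at zero so it is never reported as a peak."""
    hist = [0.0] * (max_lag + 1)
    for lag in range(1, max_lag + 1):
        hist[lag] = float(sum(envelope[t] * envelope[t + lag]
                              for t in range(len(envelope) - lag)))
    return hist


def beat_peak_features(hist: list[float]) -> dict[str, float]:
    """Amplitude and period of the strongest peak, and its share of total peak amplitude."""
    # Assumed peak rule: strictly above the left neighbour, not below the right one.
    peaks = [
        (lag, hist[lag])
        for lag in range(1, len(hist) - 1)
        if hist[lag] > hist[lag - 1] and hist[lag] >= hist[lag + 1]
    ]
    if not peaks:
        return {"peak_amplitude": 0.0, "peak_period": 0.0, "peak_ratio": 0.0}
    lag, amp = max(peaks, key=lambda p: p[1])
    total = sum(a for _, a in peaks)
    return {"peak_amplitude": amp, "peak_period": float(lag), "peak_ratio": amp / total}


# An impulse every 4 envelope frames: peaks appear at lags 4 and 8.
envelope = [1.0, 0.0, 0.0, 0.0] * 16
print(beat_peak_features(beat_histogram(envelope, 12)))
```

Periods come out in envelope frames; dividing by the envelope frame rate converts them to seconds (a lag of 50 at 100 frames per second is 0.5 s, or 120 BPM). Per claim 5, a template would then store a range for each of the three values.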

Assignees

Nielsen Co Us Llc

Inventors

Classifications

G06F40/40: Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)
G10H2210/071: for rhythm pattern analysis or rhythm style recognition
G10H1/40 (primary): Rhythm
G10H2210/036: of musical genre, i.e. analysing the style of musical pieces, usually for selection, filtering or classification
G10L15/1815 (primary): Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

Patent family

Related publications grouped by family.

Family ID: 50975660


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9812109B2 cover?: System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted a…
Who is the assignee on this patent?: Nielsen Co Us Llc