Analyzing audio input for efficient speech and music recognition
US-2015332667-A1 · Nov 19, 2015 · US
US2017193362A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017193362-A1 |
| Application number | US-201615185616-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 17, 2016 |
| Priority date | Jan 3, 2016 |
| Publication date | Jul 6, 2017 |
| Grant date | — |
Official abstract text for this publication.
A neural network-based classifier system can receive a query including a media signal and, in response, provide an indication that a particular received query corresponds to a known media type or media class. The neural network-based classifier system can select and apply various models to facilitate media classification. In an example embodiment, classifying a media query includes accessing digital media data and a context parameter from a first device. A model for use with the network-based classifier system can be selected based on the context parameter. In an example embodiment, the network-based classifier system provides a media type probability index for the digital media data using the selected model and spectral features corresponding to the digital media data. In an example embodiment, the digital media data includes an audio or video signal sample.
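The flow in the abstract (compute spectral features, pick a model from the context parameter, produce a media type probability index) can be sketched as follows. This is a minimal illustration, not the patented implementation: the context names, the two features (spectral centroid and spectral flatness), the weights in `MODELS`, and the linear-plus-softmax scoring are all assumptions made for the example, standing in for the neural network models the abstract describes.

```python
import cmath
import math

# Hypothetical per-context models: class name -> (weights, bias) over two
# spectral features. The numbers are illustrative, not trained values.
MODELS = {
    "mobile_microphone": {
        "speech": ([1.0, 4.0], -1.0),
        "music": ([-1.0, -4.0], 1.0),
    },
    "tv_broadcast": {
        "speech": ([0.5, 2.0], -0.5),
        "music": ([-0.5, -2.0], 0.5),
    },
}


def spectral_features(samples):
    """Return [normalized spectral centroid, spectral flatness] for one frame."""
    n = len(samples)
    half = n // 2
    # Naive DFT magnitudes for bins 1..half-1 (DC skipped); fine for short frames.
    mags = []
    for k in range(1, half):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) + 1e-12)  # small floor keeps log() defined
    total = sum(mags)
    centroid = sum((i + 1) * m for i, m in enumerate(mags)) / (total * half)
    geo_mean = math.exp(sum(math.log(m) for m in mags) / len(mags))
    flatness = geo_mean / (total / len(mags))
    return [centroid, flatness]


def select_model(context_parameter):
    """Pick one stored classification model using the context parameter."""
    return MODELS[context_parameter]


def media_type_probability_index(samples, context_parameter):
    """Softmax over per-class scores from the context-selected model."""
    feats = spectral_features(samples)
    model = select_model(context_parameter)
    scores = {
        label: sum(w * f for w, f in zip(weights, feats)) + bias
        for label, (weights, bias) in model.items()
    }
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    norm = sum(exps.values())
    return {label: e / norm for label, e in exps.items()}


# Usage: a pure tone is spectrally "peaky" (low flatness), which these
# illustrative weights score as more music-like in either context.
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
mobile_index = media_type_probability_index(frame, "mobile_microphone")
tv_index = media_type_probability_index(frame, "tv_broadcast")
```

The same frame yields a different index under each context because a different model is selected, which is the point of passing the context parameter alongside the media data.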
Opening claim text (preview).
What is claimed is:
1. A method for classifying media, the method comprising: accessing, using one or more processor circuits, digital media data that represents a media query to be identified, the digital media data provided by a first remote device; accessing, using the one or more processor circuits, a first context parameter that corresponds to the media query to be identified, the first context parameter provided by the same first remote device; determining, using the one or more processor circuits, spectral features corresponding to the digital media data; selecting, using the one or more processor circuits, a first classification model stored in a database, the first classification model being one of a plurality of different classification models stored in the database, the selecting based on the first context parameter; determining, using the one or more processor circuits, a media type probability index for the media query using the first classification model and the determined spectral features corresponding to the digital media data, wherein the determined media type probability index indicates a likelihood that the media query corresponds to at least one media characteristic of a plurality of different media characteristics; and receiving, at the first remote device, one or both of the media type probability index and the at least one media characteristic.
2. The method of claim 1, further comprising: identifying, using the one or more processor circuits, a change in the digital media data or a change in the first context parameter and, in response, selecting a different second classification model from among the plurality of different classification models; and determining, using the one or more processor circuits, an updated media type probability index using the different second classification model.
3. The method of claim 1, wherein the selecting the first classification model from among a plurality of different classification models includes selecting one or more of the determined spectral features and using information about the selected one or more features with a first portion of a neural network, and wherein the determining the media type probability index includes using an output of the neural network.
4. The method of claim 1, wherein the determining the media type probability index includes using a neural network with the selected first classification model to provide an indication of a likelihood that the digital media data corresponds to a specified audio event or specified visual event, wherein the neural network is previously trained using a priori information about the specified audio event or the specified visual event.
5. The method of claim 1, wherein the accessing the first context parameter includes accessing a context parameter that indicates that the digital media data includes audio data received by a microphone of a mobile device; wherein the selecting the first classification model includes selecting a speech/music classification model for mobile devices; and wherein the determining the media type probability index includes using the selected speech/music classification model for mobile devices and using the determined spectral features corresponding to the digital media data that includes the audio data received by the microphone of the mobile device.
6. The method of claim 1, wherein the accessing the first context parameter includes accessing a context parameter that indicates that the digital media data includes audio data received from a television broadcast; wherein the selecting the first classification model includes selecting a speech/music classification model for television broadcast; and wherein the determining the audio type probability index includes using the selected speech/music classification model for television broadcast and using the determined spectral features corresponding to the digital media data that includes the audio data received from the television broadcast.
7. The method of claim 1, wherein the accessing the first context parameter includes accessing an indication of a source type of the digital media data, and wherein the source type includes one or more of a mobile device, a broadcast video or broadcast audio stream, a local signal source, or a microphone signal source.
8. The method of claim 1, further comprising: accessing, using the one or more processor circuits, a second context parameter that corresponds to the media query to be identified, wherein the second context parameter is provided by the same first remote device or a different device; determining, using the one or more processor circuits, search scope characteristics that are respectively associated with each of the first and second context parameters; and selecting, from the database and using the one or more processor circuits, the one of the first and second context parameters associated with a narrower search scope; wherein the selecting the first classification model includes using the selected one of the first and second context parameters associated with the narrower search scope.
9. The method of claim 1, further comprising: accessing, using the one or more processor circuits, a second context parameter that corresponds to the media query to be identified; determining, using the one or more processor circuits, signal quality characteristics that are respectively associated with each of the first and second context parameters, and selecting, using the one or more processor circuits, one of the first and second context parameters based on the determined respective signal quality characteristics; wherein the selecting the first classification model includes using the selected one of the first and second context parameters.
10. The method of claim 1, wherein the accessing the first context parameter that corresponds to the media query includes accessing context information that temporally coincides with the media query to be identified.
11. The method of claim 1, wherein the accessing the first context parameter includes determining the first context parameter using a determined characteristic of a sampled portion of the digital media data itself.
12. The method of claim 11, wherein the determining the first context parameter using the media data itself includes determining whether the media data includes one or more of previously-recorded music, live music, speech, television audio, movie audio, game audio, or other audio.
13. The method of claim 1, wherein the accessing the first context parameter includes receiving context information from a sensor device associated with the first remote device, the sensor device including one or more of a GPS or location sensor, an accelerometer, a microphone, a clock or timer circuit, or a user input.
14. The method of claim 1, further comprising analyzing the determined spectral features corresponding to the digital media data to determine whether a threshold change has occurred in the media query since earlier digital media data was accessed; and if the threshold change has not occurred, then inhibiting the determining the media type probability index.
15. The method of claim 1, wherein the accessing the digital media data includes periodically or intermittently sampling audio data from a continuous query sound source; and wherein the determining the media type probability index includes determining an audio type probability index for each of the respective periodically or intermittently sampled audio data.
16. The method of claim 1
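The gating step recited in claim 14 (skip the probability-index computation when the spectral features have not changed by a threshold amount) can be sketched as below. This is one possible reading under stated assumptions: the claim does not define how change is measured, so the Euclidean distance, the `threshold` value, and the `GatedClassifier` and `classify` names are hypothetical choices for illustration.

```python
import math


def feature_distance(current, previous):
    """Euclidean distance between two equal-length spectral feature vectors."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(current, previous)))


class GatedClassifier:
    """Runs the classifier only when features moved by at least `threshold`."""

    def __init__(self, classify, threshold):
        self.classify = classify    # hypothetical callable: features -> probability index
        self.threshold = threshold  # hypothetical tuning value
        self.last_features = None
        self.last_index = None
        self.calls = 0              # how many times the classifier actually ran

    def update(self, features):
        changed = (
            self.last_features is None
            or feature_distance(features, self.last_features) >= self.threshold
        )
        if changed:
            # Threshold change occurred: compute a fresh probability index.
            self.last_index = self.classify(features)
            self.last_features = list(features)
            self.calls += 1
        # Otherwise the computation is inhibited and the earlier index is reused.
        return self.last_index


# Usage with a stand-in classifier that echoes the first feature.
gate = GatedClassifier(lambda f: {"music": f[0]}, threshold=0.1)
first = gate.update([0.5, 0.2])      # classifier runs
second = gate.update([0.52, 0.21])   # small change: classifier is skipped
third = gate.update([0.9, 0.2])      # large change: classifier runs again
```

In this sketch the reference features are refreshed only when the classifier runs, so slow drift accumulates until it crosses the threshold; comparing against the immediately preceding sample instead would be an equally plausible design.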
Architecture, e.g. interconnection topology · CPC title
Combinations of networks · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Clustering; Classification · CPC title
Convolutional networks [CNN, ConvNet] · CPC title