Non-negative matrix factorization regularized by recurrent neural networks for audio processing

US9721202B2 · US · B2

Patent metadata
Publication number: US-9721202-B2
Application number: US-201414186832-A
Country: US
Kind code: B2
Filing date: Feb 21, 2014
Priority date: Feb 21, 2014
Publication date: Aug 1, 2017
Grant date: Aug 1, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Sound processing techniques using recurrent neural networks are described. In one or more implementations, temporal dependencies are captured in sound data that are modeled through use of a recurrent neural network (RNN). The captured temporal dependencies are employed as part of feature extraction performed using nonnegative matrix factorization (NMF). One or more sound processing techniques are performed on the sound data based at least in part on the feature extraction.
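The abstract's idea of NMF feature extraction guided by a temporal model can be sketched roughly as follows. This is a minimal illustration, not the patented method. It assumes a squared Euclidean reconstruction cost with standard multiplicative updates, and it swaps the RNN for a hypothetical Gaussian frame-to-frame smoothness term (`smoothness_nll`). All function names, the `weight` parameter, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor nonnegative X (freq x time) into W (freq x rank) and H (rank x time)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared Euclidean cost;
        # they keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def smoothness_nll(H, sigma=1.0):
    """Hypothetical stand-in for the RNN term: Gaussian negative
    log-likelihood (up to a constant) of each frame given the previous one."""
    diff = H[:, 1:] - H[:, :-1]
    return 0.5 * np.sum(diff ** 2) / sigma ** 2

def regularized_cost(X, W, H, weight=0.1):
    """Reconstruction error plus a weighted temporal negative log-likelihood."""
    recon = 0.5 * np.sum((X - W @ H) ** 2)
    return recon + weight * smoothness_nll(H)

# Synthetic rank-3 nonnegative matrix standing in for a magnitude spectrogram.
rng = np.random.default_rng(1)
X = rng.random((16, 3)) @ rng.random((3, 40))
W, H = nmf(X, rank=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
cost = regularized_cost(X, W, H)
```

Here the temporal term is only evaluated, not optimized. A fuller treatment would update `H` against the combined cost, with a trained recurrent model supplying the likelihood of each activity frame given the frames before it.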

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by at least one computing device, the method comprising: capturing, by the at least one computing device, temporal dependencies in sound data modeled through use of a recurrent neural network (RNN); extracting, by the at least one computing device, features from the sound data based on the captured temporal dependencies as a negative log-likelihood term of a nonnegative matrix factorization (NMF) cost using nonnegative matrix factorization (NMF); and performing, by the at least one computing device, one or more sound processing techniques on the sound data based at least in part on the extracted features.
2. A method as described in claim 1, wherein the recurrent neural network models the temporal dependencies in a temporal sequence of frames in the sound data.
3. A method as described in claim 2, wherein the frames are configured as part of a magnitude spectrogram.
4. A method as described in claim 2, wherein the frames are configured as vectors in an activity matrix.
5. A method as described in claim 2, wherein the recurrent neural network captures long-term temporal dependencies and event co-occurrence in the sound data.
6. A method as described in claim 5, wherein the long-term temporal dependencies describe a plurality of frames in the temporal sequence that includes a frame, a preceding frame, and at least one other frame.
7. A method as described in claim 1, wherein the RNN is employed as part of one or more time-dependent restricted Boltzmann machines (RBM) to describe multimodal conditional densities in the sound data.
8. A method as described in claim 1, wherein the recurrent neural network (RNN) is configured to capture the temporal dependencies by discovering an approximate factorization of an input matrix that describes an observed magnitude spectrogram of the sound data having time and frequency dimensions.
9. A method as described in claim 1, wherein the NMF is configured to utilize a Cosine distance as a cost distance in nonnegative matrix factorization to generate a likelihood that a respective sound source generated a respective portion of the sound data.
10. A method as described in claim 1, wherein the temporal information obtained using the recurrent neural network (RNN) is used as part of nonnegative matrix factorization (NMF) to predict plausibility of decomposition of sound data as part of the feature extraction.
11. A method as described in claim 10, wherein the predicted plausibility of decomposition of the sound data is predicted as a density of activity matrices used as part of nonnegative matrix factorization (NMF).
12. A system comprising: at least one computing device having a processor and memory configured to perform operations comprising: capturing temporal dependencies in sound data modeled through use of a recurrent neural network (RNN); extracting features from the sound data based on the captured temporal dependencies as a negative log-likelihood term of a nonnegative matrix factorization (NMF) cost using nonnegative matrix factorization (NMF); and performing one or more sound processing techniques on the sound data based at least in part on the extracted features.
13. A system as described in claim 12, wherein the recurrent neural network captures long-term temporal dependencies and event co-occurrence in the sound data.
14. A system as described in claim 13, wherein the long-term temporal dependencies describe a plurality of frames in the temporal sequence that includes the frame, the preceding frame, and the at least one other frame.
15. A system as described in claim 12, wherein the recurrent neural network (RNN) is configured to capture the temporal dependencies by discovering an approximate factorization of an input matrix that describes an observed magnitude spectrogram of the sound data having time and frequency dimensions.
16. A system as described in claim 12, wherein the NMF is configured to utilize a Cosine distance as a cost distance in nonnegative matrix factorization to generate a likelihood that a respective sound source generated a respective portion of the sound data.
17. A system comprising: means for capturing temporal dependencies in sound data modeled through use of a recurrent neural network (RNN); means for extracting features from the sound data based on the captured temporal dependencies as a negative log-likelihood term of a nonnegative matrix factorization (NMF) cost using nonnegative matrix factorization (NMF); and means for performing one or more sound processing techniques on the sound data based at least in part on the extracted features.
18. A system as described in claim 17, wherein the recurrent neural network models the temporal dependencies in a temporal sequence of frames in the sound data.
19. A system as described in claim 18, wherein the frames are configured as part of a magnitude spectrogram.
20. A system as described in claim 18, wherein the frames are configured as vectors in an activity matrix.
21. A system as described in claim 18, wherein the recurrent neural network captures long-term temporal dependencies and event co-occurrence in the sound data.
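
One way to read the "negative log-likelihood term of a nonnegative matrix factorization (NMF) cost" recited in the independent claims is as a regularized objective. The notation below is an assumption made for illustration and is not taken from the claim text: X is the observed magnitude spectrogram, W the dictionary, H the activity matrix with columns h_t, D a reconstruction cost, and alpha a weighting factor.

```latex
\begin{aligned}
C(W, H) &= D\!\left(X \,\middle\|\, WH\right) \;-\; \alpha \log P(H),
  \qquad W \ge 0,\; H \ge 0, \\
P(H) &= \prod_{t=1}^{T} P\!\left(h_t \,\middle|\, h_1, \dots, h_{t-1}\right).
\end{aligned}
```

Under this reading, the recurrent neural network supplies the conditional densities in the second line, so a decomposition whose activity frames follow plausible temporal patterns incurs a lower cost.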

Assignees

Inventors

Classifications

  • Voice signal separating

  • characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

  • G06N3/044 (Primary): Recurrent networks, e.g. Hopfield networks

  • using properties of sound source

  • Supervised learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9721202B2 cover?
Sound processing techniques using recurrent neural networks are described. In one or more implementations, temporal dependencies are captured in sound data that are modeled through use of a recurrent neural network (RNN). The captured temporal dependencies are employed as part of feature extraction performed using nonnegative matrix factorization (NMF). One or more sound processing techniques a…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/044. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).