Who is the assignee on this patent?

Mitsubishi Electric Res Laboratories Inc

What technology area does this patent fall under?

Primary CPC classification G10L19/06. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (drawn from citations in our corpus or from other publications sharing the same primary CPC).

Methods and systems for end-to-end speech separation with unfolded iterative phase reconstruction

US10529349B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10529349-B2
Application number	US-201815983256-A
Country	US
Kind code	B2
Filing date	May 18, 2018
Priority date	Apr 16, 2018
Publication date	Jan 7, 2020
Grant date	Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for an audio signal processing system for transforming an input audio signal. A processor implements steps of a module by inputting an input audio signal into a spectrogram estimator to extract an audio feature sequence, and process the audio feature sequence to output a set of estimated spectrograms. Processing the set of estimated spectrograms and the audio feature sequence using a spectrogram refinement module, to output a set of refined spectrograms. Wherein the processing of the spectrogram refinement module is based on an iterative reconstruction algorithm. Processing the set of refined spectrograms for the one or more target audio signals using a signal refinement module, to obtain the target audio signal estimates. An output interface to output the optimized target audio signal estimates. Wherein the module is optimized by minimizing an error using an optimizer stored in the memory.
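The abstract's first stage turns the mixture into one estimated spectrogram per target signal. Claims 3 and 4 narrow this to a mask estimate for each target that is combined with the input, where at least one mask value may exceed 1. The sketch below is a minimal illustration under stated assumptions. It assumes magnitude-domain masking, meaning the mask is simply multiplied with the mixture magnitude, and it uses a hypothetical "doubled sigmoid" activation with range (0, 2). Neither choice is specified in this text, and the mask-predicting network itself is not modelled. The toy bin shows why values above 1 can help: when two sources partially cancel, one source can be louder than the mixture.

```python
import math


def sigmoid(z):
    # Standard logistic function; its output is always strictly below 1.
    return 1.0 / (1.0 + math.exp(-z))


def doubled_sigmoid(z):
    # Hypothetical mask activation with range (0, 2), so a mask may exceed 1.
    return 2.0 * sigmoid(z)


def apply_masks(mix_mag, masks):
    """Assumed estimator output: each target's mask times the mixture magnitude.

    mix_mag and every mask are lists of frames (lists of floats) of equal shape.
    """
    return [
        [[m * x for m, x in zip(mask_frame, mix_frame)]
         for mask_frame, mix_frame in zip(mask, mix_mag)]
        for mask in masks
    ]


# Toy time-frequency bin where two sources partially cancel.
s1, s2 = complex(1.0, 0.0), complex(-0.5, 0.0)
mix = s1 + s2                        # |mix| = 0.5
ideal_mask_s1 = abs(s1) / abs(mix)   # 2.0: source 1 is louder than the mixture
# A plain sigmoid can never reach 2.0; the doubled sigmoid approaches it.
mask = doubled_sigmoid(6.0)
est = apply_masks([[abs(mix)]], [[[mask]]])  # one target, one frame, one bin
```

Here `est[0][0][0]` comes out just under 1.0, close to the true magnitude of `s1`. A mask capped at 1 could recover at most 0.5 in this bin.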

First claim

Opening claim text (preview).

What is claimed is:

1. An audio signal processing system for transforming an input audio signal, wherein the input audio signal includes a mixture of one or more target audio signals, the audio signal processing system comprising: a memory including stored executable instructions and a stored module, such that the stored module transforms the input audio signal to obtain target audio signal estimates; an input interface to receive the input audio signal, a processor in communication with the memory and the input interface, wherein the processor implements steps of the stored module by a spectrogram estimator of the stored module to extract an audio feature sequence from the input audio signal, and process the audio feature sequence to output a set of estimated spectrograms, wherein the set of estimated spectrograms includes an estimated spectrogram for each target audio signal; a spectrogram refinement module of the stored module to process the set of estimated spectrograms and the audio feature sequence, to output a set of refined spectrograms, such that the set of refined spectrograms includes a refined spectrogram for each target audio signal, and wherein using the spectrogram refinement module is based on an iterative reconstruction algorithm; a signal refinement module of the stored module to process the set of refined spectrograms for the one or more target audio signals, to obtain target audio signal estimates, such that there is a target audio signal estimate for each target audio signal; and an output interface to output the target audio signal estimates, wherein parameters of the stored module are trained using training data by minimizing an error using an optimizer stored in the memory, wherein the error includes one or more of an error on the set of refined spectrograms, an error including a consistency measurement on the set of refined spectrograms, or an error on the target audio signal estimates.

2. The audio signal processing system of claim 1, wherein the spectrogram estimator uses a deep neural network.

3. The audio signal processing system of claim 1, wherein the spectrogram estimator includes a mask estimation module which outputs a mask estimate value for each target audio signal, and a spectrogram estimate output module which uses the mask estimate value for the one or more target audio signals and the input audio signal, to output the estimated spectrogram for each target audio signal.

4. The audio signal processing system of claim 3, wherein at least one mask estimate value is greater than 1.

5. The audio signal processing system of claim 1, wherein the spectrogram refinement module comprises: defining an iterative procedure acting on the set of estimated spectrograms and the input audio feature sequence; unfolding the iterative procedure into a set of layers, such that there is one layer for each iteration of the iterative procedure, and wherein each layer includes a set of fixed network parameters; forming a neural network using fixed network parameters from the sets of fixed network parameters of layers of previous iterations, as variables to be trained, and untying these variables across the layers of previous iterations, by using the variables as separate variables as each variable is separately applicable to their corresponding layer; training the neural network to obtain a trained neural network; and transforming the set of estimated spectrograms and the audio feature sequence using the trained neural network to obtain the set of refined spectrograms.

6. The audio signal processing system of claim 1, wherein the iterative reconstruction algorithm is an iterative phase reconstruction algorithm.

7. The audio signal processing system of claim 6, wherein the iterative phase reconstruction algorithm is the Multiple Input Spectrogram Inversion (MISI) algorithm.

8. The audio signal processing system of claim 6, wherein the iterative phase reconstruction algorithm is the Griffin-Lim algorithm.

9. The audio signal processing system of claim 1, wherein the error on the target audio signal estimates includes a distance between the target audio signal estimates and reference target audio signals.

10. The audio signal processing system of claim 1, wherein the error on the target audio signal estimates includes a distance between the estimated spectrograms of target audio signal and the refined spectrograms of the target audio signals.

11. The audio signal processing system of claim 1, wherein the spectrogram estimator includes a feature extraction module, such that the feature extraction module extracts the input audio signal from the input audio signal.

12. The audio signal processing system of claim 1, wherein a received audio signal includes one or more of one or more speakers, noise, music, environmental sounds, machine sound.

13. The audio signal processing system of claim 1, wherein the error further includes an error on the set of estimated spectrograms.

14. A method for transforming input audio signals, comprising the steps of: using a module for transforming an input audio signal of the input audio signals, such that the input audio signal includes a mixture of one or more target audio signals, wherein the module transforms the input audio signal, to obtain target audio signal estimates; using a spectrogram estimator of the model, to extract an audio feature sequence from the input audio signal, and process the audio feature sequence to output a set of estimated spectrograms, wherein the set of estimated spectrograms includes an estimated spectrogram for each target audio signal; using a spectrogram refinement module of the module to process the set of estimated spectrograms and the audio feature sequence, to output a set of refined spectrograms, such that the set of refined spectrograms includes a refined spectrogram for each target audio signal, and wherein using the spectrogram refinement module is based on an iterative reconstruction algorithm; using a signal refinement module of the module to process the set of refined spectrograms for the one or more target audio signals, to obtain target audio signal estimates, such that there is a target audio signal estimate for each target audio signal; and outputting the target audio signal estimates, wherein parameters of the stored module are trained using training data by minimizing an error using an optimizer stored in a memory, wherein the error includes one or more of an error on the set of refined spectrograms, an error including a consistency measurement on the set of refined spectrograms, or an error on the target audio signal estimates, and wherein the steps are performed by a processor in communication with an output device and the memory having stored executable instructions, such that the module is stored in the memory.

15. The method of claim 14, wherein the spectrogram estimator includes a mask estimation module which outputs a mask estimate value for each target audio signal, and a spectrogram estimate output module which uses the mask estimate value for the one or more target audio signals and the input audio signal, to output the estimated spectrogram for each target audio signal, wherein at least one mask estimate value is greater than 1.

16. The method of claim 14, wherein the processing of the spectrogram refinement module comprises: defining an iterative procedure acting on the set of estimated spectrograms and the input audio feature sequence; unfolding the iterative procedure into a set of layers, such that there is one layer for each iteration of the iterative procedure, and

Assignees

Mitsubishi Electric Res Laboratories Inc

Classifications

G06N3/084: Backpropagation, e.g. using gradient descent
G06N3/048: Activation functions
G06N3/044: Recurrent networks, e.g. Hopfield networks
G10L25/18: the extracted parameters being spectral information of each sub-band
G10L25/30: using neural networks

Patent family

Related publications grouped by family.

Patent family ID: 68161902
