Acoustic source tracking and selection
US-2016071526-A1 · Mar 10, 2016 · US
US10529349B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10529349-B2 |
| Application number | US-201815983256-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 18, 2018 |
| Priority date | Apr 16, 2018 |
| Publication date | Jan 7, 2020 |
| Grant date | Jan 7, 2020 |
Official abstract text for this publication.
Systems and methods for an audio signal processing system for transforming an input audio signal. A processor implements steps of a module by inputting an input audio signal into a spectrogram estimator to extract an audio feature sequence, and process the audio feature sequence to output a set of estimated spectrograms. Processing the set of estimated spectrograms and the audio feature sequence using a spectrogram refinement module, to output a set of refined spectrograms. Wherein the processing of the spectrogram refinement module is based on an iterative reconstruction algorithm. Processing the set of refined spectrograms for the one or more target audio signals using a signal refinement module, to obtain the target audio signal estimates. An output interface to output the optimized target audio signal estimates. Wherein the module is optimized by minimizing an error using an optimizer stored in the memory.
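The first stage described above outputs one estimated spectrogram per target signal. The following is a minimal sketch of that step, assuming the estimator works by masking the mixture spectrogram (one option the claims recite). `estimate_masks` is a hypothetical stand-in for the trained network, and the log-magnitude feature and the band-splitting rule are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def estimate_masks(features, n_sources=2):
    """Hypothetical stand-in for a trained mask estimator.

    Splits the frequency axis into equal bands, one per source, and assigns
    a gain of 1.2 inside each band to illustrate that mask values may exceed 1.
    """
    n_frames, n_bins = features.shape
    masks = np.zeros((n_sources, n_frames, n_bins))
    edges = np.linspace(0, n_bins, n_sources + 1).astype(int)
    for c in range(n_sources):
        masks[c, :, edges[c]:edges[c + 1]] = 1.2
    return masks

def estimate_spectrograms(mixture_spec, n_sources=2):
    """Return one estimated complex spectrogram per source.

    mixture_spec: complex array of shape (frames, bins).
    """
    # Assumed feature: log-compressed magnitude of the mixture spectrogram.
    features = np.log1p(np.abs(mixture_spec))
    masks = estimate_masks(features, n_sources)
    # Each mask scales the mixture spectrogram element-wise.
    return masks * mixture_spec[None, :, :]

# Toy mixture spectrogram: 5 frames, 8 frequency bins.
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
estimates = estimate_spectrograms(spec)
```

Because the masks here are real-valued, every estimate inherits the phase of the mixture, which is why a later phase reconstruction stage has something to improve.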
Opening claim text (preview).
What is claimed is:
1. An audio signal processing system for transforming an input audio signal, wherein the input audio signal includes a mixture of one or more target audio signals, the audio signal processing system comprising: a memory including stored executable instructions and a stored module, such that the stored module transforms the input audio signal to obtain target audio signal estimates; an input interface to receive the input audio signal, a processor in communication with the memory and the input interface, wherein the processor implements steps of the stored module by a spectrogram estimator of the stored module to extract an audio feature sequence from the input audio signal, and process the audio feature sequence to output a set of estimated spectrograms, wherein the set of estimated spectrograms includes an estimated spectrogram for each target audio signal; a spectrogram refinement module of the stored module to process the set of estimated spectrograms and the audio feature sequence, to output a set of refined spectrograms, such that the set of refined spectrograms includes a refined spectrogram for each target audio signal, and wherein using the spectrogram refinement module is based on an iterative reconstruction algorithm; a signal refinement module of the stored module to process the set of refined spectrograms for the one or more target audio signals, to obtain target audio signal estimates, such that there is a target audio signal estimate for each target audio signal; and an output interface to output the target audio signal estimates, wherein parameters of the stored module are trained using training data by minimizing an error using an optimizer stored in the memory, wherein the error includes one or more of an error on the set of refined spectrograms, an error including a consistency measurement on the set of refined spectrograms, or an error on the target audio signal estimates.
2. The audio signal processing system of claim 1, wherein the spectrogram estimator uses a deep neural network.
3. The audio signal processing system of claim 1, wherein the spectrogram estimator includes a mask estimation module which outputs a mask estimate value for each target audio signal, and a spectrogram estimate output module which uses the mask estimate value for the one or more target audio signals and the input audio signal, to output the estimated spectrogram for each target audio signal.
4. The audio signal processing system of claim 3, wherein at least one mask estimate value is greater than 1.
5. The audio signal processing system of claim 1, wherein the spectrogram refinement module comprises: defining an iterative procedure acting on the set of estimated spectrograms and the input audio feature sequence; unfolding the iterative procedure into a set of layers, such that there is one layer for each iteration of the iterative procedure, and wherein each layer includes a set of fixed network parameters; forming a neural network using fixed network parameters from the sets of fixed network parameters of layers of previous iterations, as variables to be trained, and untying these variables across the layers of previous iterations, by using the variables as separate variables as each variable is separately applicable to their corresponding layer; training the neural network to obtain a trained neural network; and transforming the set of estimated spectrograms and the audio feature sequence using the trained neural network to obtain the set of refined spectrograms.
6. The audio signal processing system of claim 1, wherein the iterative reconstruction algorithm is an iterative phase reconstruction algorithm.
7. The audio signal processing system of claim 6, wherein the iterative phase reconstruction algorithm is the Multiple Input Spectrogram Inversion (MISI) algorithm.
8. The audio signal processing system of claim 6, wherein the iterative phase reconstruction algorithm is the Griffin-Lim algorithm.
9. The audio signal processing system of claim 1, wherein the error on the target audio signal estimates includes a distance between the target audio signal estimates and reference target audio signals.
10. The audio signal processing system of claim 1, wherein the error on the target audio signal estimates includes a distance between the estimated spectrograms of target audio signal and the refined spectrograms of the target audio signals.
11. The audio signal processing system of claim 1, wherein the spectrogram estimator includes a feature extraction module, such that the feature extraction module extracts the input audio signal from the input audio signal.
12. The audio signal processing system of claim 1, wherein a received audio signal includes one or more of one or more speakers, noise, music, environmental sounds, machine sound.
13. The audio signal processing system of claim 1, wherein the error further includes an error on the set of estimated spectrograms.
14. A method for transforming input audio signals, comprising the steps of: using a module for transforming an input audio signal of the input audio signals, such that the input audio signal includes a mixture of one or more target audio signals, wherein the module transforms the input audio signal, to obtain target audio signal estimates; using a spectrogram estimator of the model, to extract an audio feature sequence from the input audio signal, and process the audio feature sequence to output a set of estimated spectrograms, wherein the set of estimated spectrograms includes an estimated spectrogram for each target audio signal; using a spectrogram refinement module of the module to process the set of estimated spectrograms and the audio feature sequence, to output a set of refined spectrograms, such that the set of refined spectrograms includes a refined spectrogram for each target audio signal, and wherein using the spectrogram refinement module is based on an iterative reconstruction algorithm; using a signal refinement module of the module to process the set of refined spectrograms for the one or more target audio signals, to obtain target audio signal estimates, such that there is a target audio signal estimate for each target audio signal; and outputting the target audio signal estimates, wherein parameters of the stored module are trained using training data by minimizing an error using an optimizer stored in a memory, wherein the error includes one or more of an error on the set of refined spectrograms, an error including a consistency measurement on the set of refined spectrograms, or an error on the target audio signal estimates, and wherein the steps are performed by a processor in communication with an output device and the memory having stored executable instructions, such that the module is stored in the memory.
15. The method of claim 14, wherein the spectrogram estimator includes a mask estimation module which outputs a mask estimate value for each target audio signal, and a spectrogram estimate output module which uses the mask estimate value for the one or more target audio signals and the input audio signal, to output the estimated spectrogram for each target audio signal, wherein at least one mask estimate value is greater than 1.
16. The method of claim 14, wherein the processing of the spectrogram refinement module comprises: defining an iterative procedure acting on the set of estimated spectrograms and the input audio feature sequence; unfolding the iterative procedure into a set of layers, such that there is one layer for each iteration of the iterative procedure, and
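The claims name Multiple Input Spectrogram Inversion (MISI) as one possible iterative phase reconstruction algorithm. The sketch below shows the commonly published form of MISI in plain NumPy, not the patented trained system: it has no learned parameters, and the STFT settings (`N_FFT`, `HOP`, a periodic Hann window) and the use of oracle magnitudes in the usage lines are assumptions chosen only to keep the example small.

```python
import numpy as np

N_FFT, HOP = 64, 16                    # assumed STFT settings
WIN = np.hanning(N_FFT + 1)[:-1]       # periodic Hann window

def stft(x):
    """Windowed frames -> one-sided spectrum, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Weighted overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WIN
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return out / np.maximum(norm, 1e-8)

def misi(mixture, magnitudes, n_iter=5):
    """Refine per-source phases so the sources better add up to the mixture."""
    n_src = len(magnitudes)
    # Start every source from the phase of the mixture.
    phases = [np.angle(stft(mixture))] * n_src
    for _ in range(n_iter):
        # Resynthesize each source from its fixed magnitude and current phase.
        signals = [istft(a * np.exp(1j * p), len(mixture))
                   for a, p in zip(magnitudes, phases)]
        # Share the mixture reconstruction error equally among the sources.
        residual = mixture - np.sum(signals, axis=0)
        # Keep only the phase of the corrected signals; magnitudes stay fixed.
        phases = [np.angle(stft(s + residual / n_src)) for s in signals]
    return np.stack([istft(a * np.exp(1j * p), len(mixture))
                     for a, p in zip(magnitudes, phases)])

# Usage with oracle magnitudes, for illustration only.
rng = np.random.default_rng(0)
t = np.arange(N_FFT + HOP * 20)
s1 = np.sin(2 * np.pi * 0.05 * t)
s2 = 0.5 * rng.standard_normal(len(t))
mix = s1 + s2
est = misi(mix, [np.abs(stft(s1)), np.abs(stft(s2))])
```

Each loop iteration is a fixed sequence of STFT, iSTFT, and arithmetic operations, which is the kind of iteration the claims describe unfolding into one network layer per iteration.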
Backpropagation, e.g. using gradient descent · CPC title
Activation functions · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
using neural networks · CPC title