US9390712B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9390712-B2 |
| Application number | US-201414223468-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 24, 2014 |
| Priority date | Mar 24, 2014 |
| Publication date | Jul 12, 2016 |
| Grant date | Jul 12, 2016 |
Official abstract text for this publication.
The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
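The decoding step in the abstract can be illustrated with a much reduced sketch. It is not the patented decoder. It assumes that each frame already has two log-likelihood scores, one for each way of assigning the two talkers to the higher-level and lower-level networks, and that a per-frame switching probability is available. The function name `decode_assignment` and all of its parameter names are hypothetical. A Viterbi search then picks the assignment sequence with the best joint score.

```python
import math


def decode_assignment(loglik_keep, loglik_swap, p_switch):
    """Viterbi search over a binary per-frame talker assignment (illustrative only).

    loglik_keep[t]: log-likelihood of frame t if talker A has the higher level.
    loglik_swap[t]: log-likelihood of frame t if talker B has the higher level.
    p_switch[t]:    assumed probability that the assignment changes between
                    frame t-1 and frame t (p_switch[0] is ignored).
    Returns a list with 0 (A higher) or 1 (B higher) for every frame.
    """
    emit = list(zip(loglik_keep, loglik_swap))
    if not emit:
        return []
    score = [emit[0][0], emit[0][1]]  # uniform prior over the first assignment
    back = []
    for t in range(1, len(emit)):
        # Clamp so that log() stays finite for probabilities of exactly 0 or 1.
        p = min(max(p_switch[t], 1e-12), 1.0 - 1e-12)
        log_stay, log_switch = math.log(1.0 - p), math.log(p)
        new_score, pointers = [], []
        for c in (0, 1):
            stay = score[c] + log_stay
            switch = score[1 - c] + log_switch
            if stay >= switch:
                new_score.append(stay + emit[t][c])
                pointers.append(c)
            else:
                new_score.append(switch + emit[t][c])
                pointers.append(1 - c)
        score = new_score
        back.append(pointers)
    # Trace back from the best final assignment.
    c = 0 if score[0] >= score[1] else 1
    path = [c]
    for pointers in reversed(back):
        c = pointers[c]
        path.append(c)
    path.reverse()
    return path


print(decode_assignment([-1, -1, -5, -5], [-5, -5, -1, -1], [0.0, 0.1, 0.9, 0.1]))
```

With a low switching probability, a single frame that weakly favors the other assignment does not flip the path, which is the effect of weighing the switching probability in the joint score.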
Opening claim text (preview).
What is claimed is:
1. A method performed by a computer processor for recognizing mixed speech from a source, comprising: training a first neural network to recognize a speech signal spoken by a speaker with a higher level of a speech characteristic from a mixed speech sample; training a second neural network to recognize a speech signal spoken by a speaker with a lower level of the speech characteristic from the mixed speech sample, wherein the lower level is lower than the higher level; and decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals.
2. The method of claim 1, comprising decoding by considering a probability that a specific frame is a switching point of the speakers.
3. The method of claim 2, comprising compensating for the switching point occurring in a decoding process based on the switching probability estimated from another neural network.
4. The method of claim 1, the mixed speech sample comprising a single audio channel, the single audio channel being generated by a microphone.
5. The method of claim 1, the speech characteristic comprising one of: instantaneous energy in a frame of the mixed speech sample; energy; and pitch.
6. The method of claim 1, comprising: training a third neural network to predict speech characteristic switching; predicting whether energy is switching from one frame to a next frame; and decoding the mixed speech sample based on the prediction.
7. The method of claim 6, comprising weighting against the likelihood of energy switching in a frame subsequent to a frame where energy switching is predicted.
8. A system for recognizing mixed speech from a source, the system comprising: a first neural network comprising a first plurality of interconnected systems; and a second neural network comprising a second plurality of interconnected systems, each interconnected system, comprising: a processing unit; and a system memory, wherein the system memory comprises code configured to direct the processing unit to: train the first neural network to recognize a higher level of a speech characteristic in a first speech signal from a mixed speech sample; train the second neural network to recognize a lower level of the speech characteristic in a second speech signal from the mixed speech sample, wherein the lower level is lower than the higher level; and decode the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals.
9. The system of claim 8, comprising code configured to decode the mixed speech sample by considering a probability that a specific frame is a switching point of the speech characteristic.
10. The system of claim 8, comprising code configured to direct the processing unit to compensate for the switching point occurring in a decoding process based on the probability estimated from a neural network.
11. The system of claim 8, the first neural network and the second neural network comprising deep neural networks.
12. The system of claim 8, the speech characteristic comprising a selected one of pitch, energy, and instantaneous energy, in a frame of the mixed speech sample.
13. The system of claim 8, comprising code configured to direct the processing unit to: train a third neural network to predict energy switching; predict whether energy is switching from one frame to a next frame; and decode the mixed speech sample based on the prediction.
14. The system of claim 13, comprising weighting against the likelihood of energy switching in a frame subsequent to a frame where energy switching is predicted.
15. One or more computer-readable storage memory devices for storing computer-readable instructions, the computer-readable instructions when executed by one or more processing devices, the computer-readable instructions comprising code configured to: train a first neural network to recognize a higher level of a speech characteristic in a first speech signal from a mixed speech sample comprising a single audio channel; train a second neural network to recognize a lower level of the speech characteristic in a second speech signal from the mixed speech sample; train a third neural network to estimate a switching probability for each frame; and decode the mixed speech sample with the first neural network, the second neural network, and the third neural network by optimizing the joint likelihood of observing the two speech signals, the joint likelihood meaning a probability that a specific frame is a switching point of the speech characteristic.
16. The computer-readable storage memory devices of claim 15, comprising code configured to decode the mixed speech sample by considering a probability that a specific frame is a switching point of the speech characteristic.
17. The computer-readable storage memory devices of claim 15, comprising code configured to compensate for the switching point occurring in a decoding process based on the joint likelihood.
18. The computer-readable storage memory devices of claim 15, wherein the speech characteristic is a selected one of energy, pitch, and instantaneous energy in a frame of the mixed speech sample.
19. The computer-readable storage memory devices of claim 15, wherein the speech characteristic is instantaneous energy in a frame of the mixed speech sample.
20. The computer-readable storage memory devices of claim 15, comprising code configured to: train a third neural network to predict energy switching; predict whether energy is switching from one frame to a next frame; and decode the mixed speech sample based on the prediction.
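Several claims name instantaneous energy in a frame as the speech characteristic and refer to predicting whether energy switches from one frame to the next. The sketch below shows one way such per-frame labels could be derived for training. It assumes that the two clean signals are available before mixing, that frames do not overlap, and that frame energy is the sum of squared samples. None of these choices is stated in the claims, and the names `frame_energy` and `energy_targets` are hypothetical.

```python
def frame_energy(signal, frame_len):
    """Sum of squared samples per non-overlapping frame.

    A trailing partial frame is dropped (an assumption made for simplicity).
    """
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def energy_targets(talker_a, talker_b, frame_len):
    """Label each frame with the higher-energy talker and mark switch frames.

    Returns (higher, switch):
      higher[t] is "A" or "B" (ties go to "A"),
      switch[t] is 1 if the higher-energy talker differs from frame t-1, else 0.
    """
    energy_a = frame_energy(talker_a, frame_len)
    energy_b = frame_energy(talker_b, frame_len)
    higher = ["A" if a >= b else "B" for a, b in zip(energy_a, energy_b)]
    switch = [
        int(t > 0 and higher[t] != higher[t - 1]) for t in range(len(higher))
    ]
    return higher, switch


higher, switch = energy_targets([1, 1, 0, 0, 2, 2], [0, 0, 1, 1, 1, 1], frame_len=2)
print(higher, switch)
```

In this sketch the `higher` labels would indicate which talker's transcript each of the two recognition networks targets in a frame, and the `switch` labels would serve as targets for a switch-prediction network.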
the extracted parameters being power information · CPC title
Training · CPC title
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
using artificial neural networks · CPC title
Pitch determination of speech signals · CPC title