Dynamic selection of appropriate far-field signal separation algorithms
US-2024257825-A1 · Aug 1, 2024 · US
US9837102B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9837102-B2 |
| Application number | US-201414321813-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 2, 2014 |
| Priority date | Jul 2, 2014 |
| Publication date | Dec 5, 2017 |
| Grant date | Dec 5, 2017 |
Official abstract text for this publication.
Examples of the disclosure describe user environment aware single channel acoustic noise reduction. A noisy signal received by a computing device is transformed and feature vectors of the received noisy signal are determined. The computing device accesses classification data corresponding to a plurality of user environments. The classification data for each user environment has associated therewith a noise model. A comparison is performed between the determined feature vectors and the accessed classification data to identify a current user environment. A noise level, a speech level, and a speech presence probability from the transformed noisy signal are estimated and the noise signal is reduced based on the estimates. The resulting signal is outputted as an enhanced signal with a reduced or eliminated noise signal.
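The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the nearest-mean classifier, the spectral-subtraction style gain, the speech presence formula, and every name and number in it (`CLASSIFICATION_DATA`, `oversubtraction`, `floor`) are assumptions chosen for illustration. The transform and feature extraction steps are assumed to have already produced per-bin power values and a feature vector.

```python
import math

# Hypothetical classification data: one entry per user environment, holding a
# mean feature vector and a toy "noise model" (an over-subtraction factor).
# All values are invented for illustration.
CLASSIFICATION_DATA = {
    "car":  {"mean": [0.9, 0.1, 0.0], "oversubtraction": 2.0},
    "pub":  {"mean": [0.2, 0.7, 0.3], "oversubtraction": 1.5},
    "cafe": {"mean": [0.1, 0.3, 0.8], "oversubtraction": 1.2},
}


def identify_environment(features, data=CLASSIFICATION_DATA):
    """Return the environment whose stored mean is closest to `features`."""
    return min(data, key=lambda env: math.dist(features, data[env]["mean"]))


def speech_presence_probability(noisy_power, noise_power):
    """Toy per-bin estimate: 0 when the bin is at or below the noise level,
    approaching 1 as the noisy-to-noise power ratio grows."""
    ratio = noisy_power / max(noise_power, 1e-12)
    return 1.0 - 1.0 / max(ratio, 1.0)


def enhance(noisy_power, noise_power, env, floor=0.1, data=CLASSIFICATION_DATA):
    """Attenuate each bin using the environment's noise model.

    Bins judged to be speech get a spectral-subtraction gain; bins judged to
    be noise are scaled down to a fixed floor.
    """
    alpha = data[env]["oversubtraction"]
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        p = speech_presence_probability(y, n)
        gain = max(1.0 - alpha * n / max(y, 1e-12), 0.0)
        enhanced.append(y * (p * gain + (1.0 - p) * floor))
    return enhanced


env = identify_environment([0.85, 0.15, 0.05])
cleaned = enhance([1.0, 10.0], [1.0, 1.0], env)
```

In this sketch the identified environment only selects the over-subtraction factor. The disclosure describes the noise model as also informing the noise level, speech level, and speech presence estimates, which the sketch does not attempt.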
Opening claim text (preview).
What is claimed is:

1. A system for providing user environment aware acoustic noise reduction, said system comprising: a memory area for storing classification data corresponding to a plurality of locations, the classification data for each location including a noise model associated therewith; and a processor programmed to: receive a noisy signal and create a transform of the received noisy signal, the noisy signal including a speech signal and a noise signal; determine feature vectors of the received noisy signal; compare the determined feature vectors with the classification data stored in the memory area to identify a current user location; based on the noise model associated with the identified current user location, estimate a noise level and a speech level for the transformed noisy signal; estimate a speech presence probability based on the noise model, the estimated noise level, and the estimated speech level for the transformed noisy signal; based on the estimated noise level, the estimated speech level, the estimated speech presence probability, and the noise model associated with the identified current user location, reduce the noise signal from the transformed noisy signal; and upon reducing the noise signal, output an enhanced version of the speech signal from the noisy signal.

2. The system of claim 1, wherein the processor is further programmed to confirm the identified current user location upon determining that the identified current user location is the same for at least a predefined number of frames of the noisy signal.

3. The system of claim 1, wherein the noisy signal includes only the noise signal for at least a predefined time period upon beginning the identification of the current user location.

4. The system of claim 1, wherein the memory area further stores a running signal-to-noise ratio (SNR) histogram for each of the plurality of locations and the processor is programmed to identify the current user location by: calculating a SNR histogram for the received noisy signal; comparing the calculated SNR histogram with the SNR histograms for the plurality of locations; and identifying the current user location based on the comparison.

5. The system of claim 1, wherein the processor is further programmed to repeat, after a predefined time period, identification of the current user location based on an updated noisy signal.

6. The system of claim 1, wherein the noise model associated with the identified current user location describes noise from a car, a pub, a café, pink noise, or clean speech.

7. The system of claim 1, wherein the processor is further programmed to: calculate a mean and a variance of the determined feature vectors; and compare the calculated mean and the variance with the classification data to identify the noise model for the current user location.

8. The system of claim 1, wherein the processor is further programmed to: inversely transform the outputted enhanced speech signal; and revise the classification data for the identified current user location based on the inversely transformed enhanced speech signal.

9. The system of claim 1, wherein the processor is further programmed to identify the current user location by considering data selected from a group consisting of data from a gyroscope, data from back end speech recognition, or speaker-speech characteristics.

10. The system of claim 1, wherein the processor is programmed to create the transform of the received noisy signal in a frequency domain, and wherein the processor is programmed to determine the feature vectors of the received noisy signal by computing Mel-Scale frequency cepstral coefficients (MFCC).

11. A method comprising: transforming a noisy signal received by a computing device; determining feature vectors of the received noisy signal; accessing classification data corresponding to a plurality of locations, the classification data for each of the locations including a noise model associated therewith; comparing the determined feature vectors with the accessed classification data to identify a current user location; based on the noise model associated with the identified current user location, estimating a noise level and a speech level for the transformed noisy signal; estimating a speech presence probability based on the noise model, the estimated noise level, and the estimated speech level for the transformed noisy signal; and based on the estimated noise level, the speech level, and the speech presence probability and the noise model associated with the identified current user location, reducing a noise signal from the transformed noisy signal to output an enhanced signal.

12. The method of claim 11, further comprising confirming the identified current user location by identifying a most frequently identified current user location over a predefined time period.

13. The method of claim 11, wherein a memory area associated with the computing device stores a signal-to-noise ratio (SNR) histogram for each of the locations and the identification of the current user location is performed by: calculating a SNR histogram for the received noisy signal; comparing the calculated SNR histogram with the SNR histograms for the plurality of locations; and identifying the current user location based on the comparison of the calculated SNR histogram with the SNR histograms for the plurality of locations.

14. The method of claim 11, further comprising: applying a known speech signal to the plurality of locations; and upon applying the known speech signal, characterizing each of the plurality of locations using machine learning to define at least a portion of the classification data.

15. The method of claim 11, wherein the noise model enables estimation of the noise level, the speech level, and the speech presence probability based on thresholds.

16. The method of claim 11, wherein transforming the noisy signal comprises transforming the noisy signal in a frequency domain, and wherein determining the feature vectors of the received noisy signal comprises computing Mel-Scale frequency cepstral coefficients (MFCC).

17. A computer storage media storing computer executable components executable by a processor associated with a computing device, said components comprising: a transformation component that when executed by at least one processor causes the at least one processor to transform a noisy signal received by a computing device; a determination component that when executed by at least one processor causes the at least one processor to determine feature vectors of the received noisy signal; a classification component that when executed by at least one processor causes the at least one processor to access classification data corresponding to a plurality locations, the classification data for each location including a noise model associated therewith; an identification component that when executed by at least one processor causes the at least one processor to identify a current user location of the computing device based on a comparison of the feature vectors determined by the determination component with the classification data accessed by the classification component; an estimation component that when executed by at least one processor causes the at least one processor to: based on the noise model associated with the current user location identified by the identification component, estimate a noise level and a speech level for the transformed noisy signal; estimate a speech presence probability based on the noise model, the
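
The confirmation steps of claims 2 and 12 and the SNR histogram comparison of claims 4 and 13 can be sketched in a few lines of Python. This is an illustrative reading only, not the claimed implementation: treating "the same for at least a predefined number of frames" as consecutive frames, using an L1 distance between normalised histograms, and all function names and bin edges are assumptions.

```python
from collections import Counter


def confirm_by_run(frame_labels, min_frames):
    """Claim 2 style: confirm a location once it has been identified for
    `min_frames` consecutive frames (consecutiveness is an assumption)."""
    run_label, run_length = None, 0
    for label in frame_labels:
        if label == run_label:
            run_length += 1
        else:
            run_label, run_length = label, 1
        if run_length >= min_frames:
            return run_label
    return None  # nothing stable enough yet


def confirm_by_majority(frame_labels):
    """Claim 12 style: pick the most frequently identified location."""
    if not frame_labels:
        return None
    return Counter(frame_labels).most_common(1)[0][0]


def snr_histogram(snrs_db, edges):
    """Normalised histogram of per-frame SNR values (dB) over the half-open
    bins [edges[i], edges[i + 1]); values outside all bins are ignored."""
    counts = [0] * (len(edges) - 1)
    for snr in snrs_db:
        for i in range(len(counts)):
            if edges[i] <= snr < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def closest_histogram(observed, stored):
    """Return the location whose stored histogram has the smallest L1
    distance to the observed one (the distance measure is an assumption)."""
    return min(
        stored,
        key=lambda loc: sum(abs(a - b) for a, b in zip(observed, stored[loc])),
    )


observed = snr_histogram([1, 6, 7, 12], edges=[0, 5, 10, 15])
stored = {"car": [0.8, 0.1, 0.1], "cafe": [0.2, 0.5, 0.3]}
match = closest_histogram(observed, stored)
```

Either confirmation function could be applied to the per-frame labels before the noise model is switched, so that a single misclassified frame does not change the active model.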
the extracted parameters being the cepstrum · CPC title
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title
characterised by the method used for estimating noise · CPC title
for comparison or discrimination · CPC title
for discriminating voice from noise · CPC title