Universal reconfigurable echo cancellation system
US-2015126255-A1 · May 7, 2015 · US
US10373609B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10373609-B2 |
| Application number | US-201715619252-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 9, 2017 |
| Priority date | Dec 29, 2016 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Official abstract text for this publication.
The present application discloses a voice recognition method and apparatus. A specific implementation of the method includes: in response to detecting a microphone receiving voice signal containing interfering sound signal, performing high-pass filtering on the voice signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result. This implementation improves the success rate of voice recognition.
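The abstract fixes the order of the front-end stages: high-pass filtering, interference cancellation, automatic gain control, then feature extraction. The Python sketch below illustrates that order only. The patent text shown here does not specify the filter order, cutoff frequency, AGC law, or feature type. The 100 Hz first-order high-pass filter, the frame-wise RMS gain normalization, the log-energy features (a stand-in for the spectral features a real recognizer would feed to its acoustic model), and all function names are therefore hypothetical choices. The interference-cancellation stage is left as a pass-through hook.

```python
import math
from typing import Callable, List

# Hypothetical front-end sketch: cutoff, frame sizes, AGC law and features are assumptions.


def high_pass(signal: List[float], sample_rate: float, cutoff_hz: float = 100.0) -> List[float]:
    """First-order RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sample_rate)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in signal:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def automatic_gain_control(signal: List[float], frame_len: int = 160,
                           target_rms: float = 0.1, max_gain: float = 10.0) -> List[float]:
    """Scale each frame toward a target RMS level, capping the gain for near-silent frames."""
    out: List[float] = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        gain = max_gain if rms == 0.0 else min(target_rms / rms, max_gain)
        out.extend(s * gain for s in frame)
    return out


def log_energy_features(signal: List[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """One log-energy value per 25 ms frame (10 ms hop at 16 kHz); a stand-in for real features."""
    feats: List[float] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append(math.log(sum(s * s for s in frame) + 1e-10))
    return feats


def front_end(mic_signal: List[float], sample_rate: float,
              cancel_interference: Callable[[List[float]], List[float]] = lambda s: s) -> List[float]:
    """Stage order from the abstract: high-pass, interference cancellation, AGC, features."""
    filtered = high_pass(mic_signal, sample_rate)
    cleaned = cancel_interference(filtered)  # pass-through placeholder for echo + noise removal
    target = automatic_gain_control(cleaned)
    return log_energy_features(target)


if __name__ == "__main__":
    sr = 16000
    # One second of a 400 Hz tone riding on a DC offset that the high-pass filter removes.
    mic = [0.5 + 0.05 * math.sin(2 * math.pi * 400 * n / sr) for n in range(sr)]
    print(len(front_end(mic, sr)))  # 98
```

A production AGC would normally smooth the gain across frames to avoid audible steps; the per-frame gain keeps the sketch short.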
Opening claim text (preview).
What is claimed is:
1. A voice recognition method for a terminal device, the terminal device being equipped with a microphone, the method comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the method further comprises: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.
2. The voice recognition method according to claim 1, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.
3. The voice recognition method according to claim 2, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.
4. The voice recognition method according to claim 1, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.
5. The voice recognition method according to claim 1, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the method further comprises: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determining voice identifier obtained after clustering as a voice recognition result matching the training sample.
6. A voice recognition apparatus for a terminal device, the terminal device being equipped with a microphone, the apparatus comprising: at least one processor; and a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: performing high-pass filtering on the voice signal, in response to detecting the microphone receiving voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and input the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.
7. The voice recognition apparatus according to claim 6, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.
8. The voice recognition apparatus according to claim 7, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.
9. The voice recognition apparatus according to claim 6, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.
10. The voice recognition apparatus according to claim 6, wherein the before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the operations further comprise: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determine a voice identifier obtained after clustering as a voice recognition result matching the training sample.
11. A non-transitory computer storage medium storing a computer program, which when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target trai
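Claims 3 and 8 recite adaptive filtering "using a time delay estimation algorithm" to cancel the loudspeaker echo, without naming either algorithm. One common way to realize that combination is sketched below in Python: the bulk delay between the loudspeaker reference and the microphone signal is estimated by maximizing their cross-correlation, and a normalized least-mean-squares (NLMS) FIR filter then models the remaining echo path on the delay-aligned reference. Both techniques, the function names, and the parameters (32 taps, step size 0.5, a synthetic echo delayed by 40 samples) are assumptions for illustration, not the method disclosed in the patent. The noise-suppression step that follows in claims 3 and 8 is not shown.

```python
import random
from typing import List, Tuple

# Hypothetical sketch: cross-correlation delay estimation + NLMS echo cancellation.
# Neither technique is named in the claims; both are assumptions for illustration.


def estimate_delay(reference: List[float], mic: List[float], max_delay: int) -> int:
    """Return the lag (in samples) that maximizes the mic/reference cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(mic[n] * reference[n - lag] for n in range(lag, len(mic)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nlms_echo_cancel(reference: List[float], mic: List[float], delay: int,
                     taps: int = 32, mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive FIR estimate of the echo from the mic signal (NLMS update)."""
    weights = [0.0] * taps
    out: List[float] = []
    for n in range(len(mic)):
        # Delay-aligned reference window: x[n-d], x[n-d-1], ..., x[n-d-taps+1].
        window = [reference[n - delay - k] if n - delay - k >= 0 else 0.0
                  for k in range(taps)]
        echo_estimate = sum(w * x for w, x in zip(weights, window))
        error = mic[n] - echo_estimate  # echo-cancelled output sample
        norm = eps + sum(x * x for x in window)
        weights = [w + mu * error * x / norm for w, x in zip(weights, window)]
        out.append(error)
    return out


def make_demo(num_samples: int = 4000, true_delay: int = 40,
              seed: int = 0) -> Tuple[List[float], List[float]]:
    """Synthesize a white-noise loudspeaker signal and its delayed, filtered echo."""
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    def ref_at(i: int) -> float:
        return reference[i] if i >= 0 else 0.0

    mic = [0.6 * ref_at(n - true_delay) + 0.3 * ref_at(n - true_delay - 1)
           for n in range(num_samples)]
    return reference, mic


if __name__ == "__main__":
    reference, mic = make_demo()
    delay = estimate_delay(reference, mic, max_delay=100)
    residual = nlms_echo_cancel(reference, mic, delay)
    tail_in = sum(s * s for s in mic[-1000:])
    tail_out = sum(s * s for s in residual[-1000:])
    print(delay)                      # 40
    print(tail_out / tail_in < 1e-3)  # True
```

Estimating the bulk delay first lets a short adaptive filter cover only the echo path's dispersion rather than the full acoustic delay, which keeps the per-sample cost and convergence time low.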
Training · CPC title
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
the noise being echo, reverberation of the speech · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
using artificial neural networks · CPC title