Voice recognition method and apparatus

US10373609B2 · US · B2

Patent metadata
Publication number: US-10373609-B2
Application number: US-201715619252-A
Country: US
Kind code: B2
Filing date: Jun 9, 2017
Priority date: Dec 29, 2016
Publication date: Aug 6, 2019
Grant date: Aug 6, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application discloses a voice recognition method and apparatus. A specific implementation of the method includes: in response to detecting a microphone receiving voice signal containing interfering sound signal, performing high-pass filtering on the voice signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result. This implementation improves the success rate of voice recognition.
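The front-end steps named in the abstract (high-pass filtering, interference cancellation, automatic gain control) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the first-order filter, the 100 Hz cutoff, the frame-based RMS gain rule, and every function and parameter name here are assumptions chosen for the example. The interference-cancellation stage is left as a pass-through hook because the abstract does not specify it.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order RC high-pass filter (assumed design; cutoff is illustrative)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def automatic_gain_control(samples, target_rms=0.1, frame_len=160,
                           max_gain=20.0, smoothing=0.9):
    """Frame-based AGC: steer each frame's level toward target_rms (assumed rule)."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = rms(frame)
        # Hold the current gain on silent frames instead of amplifying noise.
        desired = min(max_gain, target_rms / level) if level > 1e-9 else gain
        # Smooth the gain so it does not jump between frames.
        gain = smoothing * gain + (1.0 - smoothing) * desired
        out.extend(s * gain for s in frame)
    return out


def preprocess(samples, sample_rate, cancel_interference=lambda s: s):
    """High-pass, then interference cancellation (hypothetical hook), then AGC."""
    filtered = high_pass(samples, sample_rate)
    cleaned = cancel_interference(filtered)
    return automatic_gain_control(cleaned)
```

In this sketch the output of `preprocess` would play the role of the "target voice signal" from which a feature vector is extracted; feature extraction and the acoustic model are outside its scope.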

First claim

Opening claim text (preview).

What is claimed is:

1. A voice recognition method for a terminal device, the terminal device being equipped with a microphone, the method comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the method further comprises: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.

2. The voice recognition method according to claim 1, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.

3. The voice recognition method according to claim 2, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.

4. The voice recognition method according to claim 1, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.

5. The voice recognition method according to claim 1, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the method further comprises: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determining voice identifier obtained after clustering as a voice recognition result matching the training sample.

6. A voice recognition apparatus for a terminal device, the terminal device being equipped with a microphone, the apparatus comprising: at least one processor; and a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: performing high-pass filtering on the voice signal, in response to detecting the microphone receiving voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and input the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.

7. The voice recognition apparatus according to claim 6, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.

8. The voice recognition apparatus according to claim 7, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.

9. The voice recognition apparatus according to claim 6, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.

10. The voice recognition apparatus according to claim 6, wherein the before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the operations further comprise: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determine a voice identifier obtained after clustering as a voice recognition result matching the training sample.

11. A non-transitory computer storage medium storing a computer program, which when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target trai
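
Claims 3 and 8 describe echo cancellation as adaptive filtering combined with a time delay estimation algorithm. The claims do not name specific algorithms, so the sketch below substitutes two common choices as assumptions: cross-correlation to estimate the loudspeaker-to-microphone delay, and a normalized least-mean-squares (NLMS) filter to subtract the estimated echo. All function names, the tap count, and the step size are hypothetical.

```python
def estimate_delay(reference, mic, max_delay):
    """Return the lag (in samples) that maximizes cross-correlation.

    `reference` is the loudspeaker signal and `mic` the microphone signal.
    Cross-correlation is an assumed choice of delay estimator.
    """
    n = min(len(reference), len(mic))
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(reference[i] * mic[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nlms_echo_cancel(reference, mic, delay, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from `mic` (NLMS, assumed)."""
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, aligned by the estimated delay.
        window = [
            reference[n - delay - k] if 0 <= n - delay - k < len(reference) else 0.0
            for k in range(taps)
        ]
        echo_estimate = sum(w * x for w, x in zip(weights, window))
        error = mic[n] - echo_estimate
        # Normalize the step by the window energy so adaptation stays stable.
        norm = sum(x * x for x in window) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, window)]
        residual.append(error)
    return residual
```

A typical use would be to call `estimate_delay` once on a short buffer, then pass the result to `nlms_echo_cancel`; the residual would then go on to a noise suppression stage, which is not sketched here.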

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Classifications

  • G10L15/063 · Primary

    Training · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • the noise being echo, reverberation of the speech · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • using artificial neural networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10373609B2 cover?
The present application discloses a voice recognition method and apparatus. A specific implementation of the method includes: in response to detecting a microphone receiving voice signal containing interfering sound signal, performing high-pass filtering on the voice signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain c…
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 6, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).