Who is the assignee on this patent?

Baidu Online Network Technology (Beijing) Co., Ltd.

What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 6, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Voice recognition method and apparatus

US10373609B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10373609-B2
Application number	US-201715619252-A
Country	US
Kind code	B2
Filing date	Jun 9, 2017
Priority date	Dec 29, 2016
Publication date	Aug 6, 2019
Grant date	Aug 6, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application discloses a voice recognition method and apparatus. A specific implementation of the method includes: in response to detecting a microphone receiving voice signal containing interfering sound signal, performing high-pass filtering on the voice signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result. This implementation improves the success rate of voice recognition.
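For readers who find the abstract's processing order easier to follow in code, here is a minimal sketch of the front end it describes: high-pass filtering, then interference cancellation, then automatic gain control. It is an illustration under stated assumptions, not the patented implementation. The function names, the first-order filter, the 100 Hz cutoff, the RMS-based gain rule, and the gain cap are all our own hypothetical choices, and the cancellation step is left as a pass-through placeholder.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order RC high-pass filter; the 100 Hz cutoff is an assumed value."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1]) removes DC and low-frequency rumble
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def automatic_gain_control(samples, target_rms=0.1, max_gain=20.0):
    """Scale the whole block toward a target RMS level, with a capped gain (assumed rule)."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    gain = min(target_rms / rms, max_gain)
    return [s * gain for s in samples]


def front_end(samples, sample_rate, cancel_interference=lambda s: s):
    """Apply the three stages in the order given in the abstract.

    `cancel_interference` is a hypothetical hook for echo and noise removal;
    by default it passes the signal through unchanged.
    """
    filtered = high_pass(samples, sample_rate)
    cleaned = cancel_interference(filtered)
    return automatic_gain_control(cleaned)
```

Calling `front_end(samples, 16000)` on a list of float samples returns the level-normalized signal that would then go to feature extraction. A production system would normally apply gain adaptively over time rather than over one whole block.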

First claim

Opening claim text (preview).

What is claimed is:
1. A voice recognition method for a terminal device, the terminal device being equipped with a microphone, the method comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the method further comprises: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.
2. The voice recognition method according to claim 1, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.
3. The voice recognition method according to claim 2, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.
4. The voice recognition method according to claim 1, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.
5. The voice recognition method according to claim 1, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the method further comprises: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determining voice identifier obtained after clustering as a voice recognition result matching the training sample.
6. A voice recognition apparatus for a terminal device, the terminal device being equipped with a microphone, the apparatus comprising: at least one processor; and a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: performing high-pass filtering on the voice signal, in response to detecting the microphone receiving voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and input the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target training sample, the target training sample comprising a voice identifier; extracting the feature vector from the target training sample; and training, using a convolutional neural network, a deep neural network, and a restricted Boltzmann machine, and assigning the feature vector extracted from the target training sample as an input and the voice identifier as an output, to obtain the acoustic model.
7. The voice recognition apparatus according to claim 6, wherein the terminal device is further equipped with a loudspeaker, the interfering sound signal comprises an echo signal and a noise signal, and the echo signal is a sound signal sent by the loudspeaker and transmitted to the microphone.
8. The voice recognition apparatus according to claim 7, wherein the cancelling the interfering sound signal in the voice signal to obtain target voice signal comprises: performing adaptive filtering on the voice signal subjected to high-pass filtering using a time delay estimation algorithm, to cancel the echo signal; and cancelling the noise signal in the voice signal subjected to adaptive filtering using a noise suppression algorithm.
9. The voice recognition apparatus according to claim 6, wherein the pre-processing a pre-acquired training sample to generate a target training sample comprises: performing high-pass filtering on the pre-acquired training sample; performing sequentially echo cancellation and noise suppression on the training sample subjected to high-pass filtering; and performing automatic gain control on the training sample subjected to noise suppression, to generate the target training sample.
10. The voice recognition apparatus according to claim 6, wherein the before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving a voice signal, the operations further comprise: clustering the voice identifier outputted by the acoustic model using a clustering algorithm, and determine a voice identifier obtained after clustering as a voice recognition result matching the training sample.
11. A non-transitory computer storage medium storing a computer program, which when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising: performing high-pass filtering on a voice signal, in response to detecting the microphone receiving the voice signal containing interfering sound signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain control on the voice signal subjected to cancelling the interfering sound signal, to obtain target voice signal; and extracting a feature vector from the target voice signal and inputting the feature vector into a pre-trained acoustic model, to obtain a voice recognition result matching the target voice signal, the acoustic model being used for representing a corresponding relationship between the feature vector and the voice recognition result, wherein before the performing high-pass filtering on the voice signals, in response to detecting the microphone receiving the voice signal, the operations further comprise: pre-processing a pre-acquired training sample to generate a target trai
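
Claims 3 and 8 describe cancelling the loudspeaker echo by adaptive filtering combined with time delay estimation. The claim text shown here does not name a specific algorithm, so the sketch below swaps in two common textbook choices as assumptions: delay estimation by cross-correlation, and a normalized least-mean-squares (NLMS) adaptive filter. All function names, the tap count, the step size `mu`, and the synthetic test signal are hypothetical and chosen only for illustration.

```python
import random


def estimate_delay(reference, mic, max_delay):
    """Return the lag (in samples) at which `mic` correlates best with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(mic[n] * reference[n - lag] for n in range(lag, len(mic)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nlms_echo_cancel(reference, mic, delay, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic` (NLMS)."""
    weights = [0.0] * taps
    out = []
    for n in range(len(mic)):
        # Most recent `taps` loudspeaker samples, shifted by the estimated delay
        x = [reference[n - delay - k] if n - delay - k >= 0 else 0.0
             for k in range(taps)]
        echo_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = mic[n] - echo_estimate  # echo-reduced output sample
        norm = sum(xi * xi for xi in x) + eps
        weights = [w + mu * error * xi / norm for w, xi in zip(weights, x)]
        out.append(error)
    return out


# Synthetic example (assumed data): the microphone hears only a delayed,
# attenuated copy of the loudspeaker signal, with no near-end speech.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_delay = 5
mic = [0.6 * reference[n - true_delay] if n >= true_delay else 0.0
       for n in range(len(reference))]

delay = estimate_delay(reference, mic, max_delay=20)
residual = nlms_echo_cancel(reference, mic, delay)
```

In this idealized setup the residual shrinks as the filter converges. Real devices also have to handle near-end speech overlapping the echo (double-talk) and background noise, which the claims address with a separate noise suppression step.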

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

G10L15/063 · Primary
Training · CPC title
G10L15/20
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
G10L2021/02082
the noise being echo, reverberation of the speech · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L15/16
using artificial neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 58928571

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10373609B2 cover?: The present application discloses a voice recognition method and apparatus. A specific implementation of the method includes: in response to detecting a microphone receiving voice signal containing interfering sound signal, performing high-pass filtering on the voice signal; cancelling the interfering sound signal in the voice signal subjected to high-pass filtering; performing automatic gain c…