Who is the assignee on this patent?

Tencent Tech Shenzhen Co Ltd, Tencent Tech Shenzhen Company Ltd

What technology area does this patent fall under?

Primary CPC classification G10L15/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for speech recognition, and electronic device

US11217229B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11217229-B2
Application number	US-202016921537-A
Country	US
Kind code	B2
Filing date	Jul 6, 2020
Priority date	Jun 28, 2018
Publication date	Jan 4, 2022
Grant date	Jan 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech recognition method, apparatus, a computer device and an electronic device for recognizing speech. The method includes receiving an audio signal obtained by a microphone array; performing a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; performing a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals.
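The pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the page does not say which beamforming algorithm is used, so the sketch assumes a simple delay-and-sum beamformer with integer sample delays, a far-field source, and a speed of sound of 343 m/s. The names `steering_delays`, `delay_and_sum`, and `recognize_multibeam`, and the `recognizer` callback that returns a `(text, score)` pair, are hypothetical. Picking the highest-scoring beam is only one of the combination rules the claims mention.

```python
import math
from concurrent.futures import ThreadPoolExecutor

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_delays(mic_xy, angle_deg, sample_rate):
    """Integer sample delays that align a far-field plane wave from angle_deg.

    mic_xy is a list of (x, y) microphone positions in metres.
    """
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))
    # Microphones further along the arrival direction hear the wave earlier,
    # so they need a larger delay to line up with the others.
    advance = [(x * ux + y * uy) / SPEED_OF_SOUND * sample_rate for x, y in mic_xy]
    earliest = min(advance)
    return [round(a - earliest) for a in advance]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, d in zip(channels, delays):
        for n in range(d, length):
            out[n] += samples[n - d]
    return [v / len(channels) for v in out]


def recognize_multibeam(channels, mic_xy, angles_deg, sample_rate, recognizer):
    """Form one beam per target direction, recognize each beam concurrently,
    and return (angle, text, score) for the best-scoring beam."""
    beams = [
        delay_and_sum(channels, steering_delays(mic_xy, angle, sample_rate))
        for angle in angles_deg
    ]
    # One recognizer call per beam, run in a thread pool (hypothetical recognizer).
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(recognizer, beams))
    scored = [(angle, text, score) for angle, (text, score) in zip(angles_deg, results)]
    return max(scored, key=lambda item: item[2])
```

With a real recognizer, `score` would be something like a decoder confidence; a beam pointed at the talker should then tend to score higher than beams pointed at noise.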

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition method, performed by an electronic device, the method comprising: receiving an audio signal obtained by a microphone array; performing a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals by a plurality of beamformers; performing a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals, wherein the performing the speech recognition on each of the plurality of beam signals further comprises: respectively inputting the plurality of beam signals into corresponding speech recognition models; and performing the speech recognition on the plurality of beam signals using the speech recognition models in parallel to obtain the plurality of speech recognition results of the plurality of beam signals, and wherein the plurality of beamformers are divided into one or more groups, each of the one or more groups corresponding to each of the speech recognition models.

2. The method according to claim 1, wherein the speech recognition result comprises a keyword detection result, and wherein the determining the speech recognition result of the audio signal further comprises determining a keyword detection result of the audio signal based on a plurality of keyword detection results corresponding to the plurality of beam signals.

3. The method according to claim 2, wherein the determining the keyword detection result of the audio signal further comprises, based on detecting a keyword in any one of the plurality of beam signals, determining that the keyword is detected in the audio signal.

4. The method according to claim 2, wherein the keyword detection result comprises a keyword detection probability, and wherein the determining the keyword detection result further comprises, based on at least one beam signal among the plurality of beam signals being greater than a preset value, determining that the keyword is detected in the audio signal.

5. The method according to claim 2, wherein the keyword detection result comprises a keyword detection probability, and wherein the determining the keyword detection result of the audio signal further comprises inputting a plurality of keyword detection probabilities of the plurality of beam signals into a classifier, and determining whether the audio signal includes the keyword based on an output of the classifier.

6. The method according to claim 1, wherein the determining the speech recognition result of the audio signal further comprises: obtaining at least one of linguistic scores or acoustic scores of the plurality of speech recognition results; and determining one of the plurality of speech recognition results having the highest linguistic score or the highest acoustic score as the speech recognition result of the audio signal.

7. The method according to claim 1, wherein the method further comprises performing a suppression processing on an echo of a second audio signal outputted by a speech recognition device.

8. A speech recognition apparatus, comprising: at least one memory storing computer program code; and at least one processor configured to access the at least one memory and operate as instructed by the computer program code, the computer program code comprising: audio signal receiving code configured to cause the at least one processor to receive an audio signal obtained by a microphone array by a plurality of beamformers; beamformer code configured to cause the at least one processor to respectively perform a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; speech recognition code configured to cause the at least one processor to perform a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and processing code configured to cause the at least one processor to determine a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals, wherein the speech recognition code is further configured to cause the at least one processor to: respectively input the plurality of beam signals into corresponding speech recognition models; and perform the speech recognition on the plurality of beam signals using the speech recognition models in parallel to obtain the plurality of speech recognition results of the plurality of beam signals, and wherein the plurality of beamformers are divided into one or more groups, each of the one or more groups corresponding to each of the speech recognition models.

9. The speech recognition apparatus according claim 8, wherein the processing code is further configured to cause the at least one processor to determine a keyword detection result of the audio signal based on a plurality of keyword detection results corresponding to the plurality of beam signals.

10. The speech recognition apparatus according to claim 9, wherein the processing code is further configured to cause the at least one processor to, based on detecting a keyword in any one of the plurality of beam signals, determine that the keyword is detected in the audio signal.

11. The speech recognition apparatus according to claim 9, wherein the processing code is further configured to cause the at least one processor to, based on at least one beam signal among the plurality of beam signals being greater than a preset value, determine that the keyword is detected in the audio signal.

12. The speech recognition apparatus according to claim 8, wherein the speech recognition code is further configured to cause the at least one processor to: obtain at least one of linguistic scores or acoustic scores of the plurality of speech recognition results; and determine one of the plurality of speech recognition results having the highest linguistic score or the highest acoustic score as the speech recognition result of the audio signal.

13. The speech recognition apparatus according to claim 8, wherein the microphone array comprises at least two annular structures, and wherein the apparatus further comprises a housing encapsulating the microphone array and the at least one processor.

14. The speech recognition apparatus according to claim 13, wherein at least three microphones are uniformly disposed on each annular structure.

15. The speech recognition apparatus according to claim 13, wherein the annular structures are concentric circles.

16. The speech recognition apparatus according to claim 15, wherein a first microphone and a second microphone on two adjacent annular structures are respectively disposed in the same directions.

17. The speech recognition apparatus according to claim 15, wherein a first microphone in a first annular structure and a second microphone in a second annular structure are disposed at an angle.

18. The method according to claim 5, wherein the classifier comprises at least one of a neural network, a support vector machine (SVM), or a decision tree.

19. A non-transitory computer-readable storage medium storing programming code, said programming code configured to cause at least one processor to: receive an audio signal obtained by a microphone array by a pluralit…
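Claims 3 to 6 describe several ways to combine per-beam results into one decision. The sketch below illustrates each rule in plain Python under stated assumptions. The function names are hypothetical, the preset value of 0.5 is an arbitrary placeholder, and the classifier is a hand-weighted linear decision function standing in for the trained neural network, SVM, or decision tree that claim 18 lists. Reading claim 4 as a threshold on each beam's keyword detection probability is an interpretation of the claim wording.

```python
def keyword_in_any_beam(beam_flags):
    """Claim 3 style rule: the keyword counts as detected in the audio
    signal if any single beam detected it."""
    return any(beam_flags)


def keyword_over_preset(beam_probs, preset=0.5):
    """Claim 4 style rule: detected if at least one beam's keyword
    detection probability exceeds a preset value (0.5 is an assumption)."""
    return any(p > preset for p in beam_probs)


def keyword_by_classifier(beam_probs, weights, bias):
    """Claim 5 style rule: feed all per-beam probabilities to a classifier.

    A linear decision function stands in for a trained model here; the
    weights and bias are made up for illustration, not learned.
    """
    activation = sum(w * p for w, p in zip(weights, beam_probs)) + bias
    return activation > 0


def best_scoring_result(results):
    """Claim 6 style rule: keep the recognition result with the highest
    score. Each item is a (text, score) pair; the score could be a
    linguistic or an acoustic score."""
    return max(results, key=lambda item: item[1])
```

One reason to prefer the classifier rule is that it can fire on moderate evidence spread across several beams: with probabilities `[0.4, 0.4, 0.4, 0.4]`, unit weights, and a bias of `-1.5`, the classifier rule reports a detection even though no single beam exceeds a preset of 0.5.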

Classifications

H04R5/027
Spatial or constructional arrangements of microphones, e.g. in dummy heads · CPC title
G10L21/0208
Noise filtering · CPC title
G10L15/08 (Primary)
Speech classification or search · CPC title
H04R3/005
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
H04R3/04
for correcting frequency response · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 67645021



Related patents

Utterance classifier

Concentric circular differential microphone arrays and associated beamforming

Speech recognizer with multi-directional decoding
