Method and apparatus for speech recognition, and electronic device

US11217229B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11217229-B2
Application number  US-202016921537-A
Country             US
Kind code           B2
Filing date         Jul 6, 2020
Priority date       Jun 28, 2018
Publication date    Jan 4, 2022
Grant date          Jan 4, 2022

Abstract

Official abstract text for this publication.

A speech recognition method, apparatus, a computer device and an electronic device for recognizing speech. The method includes receiving an audio signal obtained by a microphone array; performing a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; performing a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals.
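The first two steps of the abstract (receive a multi-channel signal, then beamform it toward several target directions) can be sketched as follows. This is only an illustration under stated assumptions: the text shown here does not name a beamforming algorithm, so a basic delay-and-sum beamformer with integer-sample delays is swapped in, and the array layout, sample rate, speed of sound, and every function name (`circular_array`, `steering_delays`, `delay_and_sum`, `beamform_all`, `simulate_capture`, `energy`) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
SAMPLE_RATE = 16000     # Hz (assumed)


def circular_array(num_mics, radius):
    """(x, y) positions of microphones evenly spaced on one circle (assumed layout)."""
    return [(radius * math.cos(2 * math.pi * m / num_mics),
             radius * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]


def steering_delays(mics, angle_deg):
    """Arrival delay in whole samples at each microphone for a far-field source.

    A microphone nearer the source hears the wavefront earlier. Delays are
    shifted so that the earliest microphone has delay zero.
    """
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    raw = [-(x * ux + y * uy) / SPEED_OF_SOUND * SAMPLE_RATE for x, y in mics]
    base = min(raw)
    return [int(round(d - base)) for d in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels."""
    n = len(channels[0]) - max(delays)
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]


def beamform_all(channels, mics, target_angles):
    """One beam signal per target direction, keyed by angle in degrees."""
    return {angle: delay_and_sum(channels, steering_delays(mics, angle))
            for angle in target_angles}


def simulate_capture(source, mics, angle_deg):
    """Toy capture: each microphone records the source delayed by its arrival time."""
    delays = steering_delays(mics, angle_deg)
    return [[source[t - d] if t >= d else 0.0 for t in range(len(source))]
            for d in delays]


def energy(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


# Demo: a 2 kHz tone arriving from 90 degrees at a 4-microphone ring.
mics = circular_array(num_mics=4, radius=0.05)
source = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(400)]
channels = simulate_capture(source, mics, angle_deg=90)
beams = beamform_all(channels, mics, target_angles=[0, 90, 180, 270])
best_angle = max(beams, key=lambda angle: energy(beams[angle]))
print("beam with the most energy:", best_angle)
```

In the claimed method each beam signal would then be passed to a speech recognition model rather than ranked by energy; the energy comparison here only shows that the beam steered at the talker preserves the signal best.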

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition method, performed by an electronic device, the method comprising: receiving an audio signal obtained by a microphone array; performing a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals by a plurality of beamformers; performing a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals, wherein the performing the speech recognition on each of the plurality of beam signals further comprises: respectively inputting the plurality of beam signals into corresponding speech recognition models; and performing the speech recognition on the plurality of beam signals using the speech recognition models in parallel to obtain the plurality of speech recognition results of the plurality of beam signals, and wherein the plurality of beamformers are divided into one or more groups, each of the one or more groups corresponding to each of the speech recognition models.

2. The method according to claim 1, wherein the speech recognition result comprises a keyword detection result, and wherein the determining the speech recognition result of the audio signal further comprises determining a keyword detection result of the audio signal based on a plurality of keyword detection results corresponding to the plurality of beam signals.

3. The method according to claim 2, wherein the determining the keyword detection result of the audio signal further comprises, based on detecting a keyword in any one of the plurality of beam signals, determining that the keyword is detected in the audio signal.

4. The method according to claim 2, wherein the keyword detection result comprises a keyword detection probability, and wherein the determining the keyword detection result further comprises, based on at least one beam signal among the plurality of beam signals being greater than a preset value, determining that the keyword is detected in the audio signal.

5. The method according to claim 2, wherein the keyword detection result comprises a keyword detection probability, and wherein the determining the keyword detection result of the audio signal further comprises inputting a plurality of keyword detection probabilities of the plurality of beam signals into a classifier, and determining whether the audio signal includes the keyword based on an output of the classifier.

6. The method according to claim 1, wherein the determining the speech recognition result of the audio signal further comprises: obtaining at least one of linguistic scores or acoustic scores of the plurality of speech recognition results; and determining one of the plurality of speech recognition results having the highest linguistic score or the highest acoustic score as the speech recognition result of the audio signal.

7. The method according to claim 1, wherein the method further comprises performing a suppression processing on an echo of a second audio signal outputted by a speech recognition device.

8. A speech recognition apparatus, comprising: at least one memory storing computer program code; and at least one processor configured to access the at least one memory and operate as instructed by the computer program code, the computer program code comprising: audio signal receiving code configured to cause the at least one processor to receive an audio signal obtained by a microphone array by a plurality of beamformers; beamformer code configured to cause the at least one processor to respectively perform a beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; speech recognition code configured to cause the at least one processor to perform a speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and processing code configured to cause the at least one processor to determine a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals, wherein the speech recognition code is further configured to cause the at least one processor to: respectively input the plurality of beam signals into corresponding speech recognition models; and perform the speech recognition on the plurality of beam signals using the speech recognition models in parallel to obtain the plurality of speech recognition results of the plurality of beam signals, and wherein the plurality of beamformers are divided into one or more groups, each of the one or more groups corresponding to each of the speech recognition models.

9. The speech recognition apparatus according claim 8, wherein the processing code is further configured to cause the at least one processor to determine a keyword detection result of the audio signal based on a plurality of keyword detection results corresponding to the plurality of beam signals.

10. The speech recognition apparatus according to claim 9, wherein the processing code is further configured to cause the at least one processor to, based on detecting a keyword in any one of the plurality of beam signals, determine that the keyword is detected in the audio signal.

11. The speech recognition apparatus according to claim 9, wherein the processing code is further configured to cause the at least one processor to, based on at least one beam signal among the plurality of beam signals being greater than a preset value, determine that the keyword is detected in the audio signal.

12. The speech recognition apparatus according to claim 8, wherein the speech recognition code is further configured to cause the at least one processor to: obtain at least one of linguistic scores or acoustic scores of the plurality of speech recognition results; and determine one of the plurality of speech recognition results having the highest linguistic score or the highest acoustic score as the speech recognition result of the audio signal.

13. The speech recognition apparatus according to claim 8, wherein the microphone array comprises at least two annular structures, and wherein the apparatus further comprises a housing encapsulating the microphone array and the at least one processor.

14. The speech recognition apparatus according to claim 13, wherein at least three microphones are uniformly disposed on each annular structure.

15. The speech recognition apparatus according to claim 13, wherein the annular structures are concentric circles.

16. The speech recognition apparatus according to claim 15, wherein a first microphone and a second microphone on two adjacent annular structures are respectively disposed in the same directions.

17. The speech recognition apparatus according to claim 15, wherein a first microphone in a first annular structure and a second microphone in a second annular structure are disposed at an angle.

18. The method according to claim 5, wherein the classifier comprises at least one of a neural network, a support vector machine (SVM), or a decision tree.

19. A non-transitory computer-readable storage medium storing programming code, said programming code configured to cause at least one processor to: receive an audio signal obtained by a microphone array by a pluralit
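As a rough illustration of how the per-beam results in claims 1, 3, 4 and 6 might be combined, the sketch below runs one stub recognizer per beam in parallel and then fuses the outputs. It is a minimal sketch under assumptions, not the patented implementation: `BeamResult`, `recognize_in_parallel`, `keyword_in_any_beam`, `best_result`, the stub models, the score values, and the default preset value of 0.5 are all hypothetical, and claim 4 is read here as comparing each beam's keyword detection probability with the preset value. The classifier variant of claim 5 is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class BeamResult:
    """Hypothetical container for what one recognizer reports for one beam."""
    text: str
    acoustic_score: float
    linguistic_score: float
    keyword_probability: float


def recognize_in_parallel(beams, models):
    """Feed each beam signal to its own model concurrently (claim 1).

    Results come back in the same order as the input beams.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(beams))) as pool:
        return list(pool.map(lambda pair: pair[0](pair[1]), zip(models, beams)))


def keyword_in_any_beam(results, preset_value=0.5):
    """Claims 3 and 4 read together: one beam above the preset value is enough."""
    return any(r.keyword_probability > preset_value for r in results)


def best_result(results, score="acoustic"):
    """Claim 6: keep the result with the highest acoustic or linguistic score."""
    if score == "acoustic":
        return max(results, key=lambda r: r.acoustic_score)
    if score == "linguistic":
        return max(results, key=lambda r: r.linguistic_score)
    raise ValueError("score must be 'acoustic' or 'linguistic'")


def make_stub_model(result):
    """Stand-in for a real speech recognition model: always returns `result`."""
    return lambda beam_signal: result


# Demo with made-up scores for three beams.
stub_results = [
    BeamResult("turn on the light", -120.0, -8.0, 0.91),
    BeamResult("turn on the night", -150.0, -11.5, 0.40),
    BeamResult("", -300.0, -30.0, 0.05),
]
models = [make_stub_model(r) for r in stub_results]
beams = [[0.0] * 160 for _ in stub_results]  # placeholder beam signals

results = recognize_in_parallel(beams, models)
final = best_result(results, score="acoustic")
print("selected text:", final.text)
print("keyword detected:", keyword_in_any_beam(results))
```

Taking the maximum over beams means the device does not need to know where the talker is: whichever beam happens to point at the speaker tends to produce the strongest score, and that result is the one kept.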

Assignees

Tencent Tech Shenzhen Co Ltd, Tencent Tech Shenzhen Company Ltd

Classifications

  • Spatial or constructional arrangements of microphones, e.g. in dummy heads · CPC title

  • Noise filtering · CPC title

  • G10L15/08 (Primary)

    Speech classification or search · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • for correcting frequency response · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd, Tencent Tech Shenzhen Company Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).