Environment adaptive speech recognition method and device

US9870771B2 · US · B2

Patent metadata
Publication number: US-9870771-B2
Application number: US-201615149599-A
Country: US
Kind code: B2
Filing date: May 9, 2016
Priority date: Nov 14, 2013
Publication date: Jan 16, 2018
Grant date: Jan 16, 2018

Abstract

Official abstract text for this publication.

A speech recognition method, a speech recognition device, and an electronic device. In this method, first determining is performed by using a sample environment corresponding to a detection speech and a previous environment type, so as to output a corresponding speech correction instruction to a speech engine; then, a to-be-recognized speech is input to the speech engine and a noise type detection engine at the same time, and the speech engine corrects the to-be-recognized speech by using the speech correction instruction, so that quality of an original speech is not impaired by noise processing, and a corresponding initial recognition result is output; the noise type detection engine determines a current environment type by using the to-be-recognized speech and a speech training sample under a different environment; finally, confidence of the initial recognition result is adjusted by using the current environment type.
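The environment-detection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the three features named in claim 5 (noise level, speech level, and SNR) and selects the sample environment whose stored training values differ least from the measured ones. The names `TrainingProfile`, `measure`, and `detect_environment`, the per-frame energy input with speech/noise flags, and the summed absolute difference used as the distance are illustrative assumptions; the text on this page does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingProfile:
    """Stored levels for one sample environment (hypothetical layout)."""
    environment: str
    noise_level_db: float
    speech_level_db: float
    snr_db: float


def mean_level_db(energies):
    """Mean frame energy in dB; the floor avoids taking the log of zero."""
    if not energies:
        raise ValueError("need at least one frame")
    mean_energy = sum(energies) / len(energies)
    return 10.0 * math.log10(max(mean_energy, 1e-12))


def measure(frame_energies, speech_flags):
    """Return (noise level, speech level, SNR) in dB from flagged frames."""
    speech = [e for e, s in zip(frame_energies, speech_flags) if s]
    noise = [e for e, s in zip(frame_energies, speech_flags) if not s]
    speech_db = mean_level_db(speech)
    noise_db = mean_level_db(noise)
    return noise_db, speech_db, speech_db - noise_db


def detect_environment(frame_energies, speech_flags, profiles):
    """Pick the environment whose training profile differs least.

    Assumption: the three per-feature differences are simply summed.
    """
    noise_db, speech_db, snr_db = measure(frame_energies, speech_flags)

    def difference(profile):
        return (abs(profile.noise_level_db - noise_db)
                + abs(profile.speech_level_db - speech_db)
                + abs(profile.snr_db - snr_db))

    return min(profiles, key=difference).environment
```

A real system would derive the speech/noise flags from a voice activity detector and the profiles from recorded training samples; both are outside the scope of this sketch.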

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition method, comprising: receiving, by a speech recognition device, an input speech, wherein the speech recognition device comprises a noise type detection engine, a storage area and a speech engine; dividing the input speech, by the speech recognition device, into detection speech at a beginning of the input speech and a to-be-recognized speech following the detection speech, wherein a length of speech data comprised in the detection speech is less than a length of speech data comprised in the to-be-recognized speech; selecting, by the noise type detection engine based on comparing the detection speech with a plurality of speech training samples under a plurality of different sample environments, a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the detection speech, as a detection environment type, wherein the plurality of sample environments comprises a quiet environment and a noise environment; detecting, by the speech recognition device, a storage area; outputting, by the speech recognition device, when a recognizable previous environment type exists in the storage area, a speech correction instruction according to a result of comparison between the detection environment type and the previous environment type, wherein the previous environment type comprises a quiet environment or a noise environment; controlling, by the speech engine according to the speech correction instruction, correction on the to-be-recognized speech, and outputting an initial recognition result; separately comparing, by the noise type detection engine, the received to-be-recognized speech with the plurality of the speech training samples, and selecting a sample environment corresponding to a speech training sample among the plurality of speech training samples that has a minimum difference with the to-be-recognized speech, as a current environment type; storing, by the speech recognition device, the current environment type to the storage area, and abandoning the current environment type after a preset duration; and outputting, by the speech recognition device, a final recognition result after a confidence value of the initial recognition result is adjusted according to the current environment type.

2. The method according to claim 1, wherein when the previous environment type is not recognized in the storage area, the method further comprises: acquiring, by the speech recognition device, a pre-stored initial environment type, wherein the initial environment type comprises a quiet environment or a noise environment; and determining, by the speech recognition device, according to the initial environment type and the detection environment type, and outputting the speech correction instruction.

3. The method according to claim 2, wherein the determining, by the speech recognition device, according to the initial environment type and the detection environment type, and outputting the speech correction instruction comprises: determining, by the speech recognition device, whether the initial environment type is the same as the detection environment type; if the initial environment type is the same as the detection environment type, outputting by the speech recognition device, when both the initial environment type and the detection environment type are noise environments, a speech correction instruction used for speech quality enhancement, and outputting by the speech recognition device, when both the initial environment type and the detection environment type are quiet environments, a speech correction instruction used for disabling noise reduction processing; and if the initial environment type is not the same as the detection environment type, outputting by the speech recognition device, when the initial environment type is a noise environment, a speech correction instruction used for speech quality enhancement, and outputting by the speech recognition device, when the initial environment type is a quiet environment, a speech correction instruction used for disabling noise reduction processing.

4. The method according to claim 1, wherein the outputting by the speech recognition device, when a recognizable previous environment type exists in the storage area, a speech correction instruction according to a result of comparison between the detection environment type and the previous environment type comprises: acquiring the previous environment type and effective impact duration T of the previous environment type on the input speech; calculating a time difference t between time for inputting the detection speech and time for previously inputting a speech, and an impact value w(t) of the previous environment type on the detection environment type, wherein w(t) is a truncation function that decays with time t, a value of w(t) is obtained by training sample data of a speech training sample under a different sample environment, and values of t and T are positive integers; determining a balance relationship between the previous environment type and the detection environment type; outputting, when both the previous environment type and the detection environment type are noise environments, a speech correction instruction used for speech quality enhancement; outputting, when both the previous environment type and the detection environment type are quiet environments, a speech correction instruction used for disabling noise reduction processing; outputting, when the previous environment type is a noise environment, the detection environment type is a quiet environment, and w(t)>=0.5, a speech correction instruction used for speech quality enhancement; outputting, when the previous environment type is a noise environment, the detection environment type is a quiet environment, and w(t)<0.5, a speech correction instruction used for disabling noise reduction processing; and when the w(t)>T, outputting, when the detection environment type is a quiet environment, a speech correction instruction used for disabling noise reduction processing, and outputting, when the detection environment is a noise environment, a speech correction instruction used for speech quality enhancement.

5. The method according to claim 1, wherein the separately comparing, by the noise type detection engine, the received to-be-recognized speech with a speech training sample under a different sample environment, and selecting a sample environment corresponding to a speech training sample that has a minimum difference with the to-be-recognized speech, as a current environment type comprises: analyzing, by the noise type detection engine, a speech frame part and a noise frame part of the received to-be-recognized speech to acquire a noise level, a speech level, and a signal-to-noise ratio (SNR) of the to-be-recognized speech; comparing the noise level, the speech level, and the SNR of the to-be-recognized speech with a noise training level, a speech training level, and a training SNR of a speech training sample under a different sample environment, respectively; and determining that a sample environment corresponding to a noise training level that has a minimum difference with the noise level, a speech training level that has a minimum difference with the speech level, and a training SNR that has a minimum difference with the SNR is the current environment type.

6. The method according to claim 1, wherein the preset duration when the current environment type is a quiet environment is longer than the preset duration when the current environment type is a noise environment.

7. The method according to claim 1, wherein the noise environment comprises: a vehicle-mounted low noise environment,
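The branching in claims 3 and 4 can be sketched as a small Python decision function. This is a reading of the quoted claim text under stated assumptions, not the patented implementation. The names `Env`, `Correction`, `impact`, and `select_correction` are hypothetical. The linear decay in `impact` is a placeholder, since the claim says only that w(t) is a truncation function that decays with t and is obtained from training data. The condition printed as "w(t)>T" is read here as the elapsed time t exceeding T. The case of a quiet previous environment with a noisy detection environment does not appear in the quoted text; the sketch handles it by symmetry (the previous type prevails while w(t) >= 0.5), which is an assumption.

```python
from enum import Enum


class Env(Enum):
    QUIET = "quiet"
    NOISE = "noise"


class Correction(Enum):
    ENHANCE_SPEECH_QUALITY = "speech quality enhancement"
    DISABLE_NOISE_REDUCTION = "disable noise reduction"


def instruction_for(env):
    """Noise calls for enhancement; quiet calls for disabling noise reduction."""
    if env is Env.NOISE:
        return Correction.ENHANCE_SPEECH_QUALITY
    return Correction.DISABLE_NOISE_REDUCTION


def impact(t, T):
    """Placeholder w(t): linear decay, truncated to 0 once t exceeds T.

    Hypothetical; the claim states that w(t) is learned from training data.
    """
    if t > T:
        return 0.0
    return 1.0 - t / T


def select_correction(detection, previous, initial, t, T, w=impact):
    """Choose a speech correction instruction.

    detection: environment type of the detection speech
    previous:  stored previous environment type, or None if not recognizable
    initial:   pre-stored initial environment type (claim 2)
    t, T:      time since the previous input and effective impact duration
    """
    if previous is None:
        # Claim 3 as quoted: every branch ends up following the initial type.
        return instruction_for(initial)
    if t > T:
        # Previous environment no longer has any effect: follow detection.
        return instruction_for(detection)
    # Within T: the previous type prevails while its impact is at least 0.5.
    # For quiet -> noise this is an assumption made by symmetry.
    dominant = previous if w(t, T) >= 0.5 else detection
    return instruction_for(dominant)
```

For example, with T = 10, a noisy previous environment, and a quiet detection environment, the sketch keeps speech quality enhancement at t = 2 (w = 0.8) and switches to disabling noise reduction at t = 8 (w = 0.2).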

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • characterised by the method used for estimating noise · CPC title

  • G10L15/20 (Primary)

    Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Processing in the time domain · CPC title

  • Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title

  • Circuits for transducers (arrangements for producing a reverberation or echo sound G10K15/08; amplifiers H03F) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9870771B2 cover?
A speech recognition method, a speech recognition device, and an electronic device. In this method, first determining is performed by using a sample environment corresponding to a detection speech and a previous environment type, so as to output a corresponding speech correction instruction to a speech engine; then, a to-be-recognized speech is input to the speech engine and a noise type detect…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 16, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).