Artificial intelligence-based wakeup word detection method and apparatus, device, and medium
US-2022013111-A1 · Jan 13, 2022 · US
US11848008B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11848008-B2 |
| Application number | US-202117483617-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 23, 2021 |
| Priority date | Nov 14, 2019 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Abstract
This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculating a confidence according to the target probability vector, and determining that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.
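The first and fourth steps of the abstract (building the syllable combination sequence from the pronunciation dictionary, then picking the target probability vector out of the DNN posteriors) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary contents, the syllable inventory, and the function names are hypothetical; syllable identifiers are assumed to equal the index of the matching DNN output unit; and feature extraction and the DNN itself are not modeled, so per-frame posterior vectors are taken as given input.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation dictionary (text element -> candidate syllables).
# "好" is polyphonic in Mandarin (hao3 / hao4), so it maps to two syllables.
PRON_DICT: Dict[str, List[str]] = {
    "你": ["ni3"],
    "好": ["hao3", "hao4"],
}

# Hypothetical syllable inventory: list index = syllable identifier, assumed to
# equal the index of the matching DNN output unit (one unit per syllable).
SYLLABLES: List[str] = ["sil", "ni3", "hao3", "hao4", "ma1"]
SYL_ID: Dict[str, int] = {s: i for i, s in enumerate(SYLLABLES)}


def build_syllable_sequence(wakeup_text: str) -> List[Tuple[int, int]]:
    """Return ordered (syllable_id, char_index) pairs for the wakeup word.

    The pairs double as the syllable-to-character mapping: a polyphonic
    character contributes several pairs sharing the same char_index.
    """
    sequence = []
    for index, char in enumerate(wakeup_text):
        if char not in PRON_DICT:
            raise KeyError(f"no pronunciation for {char!r}")
        for syllable in PRON_DICT[char]:
            sequence.append((SYL_ID[syllable], index))
    return sequence


def select_target_columns(posteriors: List[List[float]],
                          sequence: List[Tuple[int, int]]) -> List[List[float]]:
    """From each frame's posterior vector, keep only the wakeup-word syllables."""
    return [[frame[sid] for sid, _ in sequence] for frame in posteriors]


if __name__ == "__main__":
    seq = build_syllable_sequence("你好")
    print(seq)  # [(1, 0), (2, 1), (3, 1)]
    frames = [[0.7, 0.1, 0.1, 0.05, 0.05],
              [0.1, 0.6, 0.2, 0.05, 0.05]]
    print(select_target_columns(frames, seq))  # [[0.1, 0.1, 0.05], [0.6, 0.2, 0.05]]
```

Keeping the character position alongside each syllable identifier (rather than the character itself) keeps a wakeup word that repeats the same character distinguishable from one that contains a polyphonic character.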
What is claimed is:

1. An artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device, the method comprising: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

2. The AI-based wakeup word detection method according to claim 1, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

3. The AI-based wakeup word detection method according to claim 2, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

4. The AI-based wakeup word detection method according to claim 2, wherein the performing probability processing on the posterior probability values comprised in the target probability vector comprises: setting the posterior probability values to 0 when the posterior probability values are lower than prior probability values corresponding to the posterior probability values; otherwise, skipping processing the posterior probability values; and dividing the posterior probability values after the processing by the corresponding prior probability values to obtain a processed target probability vector.

5. The AI-based wakeup word detection method according to claim 2, wherein the calculating the confidence according to the target probability vector after the probability processing comprises: smoothing the target probability vector after the probability processing; and calculating the confidence according to the target probability vector after the smoothing.

6. The AI-based wakeup word detection method according to claim 1, wherein the constructing at least one syllable combination sequence comprises: obtaining the self-defined wakeup word text inputted by the user; converting all characters comprised in the self-defined wakeup word text into the syllable identifiers by looking up the pronunciation dictionary; and constructing a mapping relationship between the syllable identifiers and the characters comprised in the self-defined wakeup word text, the mapping relationship being used as the syllable combination sequence.

7. The AI-based wakeup word detection method according to claim 1, further comprising: obtaining a speech data set to be trained; annotating all speech data in the speech data set according to the syllables comprised in the pronunciation dictionary, to obtain a training data set; and training a DNN by using the training data set to obtain the DNN model, input of the DNN model being the speech features of the speech frames, and output of the syllable output units being the posterior probability values of the speech features corresponding to the syllable identifiers relative to the syllable output units.

8. A computing device, comprising a memory, a processor, and a plurality of computer programs stored in the memory that, when executed by the processor, cause the computing device to perform a plurality of operations including: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

9. The computing device according to claim 8, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

10. The computing device according to claim 9, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

11. The computing device according to claim 9, wherein the performing probability processing on the posterior probability values comprised in the target probabilit…
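The confidence path of claims 2 to 5 (prior-based probability processing, summation over a polyphonic character's syllables, smoothing, then the threshold test of claim 1) can be sketched as below. This is a hedged illustration, not the patentee's code: all function names are hypothetical, and the claim text shown here does not specify the smoothing form or the confidence formula, so the causal moving average and the geometric mean of per-character peak values are assumptions borrowed from common DNN keyword-spotting practice. Each row of the input is one frame of the target probability vector; `owners[k]` (hypothetical) gives the position in the wakeup text of the character that column `k` belongs to.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # frames x columns


def prior_normalize(target: Matrix, priors: Sequence[float]) -> Matrix:
    """Claim 4: set a posterior below its prior to 0, then divide by the prior.

    priors[k] is the (positive) prior probability of column k's syllable.
    """
    return [[0.0 if p < q else p / q for p, q in zip(frame, priors)]
            for frame in target]


def has_polyphone(owners: Sequence[int]) -> bool:
    """Claim 2: a character that owns more than one column is polyphonic."""
    return len(set(owners)) < len(owners)


def merge_polyphones(target: Matrix, owners: Sequence[int]) -> Matrix:
    """Claim 3: sum the columns that belong to the same character."""
    n_chars = max(owners) + 1
    merged = []
    for frame in target:
        sums = [0.0] * n_chars
        for value, k in zip(frame, owners):
            sums[k] += value
        merged.append(sums)
    return merged


def smooth(target: Matrix, window: int) -> Matrix:
    """Claim 5, assumed form: causal moving average over the last `window` frames."""
    out = []
    for j in range(len(target)):
        span = target[max(0, j - window + 1):j + 1]
        out.append([sum(col) / len(span) for col in zip(*span)])
    return out


def confidence(target: Matrix) -> float:
    """Assumed score: geometric mean over columns of each column's peak value."""
    n = len(target[0])  # assumes at least one frame
    peaks = [max(frame[i] for frame in target) for i in range(n)]
    return math.prod(peaks) ** (1.0 / n)


def detect(target: Matrix, owners: Sequence[int], priors: Sequence[float],
           window: int = 2, threshold: float = 1.0) -> bool:
    """Claim 1's final step: wakeup word present when confidence >= threshold."""
    processed = prior_normalize(target, priors)
    if has_polyphone(owners):
        processed = merge_polyphones(processed, owners)
    return confidence(smooth(processed, window)) >= threshold
```

For example, with hypothetical columns ni3, hao3, hao4 (so `owners=[0, 1, 1]`), priors `[0.25, 0.25, 0.125]`, and frames `[[0.125, 0.125, 0.0625], [0.5, 0.25, 0.0625], [0.25, 0.5, 0.25]]`, the smoothed per-character peaks are 1.5 and 2.5, so the confidence is sqrt(3.75) ≈ 1.94 and `detect` returns True at a threshold of 1.0. Because claim 4 divides by the prior, the processed values behave like likelihood ratios and can exceed 1, so the threshold is not confined to [0, 1] in this sketch.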
CPC classification titles:

- using artificial neural networks
- Feature extraction for speech recognition; Selection of recognition unit
- Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- Procedures used during a speech recognition process, e.g. man-machine dialogue
- Syllables being the recognition units