Artificial intelligence-based wakeup word detection method and apparatus, device, and medium

US11848008B2 · US · B2

Patent metadata
Publication number: US-11848008-B2
Application number: US-202117483617-A
Country: US
Kind code: B2
Filing date: Sep 23, 2021
Priority date: Nov 14, 2019
Publication date: Dec 19, 2023
Grant date: Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculating a confidence according to the target probability vector, and determining that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.
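The abstract's pipeline can be illustrated with a small Python sketch (Python 3.8+ for `math.prod`). It is an assumption-laden toy, not the patented implementation. The five-character pronunciation dictionary, the pinyin syllable inventory, the function names, and the default threshold of 0.5 are all hypothetical. A hand-written list of per-frame posterior vectors stands in for the DNN output, and polyphonic characters are left out. The abstract does not say how the confidence is computed, so the sketch assumes a common keyword-spotting heuristic: the geometric mean of each wakeup syllable's peak posterior across the frames.

```python
from math import prod

# Hypothetical pronunciation dictionary: text element -> tone-marked pinyin syllable.
# Polyphonic characters (several pronunciations) are deliberately left out here.
PRONUNCIATION_DICT = {"你": "ni3", "好": "hao3", "小": "xiao3", "腾": "teng2", "讯": "xun4"}

# Hypothetical syllable inventory. A syllable's index is its identifier and also the
# index of the matching DNN output unit (one output unit per dictionary syllable).
SYLLABLES = sorted(set(PRONUNCIATION_DICT.values()))
SYLLABLE_ID = {syllable: i for i, syllable in enumerate(SYLLABLES)}


def build_syllable_sequence(wakeup_text):
    """Ordered syllable identifiers for a user-defined wakeup word."""
    return [SYLLABLE_ID[PRONUNCIATION_DICT[ch]] for ch in wakeup_text]


def target_probability_vectors(posteriors, sequence):
    """Per frame, keep only the posteriors of the wakeup word's syllables."""
    return [[frame[sid] for sid in sequence] for frame in posteriors]


def confidence(targets):
    """Assumed rule: geometric mean of each syllable's peak posterior over all frames."""
    peaks = [max(column) for column in zip(*targets)]
    return prod(peaks) ** (1.0 / len(peaks))


def detect(posteriors, wakeup_text, threshold=0.5):
    """True when the frames are judged to contain the wakeup word."""
    sequence = build_syllable_sequence(wakeup_text)
    targets = target_probability_vectors(posteriors, sequence)
    return confidence(targets) >= threshold
```

With three stand-in frames whose posteriors peak on ni3 (0.7) and then hao3 (0.8), `detect(frames, "你好")` scores about 0.75 and fires, while the same frames score 0.2 for "小腾" and are rejected. Because each syllable's peak is taken independently, this toy rule ignores syllable order; a practical detector would likely add an order constraint or a sliding window.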

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device, the method comprising: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

2. The AI-based wakeup word detection method according to claim 1, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

3. The AI-based wakeup word detection method according to claim 2, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

4. The AI-based wakeup word detection method according to claim 2, wherein the performing probability processing on the posterior probability values comprised in the target probability vector comprises: setting the posterior probability values to 0 when the posterior probability values are lower than prior probability values corresponding to the posterior probability values; otherwise, skipping processing the posterior probability values; and dividing the posterior probability values after the processing by the corresponding prior probability values to obtain a processed target probability vector.

5. The AI-based wakeup word detection method according to claim 2, wherein the calculating the confidence according to the target probability vector after the probability processing comprises: smoothing the target probability vector after the probability processing; and calculating the confidence according to the target probability vector after the smoothing.

6. The AI-based wakeup word detection method according to claim 1, wherein the constructing at least one syllable combination sequence comprises: obtaining the self-defined wakeup word text inputted by the user; converting all characters comprised in the self-defined wakeup word text into the syllable identifiers by looking up the pronunciation dictionary; and constructing a mapping relationship between the syllable identifiers and the characters comprised in the self-defined wakeup word text, the mapping relationship being used as the syllable combination sequence.

7. The AI-based wakeup word detection method according to claim 1, further comprising: obtaining a speech data set to be trained; annotating all speech data in the speech data set according to the syllables comprised in the pronunciation dictionary, to obtain a training data set; and training a DNN by using the training data set to obtain the DNN model, input of the DNN model being the speech features of the speech frames, and output of the syllable output units being the posterior probability values of the speech features corresponding to the syllable identifiers relative to the syllable output units.

8. A computing device, comprising a memory, a processor, and a plurality of computer programs stored in the memory that, when executed by the processor, cause the computing device to perform a plurality of operations including: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

9. The computing device according to claim 8, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

10. The computing device according to claim 9, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

11. The computing device according to claim 9, wherein the performing probability processing on the posterior probability values comprised in the target probabilit…
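Claims 2 to 5 refine how the confidence is computed. A minimal Python sketch of that chain follows (Python 3.8+), operating on target columns that have already been selected from the DNN output. It applies prior normalization (claim 4), summation over a polyphonic character's columns (claim 3), and smoothing (claim 5) before scoring. Several pieces are assumptions rather than claim language: the `column_positions` encoding of the character-to-syllable mapping, the trailing moving average (the claims do not name a smoothing method), the geometric-mean confidence, and all function names are hypothetical.

```python
from math import prod


def normalize_by_prior(targets, priors):
    """Claim 4: zero each posterior below its prior, then divide by that prior."""
    return [
        [0.0 if p < prior else p / prior for p, prior in zip(frame, priors)]
        for frame in targets
    ]


def merge_polyphones(targets, column_positions):
    """Claim 3: sum the columns that belong to the same wakeup-word character.

    column_positions[j] is the index of the character that target column j
    belongs to (hypothetical encoding); a polyphonic character owns several columns.
    """
    n_chars = max(column_positions) + 1
    merged = []
    for frame in targets:
        row = [0.0] * n_chars
        for value, pos in zip(frame, column_positions):
            row[pos] += value
        merged.append(row)
    return merged


def smooth(vectors, window=3):
    """Claim 5, assumed method: trailing moving average over `window` frames."""
    smoothed = []
    for t in range(len(vectors)):
        span = vectors[max(0, t - window + 1) : t + 1]
        smoothed.append([sum(column) / len(span) for column in zip(*span)])
    return smoothed


def score(targets, priors, column_positions, window=3):
    """Claims 2 to 5 in sequence, ending in an assumed geometric-mean confidence."""
    processed = normalize_by_prior(targets, priors)
    # Claim 2: a polyphone exists if two columns map to the same character position.
    if len(set(column_positions)) < len(column_positions):
        processed = merge_polyphones(processed, column_positions)
    smoothed = smooth(processed, window)
    peaks = [max(column) for column in zip(*smoothed)]
    return prod(peaks) ** (1.0 / len(peaks))


def contains_wakeup_word(targets, priors, column_positions, threshold, window=3):
    """Claim 1 decision: confidence greater than or equal to the threshold."""
    return score(targets, priors, column_positions, window) >= threshold
```

For a two-character wakeup word whose second character has two pronunciations, `column_positions = [0, 1, 1]` makes `merge_polyphones` add that character's two columns after prior division, so either reading counts toward detection. Keying on character position rather than the character itself keeps a repeated character (for example a doubled "小") from being merged by mistake. Note that after dividing by priors the values behave like likelihood ratios and can exceed 1, so the threshold lives on a different scale than a raw posterior.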

Assignees

Inventors

Classifications

  • G10L15/16 (Primary)

    using artificial neural networks · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Syllables being the recognition units · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11848008B2 cover?
This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 19, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).