Artificial intelligence-based wakeup word detection method and apparatus, device, and medium

US11848008B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11848008-B2
Application number	US-202117483617-A
Country	US
Kind code	B2
Filing date	Sep 23, 2021
Priority date	Nov 14, 2019
Publication date	Dec 19, 2023
Grant date	Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determine a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculate a confidence according to the target probability vector, and determine that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.
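The steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the toy dictionary `PRONUNCIATIONS`, the syllable inventory `SYLLABLES`, the helper names, and the hand-written posterior rows standing in for DNN output are all hypothetical. The abstract does not say how the confidence is derived from the target probability vector; the geometric mean of each syllable's peak posterior used here is an assumption borrowed from common keyword-spotting practice, and every character in the toy wakeup word has a single pronunciation.

```python
import math

# Hypothetical syllable inventory: one DNN output unit per syllable.
SYLLABLES = ["ni", "hao", "xiao", "ting", "sil"]
SYLLABLE_ID = {s: i for i, s in enumerate(SYLLABLES)}

# Hypothetical pronunciation dictionary: text element -> possible syllables.
PRONUNCIATIONS = {"你": ["ni"], "好": ["hao"], "小": ["xiao"], "听": ["ting"]}


def build_syllable_sequence(wake_text):
    """Map each character of the wakeup word to its syllable identifiers."""
    return [(ch, [SYLLABLE_ID[s] for s in PRONUNCIATIONS[ch]]) for ch in wake_text]


def target_probabilities(posteriors, sequence):
    """For each frame, keep only the posteriors of the wakeup word's syllables."""
    ids = [sid for _, sids in sequence for sid in sids]
    return [[frame[i] for i in ids] for frame in posteriors]


def confidence(target):
    """Assumed scoring rule: geometric mean of each syllable's peak posterior."""
    peaks = [max(col) for col in zip(*target)]
    return math.prod(peaks) ** (1.0 / len(peaks))


def detect(posteriors, wake_text, threshold):
    """Return (detected, score); detected is True when score >= threshold."""
    sequence = build_syllable_sequence(wake_text)
    score = confidence(target_probabilities(posteriors, sequence))
    return score >= threshold, score


# Two hand-written frames standing in for DNN posterior vectors.
frames = [
    [0.90, 0.05, 0.02, 0.02, 0.01],
    [0.10, 0.80, 0.05, 0.03, 0.02],
]
detected, score = detect(frames, "你好", threshold=0.5)
```

The comparison uses `>=` to match the abstract's "greater than or equal to a threshold" wording; the threshold value itself is arbitrary here.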

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device, the method comprising: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

2. The AI-based wakeup word detection method according to claim 1, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

3. The AI-based wakeup word detection method according to claim 2, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

4. The AI-based wakeup word detection method according to claim 2, wherein the performing probability processing on the posterior probability values comprised in the target probability vector comprises: setting the posterior probability values to 0 when the posterior probability values are lower than prior probability values corresponding to the posterior probability values; otherwise, skipping processing the posterior probability values; and dividing the posterior probability values after the processing by the corresponding prior probability values to obtain a processed target probability vector.

5. The AI-based wakeup word detection method according to claim 2, wherein the calculating the confidence according to the target probability vector after the probability processing comprises: smoothing the target probability vector after the probability processing; and calculating the confidence according to the target probability vector after the smoothing.

6. The AI-based wakeup word detection method according to claim 1, wherein the constructing at least one syllable combination sequence comprises: obtaining the self-defined wakeup word text inputted by the user; converting all characters comprised in the self-defined wakeup word text into the syllable identifiers by looking up the pronunciation dictionary; and constructing a mapping relationship between the syllable identifiers and the characters comprised in the self-defined wakeup word text, the mapping relationship being used as the syllable combination sequence.

7. The AI-based wakeup word detection method according to claim 1, further comprising: obtaining a speech data set to be trained; annotating all speech data in the speech data set according to the syllables comprised in the pronunciation dictionary, to obtain a training data set; and training a DNN by using the training data set to obtain the DNN model, input of the DNN model being the speech features of the speech frames, and output of the syllable output units being the posterior probability values of the speech features corresponding to the syllable identifiers relative to the syllable output units.

8. A computing device, comprising a memory, a processor, and a plurality of computer programs stored in the memory that, when executed by the processor, cause the computing device to perform a plurality of operations including: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user, the pronunciation dictionary comprising pronunciations respectively corresponding to a plurality of text elements, and the syllable combination sequence being an ordered combination of a plurality of syllables corresponding to a plurality of text elements of the wakeup word text; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers, the DNN model comprising the same quantity of syllable output units as syllables of the pronunciation dictionary; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence, the target probability vector comprising posterior probability values that are determined according to the posterior probability vectors and that correspond to the text elements in the wakeup word text; and calculating a confidence according to the target probability vector, and determining that the speech frames comprise the wakeup word text when the confidence is greater than or equal to a threshold.

9. The computing device according to claim 8, wherein the calculating a confidence according to the target probability vector comprises: performing probability processing on the posterior probability values comprised in the target probability vector; determining whether the wakeup word text comprises a polyphonic character according to a mapping relationship between syllable identifiers comprised in the syllable combination sequence and characters comprised in the wakeup word text; and calculating the confidence according to the target probability vector after the probability processing when the wakeup word text comprises no polyphonic character.

10. The computing device according to claim 9, wherein the calculating a confidence according to the target probability vector further comprises: performing, when the wakeup word text comprises a polyphonic character, summation on the target probability vector after the probability processing according to a correspondence of the polyphonic character; and calculating the confidence according to the target probability vector after the summation.

11. The computing device according to claim 9, wherein the performing probability processing on the posterior probability values comprised in the target probabilit
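Claims 2 to 5 refine the confidence calculation: posteriors below their prior are set to 0 and the remaining values are divided by the prior (claim 4), scores for alternative pronunciations of a polyphonic character are summed (claim 3), and the result is smoothed before scoring (claim 5). A minimal Python sketch of those three operations follows. The function names, the sample numbers, the trailing moving-average smoother, and its window size are assumptions; the claims do not name a smoothing method.

```python
def prior_normalize(posteriors, priors):
    """Claim 4: zero values below the prior, then divide by the prior."""
    return [
        [(p if p >= q else 0.0) / q for p, q in zip(frame, priors)]
        for frame in posteriors
    ]


def merge_polyphones(frames, groups):
    """Claim 3: sum the columns that belong to the same character.

    `groups` lists, per character, the column indices of its pronunciations.
    """
    return [[sum(frame[i] for i in cols) for cols in groups] for frame in frames]


def smooth(frames, window=3):
    """Claim 5 (assumed method): trailing moving average over `window` frames."""
    out = []
    for t in range(len(frames)):
        recent = frames[max(0, t - window + 1): t + 1]
        out.append([sum(col) / len(recent) for col in zip(*recent)])
    return out


# Hypothetical target probability vectors for two frames and three syllables.
posteriors = [[0.5, 0.125, 0.375], [0.25, 0.5, 0.25]]
priors = [0.25, 0.25, 0.5]

scaled = prior_normalize(posteriors, priors)
# Columns 0 and 1 are treated as two readings of one polyphonic character.
merged = merge_polyphones(scaled, [[0, 1], [2]])
smoothed = smooth(merged, window=2)
```

Dividing by the prior rewards syllables that score well above their base rate, and summing over readings lets either pronunciation of a polyphonic character count toward the same position in the wakeup word.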

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G10L15/16 · Primary
using artificial neural networks · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L15/187 · Primary
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L2015/027
Syllables being the recognition units · CPC title

Patent family

Related publications grouped by family.

View patent family 69576497

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11848008B2 cover?: This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames …
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?: Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).