Method and device for waking up via speech based on artificial intelligence and computer device

US10388276B2 · US · B2

Patent metadata
Publication number: US-10388276-B2
Application number: US-201715854926-A
Country: US
Kind code: B2
Filing date: Dec 27, 2017
Priority date: May 16, 2017
Publication date: Aug 20, 2019
Grant date: Aug 20, 2019

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a method and a device for waking up via a speech based on AI and a computer device. The method includes the followings. A windowing and framing operation is performed on an online recorded speech, to obtain at least one speech frame. A feature extraction is performed on the at least one speech frame, to obtain speech features. A calculation is performed on a static speech feature contained in the at least one speech frame through a speech wake-up model based on convolutional neural network, to obtain a first posteriori probability matched with a category of non-wake-up words and a second posteriori probability matched with a category of wake-up words. It is determined that a wake-up word is contained in the online recorded speech, when the second posteriori probability is greater than or equal to a preset threshold.
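The first and last steps of the abstract (windowing and framing, then the threshold decision) can be illustrated with a minimal sketch. The abstract does not name a window function, frame length, or hop size, so the Hamming window and the `frame_len` and `hop` parameters below are assumptions chosen for illustration only; the function names are hypothetical and do not come from the patent.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames and window each frame.

    Assumption: a Hamming window is used; the patent text shown here
    does not specify the window type, frame length, or hop size.
    """
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    # Only full-length frames are kept; trailing samples are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def contains_wake_word(wake_posterior, threshold):
    """Decision rule from the abstract: wake when posterior >= threshold."""
    return wake_posterior >= threshold


# Ten constant samples, frames of 4 samples, advancing 2 samples each time.
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
```

Note that the comparison is inclusive: a posterior exactly equal to the preset threshold counts as a detection, matching the "greater than or equal to" wording of the abstract.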

First claim

Opening claim text (preview).

What is claimed is:

1. A method for waking up via a speech based on artificial intelligence, performed by one or more computer devices and comprising: performing a windowing and framing operation on an online recorded speech, to obtain at least one speech frame; performing a feature extraction on the at least one speech frame, to obtain speech features; performing a calculation on a static speech feature contained in the at least one speech frame through a speech wake-up model based on convolutional neural network, to obtain a first posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of non-wake-up words and a second posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of wake-up words; and determining that a wake-up word is contained in the online recorded speech, when the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words is greater than or equal to a preset threshold.

2. The method according to claim 1, wherein, before performing the windowing and framing operation on the online recorded speech, to obtain the at least one speech frame, the method further comprises: recording online the speech inputted by a user.

3. The method according to claim 1, wherein the static speech feature contained in the at least one speech frame comprises: a static speech feature contained in a current speech frame, a static speech feature contained in a first number of speech frames ahead of the current speech frame, and a static speech feature contained in a second number of speech frames behind of the current speech frame.

4. The method according to claim 1, wherein after obtaining the first posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of non-wake-up words and the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words, the method further comprises: accumulating the first posteriori probability of the static speech feature contained in a preset number of speech frames and matched with the category of non-wake-up words, and accumulating the second posteriori probability of the static speech feature contained in the preset number of speech frames and matched with the category of wake-up words, obtaining a third posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of non-wake-up words, and obtaining a fourth posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of wake-up words; and wherein the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words being greater than or equal to the preset threshold comprises: the fourth posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of wake-up words is greater than or equal to the preset threshold.

5. The method according to claim 1, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

6. The method according to claim 5, wherein training the initiated model based on convolutional neural network according to the training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network comprises: by using training data, training the initiated model based on convolutional neural network according to the training criterion based on connectionist temporal classifier, to obtain a seed model based on convolutional neural network; testing the seed model based on convolutional neural network through test data, to obtain error test data falsely identified by the seed model based on convolutional neural network; and training again the seed model based on convolutional neural network by using the error test data falsely identified until the seed model based on convolutional neural network is converged on a development set, to obtain the speech wake-up model based on convolutional neural network.

7. The method according to claim 2, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

8. The method according to claim 3, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

9. A computer device, comprising: one or more processors; a storage device, configured to store one or more programs; wherein the one or more processors are configured to read the one or more programs from the storage device to execute acts of: performing a windowing and framing operation on an online recorded speech, to obtain at least one speech frame; performing a feature extraction on the at least one speech frame, to obtain speech features; performing a calculation on a static speech feature contained in the at least one speech frame through a speech wake-up model based on convolutional neural network, to obtain a first posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of non-wake-up words and a second posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of wake-up words; and determining that a wake-up word is contained in the online recorded speech, when the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words is greater than or equal to a preset threshold.

10. The computer device according to claim 9, wherein the one or more processors are further configured to execute an act of: recording online the speech inputted by a user before performing the windowing and framing operation on the online recorded speech, to obtain the at least one speech frame.

11. The computer device according to claim 9, wherein the static speech feature contained in the at least one speech frame comprises: a static speech feature contained in a current speech frame, a static speech feature contained in a first number of speech frames ahead of the current speech frame, and a static speech feature contained in a second number of speech frames behind of the current speech frame.

12. The computer device according to claim 9, wherein the one or more processors a
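Claims 3 and 4 describe two operations that are easy to picture in code: building the model input from a current frame plus neighbouring frames, and accumulating per-frame posteriors over a preset number of frames before thresholding. The sketch below is illustrative only. It assumes that "ahead of" means earlier frames and "behind" means later frames, that edge frames are padded by repeating the nearest frame, and that "accumulating" is a mean over a sliding window; none of these choices is stated in the claim text shown here, and all function names are hypothetical.

```python
def stack_context(features, left, right):
    """Concatenate each frame with `left` earlier and `right` later frames.

    Assumption: indices outside the utterance are clamped to the
    first or last frame (edge padding by repetition).
    """
    last = len(features) - 1
    stacked = []
    for t in range(len(features)):
        vec = []
        for k in range(t - left, t + right + 1):
            vec.extend(features[min(max(k, 0), last)])
        stacked.append(vec)
    return stacked


def accumulate(window):
    """Return (third, fourth) posteriors for a window of per-frame pairs.

    Each pair is (p_non_wake, p_wake). Assumption: accumulation is a
    mean, so the result stays comparable to a probability threshold.
    """
    n = len(window)
    third = sum(p_non for p_non, _ in window) / n
    fourth = sum(p_wake for _, p_wake in window) / n
    return third, fourth


def detect_wake_word(posteriors, num_frames, threshold):
    """Slide a window of `num_frames` frames and apply the claim 4 rule."""
    for end in range(num_frames, len(posteriors) + 1):
        _, fourth = accumulate(posteriors[end - num_frames:end])
        if fourth >= threshold:
            return True
    return False


stacked = stack_context([[1], [2], [3]], left=1, right=1)
per_frame = [(0.9, 0.1), (0.2, 0.8), (0.1, 0.9)]
detected = detect_wake_word(per_frame, num_frames=2, threshold=0.8)
```

Accumulating over several frames makes the decision less sensitive to a single frame with a spuriously high wake-up posterior, which is one plausible reason for the step in claim 4.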

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

  • Combinations of networks · CPC title

  • Suspend and resume; Hibernate and awake · CPC title

  • using neural networks · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10388276B2 cover?
It covers a method, a device and a computer device for waking up via speech based on AI. The online recorded speech is windowed and framed, speech features are extracted from the resulting frames, and a wake-up model based on a convolutional neural network computes two posterior probabilities for the static speech features: one for the category of non-wake-up words and one for the category of wake-up words. A wake-up word is determined to be present when the wake-up posterior is greater than or equal to a preset threshold.
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 20, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).