Method and device for extracting speech feature based on artificial intelligence
US-2018182377-A1 · Jun 28, 2018 · US
US10388276B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10388276-B2 |
| Application number | US-201715854926-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 27, 2017 |
| Priority date | May 16, 2017 |
| Publication date | Aug 20, 2019 |
| Grant date | Aug 20, 2019 |
Official abstract text for this publication.
Embodiments of the present disclosure provide a method and a device for waking up via speech based on AI, and a computer device. The method includes the following steps. A windowing and framing operation is performed on an online recorded speech, to obtain at least one speech frame. A feature extraction is performed on the at least one speech frame, to obtain speech features. A calculation is performed on a static speech feature contained in the at least one speech frame through a speech wake-up model based on a convolutional neural network, to obtain a first posteriori probability matched with a category of non-wake-up words and a second posteriori probability matched with a category of wake-up words. It is determined that a wake-up word is contained in the online recorded speech when the second posteriori probability is greater than or equal to a preset threshold.
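The first and last steps of the abstract (windowing and framing, then the threshold decision) can be illustrated with a minimal Python sketch. It is not the patented implementation: the Hamming window, the names `window_and_frame` and `contains_wake_word`, the frame length and hop parameters, and the rule that a single frame reaching the threshold is enough are all assumptions made for illustration. The feature extraction and the convolutional model are left out; the wake-up posteriors are taken as given inputs.

```python
import math
from typing import List, Sequence


def hamming_window(length: int) -> List[float]:
    """Return a Hamming window of the given length (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def window_and_frame(samples: Sequence[float], frame_len: int,
                     hop: int) -> List[List[float]]:
    """Split samples into overlapping frames and apply a window to each.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def contains_wake_word(wake_posteriors: Sequence[float],
                       threshold: float) -> bool:
    """Return True if any wake-up posterior is >= the preset threshold.

    Treating one qualifying frame as sufficient is an assumption.
    """
    return any(p >= threshold for p in wake_posteriors)
```

In this sketch `frame_len` and `hop` are counted in samples, so a caller working in milliseconds would convert using the sampling rate first.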
Opening claim text (preview).
What is claimed is:

1. A method for waking up via a speech based on artificial intelligence, performed by one or more computer devices and comprising: performing a windowing and framing operation on an online recorded speech, to obtain at least one speech frame; performing a feature extraction on the at least one speech frame, to obtain speech features; performing a calculation on a static speech feature contained in the at least one speech frame through a speech wake-up model based on convolutional neural network, to obtain a first posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of non-wake-up words and a second posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of wake-up words; and determining that a wake-up word is contained in the online recorded speech, when the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words is greater than or equal to a preset threshold.

2. The method according to claim 1, wherein, before performing the windowing and framing operation on the online recorded speech, to obtain the at least one speech frame, the method further comprises: recording online the speech inputted by a user.

3. The method according to claim 1, wherein the static speech feature contained in the at least one speech frame comprises: a static speech feature contained in a current speech frame, a static speech feature contained in a first number of speech frames ahead of the current speech frame, and a static speech feature contained in a second number of speech frames behind of the current speech frame.

4. The method according to claim 1, wherein after obtaining the first posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of non-wake-up words and the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words, the method further comprises: accumulating the first posteriori probability of the static speech feature contained in a preset number of speech frames and matched with the category of non-wake-up words, and accumulating the second posteriori probability of the static speech feature contained in the preset number of speech frames and matched with the category of wake-up words, obtaining a third posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of non-wake-up words, and obtaining a fourth posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of wake-up words; and wherein the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words being greater than or equal to the preset threshold comprises: the fourth posteriori probability of the static speech frames contained in the preset number of speech frames and matched with the category of wake-up words is greater than or equal to the preset threshold.

5. The method according to claim 1, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

6. The method according to claim 5, wherein training the initiated model based on convolutional neural network according to the training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network comprises: by using training data, training the initiated model based on convolutional neural network according to the training criterion based on connectionist temporal classifier, to obtain a seed model based on convolutional neural network; testing the seed model based on convolutional neural network through test data, to obtain error test data falsely identified by the seed model based on convolutional neural network; and training again the seed model based on convolutional neural network by using the error test data falsely identified until the seed model based on convolutional neural network is converged on a development set, to obtain the speech wake-up model based on convolutional neural network.

7. The method according to claim 2, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

8. The method according to claim 3, wherein before performing the calculation on the static speech feature contained in the at least one speech frame through the speech wake-up model based on convolutional neural network, the method further comprises: training an initiated model based on convolutional neural network according to a training criterion based on connectionist temporal classifier, to obtain the speech wake-up model based on convolutional neural network.

9. A computer device, comprising: one or more processors; a storage device, configured to store one or more programs; wherein the one or more processors are configured to read the one or more programs from the storage device to execute acts of: performing a windowing and framing operation on an online recorded speech, to obtain at least one speech frame; performing a feature extraction on the at least one speech frame, to obtain speech features; performing a calculation on a static speech feature contained in the at least one speech frame through a speech wake-up model based on convolutional neural network, to obtain a first posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of non-wake-up words and a second posteriori probability of the static speech feature contained in the at least one speech frame and matched with a category of wake-up words; and determining that a wake-up word is contained in the online recorded speech, when the second posteriori probability of the static speech feature contained in the at least one speech frame and matched with the category of wake-up words is greater than or equal to a preset threshold.

10. The computer device according to claim 9, wherein the one or more processors are further configured to execute an act of: recording online the speech inputted by a user before performing the windowing and framing operation on the online recorded speech, to obtain the at least one speech frame.

11. The computer device according to claim 9, wherein the static speech feature contained in the at least one speech frame comprises: a static speech feature contained in a current speech frame, a static speech feature contained in a first number of speech frames ahead of the current speech frame, and a static speech feature contained in a second number of speech frames behind of the current speech frame.

12. The computer device according to claim 9, wherein the one or more processors a
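Claims 3 and 4 describe two bookkeeping steps around the model: assembling the input from the current frame plus neighbouring frames, and accumulating per-frame posteriors over a preset number of frames before the threshold test. The Python sketch below illustrates one possible reading under stated assumptions. The function names, the clamping of out-of-range frames to the nearest edge frame, and the use of a mean as the accumulation are all hypothetical choices; the claims say only "accumulating", so a plain sum would fit the wording equally well.

```python
from typing import List, Sequence, Tuple


def splice_context(features: Sequence[Sequence[float]], index: int,
                   left: int, right: int) -> List[float]:
    """Concatenate features of frames index-left .. index+right.

    Frames outside the utterance are replaced by the nearest edge
    frame (an assumption; the claims do not say how edges are handled).
    """
    if not features:
        raise ValueError("features must not be empty")
    last = len(features) - 1
    spliced: List[float] = []
    for i in range(index - left, index + right + 1):
        clamped = min(max(i, 0), last)
        spliced.extend(features[clamped])
    return spliced


def accumulate_posteriors(frame_posteriors: Sequence[Tuple[float, float]],
                          num_frames: int) -> Tuple[float, float]:
    """Average (non_wake, wake) posteriors over the last num_frames frames.

    Averaging is an assumed form of accumulation that keeps the
    result in [0, 1] so it can be compared with the same threshold.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    recent = frame_posteriors[-num_frames:]
    if not recent:
        raise ValueError("frame_posteriors must not be empty")
    non_wake = sum(p[0] for p in recent) / len(recent)
    wake = sum(p[1] for p in recent) / len(recent)
    return non_wake, wake
```

Under this reading, the second value returned by `accumulate_posteriors` plays the role of the "fourth posteriori probability" of claim 4 and is the quantity compared against the preset threshold.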
Combinations of networks · CPC title
Suspend and resume; Hibernate and awake · CPC title
using neural networks · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title