Low resource key phrase detection for wake on voice

US2017148444A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2017148444-A1
Application number	US-201514950670-A
Country	US
Kind code	A1
Filing date	Nov 24, 2015
Priority date	Nov 24, 2015
Publication date	May 25, 2017
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
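The scoring step summarized in the abstract can be read as a small state machine. The sketch below is one possible interpretation, not the patent's implementation. It assumes log-domain scores, a single rejection state whose self loop takes the best score among a set of rejection units, and a left-to-right key phrase model in which each state either stays in place or advances from the previous state. The function names, the max-based update rule, and the unit labels are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def key_phrase_scores(
    frame_scores: Sequence[Dict[str, float]],
    rejection_units: Sequence[str],
    key_phrase_units: Sequence[str],
) -> List[float]:
    """Return one log likelihood score per frame (key phrase minus rejection).

    frame_scores holds, per time step, a hypothetical mapping from
    sub-phonetic unit label to its log-domain acoustic score.
    """
    n = len(key_phrase_units)
    neg_inf = float("-inf")
    # states[0] is the single rejection (start) state;
    # states[1..n] are the states of the key phrase model.
    states: List[float] = [0.0] + [neg_inf] * n
    results: List[float] = []
    for scores in frame_scores:
        prev = states[:]
        # Rejection state: self loops only, keep the best rejection unit (assumed rule).
        states[0] = prev[0] + max(scores[u] for u in rejection_units)
        # Key phrase states: stay in place or advance from the preceding state.
        for i, unit in enumerate(key_phrase_units, start=1):
            states[i] = max(prev[i], prev[i - 1]) + scores[unit]
        # The final key phrase state gives the key phrase likelihood score.
        results.append(states[n] - states[0])
    return results


def is_key_phrase(
    frame_scores: Sequence[Dict[str, float]],
    rejection_units: Sequence[str],
    key_phrase_units: Sequence[str],
    threshold: float,
) -> bool:
    """Report a detection if any frame's log likelihood score exceeds the threshold."""
    scores = key_phrase_scores(frame_scores, rejection_units, key_phrase_units)
    return any(score > threshold for score in scores)
```

Subtracting the rejection score from the final key phrase state's score yields a single number to compare against a threshold, which matches the comparison described in claim 6. The threshold value itself would have to be tuned and is not given in the source.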

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for key phrase detection comprising: generating, via acoustic scoring of an acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of received audio input; updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score; and determining whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

2. The method of claim 1, wherein the start state based rejection model comprises self loops associated with at least some of the scores of sub-phonetic units of the acoustic model.

3. The method of claim 1, wherein the start state based rejection model consists of a single state preceding the key phrase model.

4. The method of claim 1, wherein the key phrase model comprises a multi-state lexicon look up key phrase model having transitions associated with the lexicon look up for the predetermined key phrase.

5. The method of claim 4, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

6. The method of claim 1, wherein determining whether the received audio input is associated with the predetermined key phrase comprises: determining a log likelihood score based on the rejection likelihood score and the key phrase likelihood score; and comparing the log likelihood score to a threshold.

7. The method of claim 1, wherein the acoustic model comprises a deep neural network and the time series of feature vectors comprises a first feature vector comprising a stack of a time series of coefficients each associated with a sampling time.

8. The method of claim 1, further comprising: updating a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score; and determining whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

9. The method of claim 8, wherein the received audio input is associated with the second predetermined key phrase, the method further comprising: providing a system command corresponding to the second predetermined key phrase.

10. A system for performing key phrase detection comprising: a memory configured to store an acoustic model, a start state based rejection model, and a key phrase model associated with a predetermined key phrase; and a digital signal processor coupled to the memory, the digital signal processor to generate, based on the acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of an audio input, to update the start state based rejection model and the key phrase model based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score, and to determine whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

11. The system of claim 10, wherein the start state based rejection model comprises self loops associated with at least some of the scores of sub-phonetic units of the acoustic model.

12. The system of claim 10, wherein the start state based rejection model consists of a single state preceding the key phrase model.

13. The system of claim 10, wherein the key phrase model comprises a multi-state lexicon look up key phrase model having transitions associated with the lexicon look up for the predetermined key phrase.

14. The system of claim 13, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

15. The system of claim 10, wherein the digital signal processor to determine whether the received audio input is associated with the predetermined key phrase comprises the digital signal processor to determine a log likelihood score based on the rejection likelihood score and the key phrase likelihood score and compare the log likelihood score to a threshold.

16. The system of claim 10, wherein the acoustic model comprises a deep neural network and the time series of feature vectors comprises a first feature vector comprising a stack of a time series of coefficients each associated with a sampling time.

17. The system of claim 10, wherein the digital signal processor is further to update a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score and determine whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

18. At least one machine readable medium comprising a plurality of instructions that, in response to being executed on a device, cause the device to generate a key phrase detection model comprising a start state based rejection model, a key phrase model, and a pruned acoustic model by: training an acoustic model having a plurality of output nodes, the output nodes comprising a plurality of sub-phonetic units in the form of tied context-dependent triphone HMM-states, wherein each of the tied context-dependent triphone HMM-states is associated with one of a plurality of monophones; and generating a selected subset of the output nodes by: determining a usage rate for each of the sub-phonetic units during the training; including, in the selected subset, at least one output node corresponding to a highest usage rate sub-phonetic unit for each of the plurality of monophones; and including, in the selected subset, output nodes corresponding to nodes of the key phrase model.

19. The machine readable medium of claim 18, further comprising instructions that, in response to being executed on a computing device, cause the device to generate the key phrase detection model by: generating a pruned acoustic model having outputs consisting of the selected subset of the output nodes.

20. The machine readable medium of claim 18, wherein the plurality of output nodes of the acoustic model further comprise a plurality of non-speech nodes, and wherein the selected subset of the output nodes comprises the plurality of non-speech nodes.

21. The machine readable medium of claim 18, wherein determining the usage rate for each of the sub-phonetic units comprises incrementing a first usage rate associated with a first sub-phonetic unit when the first sub-phonetic unit has a non-zero output during the training of the acoustic model.

22. The machine readable medium of claim 18, wherein the start state based rejection model comprises a single state and self loops corresponding to the output nodes of the highest usage rate sub-phonetic unit for each of the plurality of monophones of the selected subset of the output nodes.

23. The machine readable medium of claim 18, wherein the…
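Claims 18, 20, and 21 describe how the subset of acoustic-model output nodes is chosen. The following is a minimal sketch of that selection under stated assumptions: training outputs are represented as hypothetical per-sample dictionaries mapping unit labels to output values, ties between equally used units are broken by first occurrence, and all function and variable names are invented for illustration.

```python
from typing import Dict, Iterable, Mapping, Sequence, Set


def count_usage(training_outputs: Iterable[Mapping[str, float]]) -> Dict[str, int]:
    """Count, per sub-phonetic unit, how often its output was non-zero (claim 21)."""
    usage: Dict[str, int] = {}
    for outputs in training_outputs:
        for unit, value in outputs.items():
            usage.setdefault(unit, 0)
            if value != 0.0:
                usage[unit] += 1
    return usage


def select_output_nodes(
    usage: Mapping[str, int],
    unit_to_monophone: Mapping[str, str],
    key_phrase_units: Sequence[str],
    non_speech_units: Sequence[str] = (),
) -> Set[str]:
    """Pick the output nodes kept in the pruned acoustic model."""
    # For each monophone, remember the unit with the highest usage count.
    best: Dict[str, str] = {}
    for unit, monophone in unit_to_monophone.items():
        current = best.get(monophone)
        if current is None or usage.get(unit, 0) > usage.get(current, 0):
            best[monophone] = unit
    selected = set(best.values())
    # Nodes used by the key phrase model are always kept (claim 18).
    selected.update(key_phrase_units)
    # Non-speech nodes are kept as well (claim 20).
    selected.update(non_speech_units)
    return selected
```

The units chosen per monophone would then serve as the self loops of the single-state rejection model described in claim 22, while the key phrase units feed the key phrase model.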

Assignees

Intel IP Corp

Inventors

Classifications

G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G10L15/142
Hidden Markov Models [HMMs] · CPC title
G10L15/22 · Primary
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L2015/025
Phonemes, fenemes or fenones being the recognition units · CPC title
G10L17/22 · Primary
Interactive procedures; Man-machine interfaces · CPC title

Patent family

Related publications grouped by family.

View patent family 58721077

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017148444A1 cover? Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a pr…
Who is the assignee on this patent? Intel IP Corp
What technology area does this patent fall under? Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published? May 25, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Unsupervised acoustic model training

Isolated word training and detection

Preventing false wake word detections with a voice-controlled device

Selective enabling of a component by a microphone circuit

Robust Feature Extraction Using Differential Zero-Crossing Counts
