Low resource key phrase detection for wake on voice

US2017148444A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2017148444-A1
Application number: US-201514950670-A
Country: US
Kind code: A1
Filing date: Nov 24, 2015
Priority date: Nov 24, 2015
Publication date: May 25, 2017
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
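The scoring flow in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the function name, the log-domain scores, the Viterbi-style max updates, and the use of a simple difference between the two scores as the decision value are all assumptions made for illustration. Each frame supplies one score per sub-phonetic unit; a single rejection state accumulates the best of its self loops, and a chain of key phrase states follows it.

```python
from typing import Sequence

NEG_INF = float("-inf")


def detect_key_phrase(
    frame_scores: Sequence[Sequence[float]],  # per-frame log scores, indexed by unit id
    rejection_units: Sequence[int],           # unit ids on the rejection self loops (assumed)
    key_phrase_units: Sequence[int],          # unit id for each key phrase state, in order
    threshold: float,                         # hypothetical decision threshold
) -> bool:
    """Return True once (key phrase score - rejection score) exceeds the threshold."""
    rejection = 0.0                            # single start state used for rejection
    states = [NEG_INF] * len(key_phrase_units)
    for scores in frame_scores:
        prev_rejection = rejection
        prev_states = list(states)
        # Rejection state: add the best-scoring self loop for this frame.
        rejection = prev_rejection + max(scores[u] for u in rejection_units)
        # Key phrase states: stay in place or advance from the preceding state.
        for i, unit in enumerate(key_phrase_units):
            incoming = prev_rejection if i == 0 else prev_states[i - 1]
            states[i] = max(prev_states[i], incoming) + scores[unit]
        # Compare the final key phrase state against the rejection state.
        if states[-1] - rejection > threshold:
            return True
    return False


# Toy data: unit 0 stands for "everything else", units 1-3 spell the key phrase.
phrase_frames = [
    [-5.0, -0.1, -5.0, -5.0],
    [-5.0, -5.0, -0.1, -5.0],
    [-5.0, -5.0, -5.0, -0.1],
]
other_frames = [[-0.1, -5.0, -5.0, -5.0]] * 3

print(detect_key_phrase(phrase_frames, [0], [1, 2, 3], threshold=5.0))
print(detect_key_phrase(other_frames, [0], [1, 2, 3], threshold=5.0))
```

In this toy setup the first call fires on the third frame and the second never fires. A real system would take its scores from an acoustic model rather than hand-written lists.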

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for key phrase detection comprising: generating, via acoustic scoring of an acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of received audio input; updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score; and determining whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

2. The method of claim 1, wherein the start state based rejection model comprises self loops associated with at least some of the scores of sub-phonetic units of the acoustic model.

3. The method of claim 1, wherein the start state based rejection model consists of a single state preceding the key phrase model.

4. The method of claim 1, wherein the key phrase model comprises a multi-state lexicon look up key phrase model having transitions associated with the lexicon look up for the predetermined key phrase.

5. The method of claim 4, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

6. The method of claim 1, wherein determining whether the received audio input is associated with the predetermined key phrase comprises: determining a log likelihood score based on the rejection likelihood score and the key phrase likelihood score; and comparing the log likelihood score to a threshold.

7. The method of claim 1, wherein the acoustic model comprises a deep neural network and the time series of feature vectors comprises a first feature vector comprising a stack of a time series of coefficients each associated with a sampling time.

8. The method of claim 1, further comprising: updating a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score; and determining whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

9. The method of claim 8, wherein the received audio input is associated with the second predetermined key phrase, the method further comprising: providing a system command corresponding to the second predetermined key phrase.

10. A system for performing key phrase detection comprising: a memory configured to store an acoustic model, a start state based rejection model, and a key phrase model associated with a predetermined key phrase; and a digital signal processor coupled to the memory, the digital signal processor to generate, based on the acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of an audio input, to update the start state based rejection model and the key phrase model based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score, and to determine whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

11. The system of claim 10, wherein the start state based rejection model comprises self loops associated with at least some of the scores of sub-phonetic units of the acoustic model.

12. The system of claim 10, wherein the start state based rejection model consists of a single state preceding the key phrase model.

13. The system of claim 10, wherein the key phrase model comprises a multi-state lexicon look up key phrase model having transitions associated with the lexicon look up for the predetermined key phrase.

14. The system of claim 13, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

15. The system of claim 10, wherein the digital signal processor to determine whether the received audio input is associated with the predetermined key phrase comprises the digital signal processor to determine a log likelihood score based on the rejection likelihood score and the key phrase likelihood score and compare the log likelihood score to a threshold.

16. The system of claim 10, wherein the acoustic model comprises a deep neural network and the time series of feature vectors comprises a first feature vector comprising a stack of a time series of coefficients each associated with a sampling time.

17. The system of claim 10, wherein the digital signal processor is further to update a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score and determine whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

18. At least one machine readable medium comprising a plurality of instructions that, in response to being executed on a device, cause the device to generate a key phrase detection model comprising a start state based rejection model, a key phrase model, and a pruned acoustic model by: training an acoustic model having a plurality of output nodes, the output nodes comprising a plurality of sub-phonetic units in the form of tied context-dependent triphone HMM-states, wherein each of the tied context-dependent triphone HMM-states is associated with one of a plurality of monophones; and generating a selected subset of the output nodes by: determining a usage rate for each of the sub-phonetic units during the training; including, in the selected subset, at least one output node corresponding to a highest usage rate sub-phonetic unit for each of the plurality of monophones; and including, in the selected subset, output nodes corresponding to nodes of the key phrase model.

19. The machine readable medium of claim 18, further comprising instructions that, in response to being executed on a computing device, cause the device to generate the key phrase detection model by: generating a pruned acoustic model having outputs consisting of the selected subset of the output nodes.

20. The machine readable medium of claim 18, wherein the plurality of output nodes of the acoustic model further comprise a plurality of non-speech nodes, and wherein the selected subset of the output nodes comprises the plurality of non-speech nodes.

21. The machine readable medium of claim 18, wherein determining the usage rate for each of the sub-phonetic units comprises incrementing a first usage rate associated with a first sub-phonetic unit when the first sub-phonetic unit has a non-zero output during the training of the acoustic model.

22. The machine readable medium of claim 18, wherein the start state based rejection model comprises a single state and self loops corresponding to the output nodes of the highest usage rate sub-phonetic unit for each of the plurality of monophones of the selected subset of the output nodes.

23. The machine readable medium of claim 18, wherein the
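The output-node selection described in claims 18 and 20 can be sketched as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the function name, the dictionary-based inputs, and the integer node ids are hypothetical, and usage counts are assumed to have been gathered already during training.

```python
from typing import Dict, Iterable, List


def select_output_nodes(
    usage: Dict[int, int],                 # node id -> usage count from training (assumed given)
    node_to_monophone: Dict[int, str],     # node id -> monophone it is tied to
    key_phrase_nodes: Iterable[int],       # nodes the key phrase model needs
    non_speech_nodes: Iterable[int] = (),  # e.g. silence or noise nodes (claim 20)
) -> List[int]:
    """Return the sorted ids of the output nodes kept in the pruned model."""
    best: Dict[str, int] = {}
    for node, monophone in node_to_monophone.items():
        current = best.get(monophone)
        # Keep the most-used node seen so far for each monophone.
        if current is None or usage.get(node, 0) > usage.get(current, 0):
            best[monophone] = node
    selected = set(best.values()) | set(key_phrase_nodes) | set(non_speech_nodes)
    return sorted(selected)


# Toy data: five speech nodes tied to three monophones, plus one non-speech node.
usage = {0: 10, 1: 3, 2: 7, 3: 9, 4: 1}
node_to_monophone = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
print(select_output_nodes(usage, node_to_monophone, key_phrase_nodes=[1], non_speech_nodes=[5]))
# prints [0, 1, 3, 4, 5]
```

Node 2 is dropped because node 3 has the higher usage count for monophone "b" and the key phrase model does not need it; node 1 survives only because the key phrase model uses it.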

Assignees

Intel IP Corp

Inventors

Classifications

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Hidden Markov Models [HMMs] · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • G10L17/22 (Primary)

    Interactive procedures; Man-machine interfaces · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017148444A1 cover?
Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a pr…
Who is the assignee on this patent?
Intel IP Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 25, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).