Low resource key phrase detection for wake on voice

US9792907B2 · US · B2

Patent metadata
Publication number: US-9792907-B2
Application number: US-201514950670-A
Country: US
Kind code: B2
Filing date: Nov 24, 2015
Priority date: Nov 24, 2015
Publication date: Oct 17, 2017
Grant date: Oct 17, 2017

Abstract

Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
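The update-and-decide loop described in the abstract can be sketched in Python as below. This is a minimal, hypothetical sketch, not the patented implementation. It assumes log-domain acoustic scores; it takes the maximum over the rejection self-loop scores; it reads claims 2 and 3 as "add each state's self-loop score to its previous score, then let a state adopt its predecessor's summed score when that is larger"; and it uses the final key phrase state score minus the rejection score as the log likelihood that claim 6 compares with a threshold. The class name `KeyPhraseDecoder`, the index lists, and the threshold value are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

NEG_INF = float("-inf")


@dataclass
class KeyPhraseDecoder:
    """Hypothetical sketch: one rejection state feeding a left-to-right key
    phrase model. All scores are assumed to be log-domain values."""

    rejection_ids: Sequence[int]  # acoustic outputs on the rejection self loops
    phrase_ids: Sequence[int]     # one acoustic output per key phrase state
    threshold: float              # illustrative decision threshold
    rejection_score: float = 0.0
    phrase_scores: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Before any audio arrives only the rejection (start) state is active.
        self.phrase_scores = [NEG_INF] * len(self.phrase_ids)

    def update(self, frame_scores: Sequence[float]) -> None:
        """Consume one frame of acoustic scores (one per sub-phonetic unit)."""
        # Continual summing at the rejection state: previous score plus the
        # best self-loop score (taking the max is an assumption).
        summed_rej = self.rejection_score + max(
            frame_scores[j] for j in self.rejection_ids
        )
        # Continual summing at each key phrase state: previous score plus the
        # score on that state's own self loop.
        summed = [
            prev + frame_scores[j]
            for prev, j in zip(self.phrase_scores, self.phrase_ids)
        ]
        # Transitions: a state adopts its predecessor's summed score when that
        # is larger. The first state's predecessor is the rejection state.
        new_scores = []
        for i, own in enumerate(summed):
            incoming = summed_rej if i == 0 else summed[i - 1]
            new_scores.append(max(own, incoming))
        self.rejection_score = summed_rej
        self.phrase_scores = new_scores

    def log_likelihood(self) -> float:
        # Key phrase likelihood (final state) relative to rejection likelihood.
        return self.phrase_scores[-1] - self.rejection_score

    def detected(self) -> bool:
        return self.log_likelihood() > self.threshold


if __name__ == "__main__":
    # Toy log scores over five hypothetical acoustic outputs:
    # 0 = silence, 1 = other speech (rejection loops), 2-4 = key phrase units.
    frames = [
        [-1, -2, -5, -5, -5],
        [-4, -3, -1, -5, -5],
        [-4, -3, -5, -1, -5],
        [-4, -3, -5, -5, -1],
    ]
    decoder = KeyPhraseDecoder(rejection_ids=[0, 1], phrase_ids=[2, 3, 4], threshold=0.0)
    for frame in frames:
        decoder.update(frame)
        print(decoder.log_likelihood(), decoder.detected())
```

Because the rejection state keeps feeding the first key phrase state, a new utterance of the key phrase can enter the model at any frame without an explicit reset. Over a long stream the raw sums drift steadily downward; a practical decoder would renormalize them periodically, which this sketch omits.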

Claims (preview)

What is claimed is:

1. A computer-implemented method for key phrase detection comprising: generating, via acoustic scoring of an acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of received audio input; updating a start state based rejection model and a key phrase model associated with a predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score, wherein the start state based rejection model consists of a single rejection state and comprises a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units of the acoustic model, wherein the key phrase model comprises a plurality of key phrase states interconnected by transitions therebetween, the plurality of key phrase states each comprising a self loop associated with a particular score of the scores of sub-phonetic units of the acoustic model, and wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states; and determining whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

2. The method of claim 1, wherein updating the start state based rejection model and the key phrase model comprises: providing a continual summing at the single rejection state of the start state based rejection model based on a previous score of the single rejection state and the particular scores corresponding to the plurality of rejection model self loops; and providing a continual summing at each key phrase state of the plurality of key phrase states based on a previous score of each key phrase state, the particular score corresponding to the self loop of each key phrase state, and a second score transitioned to each key phrase state from another state.

3. The method of claim 2, wherein updating the start state based rejection model and the key phrase model further comprises: comparing a sum of the previous score for a first key phrase state and a particular score corresponding to the self loop of the first key phrase state to a score for a second key phrase state interconnected to the first key phrase state by a first transition; and updating the score for the second key phrase state to the sum when the sum is greater than the score for the second key phrase state.

4. The method of claim 1, wherein the key phrase model comprises a multi-state lexicon look up key phrase model and the transitions of the key phrase model are associated with the lexicon look up for the predetermined key phrase.

5. The method of claim 4, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

6. The method of claim 1, wherein determining whether the received audio input is associated with the predetermined key phrase comprises: determining a log likelihood score based on the rejection likelihood score and the key phrase likelihood score; and comparing the log likelihood score to a threshold.

7. The method of claim 1, wherein the acoustic model comprises a deep neural network and the time series of feature vectors comprises a first feature vector comprising a stack of a time series of coefficients each associated with a sampling time.

8. The method of claim 1, further comprising: updating a second key phrase model associated with a second predetermined key phrase based on at least some of the time series of scores of sub-phonetic units to generate a second key phrase likelihood score; and determining whether the received audio input is associated with the second predetermined key phrase based on the rejection likelihood score and the second key phrase likelihood score.

9. The method of claim 8, wherein the received audio input is associated with the second predetermined key phrase, the method further comprising: providing a system command corresponding to the second predetermined key phrase.

10. A system for performing key phrase detection comprising: a memory configured to store an acoustic model, a start state based rejection model, and a key phrase model associated with a predetermined key phrase; and a digital signal processor coupled to the memory, the digital signal processor to generate, based on the acoustic model, a time series of scores of sub-phonetic units based on a time series of feature vectors representative of an audio input, to update the start state based rejection model and the key phrase model based on at least some of the time series of scores of sub-phonetic units to generate a rejection likelihood score and a key phrase likelihood score, wherein the start state based rejection model consists of a single rejection state and comprises a plurality of rejection model self loops each associated with a particular score of the scores of sub-phonetic units of the acoustic model, wherein the key phrase model comprises a plurality of key phrase states interconnected by transitions therebetween, the plurality of key phrase states each comprising a self loop associated with a particular score of the scores of sub-phonetic units of the acoustic model, and wherein the start state based rejection model and the key phrase model are connected by a first transition from the single rejection state to a first key phrase state of the plurality of key phrase states, and to determine whether the received audio input is associated with the predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.

11. The system of claim 10, wherein the digital signal processor to update the start state based rejection model and the key phrase model comprises the digital signal processor to provide a continual summing at the single rejection state of the start state based rejection model based on a previous score of the single rejection state and the particular scores corresponding to the plurality of rejection model self loops and to provide a continual summing at each key phrase state of the plurality of key phrase states based on a previous score of each key phrase state, the particular score corresponding to the self loop of each key phrase state, and a second score transitioned to each key phrase state from another state.

12. The system of claim 10, wherein the digital signal processor to update the start state based rejection model and the key phrase model comprises the digital signal processor further to compare a sum of the previous score for a first key phrase state and a particular score corresponding to the self loop of the first key phrase state to a score for a second key phrase state interconnected to the first key phrase state by a first transition and to update the score for the second key phrase state to the sum when the sum is greater than the score for the second key phrase state.

13. The system of claim 10, wherein the key phrase model comprises a multi-state lexicon look up key phrase model and the transitions of the key phrase model are associated with the lexicon look up for the predetermined key phrase.

14. The system of claim 13, wherein the key phrase likelihood score is associated with a final state of the multi-state lexicon look up key phrase model.

15. The system of claim 10, wherein the digital signal processor to determine whether the received audio input is associated with the predetermined
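Claim 7 describes a feature vector built as a stack of a time series of coefficients, each associated with a sampling time. A minimal Python sketch of such frame stacking follows. It assumes each sampling time already has a coefficient vector (for example, filter-bank or MFCC values) and that a stack covers `context` consecutive frames ending at the current one. The function name `stack_frames` and the `context` parameter are hypothetical; the claim does not specify the coefficient type or the stack size.

```python
from typing import List, Sequence


def stack_frames(coeffs: Sequence[Sequence[float]], context: int) -> List[List[float]]:
    """Concatenate the coefficient vectors of `context` consecutive sampling
    times into one feature vector per time step (hypothetical helper)."""
    if context < 1:
        raise ValueError("context must be at least 1")
    stacked: List[List[float]] = []
    # The first full stack ends at index context - 1; earlier time steps are
    # skipped rather than padded (a design choice of this sketch).
    for t in range(context - 1, len(coeffs)):
        vector: List[float] = []
        for frame in coeffs[t - context + 1 : t + 1]:
            vector.extend(frame)
        stacked.append(vector)
    return stacked


if __name__ == "__main__":
    # Three sampling times with two coefficients each.
    per_frame = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    for vec in stack_frames(per_frame, context=2):
        print(vec)  # each stacked vector would be fed to the acoustic model
```

In a streaming wake-on-voice setting the same stacking would run over a rolling buffer of the most recent `context` frames; the batch form here only keeps the example short.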

Assignee

Intel IP Corp

CPC classifications

  • Lexical analysis, e.g. tokenisation or collocates

  • Hidden Markov Models [HMMs]

  • G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue

  • Phonemes, fenemes or fenones being the recognition units

  • Using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence)

Frequently asked questions

What does patent US9792907B2 cover?
Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a pr…
Who is the assignee on this patent?
Intel IP Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Published October 17, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).