Multi-stage hotword detection

US9418656B2 · US · B2

Patent metadata
Publication number: US-9418656-B2
Application number: US-201514657588-A
Country: US
Kind code: B2
Filing date: Mar 13, 2015
Priority date: Oct 29, 2014
Publication date: Aug 16, 2016
Grant date: Aug 16, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance. The actions further include determining a likelihood that the initial portion of the utterance includes a hotword. The actions further include determining that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold. The actions further include, in response to determining that the likelihood satisfies the threshold, transmitting a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.
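
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the `toy_score` function, the 0.8 threshold, and the reading of "satisfies a threshold" as "greater than or equal to" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

AudioChunk = List[float]


@dataclass
class FirstStageDetector:
    """Stand-in for the first stage: forwards audio until asked to cease."""
    forwarding: bool = True
    forwarded: List[AudioChunk] = field(default_factory=list)

    def provide(self, chunk: AudioChunk) -> bool:
        """Forward a chunk downstream unless a cease request was received."""
        if self.forwarding:
            self.forwarded.append(chunk)
        return self.forwarding

    def cease(self) -> None:
        """Handle the second stage's request to stop providing audio."""
        self.forwarding = False


@dataclass
class SecondStageDetector:
    """Stand-in for the second stage: scores the initial portion."""
    first_stage: FirstStageDetector
    score: Callable[[AudioChunk], float]  # hypothetical likelihood model
    threshold: float = 0.8                # assumed value, not from the patent

    def on_initial_portion(self, chunk: AudioChunk) -> bool:
        """Return True if the hotword likelihood satisfies the threshold."""
        likelihood = self.score(chunk)
        if likelihood >= self.threshold:  # assumption: "satisfies" means >=
            self.first_stage.cease()      # request: stop sending more audio
            return True
        return False


def toy_score(chunk: AudioChunk) -> float:
    """Toy likelihood: mean absolute amplitude, clipped to [0, 1]."""
    if not chunk:
        return 0.0
    return min(1.0, sum(abs(x) for x in chunk) / len(chunk))


first = FirstStageDetector()
second = SecondStageDetector(first_stage=first, score=toy_score)

first.provide([0.9, 1.0, 0.95])                        # initial portion
detected = second.on_initial_portion(first.forwarded[-1])
still_forwarding = first.provide([0.1, 0.2])           # subsequent portion
```

In this sketch the second stage confirms the hotword on the initial portion, so the later call to `provide` is dropped rather than forwarded. A real system would replace `toy_score` with a trained acoustic model.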

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance; determining, by the second stage hotword detector, a likelihood that the initial portion of the utterance includes a hotword; determining, by the second stage hotword detector, that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold; and in response to determining that the likelihood satisfies the threshold, transmitting, by the second stage hotword detector and to the first stage hotword detector, a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance, wherein the method is executed by at least one processor.

2. The method of claim 1, wherein the first stage hotword detector is implemented in a digital signal processor and the second stage hotword detector is implemented in software.

3. The method of claim 1, comprising: providing, by the second stage hotword detector, the audio data to a speaker identifier.

4. The method of claim 1, wherein receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance comprises: accessing, by the second stage hotword detector, the audio data from a particular memory location, wherein the first stage hotword detector stored the audio data in the particular memory location.

5. The method of claim 1, wherein the first stage hotword detector is based on a neural network and includes a first number of nodes and a second number of hidden layers, and wherein the second stage hotword detector is based on the neural network and includes a third number of nodes and a fourth number of hidden layers, the third number being greater than the first number and the fourth number being greater than the second number.

6. The method of claim 1, wherein the first stage hotword detector is speaker and language independent and the second stage hotword detector is speaker and language dependent.

7. The method of claim 1, wherein the audio data that corresponds to the initial portion of an utterance includes audio data that was received before the initial portion of the utterance.

8. The method of claim 1, comprising: receiving, by the second stage hotword detector, data indicating that the first stage hotword detector determined an initial likelihood that the initial portion of the utterance included the hotword, wherein the initial likelihood satisfied an initial threshold.

9. The method of claim 1, wherein transmitting, by the second stage hotword detector and to the first stage hotword detector, a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance comprises: transmitting, by the second stage hotword detector and to the first stage hotword detector, the request for the first stage hotword detector to cease providing, to a memory for consumption by the second stage hotword detector or directly to the second stage hotword detector, the additional audio data that corresponds to the one or more subsequent portions of the utterance.

10. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance; determining, by the second stage hotword detector, a likelihood that the initial portion of the utterance includes a hotword; determining, by the second stage hotword detector, that the likelihood that the initial portion of the utterance includes the hotword satisfies a threshold; and in response to determining that the likelihood satisfies the threshold, transmitting, by the second stage hotword detector and to the first stage hotword detector, a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance.

11. The system of claim 10, wherein the first stage hotword detector is implemented in a digital signal processor and the second stage hotword detector is implemented in software.

12. The system of claim 10, wherein the operations further comprise: providing, by the second stage hotword detector, the audio data to a speaker identifier.

13. The system of claim 10, wherein receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corresponds to an initial portion of an utterance comprises: accessing, by the second stage hotword detector, the audio data from a particular memory location, wherein the first stage hotword detector stored the audio data in the particular memory location.

14. The system of claim 10, wherein the first stage hotword detector is based on a neural network and includes a first number of nodes and a second number of hidden layers, and wherein the second stage hotword detector is based on the neural network and includes a third number of nodes and a fourth number of hidden layers, the third number being greater than the first number and the fourth number being greater than the second number.

15. The system of claim 10, wherein the first stage hotword detector is speaker and language independent and the second stage hotword detector is speaker and language dependent.

16. The system of claim 10, wherein the audio data that corresponds to the initial portion of an utterance includes audio data that was received before the initial portion of the utterance.

17. The system of claim 10, wherein the operations further comprise: receiving, by the second stage hotword detector, data indicating that the first stage hotword detector determined an initial likelihood that the initial portion of the utterance included the hotword, wherein the initial likelihood satisfied an initial threshold.

18. The system of claim 10, wherein transmitting, by the second stage hotword detector and to the first stage hotword detector, a request for the first stage hotword detector to cease providing additional audio data that corresponds to one or more subsequent portions of the utterance comprises: transmitting, by the second stage hotword detector and to the first stage hotword detector, the request for the first stage hotword detector to cease providing, to a memory for consumption by the second stage hotword detector or directly to the second stage hotword detector, the additional audio data that corresponds to the one or more subsequent portions of the utterance.

19. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector an…
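
Claims 4, 7, and 9 describe the first stage storing audio in a memory location that the second stage reads, the stored audio including sound captured before the utterance began, and the cease request stopping further writes to that memory. The sketch below shows one way such a shared buffer might behave. `SharedAudioBuffer`, its capacity, and the `writes_enabled` flag are hypothetical names chosen for illustration and do not come from the patent.

```python
from collections import deque
from typing import Deque, List


class SharedAudioBuffer:
    """Hypothetical fixed-size memory shared by the two detector stages."""

    def __init__(self, capacity_frames: int) -> None:
        # A bounded deque keeps only the most recent frames, so audio that
        # arrived just before the utterance stays available (claim 7).
        self._frames: Deque[int] = deque(maxlen=capacity_frames)
        self.writes_enabled = True

    def write(self, frame: int) -> bool:
        """First stage stores a frame (claim 4); returns False once ceased."""
        if not self.writes_enabled:
            return False
        self._frames.append(frame)
        return True

    def cease_writes(self) -> None:
        """Second stage's request to stop providing more audio (claim 9)."""
        self.writes_enabled = False

    def snapshot(self) -> List[int]:
        """Second stage reads the stored audio without consuming it."""
        return list(self._frames)


buffer = SharedAudioBuffer(capacity_frames=4)
for frame_id in range(6):          # frames 0..5; only the last four are kept
    buffer.write(frame_id)

initial_portion = buffer.snapshot()
buffer.cease_writes()
accepted = buffer.write(6)         # ignored after the cease request
```

The integer frame IDs stand in for real audio frames. In hardware the buffer would more likely be a ring buffer in memory shared between a DSP and the application processor, which is the split suggested by claims 2 and 11.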

Assignees

Google Inc

Inventors

Classifications

  • using artificial neural networks · CPC title

  • Execution procedure of a spoken command · CPC title

  • G10L17/10 (Primary)

    Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems · CPC title

  • Word spotting · CPC title

  • Speaker identification or verification techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9418656B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multi-stage hotword detection are disclosed. In one aspect, a method includes the actions of receiving, by a second stage hotword detector of a multi-stage hotword detection system that includes at least a first stage hotword detector and the second stage hotword detector, audio data that corr…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 16, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).