Wake word detection modeling

US11657804B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11657804-B2
Application number: US-202017090716-A
Country: US
Kind code: B2
Filing date: Nov 5, 2020
Priority date: Jun 20, 2014
Publication date: May 23, 2023
Grant date: May 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data; generate speech recognition result data using at least the first portion of the audio data; generate acoustic data representing an acoustic property of a voice represented by the audio data; generate feature data using the speech recognition result data and the acoustic data; generate wake word detection data using a statistical wake word detection model trained to receive the feature data as input; and determine, based on the wake word detection data, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word.
2. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to close an audio data stream from the computing device subsequent to determining that the audio data fails to satisfy the detection criterion.
3. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate detection score data using the statistical wake word detection model and the feature data; and determine that the detection score data fails to satisfy a detection threshold, wherein the detection score data failing to satisfy the detection threshold indicates the audio data fails to satisfy the detection criterion.
4. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to determine the acoustic property, wherein the acoustic property comprises at least one of: prosody, energy, or speaking rate.
5. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to generate environmental data representing a property of an environment in which the computing device is located.
6. The system of claim 5, wherein the environmental data represents at least one of: a noise level of the environment, an acoustic property of the environment, or a distance between a user and a microphone of the computing device.
7. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate a user-specific statistical wake word detection model based on at least one of: the speech recognition result data, the acoustic data, or the feature data; generate second feature data based at least partly on second audio data; and determine, using the user-specific statistical wake word detection model, that the second audio data satisfies the detection criterion.
8. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate a user-specific statistical wake word detection model based on at least one of: the speech recognition result data, the acoustic data, or the feature data; and send the user-specific statistical wake word detection model to the computing device.
9. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to receive identity data representing at least one of a start position or an end position of the first portion of audio data within the audio data.
10. The system of claim 1, wherein the statistical wake word detection model comprises a neural network.
11. A computer-implemented method comprising: as implemented by a computing system comprising one or more processors configured to execute specific instructions, receiving audio data subsequent to a wake word being detected in a first portion of the audio data; generating speech recognition result data using at least the first portion of the audio data; generating acoustic data representing an acoustic property of a voice represented by the audio data; generating feature data using the speech recognition result data and the acoustic data; generating wake word detection data using a statistical wake word detection model trained to receive the feature data as input; and determining, based on the wake word detection data, that the audio data satisfies a detection criterion related to detecting a representation of the wake word.
12. The computer-implemented method of claim 11, wherein the receiving the audio data comprises receiving the audio data over a network from a user computing device, wherein the audio data is received subsequent to the user computing device determining the wake word was detected in the first portion of the audio data.
13. The computer-implemented method of claim 12, further comprising receiving, over the network from the user computing device, identity data representing at least one of a start position or an end position of the first portion of audio data within the audio data.
14. The computer-implemented method of claim 11, wherein the receiving the audio data comprises receiving the audio data from a microphone of the computing system.
15. The computer-implemented method of claim 11, further comprising: generating natural language understanding data using the audio data, wherein the natural language understanding data represents an action requested in the audio data; and performing the action.
16. The computer-implemented method of claim 11, further comprising: generating detection score data using the statistical wake word detection model and the feature data; and determining that the detection score data satisfies a detection threshold, wherein the detection score data satisfying the detection threshold indicates the audio data satisfies the detection criterion.
17. The computer-implemented method of claim 11, further comprising generating a user-specific statistical wake word detection model based on at least one of the speech recognition result data, the acoustic data, or the feature data.
18. The computer-implemented method of claim 17, further comprising: generating second feature data based at least partly on second audio data; and determining, using the user-specific statistical wake word detection model, that the second audio data satisfies the detection criterion.
19. The computer-implemented method of claim 17, further comprising sending the user-specific statistical wake word detection model to a user computing device.
20. The computer-implemented method of claim 11, further comprising generating environmental data representing an environmental property of an environment in which a user computing device is located, wherein generating the feature data comprises using the environmental data.
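The scoring step recited in claims 3 and 16 (feature data in, detection score out, score compared against a detection threshold) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the patented implementation: a logistic regression stands in for the "statistical wake word detection model" (claim 10 notes the model may comprise a neural network), and the feature names, weights, bias, and threshold below are all hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AcousticData:
    """Acoustic properties of the voice. Claim 4 names prosody, energy, and
    speaking rate; the normalized [0, 1] fields here are hypothetical."""
    energy: float
    speaking_rate: float


def build_features(asr_confidence: float, acoustic: AcousticData) -> List[float]:
    """Combine speech recognition result data and acoustic data into feature data.

    `asr_confidence` is an assumed stand-in for the speech recognition result:
    a confidence that the first portion of the audio contains the wake word.
    """
    return [asr_confidence, acoustic.energy, acoustic.speaking_rate]


def detection_score(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Logistic regression used as a simple statistical detection model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def satisfies_criterion(score: float, threshold: float = 0.5) -> bool:
    """The detection criterion is met when the score reaches the threshold."""
    return score >= threshold


# Illustrative parameters only; a real model would be trained on labeled audio.
WEIGHTS = [4.0, 1.0, 0.5]
BIAS = -3.0

confirmed = build_features(0.95, AcousticData(energy=0.8, speaking_rate=0.6))
rejected = build_features(0.10, AcousticData(energy=0.2, speaking_rate=0.4))

for name, feats in [("confirmed", confirmed), ("rejected", rejected)]:
    score = detection_score(feats, WEIGHTS, BIAS)
    if satisfies_criterion(score):
        print(f"{name}: score={score:.2f}, wake word confirmed")
    else:
        # Claim 2: the audio data stream may be closed after a failed check.
        print(f"{name}: score={score:.2f}, closing audio stream")
```

In this sketch the server-side check acts as a second opinion on the device's initial detection: a high recognition confidence pushes the score over the threshold, while a low one leads to the stream being closed.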

Assignees

Inventors

Classifications

  • Speech classification or search · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • G10L15/18 · Primary

    using natural language modelling · CPC title

  • Word spotting · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657804B2 cover?
Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/18. Mapped technology areas include Physics.
When was this patent published?
Publication date May 23, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).