Wakeword and acoustic event detection

US11670299B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11670299-B2
Application number	US-202117321999-A
Country	US
Kind code	B2
Filing date	May 17, 2021
Priority date	Jun 26, 2019
Publication date	Jun 6, 2023
Grant date	Jun 6, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
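The post-processing the abstract describes can be sketched roughly as follows. This is an illustration only, not the patent's implementation: the logits, the two-frame smoothing window, the 0.8 spike threshold, and the 0.5 event threshold (standing in for the classifier) are all assumed values, and every function name is hypothetical.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash one raw score into (0, 1), independently of other scores."""
    return 1.0 / (1.0 + math.exp(-x))

def smooth(values, window):
    """Causal moving average over the most recent `window` values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def find_spikes(values, threshold):
    """Indices of interior local maxima that meet or exceed `threshold`."""
    spikes = []
    for i in range(1, len(values) - 1):
        is_peak = values[i - 1] < values[i] and values[i] >= values[i + 1]
        if is_peak and values[i] >= threshold:
            spikes.append(i)
    return spikes

# Assumed per-frame logits from a wakeword model: [background, wakeword].
wakeword_logits = [[2.0, -2.0], [1.0, 0.0], [-1.0, 2.0],
                   [-2.0, 3.0], [0.0, 1.0], [2.0, -2.0]]

# Wakeword branch: softmax, then smoothing, then spike detection.
wakeword_probs = [softmax(frame)[1] for frame in wakeword_logits]
smoothed = smooth(wakeword_probs, window=2)
spike_frames = find_spikes(smoothed, threshold=0.8)

# Acoustic-event branch: one sigmoid per event class, then a simple
# threshold as a stand-in for the classifier (an assumption).
event_logits = [-3.0, 0.5, 4.0]
event_probs = [sigmoid(x) for x in event_logits]
event_detected = [p >= 0.5 for p in event_probs]
```

Softmax suits the wakeword branch because its classes compete for one label per frame, while per-class sigmoids let several acoustic events be scored independently in the same frame.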

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method, comprising: determining a feature vector representing at least one frame of audio data; determining, using a first model and the feature vector, first output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a word; and determining, using a second model different from the first model and the feature vector, second output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a non-speech acoustic event, wherein determination of the second output data is performed independently of the first output data.
2. The computer-implemented method of claim 1, further comprising: processing the first output data using a normalization component to determine first probability data.
3. The computer-implemented method of claim 1, further comprising: processing the second output data using at least one activation function component to determine the second output data.
4. The computer-implemented method of claim 3, further comprising: processing the second output data using a classifier to detect an occurrence of the non-speech acoustic event.
5. The computer-implemented method of claim 1, wherein the non-speech acoustic event comprises a non-speech sound made by a human.
6. The computer-implemented method of claim 1, wherein the first output data corresponds to a likelihood that the at least one frame includes a representation of at least part of a first wakeword.
7. The computer-implemented method of claim 6, further comprising: determining, using the feature vector, third output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a second wakeword.
8. The computer-implemented method of claim 1, further comprising: receiving the at least one frame of audio data; and processing the at least one frame of audio data using a feature-extraction model to determine the feature vector, the feature-extraction model configured to determine feature output data operable by both the first model and the second model, wherein determining the first output data comprises processing the feature vector using the first model, and wherein determining the second output data comprises processing the feature vector using the second model.
9. The computer-implemented method of claim 1, wherein the feature vector represents acoustic feature data and the method further comprises: processing the feature vector using a feature-extraction model to determine a second feature vector, the feature-extraction model configured to determine feature output data operable by both the first model and the second model, wherein determining the first output data comprises processing the second feature vector using the first model, and wherein determining the second output data comprises processing the second feature vector using the second model.
10. The computer-implemented method of claim 1, wherein: determining the first output data comprises: processing the feature vector using a feature extraction component to determine first feature data, and processing the first feature data using the first model to determine the first output data; and determining the second output data comprises: processing the feature vector using the feature extraction component to determine second feature data, and processing the second feature data using the second model to determine the second output data.
11. The computer-implemented method of claim 1, wherein: the feature vector represents acoustic feature data; the first model comprises a feature extraction component; determining the first output data comprises: processing the feature vector using the first model to determine a second feature vector, and using the second feature vector to determine the first output data; and determining the second output data comprises using the second feature vector and the second model to determine the second output data.
12. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: determine a feature vector representing at least one frame of audio data; determine, using a first model and the feature vector, first output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a wakeword; and determine, using a second model different from the first model and the feature vector, second output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a non-speech acoustic event, wherein determination of the second output data is performed independently of the first output data.
13. The system of claim 12, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the first output data using a normalization component to determine first probability data.
14. The system of claim 12, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the second output data using at least one activation function component to determine the second output data.
15. The system of claim 14, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the second output data using a classifier to detect an occurrence of the non-speech acoustic event.
16. The system of claim 12, wherein the non-speech acoustic event comprises a non-speech sound made by a human.
17. The system of claim 12, wherein the first output data corresponds to a likelihood that the at least one frame includes a representation of at least part of a first wakeword.
18. The system of claim 17, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using the feature vector, third output data corresponding to a likelihood that the at least one frame includes a representation of at least part of a second wakeword.
19. The system of claim 12, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: receive the at least one frame of audio data; and process the at least one frame of audio data using a feature-extraction model to determine the feature vector, the feature-extraction model configured to determine feature output data operable by both the first model and the second model, wherein the instructions that cause the system to determine the first output data comprise instructions that, when executed by the at least one processor, further cause the system to process the feature vector using the first model, and wherein the instructions that cause the system to determine the second output data comprise instructions that, when executed by the at least one processor, further cause the system to process the feature vector using the second model.
20. The system of claim 12, wherein the feature vector represents acoustic feature data and wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the feature v
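As a rough illustration of the data flow in claims 1 and 9, the sketch below passes one feature vector through a shared feature extractor and then through two separate models, with the second output computed without reference to the first. The layer shapes, weights, input values, and function names are all hypothetical; the claims do not specify any of them.

```python
def linear(weights, biases, features):
    """One output per weight row: dot(row, features) + bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Clamp negative values to zero."""
    return [max(0.0, v) for v in values]

def shared_extractor(feature_vector):
    """Hypothetical feature-extraction model whose output both heads can use."""
    weights = [[0.5, -0.5, 0.0],
               [0.0, 1.0, -1.0]]
    return relu(linear(weights, [0.0, 0.0], feature_vector))

def wakeword_model(features):
    """Hypothetical first model: a raw score for 'part of a wakeword'."""
    return linear([[1.0, -1.0]], [0.0], features)[0]

def event_model(features):
    """Hypothetical second model: a raw score for a non-speech acoustic event."""
    return linear([[-0.5, 2.0]], [0.1], features)[0]

# Assumed acoustic feature vector for one frame (for example, three LFBE bins).
feature_vector = [1.0, 0.2, -0.3]

# Claim 9 style: derive a second feature vector that both models consume.
second_vector = shared_extractor(feature_vector)

# Each model reads only the shared features; neither reads the other's output.
first_output = wakeword_model(second_vector)
second_output = event_model(second_vector)
```

Because `event_model` takes only `second_vector` as input, its result is the same whether or not `wakeword_model` has run, which is one way to read "performed independently of the first output data".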

Assignees

Amazon Tech Inc

Inventors

Classifications

G10L15/16 (Primary) · using artificial neural networks
G10L15/22 (Primary) · Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L25/30 · using neural networks
G10L25/51 · for comparison or discrimination
G10L2015/088 · Word spotting

Patent family

Related publications grouped by family.

View patent family 76441885

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11670299B2 cover?: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, t…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 6, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).