What technology area does this patent fall under?

Primary CPC classification G10L25/51. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 17, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Acoustic event detection

US11790932B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11790932-B2
Application number	US-202117547644-A
Country	US
Kind code	B2
Filing date	Dec 10, 2021
Priority date	Dec 10, 2021
Publication date	Oct 17, 2023
Grant date	Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
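
The data flow in the abstract, with two detectors reading the same feature data and their results merged, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the function names, the frame-averaging stand-in for each detector's encoder, the linear classifier, and the thresholds `min_prob` and `min_similarity`.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def detect_predetermined(features, weights, labels, min_prob=0.5):
    """Classifier-style detector: return the most likely built-in event, or None."""
    pooled = features.mean(axis=0)  # stand-in for an encoder: average over frames
    probs = softmax(weights @ pooled)
    best = int(np.argmax(probs))
    return labels[best] if probs[best] >= min_prob else None


def detect_custom(features, profiles, min_similarity=0.8):
    """Comparison-style detector: return names of stored profiles that match."""
    pooled = features.mean(axis=0)  # same stand-in encoder (an assumption)
    matched = []
    for name, reference in profiles.items():
        similarity = pooled @ reference / (
            np.linalg.norm(pooled) * np.linalg.norm(reference)
        )
        if similarity >= min_similarity:
            matched.append(name)
    return matched


def detect_events(features, weights, labels, profiles):
    """Run both detectors on the same feature data and merge their results."""
    events = []
    builtin = detect_predetermined(features, weights, labels)
    if builtin is not None:
        events.append(builtin)
    events.extend(detect_custom(features, profiles))
    return events


# Toy inputs: 2 frames x 2 feature dimensions (hypothetical values).
features = np.array([[1.0, 0.0], [1.0, 0.0]])
weights = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = ["glass_break", "dog_bark"]
profiles = {"doorbell": np.array([1.0, 0.1])}
print(detect_events(features, weights, labels, profiles))
```

The sketch reports an event if either detector fires, which mirrors the "and/or" wording of the abstract. A real system would likely use a separate learned encoder for each detector rather than a shared average.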

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a device associated with a user profile, first audio data including a plurality of audio frames; determining, using first audio frames of the plurality of audio frames, first feature data representing log Mel-filterbank energy features; processing the first feature data using a first convolutional recurrent neural network (CRNN) to determine first encoded representation data, the first CRNN configured as an encoder associated with a first acoustic event detector to detect an acoustic event from a predetermined set of acoustic events; processing the first feature data using a second CRNN to determine second encoded representation data, the second CRNN configured as an encoder associated with a second acoustic event detector different from the first acoustic event detector, the second acoustic event detector configured to detect an acoustic event from a custom set of acoustic events associated with the user profile; determining, using the first encoded representation data and the first acoustic event detector, a likelihood that a first acoustic event from the predetermined set of acoustic events is represented in the first audio frames; determining, using the second encoded representation data and the second acoustic event detector, comparison data representing that a second acoustic event from the custom set of acoustic events is represented in the first audio frames; and determining, based at least in part on the likelihood and the comparison data, output data indicating that at least one of the first acoustic event or the second acoustic event occurred.

2. The computer-implemented method of claim 1, wherein determining the likelihood that the first acoustic event is represented in the first audio frames comprises: processing the first encoded representation data using a classifier of the first acoustic event detector, the classifier configured to detect occurrence of one or more of the predetermined set of acoustic events; determining, based on processing by the classifier, the likelihood that the first acoustic event occurred; and determining, based on the likelihood, that the first acoustic event is represented in the first audio frames.

3. The computer-implemented method of claim 1, wherein determining the comparison data representing that the second acoustic event is represented in the first audio frames comprises: using a comparison component of the second acoustic event detector to process the second encoded representation data with respect to stored custom event profile data associated with the second acoustic event and the user profile; determining the comparison data representing a cosine similarity between the second encoded representation data and the stored custom event profile data; and determining, based on the comparison data satisfying a threshold associated with the stored custom event profile data, that the second acoustic event is represented in the first audio frames.

4. The computer-implemented method of claim 3, further comprising, prior to receiving the first audio data: receiving second audio data representing occurrence of the second acoustic event; determining, using the second CRNN and the second audio data, third encoded representation data; receiving third audio data representing occurrence of the second acoustic event; determining, using the second CRNN and the third audio data, fourth encoded representation data; determining, using the third encoded representation data and the fourth encoded representation data, the stored custom event profile data corresponding to the second acoustic event; and determining, using the third encoded representation data and the fourth encoded representation data, the threshold corresponding to detection of the second acoustic event.

5. A computer-implemented method comprising: receiving, by a device, first audio data; determining, using the first audio data, first acoustic feature data; determining, by processing the first acoustic feature data using a first acoustic event detection (AED) component configured to detect occurrence of one or more acoustic events from a predetermined set of acoustic events, first event detection data representing a likelihood that at least one acoustic event from the predetermined set of acoustic events is represented in the first audio data, wherein the first AED component is a classifier-based AED component; determining, by processing the first acoustic feature data using a second AED component configured to detect occurrence of one or more acoustic events from a custom set of acoustic events associated with the device, second event detection data based at least in part on a comparison of the first acoustic feature data with stored event data representing the custom set of acoustic events, wherein the second AED component is a comparison-based AED component; determining, based at least in part on the first event detection data and the second event detection data, that at least one of a first acoustic event from the predetermined set of acoustic events or a second acoustic event from the custom set of acoustic events is represented in the first audio data; and determining output data indicating that at least one of the first acoustic event or the second acoustic event occurred.

6. The computer-implemented method of claim 5, wherein processing the first acoustic feature data using the first AED component comprises: processing the first acoustic feature data using a convolutional recurrent neural network (CRNN) to determine encoded representation data, wherein the CRNN is configured as an encoder associated with the first AED component to detect an acoustic event from the predetermined set of acoustic events; processing the encoded representation data using a classifier of the first AED component configured to detect occurrence of one or more of the predetermined set of acoustic events; and determining, based on processing by the classifier, that the first acoustic event is represented in the first audio data.

7. The computer-implemented method of claim 6, further comprising: determining, using the first acoustic feature data and a feature normalization component associated with the first AED component, normalized feature data, wherein the feature normalization component is configured using audio samples corresponding to the predetermined set of acoustic events; and processing the normalized feature data using the CRNN.

8. The computer-implemented method of claim 5, wherein processing the first acoustic feature data using the second AED component comprises: processing the first acoustic feature data using a CRNN to determine first encoded representation data, wherein the CRNN is configured as an encoder associated with the second AED component to detect an acoustic event from the custom set of acoustic events; processing the first encoded representation data with respect to stored custom event profile data associated with a user profile associated with the device; and determining, based on processing the first encoded representation data with respect to stored custom event profile data, that the second acoustic event is represented in the first audio data.

9. The computer-implemented method of claim 8, further comprising: determining, using the first acoustic feature data and a feature normalization component associated with the second AED component, normalized feature data, wherein the feature normalization component is configured using audio samples corresponding to a plurality of acoustic events; and processing the normalized feature data using the CRNN.

10. The computer-implemented method of
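
Claims 3 and 4 describe matching a custom event by cosine similarity against a stored profile, with both the profile and its threshold derived from two enrollment recordings. The claims do not say how the profile or threshold is computed, so the sketch below makes up a simple rule for illustration: the profile is the mean of the enrollment embeddings, and the threshold is the lowest enrollment similarity minus a hypothetical `margin`. The function names and toy vectors are likewise assumptions, and the embeddings stand in for encoder outputs.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def enroll(embeddings, margin=0.05):
    """Derive a profile vector and a detection threshold from enrollment embeddings.

    Assumed rule (not from the patent): profile = mean embedding,
    threshold = lowest enrollment-to-profile similarity minus `margin`.
    """
    stacked = np.stack(embeddings)
    profile = stacked.mean(axis=0)
    lowest = min(cosine_similarity(e, profile) for e in stacked)
    return profile, lowest - margin


def matches(embedding, profile, threshold):
    """True when the similarity to the stored profile satisfies the threshold."""
    return cosine_similarity(embedding, profile) >= threshold


# Two enrollment embeddings of the same custom event (hypothetical values).
first = np.array([1.0, 0.2])
second = np.array([1.0, -0.2])
profile, threshold = enroll([first, second])

print(matches(np.array([1.0, 0.1]), profile, threshold))  # close to the profile
print(matches(np.array([0.0, 1.0]), profile, threshold))  # orthogonal to the profile
```

Tying the threshold to the enrollment examples means a consistently recorded event gets a stricter threshold than a noisy one. This is one plausible reading of "the threshold corresponding to detection of the second acoustic event", not a statement of the patented method.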

Assignees

Amazon Tech Inc

Classifications

CPC code · CPC title
G10L25/51 (Primary) · for comparison or discrimination
G06N3/045 · Combinations of networks
G06N3/08 · Learning methods
G10L25/21 · the extracted parameters being power information
G10L25/30 · using neural networks

Patent family

Related publications grouped by family.

View patent family 84887714

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11790932B2 cover?: A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data …
Who is the assignee on this patent?: Amazon Tech Inc