US11790932B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11790932-B2 |
| Application number | US-202117547644-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 10, 2021 |
| Priority date | Dec 10, 2021 |
| Publication date | Oct 17, 2023 |
| Grant date | Oct 17, 2023 |
Official abstract text for this publication.
A system may include a first acoustic event detection (AED) component configured to detect a predetermined set of acoustic events, and include a second AED component configured to detect custom acoustic events that a user configures a device to detect. The first and second AED components are configured to perform task-specific processing, and may receive as input the same acoustic feature data corresponding to audio data that potentially represents occurrence of one or more events. Based on processing by the first and second AED components, a device may output data indicating that one or more acoustic events occurred, where the acoustic events may be a predetermined acoustic event and/or a custom acoustic event.
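The arrangement in the abstract, where two detectors receive the same acoustic feature data and their results are merged into one output, can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Detection` and `detect_events`, the stub detectors, the event labels, and both threshold values are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: a list of frames, each a list of feature values.
Features = Sequence[Sequence[float]]


@dataclass(frozen=True)
class Detection:
    event: str
    score: float
    source: str  # "predetermined" or "custom"


def detect_events(
    features: Features,
    predetermined_aed: Callable[[Features], Dict[str, float]],
    custom_aed: Callable[[Features], Dict[str, float]],
    likelihood_threshold: float = 0.5,  # assumed value, not from the patent
    similarity_threshold: float = 0.8,  # assumed value, not from the patent
) -> List[Detection]:
    """Run both detectors on the same features and merge what they report."""
    detections: List[Detection] = []
    # First AED component: scores for the predetermined event set.
    for event, likelihood in predetermined_aed(features).items():
        if likelihood >= likelihood_threshold:
            detections.append(Detection(event, likelihood, "predetermined"))
    # Second AED component: scores for the user-configured event set.
    for event, similarity in custom_aed(features).items():
        if similarity >= similarity_threshold:
            detections.append(Detection(event, similarity, "custom"))
    return detections


# Stub detectors standing in for the real components (hypothetical scores).
def stub_predetermined(features: Features) -> Dict[str, float]:
    return {"glass_break": 0.91, "dog_bark": 0.12}


def stub_custom(features: Features) -> Dict[str, float]:
    return {"my_doorbell": 0.86}


frames = [[0.0] * 4 for _ in range(3)]  # placeholder feature data
result = detect_events(frames, stub_predetermined, stub_custom)
for detection in result:
    print(detection.source, detection.event, detection.score)
```

Both stubs receive the same `frames` object, mirroring the shared-input arrangement the abstract describes. A real system would replace the stubs with trained models.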
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, by a device associated with a user profile, first audio data including a plurality of audio frames; determining, using first audio frames of the plurality of audio frames, first feature data representing log Mel-filterbank energy features; processing the first feature data using a first convolutional recurrent neural network (CRNN) to determine first encoded representation data, the first CRNN configured as an encoder associated with a first acoustic event detector to detect an acoustic event from a predetermined set of acoustic events; processing the first feature data using a second CRNN to determine second encoded representation data, the second CRNN configured as an encoder associated with a second acoustic event detector different from the first acoustic event detector, the second acoustic event detector configured to detect an acoustic event from a custom set of acoustic events associated with the user profile; determining, using the first encoded representation data and the first acoustic event detector, a likelihood that a first acoustic event from the predetermined set of acoustic events is represented in the first audio frames; determining, using the second encoded representation data and the second acoustic event detector, comparison data representing that a second acoustic event from the custom set of acoustic events is represented in the first audio frames; and determining, based at least in part on the likelihood and the comparison data, output data indicating that at least one of the first acoustic event or the second acoustic event occurred.

2. The computer-implemented method of claim 1, wherein determining the likelihood that the first acoustic event is represented in the first audio frames comprises: processing the first encoded representation data using a classifier of the first acoustic event detector, the classifier configured to detect occurrence of one or more of the predetermined set of acoustic events; determining, based on processing by the classifier, the likelihood that the first acoustic event occurred; and determining, based on the likelihood, that the first acoustic event is represented in the first audio frames.

3. The computer-implemented method of claim 1, wherein determining the comparison data representing that the second acoustic event is represented in the first audio frames comprises: using a comparison component of the second acoustic event detector to process the second encoded representation data with respect to stored custom event profile data associated with the second acoustic event and the user profile; determining the comparison data representing a cosine similarity between the second encoded representation data and the stored custom event profile data; and determining, based on the comparison data satisfying a threshold associated with the stored custom event profile data, that the second acoustic event is represented in the first audio frames.

4. The computer-implemented method of claim 3, further comprising, prior to receiving the first audio data: receiving second audio data representing occurrence of the second acoustic event; determining, using the second CRNN and the second audio data, third encoded representation data; receiving third audio data representing occurrence of the second acoustic event; determining, using the second CRNN and the third audio data, fourth encoded representation data; determining, using the third encoded representation data and the fourth encoded representation data, the stored custom event profile data corresponding to the second acoustic event; and determining, using the third encoded representation data and the fourth encoded representation data, the threshold corresponding to detection of the second acoustic event.

5. A computer-implemented method comprising: receiving, by a device, first audio data; determining, using the first audio data, first acoustic feature data; determining, by processing the first acoustic feature data using a first acoustic event detection (AED) component configured to detect occurrence of one or more acoustic events from a predetermined set of acoustic events, first event detection data representing a likelihood that at least one acoustic event from the predetermined set of acoustic events is represented in the first audio data, wherein the first AED component is a classifier-based AED component; determining, by processing the first acoustic feature data using a second AED component configured to detect occurrence of one or more acoustic events from a custom set of acoustic events associated with the device, second event detection data based at least in part on a comparison of the first acoustic feature data with stored event data representing the custom set of acoustic events, wherein the second AED component is a comparison-based AED component; determining, based at least in part on the first event detection data and the second event detection data, that at least one of a first acoustic event from the predetermined set of acoustic events or a second acoustic event from the custom set of acoustic events is represented in the first audio data; and determining output data indicating that at least one of the first acoustic event or the second acoustic event occurred.

6. The computer-implemented method of claim 5, wherein processing the first acoustic feature data using the first AED component comprises: processing the first acoustic feature data using a convolutional recurrent neural network (CRNN) to determine encoded representation data, wherein the CRNN is configured as an encoder associated with the first AED component to detect an acoustic event from the predetermined set of acoustic events; processing the encoded representation data using a classifier of the first AED component configured to detect occurrence of one or more of the predetermined set of acoustic events; and determining, based on processing by the classifier, that the first acoustic event is represented in the first audio data.

7. The computer-implemented method of claim 6, further comprising: determining, using the first acoustic feature data and a feature normalization component associated with the first AED component, normalized feature data, wherein the feature normalization component is configured using audio samples corresponding to the predetermined set of acoustic events; and processing the normalized feature data using the CRNN.

8. The computer-implemented method of claim 5, wherein processing the first acoustic feature data using the second AED component comprises: processing the first acoustic feature data using a CRNN to determine first encoded representation data, wherein the CRNN is configured as an encoder associated with the second AED component to detect an acoustic event from the custom set of acoustic events; processing the first encoded representation data with respect to stored custom event profile data associated with a user profile associated with the device; and determining, based on processing the first encoded representation data with respect to stored custom event profile data, that the second acoustic event is represented in the first audio data.

9. The computer-implemented method of claim 8, further comprising: determining, using the first acoustic feature data and a feature normalization component associated with the second AED component, normalized feature data, wherein the feature normalization component is configured using audio samples corresponding to a plurality of acoustic events; and processing the normalized feature data using the CRNN.

10. The computer-implemented method of
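Claims 3 and 4 describe comparing an encoded representation against stored custom event profile data by cosine similarity, with the profile and its threshold both derived from two enrollment recordings. The claims do not say how either is computed, so the sketch below makes loudly labeled assumptions: the profile is the element-wise mean of the enrollment embeddings, and the threshold is the lowest enrollment-to-profile similarity minus a `margin`. The function names and the example vectors are hypothetical, and the CRNN encoder itself is not modeled; plain lists stand in for its output.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def enroll(embeddings: Sequence[Vector], margin: float = 0.1) -> Tuple[List[float], float]:
    """Derive a profile and a detection threshold from enrollment embeddings.

    Assumptions (not from the patent): profile = element-wise mean;
    threshold = lowest similarity of any enrollment embedding to the
    profile, minus `margin`.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two enrollment embeddings")
    dim = len(embeddings[0])
    profile = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    threshold = min(cosine_similarity(e, profile) for e in embeddings) - margin
    return profile, threshold


def matches(embedding: Vector, profile: Vector, threshold: float) -> bool:
    """True when the similarity to the stored profile satisfies the threshold."""
    return cosine_similarity(embedding, profile) >= threshold


# Hypothetical stand-ins for the third and fourth encoded representations.
third = [1.0, 0.0, 0.2]
fourth = [0.9, 0.1, 0.3]
profile, threshold = enroll([third, fourth])

print(matches([0.95, 0.05, 0.25], profile, threshold))  # close to the enrolled sound
print(matches([0.0, 1.0, 0.0], profile, threshold))     # an unrelated sound
```

Deriving the threshold from the enrollment samples themselves keeps it specific to each custom event, which is consistent with claim 3's "threshold associated with the stored custom event profile data". The particular formula here is only one plausible choice.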