Audio event detection

US10803885B1 · US · B1

Patent metadata
Publication number: US-10803885-B1
Application number: US-201816023923-A
Country: US
Kind code: B1
Filing date: Jun 29, 2018
Priority date: Jun 29, 2018
Publication date: Oct 13, 2020
Grant date: Oct 13, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An audio event detection system that processes audio data into audio feature data and processes the audio feature data using pre-configured candidate interval lengths to identify top candidate regions of the feature data that may include an audio event. The feature data from the top candidate regions are then scored by a classifier, where the score indicates a likelihood that the candidate region corresponds to a desired audio event. The scores are compared to a threshold, and if the threshold is satisfied, the top scoring candidate region is determined to include an audio event.
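The abstract's flow (pre-configured candidate interval lengths, top candidate regions, classifier scoring, threshold test) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The feature data is assumed to be a list of per-frame vectors. The interval lengths, hop size, `top_k`, and threshold are made-up values. Both scoring functions are hypothetical placeholders: `mean_energy` ranks candidates, and `active_fraction` stands in for the trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]   # one feature vector per audio frame (assumed layout)
Region = Sequence[Frame]  # a contiguous run of frames
ScoreFn = Callable[[Region], float]


@dataclass
class Candidate:
    start: int             # first frame index (inclusive)
    end: int               # last frame index (exclusive)
    proposal_score: float  # ranking score used to pick the top candidates


def propose_candidates(features: Sequence[Frame], interval_lengths: Sequence[int],
                       hop: int, proposal_fn: ScoreFn, top_k: int) -> List[Candidate]:
    """Slide every pre-configured interval length over the features and keep the top_k regions."""
    candidates = []
    for length in interval_lengths:
        for start in range(0, len(features) - length + 1, hop):
            region = features[start:start + length]
            candidates.append(Candidate(start, start + length, proposal_fn(region)))
    # Python's sort is stable, so tied candidates keep their generation order.
    candidates.sort(key=lambda c: c.proposal_score, reverse=True)
    return candidates[:top_k]


def detect_event(features, interval_lengths, hop, proposal_fn, classifier_fn,
                 threshold, top_k=3) -> Optional[Tuple[int, int, float]]:
    """Score the top candidates with the classifier; report the best one only if it meets the threshold."""
    best: Optional[Tuple[int, int, float]] = None
    for cand in propose_candidates(features, interval_lengths, hop, proposal_fn, top_k):
        score = classifier_fn(features[cand.start:cand.end])
        if best is None or score > best[2]:
            best = (cand.start, cand.end, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Hypothetical stand-ins for the trained models.
def mean_energy(region: Region) -> float:
    return sum(sum(frame) for frame in region) / len(region)


def active_fraction(region: Region) -> float:
    return sum(1 for frame in region if sum(frame) > 1.0) / len(region)


# Usage: 100 frames of silence with an "event" in frames 40-59.
features = [[0.0, 0.0]] * 40 + [[1.0, 1.0]] * 20 + [[0.0, 0.0]] * 40
print(detect_event(features, (20, 40), 10, mean_energy, active_fraction, threshold=0.8))
# (40, 60, 1.0)
```

Ranking all windows and truncating to `top_k` means the classifier only runs on a handful of regions rather than on every window position.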

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting an audio event, the method comprising: receiving audio data; processing the audio data using a recurrent trained model to determine audio feature data; determining a first portion of the audio feature data corresponding to a first time window, wherein the first time window corresponds to a first pre-configured length of time; processing the first portion using a second trained model to determine a first score and an adjusted first time window; determining a second portion of the audio feature data corresponding to a second time window, wherein the second time window: corresponds to a second pre-configured length of time longer than the first pre-configured length of time, and includes, and is longer than, the first time window; processing the second portion using the second trained model to determine a second score and an adjusted second time window; determining an adjusted first portion of the audio feature data corresponding to the adjusted first time window; processing the adjusted first portion using a third trained model to determine a third score corresponding to a likelihood that an audio event is represented in the adjusted first portion; determining an adjusted second portion of the audio feature data corresponding to the adjusted second time window; processing the adjusted second portion using the third trained model to determine a fourth score corresponding to a likelihood that the audio event is represented in the adjusted second portion; determining the third score is higher than the fourth score; and storing an indication that the audio event occurred during the adjusted first time window.

2. The method of claim 1, further comprising: outputting, by the second trained model, first indicator data corresponding to the adjusted first time window; processing, by the third trained model, the first indicator data to determine the adjusted first portion; processing, by the third trained model, the adjusted first portion to determine a feature vector having a pre-established length; and processing the feature vector using at least one dense layer to determine the third score.

3. The method of claim 1, further comprising: determining a third portion of the audio feature data corresponding to a third time window, wherein the third time window: is different from the first time window, and corresponds to the first pre-configured length of time; processing the third portion using the second trained model to determine a fifth score and an adjusted third time window; determining a fourth portion of the audio feature data corresponding to a fourth time window, wherein the fourth time window: corresponds to the second pre-configured length of time, and is longer than the third time window; and processing the fourth portion using the second trained model to determine a fifth score and an adjusted fourth time window.

4. A method comprising: receiving audio data; processing the audio data using a recurrent trained model to determine audio feature data; determining a first portion of the audio feature data corresponding to a first time window; processing the first portion using a first model to determine a first score and an adjusted first time window; determining an adjusted first portion of the audio feature data corresponding to the adjusted first time window; and processing the adjusted first portion using a second model to determine a second score corresponding to a likelihood that an audio event is represented in the adjusted first portion.

5. The method of claim 4, wherein: the first model is configured to determine respective scores corresponding to segments of feature data for a plurality of pre-configured lengths of time including at least a first length of time and a second length of time; and the first time window corresponds to the first length of time.

6. The method of claim 4, wherein the first model comprises at least a first layer configured to determine a plurality of values corresponding to the first portion of the audio feature data and a second layer configured to output the first score.

7. The method of claim 5, further comprising: determining a second portion of the audio feature data corresponding to a second time window, wherein the second time window: corresponds to the second length of time, and includes, and is longer than, the first time window; processing the second portion using the first model to determine a third score and an adjusted second time window; and determining that the third score is less than the first score.

8. The method of claim 5, further comprising: determining a second portion of the audio feature data corresponding to a second time window, wherein the second time window: is different from the first time window, and corresponds to the first length of time; processing the second portion using the first model to determine a third score and an adjusted second time window; determining a third portion of the audio feature data corresponding to a third time window, wherein the third time window: corresponds to the second length of time, and includes, and is longer than, the second time window; and processing the third portion using the first model to determine a fourth score and an adjusted third time window.

9. The method of claim 8, further comprising: determining an adjusted second portion of the audio feature data corresponding to the adjusted second time window; processing the adjusted second portion using the second model to determine a fifth score corresponding to a likelihood that the audio event is represented in the adjusted second time window; determining an adjusted third portion of the audio feature data corresponding to the adjusted third time window; processing the adjusted third portion using the second model to determine a sixth score corresponding to a likelihood that the audio event is represented in the adjusted third time window; and determining that the second score is greater than the fifth score and the sixth score.

10. The method of claim 4, further comprising: receiving log filter bank energy data representing the audio data; and processing the log filter bank energy data using the recurrent trained model to determine the audio feature data.

11. The method of claim 4, wherein determining the first portion of the audio feature data comprises: determining a first plurality of audio feature data values corresponding to a center of the first time window; determining a second plurality of audio feature data values, the second plurality corresponding to a first portion of the first time window prior to the center; determining a third plurality of audio feature data values, the second plurality corresponding to a second portion of the first time window subsequent to the center; and including, in the first portion of audio feature data, the first plurality of audio feature data values, the second plurality of audio feature data values, and the third plurality of audio feature data values.

12. The method of claim 4, wherein the second model comprises a classifier including at least a first layer configured to combine the adjusted first portion of the audio feature data and a second layer, subsequent to the first layer, configured to output the second score.

13. The method of claim 4, further comprising: determining the second score is above a threshold; and causing an action to be performed in response to the second score being above the threshold.

14. The method of claim 13, wherein the action comprises storing an indicati…
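Claims 1, 2, and 11 describe a two-stage flow. Nested windows of two pre-configured lengths are built around a center frame. A proposal model (the "second trained model" of claim 1) returns a score and an adjusted window for each. A classifier then maps each adjusted portion to a fixed-length vector, scores it with a dense layer, and the higher-scoring adjusted window is kept. The Python sketch below illustrates that flow under explicit assumptions, not as the patented implementation. The proposal model's boundary offsets are hard-coded hypothetical values rather than learned outputs. Segment-average pooling is an assumed way to obtain "a feature vector having a pre-established length". The dense-layer weights and bias are made up.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[int, int]  # (start frame inclusive, end frame exclusive)


def centered_window(center: int, length: int, n_frames: int) -> Window:
    """Frames before the center, the center frame, and frames after it (cf. claim 11), clipped to the data."""
    half = length // 2
    return max(0, center - half), min(n_frames, center + half)


def adjust_window(window: Window, offsets: Tuple[int, int], n_frames: int) -> Window:
    """Apply start/end offsets, standing in for the proposal model's boundary output (hypothetical)."""
    start = min(max(0, window[0] + offsets[0]), n_frames - 1)
    end = min(n_frames, window[1] + offsets[1])
    return start, max(start + 1, end)  # keep at least one frame


def pool_fixed_length(portion: Sequence[Sequence[float]], bins: int) -> List[float]:
    """Average-pool a variable-length portion into `bins` segments per dimension (assumed pooling scheme)."""
    n, dim = len(portion), len(portion[0])
    vector: List[float] = []
    for b in range(bins):
        lo = b * n // bins
        hi = max(lo + 1, (b + 1) * n // bins)  # never an empty segment
        segment = portion[lo:hi]
        vector.extend(sum(frame[d] for frame in segment) / len(segment) for d in range(dim))
    return vector


def dense_score(vector: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One dense layer with a sigmoid output, standing in for the trained classifier head (claim 2)."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def best_nested_window(features, center, lengths, offsets, bins, weights, bias):
    """Score each adjusted window; return (best window, all scores), mirroring claim 1's final comparison."""
    n = len(features)
    scored = []
    for length, offset in zip(lengths, offsets):
        start, end = adjust_window(centered_window(center, length, n), offset, n)
        vector = pool_fixed_length(features[start:end], bins)
        scored.append(((start, end), dense_score(vector, weights, bias)))
    best = max(scored, key=lambda item: item[1])
    return best[0], [score for _, score in scored]


# Usage: a 10-frame event (frames 45-54) in 100 one-dimensional frames.
features = [[1.0] if 45 <= i < 55 else [0.0] for i in range(100)]
window, scores = best_nested_window(
    features, center=50, lengths=(20, 40), offsets=((5, -5), (5, -5)),
    bins=4, weights=[1.0] * 4, bias=-2.0)
print(window)  # (45, 55)
```

In this example the shorter adjusted window wins because pooling the longer window dilutes the event frames with silence, which is the situation claim 1's comparison of the third and fourth scores resolves.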

Assignees

Amazon Tech Inc

Classifications (CPC group titles)

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Convolutional networks [CNN, ConvNet]
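The claims only say that a "recurrent trained model" turns the audio data (log filter bank energy data in claim 10) into audio feature data, while the CPC tags above mention gated recurrent networks such as LSTM and GRU. The Python sketch below assumes a single-layer GRU to show what that step produces: one feature vector per input frame. It uses one standard GRU formulation with biases omitted. The weights are random and untrained, and the `TinyGRU` class, its dimensions, and the input energies are all hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Single-layer GRU with random (untrained) weights; a hypothetical stand-in for a trained recurrent model."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for the update gate z, reset gate r, and candidate n.
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}

    def step(self, x: Sequence[float], h: List[float]) -> List[float]:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # The new state interpolates between the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, ni, hi in zip(z, n, h)]

    def run(self, frames: Sequence[Sequence[float]]) -> List[List[float]]:
        """Return one hidden-state vector (the "audio feature data") per input frame."""
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


# Usage: hypothetical log filter bank energies, 5 frames x 3 bands.
lfbe = [[-2.0, -1.5, -3.0], [0.5, 1.0, 0.2], [1.2, 0.8, 0.9], [-0.3, -1.0, -2.2], [-2.5, -2.0, -3.1]]
features = TinyGRU(input_dim=3, hidden_dim=4).run(lfbe)
print(len(features), len(features[0]))  # 5 4
```

Because each state depends on all earlier frames, the per-frame features carry temporal context into the later windowing and scoring stages.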

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10803885B1 cover?
An audio event detection system that processes audio data into audio feature data and processes the audio feature data using pre-configured candidate interval lengths to identify top candidate regions of the feature data that may include an audio event. The feature data from the top candidate regions are then scored by a classifier, where the score indicates a likelihood that the candidate regi…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 13, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).