Method for siren detection based on audio samples

US10140998B2 · US · B2

Patent metadata
Publication number: US-10140998-B2
Application number: US-201715718524-A
Country: US
Kind code: B2
Filing date: Sep 28, 2017
Priority date: Dec 3, 2013
Publication date: Nov 27, 2018
Grant date: Nov 27, 2018

Abstract

The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals requires more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
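
The shortest-first strategy in the abstract, combined with the below-threshold fallback recited in claim 1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window lengths, the `score` callable (a stand-in for whatever feature-based classifier produces the siren likelihood), and the choice to cut every window from the most recent buffered audio are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def make_windows(buffer: Sequence[float], sample_rate: int,
                 lengths_s: Sequence[float]) -> List[List[float]]:
    """Return the most recent `length` seconds of the buffer for each requested
    window length, shortest first. Lengths that exceed the buffered audio are skipped."""
    windows = []
    for length in sorted(lengths_s):
        n = int(round(length * sample_rate))
        if 0 < n <= len(buffer):
            windows.append(list(buffer[len(buffer) - n:]))
    return windows


def detect_siren(buffer: Sequence[float], sample_rate: int, lengths_s: Sequence[float],
                 score: Callable[[Sequence[float]], float],
                 threshold: float) -> Tuple[bool, Optional[float], Optional[float]]:
    """Score windows shortest first and stop at the first one whose siren
    likelihood is above the threshold.

    Returns (detected, window length in seconds, likelihood). When nothing
    clears the threshold, the length is None and the likelihood is the best
    score seen (None if no window fit in the buffer)."""
    best: Optional[float] = None
    for window in make_windows(buffer, sample_rate, lengths_s):
        likelihood = score(window)
        if likelihood > threshold:
            return True, len(window) / sample_rate, likelihood
        # Below the threshold: fall back to the next, longer window.
        best = likelihood if best is None else max(best, likelihood)
    return False, None, best
```

With 2 s of audio buffered at 8 kHz and windows of 0.5 s, 1 s, and 2 s, a confident score on the 0.5 s window ends the search immediately; the longer windows are only scored when the shorter ones are inconclusive, which trades extra buffering for accuracy only when it is needed.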

First claim

What is claimed is:

1. An apparatus comprising: an audio unit configured to receive an audio signal; a control unit configured to operate the apparatus; and a processing unit configured to: process the audio signal from the audio unit to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time; determine a likelihood that the first windowed audio sample comprises a siren signal based on a detection of a group of features in the first windowed audio sample associated with a siren-classification profile; based on the first windowed audio sample indicating a likelihood of a siren signal below a threshold, determine a likelihood that the second windowed audio sample includes a siren signal based on a detection of a group of features of the second windowed audio sample with the siren-classification profile; and alter control of the apparatus by the control unit based on the likelihood of at least one of the first windowed audio sample and the second windowed audio sample including a siren signal being above the threshold.

2. The apparatus of claim 1, wherein one or more of the group of features in the first windowed audio sample and the group of features in the second windowed audio sample is determined based on at least a linear classifier.

3. The apparatus of claim 1, wherein the processing unit is further configured to determine the likelihood using a linear classifier by analyzing the group of features of a respective audio signal and wherein each group of features further comprises at least one of a monotonicity estimation associated with a reference siren signal, mel-frequency cepstrum coefficients (MFCCs) associated with the reference siren signal, and a spectral energy concentration estimation associated with the reference siren signal.

4. The apparatus of claim 1, wherein the audio unit is further configured to periodically receive the audio signal.

5. The apparatus of claim 1, wherein the processing unit is further configured to: determine a fingerprint-based likelihood that the first windowed audio sample comprises a siren signal based on a comparison of the first windowed audio sample with a group of audio fingerprints, wherein the group of audio fingerprints comprises at least one audio fingerprint of a siren signal; and based on the first windowed audio sample indicating a fingerprint-based likelihood of a siren signal below the threshold, determine a fingerprint-based likelihood that the second windowed audio sample comprises a siren signal based on a comparison of the second windowed audio sample with the group of audio fingerprints.

6. The apparatus of claim 1, further comprising a communication unit, wherein the communication unit is configured to receive the siren-classification profile from a remote system.

7. The apparatus of claim 1, further comprising an input device, wherein the input device is configured to receive an input, wherein the input comprises an override indication to provide an indication of a false siren detection.

8. The apparatus of claim 7, wherein the processing unit is further configured to adjust the siren-classification profile based on the input comprising the override indication.

9. A method comprising: receiving an audio signal with an audio unit; processing, with a processor, the audio signal from the audio unit to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time; determining a likelihood that the first windowed audio sample comprises a siren signal based on the detection of a group of features of the first windowed audio sample; based on the first windowed audio sample indicating a likelihood of the first windowed audio sample including a siren signal below a threshold, determining a likelihood that the second windowed audio sample comprises a siren signal based on the detection of a group of features of the second windowed audio sample; and providing instructions to control an apparatus based on the likelihood of at least one of the first windowed audio sample and the second windowed audio sample including a siren signal being above the threshold.

10. The method of claim 9, wherein one or more of the group of features in the first windowed audio sample and the group of features in the second windowed audio sample is determined based on at least a linear classifier.

11. The method of claim 9, wherein the processor is configured to determine the likelihood using a linear classifier by analyzing the group of features of a respective audio signal and wherein each group of features further comprises at least one of a monotonicity estimation associated with a reference siren signal, mel-frequency cepstrum coefficients (MFCCs) associated with the reference siren signal, and a spectral energy concentration estimation associated with the reference siren signal.

12. The method of claim 9, wherein receiving an audio signal with an audio unit comprises periodically receiving the audio signal.

13. The method of claim 9, further comprising: determining a fingerprint-based likelihood that the first windowed audio sample comprises a siren signal based on a comparison of the first windowed audio sample with a group of audio fingerprints, wherein the group of audio fingerprints comprises at least one audio fingerprint of a siren signal; and based on first windowed audio sample indicating a fingerprint-based likelihood of a siren signal below the threshold, determining a fingerprint-based likelihood that the second windowed audio sample comprises a siren signal based on a comparison of the second windowed audio sample with the group of audio fingerprints.

14. The method of claim 9, further comprising receiving a siren-classification profile from a remote system, wherein the detection of a group of features of the first windowed audio sample is based on the siren-classification profile.

15. The method of claim 14, further comprising receiving an input, wherein the input comprises an override indication to provide an indication of a false siren detection.

16. The method of claim 15, further comprising adjusting the siren-classification profile based on the input comprising the override indication.

17. A non-transitory computer-readable medium having stored thereon program instructions that when executed by a computing system that includes at least one processor cause the computing system to perform operations comprising: receiving an audio signal; processing the audio signal to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample has a first length of time and the second windowed audio sample has a second length of time longer than the first length of time; determining a likelihood that the first windowed audio sample comprises a siren signal based on the detection of a group of features of the first windowed audio sample; based on the first windowed audio sample indicating a low likelihood of the first windowed audio sample including a siren signal, determining a likelihood that the second windowed audio sample comprises a siren signal based on the detection of a group of features of the second windowed audio sample; and providing ins
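
Claims 2-3 and 10-11 recite a linear classifier over features such as a monotonicity estimation, MFCCs, and a spectral energy concentration estimation, and claims 7-8 and 15-16 adjust the siren-classification profile after a false-detection override. The sketch below shows one hypothetical way those pieces could fit together; the feature definitions, frame size, logistic squashing, and gradient-step update are illustrative assumptions, not the method disclosed in the patent. MFCCs are omitted to keep the example short and dependency-free, and a naive DFT is used only for readability.

```python
import cmath
import math
from dataclasses import dataclass
from typing import List, Sequence


def frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Split samples into consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2; O(N^2), acceptable for short illustrative frames."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_energy_concentration(samples: Sequence[float], frame_len: int = 64,
                                  top_k: int = 3) -> float:
    """Hypothetical feature: average share of spectral energy held by the top_k strongest bins."""
    shares = []
    for frame in frames(samples, frame_len):
        energy = sorted((m * m for m in magnitude_spectrum(frame)), reverse=True)
        total = sum(energy)
        shares.append(sum(energy[:top_k]) / total if total > 0 else 0.0)
    return sum(shares) / len(shares) if shares else 0.0


def monotonicity(samples: Sequence[float], frame_len: int = 64) -> float:
    """Hypothetical feature: share of frame-to-frame moves of the dominant frequency
    bin that follow the majority direction. A steadily rising or falling tone scores
    near 1.0; a steady tone never moves and scores 0.0."""
    peaks = []
    for frame in frames(samples, frame_len):
        spec = magnitude_spectrum(frame)
        peaks.append(max(range(1, len(spec)), key=spec.__getitem__))  # ignore the DC bin
    steps = [b - a for a, b in zip(peaks, peaks[1:]) if b != a]
    if not steps:
        return 0.0
    ups = sum(1 for s in steps if s > 0)
    return max(ups, len(steps) - ups) / len(steps)


def extract_features(samples: Sequence[float]) -> List[float]:
    """Feature vector fed to the linear classifier (order must match the profile weights)."""
    return [monotonicity(samples), spectral_energy_concentration(samples)]


@dataclass
class SirenProfile:
    """Hypothetical siren-classification profile: weights and bias of a linear classifier."""
    weights: List[float]
    bias: float

    def likelihood(self, features: Sequence[float]) -> float:
        # Linear score squashed to (0, 1) with a logistic function.
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def apply_false_detection_override(self, features: Sequence[float], lr: float = 0.5) -> None:
        # One logistic-regression gradient step toward label 0 ("not a siren").
        p = self.likelihood(features)
        self.weights = [w - lr * p * f for w, f in zip(self.weights, features)]
        self.bias -= lr * p
```

Because each override applies only a single small gradient step, repeated false detections of the same sound lower its score progressively instead of disabling detection outright.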

Assignees

Waymo LLC

Classifications

  • Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring · CPC title

  • Electricity · mapped topic

  • using electric transmission; using electromagnetic transmission · CPC title

  • H04R29/00 · Primary

    Monitoring arrangements; Testing arrangements {(for hearing aids H04R25/30; detection of loudspeaker connection H04R5/04; sound-field adaptation dependent on speaker detection H04S7/308)} · CPC title

  • G10L19/06 · Primary

    Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title

Frequently asked questions

What does patent US10140998B2 cover?
It covers an apparatus, a method, and a non-transitory computer-readable medium for detecting sirens from short audio samples. Captured audio is split into windowed samples of different lengths; the shorter sample is checked first for features associated with a siren-classification profile, a longer sample is analyzed only if that likelihood falls below a threshold, and control of the apparatus is altered when the siren likelihood is above the threshold.
Who is the assignee on this patent?
Waymo LLC
What technology area does this patent fall under?
Primary CPC classification H04R29/00. Mapped technology areas include Electricity.
When was this patent published?
Published Nov 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).