System and method for cluster-based audio event detection

US10867621B2 · US · B2

Patent metadata
Field                Value
Publication number   US-10867621-B2
Application number   US-201816200283-A
Country              US
Kind code            B2
Filing date          Nov 26, 2018
Priority date        Jun 28, 2016
Publication date     Dec 15, 2020
Grant date           Dec 15, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs.
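
The cluster-level decision described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes one-dimensional frame features, a small hand-written K-means, and a single Gaussian per class standing in for the GMMs. The function names (`kmeans_1d`, `best_class`, `classify_clusters`), the class labels, and all numeric values are hypothetical.

```python
import math


def kmeans_1d(frames, k, iters=20):
    """Tiny 1-D K-means with a deterministic, quantile-based initialisation."""
    ordered = sorted(frames)
    n = len(ordered)
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every frame to its nearest center.
        labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2) for x in frames]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(frames, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian (stand-in for a per-class GMM)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def best_class(values, class_models):
    """Class whose model gives the highest summed log-likelihood over `values`."""
    scores = {
        name: sum(log_gauss(x, mean, var) for x in values)
        for name, (mean, var) in class_models.items()
    }
    return max(scores, key=scores.get)


def classify_clusters(frames, labels, class_models):
    """One decision per cluster, made from all frames in that cluster."""
    return {
        c: best_class([x for x, lab in zip(frames, labels) if lab == c], class_models)
        for c in sorted(set(labels))
    }


# Hypothetical data: the frame 1.6 sits just on the "speech" side when scored alone.
frames = [0.0, 0.2, 0.4, 1.6, 3.0, 3.2, 2.8, 3.4]
models = {"silence": (0.0, 1.0), "speech": (3.0, 1.0)}
labels = kmeans_1d(frames, k=2)
print(classify_clusters(frames, labels, models))
print(best_class([1.6], models))
```

In this toy data the frame with value 1.6 scores higher under the "speech" model when classified on its own, but K-means groups it with the low-valued frames, and that cluster as a whole is labelled "silence". This is the sense in which a cluster-level decision is less sensitive to the local behavior of individual frames, with no smoothing window to tune.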

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for audio event detection, comprising: partitioning, by a computer, an audio signal into a plurality of audio frames; clustering, by the computer, the plurality of audio frames into a plurality of clusters containing audio frames having similar features, wherein the plurality of clusters include at least one multi-class cluster; and detecting, by the computer utilizing a supervised classifier of a plurality of supervised classifiers, an audio event in the at least one multi-class cluster of the plurality of clusters, wherein at least one supervised classifier is a supervised multi-class classifier trained on multi-class training clusters.

2. The computer-implemented method of claim 1, further comprising utilizing, by the computer, K-means to identify an initial partition of the audio signal from the plurality of audio frames.

3. The computer-implemented method of claim 1, wherein the computer utilizes at least one Gaussian mixture model to cluster the plurality of audio frames to the plurality of clusters.

4. The computer-implemented method of claim 1, further comprising: extracting, by the computer, an i-vector for the at least one multi-class cluster; and detecting, by the computer, the audio event in the at least one multi-class cluster based upon the extracted i-vector.

5. The computer-implemented method of claim 1, wherein the supervised classifier utilizes probabilistic linear discriminant analysis.

6. The computer-implemented method of claim 1, wherein the supervised classifier utilizes a support vector machine.

7. The computer-implemented method of claim 1, wherein the supervised classifier utilizes a Gaussian mixture model.

8. The computer-implemented method of claim 1, further comprising: generating, by the computer, a plurality of segments from the audio signal using generalized likelihood ratio and Bayesian information criterion.

9. The computer-implemented method of claim 8, further comprising: detecting, by the computer, a set of candidates for segment boundaries utilizing the general likelihood ratio; and filtering out, by the computer, at least one of the candidates utilizing the Bayesian information criterion.

10. The computer-implemented method of claim 8, further comprising: clustering, by the computer, the plurality of segments utilizing hierarchical agglomerative clustering.

11. A system comprising: a non-transitory storage medium storing a plurality of computer program instructions; a processor electrically coupled to the non-transitory storage medium and configured to execute the plurality of computer program instructions to: partition an audio signal into a plurality of audio frames; cluster the plurality of audio frames into a plurality of clusters containing audio frames having similar features, wherein the plurality of clusters include at least one multi-class cluster; and detect utilizing a supervised classifier of a plurality of classifiers, an audio event in the at least one multi-class cluster of the plurality of clusters, wherein at least one supervised classifier is a supervised multi-class classifier trained on multi-class training clusters.

12. The system of claim 11, wherein the computer utilizes K-means to identify an initial partition of the audio signal from the plurality of audio frames.

13. The system of claim 11, wherein the computer utilizes at least one Gaussian mixture model to cluster the plurality of audio frames to the plurality of clusters.

14. The system of claim 11, wherein the processor is configured to further execute the plurality of computer program instructions to: extract an i-vector for the at least one multi-class cluster; and detect the audio event in the at least one multi-class cluster based upon the extracted i-vector.

15. The system of claim 11, wherein the supervised classifier utilizes probabilistic linear discriminant analysis.

16. The system of claim 11, wherein the supervised classifier utilizes a support vector machine.

17. The system of claim 11, wherein the supervised classifier utilizes a Gaussian mixture model.

18. The system of claim 11, wherein the processor is configured to further execute the plurality of computer program instructions to: generate a plurality of segments from the audio signal using generalized likelihood ratio and Bayesian information criterion.

19. The system of claim 18, wherein the processor is configured to further execute the plurality of computer program instructions to: detect a set of candidates for segment boundaries utilizing the general likelihood ratio; and filter out at least one of the candidates utilizing the Bayesian information criterion.

20. The system of claim 18, wherein the processor is configured to further execute the plurality of computer program instructions to: cluster the plurality of segments utilizing hierarchical agglomerative clustering.
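
The two-stage segmentation recited in claims 8, 9, 18, and 19 (candidate boundaries from a generalized likelihood ratio, then filtering with a Bayesian information criterion) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: features are one-dimensional, each window is modelled by a single Gaussian, candidates are taken as local maxima of the GLR between two adjacent fixed-size windows, and the names `glr`, `delta_bic`, `segment_boundaries`, `win`, and `lam` are hypothetical.

```python
import math


def variance(xs, floor=1e-6):
    """Maximum-likelihood variance, floored so that log() stays finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), floor)


def glr(left, right):
    """GLR distance: one Gaussian for both windows vs. one Gaussian per window."""
    both = left + right
    return 0.5 * (
        len(both) * math.log(variance(both))
        - len(left) * math.log(variance(left))
        - len(right) * math.log(variance(right))
    )


def delta_bic(left, right, lam=1.0):
    """GLR minus the BIC penalty for the extra 1-D Gaussian (mean and variance)."""
    n = len(left) + len(right)
    extra_params = 2
    return glr(left, right) - lam * 0.5 * extra_params * math.log(n)


def segment_boundaries(signal, win=5, lam=1.0):
    """Stage 1: GLR local maxima as candidates. Stage 2: keep those with delta-BIC > 0."""
    neg_inf = float("-inf")
    scores = {
        t: glr(signal[t - win:t], signal[t:t + win])
        for t in range(win, len(signal) - win + 1)
    }
    candidates = [
        t for t in scores
        if scores[t] >= scores.get(t - 1, neg_inf) and scores[t] > scores.get(t + 1, neg_inf)
    ]
    return [
        t for t in candidates
        if delta_bic(signal[t - win:t], signal[t:t + win], lam) > 0
    ]


# Hypothetical signal: ten low-valued frames followed by ten high-valued frames.
low = [0.0, 0.1, -0.1, 0.05, -0.05]
high = [5.0, 5.1, 4.9, 5.05, 4.95]
print(segment_boundaries(low * 2 + high * 2))
print(segment_boundaries(low * 4))
```

On the first toy signal the only surviving boundary is at index 10, where the statistics change. On the second, homogeneous signal a GLR local maximum still exists numerically, but the BIC penalty rejects it, so no boundary is returned. The penalty weight `lam` is a tuning parameter in this sketch.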

Assignees

Pindrop Security Inc

Inventors

Classifications

  • G10L25/51 (Primary)

    for comparison or discrimination · CPC title

  • characterised by the analysis technique · CPC title

  • G10L25/45 (Primary)

    characterised by the type of analysis window · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10867621B2 cover?
Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by…
Who is the assignee on this patent?
Pindrop Security Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 15, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).