Acoustic camera based audio visual scene analysis

US9736580B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9736580-B2
Application number: US-201514662880-A
Country: US
Kind code: B2
Filing date: Mar 19, 2015
Priority date: Mar 19, 2015
Publication date: Aug 15, 2017
Grant date: Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for scene analysis including the use of acoustic imaging and computer audio vision processes for monitoring applications. In some embodiments, an acoustic image device is utilized with a microphone array, image sensor, acoustic image controller, and a controller. In some cases, the controller analyzes at least a portion of the spatial spectrum within the acoustic image data to detect sound variations by identifying regions of pixels having intensities exceeding a particular threshold. In addition, the controller can detect two or more co-occurring sound events based on the relative distance between pixels with intensities exceeding the threshold. The resulting data fusion of image pixel data, audio sample data, and acoustic image data can be analyzed using computer audio vision, sound/voice recognition, and acoustic signature techniques to recognize/identify audio and visual features associated with the event and to empirically or theoretically determine one or more conditions causing each event.
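
The detection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `detect_events`, the `max_gap` parameter, and the use of Euclidean pixel distance are assumptions made for illustration. Pixels above the threshold are grouped into one event when they lie close together, and into separate co-occurring events when the distance between them exceeds `max_gap`.

```python
from math import hypot


def detect_events(image, threshold, max_gap=1.5):
    """Group above-threshold pixels of one acoustic image frame into events.

    image     -- 2-D list of pixel intensities (one value per angle of arrival)
    threshold -- intensity a pixel must exceed to count as part of an event
    max_gap   -- hypothetical parameter: pixels farther apart than this
                 (Euclidean distance, in pixels) belong to different events
    """
    unvisited = {
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    }
    events = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        # Grow the cluster by repeatedly absorbing nearby hot pixels.
        while frontier:
            r, c = frontier.pop()
            near = [p for p in unvisited if hypot(p[0] - r, p[1] - c) <= max_gap]
            for p in near:
                unvisited.remove(p)
            cluster.extend(near)
            frontier.extend(near)
        events.append(sorted(cluster))
    return sorted(events)


# Illustrative frame with two separated loud regions (values are made up).
frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 7],
    [0, 0, 0, 0, 9],
]
events = detect_events(frame, threshold=5)
print(len(events), "co-occurring events:", events)
```

With the sample frame, the two groups of hot pixels are more than `max_gap` apart, so they are reported as two separate events.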

First claim

Opening claim text (preview).

What is claimed is:
1. An acoustic monitoring system, comprising: an array of microphone devices; an acoustic image controller communicatively coupled to the array of microphone devices and configured to output acoustic image data based on a plurality of audio signals received from the array of microphone devices, the acoustic image data comprising a 2-dimensional grid of pixels wherein intensity of each pixel represents sound intensity from a unique angle of arrival; and a computer audio vision (CAV) controller communicatively coupled to the acoustic image controller and including an event recognition mode and configured to analyze at least a portion of the acoustic image data to detect one or more sound events within an observed scene, and to determine at least one condition causing the one or more sound events, generate, in response to detecting one or more sound events, a multi-dimensional event signature for each respective sound event, each multi-dimensional event signature includes at least a portion of the acoustic image data and a set of spatially filtered sound signals based on the plurality of audio signals, for each respective sound event of the one or more sound events, score the multi-dimensional event signature against one or more predefined event class models, and classify a condition causing at least one sound event of the one or more events based on the one or more scored event class models.
2. The system of claim 1, wherein the CAV controller is further configured to correlate a position of the one or more sound events to a corresponding portion of image frames captured by a visual image sensor.
3. The system of claim 2, wherein the CAV controller is further configured to: extract a first set of visual features from a correlated region of one or more acoustic image frames for each respective sound event of the one or more sound events; extract a second set of visual features from a correlated region of one or more image frames for each respective sound event of the one or more sound events; and extract audio features from the spatially filtered sound signals for each respective sound event of the one or more sound events.
4. The system of claim 1, wherein each microphone device of the array of microphone devices comprises at least one of a unidirectional, a bi-directional, a shotgun, a contact and a parabolic microphone type.
5. The system of claim 1, further comprising: a user interface configured to present sound event information in response to at least one sound event detected within the observed scene, wherein the user interface provides an augmented reality presentation such that sound event information is overlaid on to one or more visual images of the observed scene, and wherein the augmented reality presentation further comprises a semi-transparent acoustic heat map overlaid on to the one or more images of the observed scene.
6. The system of claim 5, wherein the sound event information includes at least one of an object identifier, a user-defined label, and a geo-location identifier.
7. A system-on-chip (SOC) comprising the system of claim 1.
8. A mobile computing device comprising the system of claim 1, wherein the mobile computing device comprises a wearable device, a smartphone, a tablet, or a laptop computer.
9. At least one non-transitory computer program product encoded with instructions that when executed by one or more processors cause a process to be carried out, the process comprising: receiving a plurality of acoustic image frames and a plurality of spatially filtered sound signals from an acoustic imaging controller, the plurality of acoustic image frames and the plurality of spatially filtered sound signals representing a spatial spectrum of an observed scene, the acoustic image frame comprising a 2-dimensional grid of pixels wherein intensity of each pixel represents sound intensity from a unique angle of arrival; determining a position of one or more sound events within the plurality of acoustic image frames; generating, in response to determining the position of one or more sound events, a multi-dimensional event signature for each respective sound event, wherein each multi-dimensional event signature includes at least a portion of the acoustic image frames and a set of spatially filtered sound signals from the plurality of spatially filtered sound signals; for each respective sound event of the one or more sound events, scoring the multi-dimensional event signature against one or more predefined event class models; and classifying a condition causing at least one sound event of the one or more sound events based on the one or more scored event class models.
10. The computer program product of claim 9, the process further comprising receiving a plurality of image frames representing the observed scene.
11. The computer program product of claim 9, wherein the act of determining the position of one or more sound events further comprises utilizing a peak-picking algorithm on delta images, the delta images being generated from the plurality of acoustic image frames, wherein only those pixels within the delta images having pixel intensities exceeding a predefined threshold are registered as a sound event.
12. The computer program product of claim 11, wherein the position for each sound event of the one or more sound events is correlated to a geometric region of those pixels of acoustic image data exceeding the predefined threshold.
13. The computer program product of claim 12, the process further comprising correlating the position of the one or more sound events to a corresponding portion of image frames.
14. The computer program product of claim 13, further comprising summing the set of spatially filtered sound signals for each respective sound event of the one or more sound events.
15. The computer program product of claim 14, the process further comprising: extracting a first set of visual features from a correlated region of one or more acoustic image frames for each respective sound event of the one or more sound events; extracting a second set of visual features from a correlated region of one or more image frames for each respective sound event of the one or more sound events; and extracting audio features from the summed spatially filtered sound signals for each respective sound event of the one or more sound events.
16. The computer program product of claim 15, wherein the generated multi-dimensional event signature for each respective sound event includes at least a portion of the first set of extracted visual features, a portion of the second set of extracted visual features, and a portion of the extracted audio features.
17. The computer program product of claim 1, wherein the one or more predefined event class models each comprise a Gaussian Mixture Model (GMM).
18. A method for condition monitoring, the method comprising: receiving, by a processor, a plurality of acoustic image frames and a plurality of spatially filtered sound signals, the plurality of acoustic image frames and the plurality of spatially filtered sound signals representing a spatial spectrum of an observed scene, the acoustic image frame comprising a 2-dimensional grid of pixels wherein intensity of each pixel represents sound intensity from a unique angle of arrival; determining a position of one or more sound events within the plurality of acoustic image frames; generating, in response to determining the position of one or more sound events, a multi-dimensional event signature for each respective sou
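
Several steps recited in the claims (peak picking on delta images with a threshold, summing spatially filtered signals, and scoring against Gaussian Mixture Model class models) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the delta image is assumed to be the absolute per-pixel difference between consecutive frames, a peak is assumed to be a local maximum in its 8-neighbourhood, and the event signature is reduced to a single scalar feature scored against 1-D GMMs. All function names, model parameters, and sample values are hypothetical.

```python
import math


def delta_image(prev_frame, frame):
    """Assumed definition: absolute per-pixel change between two frames."""
    return [[abs(b - a) for a, b in zip(p_row, c_row)]
            for p_row, c_row in zip(prev_frame, frame)]


def pick_peaks(delta, threshold):
    """Return (row, col) of pixels that exceed the threshold and are
    not smaller than any pixel in their 8-neighbourhood."""
    rows, cols = len(delta), len(delta[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = delta[r][c]
            if value <= threshold:
                continue  # only pixels above the threshold can register
            neighbours = [delta[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(value >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def sum_signals(signals):
    """Sample-wise sum of equal-length spatially filtered signals."""
    return [sum(samples) for samples in zip(*signals)]


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D GMM given as
    (weight, mean, variance) triples."""
    density = sum(
        w * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        for w, mu, var in components
    )
    return math.log(density)


def classify(x, class_models):
    """Score x against every class model and return the best label."""
    scores = {name: gmm_log_likelihood(x, comps)
              for name, comps in class_models.items()}
    return max(scores, key=scores.get), scores


# Made-up data for illustration only.
prev = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
cur = [[0, 1, 0], [1, 6, 1], [0, 1, 0]]
peaks = pick_peaks(delta_image(prev, cur), threshold=3)

summed = sum_signals([[1, 2, 3], [10, 20, 30]])

models = {
    "normal": [(1.0, 0.0, 1.0)],
    "fault": [(0.5, 5.0, 1.0), (0.5, 8.0, 1.0)],
}
label, scores = classify(5.2, models)
print(peaks, summed, label)
```

A real event signature would be a feature vector rather than a scalar, so a practical version would use multivariate GMMs. The scalar form is used here only to keep the scoring and classification steps visible.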

Assignees

Intel Corp

Inventors

Classifications

  • Cameras or camera modules comprising electronic image sensors; Control thereof

  • Control of cameras or camera modules

  • determining direction of source

  • H04R3/005 (Primary): for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407)

  • Casings; Cabinets {; Supports therefor;} Mountings therein (H04R1/28 takes precedence {; attachments for microphones H04R1/08; mounting of transducers in earpieces H04R1/1075})

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9736580B2 cover?
Techniques are disclosed for scene analysis including the use of acoustic imaging and computer audio vision processes for monitoring applications. In some embodiments, an acoustic image device is utilized with a microphone array, image sensor, acoustic image controller, and a controller. In some cases, the controller analyzes at least a portion of the spatial spectrum within the acoustic image …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification H04R3/005. Mapped technology areas include Electricity.
When was this patent published?
Publication date Aug 15, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).