Sound event detecting apparatus and operation method thereof
US-2016150338-A1 · May 26, 2016 · US
US9668073B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9668073-B2 |
| Application number | US-201514877680-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 7, 2015 |
| Priority date | Oct 7, 2015 |
| Publication date | May 30, 2017 |
| Grant date | May 30, 2017 |
Abstract
A method of operating an audio monitoring system includes generating with a sound sensor audio data corresponding to a sound event generated by an object in a scene around the sound sensor, identifying with a processor a type and action of the object in the scene that generated the sound with reference to the audio data, generating with the processor a timestamp corresponding to a time of the detection of the sound event, and updating a scene state model corresponding to sound events generated by a plurality of objects in the scene with reference to the identified type of object, action taken by the object, and the timestamp. The method further includes identifying a sound event in the scene with reference to the scene state model and a predetermined scene grammar stored in a memory, and generating with the processor an output corresponding to the sound event.
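The abstract does not say how the scene state model is represented or how it is matched against the scene grammar. One minimal way to picture the flow is sketched here in Python. It assumes the following:

- A grammar entry is an ordered list of (object, action) steps.
- All of an entry's steps must occur within a time window.

`Observation`, `SoundEvent`, `SceneStateModel`, `identify_events`, the `window` and `horizon` parameters, and the in-order matching rule are all hypothetical illustrations, not the patent's own method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Step = Tuple[str, str]  # (object type, action), e.g. ("faucet", "run")


@dataclass(frozen=True)
class Observation:
    """One classified sound: the object type, its action, and a timestamp in seconds."""
    obj: str
    action: str
    timestamp: float


@dataclass(frozen=True)
class SoundEvent:
    """Hypothetical grammar entry: ordered steps that must all fall within `window` seconds."""
    name: str
    steps: Tuple[Step, ...]
    window: float


@dataclass
class SceneStateModel:
    """Hypothetical scene state: recent observations for one scene, oldest first."""
    history: List[Observation] = field(default_factory=list)
    horizon: float = 600.0  # assumed: forget observations older than this many seconds

    def update(self, obs: Observation) -> None:
        # Record the new (type, action, timestamp) triple and prune stale entries.
        self.history.append(obs)
        cutoff = obs.timestamp - self.horizon
        self.history = [o for o in self.history if o.timestamp >= cutoff]


def _completed_by_latest(event: SoundEvent, history: List[Observation]) -> bool:
    # Walk backwards from the newest observation, matching each earlier step to its
    # most recent occurrence; this gives the latest possible start, i.e. the tightest span.
    idx = len(history) - 1
    first_ts = history[idx].timestamp
    for step in reversed(event.steps[:-1]):
        idx -= 1
        while idx >= 0 and (history[idx].obj, history[idx].action) != step:
            idx -= 1
        if idx < 0:
            return False  # a required step never happened
        first_ts = history[idx].timestamp
    return history[-1].timestamp - first_ts <= event.window


def identify_events(grammar: List[SoundEvent], state: SceneStateModel) -> List[str]:
    """Names of grammar events whose final step is the newest observation."""
    if not state.history:
        return []
    latest = state.history[-1]
    return [
        event.name
        for event in grammar
        if event.steps[-1] == (latest.obj, latest.action)
        and _completed_by_latest(event, state.history)
    ]


if __name__ == "__main__":
    grammar = [SoundEvent("wash_hands",
                          (("faucet", "run"), ("soap", "dispense"), ("faucet", "stop")),
                          window=60.0)]
    state = SceneStateModel()
    for obs in [Observation("faucet", "run", 0.0),
                Observation("soap", "dispense", 5.0),
                Observation("faucet", "stop", 20.0)]:
        state.update(obs)
        print(obs.action, identify_events(grammar, state))
```

When the three observations are fed in order, only the third update reports `wash_hands`. If the faucet stop moved to 100 s, it would exceed the 60 s window and nothing would be reported. Matching backwards from the newest observation means each event is reported once, when its final step arrives, rather than on every later update.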
Claims (preview; truncated)
What is claimed:

1. A method of training an audio monitoring system comprising: receiving with a processor in the audio monitoring system first registration information for a first object in a first scene around a sound sensor in the audio monitoring system; training with the processor a first classifier for a first predetermined action of the first object in the first scene, the first predetermined action generating sound detected by the sound sensor; receiving with the processor second registration information for a second object in the first scene around the sound sensor; training with the processor a second classifier for a second predetermined action of the second object in the first scene, the second predetermined action generating sound detected by the sound sensor; receiving with the processor object relationship data corresponding to a relationship between the first object and the second object in the first scene; generating with the processor a specific scene grammar including a first sound event formed from with reference to a predetermined general scene grammar stored in a memory, the first registration information, the second registration information, and the object relationship data; and storing with the processor the specific scene grammar in the memory in association with the first classifier and the second classifier for identification of a subsequent occurrence of the first sound event including the first predetermined action of the first object and the second predetermined action of the second object.

2. The method of claim 1, the training of the first classifier further comprising: generating with a sound sensor in the audio monitoring system first audio data corresponding to a first predetermined action of the first object; extracting with the processor a first plurality of features from the first audio data; generating with the processor a first classifier corresponding to the predetermined sound event from the first object with reference to the first plurality of features; and storing with the processor the first classifier in the memory in association with the first predetermined action of the first object and the specific scene grammar.

3. The method of claim 2, the extracting of the first plurality of features further comprising: extracting with the processor at least one of a mel spectrogram, mel-frequency cepstrum (MFCC), delta, and chroma feature from the audio data.

4. The method of claim 1 further comprising: receiving with the processor a relationship identifier indicating presence of the first object and the second object within the first scene; and generating with the processor the specific scene grammar including the first sound event with reference to the first predetermined action and the second predetermined action.

5. The method of claim 1, the receiving of the object relationship data further comprising: receiving with the processor a relationship identifier indicating a functional relationship including data specifying a temporal order of the first predetermined action and the second predetermined action; and generating with the processor the specific scene grammar including the first sound event with reference to the temporal order between the first predetermined action and the second predetermined action.

6. The method of claim 1, the generation of the specific scene grammar further comprising: retrieving with the processor a predetermined general scene grammar from the memory, the predetermined general scene grammar including a plurality of sound events corresponding to actions performed by a plurality of objects; identifying with the processor one sound event in the plurality of sound events in the predetermined general scene grammar including objects corresponding to the first object and the second object with reference to the first registration information and the second registration information; and generating with the processor the specific scene grammar including the one event identified in the predetermined general scene grammar.

7. The method of claim 1 further comprising: generating with the processor a hierarchical scene grammar including the specific scene grammar corresponding to the first scene and at least one other specific scene grammar corresponding to a second scene; and storing with the processor the hierarchical scene grammar in the memory with a relationship between the specific scene grammar of the first scene and the specific scene grammar of the second scene for identification of another sound event corresponding to sounds from object actions that occur in both the first scene and the second scene.

8. A method of operating an audio monitoring system comprising: generating with a sound sensor audio data corresponding to sound produced by an action performed by an object in a first scene around the sound sensor; identifying with a processor a type of object in the first scene that generated the sound with reference to the audio data; identifying with the processor the action taken by the object to generate a sound event with reference to the audio data; generating with the processor a timestamp corresponding to a time of the detection of the sound; updating with the processor a scene state model corresponding to a plurality of sound events generated by a plurality of objects in the first scene around the sound sensor with reference to the identified type of object, action taken by the object, and the timestamp; identifying with the processor one sound event in the plurality of sound events for the first scene with reference to the first scene state model and a predetermined scene grammar stored in a memory; and generating with the processor an output corresponding to the one sound event.

9. The method of claim 8 further comprising: filtering with the processor audio data corresponding to a human voice from the audio data received from the sound sensor prior to identification of the type of object in the first scene that generated the sound.

10. The method of claim 8, the identification of the type of object and action taken by the object further comprising: selecting with the processor at least one classifier from a plurality of classifiers stored in the memory, the first classifier being selected with reference to the first scene state model for the first scene prior to updating the first scene state model and the predetermined scene grammar to select the at least one classifier corresponding to an expected object action for the one sound event in the predetermined scene grammar; and applying with the processor the at least one classifier to identify the type of object and the action taken by the object based on a result from the at least one classifier that produces a highest confidence score.

11. The method of claim 8 further comprising: identifying with the processor that the first scene state model does not correspond to any sound event in the plurality of sound events in the first scene grammar; and generating with the processor an output indicating an anomaly in the first scene.

12. The method of claim 11, the generation of the output further comprising: transmitting with the processor a message including the identified type of object, action taken by the object, timestamp, and a copy of the audio data to a monitoring service.

13. An audio monitoring system comprising: a sound sensor configured to generate audio data corresponding to sound produced by an action performed by an object in a first scene around the sound sensor; an output device; and a processor operatively connected to the sound sensor, the output device, and a
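Claims 6, 10, and 11 describe three steps:

- narrowing a general scene grammar to the registered objects;
- running only the classifiers for actions the grammar expects next, keeping the highest-confidence result;
- flagging an anomaly when the scene state fits no event.

The claims do not fix a data format or matching rule. The following Python sketch therefore makes three assumptions:

- A grammar maps each event name to an ordered list of (object, action) steps.
- "Expected" means the next step of any event whose opening steps match what has been observed so far.
- The scene state model is simplified to that observed step sequence.

`specific_grammar`, `expected_steps`, `classify`, `is_anomaly`, and the toy classifiers are hypothetical illustrations, not the patent's method.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Step = Tuple[str, str]                           # (object type, action)
Grammar = Dict[str, List[Step]]                  # event name -> ordered steps
Classifier = Callable[[Sequence[float]], float]  # features -> confidence score


def specific_grammar(general: Grammar, registered: Set[str]) -> Grammar:
    """Claim 6 sketch: keep general-grammar events whose objects are all registered."""
    return {name: steps for name, steps in general.items()
            if all(obj in registered for obj, _ in steps)}


def expected_steps(grammar: Grammar, observed: List[Step]) -> Set[Step]:
    """Assumed prefix rule: the next step of every event that begins with `observed`."""
    n = len(observed)
    return {steps[n] for steps in grammar.values()
            if len(steps) > n and steps[:n] == observed}


def classify(features: Sequence[float], grammar: Grammar, observed: List[Step],
             classifiers: Dict[Step, Classifier]) -> Optional[Tuple[Step, float]]:
    """Claim 10 sketch: run only classifiers for expected steps (all of them if none
    are expected) and return the (step, score) pair with the highest confidence."""
    candidates = expected_steps(grammar, observed) & set(classifiers)
    if not candidates:
        candidates = set(classifiers)
    scored = [(step, classifiers[step](features)) for step in candidates]
    return max(scored, key=lambda pair: pair[1], default=None)


def is_anomaly(grammar: Grammar, observed: List[Step]) -> bool:
    """Claim 11 sketch: the observed sequence is not a prefix of any grammar event."""
    n = len(observed)
    return not any(len(steps) >= n and steps[:n] == observed
                   for steps in grammar.values())


if __name__ == "__main__":
    general = {
        "make_tea": [("kettle", "boil"), ("cup", "place"), ("kettle", "pour")],
        "wash_hands": [("faucet", "run"), ("soap", "dispense"), ("faucet", "stop")],
        "laundry": [("washer", "start"), ("washer", "finish")],
    }
    grammar = specific_grammar(general, {"kettle", "cup", "faucet", "soap"})
    toy_classifiers = {  # hypothetical stand-ins for trained classifiers
        ("kettle", "boil"): lambda f: 0.9 if f[0] > 0.5 else 0.1,
        ("faucet", "run"): lambda f: 0.7,
        ("soap", "dispense"): lambda f: 0.8,
        ("cup", "place"): lambda f: 0.6,
    }
    print(classify([0.8], grammar, [], toy_classifiers))
    print(is_anomaly(grammar, [("kettle", "boil"), ("soap", "dispense")]))
```

With the registered objects shown, `laundry` drops out of the grammar. At the start of a scene only the kettle and faucet classifiers are run, so the soap classifier's 0.8 score is never considered. Restricting candidates this way is one reading of claim 10's "expected object action". A working system would also need a rule for resetting `observed` once an event completes.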
Monitoring arrangements; Testing arrangements {(for hearing aids H04R25/30; detection of loudspeaker connection H04R5/04; sound-field adaptation dependent on speaker detection H04S7/308)} · CPC title
for microphones (H04R29/007 takes precedence) · CPC title
for comparison or discrimination · CPC title
using electric transmission {; transformation of alarm signals to electrical signals from a different medium, e.g. transmission of an electric alarm signal upon detection of an audible alarm signal} · CPC title
Presence detectors to detect unsafe condition, e.g. infrared sensor, microphone (G08B21/0476 takes precedence) · CPC title