Theme detection for object-recognition-based notifications
US-12183330-B2 · Dec 31, 2024 · US
US2023086355A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023086355-A1 |
| Application number | US-202117478916-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 19, 2021 |
| Priority date | Sep 19, 2021 |
| Publication date | Mar 23, 2023 |
| Grant date | — |
Official abstract text for this publication.
A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
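The partition, recover, compare, and score steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the neural network is replaced by a simple stand-in predictor (per-frequency mean and standard deviation of the context region), and the names `middle_frame_target`, `anomaly_score`, and `control_action`, as well as the threshold, are hypothetical.

```python
import math
import statistics


def middle_frame_target(n_time, width):
    """Time-frame indices of the target region: `width` frames in the middle.

    All remaining frames form the context region.
    """
    start = (n_time - width) // 2
    return set(range(start, start + width))


def anomaly_score(spec, target_frames):
    """Mean Gaussian negative log-likelihood of the true target values.

    `spec` is a list of frequency rows, each a list of values over time.
    Stand-in predictor (NOT an attentive neural process): every target
    element is "recovered" as a Gaussian whose mean and standard deviation
    come from the context elements in the same frequency row.
    """
    total, count = 0.0, 0
    for row in spec:
        context = [v for t, v in enumerate(row) if t not in target_frames]
        mean = statistics.fmean(context)
        std = statistics.pstdev(context) + 1e-6  # avoid division by zero
        for t in target_frames:
            z = (row[t] - mean) / std
            total += 0.5 * z * z + math.log(std) + 0.5 * math.log(2 * math.pi)
            count += 1
    return total / count


def control_action(score, threshold):
    """Hypothetical control action: flag an alarm above a chosen threshold."""
    return "alarm" if score > threshold else "ok"
```

A spectrogram whose middle frames deviate strongly from what the context suggests yields a higher score than one whose middle frames follow the context, which is the comparison the anomaly score is based on.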
Opening claim text (preview).
Claimed is:
1. An audio processing system for detecting anomalous sound, comprising: at least one processor; and memory having instructions stored thereon that, when executed by the at least one processor, cause the system to: receive a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram, wherein a value of each element of the spectrogram is identified by a coordinate in the time-frequency domain; partition the time-frequency domain of the spectrogram into a context region and a target region; submit values of elements of the context region and coordinates of the elements of the context region into a neural network including an attentive neural process architecture to recover values of the spectrogram for elements with coordinates in the target region; determine an anomaly score for detecting the anomalous sound of the audio signal based on a comparison of the recovered values of the elements of the target region and values of elements of the partitioned target region; and perform a control action based on the anomaly score.
2. The audio processing system of claim 1, wherein the at least one processor is configured to: partition the spectrogram into different combinations of context regions and target regions to produce a set of context regions and a corresponding set of target regions; execute the neural network multiple times, once for each context region in the set of context regions to produce a set of recovered target regions; compare each recovered target region in the set of recovered target regions with the corresponding target region of the set of target regions to determine a set of anomaly scores; and determine the anomaly score based on a pooling operation on the set of anomaly scores.
3. The audio processing system of claim 2, wherein the context region is a first context region, the target region is a first target region, and the anomaly score is a first anomaly score, and wherein the processor is configured to: identify a second partition of the time-frequency domain based on the first anomaly score; perform the second partition of the spectrogram into a second context region and a second target region; repeat the execution of the neural network with values and coordinates of the second context region to recover the second target region and to produce a second anomaly score based on a comparison of the recovered second target region and the partitioned second target region; and perform a second control action based on the second anomaly score, a combination of the first anomaly score and the second anomaly score, or both.
4. The audio processing system of claim 1, wherein the neural network is trained by randomly or pseudo-randomly selecting different partitions of training spectrograms into context and target regions, and wherein during execution of the neural network, the processor is configured to produce multiple partitions of the spectrogram and corresponding anomaly scores according to a predetermined protocol to perform the control action based on a maximum anomaly score.
5. The audio processing system of claim 1, wherein the at least one processor is further configured to: create a library of anomalous spectrograms based on known anomalous behavior; identify difficult-to-predict target regions using the library of anomalous spectrograms; and utilize the identified target regions as one or multiple hypotheses to detect the maximum anomaly score.
6. The audio processing system of claim 5, wherein the at least one processor is configured to test the one or multiple hypotheses to determine the target region with the maximum anomaly score, wherein the one or multiple hypotheses include: a middle frame hypothesis procedure aiming to recover a temporal middle portion of the spectrogram from side portions of the spectrogram sandwiching a middle portion of a frame of the spectrogram from opposite sides of the frame, a frequency masking hypothesis procedure aiming to recover certain frequency regions of the spectrogram from unmasked surrounding regions of the spectrogram, wherein the recovery of the certain frequency regions corresponds to at least reconstructing high frequencies of the spectrogram from low frequencies of the spectrogram, or reconstructing the low frequencies from the high frequencies of the spectrogram; a frequency masking hypothesis procedure aiming to recover individual frequency band from neighboring and/or harmonically related frequency bands of the spectrogram; an energy based hypothesis procedure aiming to recover high energy time frequency units of the spectrogram from remaining unmasked time-frequency units of the spectrogram; a procedure aiming to recover a randomly selected subset of masked frequency bands and time frames from the unmasked remaining regions of the spectrogram; a likelihood bootstrapping procedure that performs multiple passes with different context regions of the spectrogram determined by first sampling different percentages of time-frequency units as context of the spectrogram and reconstructing entire spectrogram, wherein time-frequency regions of the reconstructed spectrogram with high reconstruction likelihood are determined and reconstructing time-frequency regions with low reconstruction likelihood using the time-frequency regions of the reconstructed spectrogram with the high reconstruction likelihood as context; an ensembling procedure where multiple of the above hypothesis generation procedures is combined to find the maximum anomaly score.
7. The audio processing system of claim 1, wherein the attentive neural process architecture comprises: an encoder neural network trained to receive an input set of arbitrary size, the input set corresponds to the values and coordinates of elements of the context region, and the encoder produces an embedding vector for each element of the input set; a cross attention module trained to compute a unique embedding vector for each element of the target region by attending to the embedding vectors of the elements of the context region at neighboring coordinates; and a decoder neural network that outputs a probability distribution for each element of the target region based on the target region coordinates and the unique embedding vector for that target region element.
8. The audio processing system of claim 7, wherein the encoder neural network uses a self-attention mechanism to jointly encode all elements of the context region.
9. The audio processing system of claim 7, wherein the cross attention module uses a multi-head attention.
10. The audio processing system of claim 7, wherein the decoder neural network outputs at least one of: multiple parameters of a conditionally independent Gaussian distribution and multiple parameters of a conditionally independent mixture of Gaussian distributions.
11. The audio processing system of claim 1, wherein the at least one processor is configured to implement a sliding window on the spectrogram, the sliding window is processed by the neural network using the attentive neural network architecture to determine the anomaly score for detecting the anomalous sound.
12. A computer-implemented method for detecting anomalous sound, comprising: receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain, wherein a value of each element of the spectrogram is identified by a coordinate in the time-frequency domain; partitioning the time-frequency domain of the spectrogram into a context region and a target region; submitting values of elements of the context region and coordinat
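The encoder, cross-attention, and decoder stages recited in claim 7, with the Gaussian output of claim 10, can be illustrated with a small untrained skeleton. Everything here is an assumption made for illustration: `encode` and `decode` are hypothetical stand-ins with no learned weights, the attention score is the negative squared distance between coordinates rather than a learned query/key projection, and a single attention head is used.

```python
import math


def encode(value):
    """Hypothetical untrained encoder: embed a context value as (v, v^2).

    A trained encoder would also take the element's coordinates as input;
    here the coordinates are only used as attention keys.
    """
    return [value, value * value]


def cross_attention(target_coord, context_coords, context_embeddings, scale=1.0):
    """Softmax-weighted average of context embeddings for one target element.

    Assumed score: negative squared coordinate distance, so context elements
    at neighboring coordinates receive most of the weight.
    """
    scores = [
        -sum((a - b) ** 2 for a, b in zip(target_coord, coord)) / scale
        for coord in context_coords
    ]
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(context_embeddings[0])
    return [
        sum(w / norm * emb[d] for w, emb in zip(weights, context_embeddings))
        for d in range(dim)
    ]


def decode(embedding, eps=1e-3):
    """Hypothetical decoder: read a Gaussian mean and variance off the embedding."""
    mean = embedding[0]
    var = max(embedding[1] - mean * mean, 0.0) + eps
    return mean, var


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under a Gaussian with the given mean and variance."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
```

With these stand-ins, each target element gets its own embedding and its own predicted distribution, and the negative log-likelihood of the observed target value is one possible per-element comparison from which an anomaly score could be pooled.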
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
using neural networks · CPC title
by displaying time domain information · CPC title
Learning methods · CPC title
by displaying frequency domain information · CPC title