What technology area does this patent fall under?

Primary CPC classification G06F40/51. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for anomaly detection in unlabeled collections of audio recording

US12554943B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12554943-B2
Application number	US-202318222409-A
Country	US
Kind code	B2
Filing date	Jul 14, 2023
Priority date	Jul 14, 2023
Publication date	Feb 17, 2026
Grant date	Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some implementations, the device may include receiving a first and second audio dataset. In addition, the device may generate a first, a second, a third, and a fourth audio sample. Moreover, the device may include determining a level of similarity between the first and second audio samples. Also, the device may include combining the first and second audio samples into an audio pair. Further, the device may include training a machine learning model to map audio samples to a latent space visualization in view of time and the similarities between the first and second audio samples to yield a trained machine learning model. In addition, the device may include mapping, by the machine learning model, in the latent space visualization, the third and fourth audio samples, where placement of the third and fourth audio samples depends on the level of similarity between the third and fourth audio samples.
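The sampling and pairing steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the claims determine similarity with a neural-network pretext classifier, whereas this sketch substitutes plain cosine similarity as a stand-in. The names `segment`, `cosine_similarity`, and `make_pair`, the window length, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple

Waveform = List[float]


def segment(waveform: Waveform, length: int, count: int) -> List[Waveform]:
    """Cut `count` non-overlapping windows of `length` values from a raw waveform."""
    if length * count > len(waveform):
        raise ValueError("waveform too short for the requested segments")
    return [waveform[i * length:(i + 1) * length] for i in range(count)]


def cosine_similarity(a: Waveform, b: Waveform) -> float:
    """Stand-in similarity measure (assumption: the patent uses a pretext classifier)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_pair(first: Waveform, second: Waveform,
              threshold: float) -> Optional[Tuple[Waveform, Waveform]]:
    """Combine two samples into a pair only if their similarity exceeds the threshold."""
    if cosine_similarity(first, second) > threshold:
        return (first, second)
    return None


# Toy "audio dataset": a sine wave with a period of 8 values (hypothetical data).
dataset = [math.sin(2 * math.pi * i / 8) for i in range(32)]

# Two distinct time-domain samples, each smaller than the dataset.
first, second = segment(dataset, length=16, count=2)

# The windows cover whole periods, so they are near-identical and form a pair.
pair = make_pair(first, second, threshold=0.9)
```

In a setup like this, the resulting pairs would serve as training input for the model that maps samples into the latent space. The training step itself is outside the scope of the sketch.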

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for detecting an anomaly in audio data, comprising: receiving, from a microphone, a first audio dataset and a second audio dataset; generating, based on the first audio dataset, a first audio sample and a second audio sample, where each of the first and second audio samples are smaller than the first audio dataset and the first audio sample is distinct from the second audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms of the first audio sample and second audio sample; generating, based on the second audio dataset, a third audio sample and a fourth audio sample, where each of the third and fourth audio samples are smaller than the second audio dataset and the third audio sample is distinct from the fourth audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms; determining a level of similarity between the first audio sample and the second audio sample utilizing a pretext classifier that includes a neural network configured to predict a probability of one of the audio samples belonging to an associated cluster in response to identifying an anomaly score; combining the first audio sample and the second audio sample into an audio pair in response to the level of similarity between the first audio sample and the second audio sample being above a first predetermined threshold; training a machine learning model, based on the audio pair, to map audio samples to a latent space visualization in view of time and the similarities between the first audio sample and the second audio sample to yield a trained machine learning model; and mapping, by the trained machine learning model, in the latent space visualization, the third audio sample and the fourth audio sample where placement of the third audio sample and the fourth audio sample depends on the level of similarity of the third audio sample and the fourth audio sample, as determined by the trained machine learning model.

2. A computer-implemented method of claim 1, comprising: labeling the third audio sample and the fourth audio sample based on a number of clusters in the latent space visualization.

3. A computer-implemented method of claim 2, comprising: receiving a fifth audio sample; generating a probability score for the fifth audio sample wherein the probability score indicates a probability that the fifth audio sample is associated with the cluster; comparing the probability score with a second predetermined threshold; and associating the fifth audio sample with the cluster in response to the probability score being greater than the second predetermined threshold.

4. The computer-implemented method of claim 1, wherein the first audio sample and the second audio sample do not overlap in view of the first audio dataset.

5. The computer-implemented method of claim 1, wherein the training of the machine learning model is performed via a self-supervised contrastive learning objectives.

6. The computer-implemented method of claim 1, wherein the first audio dataset and the second audio dataset do not include human annotations.

7. The computer-implemented method of claim 1, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the method further comprising: determining a shared attribute between the third audio sample and the fourth audio sample; and labeling the cluster based on the shared attributes of the third audio sample and the fourth audio sample.

8. A system for detecting an anomaly in audio data comprising: one or more processors configured to: receive, from a microphone, a first audio dataset and a second audio dataset; generate, based on the first audio dataset, a first audio sample and a second audio sample, where each of the first and second audio samples are smaller than the first audio dataset and the first audio sample is distinct from the second audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms of the first audio sample and second audio sample; generate, based on the second audio dataset, a third audio sample and a fourth audio sample, where each of the third and fourth audio samples are smaller than the second audio dataset and the third audio sample is distinct from the fourth audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms; determine a level of similarity between the first audio sample and the second audio sample utilizing a pretext classifier that includes a neural network configured to predict a probability of one of the audio samples belonging to an associated cluster in response to identifying an anomaly score; combine the first audio sample and the second audio sample into an audio pair in response to the level of similarity between the first audio sample and the second audio sample being above a first predetermined threshold; train a machine learning model, based on the audio pair, to map audio samples to a latent space visualization in view of time and the similarities between the first audio sample and the second audio sample to yield a trained machine learning model; and map, by the trained machine learning model, in the latent space visualization, the third audio sample and the fourth audio sample where placement of the third audio sample and the fourth audio sample depends on the level of similarity of the third audio sample and the fourth audio sample, as determined by the trained machine learning model.

9. The system of claim 8, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the one or more processors, are further configured to: label the third audio sample and the fourth audio sample based on a number of clusters in the latent space visualization.

10. The system of claim 9, wherein the one or more processors are further configured to: receive a fifth audio sample; generate a probability score for the fifth audio sample wherein the probability score indicates a probability that the fifth audio sample is associated with the cluster; compare the probability score with a second predetermined threshold; and associate the fifth audio sample with the cluster in response to the probability score being greater than the second predetermined threshold.

11. The system of claim 8, wherein the first audio sample and the second audio sample do not overlap in view of the first audio dataset.

12. The system of claim 8, wherein the training of the machine learning model is performed via a self-supervised contrastive learning objectives.

13. The system of claim 8, wherein the first audio dataset and the second audio dataset do not include human annotations.

14. The system of claim 8, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the one or more processors, are further configured to: determine a shared attribute between the third audio sample and the fourth audio sample; and label the cluster based on the shared attributes of the third audio sample and the fourth audio sample.

15. A non-transitory computer-readable medium storing a set of instructions for detecting an anomaly in audio data, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to: receive, from a microphone
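Claims 3 and 10 describe scoring a new (fifth) audio sample against a cluster and associating it only when the probability exceeds a second threshold. The sketch below shows one way such a check could look. The claims do not say how the probability score is computed, so the softmax over negative Euclidean distances to cluster centroids is an assumption, as are the names `cluster_probabilities` and `assign_cluster`, the two-dimensional embeddings, and the 0.8 threshold.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]


def cluster_probabilities(embedding: Embedding,
                          centroids: Dict[str, Embedding]) -> Dict[str, float]:
    """Softmax over negative distances: closer centroids get higher probability.

    Assumption: this scoring rule is illustrative and not taken from the patent.
    """
    scores = {name: -math.dist(embedding, c) for name, c in centroids.items()}
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def assign_cluster(embedding: Embedding, centroids: Dict[str, Embedding],
                   threshold: float) -> Optional[str]:
    """Return the most probable cluster if its score exceeds the threshold, else None."""
    probs = cluster_probabilities(embedding, centroids)
    name, prob = max(probs.items(), key=lambda item: item[1])
    return name if prob > threshold else None


# Hypothetical latent-space centroids for two clusters.
centroids = {"a": [0.0, 0.0], "b": [10.0, 10.0]}

near_a = assign_cluster([0.1, 0.0], centroids, threshold=0.8)
ambiguous = assign_cluster([5.0, 5.0], centroids, threshold=0.8)
```

Here `near_a` is assigned to cluster "a", while `ambiguous` sits midway between both centroids, scores 0.5 for each, and is left unassigned.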

Assignees

Robert Bosch GmbH

Inventors

Classifications

G10L25/30 · using neural networks
G06F40/51 (Primary) · Translation evaluation
G10L25/51 (Primary) · for comparison or discrimination

Patent family

Related publications grouped by family.

View patent family 93930581

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12554943B2 cover?: In some implementations, the device may include receiving a first and second audio dataset. In addition, the device may generate a first, a second, a third, and a fourth audio sample. Moreover, the device may include determining a level of similarity between the first and second audio samples. Also, the device may include combining the first and second audio samples into an audio pair. Further,…
Who is the assignee on this patent?: Robert Bosch GmbH

Related patents

Method and systems for respiratory sound classification

Determining audio and video representations using self-supervised learning

Systems and methods for separating and identifying audio in an audio file using machine learning

Automated sound matching within an audio recording

System and methods for automatically mixing audio for acoustic scenes

Representation learning from video with spatial audio

Unsupervised Learning of Semantic Audio Representations
