Method and systems for respiratory sound classification
US-2024341715-A1 · Oct 17, 2024 · US
US12554943B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12554943-B2 |
| Application number | US-202318222409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2023 |
| Priority date | Jul 14, 2023 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
In some implementations, the device may include receiving a first and second audio dataset. In addition, the device may generate a first, a second, a third, and a fourth audio sample. Moreover, the device may include determining a level of similarity between the first and second audio samples. Also, the device may include combining the first and second audio samples into an audio pair. Further, the device may include training a machine learning model to map audio samples to a latent space visualization in view of time and the similarities between the first and second audio samples to yield a trained machine learning model. In addition, the device may include mapping, by the machine learning model, in the latent space visualization, the third and fourth audio samples, where placement of the third and fourth audio samples depends on the level of similarity between the third and fourth audio samples.
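The segmentation and pairing steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: cosine similarity on raw segments stands in for the learned similarity measure, and the names `segment`, `cosine_similarity`, `make_pairs`, and the threshold value are hypothetical.

```python
import math

def segment(waveform, sample_len):
    """Split a raw waveform into non-overlapping time-domain samples."""
    return [waveform[i:i + sample_len]
            for i in range(0, len(waveform) - sample_len + 1, sample_len)]

def cosine_similarity(a, b):
    """Stand-in similarity measure (assumption; the patent uses a learned classifier)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def make_pairs(samples, threshold):
    """Return index pairs whose similarity is above the threshold."""
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if cosine_similarity(samples[i], samples[j]) > threshold:
                pairs.append((i, j))
    return pairs

# Toy "audio dataset": a sine wave with a period of 8 points.
waveform = [math.sin(2 * math.pi * t / 8) for t in range(32)]
samples = segment(waveform, 8)       # four samples, each smaller than the dataset
pairs = make_pairs(samples, 0.9)     # hypothetical first threshold
```

Because each toy segment covers exactly one period, every segment matches every other and all index pairs are kept; real recordings would yield far fewer pairs.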
Opening claim text (preview).
The invention claimed is:

1. A computer-implemented method for detecting an anomaly in audio data, comprising: receiving, from a microphone, a first audio dataset and a second audio dataset; generating, based on the first audio dataset, a first audio sample and a second audio sample, where each of the first and second audio samples are smaller than the first audio dataset and the first audio sample is distinct from the second audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms of the first audio sample and second audio sample; generating, based on the second audio dataset, a third audio sample and a fourth audio sample, where each of the third and fourth audio samples are smaller than the second audio dataset and the third audio sample is distinct from the fourth audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms; determining a level of similarity between the first audio sample and the second audio sample utilizing a pretext classifier that includes a neural network configured to predict a probability of one of the audio samples belonging to an associated cluster in response to identifying an anomaly score; combining the first audio sample and the second audio sample into an audio pair in response to the level of similarity between the first audio sample and the second audio sample being above a first predetermined threshold; training a machine learning model, based on the audio pair, to map audio samples to a latent space visualization in view of time and the similarities between the first audio sample and the second audio sample to yield a trained machine learning model; and mapping, by the trained machine learning model, in the latent space visualization, the third audio sample and the fourth audio sample where placement of the third audio sample and the fourth audio sample depends on the level of similarity of the third audio sample and the fourth audio sample, as determined by the trained machine learning model.

2. A computer-implemented method of claim 1, comprising: labeling the third audio sample and the fourth audio sample based on a number of clusters in the latent space visualization.

3. A computer-implemented method of claim 2, comprising: receiving a fifth audio sample; generating a probability score for the fifth audio sample wherein the probability score indicates a probability that the fifth audio sample is associated with the cluster; comparing the probability score with a second predetermined threshold; and associating the fifth audio sample with the cluster in response to the probability score being greater than the second predetermined threshold.

4. The computer-implemented method of claim 1, wherein the first audio sample and the second audio sample do not overlap in view of the first audio dataset.

5. The computer-implemented method of claim 1, wherein the training of the machine learning model is performed via a self-supervised contrastive learning objectives.

6. The computer-implemented method of claim 1, wherein the first audio dataset and the second audio dataset do not include human annotations.

7. The computer-implemented method of claim 1, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the method further comprising: determining a shared attribute between the third audio sample and the fourth audio sample; and labeling the cluster based on the shared attributes of the third audio sample and the fourth audio sample.

8. A system for detecting an anomaly in audio data comprising: one or more processors configured to: receive, from a microphone, a first audio dataset and a second audio dataset; generate, based on the first audio dataset, a first audio sample and a second audio sample, where each of the first and second audio samples are smaller than the first audio dataset and the first audio sample is distinct from the second audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms of the first audio sample and second audio sample; generate, based on the second audio dataset, a third audio sample and a fourth audio sample, where each of the third and fourth audio samples are smaller than the second audio dataset and the third audio sample is distinct from the fourth audio sample, wherein the generating is conducted by sampling in a time domain by segmenting raw waveforms; determine a level of similarity between the first audio sample and the second audio sample utilizing a pretext classifier that includes a neural network configured to predict a probability of one of the audio samples belonging to an associated cluster in response to identifying an anomaly score; combine the first audio sample and the second audio sample into an audio pair in response to the level of similarity between the first audio sample and the second audio sample being above a first predetermined threshold; train a machine learning model, based on the audio pair, to map audio samples to a latent space visualization in view of time and the similarities between the first audio sample and the second audio sample to yield a trained machine learning model; and map, by the trained machine learning model, in the latent space visualization, the third audio sample and the fourth audio sample where placement of the third audio sample and the fourth audio sample depends on the level of similarity of the third audio sample and the fourth audio sample, as determined by the trained machine learning model.

9. The system of claim 8, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the one or more processors, are further configured to: label the third audio sample and the fourth audio sample based on a number of clusters in the latent space visualization.

10. The system of claim 9, wherein the one or more processors are further configured to: receive a fifth audio sample; generate a probability score for the fifth audio sample wherein the probability score indicates a probability that the fifth audio sample is associated with the cluster; compare the probability score with a second predetermined threshold; and associate the fifth audio sample with the cluster in response to the probability score being greater than the second predetermined threshold.

11. The system of claim 8, wherein the first audio sample and the second audio sample do not overlap in view of the first audio dataset.

12. The system of claim 8, wherein the training of the machine learning model is performed via a self-supervised contrastive learning objectives.

13. The system of claim 8, wherein the first audio dataset and the second audio dataset do not include human annotations.

14. The system of claim 8, wherein the mapping of the third audio sample and the fourth audio sample in the latent space visualization creates a cluster and the one or more processors, are further configured to: determine a shared attribute between the third audio sample and the fourth audio sample; and label the cluster based on the shared attributes of the third audio sample and the fourth audio sample.

15. A non-transitory computer-readable medium storing a set of instructions for detecting an anomaly in audio data, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the device to: receive, from a microphone
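Claims 5 and 12 recite training via self-supervised contrastive learning objectives, and claims 3 and 10 recite associating a fifth audio sample with a cluster when its probability score is greater than a second predetermined threshold. The sketch below illustrates both ideas under loud assumptions: the claims do not name a specific loss, so an InfoNCE-style loss is used here purely as a common example of a contrastive objective, and the embeddings, temperature, function names, and threshold values are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed example, not named in the claims).

    The loss is low when the anchor is close to its paired (positive) sample
    and far from the unpaired (negative) samples in the latent space.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def associate_with_cluster(probability_score, threshold):
    """Threshold rule in the spirit of claims 3 and 10: strictly greater than."""
    return probability_score > threshold

# Hypothetical 2-D embeddings of three audio samples.
anchor, near, far = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
loss_good = info_nce(anchor, near, [far])   # paired sample is nearby
loss_bad = info_nce(anchor, far, [near])    # paired sample is far away
```

Minimizing such a loss pulls paired samples together in the latent space, which is what allows similar third and fourth samples to land near each other after training.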