System and method with diffusion-based outlier synthesis for anomaly detection

US12530875B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12530875-B2
Application number  US-202318476614-A
Country             US
Kind code           B2
Filing date         Sep 28, 2023
Priority date       Sep 28, 2023
Publication date    Jan 20, 2026
Grant date          Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented system and method relate to anomaly detection. Latent code of a source image is obtained. The latent code is designated as a target image. Source embedding data is generated from the source image. Text data, which is of a different domain than that of the source image, is obtained. Text embedding data is generated from the text data. Additional embedding data is generated using the source embedding data and the text embedding data. The additional embedding data provides guidance for modifying the source image. A modified image is generated via an iterative process that includes at least one iteration, where each iteration includes at least (i) encoding the target image to generate target embedding data, (ii) generating updated embedding data by combining the target embedding data and the additional embedding data, (iii) decoding the updated embedding data to generate a new image, and (iv) assigning the new image as the target image and the modified image. A non-anomalous label is generated for the source image and an anomalous label is generated for the modified image. A machine learning model is trained or fine-tuned using a dataset, which includes at least the source image with the non-anomalous label and the modified image with the anomalous label.
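
The iterative process in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the toy `toy_encode` and `toy_decode` functions are hypothetical stand-ins for the image encoder and decoder, images and embeddings are plain lists of floats, the "combining" step is assumed to be a weighted addition, and the `strength` parameter is borrowed from the strength level mentioned in claim 6.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def synthesize_outlier(
    target: Vector,
    guidance: Vector,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
    num_iterations: int = 1,
    strength: float = 0.5,
) -> Vector:
    """Encode -> combine -> decode loop, following steps (i)-(iv) of the abstract."""
    if num_iterations < 1:
        raise ValueError("the iterative process includes at least one iteration")
    modified = target
    for _ in range(num_iterations):
        # (i) encode the current target image into target embedding data
        target_emb = encode(target)
        # (ii) combine with the additional (guidance) embedding data;
        #      weighted addition is an assumption made for this sketch
        updated = [t + strength * g for t, g in zip(target_emb, guidance)]
        # (iii) decode the updated embedding data into a new image
        new_image = decode(updated)
        # (iv) the new image becomes both the target and the modified image
        target = modified = new_image
    return modified


# Hypothetical linear stand-ins for a pretrained encoder/decoder pair.
def toy_encode(image: Vector) -> Vector:
    return [2.0 * v for v in image]


def toy_decode(embedding: Vector) -> Vector:
    return [0.5 * v for v in embedding]


source_image: Vector = [0.0, 1.0]
latent_code: Vector = list(source_image)  # assumption: identity stands in for the latent code
guidance: Vector = [1.0, -1.0]            # assumption: precomputed additional embedding data

modified_image = synthesize_outlier(
    latent_code, guidance, toy_encode, toy_decode, num_iterations=2, strength=0.5
)

# Label 0 = non-anomalous (source image), label 1 = anomalous (modified image).
dataset: List[Tuple[Vector, int]] = [(source_image, 0), (modified_image, 1)]
```

The resulting `dataset` pairs the untouched source image with the synthesized outlier, which is the shape of training data the abstract describes for the downstream anomaly detector.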

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for anomaly detection, the computer-implemented method comprising:
    receiving a source image associated with a first domain;
    obtaining a latent code of the source image, the latent code being designated as a target image;
    encoding, via a first image encoder, the source image to generate source embedding data;
    obtaining text data associated with a second domain;
    encoding, via a first text encoder, the text data to generate text embedding data;
    generating additional embedding data using the source embedding data and the text embedding data, the additional embedding data providing guidance for modifying the source image;
    generating a modified image via an iterative process that includes at least one iteration, each iteration including:
        encoding, via a second image encoder, the target image to generate target embedding data,
        generating updated embedding data by combining the target embedding data and the additional embedding data,
        decoding, via an image decoder, the updated embedding data to generate a new image, and
        assigning the new image as the target image and the modified image,
    generating a dataset that includes at least the source image and the modified image; and
    training or fine-tuning a machine learning model using the dataset.

2. The computer-implemented method of claim 1, wherein: the machine learning model is configured to perform anomaly detection; and the machine learning model comprises a classifier or a regression model.

3. The computer-implemented method of claim 1, wherein: the first image encoder is a part of a pretrained vision language model; the first text encoder is a part of the pretrained vision language model; the second image encoder is a part of a pretrained diffusion model; and the image decoder is a part of the pretrained diffusion model.

4. The computer-implemented method of claim 1, further comprising: generating directional loss data for the source image using the source embedding data and the text embedding data, wherein the additional embedding data is generated at least by minimizing the directional loss data.

5. The computer-implemented method of claim 4, further comprising: computing a first difference term between the text embedding data and the source embedding data; and computing a second difference term between the target embedding data and the source embedding data, wherein the directional loss data is generated via computations that use the first difference term and the second difference term.

6. The computer-implemented method of claim 1, further comprising: defining a strength level to indicate an amount of modifying the source image with respect to the text data, wherein the updated embedding data is generated using the strength level to affect an impact of the additional embedding data.

7. The computer-implemented method of claim 1, further comprising: employing the machine learning model to generate prediction data in response to receiving current sensor data, the prediction data indicating that the current sensor data is anomalous data or non-anomalous data; and controlling an actuator using the prediction data.

8. A system for anomaly detection, the system comprising: one or more processors; at least one non-transitory computer readable medium in data communication with the one or more processors, the at least one non-transitory computer readable medium having computer readable data including instructions stored thereon that, when executed by the one or more processors is configured to cause the one or more processors to perform a method that comprises:
    receiving a source image associated with a first domain;
    obtaining a noisy image to use as a target image, the noisy image comprising Gaussian noise;
    encoding, via a first image encoder, the source image to generate source embedding data;
    obtaining text data associated with a second domain;
    encoding, via a first text encoder, the text data to generate text embedding data;
    generating additional embedding data using the source embedding data and the text embedding data, the additional embedding data providing guidance for modifying the source image;
    generating a modified image via an iterative process that includes at least one iteration, each iteration including:
        encoding, via a second image encoder, the target image to generate target embedding data,
        generating updated embedding data by combining the target embedding data and the additional embedding data,
        decoding, via an image decoder, the updated embedding data to generate a new image, and
        assigning the new image as the target image and the modified image,
    generating a dataset that includes at least the source image and the modified image; and
    training or fine-tuning a machine learning model using the dataset.

9. The system of claim 8, wherein: the machine learning model is configured to perform anomaly detection; and the machine learning model comprises a classifier or a regression model.

10. The system of claim 8, wherein: the first image encoder is a part of a pretrained vision language model; the first text encoder is a part of the pretrained vision language model; the second image encoder is a part of a pretrained diffusion model; and the image decoder is a part of the pretrained diffusion model.

11. The system of claim 8, wherein: generating directional loss data for the source image using the source embedding data and the text embedding data, wherein the additional embedding data is generated at least by minimizing the directional loss data.

12. The system of claim 11, further comprising: computing a first difference term between the text embedding data and the source embedding data; and computing a second difference term between the target embedding data and the source embedding data, wherein the directional loss data is generated via computations that use the first difference term and the second difference term.

13. The system of claim 8, further comprising: defining a strength level to indicate an amount of modifying the source image with respect to the text data, wherein the updated embedding data is generated using the strength level to affect an impact of the additional embedding data.

14. The system of claim 8, further comprising: employing the machine learning model to generate prediction data in response to receiving current sensor data, the prediction data indicating that the current sensor data is anomalous data or non-anomalous data; and controlling an actuator using the prediction data.

15. A non-transitory computer readable medium having computer readable data including instructions stored thereon, the computer readable data being executable by one or more processors to perform a method that comprises:
    receiving a source image associated with a first domain;
    obtaining a noisy image to use as a target image, the noisy image comprising Gaussian noise;
    encoding, via a first image encoder, the source image to generate source embedding data;
    obtaining text data associated with a second domain;
    encoding, via a first text encoder, the text data to generate text embedding data;
    generating additional embedding data using the source embedding data and the text embedding data, the additional embedding data providing guidance for modifying the source image;
    generating a modified image via an iterative process that includes at least one iteration, each iteration including:
        encoding, via a second image encoder, the target image to generate ta
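
Claims 4 and 5 describe a directional loss built from two difference terms but do not give a formula in the text shown here. As a rough illustration, the sketch below assumes the loss is one minus the cosine similarity between the two difference vectors, a form commonly used for directional losses in text-guided image editing. The function name `directional_loss`, the list-of-floats embeddings, and the value returned when a direction is undefined are all assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]


def directional_loss(source_emb: Vector, text_emb: Vector, target_emb: Vector) -> float:
    """Assumed form: 1 - cos(first difference term, second difference term)."""
    # First difference term (claim 5): text embedding minus source embedding.
    first = [t - s for t, s in zip(text_emb, source_emb)]
    # Second difference term (claim 5): target embedding minus source embedding.
    second = [g - s for g, s in zip(target_emb, source_emb)]

    dot = sum(a * b for a, b in zip(first, second))
    norm = math.sqrt(sum(a * a for a in first)) * math.sqrt(sum(b * b for b in second))
    if norm == 0.0:
        # Assumed convention: treat an undefined direction as "not aligned".
        return 1.0
    return 1.0 - dot / norm


source = [0.0, 0.0]
text = [1.0, 0.0]

aligned = directional_loss(source, text, [2.0, 0.0])     # target moved along the text direction
orthogonal = directional_loss(source, text, [0.0, 3.0])  # target moved at a right angle to it
```

Under this assumed form, the loss is lowest when the target embedding moves away from the source embedding in the same direction as the text embedding does, which is the behavior minimizing the loss would encourage.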

Assignees

Robert Bosch GmbH

Classifications

  • Validation; Performance evaluation · CPC title

  • Spoof detection, e.g. liveness detection · CPC title

  • Embedding additional information in the video signal during the compression process (H04N19/517, H04N19/68, H04N19/70 take precedence) · CPC title

  • using neural networks · CPC title

  • G06V10/774 · Primary

    Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12530875B2 cover?
A computer-implemented system and method relate to anomaly detection. Latent code of a source image is obtained. The latent code is designated as a target image. Source embedding data is generated from the source image. Text data, which is of a different domain than that of the source image, is obtained. Text embedding data is generated from the text data. Additional embedding data is generated…
Who is the assignee on this patent?
Robert Bosch GmbH
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 20, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).