Color conditioned diffusion prior

US12586271B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12586271-B2
Application number: US-202318329111-A
Country: US
Kind code: B2
Filing date: Jun 5, 2023
Priority date: Jun 5, 2023
Publication date: Mar 24, 2026
Grant date: Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image processing are described. Embodiments of the present disclosure, via a multi-modal encoder of an image processing apparatus, encodes a text prompt to obtain a text embedding. A color encoder of the image processing apparatus encodes a color prompt to obtain a color embedding. A diffusion prior model of the image processing apparatus generates an image embedding based on the text embedding and the color embedding. A latent diffusion model of the image processing apparatus generates an image based on the image embedding, where the image includes an element from the text prompt and a color from the color prompt.
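The abstract does not say how the color encoder works, but claims 5 and 16 on this page state that the color embedding may comprise a color histogram. The following is a minimal sketch of that idea, not the patented implementation: the function name `color_histogram`, the `bins_per_channel` parameter, and the choice of a joint RGB histogram with 4 bins per channel are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]


def color_histogram(pixels: Iterable[RGB], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram as a flat vector.

    Hypothetical stand-in for a color encoder: the vector has
    bins_per_channel ** 3 entries that sum to 1.
    """
    counts = [0] * (bins_per_channel ** 3)
    total = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        ri, gi, bi = (c * bins_per_channel // 256 for c in (r, g, b))
        counts[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        total += 1
    if total == 0:
        raise ValueError("color prompt contains no pixels")
    # Normalize so the result does not depend on how many pixels were given.
    return [c / total for c in counts]


# Example color prompt: a palette that is two parts red, one part blue.
palette = [(255, 0, 0), (255, 0, 0), (0, 0, 255)]
embedding = color_histogram(palette)
print(len(embedding))  # 64 entries for 4 bins per channel
```

A fixed-length, normalized vector like this could be passed to a downstream model alongside a text embedding regardless of the size of the color prompt, which is one plausible reason to use a histogram here.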

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: encoding a text prompt to obtain a text embedding; encoding a color prompt to obtain a color embedding, wherein the color prompt comprises a different modality than the text prompt; generating an image embedding using a diffusion prior model based on the text embedding and the color embedding, wherein the image embedding represents the text prompt and the color prompt; and generating an image by denoising a noise input based on the image embedding that represents the text prompt and the color prompt using a latent diffusion model (LDM), wherein the image includes an element from the text prompt and a color from the color prompt.
2. The method of claim 1, wherein: the text embedding and the image embedding are in a multi-modal embedding space.
3. The method of claim 2, wherein: the text embedding is in a first region of the multi-modal embedding space corresponding to text and the image embedding is in a second region of the multi-modal embedding space corresponding to images.
4. The method of claim 1, wherein generating the image embedding comprises: performing an attention process using shared attention weights among the text embedding and the color embedding.
5. The method of claim 1, wherein: the color embedding comprises a color histogram.
6. The method of claim 1, further comprising: identifying a candidate image embedding for a candidate image; comparing the image embedding to the candidate image embedding; and providing the candidate image as a search result based on the comparison.
7. The method of claim 1, further comprising: generating a plurality of image embeddings based on the text embedding and the color embedding, wherein the image embedding is selected from the plurality of image embeddings.
8. The method of claim 7, further comprising: generating a plurality of images based on the plurality of image embeddings, respectively, wherein each of the plurality of images includes the element from the text prompt and the color from the color prompt.
9. The method of claim 1, further comprising: generating a plurality of images based on the image embedding, wherein each of the plurality of images includes the element from the text prompt and the color from the color prompt.
10. The method of claim 1, further comprising: generating a modified image embedding using the LDM, wherein the image is generated based on the modified image embedding.
11. A method comprising: obtaining training data including a text embedding representing a text prompt and a color embedding representing a color prompt, wherein the color prompt comprises a different modality than the text prompt; initializing a diffusion prior model; and training the diffusion prior model to generate an image embedding based on the text embedding and the color embedding, wherein the image embedding represents features corresponding to the text embedding and a color corresponding to the color embedding, and wherein the image embedding represents the text prompt and the color prompt.
12. The method of claim 11, further comprising: encoding the text prompt describing a ground-truth image to obtain the text embedding; and encoding the ground-truth image to obtain the color embedding.
13. The method of claim 11, further comprising: training a latent diffusion model (LDM) to generate an image by denoising a noise input based on the image embedding that represents the text prompt and the color prompt.
14. The method of claim 11, further comprising: generating a predicted image embedding using the diffusion prior model; and computing a loss function by comparing the predicted image embedding to a ground-truth image embedding, wherein the diffusion prior model is trained based on the loss function.
15. An apparatus comprising: at least one processor; and at least one memory including instructions executable by the at least one processor to perform operations including: encoding, using a multi-modal encoder, a text prompt to obtain a text embedding; encoding, using a color encoder, a color prompt to obtain a color embedding, wherein the color prompt comprises a different modality than the text prompt; generating, using a diffusion prior model, an image embedding based on the text embedding and the color embedding, wherein the image embedding represents the text prompt and the color prompt; and generating, using a latent diffusion model (LDM), an image by denoising a noise input based on the image embedding that represents the text prompt and the color prompt, wherein the image includes an element from the text prompt and a color from the color prompt.
16. The apparatus of claim 15, wherein: the color encoder comprises a color histogram extractor configured to extract a color histogram from the color prompt.
17. The apparatus of claim 15, wherein: the diffusion prior model comprises a transformer architecture.
18. The apparatus of claim 15, wherein: the LDM comprises a U-Net architecture.
19. The apparatus of claim 15, wherein: the LDM comprises an image decoder.
20. The apparatus of claim 15, further comprising: a training component configured to train the diffusion prior model.
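Claim 6 describes comparing the generated image embedding to candidate image embeddings and returning candidates as search results, without naming a comparison metric. The sketch below assumes cosine similarity, a common choice for embedding retrieval; the function names `cosine_similarity` and `rank_candidates`, the `top_k` parameter, and the toy two-dimensional vectors are hypothetical and not taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return the best top_k."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for a generated image embedding and a small index.
generated = [1.0, 0.0]
index = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
for name, score in rank_candidates(generated, index):
    print(name, round(score, 3))
```

In a real system the linear scan over `candidates` would likely be replaced by an approximate nearest-neighbor index, but the comparison step itself would look much the same.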

Assignees

Adobe Inc

Inventors

Classifications

  • Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • Combinations of networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Probabilistic or stochastic networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12586271B2 cover?
It covers systems and methods for image processing in which a text prompt and a color prompt are encoded separately, a diffusion prior model generates an image embedding from the two resulting embeddings, and a latent diffusion model generates an image that includes an element from the text prompt and a color from the color prompt. See the Abstract section for the official text.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).