Embedding an input image to a diffusion model

US2024161462A1 · US · A1

Patent metadata
Publication number: US-2024161462-A1
Application number: US-202218053556-A
Country: US
Kind code: A1
Filing date: Nov 8, 2022
Priority date: Nov 8, 2022
Publication date: May 16, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image editing are described. Embodiments of the present disclosure include obtaining an image and a prompt for editing the image. A diffusion model is tuned based on the image to generate different versions of the image. The prompt is then encoded to obtain a guidance vector, and the diffusion model generates a modified image based on the image and the encoded text prompt.
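
The abstract says the prompt is encoded into a guidance vector that the diffusion model then uses alongside the image. The sketch below is only a toy illustration of that idea, not the encoder or conditioning mechanism of the patent: encode_prompt, combine_with_features, the hashing trick, the vector size, and the additive projection are all hypothetical choices made for this example.

```python
import zlib
import numpy as np

def encode_prompt(prompt, dim=16):
    """Toy text encoder (hypothetical): map each word to a fixed pseudo-random
    vector seeded by its CRC32 hash, then average the word vectors.
    A real system would use a trained text encoder instead."""
    words = prompt.lower().split()
    if not words:
        return np.zeros(dim)
    vectors = [
        np.random.default_rng(zlib.crc32(w.encode("utf-8"))).standard_normal(dim)
        for w in words
    ]
    return np.mean(vectors, axis=0)

def combine_with_features(image_features, guidance, projection):
    """Assumed conditioning scheme: project the guidance vector to the channel
    dimension and add it at every spatial position of the feature map.
    Shapes: image_features (H, W, C), guidance (D,), projection (D, C)."""
    return image_features + guidance @ projection

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 4, 8))   # stand-in image features (H, W, C)
projection = rng.standard_normal((16, 8))   # stand-in learned projection (D, C)
guidance = encode_prompt("add a red hat")
conditioned = combine_with_features(features, guidance, projection)
```

The additive projection is the simplest way to show a guidance vector influencing image features; the publication does not say which combination method it uses.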

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining an image and a prompt for editing the image; encoding the prompt to obtain a guidance vector; and generating a modified image based on the image and the prompt using a diffusion model that has been trained on the image to generate different versions of the image.
2. The method of claim 1, further comprising: receiving the prompt from a user via a text field of a user interface; and displaying the modified image to the user via the user interface.
3. The method of claim 1, further comprising: initializing a plurality of noise maps; generating a plurality of intermediate images corresponding to the plurality of noise maps at different noise levels based on the plurality of noise maps using the diffusion model; and computing a loss function by comparing each of the plurality of intermediate images to the image, wherein the diffusion model is based on the loss function.
4. The method of claim 3, further comprising: selecting the plurality of intermediate images at random from a superset of intermediate images generated by the diffusion model.
5. The method of claim 3, further comprising: adding noise at the different noise levels to the image to obtain a plurality of noisy images, wherein the comparison is based on an intermediate image of the plurality of intermediate images and a corresponding noisy image of the plurality of noisy images having a corresponding noise level.
6. The method of claim 1, wherein: the prompt comprises text that describes a modification to the image, wherein the modified image includes the modification.
7. The method of claim 1, wherein: the modified image retains an identity of an object in the image.
8. The method of claim 1, further comprising: combining the guidance vector with image features within the diffusion model, wherein the modified image is based on the guidance vector.
9. The method of claim 1, further comprising: initializing the diffusion model; training the diffusion model based on a diverse training set to obtain a pre-trained diffusion model; and fine-tuning the pre-trained diffusion model based on the image.
10. The method of claim 9, wherein: the fine-tuning configures the diffusion model to generate an output resembling the image based on any input provided.
11. The method of claim 9, wherein: a first weight for a loss function is used for training the diffusion model and a second weight for the loss function that is different from the first weight is used for fine-tuning the pre-trained diffusion model.
12. A non-transitory computer-readable medium comprising instructions, that, when executed by a processor, are configured to perform operations of: fine-tuning a pre-trained diffusion model based on a single image to obtain a tuned diffusion model; receiving a prompt including additional content for the single image; and generating a modified image based on the single image and the prompt using the tuned diffusion model.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions are further configured to perform: initializing a plurality of noise maps; generating a plurality of intermediate images corresponding to the plurality of noise maps at different noise levels based on the plurality of noise maps using the pre-trained diffusion model; and computing a loss function by comparing each of the plurality of intermediate images to the single image, wherein the tuned diffusion model is based on the loss function.
14. The non-transitory computer-readable medium of claim 13, wherein the instructions are further configured to perform: selecting the plurality of intermediate images at random from a superset of intermediate images generated by the pre-trained diffusion model.
15. The non-transitory computer-readable medium of claim 13, wherein the instructions are further configured to perform: adding noise at the different noise levels to the single image to obtain a plurality of noisy images, wherein the comparison is based on an intermediate image of the plurality of intermediate images and a corresponding noisy image of the plurality of noisy images having a corresponding noise level.
16. The non-transitory computer-readable medium of claim 12, wherein the instructions are further configured to perform: encoding the prompt to obtain a guidance vector; and combining the guidance vector with image features within the tuned diffusion model, wherein the modified image is based on the guidance vector.
17. An apparatus for image processing, comprising: one or more processors; and one or more memories including instructions executable by the one or more processors to: obtain an image and a prompt for editing the image; fine-tune a pre-trained diffusion model based on the image to obtain a tuned diffusion model; and generate a modified image based on the image and the prompt using the tuned diffusion model.
18. The apparatus of claim 17, wherein the instructions are further executable by the one or more processors to: encode the prompt to obtain a guidance vector using a text encoder, wherein the modified image is based on the guidance vector.
19. The apparatus of claim 17, wherein the instructions are further executable by the one or more processors to: receive the prompt from a user via a text field of a user interface, and display the modified image to the user.
20. The apparatus of claim 17, wherein: the diffusion model comprises a Denoising Diffusion Probabilistic Model (DDPM).
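
Claims 3 to 5 describe a loss computed over several noise maps at different noise levels, comparing each intermediate image with a version of the single image noised to the same level. The sketch below is one possible reading of those steps, not the patent's implementation. The linear beta schedule, the squared-error comparison, and toy_model (a stand-in that mixes a learnable template with the noise map) are assumptions introduced for illustration; the noising formula is the standard DDPM forward step.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(image, noise, alpha_bar_t):
    """Noisy version of the image at one noise level (standard DDPM forward step)."""
    return np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * noise

def toy_model(noise_map, alpha_bar_t, template):
    """Hypothetical stand-in for the diffusion model: mixes a learnable
    template with the noise map to form an intermediate image."""
    return np.sqrt(alpha_bar_t) * template + np.sqrt(1.0 - alpha_bar_t) * noise_map

def single_image_loss(image, template, alpha_bar, num_samples, rng):
    """Mean squared error between intermediate images and the target image
    noised to the same level, over randomly selected noise levels."""
    levels = rng.choice(len(alpha_bar), size=num_samples, replace=False)
    total = 0.0
    for t in levels:
        noise_map = rng.standard_normal(image.shape)       # one noise map per level
        intermediate = toy_model(noise_map, alpha_bar[t], template)
        noisy_target = add_noise(image, noise_map, alpha_bar[t])
        total += np.mean((intermediate - noisy_target) ** 2)
    return total / num_samples

rng = np.random.default_rng(0)
image = rng.random((8, 8))                  # stand-in for the single input image
alpha_bar = make_alpha_bar()
loss_untuned = single_image_loss(image, np.zeros_like(image), alpha_bar, 16, rng)
loss_tuned = single_image_loss(image, image.copy(), alpha_bar, 16, rng)
```

In this toy setting the loss falls to zero once the template matches the input image, which mirrors the claimed goal of tuning the model so that its outputs resemble that one image.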

Assignees

Adobe Inc

Inventors

Classifications

Primary CPC: G06T11/60

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
The primary CPC classification is G06T11/60. Mapped technology areas include Physics.
When was this patent published?
It was published on May 16, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).