Embedding an input image to a diffusion model

US2024161462A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2024161462-A1
Application number	US-202218053556-A
Country	US
Kind code	A1
Filing date	Nov 8, 2022
Priority date	Nov 8, 2022
Publication date	May 16, 2024
Grant date	—
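
The publication and application numbers in the table follow a hyphen-separated pattern of country code, serial number, and kind code. As a minimal sketch, the publication number can be split into those parts and the filing-to-publication interval computed from the dates listed above. The names `PublicationId` and `parse_publication_number` are hypothetical helpers for illustration and are not part of patentsdb.

```python
from datetime import date
from typing import NamedTuple


class PublicationId(NamedTuple):
    """Hypothetical container for the three parts of a publication number."""
    country: str
    number: str
    kind: str


def parse_publication_number(raw: str) -> PublicationId:
    """Split a 'CC-NUMBER-KIND' identifier such as 'US-2024161462-A1'."""
    parts = raw.strip().split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected publication number format: {raw!r}")
    country, number, kind = parts
    return PublicationId(country, number, kind)


pub = parse_publication_number("US-2024161462-A1")

# Compact form, as used in the page heading.
compact = f"{pub.country}{pub.number}{pub.kind}"

# Interval between the filing date and the publication date from the table.
filing = date(2022, 11, 8)
publication = date(2024, 5, 16)
days_pending = (publication - filing).days
```

Here `compact` reproduces the heading form `US2024161462A1`, and `days_pending` is the number of days between filing and publication.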

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image editing are described. Embodiments of the present disclosure include obtaining an image and a prompt for editing the image. A diffusion model is tuned based on the image to generate different versions of the image. The prompt is then encoded to obtain a guidance vector, and the diffusion model generates a modified image based on the image and the encoded text prompt.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining an image and a prompt for editing the image; encoding the prompt to obtain a guidance vector; and generating a modified image based on the image and the prompt using a diffusion model that has been trained on the image to generate different versions of the image.
2. The method of claim 1, further comprising: receiving the prompt from a user via a text field of a user interface; and displaying the modified image to the user via the user interface.
3. The method of claim 1, further comprising: initializing a plurality of noise maps; generating a plurality of intermediate images corresponding to the plurality of noise maps at different noise levels based on the plurality of noise maps using the diffusion model; and computing a loss function by comparing each of the plurality of intermediate images to the image, wherein the diffusion model is based on the loss function.
4. The method of claim 3, further comprising: selecting the plurality of intermediate images at random from a superset of intermediate images generated by the diffusion model.
5. The method of claim 3, further comprising: adding noise at the different noise levels to the image to obtain a plurality of noisy images, wherein the comparison is based on an intermediate image of the plurality of intermediate images and a corresponding noisy image of the plurality of noisy images having a corresponding noise level.
6. The method of claim 1, wherein: the prompt comprises text that describes a modification to the image, wherein the modified image includes the modification.
7. The method of claim 1, wherein: the modified image retains an identity of an object in the image.
8. The method of claim 1, further comprising: combining the guidance vector with image features within the diffusion model, wherein the modified image is based on the guidance vector.
9. The method of claim 1, further comprising: initializing the diffusion model; training the diffusion model based on a diverse training set to obtain a pre-trained diffusion model; and fine-tuning the pre-trained diffusion model based on the image.
10. The method of claim 9, wherein: the fine-tuning configures the diffusion model to generate an output resembling the image based on any input provided.
11. The method of claim 9, wherein: a first weight for a loss function is used for training the diffusion model and a second weight for the loss function that is different from the first weight is used for fine-tuning the pre-trained diffusion model.
12. A non-transitory computer-readable medium comprising instructions, that, when executed by a processor, are configured to perform operations of: fine-tuning a pre-trained diffusion model based on a single image to obtain a tuned diffusion model; receiving a prompt including additional content for the single image; and generating a modified image based on the single image and the prompt using the tuned diffusion model.
13. The non-transitory computer-readable medium of claim 12, wherein the instructions are further configured to perform: initializing a plurality of noise maps; generating a plurality of intermediate images corresponding to the plurality of noise maps at different noise levels based on the plurality of noise maps using the pre-trained diffusion model; and computing a loss function by comparing each of the plurality of intermediate images to the single image, wherein the tuned diffusion model is based on the loss function.
14. The non-transitory computer-readable medium of claim 13, wherein the instructions are further configured to perform: selecting the plurality of intermediate images at random from a superset of intermediate images generated by the pre-trained diffusion model.
15. The non-transitory computer-readable medium of claim 13, wherein the instructions are further configured to perform: adding noise at the different noise levels to the single image to obtain a plurality of noisy images, wherein the comparison is based on an intermediate image of the plurality of intermediate images and a corresponding noisy image of the plurality of noisy images having a corresponding noise level.
16. The non-transitory computer-readable medium of claim 12, wherein the instructions are further configured to perform: encoding the prompt to obtain a guidance vector; and combining the guidance vector with image features within the tuned diffusion model, wherein the modified image is based on the guidance vector.
17. An apparatus for image processing, comprising: one or more processors; and one or more memories including instructions executable by the one or more processors to: obtain an image and a prompt for editing the image; fine-tune a pre-trained diffusion model based on the image to obtain a tuned diffusion model; and generate a modified image based on the image and the prompt using the tuned diffusion model.
18. The apparatus of claim 17, wherein the instructions are further executable by the one or more processors to: encode the prompt to obtain a guidance vector using a text encoder, wherein the modified image is based on the guidance vector.
19. The apparatus of claim 17, wherein the instructions are further executable by the one or more processors to: receive the prompt from a user via a text field of a user interface, and display the modified image to the user.
20. The apparatus of claim 17, wherein: the diffusion model comprises a Denoising Diffusion Probabilistic Model (DDPM).
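
Claims 3 to 5 describe a tuning loss built from noise maps, intermediate images at several noise levels, and noisy copies of the input image at matching levels. The following is a minimal sketch of one possible reading of those steps, not the patent's implementation. It assumes the linear beta schedule commonly used with DDPMs (the claims name no schedule), a mean-squared-error comparison (the claims name no loss form), and a hypothetical `identity_model` stand-in where a real diffusion network would be called. Sampling the noise levels at random only loosely mirrors the random selection in claim 4.

```python
import math
import random
from typing import Callable, List, Sequence

Image = List[float]  # flattened pixel values; real systems would use tensors


def alpha_bars(num_steps: int, beta_start: float = 1e-4,
               beta_end: float = 0.02) -> List[float]:
    """Cumulative signal fraction per step for an assumed linear beta schedule."""
    out: List[float] = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(image: Sequence[float], alpha_bar: float,
              rng: random.Random) -> Image:
    """Forward-noise an image: sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in image]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tuning_loss(image: Sequence[float],
                model: Callable[[Image, int], Image],
                levels: Sequence[int],
                schedule: Sequence[float],
                rng: random.Random) -> float:
    """Average the comparison over the chosen noise levels."""
    total = 0.0
    for t in levels:
        noise_map = [rng.gauss(0.0, 1.0) for _ in image]    # initialize a noise map
        intermediate = model(noise_map, t)                  # model output at level t
        noisy_target = add_noise(image, schedule[t], rng)   # input image noised to level t
        total += mse(intermediate, noisy_target)
    return total / len(levels)


def identity_model(noise_map: Image, t: int) -> Image:
    """Hypothetical stand-in for a diffusion network; returns its input unchanged."""
    return list(noise_map)


rng = random.Random(0)
schedule = alpha_bars(1000)
image = [0.2, 0.5, 0.8, 0.1]
levels = rng.sample(range(1000), k=4)  # random subset of noise levels
loss = tuning_loss(image, identity_model, levels, schedule, rng)
```

In an actual fine-tuning loop, `identity_model` would be replaced by the pre-trained network and `loss` would drive a gradient update of its weights; the sketch only shows how the quantities named in the claims could relate to one another.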

Assignees

Adobe Inc

Inventors

Classifications

G06T2207/20084 · Artificial neural networks [ANN]
G06T2207/20081 · Training; Learning
G06N3/08 · Learning methods
G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06F40/30 · Semantic analysis
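
A CPC symbol is hierarchical: a section letter, a two-digit class, a subclass letter, then a main group and subgroup separated by a slash. Section G is titled Physics, which is why this page maps the patent to that area. As a minimal sketch, the symbols above can be decomposed as follows. The names `CpcSymbol` and `parse_cpc` are hypothetical, the regular expression is an assumption that fits the symbols shown on this page, and the section titles are abbreviated.

```python
import re
from typing import NamedTuple

# Abbreviated CPC section titles.
CPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}

# Section letter, two-digit class, subclass letter, main group, '/', subgroup.
_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    """Hypothetical container for the parts of a CPC symbol."""
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a symbol such as 'G06T11/60' into its hierarchical parts."""
    match = _CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"unrecognized CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


symbols = ["G06T11/60", "G06T2207/20084", "G06T2207/20081",
           "G06N3/08", "G06N3/0464", "G06F40/30"]
parsed = [parse_cpc(s) for s in symbols]
areas = {CPC_SECTIONS[p.section] for p in parsed}
```

Every symbol listed for this publication falls in section G, so `areas` contains only the Physics section title.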

Patent family

Related publications grouped by family.

View patent family 90731940

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024161462A1 cover?: Systems and methods for image editing are described. Embodiments of the present disclosure include obtaining an image and a prompt for editing the image. A diffusion model is tuned based on the image to generate different versions of the image. The prompt is then encoded to obtain a guidance vector, and the diffusion model generates a modified image based on the image and the encoded text prompt.
Who is the assignee on this patent?: Adobe Inc
What technology area does this patent fall under?: Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?: Publication date May 16, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).