Utilizing a diffusion prior neural network for text guided digital image editing

US12530822B2 · US · B2

Patent metadata
Publication number: US-12530822-B2
Application number: US-202318308017-A
Country: US
Kind code: B2
Filing date: Apr 27, 2023
Priority date: Apr 27, 2023
Publication date: Jan 20, 2026
Grant date: Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion prior neural network for text guided digital image editing. For example, in one or more embodiments the disclosed systems utilize a text-image encoder to generate a base image embedding from the base digital image and an edit text embedding from edit text. Moreover, the disclosed systems utilize a diffusion prior neural network to generate a text-edited image embedding. In particular, the disclosed systems inject the base image embedding at a conceptual editing step of the diffusion prior neural network and condition a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding. Furthermore, the disclosed systems utilize a diffusion neural network to create a modified digital image from the text-edited image embedding and the base image embedding.
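The injection-and-conditioning flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: `toy_prior_step`, `text_edited_embedding`, the update rule, and the `rate` parameter are all hypothetical stand-ins, and a real diffusion prior would call a trained network at each step. The sketch only shows the control flow: the base image embedding enters at the conceptual editing step, and every step after it is conditioned on the edit text embedding.

```python
import numpy as np


def toy_prior_step(z, edit_text_emb, rate=0.2):
    """Hypothetical stand-in for one text-conditioned step of a diffusion prior.

    A real prior would run a trained denoising network here; this toy simply
    moves the current embedding a fixed fraction toward the text embedding.
    """
    return z + rate * (edit_text_emb - z)


def text_edited_embedding(base_image_emb, edit_text_emb, num_steps,
                          conceptual_editing_step):
    """Inject the base image embedding at `conceptual_editing_step`, then run
    the remaining steps conditioned on the edit text embedding."""
    if not 0 <= conceptual_editing_step <= num_steps:
        raise ValueError("conceptual_editing_step must lie in [0, num_steps]")
    edit_text_emb = np.asarray(edit_text_emb, dtype=float)
    # Injection: the steps before the conceptual editing step are skipped and
    # the base image embedding becomes the state at that step.
    z = np.array(base_image_emb, dtype=float)
    for _ in range(conceptual_editing_step, num_steps):
        z = toy_prior_step(z, edit_text_emb)
    return z


base = np.array([1.0, 0.0])   # assumed toy base image embedding
text = np.array([0.0, 1.0])   # assumed toy edit text embedding
early = text_edited_embedding(base, text, num_steps=10, conceptual_editing_step=2)
late = text_edited_embedding(base, text, num_steps=10, conceptual_editing_step=8)
```

In this toy, injecting earlier leaves more text-conditioned steps, so `early` ends up closer to the text embedding than `late`. Whether a trained prior behaves the same way depends on the model.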

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: generating, utilizing a trained text-image encoder, a base image embedding from a base digital image; generating, utilizing the trained text-image encoder, an edit text embedding from edit text corresponding to the base digital image; generating, utilizing a diffusion prior neural network, a text-edited image embedding from the base image embedding and the edit text embedding; and creating, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.

2. The computer-implemented method of claim 1, further comprising: generating, utilizing the trained text-image encoder, an edit text embedding from the edit text; and generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding.

3. The computer-implemented method of claim 2, wherein generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding comprises injecting the base image embedding at a conceptual editing step of the diffusion prior neural network.

4. The computer-implemented method of claim 3, wherein generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding comprises conditioning a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding.

5. The computer-implemented method of claim 3, further comprising: providing, for display via a user interface of a client device, a conceptual edit controller; and determining the conceptual editing step based on user interaction with the conceptual edit controller.

6. The computer-implemented method of claim 1, wherein creating, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding further comprises generating, utilizing a structural number of noising steps of a reverse diffusion neural network culminating at a structural noising transition step, a base image noise map from the base image embedding.

7. The computer-implemented method of claim 6, wherein creating, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding further comprises, generating the modified digital image from the base image noise map by conditioning a structural number of denoising steps of the diffusion neural network on the text-edited image embedding.

8. The computer-implemented method of claim 7, further comprising: providing, for display via a user interface of a client device, a structural edit controller; and determining the structural number of noising steps and the structural number of denoising steps based on user interaction with the structural edit controller.

9. A system comprising: one or more memory devices comprising a base digital image, edit text for modifying the base digital image, a trained text-image encoder, a diffusion prior neural network, and a diffusion neural network; and one or more processors configured to cause the system to: generate, utilizing the trained text-image encoder, a base image embedding from the base digital image and an edit text embedding from the edit text; generate, utilizing the diffusion prior neural network, a text-edited image embedding by: injecting the base image embedding at a conceptual editing step of the diffusion prior neural network; and conditioning a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding; and create, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.

10. The system of claim 9, wherein the one or more processors are further configured to cause the system to generate the text-edited image embedding by selecting the conceptual editing step from a plurality of steps of the diffusion prior neural network.

11. The system of claim 10, wherein the one or more processors are further configured to cause the system to: select an alternative conceptual editing step from the plurality of steps; and generate an additional text-edited image embedding by injecting the base image embedding at the alternative conceptual editing step.

12. The system of claim 11, wherein the one or more processors are further configured to cause the system to generate an additional modified digital image from the additional text-edited image embedding.

13. The system of claim 9, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by generating a base image noise map from the base image embedding through a structural number of diffusion steps culminating at a structural transition step.

14. The system of claim 13, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by denoising the base image noise map for a structural number of denoising steps of the diffusion neural network.

15. The system of claim 14, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by conditioning the denoising steps on the text-edited image embedding.

16. The system of claim 13, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by selecting the structural number of diffusion steps based on user interaction via a user interface of a client device.

17. A non-transitory computer readable medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations comprising: providing, for display via a user interface of a client device, a base digital image, edit text, and a conceptual edit controller; receiving a conceptual edit strength parameter based on user interaction with the conceptual edit controller; determining a conceptual editing step based on the conceptual edit strength parameter; generating, utilizing a diffusion prior neural network, a text-edited image embedding by utilizing a base image embedding of the base digital image and an edit text embedding from the edit text according to the conceptual editing step; and generating, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.

18. The non-transitory computer readable medium of claim 17, wherein generating the modified digital image from the text-edited image embedding comprises generating, utilizing a diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding.

19. The non-transitory computer readable medium of claim 17, wherein generating, utilize a diffusion prior neural network, the text-edited…
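Claims 5, 8, 16, and 17 describe user-facing controllers whose values determine the conceptual editing step and the structural number of noising and denoising steps. The claims preview does not give the mapping, so the following is only a sketch under assumed conventions: both controllers are taken to return a strength in [0, 1], a higher conceptual strength is assumed to select an earlier injection step, and a higher structural strength is assumed to select more noising and denoising steps. The function names and the linear mapping are hypothetical.

```python
def conceptual_editing_step(strength, num_prior_steps):
    """Map a conceptual edit strength in [0, 1] to a step of the diffusion prior.

    Assumed convention (not taken from the patent text): strength 1.0 injects
    the base image embedding at step 0, leaving every step conditioned on the
    edit text; strength 0.0 injects after the last step, leaving none.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return round((1.0 - strength) * num_prior_steps)


def structural_step_count(strength, num_diffusion_steps):
    """Map a structural edit strength in [0, 1] to the number of noising steps,
    which is reused as the number of denoising steps.

    Assumed convention: strength 0.0 applies no noising or denoising steps and
    strength 1.0 applies all available steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return round(strength * num_diffusion_steps)


# Hypothetical slider values read from the two controllers.
prior_step = conceptual_editing_step(0.5, num_prior_steps=10)
structural_steps = structural_step_count(0.5, num_diffusion_steps=50)
```

A linear mapping is the simplest choice for a slider. A production interface might instead use a nonlinear curve or a fixed set of presets.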

Assignees

  • Adobe Inc

Inventors

Classifications

  • Denoising; Smoothing · CPC title

  • involving graphical user interfaces [GUIs] · CPC title

  • Artificial neural networks [ANN] · CPC title

  • G06T11/60 · Primary

    Creating or editing images; Combining images with text · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12530822B2 cover?
The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion prior neural network for text guided digital image editing. For example, in one or more embodiments the disclosed systems utilize a text-image encoder to generate a base image embedding from the base digital image and an edit text embedding from edit text. Moreover, the discl…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 20, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).