User interface for generating and manipulating molecular images with natural language instructions
US-2024331235-A1 · Oct 3, 2024 · US
US12530822B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12530822-B2 |
| Application number | US-202318308017-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 27, 2023 |
| Priority date | Apr 27, 2023 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Official abstract text for this publication.
The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion prior neural network for text guided digital image editing. For example, in one or more embodiments the disclosed systems utilize a text-image encoder to generate a base image embedding from the base digital image and an edit text embedding from edit text. Moreover, the disclosed systems utilize a diffusion prior neural network to generate a text-edited image embedding. In particular, the disclosed systems inject the base image embedding at a conceptual editing step of the diffusion prior neural network and condition a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding. Furthermore, the disclosed systems utilize a diffusion neural network to create a modified digital image from the text-edited image embedding and the base image embedding.
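The injection described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_prior_step` is a stand-in for one conditioned step of a diffusion prior (a real network would predict a denoised embedding), and the sketch assumes the steps before the conceptual editing step are simply skipped, which the abstract does not state.

```python
from typing import Callable, List

Vector = List[float]
PriorStep = Callable[[Vector, Vector, int], Vector]


def toy_prior_step(state: Vector, edit_text_embedding: Vector, step: int) -> Vector:
    """Hypothetical stand-in for one text-conditioned step of a diffusion prior.

    It moves the state 25% of the way toward the edit text embedding.
    """
    return [s + 0.25 * (t - s) for s, t in zip(state, edit_text_embedding)]


def text_edited_embedding(
    base_image_embedding: Vector,
    edit_text_embedding: Vector,
    total_steps: int,
    conceptual_editing_step: int,
    prior_step: PriorStep = toy_prior_step,
) -> Vector:
    """Inject the base image embedding at a chosen step, then run the
    remaining prior steps conditioned on the edit text embedding."""
    if not 0 <= conceptual_editing_step <= total_steps:
        raise ValueError("conceptual_editing_step must lie in [0, total_steps]")
    if len(base_image_embedding) != len(edit_text_embedding):
        raise ValueError("embeddings must share one dimensionality")

    # Injection: the state at the conceptual editing step is the base image
    # embedding itself (assumption: earlier steps are not executed).
    state = list(base_image_embedding)

    # Conditioning: every step after the injection point sees the edit text.
    for step in range(conceptual_editing_step, total_steps):
        state = prior_step(state, edit_text_embedding, step)
    return state


if __name__ == "__main__":
    base = [0.0, 4.0]
    text = [4.0, 0.0]
    # Injecting at the last step leaves the base embedding untouched.
    print(text_edited_embedding(base, text, total_steps=4, conceptual_editing_step=4))
    # Injecting earlier lets more text-conditioned steps reshape the embedding.
    print(text_edited_embedding(base, text, total_steps=4, conceptual_editing_step=2))
```

In this toy setup, a later injection point keeps the result closer to the base image embedding, while an earlier one gives the edit text more steps to act on it.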
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: generating, utilizing a trained text-image encoder, a base image embedding from a base digital image; generating, utilizing the trained text-image encoder, an edit text embedding from edit text corresponding to the base digital image; generating, utilizing a diffusion prior neural network, a text-edited image embedding from the base image embedding and the edit text embedding; and creating, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.
2. The computer-implemented method of claim 1, further comprising: generating, utilizing the trained text-image encoder, an edit text embedding from the edit text; and generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding.
3. The computer-implemented method of claim 2, wherein generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding comprises injecting the base image embedding at a conceptual editing step of the diffusion prior neural network.
4. The computer-implemented method of claim 3, wherein generating, utilizing the diffusion prior neural network, the text-edited image embedding from the base image embedding and the edit text embedding comprises conditioning a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding.
5. The computer-implemented method of claim 3, further comprising: providing, for display via a user interface of a client device, a conceptual edit controller; and determining the conceptual editing step based on user interaction with the conceptual edit controller.
6. The computer-implemented method of claim 1, wherein creating, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding further comprises generating, utilizing a structural number of noising steps of a reverse diffusion neural network culminating at a structural noising transition step, a base image noise map from the base image embedding.
7. The computer-implemented method of claim 6, wherein creating, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding further comprises generating the modified digital image from the base image noise map by conditioning a structural number of denoising steps of the diffusion neural network on the text-edited image embedding.
8. The computer-implemented method of claim 7, further comprising: providing, for display via a user interface of a client device, a structural edit controller; and determining the structural number of noising steps and the structural number of denoising steps based on user interaction with the structural edit controller.
9. A system comprising: one or more memory devices comprising a base digital image, edit text for modifying the base digital image, a trained text-image encoder, a diffusion prior neural network, and a diffusion neural network; and one or more processors configured to cause the system to: generate, utilizing the trained text-image encoder, a base image embedding from the base digital image and an edit text embedding from the edit text; generate, utilizing the diffusion prior neural network, a text-edited image embedding by: injecting the base image embedding at a conceptual editing step of the diffusion prior neural network; and conditioning a set of steps of the diffusion prior neural network after the conceptual editing step utilizing the edit text embedding; and create, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.
10. The system of claim 9, wherein the one or more processors are further configured to cause the system to generate the text-edited image embedding by selecting the conceptual editing step from a plurality of steps of the diffusion prior neural network.
11. The system of claim 10, wherein the one or more processors are further configured to cause the system to: select an alternative conceptual editing step from the plurality of steps; and generate an additional text-edited image embedding by injecting the base image embedding at the alternative conceptual editing step.
12. The system of claim 11, wherein the one or more processors are further configured to cause the system to generate an additional modified digital image from the additional text-edited image embedding.
13. The system of claim 9, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by generating a base image noise map from the base image embedding through a structural number of diffusion steps culminating at a structural transition step.
14. The system of claim 13, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by denoising the base image noise map for a structural number of denoising steps of the diffusion neural network.
15. The system of claim 14, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by conditioning the denoising steps on the text-edited image embedding.
16. The system of claim 13, wherein the one or more processors are further configured to cause the system to create, utilizing the diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding by selecting the structural number of diffusion steps based on user interaction via a user interface of a client device.
17. A non-transitory computer readable medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations comprising: providing, for display via a user interface of a client device, a base digital image, edit text, and a conceptual edit controller; receiving a conceptual edit strength parameter based on user interaction with the conceptual edit controller; determining a conceptual editing step based on the conceptual edit strength parameter; generating, utilizing a diffusion prior neural network, a text-edited image embedding by utilizing a base image embedding of the base digital image and an edit text embedding from the edit text according to the conceptual editing step; and generating, utilizing a diffusion neural network, a modified digital image from the text-edited image embedding and the base image embedding.
18. The non-transitory computer readable medium of claim 17, wherein generating the modified digital image from the text-edited image embedding comprises generating, utilizing a diffusion neural network, the modified digital image from the text-edited image embedding and the base image embedding.
19. The non-transitory computer readable medium of claim 17, wherein generating, utilizing a diffusion prior neural network, the text-edited
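
The claims tie a conceptual edit controller and a structural edit controller to step counts but, in the preview shown here, do not say how a controller value becomes a step. The sketch below assumes a linear mapping from slider values in [0, 1]. The names `EditSchedule` and `build_edit_schedule`, the direction of the mapping (stronger conceptual edit means earlier injection), and the use of equal noising and denoising counts are all illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSchedule:
    """Hypothetical container for the step choices derived from two sliders."""
    conceptual_editing_step: int     # where the base image embedding is injected
    structural_noising_steps: int    # steps used to build the base image noise map
    structural_denoising_steps: int  # steps conditioned on the text-edited embedding


def _require_unit_interval(name: str, value: float) -> None:
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {value}")


def build_edit_schedule(
    conceptual_strength: float,
    structural_strength: float,
    prior_steps: int,
    diffusion_steps: int,
) -> EditSchedule:
    """Map two slider values to step counts (assumed linear mapping)."""
    _require_unit_interval("conceptual_strength", conceptual_strength)
    _require_unit_interval("structural_strength", structural_strength)
    if prior_steps <= 0 or diffusion_steps <= 0:
        raise ValueError("step counts must be positive")

    # Assumption: a stronger conceptual edit injects the base image embedding
    # earlier, leaving more prior steps conditioned on the edit text.
    conceptual_editing_step = prior_steps - round(conceptual_strength * prior_steps)

    # Assumption: the same count is used for noising and denoising, so the
    # denoiser exactly undoes the depth reached by the noising pass.
    structural_steps = round(structural_strength * diffusion_steps)

    return EditSchedule(conceptual_editing_step, structural_steps, structural_steps)


if __name__ == "__main__":
    schedule = build_edit_schedule(
        conceptual_strength=0.25,
        structural_strength=0.5,
        prior_steps=100,
        diffusion_steps=50,
    )
    print(schedule)
```

A nonlinear mapping, or one clamped away from the extremes, would fit the claim language equally well; the linear form is used only because it is the simplest to read.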
Denoising; Smoothing · CPC title
involving graphical user interfaces [GUIs] · CPC title
Artificial neural networks [ANN] · CPC title
Creating or editing images; Combining images with text · CPC title