Systems and methods for reversible transformations using diffusion models
US-2024161248-A1 · May 16, 2024 · US
US12536713B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12536713-B2 |
| Application number | US-202318477764-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 29, 2023 |
| Priority date | May 16, 2023 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Abstract
Embodiments described herein provide a method of image generation. The method uses a fixed diffusion model and a trainable diffusion model. The fixed diffusion model may be pretrained on a large training corpus. The trainable diffusion model may be used to control the image generation of the fixed diffusion model by modifying internal representations of the fixed diffusion model. A task instruction may be provided in addition to a text prompt, and the task instruction may guide the trainable diffusion model together with the visual conditions. The visual conditions may be adapted according to the task instruction. During training, a fixed number of task instructions may be used. At inference, unseen task instructions may be used by combining convolutional kernels of the visual condition adapter.
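The split between a fixed pretrained model and a trainable model that steers it can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the patented implementation: latents are plain Python lists, each "model" is a single matrix, the fixed model's latent is modified additively, the control signal passes through a zero-initialized gate (a common choice in ControlNet-style designs that the text above does not specify), and squared error stands in for the loss objective. The names `ToyControlledModel`, `train_step`, and `gate` are hypothetical.

```python
import copy


def matvec(w, x):
    """Multiply a matrix w (a list of rows) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ToyControlledModel:
    """Hypothetical toy: a frozen encoder steered by a trainable copy of itself."""

    def __init__(self, frozen_weights):
        self.frozen = frozen_weights                    # pretrained, never updated
        self.trainable = copy.deepcopy(frozen_weights)  # trainable copy, same init
        self.gate = 0.0                                 # zero init: no control at step 0

    def forward(self, x, feature_map, task_embedding):
        # Second latent: the frozen model's own internal representation.
        second = matvec(self.frozen, x)
        # First latent (from the trainable copy) combined with the task embedding.
        control = [c + e for c, e in zip(matvec(self.trainable, feature_map), task_embedding)]
        # Assumed additive modification of the second latent, scaled by the gate.
        modified = [s + self.gate * c for s, c in zip(second, control)]
        return control, modified

    def train_step(self, x, feature_map, task_embedding, target, lr=0.1):
        control, modified = self.forward(x, feature_map, task_embedding)
        err = [m - t for m, t in zip(modified, target)]
        loss = sum(e * e for e in err)  # squared error stands in for the loss objective
        # Hand-derived gradients w.r.t. the gate and the trainable copy only
        # (both computed from pre-update values); self.frozen is never touched.
        grad_gate = sum(2 * e * c for e, c in zip(err, control))
        for i, row in enumerate(self.trainable):
            for j in range(len(row)):
                row[j] -= lr * 2 * err[i] * self.gate * feature_map[j]
        self.gate -= lr * grad_gate
        return loss


frozen = [[1.0, 0.0], [0.0, 1.0]]
model = ToyControlledModel(frozen)
losses = [model.train_step([0.5, -0.5], [1.0, 2.0], [0.1, 0.0], [1.0, 1.0]) for _ in range(50)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.6f}")
print("frozen unchanged:", model.frozen == [[1.0, 0.0], [0.0, 1.0]])
```

Because the gate starts at zero, the first update changes only the gate, and the trainable copy starts moving from the second step onward. The frozen weights never change, mirroring the requirement that the fixed model's parameters are not updated after pretraining.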
Claims (preview)
What is claimed is:
1. A method of image generation, the method comprising: receiving, via a data interface, a text prompt, an input image, and a task instruction distinct from the text prompt; generating, via an adapter, a task-specific feature map based on the input image, the text prompt, and the task instruction; generating, by a first neural network based image model, a first latent representation based on the task-specific feature map; generating, via a task encoder, a task embedding based on the task instruction; modifying a second latent representation of a second neural network based image model based on the first latent representation and the task embedding, wherein the second neural network based image model is fixed such that its parameters are not updated after pretraining and the first neural network based image model is a trainable copy of at least an encoder of the second neural network based image model; and generating, by a decoder of the second neural network based image model, an output image based on the second latent representation and the text prompt.
2. The method of claim 1, wherein the generating the task-specific feature map comprises: selecting one or more convolutional kernels from a set of convolutional kernels based on the task instruction; and generating the task-specific feature map based on the input image and the selected one or more convolutional kernels.
3. The method of claim 2, wherein the one or more convolutional kernels are selected based on a comparison of the task instruction to one or more predefined task instructions.
4. The method of claim 3, wherein the generating the task-specific feature map further includes: estimating a respective weight for each of the selected convolutional kernels based on the comparison.
5. The method of claim 1, further comprising: receiving, via the data interface, a target image; computing a loss objective based on the output image and the target image; and updating parameters of the first neural network based image model, based on the computed loss objective via backpropagation while keeping the second neural network based image model unchanged.
6. The method of claim 1, further comprising: receiving, via the data interface, a training dataset including training samples corresponding to a plurality of task instructions, wherein each of the plurality of task instructions is one of a predefined set of task instructions; and training the first neural network based image model using the training samples corresponding to the plurality of task instructions.
7. The method of claim 6, wherein the task instruction is different than any task instruction of the predefined set of task instructions that have been used in training the first neural network based image model.
8. A system for image generation, the system comprising: a memory that stores a first neural network based image model, a second neural network based image model, and a plurality of processor-executable instructions; a communication interface that receives a text prompt, an input image, and a task instruction distinct from the text prompt; and one or more hardware processors configured to read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: generating, via an adapter, a task-specific feature map based on the input image, the text prompt, and the task instruction; generating, by a first neural network based image model, a first latent representation based on the task-specific feature map; generating, via a task encoder, a task embedding based on the task instruction; modifying a second latent representation of a second neural network based image model based on the first latent representation and the task embedding, wherein the second neural network based image model is fixed such that its parameters are not updated after pretraining and the first neural network based image model is a trainable copy of at least an encoder of the second neural network based image model; and generating, by a decoder of the second neural network based image model, an output image based on the second latent representation and the text prompt.
9. The system of claim 8, wherein the generating the task-specific feature map comprises: selecting one or more convolutional kernels from a set of convolutional kernels based on the task instruction; and generating the task-specific feature map based on the input image and the selected one or more convolutional kernels.
10. The system of claim 9, wherein the one or more convolutional kernels are selected based on a comparison of the task instruction to one or more predefined task instructions.
11. The system of claim 10, wherein the generating the task-specific feature map further includes: estimating a respective weight for each of the selected convolutional kernels based on the comparison.
12. The system of claim 8, the operations further comprising: receiving, via a data interface, a target image; computing a loss objective based on the output image and the target image; and updating parameters of the first neural network based image model, based on the computed loss objective via backpropagation while keeping the second neural network based image model unchanged.
13. The system of claim 8, the operations further comprising: receiving, via a data interface, a training dataset including training samples corresponding to a plurality of task instructions, wherein each of the plurality of task instructions is one of a predefined set of task instructions; and training the first neural network based image model using the training samples corresponding to the plurality of task instructions.
14. The system of claim 13, wherein the task instruction is different than any task instruction of the predefined set of task instructions that have been used in training the first neural network based image model.
15. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: receiving, via a data interface, a text prompt, an input image, and a task instruction distinct from the text prompt; generating, via an adapter, a task-specific feature map based on the input image, the text prompt, and the task instruction; generating, by a first neural network based image model, a first latent representation based on the task-specific feature map; generating, via a task encoder, a task embedding based on the task instruction; modifying a second latent representation of a second neural network based image model based on the first latent representation and the task embedding, wherein the second neural network based image model is fixed such that its parameters are not updated after pretraining and the first neural network based image model is a trainable copy of at least an encoder of the second neural network based image model; and generating, by a decoder of the second neural network based image model, an output image based on the second latent representation and the text prompt.
16. The non-transitory machine-readable medium of claim 15, wherein the generating the task-specific feature map comprises: selecting one or more convolutional kernels from a set of convolutional kernels based on the task instruction; and generating the task-specific feature map based on the input image and the selected one or more convolutional kernels.
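Claims 2 to 4 (mirrored in claims 9 to 11 and 16) select convolutional kernels by comparing the task instruction with predefined instructions and weight the selected kernels by that comparison; claims 7 and 14 allow an instruction that was never used in training. The following is a minimal sketch under loudly stated assumptions: word-overlap (Jaccard) similarity stands in for the comparison, the top `k` matches are kept, their weights are the normalized similarity scores, and the 3x3 kernels are hand-picked placeholders rather than learned adapter weights. All names (`KERNEL_BANK`, `select_kernels`, `task_feature_map`) are hypothetical.

```python
# Hypothetical bank: predefined task instructions -> 3x3 kernels.
# In the claims these kernels belong to a learned adapter; the hand-picked
# values below are placeholders for illustration only.
KERNEL_BANK = {
    "edge map": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],   # Laplacian
    "smooth map": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # unnormalized box sum
    "identity map": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
}


def similarity(a, b):
    """Jaccard overlap of word sets: an assumed stand-in for the 'comparison'."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_kernels(instruction, bank, k=2):
    """Keep the k most similar predefined tasks and weight their kernels."""
    scored = sorted(
        ((similarity(instruction, name), kernel) for name, kernel in bank.items()),
        key=lambda pair: pair[0],
        reverse=True,  # stable sort: ties keep the bank's insertion order
    )[:k]
    total = sum(score for score, _ in scored)
    if total == 0:
        raise ValueError("instruction shares no words with any predefined task")
    return [(score / total, kernel) for score, kernel in scored]


def combine(selected):
    """Weighted sum of the selected kernels (all assumed to share one shape)."""
    rows, cols = len(selected[0][1]), len(selected[0][1][0])
    return [[sum(w * kern[i][j] for w, kern in selected) for j in range(cols)]
            for i in range(rows)]


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the usual deep-learning convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def task_feature_map(image, instruction, bank=KERNEL_BANK, k=2):
    """Task-specific feature map: convolve the image with the blended kernel."""
    return conv2d_valid(image, combine(select_kernels(instruction, bank, k)))


# "smooth edge map" is not in the bank; it blends the edge and smooth kernels.
ones = [[1.0] * 4 for _ in range(4)]
print(task_feature_map(ones, "smooth edge map"))
```

Because convolution is linear, convolving with the blended kernel yields the same feature map as blending the per-kernel feature maps, so an unseen instruction costs a single convolution rather than one per selected kernel.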
CPC titles: Artificial neural networks [ANN] · Training; Learning · Feature selection, e.g. selecting representative features from a multi-dimensional feature space · using local operators · Two-dimensional [2D] image generation