Diffusion models having continuous scaling through patch-wise image generation

US12437437B2 · US · B2

Patent metadata
Publication number: US-12437437-B2
Application number: US-202218052658-A
Country: US
Kind code: B2
Filing date: Nov 4, 2022
Priority date: Nov 4, 2022
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the methods, apparatus, non-transitory computer readable medium, and systems include obtaining a noise map and a global image code encoded from an original image and representing semantic content of the original image; generating a plurality of image patches based on the noise map and the global image code using a diffusion model; and combining the plurality of image patches to produce an output image including the semantic content.
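The pipeline in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: `toy_denoise` is a hypothetical stand-in for a trained diffusion model, and the tile size, the context padding, and the normalized `(row, col)` position indicator are assumptions chosen for illustration. The overlap of the input regions and the position indicator follow the ideas in dependent claims 6 to 8.

```python
import numpy as np

def toy_denoise(noisy_region, global_code, position):
    """Hypothetical stand-in for a trained diffusion denoiser.

    A real model would run many reverse-diffusion steps conditioned on the
    global image code and the position indicator. This toy version only
    shrinks the noise and adds a code-dependent offset; `position` is
    accepted but ignored.
    """
    return 0.1 * noisy_region + global_code.mean()

def generate_patchwise(noise_map, global_code, patch=8, pad=2):
    """Denoise a noise map tile by tile and stitch the tiles together."""
    h, w = noise_map.shape
    assert h % patch == 0 and w % patch == 0, "sketch assumes an exact tiling"
    # Pad so every tile can read `pad` pixels of surrounding context.
    padded = np.pad(noise_map, pad, mode="reflect")
    output = np.zeros_like(noise_map)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            position = (y / h, x / w)  # assumed form of the position indicator
            # Input regions overlap their neighbours by 2 * pad pixels.
            region = padded[y : y + patch + 2 * pad, x : x + patch + 2 * pad]
            denoised = toy_denoise(region, global_code, position)
            # Keep only the central tile, so output patches do not overlap.
            output[y : y + patch, x : x + patch] = denoised[pad : pad + patch, pad : pad + patch]
    return output

rng = np.random.default_rng(0)
noise_map = rng.standard_normal((32, 32))  # same resolution as the output image
global_code = np.full(16, 0.5)             # placeholder for an encoded image or prompt
output_image = generate_patchwise(noise_map, global_code)
```

Because each tile is produced independently from the same global code, the output resolution is set by the size of the noise map rather than by the model's native patch size.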

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: obtaining a noise map and a global image code encoded from an original image and representing semantic content of the original image; generating a plurality of image patches based on the noise map and the global image code using a diffusion model, wherein each image patch of the plurality of image patches is generated by denoising a noisy patch of the noise map based on the global image code; and combining the plurality of image patches to produce an output image including the semantic content.
2. The method of claim 1, wherein: the diffusion model is conditioned based on the global image code.
3. The method of claim 1, further comprising: identifying a text prompt; and encoding the text prompt to obtain the global image code.
4. The method of claim 1, wherein the original image is a high resolution image.
5. The method of claim 1, wherein: the noise map comprises a same resolution as the output image.
6. The method of claim 1, wherein: each of the plurality of image patches is generated based on a region of the noise map that overlaps at least one other region used to generate another of the plurality of image patches.
7. The method of claim 6, wherein: the plurality of image patches do not overlap each other.
8. The method of claim 1, further comprising: identifying a position indicator corresponding to each of the plurality of image patches, wherein each of the plurality of image patches is generated based on the corresponding position indicator and the plurality of image patches are combined based on the position indicator.
9. The method of claim 1, further comprising: training the diffusion model to generate the plurality of image patches based on the global image code.
10. The method of claim 1, wherein: the global image code includes information representing a spatial layout of the output image.
11. A method comprising: initializing parameters of a diffusion model; obtaining a noise map and a global image code encoded from a training image and representing semantic content of the training image; generating a plurality of predicted image patches based on the noise map and the global image code using the diffusion model, wherein each predicted image patch of the plurality of predicted image patches is generated by denoising a noisy patch of the noise map based on the global image code; computing a loss function based on the plurality of predicted image patches; and training the diffusion model to generate image patches by updating the parameters based on the loss function.
12. The method of claim 11, wherein: identifying a high-resolution training image; generating a high-resolution noise map and a low-resolution noise map based on the high-resolution training image; generating a first image patch based on the high-resolution noise map and a second image patch based on the low-resolution noise map; and computing a patch consistency loss by comparing the first image patch and the second image patch, wherein the loss function includes the patch consistency loss.
13. The method of claim 12, further comprising: cropping the high-resolution training image to obtain a high-resolution training patch; and adding noise to the high-resolution training patch to obtain the high-resolution noise map.
14. The method of claim 12, further comprising: down-sampling the high-resolution training image to obtain a low-resolution training image; cropping the low-resolution training image to obtain a low-resolution training patch; and adding noise to the low-resolution training patch to obtain the low-resolution noise map.
15. The method of claim 11, further comprising: combining the plurality of predicted image patches to produce a predicted image; and computing a reconstruction loss by comparing the predicted image to a ground truth image, wherein the loss function includes the reconstruction loss.
16. The method of claim 11, further comprising: identifying a position indicator corresponding each of the plurality of predicted image patches, wherein each of the plurality of predicted image patches is generated based on the corresponding position indicator.
17. An apparatus comprising: one or more processors; and one or memories including instructions executable by the one or more processors to: obtain a noise map and a global image code encoded from an original image and representing semantic content of the original image; generate a plurality of image patches based on the noise map and the global image code using a diffusion model, wherein each image patch of the plurality of image patches is generated by denoising a noisy patch of the noise map based on the global image code; and combine the plurality of image patches to produce an output image including the semantic content.
18. The apparatus of claim 17, wherein: the diffusion model comprises a U-Net architecture.
19. The apparatus of claim 17, wherein the instructions are further executable to: encode an input prompt to obtain the global image code.
20. The apparatus of claim 19, wherein: the global image code is encoded using a multimodal encoder, and wherein the original image is a high resolution image.
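The data preparation and patch consistency loss described in claims 12 to 14 might be sketched as follows. This is an illustration under stated assumptions, not the claimed training procedure: `toy_model` is a hypothetical one-parameter stand-in for the diffusion model, down-sampling is assumed to be 2x average pooling, the two crops are assumed to cover the same image region, and the comparison is assumed to be a mean squared error taken after down-sampling the first patch to the resolution of the second.

```python
import numpy as np

def downsample2x(img):
    """Average-pool by a factor of 2 (assumed down-sampling method)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def add_noise(patch, rng, sigma):
    """Add Gaussian noise to a training patch to obtain a noise map."""
    return patch + sigma * rng.standard_normal(patch.shape)

def toy_model(noise_map, global_code, weight):
    """Hypothetical stand-in for the diffusion model with one parameter."""
    return weight * noise_map + global_code.mean()

def patch_consistency_loss(image, global_code, weight, rng,
                           y=8, x=8, size=16, sigma=0.5):
    """Compare patches predicted from high- and low-resolution noise maps.

    y, x and size are assumed to be even so both crops cover the same region.
    """
    # Claim 13: crop the high-resolution image, then add noise.
    hi_patch = image[y : y + size, x : x + size]
    hi_noise_map = add_noise(hi_patch, rng, sigma)
    # Claim 14: down-sample, crop the matching region, then add noise.
    lo_image = downsample2x(image)
    lo_patch = lo_image[y // 2 : (y + size) // 2, x // 2 : (x + size) // 2]
    lo_noise_map = add_noise(lo_patch, rng, sigma)
    # Claim 12: predict one patch per noise map and compare them.
    first_patch = toy_model(hi_noise_map, global_code, weight)
    second_patch = toy_model(lo_noise_map, global_code, weight)
    # Assumed comparison: MSE at the lower resolution.
    return float(np.mean((downsample2x(first_patch) - second_patch) ** 2))

rng = np.random.default_rng(0)
training_image = rng.random((32, 32))  # placeholder high-resolution training image
global_code = np.full(16, 0.5)         # placeholder global image code
loss = patch_consistency_loss(training_image, global_code, weight=0.1, rng=rng)
```

In an actual training loop this term would be combined with the other losses named in the claims, such as the reconstruction loss of claim 15, before updating the model parameters.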

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12437437B2 cover?
Aspects of the methods, apparatus, non-transitory computer readable medium, and systems include obtaining a noise map and a global image code encoded from an original image and representing semantic content of the original image; generating a plurality of image patches based on the noise map and the global image code using a diffusion model; and combining the plurality of image patches to produ…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/70. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).