Zero-Shot Prompt Ensembling for Zero-Shot Classification with Text-Image Models
US-2024282131-A1 · Aug 22, 2024 · US
US12518358B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12518358-B2 |
| Application number | US-202318178212-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 3, 2023 |
| Priority date | Mar 3, 2023 |
| Publication date | Jan 6, 2026 |
| Grant date | Jan 6, 2026 |
Abstract
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate modified digital images. In particular, in some embodiments, the disclosed systems generate image editing directions between textual identifiers of two visual features utilizing a language prediction machine learning model and a text encoder. In some embodiments, the disclosed systems generate an inversion of a digital image utilizing a regularized inversion model to guide forward diffusion of the digital image. In some embodiments, the disclosed systems utilize cross-attention guidance to preserve structural details of a source digital image when generating a modified digital image with a diffusion neural network.
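The first embodiment derives an image editing direction from two textual identifiers (for example "cat" and "dog"). The abstract does not say how the direction is computed, so the following is a minimal sketch under an assumed construction: the direction is the mean text embedding of sentences about the target feature minus the mean embedding of sentences about the source feature. The hard-coded sentence lists stand in for the language prediction model's output, and `toy_text_encoder` and `editing_direction` are hypothetical helpers; a real system would use a learned text encoder instead of hashed word counts.

```python
import zlib

import numpy as np


def toy_text_encoder(sentence, dim=256):
    """Hypothetical stand-in for a text encoder: hashed bag of words, L2-normalized."""
    vec = np.zeros(dim)
    for word in sentence.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec / np.linalg.norm(vec)


def editing_direction(source_sentences, target_sentences, encode=toy_text_encoder):
    """Assumed construction: mean target embedding minus mean source embedding."""
    source = np.mean([encode(s) for s in source_sentences], axis=0)
    target = np.mean([encode(s) for s in target_sentences], axis=0)
    return target - source


# Hypothetical sentences a language model might generate for "cat" and "dog".
cat_sentences = ["a photo of a cat", "a cat sleeping on a sofa"]
dog_sentences = ["a photo of a dog", "a dog sleeping on a sofa"]
direction = editing_direction(cat_sentences, dog_sentences)
```

Averaging over several sentences per identifier is meant to cancel wording shared by both sets, leaving mostly the difference between the two features. How the direction is then applied (for example, added with some scale to the source caption's encoding before denoising) is likewise an assumption here.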
What is claimed is:
1. A computer-implemented method comprising: generating, utilizing a diffusion layer of a diffusion neural network, a noise map from a source digital image; generating a shifted noise map by shifting the noise map by an offset value; determining a pairwise correlation loss by comparing one or more regions of the noise map and one or more regions of the shifted noise map; generating a modified noise map based on the pairwise correlation loss; and generating, from the modified noise map utilizing a denoising layer of the diffusion neural network conditioned with an image editing encoding representing an edit to the source digital image, a modified digital image including the edit to the source digital image while preserving structural details of the source digital image.
2. The computer-implemented method of claim 1, wherein generating the modified noise map based on the pairwise correlation loss comprises modifying the noise map to reduce a similarity metric between the one or more regions of the noise map and the one or more regions of the shifted noise map.
3. The computer-implemented method of claim 1, wherein determining the pairwise correlation loss comprises: generating a pyramid of noise maps at different resolutions from the noise map; generating a pyramid of shifted noise map at the different resolutions from the shifted noise map; and determining the pairwise correlation loss by comparing the pyramid of noise maps and the pyramid of shifted noise maps.
4. The computer-implemented method of claim 1, further comprising: generating, utilizing one or more subsequent additional diffusion layers of the diffusion neural network, an inversion of the source digital image from the modified noise map; and generating the modified digital image from the modified noise map by utilizing a plurality of denoising layers of the diffusion neural network to generate the modified digital image from the inversion of the source digital image.
5. The computer-implemented method of claim 1, further comprising: generating, utilizing a subsequent diffusion layer of the diffusion neural network, an additional noise map from the modified noise map; determining an additional pairwise correlation loss by comparing one or more regions of the additional noise map with one or more regions of an additional shifted noise map; and generating an additional modified noise map from the additional noise map based on the additional pairwise correlation loss.
6. The computer-implemented method of claim 5, further comprising generating the additional shifted noise map by shifting the additional noise map by an additional offset value different than the offset value of the shifted noise map.
7. The computer-implemented method of claim 1, wherein generating the modified noise map further comprises: determining a divergence loss for the noise map relative to a standard distribution; determining an auto-correlation regularization loss by combining the pairwise correlation loss and the divergence loss; and generating the modified noise map based on the auto-correlation regularization loss.
8. The computer-implemented method of claim 7, wherein combining the pairwise correlation loss and the divergence loss comprises weighting the divergence loss by a first weight.
9. A system comprising: one or more memory devices; and one or more processors coupled to the one or more memory devices that cause the system to perform operations comprising: generating, utilizing a diffusion layer of a diffusion neural network, a noise map from a source digital image; determining a pairwise correlation loss by comparing one or more regions of a noise map and one or more regions of a shifted noise map; determining a divergence loss for the noise map relative to a standard distribution; determining an auto-correlation regularization loss by combining the pairwise correlation loss and the divergence loss; generating a modified noise map based on the auto-correlation regularization loss; and generating, from the modified noise map utilizing a denoising layer of the diffusion neural network conditioned with an image editing encoding representing an edit to the source digital image, a modified digital image including the edit to the source digital image while preserving structural details of the source digital image.
10. The system of claim 9, wherein generating the noise map from the source digital image comprises generating, utilizing the diffusion layer of the diffusion neural network, the noise map from an initial latent vector corresponding to the source digital image.
11. The system of claim 10, wherein inverting the initial latent vector corresponding to the source digital image to generate the noise map comprises utilizing a deterministic forward diffusion model conditioned on a reference encoding of the source digital image.
12. The system of claim 11, wherein the operations further comprise generating, utilizing a text encoder, the reference encoding from an image caption describing the source digital image.
13. The system of claim 9, further comprising determining the divergence loss of the noise map relative to a reference mean and a unit variance.
14. The system of claim 9, wherein combining the pairwise correlation loss and the divergence loss comprises weighting the divergence loss by a first weight and weighting the pairwise correlation loss by a second weight.
15. The system of claim 9, wherein the operations further comprise generating an inversion of the source digital image from the modified noise map utilizing subsequent diffusion layers of the diffusion neural network.
16. A non-transitory computer readable medium storing instructions thereon that, when executed by at least one processor, cause the at least one processor to perform operations comprising: generating, utilizing a diffusion layer of a diffusion neural network, a noise map from a source digital image; generating a shifted noise map by shifting the noise map by an offset value; determining a pairwise correlation loss by comparing one or more regions of the noise map and one or more regions of the shifted noise map; generating a modified noise map based on the pairwise correlation loss; and generating, from the modified noise map utilizing a denoising layer of the diffusion neural network conditioned with an image editing encoding representing an edit to the source digital image, a modified digital image including the edit to the source digital image while preserving structural details of the source digital image.
17. The non-transitory computer readable medium of claim 16, wherein generating the shifted noise map comprises randomly sampling the offset value.
18. The non-transitory computer readable medium of claim 16, wherein the operations further comprise generating an inversion of the source digital image from the modified noise map utilizing subsequent diffusion layers of the diffusion neural network conditioned by a reference encoding of the source digital image.
19. The non-transitory computer readable medium of claim 18, wherein generating the modified digital image from the modified noise map comprises generating, utilizing denoising layers of the diffusion neural network, the modified digital image from the inversion of the source digital image.
20. The non-transitory computer readable medium of claim 19, wherein generating the modified digital image from the
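Claims 1-3, 7-8, 13-14, and 17 describe an auto-correlation regularizer applied to the noise map produced by inversion. The following is a minimal NumPy sketch of one reading of those claims, under assumptions the claims do not state: the noise map is a single 2D array; "comparing regions" is taken as the squared mean product of the map with a copy circularly shifted (`np.roll`) along each axis; the pyramid is built by 2x2 average pooling, with the offset halved at each level; the divergence loss is the KL divergence between a Gaussian fitted to the map's entries and N(0, 1); and the modified noise map comes from plain gradient descent with hand-derived gradients. All function names, weights, the step size, and the step count are hypothetical.

```python
import numpy as np


def downsample(x):
    """2x2 average pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def downsample_adjoint(g):
    """Adjoint of 2x2 average pooling: spread each gradient over its 2x2 block."""
    return np.repeat(np.repeat(g, 2, axis=0), 2, axis=1) / 4.0


def loss_and_grad(noise, offset, levels, w_pair, w_kl):
    """Hypothetical weighted loss (claims 7, 14) and its gradient w.r.t. the map."""
    n = noise.size
    # Divergence loss: KL(N(mu, var) || N(0, 1)), entries treated as samples.
    mu, var = noise.mean(), noise.var()
    kl = 0.5 * (var + mu**2 - 1.0 - np.log(var))
    grad = w_kl * (mu + (1.0 - 1.0 / var) * (noise - mu)) / n

    # Pairwise correlation loss over a pyramid of noise maps (claim 3).
    pair = 0.0
    level = noise
    for p in range(levels):
        if p > 0:
            level = downsample(level)
        s = max(1, offset >> p)  # assumed: offset shrinks with the resolution
        g = np.zeros_like(level)
        for axis in (0, 1):
            shifted = np.roll(level, s, axis=axis)
            c = np.mean(level * shifted)  # correlation with the shifted map
            pair += c**2
            g += 2.0 * c * (shifted + np.roll(level, -s, axis=axis)) / level.size
        for _ in range(p):  # back-propagate to full resolution
            g = downsample_adjoint(g)
        grad += w_pair * g
    return w_pair * pair + w_kl * kl, grad


def regularize_noise(noise, rng, levels=4, steps=50, step_size=0.1,
                     w_pair=1.0, w_kl=1.0):
    """Return a modified noise map and the loss recorded before each step and at the end."""
    h, w = noise.shape
    assert h % 2 ** (levels - 1) == 0 and w % 2 ** (levels - 1) == 0
    assert min(h, w) >> (levels - 1) >= 2
    offset = int(rng.integers(1, min(h, w)))  # randomly sampled offset (claim 17)
    noise = noise.copy()
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(noise, offset, levels, w_pair, w_kl)
        history.append(loss)
        # Gradients carry a 1/size factor, so the step is scaled by the map size.
        noise -= step_size * noise.size * grad
    history.append(loss_and_grad(noise, offset, levels, w_pair, w_kl)[0])
    return noise, history


rng = np.random.default_rng(0)
initial = rng.standard_normal((64, 64))
modified, history = regularize_noise(initial, rng)
```

The pairwise term pushes neighboring regions of the map toward being uncorrelated (claim 2's reduced similarity), while the KL term keeps the entries close to zero mean and unit variance (claim 13), so the regularized map still resembles the Gaussian noise a denoiser expects. Sampling the offset once per call keeps the objective fixed so the loss history is comparable; resampling it at every step, or using a different offset at each subsequent diffusion layer as claim 6 describes, would be equally plausible readings.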
Character encoding · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
using two or more images, e.g. averaging or subtraction · CPC title
Denoising; Smoothing · CPC title