Text-based image generation

US12524937B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12524937-B2
Application number  US-202318170963-A
Country             US
Kind code           B2
Filing date         Feb 17, 2023
Priority date       Feb 17, 2023
Publication date    Jan 13, 2026
Grant date          Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image generation are provided. An aspect of the systems and methods includes obtaining a text prompt, generating a style vector based on the text prompt, generating an adaptive convolution filter based on the style vector, and generating an image corresponding to the text prompt based on the adaptive convolution filter.
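The filter-generation step in the abstract can be illustrated with a small sketch. Claim 1 describes the adaptive filter as an average of weights across several convolution filters, steered by the style vector. The softmax weighting, the `select_weights` matrix, the single-channel 3x3 filters, and all function names below are assumptions made for illustration; the patent text shown here does not specify them.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_filter(style, filter_bank, select_weights):
    """Blend N same-sized K x K filters into one, steered by the style vector.

    style:          list of D floats
    filter_bank:    list of N filters, each a K x K list of lists
    select_weights: hypothetical N x D matrix that scores each filter from style
    """
    scores = [sum(w * s for w, s in zip(row, style)) for row in select_weights]
    mix = softmax(scores)  # assumed normalisation: non-negative, sums to 1
    k = len(filter_bank[0])
    return [[sum(m * f[i][j] for m, f in zip(mix, filter_bank))
             for j in range(k)] for i in range(k)]

def convolve(feature_map, kernel):
    """Valid (no padding) 2D cross-correlation of a single-channel feature map."""
    k = len(kernel)
    h, w = len(feature_map), len(feature_map[0])
    return [[sum(kernel[a][b] * feature_map[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)] for i in range(h - k + 1)]

# Toy inputs: an identity-like filter, a box blur, and a 2-D style vector.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
blur = [[1 / 9] * 3 for _ in range(3)]
select_weights = [[1.0, 0.0], [0.0, 1.0]]

# A zero style vector scores both filters equally, so they are averaged 50/50.
kernel = adaptive_filter([0.0, 0.0], [identity, blur], select_weights)
feature_map = [[float(i + j) for j in range(4)] for i in range(4)]
output = convolve(feature_map, kernel)  # 2 x 2 result
```

Changing the style vector shifts the mix toward one filter or the other, so the same convolution layer applies a different effective kernel for each prompt. A real generator would use multi-channel filters and learned selection weights.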

First claim

Opening claim text (preview).

What is claimed is:

1. A method for image generation, comprising: obtaining a text prompt; generating a style vector based on the text prompt; generating an adaptive convolution filter by averaging a plurality of weights across a plurality of convolution filters of a convolution layer based on the style vector, wherein the adaptive convolution filter comprises a convolution matrix corresponding to a style of the style vector; and generating an image corresponding to the text prompt based on the adaptive convolution filter.

2. The method of claim 1, further comprising: encoding the text prompt to obtain a text embedding; and transforming the text embedding to obtain a global vector corresponding to the text prompt as a whole and a plurality of local vectors corresponding to individual tokens of the text prompt, wherein the style vector is generated based on the global vector and the image is generated based on the plurality of local vectors.

3. The method of claim 2, further comprising: performing a cross-attention process based on the plurality of local vectors, wherein the image is generated based on the cross-attention process.

4. The method of claim 2, further comprising: obtaining a noise vector, wherein the style vector is based on the noise vector.

5. The method of claim 1, further comprising: initializing a feature map; and performing a convolution process on the feature map based on the adaptive convolution filter, wherein the image is generated based on the convolution process.

6. The method of claim 5, further comprising: performing a self-attention process based on the feature map, wherein the image is generated based on the self-attention process.

7. The method of claim 6, wherein: the self-attention process is based on an L2 distance.

8. The method of claim 1, further comprising: identifying a plurality of predetermined convolution filters; and combining the plurality of predetermined convolution filters based on the style vector to obtain the adaptive convolution filter.

9. The method of claim 1, further comprising: identifying a diversity parameter; and truncating the style vector based on the diversity parameter to obtain a truncated style vector, wherein the image is generated based on the truncated style vector.

10. An apparatus for image generation, comprising: at least one processor; at least one memory storing instructions executable by the at least one processor; the apparatus further comprising a text encoder network comprising encoder parameters stored in the at least one memory, wherein the text encoder network is configured to encode a text prompt to obtain a global vector corresponding to the text prompt and a plurality of local vectors corresponding to individual tokens of the text prompt; a mapping network comprising mapping parameters stored in the at least one memory, wherein the mapping network is configured to generate a style vector based on the global vector and a noise vector; and an image generation network comprising image generation parameters stored in the at least one memory, wherein the image generation network is configured to generate an image corresponding to the text prompt based on the style vector and the plurality of local vectors.

11. The apparatus of claim 10, wherein: the text encoder network comprises a pretrained encoder and a learned encoder that is trained together with the image generation network.

12. The apparatus of claim 10, wherein: the image generation network comprises a generative adversarial network (GAN).

13. The apparatus of claim 10, wherein: the image generation network includes a convolution layer, a self-attention layer, and a cross-attention layer.

14. The apparatus of claim 10, wherein: the image generation network includes an adaptive convolution component configured to generate an adaptive convolution filter based on the style vector, wherein the image is generated based on the adaptive convolution filter.

15. The apparatus of claim 10, further comprising: a discriminator network configured to generate an image embedding and a conditioning embedding, wherein the discriminator network is trained together with the image generation network using an adversarial training loss based on the image embedding and the conditioning embedding.

16. A method for image generation, comprising: obtaining a training dataset including a training image and text describing the training image; generating a predicted style vector based on the text and a noise vector using a mapping network; generating a predicted image based on the predicted style vector using an image generation network; generating an image embedding based on the predicted image and a conditioning embedding based on the text using a discriminator network; and training the image generation network based on the image embedding and the conditioning embedding.

17. The method of claim 16, further comprising: computing a generative adversarial network (GAN) loss based on the image embedding and the conditioning embedding, wherein the image generation network is trained based on the GAN loss.

18. The method of claim 16, further comprising: generating a mixed conditioning embedding based on an unrelated text; and computing a mixing loss based on the image embedding and the mixed conditioning embedding, wherein the image generation network is trained based on the mixing loss.

19. The method of claim 16, further comprising: encoding the text using a text encoder network that includes a pretrained encoder and a learned encoder, wherein the learned encoder is trained together with the image generation network.

20. The method of claim 16, further comprising: learning a feature map for an initial input to the image generation network.
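Two of the dependent claims are compact enough to sketch: self-attention based on an L2 distance (claim 7) and truncation of the style vector by a diversity parameter (claim 9). The sketch below is a minimal illustration under stated assumptions, not the patented implementation. It assumes attention logits are negative squared L2 distances scaled by the square root of the feature dimension, omits learned query/key/value projections, and assumes truncation means interpolating toward a mean style vector. The names `l2_self_attention`, `truncate_style`, `mean_style`, and `psi` are hypothetical.

```python
import math

def l2_self_attention(tokens):
    """Self-attention whose logits are negative squared L2 distances.

    tokens: list of T feature vectors (each a list of D floats).
    Queries, keys, and values are the raw tokens; projections are omitted.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Closer tokens get larger (less negative) logits.
        logits = [-sum((a - b) ** 2 for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

def truncate_style(style, mean_style, psi):
    """Pull a style vector toward a mean style.

    psi is the assumed diversity parameter: 1.0 keeps the style unchanged,
    0.0 collapses it onto the mean.
    """
    return [m + psi * (s - m) for s, m in zip(style, mean_style)]

# Two far-apart tokens: each attends almost entirely to itself.
attended = l2_self_attention([[0.0, 0.0], [10.0, 10.0]])

# Halving the distance to a zero mean style.
truncated = truncate_style([2.0, 4.0], [0.0, 0.0], psi=0.5)  # [1.0, 2.0]
```

Under these assumptions, a smaller `psi` trades diversity for samples closer to the average style.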

Assignees

Adobe Inc

Classifications

Primary CPC: G06F40/284

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524937B2 cover?
Systems and methods for image generation are provided. An aspect of the systems and methods includes obtaining a text prompt, generating a style vector based on the text prompt, generating an adaptive convolution filter based on the style vector, and generating an image corresponding to the text prompt based on the adaptive convolution filter.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 13, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).