What technology area does this patent fall under?

Primary CPC classification G06F40/284. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 13, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page: citations found in our corpus and other publications that share the same primary CPC classification.

Text-based image generation

US12524937B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12524937-B2
Application number	US-202318170963-A
Country	US
Kind code	B2
Filing date	Feb 17, 2023
Priority date	Feb 17, 2023
Publication date	Jan 13, 2026
Grant date	Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image generation are provided. An aspect of the systems and methods includes obtaining a text prompt, generating a style vector based on the text prompt, generating an adaptive convolution filter based on the style vector, and generating an image corresponding to the text prompt based on the adaptive convolution filter.
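The adaptive-filter step in the abstract, which claims 1 and 8 describe as combining a plurality of convolution filters based on the style vector, can be illustrated with a minimal sketch. This is one possible reading under stated assumptions, not Adobe's implementation: the style vector is taken as given (the claims derive it from the text prompt, via a mapping network in claim 10), the per-filter weights come from a hypothetical linear projection followed by a softmax, and the filters are single-channel 3x3 matrices applied with zero-padded "same" convolution. All function and variable names are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selection_weights(style, projection):
    """Map a style vector to one non-negative weight per filter (weights sum to 1).

    `projection` is a hypothetical learned matrix with one row per filter.
    """
    logits = [sum(p * s for p, s in zip(row, style)) for row in projection]
    return softmax(logits)


def adaptive_filter(style, filter_bank, projection):
    """Weighted average of a bank of k x k filters, weighted by the style vector."""
    weights = selection_weights(style, projection)
    k = len(filter_bank[0])
    return [
        [sum(w * f[r][c] for w, f in zip(weights, filter_bank)) for c in range(k)]
        for r in range(k)
    ]


def conv2d_same(feature_map, kernel):
    """Zero-padded 'same' 2D cross-correlation (what deep-learning libraries call convolution)."""
    h, w = len(feature_map), len(feature_map[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * feature_map[y][x]
            out[i][j] = acc
    return out


# Usage: a two-filter bank (identity and 3x3 box blur) and a 2-D style vector.
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
BOX = [[1.0 / 9] * 3 for _ in range(3)]
bank = [IDENTITY, BOX]
projection = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical learned weights
style = [1.0, 0.0]  # assumed output of a mapping network

kernel = adaptive_filter(style, bank, projection)
feature_map = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
features = conv2d_same(feature_map, kernel)
```

With `style = [1.0, 0.0]` the identity filter receives a weight of about 0.73 and the box filter about 0.27. Because the softmax weights sum to 1, the resulting kernel is a true weighted average of the bank, which keeps its overall scale stable as the style vector changes. A full generator would use multi-channel filters across many layers; this sketch omits that.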

First claim

Full claim text. Claims 1, 10, and 16 are independent; the remaining claims depend on them.

What is claimed is:

1. A method for image generation, comprising: obtaining a text prompt; generating a style vector based on the text prompt; generating an adaptive convolution filter by averaging a plurality of weights across a plurality of convolution filters of a convolution layer based on the style vector, wherein the adaptive convolution filter comprises a convolution matrix corresponding to a style of the style vector; and generating an image corresponding to the text prompt based on the adaptive convolution filter.

2. The method of claim 1, further comprising: encoding the text prompt to obtain a text embedding; and transforming the text embedding to obtain a global vector corresponding to the text prompt as a whole and a plurality of local vectors corresponding to individual tokens of the text prompt, wherein the style vector is generated based on the global vector and the image is generated based on the plurality of local vectors.

3. The method of claim 2, further comprising: performing a cross-attention process based on the plurality of local vectors, wherein the image is generated based on the cross-attention process.

4. The method of claim 2, further comprising: obtaining a noise vector, wherein the style vector is based on the noise vector.

5. The method of claim 1, further comprising: initializing a feature map; and performing a convolution process on the feature map based on the adaptive convolution filter, wherein the image is generated based on the convolution process.

6. The method of claim 5, further comprising: performing a self-attention process based on the feature map, wherein the image is generated based on the self-attention process.

7. The method of claim 6, wherein: the self-attention process is based on an L2 distance.

8. The method of claim 1, further comprising: identifying a plurality of predetermined convolution filters; and combining the plurality of predetermined convolution filters based on the style vector to obtain the adaptive convolution filter.

9. The method of claim 1, further comprising: identifying a diversity parameter; and truncating the style vector based on the diversity parameter to obtain a truncated style vector, wherein the image is generated based on the truncated style vector.

10. An apparatus for image generation, comprising: at least one processor; at least one memory storing instructions executable by the at least one processor; the apparatus further comprising a text encoder network comprising encoder parameters stored in the at least one memory, wherein the text encoder network is configured to encode a text prompt to obtain a global vector corresponding to the text prompt and a plurality of local vectors corresponding to individual tokens of the text prompt; a mapping network comprising mapping parameters stored in the at least one memory, wherein the mapping network is configured to generate a style vector based on the global vector and a noise vector; and an image generation network comprising image generation parameters stored in the at least one memory, wherein the image generation network is configured to generate an image corresponding to the text prompt based on the style vector and the plurality of local vectors.

11. The apparatus of claim 10, wherein: the text encoder network comprises a pretrained encoder and a learned encoder that is trained together with the image generation network.

12. The apparatus of claim 10, wherein: the image generation network comprises a generative adversarial network (GAN).

13. The apparatus of claim 10, wherein: the image generation network includes a convolution layer, a self-attention layer, and a cross-attention layer.

14. The apparatus of claim 10, wherein: the image generation network includes an adaptive convolution component configured to generate an adaptive convolution filter based on the style vector, wherein the image is generated based on the adaptive convolution filter.

15. The apparatus of claim 10, further comprising: a discriminator network configured to generate an image embedding and a conditioning embedding, wherein the discriminator network is trained together with the image generation network using an adversarial training loss based on the image embedding and the conditioning embedding.

16. A method for image generation, comprising: obtaining a training dataset including a training image and text describing the training image; generating a predicted style vector based on the text and a noise vector using a mapping network; generating a predicted image based on the predicted style vector using an image generation network; generating an image embedding based on the predicted image and a conditioning embedding based on the text using a discriminator network; and training the image generation network based on the image embedding and the conditioning embedding.

17. The method of claim 16, further comprising: computing a generative adversarial network (GAN) loss based on the image embedding and the conditioning embedding, wherein the image generation network is trained based on the GAN loss.

18. The method of claim 16, further comprising: generating a mixed conditioning embedding based on an unrelated text; and computing a mixing loss based on the image embedding and the mixed conditioning embedding, wherein the image generation network is trained based on the mixing loss.

19. The method of claim 16, further comprising: encoding the text using a text encoder network that includes a pretrained encoder and a learned encoder, wherein the learned encoder is trained together with the image generation network.

20. The method of claim 16, further comprising: learning a feature map for an initial input to the image generation network.
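Two dependent claims name concrete operations that fit in a few lines of code: truncating the style vector by a diversity parameter (claim 9) and self-attention based on an L2 distance (claim 7). The sketch below is an assumption-laden illustration, not the patented implementation. Truncation is modeled on the StyleGAN-style "truncation trick" (interpolating toward an average style vector with a factor `psi`), and L2 attention scores each query-key pair by the negative squared Euclidean distance divided by the square root of the dimension, in place of the usual dot product. The function names, the average style vector, and the `psi` parameter are hypothetical.

```python
import math


def truncate_style(style, mean_style, psi):
    """Interpolate a style vector toward an average style.

    psi = 1.0 keeps the vector unchanged (most diverse); psi = 0.0 returns
    the average style (least diverse). Modeled on the StyleGAN truncation
    trick as an assumption; the claim does not specify the rule.
    """
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must be in [0, 1]")
    return [m + psi * (s - m) for s, m in zip(style, mean_style)]


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_attention(queries, keys, values):
    """Attention whose scores are negative squared L2 distances.

    Each output row is a weighted average of the value rows, with weights
    given by a softmax over -||q - k||^2 / sqrt(d).
    """
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        logits = [-sum((qi - ki) ** 2 for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(logits)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Usage: self-attention over a tiny "feature map" of two tokens, using the
# tokens themselves as queries, keys, and values (no learned projections).
tokens = [[0.0, 0.0], [10.0, 10.0]]
attended = l2_attention(tokens, tokens, tokens)
truncated = truncate_style([2.0, -2.0], [0.0, 0.0], psi=0.5)  # -> [1.0, -1.0]
```

In the usage lines, each token attends almost entirely to itself because the other token is far away in L2 distance, and `psi=0.5` moves the style vector halfway toward the (here zero) average style.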

Assignees

Adobe Inc

Inventors

Classifications

G06F40/151
Transformation · CPC title
G06T2207/20004
Adaptive image processing · CPC title
G06T2207/20081
Training; Learning · CPC title
G06F40/284 · Primary
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title

Patent family

Related publications grouped by family.

Patent family 92304499

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524937B2 cover?: Systems and methods for image generation are provided. An aspect of the systems and methods includes obtaining a text prompt, generating a style vector based on the text prompt, generating an adaptive convolution filter based on the style vector, and generating an image corresponding to the text prompt based on the adaptive convolution filter.
Who is the assignee on this patent?: Adobe Inc