Scene-Based Text-to-Image Generation with Human Priors
US-2024221235-A1 · Jul 4, 2024 · US
US12586368B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586368-B2 |
| Application number | US-202318502704-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 6, 2023 |
| Priority date | Oct 13, 2023 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
Embodiments of the present disclosure relate to a method, an electronic device, and a computer program product for generating an image. The method includes acquiring a semantic segmentation graph by performing semantic segmentation on a source image. The method further includes acquiring a key word for describing a feature of a to-be-generated target image. The method further includes transforming the semantic segmentation graph by using the key word so as to acquire a transformed semantic segmentation graph. The method further includes generating the target image based on the transformed semantic segmentation graph. According to the method of embodiments of the present disclosure, a semantic segmentation graph of a source image and a key word can be used to generate a target image, so as to make the generated target image have a target feature and have semantic consistency with the source image, thereby generating a high-quality target image.
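The four steps in the abstract (segment, acquire key word, transform, generate) can be illustrated with a toy sketch. The abstract does not say how the key word transforms the segmentation graph or how the image is generated. Everything below is therefore an assumption made for illustration: the label names, the `KEYWORD_RULES` table, the intensity-threshold `segment` stand-in, and the `PALETTE`-based `generate` stand-in are all hypothetical and are not the patented models.

```python
from typing import Dict, List

SegMap = List[List[str]]  # one class label per pixel

# Hypothetical rules: which labels a key word rewrites (not from the patent).
KEYWORD_RULES: Dict[str, Dict[str, str]] = {
    "snow": {"road": "snowy_road"},
    "night": {"sky": "night_sky"},
}

# Hypothetical rendering table used by the stand-in generator.
PALETTE: Dict[str, int] = {
    "sky": 200, "road": 60, "snowy_road": 230, "night_sky": 20,
}


def segment(source_image: List[List[int]]) -> SegMap:
    """Stand-in for a trained segmentation model: thresholds pixel intensity."""
    return [["sky" if px > 128 else "road" for px in row] for row in source_image]


def transform(seg_map: SegMap, key_word: str) -> SegMap:
    """Rewrite labels according to the key word; unknown key words change nothing."""
    rules = KEYWORD_RULES.get(key_word, {})
    return [[rules.get(label, label) for label in row] for row in seg_map]


def generate(seg_map: SegMap) -> List[List[int]]:
    """Stand-in for a generator: paints each label with a fixed intensity."""
    return [[PALETTE[label] for label in row] for row in seg_map]


source = [[200, 210], [40, 50]]          # toy 2x2 grayscale source image
seg = segment(source)                    # step 1: semantic segmentation graph
transformed = transform(seg, "snow")     # steps 2-3: key word drives the transform
target = generate(transformed)           # step 4: target image from transformed graph
```

The transformed graph keeps the source layout (sky above, road below) while the key word changes what the road region represents, which is the sense in which the target stays semantically consistent with the source.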
Opening claim text (preview).
What is claimed is:
1. A method for generating an image, comprising: acquiring a semantic segmentation graph by performing semantic segmentation on a source image; acquiring a key word for describing a feature of a to-be-generated target image; transforming the semantic segmentation graph by using the key word so as to acquire a transformed semantic segmentation graph; and generating the target image based on the transformed semantic segmentation graph; wherein generating the target image based on the transformed semantic segmentation graph comprises determining the target image based on (i) comparison to an additional image using a first threshold and (ii) comparison to the source image using a second threshold.
2. The method according to claim 1, wherein the source image is an image in a training dataset, and the training dataset is used for training of a predetermined semantic segmentation model.
3. The method according to claim 2, further comprising: including the target image and the transformed semantic segmentation graph in the training dataset to enhance the training dataset, wherein the transformed semantic segmentation graph is used as annotation information of the target image.
4. The method according to claim 2, further comprising: mapping the source image and the key word to a predetermined feature space, wherein in the predetermined feature space, a distance between matched images and key words is less than a first predetermined distance, and a distance between mismatched images and key words is greater than a second predetermined distance; and wherein generating the target image based on the transformed semantic segmentation graph comprises that: when the generated target image is mapped to the predetermined feature space, a distance between the target image and the key word is less than the first predetermined distance, and a distance between the target image as well as the key word and the source image is greater than the second predetermined distance.
5. The method according to claim 4, wherein generating the target image based on the transformed semantic segmentation graph further comprises: making a difference between the generated target image and a real-world image less than a predetermined difference threshold, and making a similarity between the generated target image and the source image greater than a predetermined similarity threshold.
6. The method according to claim 5, wherein the method is executed by using a trained neural network model.
7. The method according to claim 6, wherein the trained neural network model comprises a first subnetwork model, a second subnetwork model, and a third subnetwork model, the first subnetwork model is used to map the source image and the key word to the predetermined feature space, the second subnetwork model is used to acquire the semantic segmentation graph by performing semantic segmentation on the source image, and the third subnetwork model is used to transform the semantic segmentation graph by using the key word so as to acquire the transformed semantic segmentation graph and generate the target image.
8. The method according to claim 7, further comprising: acquiring the first subnetwork model by training a first neural network model and a second neural network model, wherein the first neural network model is used to map an image to an image feature space, and the second neural network model is used to map a key word to a word feature space; acquiring a trained semantic segmentation model as the second subnetwork model, wherein the trained semantic segmentation model is different from the predetermined semantic segmentation model; and acquiring the third subnetwork model by training a third neural network model, and the third neural network model is based on a generative adversarial network (GAN) architecture.
9. The method according to claim 8, wherein training the first neural network model and the second neural network model comprises: performing joint training on the first neural network model and the second neural network model, so as to configure the trained first neural network model and second neural network model to map an input image and an input key word together to the predetermined feature space.
10. The method according to claim 8, wherein the third neural network model comprises a generator model and a discriminator model, the generator model is used to generate an output image based on an input semantic segmentation graph and an input key word, and the discriminator model is used to determine whether an image is a real-world image.
11. The method according to claim 10, wherein training the third neural network model comprises: performing joint training on the generator model and the discriminator model, so as to cause the trained discriminator model to determine an image having a difference from a real-world image less than the predetermined difference threshold as a real-world image, and cause the trained generator model to generate an output image meeting a predetermined condition, wherein the predetermined condition comprises that the output image is determined by the trained discriminator model as a real-world image.
12. The method according to claim 11, wherein the predetermined condition further comprises that: in the predetermined feature space, a distance between the output image and the input key word is less than the first predetermined distance; in the predetermined feature space, a distance between the output image as well as the input key word and an input image is greater than the second predetermined distance; and a similarity between the output image and the input image is greater than the predetermined similarity threshold.
13. The method according to claim 1, wherein the source image comprises at least one image, and the target image comprises at least one image corresponding to the source image.
14. An electronic device, comprising: at least one processor; and memory coupled to the at least one processor and having instructions stored thereon, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: acquiring a semantic segmentation graph by performing semantic segmentation on a source image; acquiring a key word for describing a feature of a to-be-generated target image; transforming the semantic segmentation graph by using the key word so as to acquire a transformed semantic segmentation graph; and generating the target image based on the transformed semantic segmentation graph; wherein generating the target image based on the transformed semantic segmentation graph comprises determining the target image based on (i) comparison to an additional image using a first threshold and (ii) comparison to the source image using a second threshold.
15. The electronic device according to claim 14, wherein the source image is an image in a training dataset, and the training dataset is used for training of a predetermined semantic segmentation model.
16. The electronic device according to claim 15, wherein the actions further comprise: including the target image and the transformed semantic segmentation graph in the training dataset to enhance the training dataset, wherein the transformed semantic segmentation graph is used as annotation information of the target image.
17. The electronic device according to claim 15, wherein the actions further comprise: mapping the source image and the key word to a pre
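The conditions recited in claims 4, 5, and 12 can be read as a joint acceptance test on a generated image. The sketch below is one possible reading, not the claimed implementation. It assumes embeddings are plain coordinate tuples compared by Euclidean distance, and it assumes that the "difference from a real-world image" and the "similarity to the source image" arrive as precomputed scalar scores. The function name `meets_conditions`, its parameter names, and all numeric values are hypothetical. The phrase "a distance between the target image as well as the key word and the source image" is interpreted here as requiring both the target and the key word to lie farther than the second distance from the source.

```python
import math
from typing import Sequence


def meets_conditions(
    target_emb: Sequence[float],
    keyword_emb: Sequence[float],
    source_emb: Sequence[float],
    realism_gap: float,      # assumed scalar: difference from a real-world image
    similarity: float,       # assumed scalar: similarity to the source image
    *,
    d1: float,               # first predetermined distance
    d2: float,               # second predetermined distance
    max_gap: float,          # predetermined difference threshold
    min_similarity: float,   # predetermined similarity threshold
) -> bool:
    # Claim 4: the target must match the key word in the feature space.
    close_to_keyword = math.dist(target_emb, keyword_emb) < d1
    # Claim 4 (one reading): target and key word are both mismatched with the source.
    far_from_source = (
        math.dist(target_emb, source_emb) > d2
        and math.dist(keyword_emb, source_emb) > d2
    )
    # Claim 5: realistic enough, and still consistent with the source.
    looks_real = realism_gap < max_gap
    consistent = similarity > min_similarity
    return close_to_keyword and far_from_source and looks_real and consistent


thresholds = dict(d1=0.5, d2=1.0, max_gap=0.2, min_similarity=0.6)
keyword = (0.9, 0.1)
source = (-1.0, 0.0)

# Target embedding sits next to the key word and away from the source.
accepted = meets_conditions((1.0, 0.0), keyword, source, 0.1, 0.8, **thresholds)
# Target embedding coincides with the source, so the key word had no effect.
rejected = meets_conditions(source, keyword, source, 0.1, 0.8, **thresholds)
```

The feature-space distance and the similarity score are deliberately separate inputs: under this reading, the first measures whether the key word's feature took hold, while the second measures whether the layout of the source was preserved.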
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
using neural networks · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Two-dimensional [2D] image generation · CPC title
using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching · CPC title