Systems and methods for unified vision-language understanding and generation
US-2023237773-A1 · Jul 27, 2023 · US
US2025069280A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025069280-A1 |
| Application number | US-202218724633-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 28, 2022 |
| Priority date | May 20, 2022 |
| Publication date | Feb 27, 2025 |
| Grant date | — |
Official abstract text for this publication.
An image generating method and apparatus, and a device and a medium are disclosed. The method comprises: acquiring weakly correlated image-text data pairs, and creating an image-text data set according to the weakly correlated image-text data pairs, wherein the weakly correlated image-text data pairs are image-text data pairs in which images and texts have weak correlations (S11); training, by using the image-text data set, an image generation model which is preconstructed on the basis of an adversarial network, so as to obtain a trained image generation model, wherein the image generation model includes a generator for generating an image, and a discriminator for identifying the authenticity of the image and calculating a corresponding loss value (S12); and after text data to be processed has been acquired, generating, by using the trained image generation model, an image corresponding to said text data (S13).
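Step S12 of the abstract can be pictured as an alternating update loop over the image-text data set. The following is a minimal sketch under stated assumptions, not the applicant's implementation: `update_discriminator` and `update_generator` are hypothetical callables that each perform one optimizer step and return a loss value, the `batches` list stands in for the data set built in S11, and the stopping rule (a fixed number of counted updates) is borrowed from claim 6.

```python
from typing import Callable, List, Sequence, Tuple

Batch = Sequence[Tuple[str, str]]  # hypothetical layout: (image reference, loosely related text)


def train(
    batches: List[Batch],
    update_discriminator: Callable[[Batch], float],
    update_generator: Callable[[Batch], float],
    target_updates: int,
) -> List[Tuple[float, float]]:
    """Alternate discriminator and generator updates until the counter reaches the target."""
    if not batches:
        raise ValueError("the image-text data set is empty")
    updates = 0  # counter for completed optimize-and-update rounds
    history: List[Tuple[float, float]] = []
    while updates < target_updates:
        for batch in batches:
            d_loss = update_discriminator(batch)  # discriminator loss for this batch
            g_loss = update_generator(batch)      # generator loss for the same batch
            updates += 1
            history.append((d_loss, g_loss))
            if updates >= target_updates:
                break  # terminate training once the target count is reached
    return history


# Usage with stub update functions that return constant losses.
toy_batches = [[("img_0", "caption 0")], [("img_1", "caption 1")], [("img_2", "caption 2")]]
loss_history = train(toy_batches, lambda b: 0.5, lambda b: 1.5, target_updates=5)
```

Passing the update steps in as callables keeps the loop independent of any particular framework. The loss functions themselves are left unspecified here because the publication only calls them "predetermined".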
Opening claim text (preview).
1. An image generation method, comprising: acquiring weakly correlated image-text data pairs, and creating an image-text dataset based on the weakly correlated image-text data pairs, wherein the weakly correlated image-text data pairs are image-text data pairs with weak correlation between an image and text; training an image generation model pre-constructed based on an adversarial network by using the image-text dataset, and obtaining a trained image generation model, wherein the image generation model comprises a generator for generating an image and a discriminator for discriminating authenticity of an image and calculating a corresponding loss value; and generating an image corresponding to to-be-processed text data by using the trained image generation model when the to-be-processed text data is acquired.

2. The image generation method according to claim 1, wherein the training the image generation model pre-constructed based on the adversarial network by using the image-text dataset comprises: determining a target text from the image-text dataset and generating a corresponding first target image based on the target text by using the generator in the image generation model; determining a second target image corresponding to the target text from the image-text dataset; performing a global feature comparison and a local feature comparison between the first target image and the second target image to obtain corresponding feature comparison results, and determining an adversarial loss value corresponding to the first target image based on the feature comparison results by using the discriminator in the image generation model, wherein the adversarial loss value is a probability value for indicating authenticity of an image; and determining an authenticity discrimination result of the first target image based on the adversarial loss value.

3. The image generation method according to claim 2, wherein the generating the corresponding first target image based on the target text comprises: processing the target text by using a predetermined language processing tool to determine a target entity in the target text; determining a to-be-expanded entity based on the target entity by using a predetermined knowledge-graph technique, and constructing a corresponding entity candidate set based on the to-be-expanded entity and the target entity; inputting the target text and the entity candidate set into a predetermined conversion model to obtain text semantic embedding and entity semantic embedding which are output by the conversion model and correspond to the target text and the entity candidate set respectively; and generating the first target image based on predetermined random noise, the text semantic embedding and the entity semantic embedding.

4. The image generation method according to claim 3, wherein the generating the first target image based on the predetermined random noise, the text semantic embedding and the entity semantic embedding comprises: inputting the predetermined random noise, the text semantic embedding and the entity semantic embedding into a predetermined multilayer perceptron, to obtain an affine transformation parameter; determining a target hidden-layer feature value based on the affine transformation parameter, and adjusting a current hidden-layer feature value to the target hidden-layer feature value, to obtain a global condition for constraining a pixel value of the generated first target image; and generating the first target image based on the global condition by using a pre-connected up-sampling layer.

5. The image generation method according to claim 3, wherein the method further comprises: calculating a loss value of the generator based on a predetermined batch size of text, an image corresponding to the text and an entity candidate set corresponding to the text by using a predetermined first loss function; calculating a loss value of the discriminator based on the same batch of text, the image corresponding to the text and the entity candidate set corresponding to the text by using a predetermined second loss function; and determining a network parameter affecting the loss value of the generator and the loss value of the discriminator, and optimizing and updating the network parameter by using a predetermined optimizer.

6. The image generation method according to claim 5, wherein after the optimizing and updating the network parameter by using the predetermined optimizer, the method further comprises: recording a number of times of optimizing and updating by using a predetermined counter; determining whether the number of times of optimizing and updating satisfies a predetermined target number of times of optimizing; and terminating the training when the number of times of optimizing and updating satisfies the predetermined target number of times of optimizing.

7. The image generation method according to claim 1, wherein the acquiring the weakly correlated image-text data pairs comprises: acquiring information about public social networking websites, and determining a target website based on the information about public social networking websites; and crawling weakly correlated image-text data in the target website, and generating weakly correlated image-text data pairs based on the weakly correlated image-text data.

8. The image generation method according to claim 1, wherein after the obtaining the trained image generation model, the method further comprises: testing the trained image generation model based on the weakly correlated image-text data pairs in the image-text dataset; and after the trained image generation model passes the testing, generating the image corresponding to the to-be-processed text data by using the trained image generation model when the to-be-processed text data is acquired.

9. The image generation method according to claim 1, wherein after the creating the image-text dataset based on the weakly correlated image-text data pairs and before the training the image generation model pre-constructed based on the adversarial network by using the image-text dataset, the method further comprises: expanding the image-text dataset based on a knowledge-graph technique of a knowledge base.

10. The image generation method according to claim 4, wherein the inputting the predetermined random noise, the text semantic embedding and the entity semantic embedding into the predetermined multilayer perceptron to obtain the affine transformation parameter comprises: connecting the predetermined random noise, the text semantic embedding and the entity semantic embedding based on a predetermined connection function; and inputting the predetermined random noise, the text semantic embedding and the entity semantic embedding which are connected into the predetermined multilayer perceptron to obtain the affine-transformation parameter.

11. The image generation method according to claim 3, wherein the constructing the corresponding entity candidate set based on the to-be-expanded entity and the target entity comprises: combining the target entity and the to-be-expanded entity, to obtain an entity candidate set.

12. The image generation method according to claim 4, wherein the adjusting the current hidden-layer feature value to the target hidden-layer feature value comprises: directly modifying the current hidden-layer feature value to the target hidden-layer feature value.

13. The image generation method according to claim 4, wherein the determining a target hidden-layer feature value based on the affine transformation parameter, and adjusting a current hidden-layer
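
The conditioning path recited in claims 4, 10 and 12 can be illustrated with a small pure-Python sketch. Everything below is an assumption made for illustration: the claims do not fix the connection function (plain concatenation is used here), the perceptron size, the form of the affine transformation (a per-channel `scale * h + shift` is assumed), or the up-sampling method (nearest-neighbour is used). All function names, dimensions and the random weights are hypothetical.

```python
import random


def concat(noise, text_emb, entity_emb):
    """Connection function of claim 10, assumed here to be plain concatenation."""
    return list(noise) + list(text_emb) + list(entity_emb)


def make_layer(rng, n_in, n_out):
    """Randomly initialised weights and zero bias (illustrative only, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def affine_params(cond, layer1, layer2, channels):
    """Two-layer perceptron mapping the condition vector to per-channel (scale, shift)."""
    hidden = [max(0.0, v) for v in linear(cond, *layer1)]  # ReLU activation
    out = linear(hidden, *layer2)
    return out[:channels], out[channels:]


def modulate(features, scale, shift):
    """Overwrite each hidden value h in channel c with scale[c] * h + shift[c]."""
    return [[[scale[c] * v + shift[c] for v in row] for row in fmap]
            for c, fmap in enumerate(features)]


def upsample_nearest(features, factor=2):
    """Nearest-neighbour up-sampling of every channel's 2D feature map."""
    out = []
    for fmap in features:
        rows = []
        for row in fmap:
            wide = [v for v in row for _ in range(factor)]
            rows.extend([list(wide) for _ in range(factor)])
        out.append(rows)
    return out


# Usage: 8-dim noise, 6-dim text and entity embeddings, 4 channels of 2x2 features.
rng = random.Random(0)
channels, size = 4, 2
noise = [rng.gauss(0, 1) for _ in range(8)]
text_emb = [rng.gauss(0, 1) for _ in range(6)]
entity_emb = [rng.gauss(0, 1) for _ in range(6)]
cond = concat(noise, text_emb, entity_emb)
layer1 = make_layer(rng, len(cond), 16)
layer2 = make_layer(rng, 16, 2 * channels)
scale, shift = affine_params(cond, layer1, layer2, channels)
hidden = [[[1.0] * size for _ in range(size)] for _ in range(channels)]
image_features = upsample_nearest(modulate(hidden, scale, shift))
```

In this sketch the modulated values replace the current hidden values outright, which is one reading of the "directly modifying" language of claim 12. A real generator would stack several such modulated blocks and end with a layer that maps features to pixel values.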
Pattern recognition · CPC title
Recognition of textual entities · CPC title
Semantic analysis · CPC title
Two-dimensional [2D] image generation · CPC title
Image or video pattern matching; Proximity measures in feature spaces · CPC title