Device and method for training a machine learning system for generating images
US-2022262106-A1 · Aug 18, 2022 · US
US12586344B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586344-B2 |
| Application number | US-202217971169-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 21, 2022 |
| Priority date | Oct 21, 2022 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
An image generation system implements a multi-branch GAN to generate images that each express visually similar content in a different modality. A generator portion of the multi-branch GAN includes multiple branches that are each tasked with generating one of the different modalities. A discriminator portion of the multi-branch GAN includes multiple fidelity discriminators, one for each of the generator branches, and a consistency discriminator, which constrains the outputs generated by the different generator branches to appear visually similar to one another. During training, outputs from each of the fidelity discriminators and the consistency discriminator are used to compute a non-saturating GAN loss. The non-saturating GAN loss is used to refine parameters of the multi-branch GAN during training until model convergence. The trained multi-branch GAN generates multiple images from a single input, where each of the multiple images depicts visually similar content expressed in a different modality.
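The abstract states that outputs from every fidelity discriminator and from the consistency discriminator feed a non-saturating GAN loss, but it gives no formula. The following is a minimal sketch of one common formulation, not the patent's stated implementation. It assumes each discriminator emits a raw logit and that all discriminator terms are summed with equal weight. The function names `softplus`, `generator_loss`, and `discriminator_loss` are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_loss(fidelity_fake_logits, consistency_fake_logit):
    """Non-saturating generator loss, -log(sigmoid(logit)), summed over
    every fidelity discriminator and the consistency discriminator.

    Equal weighting of the terms is an assumption of this sketch.
    """
    logits = list(fidelity_fake_logits) + [consistency_fake_logit]
    return sum(softplus(-logit) for logit in logits)


def discriminator_loss(real_logit: float, fake_logit: float) -> float:
    """Standard logistic loss for one discriminator:
    -log(sigmoid(real_logit)) - log(1 - sigmoid(fake_logit))."""
    return softplus(-real_logit) + softplus(fake_logit)


# Four hypothetical branch logits (e.g. RGB, segmentation, depth, normals)
# plus one consistency logit for the concatenated outputs.
loss_g = generator_loss([0.3, -1.2, 0.8, 0.1], consistency_fake_logit=-0.5)
loss_d_rgb = discriminator_loss(real_logit=1.5, fake_logit=0.3)
```

The generator term is called non-saturating because it minimizes -log D(G(z)) rather than log(1 - D(G(z))), which keeps gradients useful when a discriminator confidently rejects generated images.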
Opening claim text (preview).
What is claimed is:
1. A system comprising: a generator portion of a generative adversarial network (GAN) machine learning model that includes a plurality of branches that are each configured to generate a different image from a single input; and a discriminator portion of the GAN machine learning model that includes: a plurality of fidelity discriminators, each of the plurality of fidelity discriminators corresponding to a respective one of the plurality of branches of the generator portion; and a consistency discriminator configured to receive concatenated outputs from the plurality of branches of the generator portion and ensure, based on the concatenated outputs, that the different images generated by the plurality of branches of the generator portion are visually similar.
2. The system of claim 1, wherein the generator portion includes a plurality of shallow layers, each of the plurality of shallow layers configured to process the single input and generate a shared output that is provided as an input to each of the plurality of branches.
3. The system of claim 1, wherein each of the plurality of branches include a plurality of deep generator layers that are configured to generate a feature map and a convolutional layer configured to generate an image from the feature map according to a modality associated with the branch of the generator portion.
4. The system of claim 3, wherein the plurality of deep generator layers for one of the plurality of branches shares a common structure with the plurality of deep generator layers for others of the plurality of branches.
5. The system of claim 1, wherein the plurality of branches include: a first branch configured to generate a red, green, blue (RGB) image from the single input; a second branch configured to generate a segmentation map image from the single input; a third branch configured to generate a depth map image from the single input; and a fourth branch configured to generate a surface normal image from the single input.
6. The system of claim 1, wherein each of the plurality of fidelity discriminators is trained to output a judgment indicating whether an image generated by a corresponding one of the plurality of branches of the generator portion is visually realistic.
7. The system of claim 1, wherein internal weights of the generator portion, each of the plurality of fidelity discriminators, and the consistency discriminator are trained using a non-saturating GAN loss computed during training of the GAN machine learning model.
8. A method comprising: generating, by a processing device, a plurality of different images from a single input using a generative adversarial network (GAN) machine learning model that includes a generator portion and a discriminator portion by: causing each of a plurality of different branches of the generator portion to generate one of the plurality of different images; and causing a consistency discriminator of the discriminator portion to ensure that the plurality of different images are visually similar to one another by processing concatenated outputs from the plurality of different branches of the generator portion.
9. The method of claim 8, wherein each of the plurality of different images depicts a common scene or environment expressed in a modality that is different from other ones of the plurality of different images.
10. The method of claim 9, wherein the modality comprises one of a red, green, blue (RGB) image, a segmentation map image, a depth map image, or a surface normal image.
11. The method of claim 8, wherein generating the plurality of different images from the single input comprises generating a shared output by processing the single input using a plurality of shallow layers of the generator portion and inputting the shared output to each of the plurality of different branches, wherein each of the plurality of different branches include deep layers of the generator portion.
12. The method of claim 8, wherein generating the plurality of different images from the single input further comprises causing, for each of the plurality of different branches of the generator portion, a fidelity discriminator in the discriminator portion to output a judgment indicating whether an output of the branch of the generator portion is realistic.
13. The method of claim 8, wherein generating the plurality of different images from the single input comprises generating modulation parameters by processing the single input using a mapping network and providing the modulation parameters as input to each layer of the generator portion of the GAN machine learning model.
14. The method of claim 8, wherein the single input comprises a latent space code.
15. The method of claim 14, further comprising determining the latent space code based on an input image, wherein the plurality of different images comprise two or more of a segmentation map image, a depth map image, or a surface normal image for the input image.
16. The method of claim 8, further comprising training the GAN machine learning model by: causing the plurality of different branches of the generator portion to output a plurality of different training images from a training input; causing, for each of the plurality of different branches of the generator portion, a corresponding one of a plurality of fidelity discriminators of the discriminator portion to output a judgment indicating whether the branch of the generator portion output a visually realistic training image; causing the consistency discriminator to output a judgment indicating whether the plurality of different training images are visually similar to one another; computing a loss function based on judgments output by the plurality of fidelity discriminators and the consistency discriminator; and updating at least one internal weight of the GAN machine learning model using the loss function.
17. A method comprising: generating, by a processing device, a trained generative adversarial network (GAN) machine learning model to output a plurality of different images from a single input by: causing each of a plurality of different branches of a generator portion of the GAN machine learning model to generate a different image from the single input; causing, for each of the plurality of different branches of the generator portion, a corresponding one of a plurality of fidelity discriminators of the GAN machine learning model to output a judgment indicating whether the branch of the generator portion output a visually realistic image; causing a consistency discriminator of the GAN machine learning model to output a judgment indicating whether the plurality of different images are visually similar to one another by processing concatenated outputs from the plurality of different branches of the generator portion; computing a loss function based on judgments output by the plurality of fidelity discriminators and the consistency discriminator; and updating at least one internal weight of the GAN machine learning model based on the loss function.
18. The method of claim 17, wherein each of the plurality of different images expresses visual content using a different modality.
19. The method of claim 18, wherein the different modality expressed by one of the plurality of different images includes a red, green, blue (RGB) image and the different modality expressed by another one of the plurality of different images includes a depth map image, a segmentation map image,
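Claims 1, 8, and 17 recite a consistency discriminator that processes concatenated outputs from the generator branches. As a rough illustration only, the sketch below stacks per-branch images along the channel axis using plain nested lists. The channel-axis choice, the per-modality channel counts, and the helper names `blank_image` and `concat_channels` are assumptions made for this example and do not come from the claims.

```python
def blank_image(channels: int, height: int, width: int, value: float = 0.0):
    """Build a channels x height x width image as nested lists (toy stand-in)."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(branch_outputs):
    """Stack branch outputs along the channel axis.

    Each branch output is a list of channel planes, and every plane must
    share the same height and width so the consistency discriminator sees
    spatially aligned modalities.
    """
    if not branch_outputs:
        raise ValueError("at least one branch output is required")
    height = len(branch_outputs[0][0])
    width = len(branch_outputs[0][0][0])
    stacked = []
    for image in branch_outputs:
        for plane in image:
            if len(plane) != height or any(len(row) != width for row in plane):
                raise ValueError("branch outputs must share height and width")
            stacked.append(plane)
    return stacked


# Assumed channel counts: RGB = 3, depth = 1, surface normals = 3.
rgb = blank_image(3, 4, 4, value=0.5)
depth = blank_image(1, 4, 4, value=0.1)
normals = blank_image(3, 4, 4, value=0.0)
consistency_input = concat_channels([rgb, depth, normals])
```

In this sketch the stacked tensor has seven channel planes, and a single discriminator applied to it could compare modalities pixel by pixel, which is one plausible reason for concatenating rather than scoring each image separately.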
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Volume rendering · CPC title
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
using neural networks · CPC title
using pattern recognition or machine learning (optical pattern recognition or electronic computations therefor G06V10/88) · CPC title