Image generation using one or more neural networks
US-2021097691-A1 · Apr 1, 2021 · US
US11720994B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720994-B2 |
| Application number | US-202117321384-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 14, 2021 |
| Priority date | May 14, 2021 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
Systems and methods are directed to an inversion-consistent transfer learning framework for generating portrait stylization using only limited exemplars. In examples, an input image is received and encoded using a variational autoencoder to generate a latent vector. The latent vector may be provided to a generative adversarial network (GAN) generator to generate a stylized image. In examples, the variational autoencoder is trained using a plurality of images while keeping the weights of a pre-trained GAN generator fixed, where the pre-trained GAN generator acts as a decoder for the encoder. In other examples, a multi-path attribute aware generator is trained using a plurality of exemplar images and transfer learning using the pre-trained GAN generator.
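The encode, sample, and generate flow in the abstract, together with training the encoder while the generator weights stay fixed, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the linear encoder, the `tanh` stand-in for the pre-trained GAN generator, the finite-difference update, and all names (`encode`, `generate`, `train_encoder_step`). The KL term of a full variational autoencoder objective is omitted for brevity.

```python
import math
import random

def encode(image, enc_w):
    """Hypothetical linear encoder: flat image -> (mean, log-variance)."""
    out = [sum(w * x for w, x in zip(row, image)) for row in enc_w]
    half = len(out) // 2
    return out[:half], out[half:]

def sample_latent(mu, logvar, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def generate(z, gen_w):
    """Stand-in for the pre-trained GAN generator, used as the decoder."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in gen_w]

def recon_loss(image, enc_w, gen_w):
    """Squared error between the image and its decode from the latent mean."""
    mu, _ = encode(image, enc_w)
    return sum((o - x) ** 2 for o, x in zip(generate(mu, gen_w), image))

def train_encoder_step(image, enc_w, gen_w, lr=0.05, eps=1e-5):
    """One finite-difference gradient step; only encoder weights change."""
    base = recon_loss(image, enc_w, gen_w)
    candidate = [row[:] for row in enc_w]
    for i in range(len(enc_w)):
        for j in range(len(enc_w[i])):
            bumped = [row[:] for row in enc_w]
            bumped[i][j] += eps
            grad = (recon_loss(image, bumped, gen_w) - base) / eps
            candidate[i][j] -= lr * grad
    # Accept the step only if it does not increase the loss.
    return candidate if recon_loss(image, candidate, gen_w) <= base else enc_w

rng = random.Random(0)
image = [0.2, -0.4, 0.6, 0.1]          # toy 4-pixel "image"
latent_dim = 2
enc_w = [[rng.uniform(-0.5, 0.5) for _ in image] for _ in range(2 * latent_dim)]
gen_w = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in image]
gen_w_before = [row[:] for row in gen_w]  # snapshot to confirm it stays frozen

loss_before = recon_loss(image, enc_w, gen_w)
for _ in range(50):
    enc_w = train_encoder_step(image, enc_w, gen_w)
loss_after = recon_loss(image, enc_w, gen_w)

# Inference: encode, sample a latent vector, decode with the frozen generator.
mu, logvar = encode(image, enc_w)
stylized = generate(sample_latent(mu, logvar, rng), gen_w)
```

The generator weights are only ever read, never written, which is the property the abstract describes as keeping the pre-trained GAN generator fixed while it acts as the decoder.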
Opening claim text (preview).
What is claimed is:
1. A method for generating a stylized image, the method comprising: receiving an input image; encoding the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; providing the latent vector to a pre-trained generative adversarial network (GAN) model; generating, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and providing the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.
2. The method of claim 1, further comprising: receiving a plurality of exemplar images; training a GAN model using transfer learning based on the received plurality of exemplar images; and terminating the process of training when the output of the GAN model satisfies a predetermined condition at a first time to produce the pre-trained GAN model.
3. The method of claim 2, further comprising: receiving a plurality of training images; and training the variational autoencoder while keeping the weights of the pre-trained GAN model fixed.
4. The method of claim 1, wherein the latent vector is sampled from a standard Gaussian distribution.
5. The method of claim 4, further comprising: mapping the latent vector to an intermediate vector; and forwarding the intermediate vector to an affine transform within a style block of the pre-trained GAN model.
6. The method of claim 1, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.
7. A system configured to generate a stylized image, the system comprising: a processor; and memory including instructions, which when executed by the processor, causes the processor to: receive an input image; encode the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; provide the latent vector to a pre-trained generative adversarial network (GAN) model; generate, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and provide the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.
8. The system of claim 7, wherein the instructions, when executed by the processor, cause the processor to: receive a plurality of exemplar images; train the GAN model using transfer learning based on a pre-trained GAN model and the received plurality of exemplar images and terminate the process of training when the output of the GAN model satisfies a predetermined condition at a first time to produce the pre-trained GAN model.
9. The system of claim 8, wherein the instructions, when executed by the processor, cause the processor to: receive a plurality of training images; and training the variational autoencoder while keeping the weights of the pre-trained GAN model fixed.
10. The system of claim 7, wherein the latent vector is sampled from a standard Gaussian distribution.
11. The system of claim 10, wherein the instructions, when executed by the processor, cause the processor to: map the latent vector to an intermediate vector; and forward the intermediate vector to an affine transform within a style block of the pre-trained GAN model.
12. The system of claim 7, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.
13. A non-transitory computer-readable storage medium including instructions, which when executed by a processor, cause the processor to: receive an input image; encode the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; provide the latent vector to a pre-trained generative adversarial network (GAN) model; generate, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and provide the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.
14. The non-transitory computer-readable storage medium of claim 13, wherein the instructions, which when executed by a processor, cause the processor to: map a latent vector sampled from a standard Gaussian distribution to an intermediate vector; and forward the intermediate vector to an affine transform within a style block of the pre-trained GAN model.
15. The non-transitory computer-readable storage medium of claim 14, wherein the combined code from each level's respective feature map to obtain the latent vector is passed to fully connected layers to generate means and standard deviations representing Gaussian importance distribution in a Z+ space.
16. The non-transitory computer-readable storage medium of claim 13, wherein the instructions, which when executed by a processor, cause the processor to: receive a plurality of exemplar images including cartoon characters; train GAN model using transfer learning based on the received plurality of exemplar images; and terminating the process of training after at most 1200 interactions to produce the pre-trained GAN model.
17. The non-transitory computer-readable storage medium of claim 13, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.
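The multi-level encoding recited in the independent claims (feature maps at several sizes, a separate encoder per level, combined codes) and the fully connected layers producing means and standard deviations can be sketched as follows. This is a toy illustration under stated assumptions, not the claimed implementation: 2x2 average pooling stands in for the headless pyramid network, each per-level encoder is a random linear map, the codes are combined by concatenation, and all names (`pyramid`, `encode_hierarchical`, `make_linear`) and dimensions are hypothetical.

```python
import math
import random

def downsample(fmap):
    """2x2 average pooling; halves each spatial dimension (assumed stand-in)."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)] for r in range(0, h, 2)]

def pyramid(image, levels):
    """Produce `levels` feature maps, each half the size of the previous one."""
    maps = [image]
    for _ in range(levels - 1):
        maps.append(downsample(maps[-1]))
    return maps

def flatten(fmap):
    return [v for row in fmap for v in row]

def make_linear(n_out, n_in, rng):
    """Random weight matrix for a hypothetical linear layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def linear(weights, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def encode_hierarchical(image, level_encoders, fc_mu, fc_logsigma, rng):
    """Encode each pyramid level separately, combine the codes, sample a latent."""
    maps = pyramid(image, len(level_encoders))
    codes = [linear(enc, flatten(m)) for enc, m in zip(level_encoders, maps)]
    combined = [v for code in codes for v in code]  # concatenation (assumption)
    mu = linear(fc_mu, combined)
    # Predict log-sigma and exponentiate so every standard deviation is positive.
    sigma = [math.exp(s) for s in linear(fc_logsigma, combined)]
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return z, mu, sigma

rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 input
code_dim, latent_dim = 4, 6
level_sizes = [64, 16, 4]  # flattened sizes of the 8x8, 4x4, and 2x2 maps
level_encoders = [make_linear(code_dim, n, rng) for n in level_sizes]
fc_mu = make_linear(latent_dim, code_dim * len(level_sizes), rng)
fc_logsigma = make_linear(latent_dim, code_dim * len(level_sizes), rng)

z, mu, sigma = encode_hierarchical(image, level_encoders, fc_mu, fc_logsigma, rng)
```

The resulting `z` plays the role of the latent vector that the claims pass on to the pre-trained GAN model; the generator itself is not modeled in this sketch.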
Adversarial learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Transfer learning · CPC title
Generative networks · CPC title