High-resolution portrait stylization frameworks using a hierarchical variational encoder

US11720994B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11720994-B2
Application number  US-202117321384-A
Country             US
Kind code           B2
Filing date         May 14, 2021
Priority date       May 14, 2021
Publication date    Aug 8, 2023
Grant date          Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and method directed to an inversion-consistent transfer learning framework for generating portrait stylization using only limited exemplars. In examples, an input image is received and encoded using a variational autoencoder to generate a latent vector. The latent vector may be provided to a generative adversarial network (GAN) generator to generate a stylized image. In examples, the variational autoencoder is trained using a plurality of images while keeping the weights of a pre-trained GAN generator fixed, where the pre-trained GAN generator acts as a decoder for the encoder. In other examples, a multi-path attribute aware generator is trained using a plurality of exemplar images and learning transfer using the pre-trained GAN generator.
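The training arrangement in the abstract, where the encoder is fitted while the pre-trained generator's weights stay fixed and the generator serves as the decoder, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the scalar "networks", the names `generator`, `encoder`, and `train_encoder`, the squared-error loss, and the learning rate are not taken from the patent, and the variational sampling step is omitted.

```python
# Toy illustration only: scalar stand-ins, not the patented architecture.

def generator(z, g_weight):
    """Stand-in for the pre-trained GAN generator, used as the decoder."""
    return g_weight * z


def encoder(x, e_weight):
    """Stand-in for the encoder: maps an input value to a latent value."""
    return e_weight * x


def train_encoder(samples, g_weight, e_weight=0.0, lr=0.01, steps=500):
    """Fit the encoder by gradient descent on squared reconstruction error.

    Only e_weight is updated. g_weight is read but never modified, which
    mirrors keeping the generator weights fixed during encoder training.
    """
    for _ in range(steps):
        for x in samples:
            recon = generator(encoder(x, e_weight), g_weight)
            # d/d(e_weight) of (recon - x)^2 = 2 * (recon - x) * g_weight * x
            grad = 2.0 * (recon - x) * g_weight * x
            e_weight -= lr * grad
    return e_weight


G_WEIGHT = 2.0               # hypothetical frozen generator weight
SAMPLES = [0.5, -1.0, 1.5]   # hypothetical training inputs

trained_e = train_encoder(SAMPLES, G_WEIGHT)
reconstruction = generator(encoder(1.0, trained_e), G_WEIGHT)
```

In this toy the encoder weight converges toward the inverse of the frozen generator weight, so encoding and then decoding approximately reproduces the input. A real system would use image tensors and a deep-learning framework, and may use different losses.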

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a stylized image, the method comprising: receiving an input image; encoding the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; providing the latent vector to a pre-trained generative adversarial network (GAN) model; generating, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and providing the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.

2. The method of claim 1, further comprising: receiving a plurality of exemplar images; training a GAN model using transfer learning based on the received plurality of exemplar images; and terminating the process of training when the output of the GAN model satisfies a predetermined condition at a first time to produce the pre-trained GAN model.

3. The method of claim 2, further comprising: receiving a plurality of training images; and training the variational autoencoder while keeping the weights of the pre-trained GAN model fixed.

4. The method of claim 1, wherein the latent vector is sampled from a standard Gaussian distribution.

5. The method of claim 4, further comprising: mapping the latent vector to an intermediate vector; and forwarding the intermediate vector to an affine transform within a style block of the pre-trained GAN model.

6. The method of claim 1, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.

7. A system configured to generate a stylized image, the system comprising: a processor; and memory including instructions, which when executed by the processor, causes the processor to: receive an input image; encode the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; provide the latent vector to a pre-trained generative adversarial network (GAN) model; generate, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and provide the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.

8. The system of claim 7, wherein the instructions, when executed by the processor, cause the processor to: receive a plurality of exemplar images; train the GAN model using transfer learning based on a pre-trained GAN model and the received plurality of exemplar images and terminate the process of training when the output of the GAN model satisfies a predetermined condition at a first time to produce the pre-trained GAN model.

9. The system of claim 8, wherein the instructions, when executed by the processor, cause the processor to: receive a plurality of training images; and training the variational autoencoder while keeping the weights of the pre-trained GAN model fixed.

10. The system of claim 7, wherein the latent vector is sampled from a standard Gaussian distribution.

11. The system of claim 10, wherein the instructions, when executed by the processor, cause the processor to: map the latent vector to an intermediate vector; and forward the intermediate vector to an affine transform within a style block of the pre-trained GAN model.

12. The system of claim 7, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.

13. A non-transitory computer-readable storage medium including instructions, which when executed by a processor, cause the processor to: receive an input image; encode the input image using a variational autoencoder to obtain a latent vector by: passing the received input image through a headless pyramid network to produce multiple levels of features maps at different sizes; encoding, for each of the levels of features maps at different sizes, each level's respective feature map at the different size with a separate encoder of a plurality of encoders to produce a code, and combining the encoded code of each level's respective feature map to obtain the latent vector; provide the latent vector to a pre-trained generative adversarial network (GAN) model; generate, by the pre-trained GAN model, a stylized image from the pre-trained GAN model, the generated stylized image being a cartoon style image of the input image; and provide the stylized image as an output, wherein the pre-trained GAN model includes a multi-path structure corresponding to two or more different attributes.

14. The non-transitory computer-readable storage medium of claim 13, wherein the instructions, which when executed by a processor, cause the processor to: map a latent vector sampled from a standard Gaussian distribution to an intermediate vector; and forward the intermediate vector to an affine transform within a style block of the pre-trained GAN model.

15. The non-transitory computer-readable storage medium of claim 14, wherein the combined code from each level's respective feature map to obtain the latent vector is passed to fully connected layers to generate means and standard deviations representing Gaussian importance distribution in a Z+ space.

16. The non-transitory computer-readable storage medium of claim 13, wherein the instructions, which when executed by a processor, cause the processor to: receive a plurality of exemplar images including cartoon characters; train GAN model using transfer learning based on the received plurality of exemplar images; and terminating the process of training after at most 1200 interactions to produce the pre-trained GAN model.

17. The non-transitory computer-readable storage medium of claim 13, wherein the pre-trained GAN model comprises a pre-trained StyleGAN2 model.
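The encoding steps recited in the independent claims and in claim 15 (multi-level feature maps, one separate encoder per level, combined codes, fully connected layers producing means and standard deviations) can be sketched roughly as follows. This is a hedged illustration, not the patented implementation: 2x2 average pooling stands in for the pyramid network, concatenation is an assumed way of combining codes, the weights are random rather than trained, and a single flat latent vector is used because the structure of the Z+ space is not detailed in the claim text. All function names and sizes are hypothetical.

```python
import math
import random


def avg_pool2(fmap):
    """Halve a square feature map with 2x2 average pooling."""
    n = len(fmap) // 2
    return [[(fmap[2 * i][2 * j] + fmap[2 * i][2 * j + 1]
              + fmap[2 * i + 1][2 * j] + fmap[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]


def pyramid(image, levels):
    """Produce `levels` feature maps at successively smaller sizes."""
    maps = [image]
    for _ in range(levels - 1):
        maps.append(avg_pool2(maps[-1]))
    return maps


def linear(vec, weights, bias):
    """Fully connected layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def encode(image, level_encoders, fc_mu, fc_log_sigma, rng):
    """Encode each pyramid level separately, combine, then sample z."""
    codes = []
    for fmap, (w, b) in zip(pyramid(image, len(level_encoders)), level_encoders):
        flat = [v for row in fmap for v in row]
        codes.extend(linear(flat, w, b))  # combine by concatenation (assumed)
    mu = linear(codes, *fc_mu)
    sigma = [math.exp(s) for s in linear(codes, *fc_log_sigma)]
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return z, mu, sigma


rng = random.Random(0)
SIZE, LEVELS, CODE_DIM, LATENT_DIM = 8, 3, 4, 6  # hypothetical sizes
image = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
level_encoders = []
for level in range(LEVELS):
    side = SIZE // (2 ** level)
    level_encoders.append(
        (random_matrix(rng, CODE_DIM, side * side), [0.0] * CODE_DIM))
fc_mu = (random_matrix(rng, LATENT_DIM, LEVELS * CODE_DIM), [0.0] * LATENT_DIM)
fc_log_sigma = (random_matrix(rng, LATENT_DIM, LEVELS * CODE_DIM),
                [0.0] * LATENT_DIM)

z, mu, sigma = encode(image, level_encoders, fc_mu, fc_log_sigma, rng)
level_sizes = [len(m) for m in pyramid(image, LEVELS)]
```

Predicting the log of the standard deviation and exponentiating it is a common design choice that keeps every standard deviation positive. The claims do not state how positivity is enforced, so that detail is an assumption of this sketch.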

Assignees

Lemon Inc

Inventors

Classifications

  • Adversarial learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Transfer learning · CPC title

  • Generative networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720994B2 cover?
Systems and method directed to an inversion-consistent transfer learning framework for generating portrait stylization using only limited exemplars. In examples, an input image is received and encoded using a variational autoencoder to generate a latent vector. The latent vector may be provided to a generative adversarial network (GAN) generator to generate a stylized image. In examples, the va…
Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 8, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).