US11257276B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11257276-B2 |
| Application number | US-202016895734-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 8, 2020 |
| Priority date | Mar 5, 2020 |
| Publication date | Feb 22, 2022 |
| Grant date | Feb 22, 2022 |
Official abstract text for this publication.
Techniques are disclosed for generating digital faces. In some examples, a style-based generator receives as inputs initial tensor(s) and style vector(s) corresponding to user-selected semantic attribute styles, such as the desired expression, gender, age, identity, and/or ethnicity of a digital face. The style-based generator is trained to process such inputs and output low-resolution appearance map(s) for the digital face, such as a texture map, a normal map, and/or a specular roughness map. The low-resolution appearance map(s) are further processed using a super-resolution generator that is trained to take the low-resolution appearance map(s) and low-resolution 3D geometry of the digital face as inputs and output high-resolution appearance map(s) that align with high-resolution 3D geometry of the digital face. Such high-resolution appearance map(s) and high-resolution 3D geometry can then be used to render standalone images or the frames of a video that include the digital face.
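The data flow of the second stage can be sketched as follows. This is a minimal illustration, not the trained super-resolution generator from the disclosure: nearest-neighbor repetition stands in for the learned model, and the channel layout, the position-map encoding of geometry, the function name `joint_upsample`, and the `factor` parameter are all assumptions made for the example. The point it shows is that upsampling the appearance maps and the geometry together keeps both outputs on the same pixel grid.

```python
import numpy as np

def joint_upsample(appearance_lr, geometry_lr, factor=4):
    """Stand-in for a trained super-resolution generator (illustrative only).

    Stacks low-resolution appearance maps and geometry along the channel
    axis, upsamples them together by nearest-neighbor repetition, and splits
    the result again so both outputs share one pixel grid.
    """
    if appearance_lr.shape[1:] != geometry_lr.shape[1:]:
        raise ValueError("appearance maps and geometry must share a spatial size")
    # Shape after stacking: (C_appearance + C_geometry, H, W)
    stacked = np.concatenate([appearance_lr, geometry_lr], axis=0)
    upsampled = stacked.repeat(factor, axis=1).repeat(factor, axis=2)
    n_app = appearance_lr.shape[0]
    return upsampled[:n_app], upsampled[n_app:]

rng = np.random.default_rng(0)
# Hypothetical channel layout: 3 texture + 3 normal + 1 specular roughness.
appearance_lr = rng.random((7, 64, 64))
# Hypothetical geometry encoding: a 3-channel position map in texture space.
geometry_lr = rng.random((3, 64, 64))

appearance_hr, geometry_hr = joint_upsample(appearance_lr, geometry_lr, factor=4)
print(appearance_hr.shape, geometry_hr.shape)  # (7, 256, 256) (3, 256, 256)
```

In a real system the repetition step would be replaced by a learned network; the stacking and splitting around it are what make the upsampling joint.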
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for rendering one or more images of a digital face, the method comprising: generating, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generating, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and rendering one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
2. The computer-implemented method of claim 1, wherein the one or more first appearance maps have a lower resolution than the one or more second appearance maps, and the second 3D geometry has a lower resolution than the first 3D geometry.
3. The computer-implemented method of claim 1, wherein generating the one or more first appearance maps based on the user selection of one or more styles comprises controlling one or more adaptive instance normalization (AdaIN) operations based on the user selection of one or more styles.
4. The computer-implemented method of claim 3, wherein the one or more AdaIN operations are performed in conjunction with one or more convolution operations.
5. The computer-implemented method of claim 4, wherein the first machine learning model comprises a plurality of semantics transfer blocks, and performing the one or more AdaIN operations in conjunction with the one or more convolution operations comprises, for each semantics transfer block included in the plurality of semantics transfer blocks: performing multiple sets of convolution operations and AdaIN operations in parallel; determining a weighted sum of outputs of the multiple sets of convolution operations and AdaIN operations; and upscaling the weighted sum of outputs.
6. The computer-implemented method of claim 1, wherein the one or more second appearance maps include at least one of a texture map, a normal map, or a specular roughness map.
7. The computer-implemented method of claim 1, wherein the one or more attributes of the digital face include at least one of expression, gender, age, identity, or ethnicity.
8. The computer-implemented method of claim 1, wherein the second machine learning model comprises a super-resolution generator.
9. The computer-implemented method of claim 1, wherein the first 3D geometry associated with the digital face conveys a non-neutral facial expression.
10. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to perform steps for rendering one or more images of a digital face, the steps comprising: generating, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generating, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and rendering one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
11. The computer-readable storage medium of claim 10, wherein generating the one or more first appearance maps based on the user selection of one or more styles comprises controlling one or more adaptive instance normalization (AdaIN) operations based on the user selection of one or more styles.
12. The computer-readable storage medium of claim 11, wherein the first machine learning model comprises a plurality of semantics transfer blocks, and generating the one or more first appearance maps based on the user selection of one or more styles includes, for each semantics transfer block included in the plurality of semantics transfer blocks: performing multiple sets of convolution operations and AdaIN operations in parallel; determining a weighted sum of outputs of the multiple sets of convolution operations and AdaIN operations; and upscaling the weighted sum of outputs.
13. The computer-readable storage medium of claim 12, wherein one or more weights used in the weighted sum of outputs, the multiple sets of convolution operations and AdaIN operations, one or more initial tensors, and one or more style vectors associated with the one or more attributes of the digital face are determined while training the first machine learning model.
14. The computer-readable storage medium of claim 10, wherein the first machine learning model is trained using a progressive training technique.
15. The computer-readable storage medium of claim 10, wherein the second machine learning model is trained using ground truth and adversarial learning techniques.
16. The computer-readable storage medium of claim 10, wherein the one or more first appearance maps have a lower resolution than the one or more second appearance maps, and the second 3D geometry has a lower resolution than the first 3D geometry.
17. The computer-readable storage medium of claim 10, wherein the one or more second appearance maps include at least one of a texture map, a normal map, or a specular roughness map.
18. The computer-readable storage medium of claim 10, wherein the one or more attributes of the digital face include at least one of expression, gender, age, identity, or ethnicity.
19. The computer-readable storage medium of claim 10, wherein the second machine learning model comprises a super-resolution generator.
20. A computing device comprising: a memory storing an application; and a processor coupled to the memory, wherein when executed by the processor, the application causes the processor to: generate, via a first machine learning model, one or more first appearance maps based on a user selection of one or more styles associated with one or more attributes of a digital face; generate, via a second machine learning model, one or more second appearance maps and a first three-dimensional (3D) geometry associated with the digital face based on jointly upsampling the one or more first appearance maps and a second 3D geometry associated with the digital face, wherein the one or more second appearance maps are aligned with the first 3D geometry; and render one or more images including the digital face based on the one or more second appearance maps and the first 3D geometry.
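The semantics transfer block recited in the claims (parallel sets of convolution and AdaIN operations, a weighted sum of their outputs, then upscaling) can be sketched as below. This is an illustrative reading of the claim language, not the patented implementation. AdaIN is written in its commonly used form, which normalizes each channel to zero mean and unit variance and then applies a style-dependent scale and bias. Everything else is an assumption for the example: the 1x1 convolution, nearest-neighbor upscaling, random parameters, three branches, and the names `adain`, `conv1x1`, `semantics_transfer_block`, `branches`, and `mix_weights`.

```python
import numpy as np

def adain(x, scale, bias, eps=1e-5):
    """Adaptive instance normalization on a (C, H, W) feature map.

    Each channel is normalized over its spatial positions, then rescaled
    and shifted by per-channel style parameters.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    return scale[:, None, None] * normalized + bias[:, None, None]

def conv1x1(x, weight):
    """1x1 convolution (assumed kernel size): mixes channels at each pixel.

    weight has shape (C_out, C_in); x has shape (C_in, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, x)

def semantics_transfer_block(x, branches, mix_weights, factor=2):
    """Parallel conv + AdaIN branches, weighted sum, then upscaling."""
    outputs = [
        adain(conv1x1(x, b["weight"]), b["scale"], b["bias"]) for b in branches
    ]
    blended = sum(w * out for w, out in zip(mix_weights, outputs))
    # Nearest-neighbor upscaling (assumed; the claims only say "upscaling").
    return blended.repeat(factor, axis=1).repeat(factor, axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))  # hypothetical initial tensor
branches = [
    {
        "weight": rng.standard_normal((8, 8)),
        "scale": rng.random(8) + 0.5,
        "bias": rng.standard_normal(8),
    }
    for _ in range(3)  # three parallel sets, chosen arbitrarily
]
mix_weights = np.array([0.5, 0.3, 0.2])

y = semantics_transfer_block(x, branches, mix_weights)
print(y.shape)  # (8, 8, 8)
```

Per claim 13, the mixing weights, the convolution and AdaIN parameters, the initial tensors, and the style vectors are determined during training; the random values here only make the sketch runnable.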
Probabilistic or stochastic networks · CPC title
Combinations of networks · CPC title
Generative networks · CPC title
Supervised learning · CPC title
Adversarial learning · CPC title