Generating modified digital images utilizing a global and spatial autoencoder

US11893763B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11893763-B2
Application number  US-202218058163-A
Country             US
Kind code           B2
Filing date         Nov 22, 2022
Priority date       May 14, 2020
Publication date    Feb 6, 2024
Grant date          Feb 6, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, methods, and non-transitory computer readable media for generating a modified digital image from extracted spatial and global codes. For example, the disclosed systems can utilize a global and spatial autoencoder to extract spatial codes and global codes from digital images. The disclosed systems can further utilize the global and spatial autoencoder to generate a modified digital image by combining extracted spatial and global codes in various ways for various applications such as style swapping, style blending, and attribute editing.
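
The data flow the abstract describes for style swapping (take the spatial code from one image, the global code from another, and combine them) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `toy_encode`, `toy_generate`, and `swap_style` are hypothetical names, and simple per-channel statistics stand in for the learned encoder and generator neural networks. It shows only how the two codes are split and recombined, not how the patented autoencoder computes them.

```python
import numpy as np


def toy_encode(image):
    """Toy stand-in for the encoder (NOT the patented network).

    Returns (spatial_code, global_code) for an H x W x C float image.
    Assumption: layout is approximated by the per-channel normalized image,
    and overall style by the per-channel mean and standard deviation.
    """
    mean = image.mean(axis=(0, 1))               # shape (C,)
    std = image.std(axis=(0, 1)) + 1e-8          # shape (C,), avoids division by zero
    spatial_code = (image - mean) / std          # shape (H, W, C), keeps layout
    global_code = np.concatenate([mean, std])    # shape (2C,), keeps color statistics
    return spatial_code, global_code


def toy_generate(spatial_code, global_code):
    """Toy stand-in for the generator: re-applies the color statistics."""
    channels = spatial_code.shape[-1]
    mean, std = global_code[:channels], global_code[channels:]
    return spatial_code * std + mean


def swap_style(first_image, second_image):
    """Combine the layout of first_image with the style of second_image."""
    spatial_code, _ = toy_encode(first_image)
    _, global_code = toy_encode(second_image)
    return toy_generate(spatial_code, global_code)


rng = np.random.default_rng(0)
layout_source = rng.random((8, 8, 3))   # plays the role of the first digital image
style_source = rng.random((4, 6, 3))    # plays the role of the second digital image
swapped = swap_style(layout_source, style_source)
```

In this sketch the output keeps the dimensions of the layout image while its per-channel mean matches the style image, which mirrors the split between geometric layout and overall style at a very coarse level.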

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to: extract from a first digital image, utilizing an encoder neural network comprising a first set of blocks for encoding global features and a second set of blocks for encoding spatial features, a spatial code comprising features representing a geometric layout of the first digital image by extracting intermediate features using the second set of blocks to upsample features extracted from the first set of blocks; extract from a second digital image, utilizing the encoder neural network, a global code comprising features representing overall style properties of the second digital image; and generate a modified digital image by combining the spatial code with the global code utilizing a generator neural network.

2. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the global code by extracting, from the second digital image, global features utilizing the first set of blocks comprising residual blocks for increasing channel dimension and reducing resolution.

3. The non-transitory computer readable medium of claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the global features by utilizing the residual blocks to decrease spatial resolution and increase channel dimension for the global code.

4. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the spatial code by: extracting, from the first digital image, the intermediate features utilizing the second set of blocks within the encoder neural network, wherein the second set of blocks comprises layout blocks for upsampling features extracted by the first set of blocks; and determining a spatial average of the intermediate features from the second set of blocks.

5. The non-transitory computer readable medium of claim 4, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the spatial code by utilizing the layout blocks to increase spatial resolution and decrease channel dimension.

6. The non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the modified digital image to reflect pixels arranged in the geometric layout of the first digital image and having a color scheme of the second digital image.

7. The non-transitory computer readable medium of claim 1, wherein: the first set of blocks within the encoder neural network comprises residual blocks that include multiple branches of convolutional layers; and the second set of blocks within the encoder neural network comprises layout blocks that include convolutional layers, upsample layers, and blur layers.

8. A system comprising: one or more memory devices comprising a digital image and a global and spatial autoencoder comprising an encoder neural network and a generator neural network; and one or more computing devices that are configured to cause the system to: extract from the digital image, utilizing the encoder neural network, a spatial code comprising features representing a geometric layout of the digital image and a global code representing overall style properties of the digital image; determine an image attribute represented by one or more of the spatial code or the global code of the digital image; detect a modification to the image attribute represented by one or more of the spatial code or the global code; and generate a modified digital image to reflect the modification to the image attribute by modifying one or more of the spatial code or the global code utilizing the generator neural network.

9. The system of claim 8, wherein the one or more computing devices are further configured to cause the system to generate an attribute direction defining the image attribute by: extracting a first set of latent codes from a first set of digital images depicting the image attribute; extracting a second set of latent codes from a second set of digital images not depicting the image attribute; and determining a difference between the first set of latent codes and the second set of latent codes.

10. The system of claim 8, wherein the one or more computing devices are further configured to cause the system to detect the modification to the image attribute by receiving an indication of user interaction modifying a slider variable indicating a magnitude of the image attribute to depict in the modified digital image.

11. The system of claim 10, wherein the one or more computing devices are further configured to cause the system to generate the modified digital image by utilizing the generator neural network to modify one or more of the spatial code or the global code according to the magnitude indicated by the slider variable.

12. The system of claim 8, wherein the one or more computing devices are further configured to cause the system to: determine a spatial attribute component of the image attribute represented by one or more of the spatial code or the global code of the digital image; and determine a global attribute component of the image attribute represented by one or more of the spatial code or the global code of the digital image.

13. The system of claim 12, wherein the one or more computing devices are further configured to cause the system to generate the modified digital image by modifying one or more of: the spatial code of the digital image in a spatial attribute direction corresponding to the spatial attribute component; or the global code of the digital image in a global attribute direction corresponding to the global attribute component.

14. The system of claim 8, wherein the one or more computing devices are further configured to cause the system to generate the modified digital image by modifying one or more of the spatial code or the global code using spherical linear interpolation.

15. A computer-implemented method for deep image manipulation utilizing global and spatial autoencoders, the computer-implemented method comprising: extracting from a first digital image, utilizing an encoder neural network comprising a first set of blocks for encoding global features and a second set of blocks for encoding spatial features, a spatial code comprising features representing a geometric layout of the first digital image by extracting intermediate features using the second set of blocks to upsample features extracted from the first set of blocks; extracting from a second digital image, utilizing the encoder neural network, a global code comprising features representing overall style properties of the second digital image; and generating a modified digital image by combining the spatial code with the global code utilizing a generator neural network.

16. The computer-implemented method of claim 15, wherein extracting the global code comprises extracting, from the second digital image, global features utilizing the first set of blocks comprising residual blocks for increasing channel dimension and reducing resolution.

17. The computer-implemented method of claim 16, wherein extracting the global features comprises utilizing the residual blocks to decrease spatial resolution
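
Claims 9 through 11 and 14 describe attribute editing on latent codes: an attribute direction derived from two sets of codes, a slider magnitude, and spherical linear interpolation. The sketch below is one possible reading, not the patent's implementation. It assumes that the "difference" between the two sets is a difference of their means, that the slider shifts a code linearly along the direction, and that codes are flat vectors; the function names `attribute_direction`, `apply_slider`, and `slerp` are hypothetical.

```python
import numpy as np


def attribute_direction(codes_with, codes_without):
    """Attribute direction as a difference of mean latent codes.

    Assumption: claim 9 only says "a difference" between the two sets;
    taking the difference of the set means is one common choice.
    """
    return np.mean(codes_with, axis=0) - np.mean(codes_without, axis=0)


def apply_slider(code, direction, magnitude):
    """Shift a latent code along the direction by the slider magnitude.

    Assumption: a plain linear shift; the claims do not fix the formula.
    """
    return code + magnitude * direction


def slerp(start, end, t):
    """Spherical linear interpolation between two latent vectors, t in [0, 1]."""
    start_unit = start / np.linalg.norm(start)
    end_unit = end / np.linalg.norm(end)
    # Angle between the two vectors, clipped for numerical safety.
    omega = np.arccos(np.clip(np.dot(start_unit, end_unit), -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        # Vectors are (anti)parallel: fall back to linear interpolation.
        return (1.0 - t) * start + t * end
    return (np.sin((1.0 - t) * omega) * start + np.sin(t * omega) * end) / np.sin(omega)


# Toy latent codes: rows are codes extracted from individual images.
codes_with_attribute = np.array([[1.0, 2.0], [3.0, 4.0]])
codes_without_attribute = np.array([[0.0, 0.0], [2.0, 2.0]])
direction = attribute_direction(codes_with_attribute, codes_without_attribute)
edited = apply_slider(np.array([0.0, 0.0]), direction, magnitude=0.5)
halfway = slerp(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.5)
```

In a full system the edited code would then be passed to the generator neural network; that step is omitted here because the generator is a trained model that this page does not specify.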

Assignees

Adobe Inc

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Adversarial learning

  • Auto-encoder networks; Encoder-decoder networks

  • Generative networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11893763B2 cover?
The present disclosure relates to systems, methods, and non-transitory computer readable media for generating a modified digital image from extracted spatial and global codes. For example, the disclosed systems can utilize a global and spatial autoencoder to extract spatial codes and global codes from digital images. The disclosed systems can further utilize the global and spatial autoencoder t…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T9/002. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 6, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).