End to End Network Model for High Resolution Image Segmentation
US-2021067848-A1 · Mar 4, 2021 · US
US11935217B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11935217-B2 |
| Application number | US-202117200338-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 12, 2021 |
| Priority date | Mar 12, 2021 |
| Publication date | Mar 19, 2024 |
| Grant date | Mar 19, 2024 |
Official abstract text for this publication.
The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately, efficiently, and flexibly generating harmonized digital images utilizing a self-supervised image harmonization neural network. In particular, the disclosed systems can implement, and learn parameters for, a self-supervised image harmonization neural network to extract content from one digital image (disentangled from its appearance) and appearance from another digital image (disentangled from its content). For example, the disclosed systems can utilize a dual data augmentation method to generate diverse triplets for parameter learning (including input digital images, reference digital images, and pseudo ground truth digital images), via cropping a digital image with perturbations using three-dimensional color lookup tables (“LUTs”). Additionally, the disclosed systems can utilize the self-supervised image harmonization neural network to generate harmonized digital images that depict content from one digital image having the appearance of another digital image.
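The dual data augmentation described in the abstract can be illustrated with a minimal sketch. The abstract only states that triplets come from cropping one image and perturbing it with 3D color LUTs; the specific pairing used here (input = content crop under LUT A, reference = a different crop under LUT B, pseudo ground truth = content crop under LUT B) is an assumption, as are all function names, the random identity-plus-jitter LUT, and the nearest-grid-point lookup. Images are plain nested lists of RGB tuples in [0, 1].

```python
import random

def make_lut(n, rng, strength):
    """Build an n x n x n color LUT: identity mapping plus random jitter (assumed form)."""
    def jitter(v):
        return min(1.0, max(0.0, v + rng.uniform(-strength, strength)))
    step = 1.0 / (n - 1)
    return [[[(jitter(r * step), jitter(g * step), jitter(b * step))
              for b in range(n)] for g in range(n)] for r in range(n)]

def apply_lut(image, lut):
    """Map every pixel through the LUT using nearest-grid-point lookup."""
    n = len(lut)
    def idx(v):
        return min(n - 1, max(0, round(v * (n - 1))))
    return [[lut[idx(r)][idx(g)][idx(b)] for (r, g, b) in row] for row in image]

def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def random_crop_box(image, height, width, rng):
    """Pick a random crop window that fits inside the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return top, left, height, width

def make_triplet(image, crop_size, rng, lut_size=4, strength=0.2):
    """Return (input, reference, pseudo ground truth) from a single image.

    Assumed pairing: the input and pseudo ground truth share a crop (content),
    the reference and pseudo ground truth share a LUT (appearance).
    """
    h, w = crop_size
    lut_a = make_lut(lut_size, rng, strength)
    lut_b = make_lut(lut_size, rng, strength)
    content_box = random_crop_box(image, h, w, rng)
    reference_box = random_crop_box(image, h, w, rng)
    input_img = apply_lut(crop(image, *content_box), lut_a)
    reference_img = apply_lut(crop(image, *reference_box), lut_b)
    pseudo_gt = apply_lut(crop(image, *content_box), lut_b)
    return input_img, reference_img, pseudo_gt
```

Under this pairing no hand-labeled harmonized image is needed: the target is synthesized from the same source image, which is what makes the setup self-supervised.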
Opening claim text (preview).
What is claimed is:

1. A system comprising: one or more memory devices comprising a self-supervised image harmonization neural network comprising: a neural network appearance encoder that extracts an appearance code by disentangling appearance features from content features of a first digital image, the appearance code comprising a latent vector representing one or more appearance characteristics of the first digital image, wherein disentangling the appearance features comprises excluding one or more content features of the first digital image from the latent vector as part of extracting the appearance code; a neural network content encoder that extracts a content code by disentangling content features from appearance features of a second digital image, the content code comprising a latent vector representing a spatial arrangement of the second digital image, wherein disentangling the content features comprises excluding one or more appearance features of the second digital image from the latent vector as part of extracting the content code; and a neural network decoder that generates a modified digital image from the appearance code and the content code.

2. The system of claim 1, further comprising one or more computing devices that are configured to cause the system to extract the appearance code from the first digital image by utilizing the neural network appearance encoder to extract features representing one or more of color, contrast, brightness, or saturation of the first digital image.

3. The system of claim 1, further comprising one or more computing devices that are configured to cause the system to extract the content code from the second digital image by utilizing the neural network content encoder to extract features representing a spatial arrangement defining positions and shapes of objects in the second digital image.

4. The system of claim 1, further comprising one or more computing devices that are configured to cause the system to generate a harmonized digital image by combining a portion of the modified digital image with the first digital image such that the portion of the modified digital image comprises a foreground of the harmonized digital image and the first digital image comprises a background of the harmonized digital image.

5. The system of claim 4, wherein the one or more computing devices are further configured to generate the harmonized digital image by: receiving an indication of user interaction to generate a mask defining the portion of the modified digital image to combine with the first digital image; selecting the portion of the modified digital image indicated by the mask; and combining the portion of the modified digital image with the first digital image utilizing a fitting function to adapt resolutions.

6. The system of claim 1, further comprising one or more computing devices that are configured to cause the system to receive an indication of user interaction selecting the first digital image and the second digital image to combine together to generate the modified digital image.

7. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to: extract, from a reference digital image, an appearance code by disentangling appearance features from content features of the reference digital image utilizing a neural network appearance encoder, the appearance code comprising a latent vector representing one or more appearance characteristics of the reference digital image, wherein disentangling the appearance features comprises excluding one or more content features of the reference digital image from the latent vector as part of extracting the appearance code; extract, from an input digital image, a content code by disentangling content features from appearance features of the input digital image utilizing a neural network content encoder, the content code comprising a latent vector representing a spatial arrangement of the input digital image, wherein disentangling the content features comprises excluding one or more appearance features of the input digital image from the latent vector as part of extracting the content code; generate a modified digital image from the appearance code and the content code utilizing a neural network decoder, the modified digital image comprising the one or more appearance characteristics of the reference digital image and the spatial arrangement of the input digital image; and generate a harmonized digital image by combining a portion of the modified digital image with the reference digital image.

8. The non-transitory computer readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the harmonized digital image by utilizing a fitting function learned from low-resolution digital images to combine the portion of the modified digital image with the reference digital image in a high resolution.

9. The non-transitory computer readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the harmonized digital image by receiving a mask indicating the portion of the modified digital image to combine with the reference digital image.

10. The non-transitory computer readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the harmonized digital image in response to receiving indications of user selections of the reference digital image and the input digital image to combine together.

11. The non-transitory computer readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the content code by utilizing the neural network content encoder to extract features representing a spatial arrangement defining positions and shapes of objects in the input digital image.

12. The non-transitory computer readable medium of claim 7, further comprising instructions that, when executed by the at least one processor, cause the computing device to extract the appearance code by utilizing the neural network appearance encoder to extract features representing color of the reference digital image without representing texture.

13. A computer-implemented method comprising: extracting, utilizing a neural network appearance encoder, an appearance code by disentangling appearance features from content features of a first digital image, the appearance code comprising a latent vector representing one or more appearance characteristics of the first digital image, wherein disentangling the appearance features comprises excluding one or more content features of the first digital image from the latent vector as part of extracting the appearance code; extracting, utilizing a neural network content encoder, a content code by disentangling content features from appearance features of a second digital image, the content code comprising a latent vector representing a spatial arrangement of the second digital image, wherein disentangling the content features comprises excluding one or more appearance features of the second digital image from the latent vector as part of extracting the content code; and generating, utilizing a neural network decoder, a modified digital image from the appearance code and the content code.

14. The computer-implemented method of claim 13, further comprising extracting the appearance code from the first digital image by utilizing the neural net
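The mask-based combination recited in claims 4, 5, and 9 can be sketched as a per-pixel blend. This is only an illustration under stated assumptions: the function name `composite`, the nested-list image layout, and the linear blend formula are hypothetical, and the "fitting function to adapt resolutions" from claims 5 and 8 is not modeled, so all three inputs are assumed to share the same resolution.

```python
def composite(modified, background, mask):
    """Blend per pixel: mask 1.0 keeps the modified image (foreground),
    mask 0.0 keeps the background image. Linear blending is an assumption."""
    out = []
    for mod_row, bg_row, m_row in zip(modified, background, mask):
        out.append([
            tuple(m * f + (1.0 - m) * b for f, b in zip(fg_px, bg_px))
            for fg_px, bg_px, m in zip(mod_row, bg_row, m_row)
        ])
    return out

# Usage: a 2 x 2 example where the mask selects the diagonal pixels.
red = [[(1.0, 0.0, 0.0)] * 2 for _ in range(2)]    # stands in for the modified image
blue = [[(0.0, 0.0, 1.0)] * 2 for _ in range(2)]   # stands in for the background image
mask = [[1.0, 0.0], [0.0, 1.0]]
harmonized = composite(red, blue, mask)
```

Fractional mask values would give soft edges between foreground and background, though the claims do not say whether the mask is binary or soft.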
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
using two or more images, e.g. averaging or subtraction · CPC title