Controllable image generation
US-2021335029-A1 · Oct 28, 2021 · US
US11514632B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11514632-B2 |
| Application number | US-202017091440-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 6, 2020 |
| Priority date | Nov 6, 2020 |
| Publication date | Nov 29, 2022 |
| Grant date | Nov 29, 2022 |
Official abstract text for this publication.
This disclosure describes methods, non-transitory computer readable storage media, and systems that utilize a contrastive perceptual loss to modify neural networks for generating synthetic digital content items. For example, the disclosed systems generate a synthetic digital content item based on a guide input to a generative neural network. The disclosed systems utilize an encoder neural network to generate encoded representations of the synthetic digital content item and a corresponding ground-truth digital content item. Additionally, the disclosed systems sample patches from the encoded representations of the encoded digital content items and then determine a contrastive loss based on the perceptual distances between the patches in the encoded representations. Furthermore, the disclosed systems jointly update the parameters of the generative neural network and the encoder neural network utilizing the contrastive loss.
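The patch sampling and contrastive loss described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not give a formula, so the InfoNCE-style cross-entropy, the use of cosine similarity as the perceptual comparison, the `temperature` value, and all function names and toy data here are assumptions. Each encoded item is modeled as a list holding one feature vector per spatial location.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)  # epsilon avoids division by zero


def patch_contrastive_loss(query, positive, negatives, temperature=0.07):
    """Assumed InfoNCE form: softmax cross-entropy with the positive as target."""
    logits = [cosine_similarity(query, positive) / temperature]
    logits += [cosine_similarity(query, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[0]


def sample_patches(encoded_synthetic, encoded_truth, num_negatives, rng):
    """Sample one query location plus ground-truth patches at other locations."""
    query_idx = rng.randrange(len(encoded_synthetic))
    others = [i for i in range(len(encoded_truth)) if i != query_idx]
    negative_idx = rng.sample(others, num_negatives)
    query = encoded_synthetic[query_idx]
    positive = encoded_truth[query_idx]  # same spatial location
    negatives = [encoded_truth[i] for i in negative_idx]  # other locations
    return query, positive, negatives


# Toy encodings (hypothetical data): one feature vector per spatial location.
encoded_truth = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
encoded_synthetic = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.8, 0.9, 0.1]]

query, positive, negatives = sample_patches(
    encoded_synthetic, encoded_truth, num_negatives=2, rng=random.Random(0)
)
print(patch_contrastive_loss(query, positive, negatives))
```

Under these assumptions, the loss is small when the synthetic patch resembles the ground-truth patch at the same location and large when it resembles a patch from a different location. The joint parameter update of the generator and encoder would require an automatic differentiation framework and is not shown.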
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computing device to: generate, utilizing a generative neural network, a synthetic digital content item within a first dimension space based on a guide input; generate an encoded synthetic digital content item within a second dimension space by processing the synthetic digital content item utilizing an encoder neural network, the second dimension space corresponding to encodings in a lower dimension space than the first dimension space; generate an encoded ground-truth digital content item within the second dimension space by processing a ground-truth digital content item corresponding to the guide input utilizing the encoder neural network; sample, from a first feature vector within the second dimension space, a patch from the encoded synthetic digital content item based on a corresponding spatial location in the synthetic digital content item; sample, from a second feature vector within the second dimension space, a plurality of patches from the encoded ground-truth digital content item based on corresponding spatial locations in the ground-truth digital content item; determine a contrastive loss by comparing the sampled patch from the first feature vector of the encoded synthetic digital content item in the second dimension space with the plurality of sampled patches from the second feature vector of the encoded ground-truth digital content item in the second dimension space; and update parameters of the generative neural network and parameters of the encoder neural network based on the contrastive loss.
2. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the contrastive loss by: comparing the sampled patch in the first feature vector of the encoded synthetic digital content item to a positive patch in the second feature vector of the encoded ground-truth digital content item, wherein a location of the positive patch in the second feature vector of the encoded ground-truth digital content item corresponds to a location of the sampled patch in the first feature vector of the encoded synthetic digital content item; and comparing the sampled patch in the first feature vector of the encoded synthetic digital content item to a negative patch in the second feature vector of the encoded ground-truth digital content item, wherein a location of the negative patch in the second feature vector of the encoded ground-truth digital content item does not correspond to the location of the sampled patch in the first feature vector the encoded synthetic digital content item.
3. The non-transitory computer readable storage medium as recited in claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the contrastive loss by: determining a first set of intermediate feature representations of the synthetic digital content item; determining a second set of intermediate feature representations of the ground-truth digital content item; and determining a multilayer patch-wise contrastive loss based on the first set of intermediate feature representations and the second set of intermediate feature representations.
4. The non-transitory computer readable storage medium as recited in claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the contrastive loss by: determining a first perceptual distance between the sampled patch in the first feature vector of the encoded synthetic digital content item and the positive patch in the second feature vector of the encoded ground-truth digital content item; determining a second perceptual distance between the sampled patch in the first feature vector of the encoded synthetic digital content item and the negative patch in the second feature vector of the encoded ground-truth digital content item; and determining the contrastive loss based on the first perceptual distance and the second perceptual distance.
5. The non-transitory computer readable storage medium as recited in claim 4, further comprising instructions that, when executed by the at least one processor, cause the computing device to update the parameters of the generative neural network and the parameters of the encoder neural network based on the contrastive loss by updating the parameters of the generative neural network and the parameters of the encoder neural network based on the contrastive loss to decrease the first perceptual distance and increase the second perceptual distance.
6. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to: receive a new guide input corresponding to a new ground-truth digital content item; generate, based on the new guide input, a new synthetic digital content item utilizing the generative neural network with the parameters updated based on the contrastive loss; and provide the new ground-truth digital content item for display on a display device.
7. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to: generate a first feature representation from the synthetic digital content item and a second feature representation from the ground-truth digital content item; convert, utilizing a multilayer perceptron layer of the encoder neural network, the first feature representation to the first feature vector in the second dimension space and the second feature representation to the second feature vector in the second dimension space; and select the sampled patch in the first feature vector of the encoded synthetic digital content item and the plurality of sampled patches in the second feature vector of the encoded ground-truth digital content item in the second dimension space.
8. The non-transitory computer readable storage medium as recited in claim 1, wherein the synthetic digital content item comprises a synthetic digital image or a synthetic digital audio track.
9. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the guide input from the ground-truth digital content item.
10. A system comprising: a memory device comprising: a ground-truth digital content item; a guide input corresponding to the ground-truth digital content item; and a generative neural network and an encoder neural network; and one or more servers configured to cause the system to: generate, utilizing the generative neural network, a synthetic digital content item based within a first dimension space on a guide input; generate, utilizing the encoder neural network, an encoded synthetic digital content item within a second dimension space from the synthetic digital content item and an encoded ground-truth digital content item within the second dimension space from the ground-truth digital content item, the second dimension space corresponding to encodings in a lower dimension space than the first dimension space; sample a synthetic patch from a first feature vector corresponding to the encoded synthetic digital content item within the second dimension space based on a corresponding spatial location in the synthetic digital content item; sample a positive
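The multilayer perceptron projection of claim 7 and the multilayer patch-wise loss of claim 3 might be combined as in the sketch below. The claims do not specify the perceptron's shape, the per-layer loss formula, or how layers are aggregated, so the two-layer ReLU projection, the InfoNCE-style loss with cosine similarity, the summation over layers, and every name and weight shown are assumptions made for illustration. For brevity, every location serves as a query rather than a sampled subset.

```python
import math


def mlp_project(feature, w1, w2):
    """Two-layer perceptron (linear, ReLU, linear) mapping to fewer dimensions."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, feature))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def info_nce(query, positive, negatives, temperature=0.07):
    """Assumed per-patch loss: cross-entropy with the positive as the target."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    peak = max(logits)
    return peak + math.log(sum(math.exp(v - peak) for v in logits)) - logits[0]


def multilayer_patch_loss(synthetic_layers, truth_layers, mlps, temperature=0.07):
    """Sum over layers of the mean per-location contrastive loss."""
    total = 0.0
    for syn_feats, truth_feats, (w1, w2) in zip(synthetic_layers, truth_layers, mlps):
        # Project both items' features with the same per-layer perceptron.
        syn = [mlp_project(f, w1, w2) for f in syn_feats]
        truth = [mlp_project(f, w1, w2) for f in truth_feats]
        layer_loss = 0.0
        for i, query in enumerate(syn):
            # Positive: same location. Negatives: all other locations.
            negatives = [t for j, t in enumerate(truth) if j != i]
            layer_loss += info_nce(query, truth[i], negatives, temperature)
        total += layer_loss / len(syn)
    return total


# Hypothetical weights: identity first layer, second layer drops the last dimension.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
drop_last = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mlps = [(identity, drop_last), (identity, drop_last)]

# Hypothetical intermediate features: two layers, three locations each.
truth_layer = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
truth_layers = [truth_layer, truth_layer]
synthetic_layers = [truth_layer, truth_layer]

print(multilayer_patch_loss(synthetic_layers, truth_layers, mlps))
```

Sharing one perceptron per layer between the synthetic and ground-truth features keeps both projections in the same lower-dimensional space, which is what makes the patch comparison meaningful in this sketch.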