Depicting Humans in Text-Defined Outfits
US-2021272341-A1 · Sep 2, 2021 · US
US11587271B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11587271-B2 |
| Application number | US-202017008964-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2020 |
| Priority date | Sep 1, 2020 |
| Publication date | Feb 21, 2023 |
| Grant date | Feb 21, 2023 |
Official abstract text for this publication.
First image data representing a first human wearing a first article of clothing may be received. The first image data, when rendered on a display, may include a first photometric artifact. A first generator network may be used to generate second image data from the first image data. The first photometric artifact may be removed from the second image data. A second generator network may be used to generate third image data from the second image data, the third image data representing the first human in a different pose relative to the first image data. Fourth image data representing the first article of clothing segmented from the first human may be generated and displayed on a display.
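The abstract describes a four-stage chain: artifact removal, re-posing, clothing segmentation, and display. The sketch below only illustrates that data flow. The names `run_pipeline`, `PipelineResult`, `clip_highlights`, `mirror`, and `keep_bright` are hypothetical, and the three toy functions are simple placeholders standing in for the generator networks and the segmentation step, not the patented models.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


@dataclass
class PipelineResult:
    second: Image  # first image with the photometric artifact removed
    third: Image   # the same human in a different pose
    fourth: Image  # the article of clothing segmented from the human


def run_pipeline(
    first: Image,
    artifact_remover: Callable[[Image], Image],  # stands in for the first generator network
    pose_generator: Callable[[Image], Image],    # stands in for the second generator network
    segmenter: Callable[[Image], Image],         # stands in for clothing segmentation
) -> PipelineResult:
    """Chain the stages in the order given in the abstract."""
    second = artifact_remover(first)
    third = pose_generator(second)
    fourth = segmenter(third)
    return PipelineResult(second=second, third=third, fourth=fourth)


# Toy placeholders (assumptions for illustration only).
def clip_highlights(img: Image) -> Image:
    """Pretend artifact removal: clamp over-exposed pixels to 1.0."""
    return [[min(p, 1.0) for p in row] for row in img]


def mirror(img: Image) -> Image:
    """Pretend re-posing: flip the image horizontally."""
    return [row[::-1] for row in img]


def keep_bright(img: Image) -> Image:
    """Pretend segmentation: keep pixels at or above 0.5, zero out the rest."""
    return [[p if p >= 0.5 else 0.0 for p in row] for row in img]


demo = run_pipeline([[0.2, 1.7], [0.9, 0.1]], clip_highlights, mirror, keep_bright)
print(demo.fourth)
```

In a real system each callable would wrap a trained model; the ordering of the stages is the only part taken from the abstract.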
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving first image data representing a first human wearing a first article of clothing, wherein the first image data, when rendered on a display, comprises a first photometric artifact; generating, using a first generator network, second image data from the first image data, wherein the first photometric artifact is removed from the second image data, when rendered on the display; generating, by a second generator network using the second image data, third image data representing the first human in a different pose relative to the first image data; generating, using the third image data, fourth image data representing the first article of clothing segmented from the first human; and generating code that, when executed by at least one processor, is effective to cause the fourth image data to be displayed on the display.
2. The method of claim 1, further comprising training the first generator network using adversarial loss and perceptual loss to generate image data constrained by input images.
3. The method of claim 1, further comprising: training the first generator network together with a discriminator network, wherein the discriminator network is trained using: a first pair of image data comprising a first natural image with a second photometric artifact and an edited version of the first natural image, edited to remove the second photometric artifact; and a second pair of image data comprising a second natural image with a third photometric artifact and a generated image generated by the first generator network to remove the third photometric artifact.
4. The method of claim 3, wherein the first pair of image data further comprises first ground truth data indicating that the edited version of the first natural image is classified as “real,” and the second pair of image data further comprises second ground truth data indicating that the generated image is classified as “fake.”
5. The method of claim 1, further comprising: generating, using a pre-trained network, a first activation map at a first layer of the pre-trained network, the first activation map being generated in response to inputting the first image data into the pre-trained network; generating, using the pre-trained network, a second activation map at the first layer of the pre-trained network, the second activation map generated in response to inputting the second image data into the pre-trained network; and comparing the first activation map and the second activation map to determine a perceptual loss associated with the second image data.
6. The method of claim 5, further comprising: back-propagating the perceptual loss; and updating at least one parameter of the first generator network to minimize the perceptual loss.
7. The method of claim 1, further comprising: determining, by a geometric transformation model, first data representing a first pose of the first human; inputting the first data into the second generator network; generating second pose data representing a symmetrically aligned pose of the first human; and generating the second image data with the first human in the different pose using the second pose data.
8. The method of claim 1, wherein the first article of clothing, as represented in the fourth image data, is at least one of a different color, a different texture, or a different style relative to the first article of clothing as represented in the first image data.
9. The method of claim 1, further comprising: determining a portion of the fourth image data corresponding to a portion of the first image data, wherein the portion of the first image data represents a portion of the first article of clothing that is occluded from view; and generating modified fourth image data by applying an image in-painting technique to the portion of the fourth image data.
10. A system comprising: at least one processor; and at least one non-transitory, computer-readable memory storing instructions that, when executed by the at least one processor, are effective to: receive first image data representing a first human wearing a first article of clothing, wherein the first image data, when rendered on a display, comprises a first photometric artifact; generate, using a first generator network, second image data from the first image data, wherein the first photometric artifact is removed from the second image data, when rendered on the display; generate, by a second generator network using the second image data, third image data representing the first human in a different pose relative to the first image data; generate, using the third image data, fourth image data representing the first article of clothing segmented from the first human; and generate code that, when executed by the at least one processor, is effective to cause the fourth image data to be displayed on the display.
11. The system of claim 10, wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to train the first generator network using adversarial loss and perceptual loss to generate image data constrained by input images.
12. The system of claim 10, wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to: train the first generator network together with a discriminator network, wherein the discriminator network is trained using: a first pair of image data comprising a first natural image with a second photometric artifact and an edited version of the first natural image, edited to remove the second photometric artifact; and a second pair of image data comprising a second natural image with a third photometric artifact and a generated image generated by the first generator network to remove the third photometric artifact.
13. The system of claim 12, wherein the first pair of image data further comprises first ground truth data indicating that the edited version of the first natural image is classified as “real,” and the second pair of image data further comprises second ground truth data indicating that the generated image is classified as “fake.”
14. The system of claim 10, wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to: generate, using a pre-trained network, a first activation map at a first layer of the pre-trained network, the first activation map being generated in response to inputting the first image data into the pre-trained network; generate, using the pre-trained network, a second activation map at the first layer of the pre-trained network, the second activation map generated in response to inputting the second image data into the pre-trained network; and compare the first activation map and the second activation map to determine a perceptual loss associated with the second image data.
15. The system of claim 14, wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to: back propagate the perceptual loss; and update at least one parameter of the first generator network to minimize the perceptual loss.
16. The system of claim 10, wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to:
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Adversarial learning · CPC title
Supervised learning · CPC title
Generative networks · CPC title
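Claims 5 and 14 describe a perceptual loss computed by comparing activation maps that a pre-trained network produces, at the same layer, for the first and second image data. The following is a minimal sketch of that comparison only. The claims do not name the comparison metric, so the mean squared difference used here is an assumption, and `edge_features` is a hypothetical fixed filter standing in for a layer of a pre-trained network. The back-propagation and parameter update of claims 6 and 15 are not shown.

```python
from typing import Callable, List

Signal = List[float]  # toy 1-D "image"; real inputs would be image tensors


def edge_features(signal: Signal) -> List[float]:
    """Hypothetical stand-in for one layer of a pre-trained network:
    a fixed filter returning differences between neighbouring samples."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def perceptual_loss(
    layer: Callable[[Signal], List[float]],
    first_image: Signal,
    second_image: Signal,
) -> float:
    """Compare the activation maps of two inputs at the same layer.

    Mean squared difference is an assumed metric, not one stated in the claims.
    """
    first_map = layer(first_image)    # activation map for the input image
    second_map = layer(second_image)  # activation map for the generated image
    if not first_map or len(first_map) != len(second_map):
        raise ValueError("activation maps must be non-empty and equally sized")
    squared = ((a - b) ** 2 for a, b in zip(first_map, second_map))
    return sum(squared) / len(first_map)


first_image = [0.0, 1.0, 1.0, 0.0]
second_image = [0.0, 1.0, 0.5, 0.0]
loss = perceptual_loss(edge_features, first_image, second_image)
print(loss)
```

In practice the layer would come from a trained image classifier and the loss would be minimized by an automatic-differentiation framework; identical inputs give a loss of zero under this metric.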