Exemplar-based object appearance transfer driven by correspondence
US-2023351566-A1 · Nov 2, 2023 · US
US12555288B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12555288-B2 |
| Application number | US-202318459526-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2023 |
| Priority date | Sep 1, 2023 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
A method, apparatus, and non-transitory computer readable medium for image generation are described. Embodiments of the present disclosure obtain a content input and a style input via a user interface or from a database. The content input includes a target spatial layout and the style input includes a target style. A content encoder of an image processing apparatus encodes the content input to obtain a spatial layout mask representing the target spatial layout. A style encoder of the image processing apparatus encodes the style input to obtain a style embedding representing the target style. An image generation model of the image processing apparatus generates an image based on the spatial layout mask and the style embedding, where the image includes the target spatial layout and the target style.
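As an illustration of how a per-location layout mask and a per-channel style embedding could both condition the same feature map, the sketch below applies a spatial-wise and then a channel-wise operation. The claims name these two kinds of operation but do not define them; element-wise multiplication, the `[channel][row][col]` layout, and the function names `spatial_wise` and `channel_wise` are assumptions made here for illustration, not the patent's implementation.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def spatial_wise(features: FeatureMap, mask: Sequence[Sequence[float]]) -> FeatureMap:
    """Scale every channel at location (row, col) by mask[row][col].

    Assumption: the spatial-wise operation is modelled as multiplication.
    """
    return [
        [[value * mask[r][c] for c, value in enumerate(row)]
         for r, row in enumerate(channel)]
        for channel in features
    ]


def channel_wise(features: FeatureMap, style: Sequence[float]) -> FeatureMap:
    """Scale every location in channel k by style[k].

    Assumption: the channel-wise operation is modelled as multiplication.
    """
    return [
        [[value * style[k] for value in row] for row in channel]
        for k, channel in enumerate(features)
    ]


# Toy inputs: 2 channels over a 2x2 grid of (hypothetical) noisy features.
features = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
layout_mask = [[1.0, 0.0], [0.5, 1.0]]   # one value per spatial location
style_embedding = (2.0, 3.0)             # one value per channel

conditioned = channel_wise(spatial_wise(features, layout_mask), style_embedding)
```

The mask varies across locations and is shared by all channels, while the embedding varies across channels and is shared by all locations, which matches the distinction the claims draw between a mask of per-location values and an embedding that is a tuple of values.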
Opening claim text (preview).
What is claimed is:

1. A method comprising: obtaining a content input and a style input, wherein the content input comprises a target spatial layout and the style input comprises a target style; encoding, by a content encoder, the content input to obtain a spatial layout mask representing the target spatial layout; encoding, by a style encoder, the style input to obtain a style embedding representing the target style; and generating, by an image generation model, an image by denoising noisy features based on the spatial layout mask and the style embedding, wherein the image includes the target spatial layout and the target style.
2. The method of claim 1, wherein: the content input comprises a content image and the style input comprises a style image.
3. The method of claim 1, further comprising: performing a spatial-wise operation based on the spatial layout mask, wherein the image is generated based on the spatial-wise operation.
4. The method of claim 1, further comprising: performing a channel-wise operation based on the style embedding, wherein the image is generated based on the channel-wise operation.
5. The method of claim 1, further comprising: computing a content weight based on a diffusion timestep, wherein the image is generated based on the spatial layout mask according to the content weight.
6. The method of claim 1, further comprising: computing a style weight based on a diffusion timestep, wherein the image is generated based on the style embedding according to the style weight.
7. The method of claim 1, further comprising: generating a noise vector, wherein the image is generated based on the noise vector using a reverse diffusion process.
8. The method of claim 1, wherein: the style embedding includes global semantic information representing the target style.
9. The method of claim 1, wherein: the spatial layout mask comprises a plurality of values corresponding to a plurality of locations of the content input, respectively, and wherein the style embedding comprises a tuple of values that together represent the target style.
10. A method comprising: initializing a content encoder, a style encoder, and an image generation model; receiving training data including an image comprising spatial content and a style attribute; computing an objective function based on the spatial content and the style attribute; and jointly training the content encoder, the style encoder, and the image generation model using an end-to-end process based on the objective function.
11. The method of claim 10, wherein: the content encoder is trained to generate a spatial layout mask representing a target spatial layout.
12. The method of claim 10, wherein: the style encoder is trained to generate a style embedding representing a target style.
13. The method of claim 10, wherein: the image generation model is trained to generate a predicted image including a target spatial layout and a target style based on an output of the content encoder and an output of the style encoder.
14. The method of claim 10, further comprising: generating a latent code based on the image using an image encoder; generating a noisy latent code based on the latent code using a forward diffusion process; and generating a predicted image using the image generation model, wherein the objective function is computed based on the predicted image.
15. The method of claim 14, further comprising: generating a predicted spatial layout mask using the content encoder; and generating a predicted style embedding using the style encoder, wherein the predicted image is generated based on the predicted spatial layout mask and the predicted style embedding.
16. An apparatus comprising: at least one processor; at least one memory including instructions executable by the at least one processor; a content encoder comprising parameters stored in the at least one memory and trained to encode a content input to obtain a spatial layout mask representing a target spatial layout; a style encoder comprising parameters stored in the at least one memory and trained to encode a style input to obtain a style embedding representing a target style; and an image generation model comprising parameters stored in the at least one memory and trained to generate an image by denoising noisy features based on the spatial layout mask and the style embedding, wherein the image includes the target spatial layout and the target style.
17. The apparatus of claim 16, wherein: the content encoder and the style encoder each comprise a residual neural network.
18. The apparatus of claim 16, wherein: the image generation model comprises a denoising unit.
19. The apparatus of claim 16, further comprising: an image encoder configured to generate a latent code based on the image.
20. The apparatus of claim 16, further comprising: a timestep scheduling component configured to compute a content weight based on a diffusion timestep, wherein the image is generated based on the spatial layout mask according to the content weight, and to compute a style weight based on the diffusion timestep, wherein the image is generated based on the style embedding according to the style weight.
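
Claims 5, 6, and 20 recite a content weight and a style weight that are each computed from the diffusion timestep, without stating the schedule. The sketch below shows one possible schedule under stated assumptions: a linear ramp, a content weight that is largest at the noisiest timestep, and a style weight that is its complement. The function name `timestep_weights`, the linear form, and the direction of the ramp are all hypothetical choices for illustration and are not taken from the patent.

```python
from typing import Tuple


def timestep_weights(t: int, num_steps: int) -> Tuple[float, float]:
    """Return (content_weight, style_weight) for diffusion timestep t.

    Assumed convention: t = num_steps - 1 is the noisiest step and t = 0
    is the final, nearly clean step.
    """
    if not 0 <= t < num_steps:
        raise ValueError("t must satisfy 0 <= t < num_steps")
    # Fraction of the reverse process still remaining: 1.0 at the start, 0.0 at the end.
    frac = t / (num_steps - 1) if num_steps > 1 else 0.0
    content_weight = frac          # assumption: layout matters most early on
    style_weight = 1.0 - frac      # assumption: style matters most late
    return content_weight, style_weight


# Weights seen by a 1000-step reverse diffusion process, noisiest step first.
schedule = [timestep_weights(t, 1000) for t in reversed(range(1000))]
```

A timestep scheduling component of the kind recited in claim 20 could expose a function like this and hand the two weights to the generator at each denoising step; any monotonic schedule, or a learned one, would fit the claim language equally well.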