Text to 3d via sparse multi-view generation and reconstruction
US-2025104349-A1 · Mar 27, 2025 · US
US12561956B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12561956-B2 |
| Application number | US-202218058528-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 23, 2022 |
| Priority date | Nov 23, 2022 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Official abstract text for this publication.
Systems and methods for inserting an object into a background are described. Examples of the systems and methods include obtaining a background image including a region for inserting the object, and encoding the background image to obtain an encoded background. A modified image is then generated based on the encoded background using a diffusion model. The modified image depicts the object within the region.
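The data flow in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the patented implementation: `mask_region`, `encode`, `decode`, `toy_denoiser`, and `insert_object` are hypothetical names, the encoder and decoder are plain pooling and upsampling rather than learned networks, and the "diffusion model" is a placeholder that only pulls the sample toward the encoded background. Concatenating the noise map with the encoded background is one assumed way to combine them.

```python
import numpy as np

def mask_region(background, mask):
    """Blank out the insertion region (mask == 1) of an H x W x C image."""
    return background * (1.0 - mask[..., None])

def encode(image, factor=4):
    """Hypothetical encoder: average-pool by `factor` (stand-in for a learned encoder)."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def decode(latent, factor=4):
    """Hypothetical decoder: nearest-neighbour upsampling back to image size."""
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)

def toy_denoiser(input_features):
    """Placeholder for a trained diffusion model.

    Splits the channels back into (noisy sample, encoded background) and
    "predicts" the noise as their difference. A real model would be a
    trained network that also synthesizes the object in the region.
    """
    x, cond = np.split(input_features, 2, axis=-1)
    return x - cond

def insert_object(background, mask, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    cond = encode(mask_region(background, mask))      # encoded background
    x = rng.standard_normal(cond.shape)               # noise map
    for t in range(steps, 0, -1):
        # Assumed combination: channel-wise concatenation of noise and condition.
        input_features = np.concatenate([x, cond], axis=-1)
        eps = toy_denoiser(input_features)
        x = x - eps / t                               # toy update, not a real sampler
    return decode(x)                                  # "modified image"

background = np.ones((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                                  # region for inserting the object
modified = insert_object(background, mask)
print(modified.shape)
```

Because the placeholder denoiser has no notion of an object, the output here is just a blurred copy of the masked background; the sketch only shows where each stage of the abstract sits in the loop.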
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining a background image including a region for inserting an object; encoding the background image to obtain an encoded background; obtaining an object image depicting the object; encoding the object image to obtain an encoded object; and generating a modified image by denoising input noise based on the encoded background using a diffusion model, wherein the diffusion model takes the encoded object as an input and the modified image depicts the object within the region, wherein the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose, and wherein the second pose is determined by the diffusion model based on the background image.
2. The method of claim 1, wherein the background image includes a part of the object, and the region of the modified image includes a part of the object image as a remaining part of the object.
3. The method of claim 1, further comprising: receiving a preliminary object image depicting the object; identifying the object in the preliminary object image; and cropping the preliminary object image to obtain the object image.
4. The method of claim 1, further comprising: combining the encoded background with a noise map to obtain input features; denoising the input features using the diffusion model to obtain output features; and decoding the output features to obtain the modified image.
5. The method of claim 4, further comprising: combining the input features with an encoded object determined from an object image of the object using an attention block of the diffusion model, wherein the output features are based at least in part on an output of the attention block.
6. The method of claim 1, further comprising: receiving a mask input from a user, wherein the region for inserting the object is based on the mask input.
7. An apparatus comprising: one or more processors; and one or more memories including instructions executable by the one or more processors to: obtain an object image depicting an object and a background image including a region for inserting the object; encode, using an image encoder, the object image to obtain an encoded object; encode, using a condition encoder, the background image to obtain an encoded background; and generate, using a diffusion model, a modified image by denoising input noise based on the encoded object and the encoded background, wherein the modified image depicts the object within the region, wherein the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose, and wherein the second pose is determined by the diffusion model based on the background image.
8. The apparatus of claim 7, wherein the instructions are further executable to: decode, using an image decoder, an output of the diffusion model to obtain the modified image.
9. The apparatus of claim 7, wherein: the diffusion model comprises a U-Net architecture configured to incorporate the encoded object and the encoded background as input.
10. The apparatus of claim 7, wherein: the diffusion model comprises a Denoising Diffusion Implicit Model (DDIM).
11. The apparatus of claim 7, wherein: the diffusion model comprises an attention block configured to combined the encoded object and the encoded background.
12. The apparatus of claim 7, wherein: the condition encoder comprises a multimodal text and image encoder for encoding the background image.
13. A non-transitory computer readable medium storing code for image processing, the code comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: obtaining a background image including a region for inserting an object; encoding the background image to obtain an encoded background; combining the encoded background with a noise map to obtain input features; generating a modified image by denoising input noise based on the encoded background using a diffusion model, wherein the modified image depicts the object within the region, wherein the modified image is generated by: denoising the input features using the diffusion model to obtain output features, decoding the output features to obtain the modified image, and combining the input features with an encoded object determined from an object image of the object using an attention block of the diffusion model, wherein the output features are based at least in part on an output of the attention block.
14. The non-transitory computer readable medium of claim 13, wherein: the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose.
15. The non-transitory computer readable medium of claim 14, wherein: the second pose is determined by the diffusion model based on the background image.
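Two building blocks named in the claims, the attention block that combines input features with the encoded object (claims 5, 11, and 13) and a DDIM sampler (claim 10), can be illustrated generically. The sketch below assumes standard scaled dot-product cross-attention and the deterministic DDIM update (no added noise); the claims do not specify either in this detail. All names, shapes, and the random projection matrices are hypothetical stand-ins for trained weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(features, obj_tokens, w_q, w_k, w_v):
    """Attend from image features to encoded-object tokens.

    features:   (N, d) input features (e.g. flattened spatial positions)
    obj_tokens: (M, d) encoded object
    Returns the attended features (N, d_head) and the attention weights (N, M).
    """
    q = features @ w_q        # queries come from the input features
    k = obj_tokens @ w_k      # keys and values come from the encoded object
    v = obj_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update given the predicted noise.

    alpha_bar_* are cumulative products of the noise schedule at the
    current and previous (less noisy) timestep.
    """
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8))       # 16 positions, 8 channels (assumed sizes)
obj_tokens = rng.standard_normal((4, 8))      # 4 object tokens (assumed size)
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
attended, weights = cross_attention(features, obj_tokens, w_q, w_k, w_v)

# Sanity check for the DDIM step: with the true noise, stepping to
# alpha_bar_prev = 1 recovers the clean sample.
x0 = rng.standard_normal((16, 8))
eps = rng.standard_normal((16, 8))
alpha_bar_t = 0.5
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
x_recovered = ddim_step(x_t, eps, alpha_bar_t, 1.0)
```

In a full model the attended features would feed later layers of the denoising network, so that the output features depend on the attention block's output as the claims recite; here the two functions are shown in isolation.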
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
involving foreground-background segmentation · CPC title
Image fusion; Image merging · CPC title
using two or more images, e.g. averaging or subtraction · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title