Affordance-based reposing of an object in a scene

US12561956B2 · US · B2

Patent metadata
Publication number: US-12561956-B2
Application number: US-202218058528-A
Country: US
Kind code: B2
Filing date: Nov 23, 2022
Priority date: Nov 23, 2022
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for inserting an object into a background are described. Examples of the systems and methods include obtaining a background image including a region for inserting the object, and encoding the background image to obtain an encoded background. A modified image is then generated based on the encoded background using a diffusion model. The modified image depicts the object within the region.
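The data flow in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: `mask_region`, `encode_background`, `toy_predictor`, and `generate` are hypothetical names, the encoder and decoder are trivial stand-ins for learned networks, and the noise predictor is an oracle that pretends the clean result equals the conditioning. The update rule is the standard deterministic DDIM step (claim 10 names DDIM as one option for the diffusion model).

```python
import math
import random

def mask_region(image, mask):
    """Zero out pixels where mask is 1, marking the region for insertion."""
    return [[0.0 if m else p for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(image, mask)]

def encode_background(masked):
    """Stand-in encoder (assumption): flattens the masked image.
    A real system would use a learned condition encoder."""
    return [p for row in masked for p in row]

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    """One deterministic DDIM update (eta = 0), applied elementwise.
    alpha_t and alpha_prev are cumulative signal levels in (0, 1]."""
    out = []
    for x, e in zip(x_t, eps):
        x0 = (x - math.sqrt(1.0 - alpha_t) * e) / math.sqrt(alpha_t)
        out.append(math.sqrt(alpha_prev) * x0 + math.sqrt(1.0 - alpha_prev) * e)
    return out

def toy_predictor(features, alpha_t):
    """Hypothetical oracle noise predictor: treats the conditioning value
    as the clean signal. A trained network would go here."""
    return [(x - math.sqrt(alpha_t) * c) / math.sqrt(1.0 - alpha_t)
            for x, c in features]

def generate(background, mask, predict_noise, alphas, seed=0):
    """Encode the background, combine it with a noise map, denoise, decode."""
    rng = random.Random(seed)
    cond = encode_background(mask_region(background, mask))
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # noise map
    # alphas runs from noisy to clean, e.g. [0.1, 0.5, 1.0]
    for a_t, a_prev in zip(alphas, alphas[1:]):
        features = list(zip(x, cond))  # input features: noise paired with encoding
        eps = predict_noise(features, a_t)
        x = ddim_step(x, eps, a_t, a_prev)
    width = len(background[0])
    # Stand-in decoder (assumption): reshape the flat output back to rows.
    return [x[i:i + width] for i in range(0, len(x), width)]

if __name__ == "__main__":
    bg = [[0.2, 0.4], [0.6, 0.8]]
    mask = [[0, 1], [0, 0]]
    print(generate(bg, mask, toy_predictor, [0.1, 0.5, 1.0]))
```

Because the toy predictor is an oracle, the loop simply recovers the masked background. In a trained system the predictor would instead fill the masked region with the object, in a pose suited to the scene.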

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a background image including a region for inserting an object; encoding the background image to obtain an encoded background; obtaining an object image depicting the object; encoding the object image to obtain an encoded object; and generating a modified image by denoising input noise based on the encoded background using a diffusion model, wherein the diffusion model takes the encoded object as an input and the modified image depicts the object within the region, wherein the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose, and wherein the second pose is determined by the diffusion model based on the background image.

2. The method of claim 1, wherein the background image includes a part of the object, and the region of the modified image includes a part of the object image as a remaining part of the object.

3. The method of claim 1, further comprising: receiving a preliminary object image depicting the object; identifying the object in the preliminary object image; and cropping the preliminary object image to obtain the object image.

4. The method of claim 1, further comprising: combining the encoded background with a noise map to obtain input features; denoising the input features using the diffusion model to obtain output features; and decoding the output features to obtain the modified image.

5. The method of claim 4, further comprising: combining the input features with an encoded object determined from an object image of the object using an attention block of the diffusion model, wherein the output features are based at least in part on an output of the attention block.

6. The method of claim 1, further comprising: receiving a mask input from a user, wherein the region for inserting the object is based on the mask input.

7. An apparatus comprising: one or more processors; and one or more memories including instructions executable by the one or more processors to: obtain an object image depicting an object and a background image including a region for inserting the object; encode, using an image encoder, the object image to obtain an encoded object; encode, using a condition encoder, the background image to obtain an encoded background; and generate, using a diffusion model, a modified image by denoising input noise based on the encoded object and the encoded background, wherein the modified image depicts the object within the region, wherein the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose, and wherein the second pose is determined by the diffusion model based on the background image.

8. The apparatus of claim 7, wherein the instructions are further executable to: decode, using an image decoder, an output of the diffusion model to obtain the modified image.

9. The apparatus of claim 7, wherein: the diffusion model comprises a U-Net architecture configured to incorporate the encoded object and the encoded background as input.

10. The apparatus of claim 7, wherein: the diffusion model comprises a Denoising Diffusion Implicit Model (DDIM).

11. The apparatus of claim 7, wherein: the diffusion model comprises an attention block configured to combined the encoded object and the encoded background.

12. The apparatus of claim 7, wherein: the condition encoder comprises a multimodal text and image encoder for encoding the background image.

13. A non-transitory computer readable medium storing code for image processing, the code comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: obtaining a background image including a region for inserting an object; encoding the background image to obtain an encoded background; combining the encoded background with a noise map to obtain input features; generating a modified image by denoising input noise based on the encoded background using a diffusion model, wherein the modified image depicts the object within the region, wherein the modified image is generated by: denoising the input features using the diffusion model to obtain output features, decoding the output features to obtain the modified image, and combining the input features with an encoded object determined from an object image of the object using an attention block of the diffusion model, wherein the output features are based at least in part on an output of the attention block.

14. The non-transitory computer readable medium of claim 13, wherein: the object in the object image has a first pose, and the modified image includes the object with a second pose different from the first pose.

15. The non-transitory computer readable medium of claim 14, wherein: the second pose is determined by the diffusion model based on the background image.
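Claims 5, 11, and 13 recite an attention block that combines the input features with the encoded object. One common way to do this is scaled dot-product cross-attention, where the input features act as queries and the encoded object supplies the keys and values. The sketch below assumes that reading. The claims do not specify the attention variant, and `cross_attention` is a hypothetical name that omits the learned query, key, and value projections a real block would have.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention without learned projections (assumption).

    queries: one row per input-feature position (noise plus background encoding)
    keys, values: one row per encoded-object token
    Returns one output row per query, a weighted mix of the value rows.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

if __name__ == "__main__":
    input_features = [[10.0, 0.0], [0.0, 10.0]]   # hypothetical query rows
    encoded_object = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical object tokens
    print(cross_attention(input_features, encoded_object, encoded_object))
```

Each output row leans toward the object token that best matches the corresponding input feature, which is how object appearance can be routed to specific positions in the region being filled.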

Assignees

Adobe Inc

Inventors

Classifications

  • G06T7/70 (Primary)

    Determining position or orientation of objects or cameras (camera calibration G06T7/80)

  • involving foreground-background segmentation

  • Image fusion; Image merging

  • using two or more images, e.g. averaging or subtraction

  • G06V10/774 (Primary)

    Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12561956B2 cover?
Systems and methods for inserting an object into a background are described. Examples of the systems and methods include obtaining a background image including a region for inserting the object, and encoding the background image to obtain an encoded background. A modified image is then generated based on the encoded background using a diffusion model. The modified image depicts the object withi…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 24, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).