Related publication: US-9508197-B2, "Generating an avatar from real time image data" (Nov 29, 2016, US)
US12530847B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12530847-B2 |
| Application number | US-202318100546-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 23, 2023 |
| Priority date | Jan 23, 2023 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Abstract
Aspects of the present disclosure involve a system for generating images that depict a real-world object in a scene. The system receives image content comprising a depth map of a scene and a three-dimensional (3D) model of a real-world object. The system receives a textual description for a background. The system applies the image content and the textual description to a machine learning model to generate a scene image that depicts the real-world object on the background corresponding to the textual description.
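The abstract describes a single inference step: three inputs (a depth map of the scene, a 3D model of the real-world object, and a textual background description) are applied together to one machine learning model. The Python sketch below shows only that data flow. The names `ImageContent`, `generate_scene_image`, `placeholder_model`, and the file name `shoe.glb` are hypothetical. The grid-based depth map and image formats are assumptions, since the patent does not specify storage formats. The "model" is a placeholder callable that stands in for the generative network and is not the patented method.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical formats: the patent does not specify how depth maps or images
# are stored. Here a depth map is a row-major grid of floats and an image is a
# row-major grid of (R, G, B) tuples.
DepthMap = List[List[float]]
Image = List[List[Tuple[int, int, int]]]


@dataclass
class ImageContent:
    depth_map: DepthMap   # depth of each pixel in the scene
    object_model: str     # hypothetical handle (e.g. a file path) to the 3D model


# Any callable with this signature stands in for the machine learning model.
SceneModel = Callable[[ImageContent, str], Image]


def generate_scene_image(content: ImageContent, background_text: str,
                         model: SceneModel) -> Image:
    """Check the inputs, then apply image content and text to the model."""
    grid = content.depth_map
    if not grid or not grid[0] or any(len(row) != len(grid[0]) for row in grid):
        raise ValueError("depth map must be a non-empty rectangular grid")
    if not background_text.strip():
        raise ValueError("background description must not be empty")
    return model(content, background_text)


def placeholder_model(content: ImageContent, background_text: str) -> Image:
    """Placeholder only: shades pixels by depth and ignores the text.

    It exists so the pipeline runs end to end; a real system would call a
    generative network conditioned on all three inputs.
    """
    max_depth = max(max(row) for row in content.depth_map) or 1.0
    return [[(int(255 * d / max_depth),) * 3 for d in row]
            for row in content.depth_map]


if __name__ == "__main__":
    content = ImageContent(depth_map=[[1.0, 2.0], [2.0, 4.0]],
                           object_model="shoe.glb")
    image = generate_scene_image(content, "a sunlit marble studio",
                                 placeholder_model)
    print(len(image), "x", len(image[0]), "pixels")
```

Passing the model in as a callable keeps the input handling separate from whatever generator a real system would use.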
Claims
What is claimed is:

1. A method comprising: receiving, by one or more processors of a device, image content comprising a depth map of a scene and a three-dimensional (3D) model of a real-world object, the 3D model of the real-world object representing a viewpoint of the real-world object from 360 degrees including all textures of the real-world object from each angle, the depth map comprising a position of a pedestal; receiving input that selects the depth map for the scene from a list of predefined generic depth maps of scenes, each predefined generic depth map of the list of generic depth maps of scenes representing a different depth map that relates a position of a target 3D object model to a respective background and surface on which the target 3D object model is mounted; receiving a textual description for a background; and applying the image content and the textual description to a machine learning model to generate a simulated scene image that depicts the real-world object being placed on the pedestal on the background corresponding to the textual description, the simulated scene image being generated to depict the real-world object on top of the pedestal at the position of the pedestal in the depth map.
2. The method of claim 1, wherein the machine learning model comprises a generative artificial neural network comprising a diffusion module that blurs some of the background of the simulated scene image.
3. The method of claim 1, wherein the textual description comprises a caption.
4. The method of claim 3, further comprising overlaying the caption on the simulated scene image.
5. The method of claim 4, wherein the textual description specifies a position for the caption, wherein the caption is overlaid at the specified position in the simulated scene image.
6. The method of claim 1, wherein the real-world object comprises a shoe that is depicted as being placed on top of the pedestal.
7. The method of claim 1, further comprising: receiving input from a user that moves an augmented reality (AR) object representing the real-world object within the simulated scene image using the 3D model of the real-world object.
8. The method of claim 1, further comprising: capturing one or more images of the real-world object by a camera; and generating the 3D model of the real-world object based on the one or more images.
9. The method of claim 1, each predefined generic depth map representing a custom position of a shoe in relation to the respective background and pedestal on which the shoe is placed.
10. The method of claim 1, wherein the machine learning model is trained to map a texture of the 3D model of the real-world object to a position of the real-world object in the simulated scene image.
11. The method of claim 1, further comprising: receiving input that selects a 3D camera position; and updating the simulated scene image to depict the real-world object on the background from the selected 3D camera position.
12. The method of claim 1, further comprising: receiving multiple 3D scene depth images; and generating, based on the multiple 3D scene depth images, a video comprising a plurality of scene images that depict the real-world object on the background from multiple 3D camera positions, the video being generated after initially generating the scene based on a single depth image comprising the depth map and the target 3D object model.
13. The method of claim 1, wherein the image content includes a position of the 3D model in the depth map of the scene.
14. The method of claim 1, further comprising training the machine learning model by performing training operations comprising: receiving training data comprising a plurality of training image content representing training 3D object models on depth maps and corresponding ground truth images depicting the training object models in a training scene; applying the machine learning model to a first training image content of the plurality of training image content to generate an estimated image depicting a 3D object of the first training image content on a training scene; computing a deviation between the estimated image and the ground truth image associated with the first training image content; and updating parameters of the machine learning model based on the computed deviation.
15. A system comprising: at least one processor of a device programmed to perform operations comprising: receiving image content comprising a depth map of a scene and a three-dimensional (3D) model of a real-world object, the 3D model of the real-world object representing a viewpoint of the real-world object from 360 degrees including all textures of the real-world object from each angle, the depth map comprising a position of a pedestal; receiving input that selects the depth map for the scene from a list of predefined generic depth maps of scenes, each predefined generic depth map of the list of generic depth maps of scenes representing a different depth map that relates a position of a target 3D object model to a respective background and surface on which the target 3D object model is mounted; receiving a textual description for a background; and applying the image content and the textual description to a machine learning model to generate a simulated scene image that depicts the real-world object being placed on the pedestal on the background corresponding to the textual description, the simulated scene image being generated to depict the real-world object on top of the pedestal at the position of the pedestal of the depth map.
16. The system of claim 15, wherein the machine learning model comprises a generative artificial neural network.
17. The system of claim 15, wherein the textual description comprises a caption.
18. The system of claim 17, the operations comprising overlaying the caption on the simulated scene image.
19. A non-transitory machine-readable storage medium that includes instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising: receiving image content comprising a depth map of a scene and a three-dimensional (3D) model of a real-world object, the 3D model of the real-world object representing a viewpoint of the real-world object from 360 degrees including all textures of the real-world object from each angle, the depth map comprising a position of a pedestal; receiving input that selects the depth map for the scene from a list of predefined generic depth maps of scenes, each predefined generic depth map of the list of generic depth maps of scenes representing a different depth map that relates a position of a target 3D object model to a respective background and surface on which the target 3D object model is mounted; receiving a textual description for a background; and applying the image content and the textual description to a machine learning model to generate a simulated scene image that depicts the real-world object being placed on the pedestal on the background corresponding to the textual description, the simulated scene image being generated to depict the real-world object on top of the pedestal at the position of the pedestal of the depth map.
20. The non-transitory machine-readable storage medium of claim 19, the operations comprising overlaying a caption on the simulated scene image.
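Claim 14 recites a conventional supervised training loop. The model is applied to training image content to produce an estimated image. A deviation from the ground-truth image is then computed, and the parameters are updated based on that deviation. The claim does not name a loss function, an optimizer, or an architecture. The sketch below therefore makes three assumptions. The deviation is mean squared error. The update is plain gradient descent. The generative network is replaced by a hypothetical two-parameter toy model, `ToySceneModel`, which maps each depth value to a grayscale intensity. With those stand-ins, the loop can be checked end to end in pure Python. The synthetic data is illustrative only.

```python
from typing import List, Sequence, Tuple

# Hypothetical grid type: a depth map or a grayscale image, row-major.
Grid = List[List[float]]


class ToySceneModel:
    """Hypothetical stand-in for the generative model.

    Predicts each pixel's intensity as w * depth + b. A real system would
    condition a generative network on the depth map, 3D model, and text.
    """

    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def __call__(self, depth_map: Grid) -> Grid:
        return [[self.w * d + self.b for d in row] for row in depth_map]


def deviation(estimate: Grid, truth: Grid) -> float:
    """Mean squared error between an estimated image and its ground truth."""
    sq = [(e - t) ** 2
          for e_row, t_row in zip(estimate, truth)
          for e, t in zip(e_row, t_row)]
    return sum(sq) / len(sq)


def train_step(model: ToySceneModel, depth_map: Grid, truth: Grid,
               lr: float) -> float:
    """Apply the model, compute the deviation, and update w and b."""
    estimate = model(depth_map)
    loss = deviation(estimate, truth)
    n = sum(len(row) for row in depth_map)
    grad_w = grad_b = 0.0
    # Gradient of the mean squared error with respect to w and b.
    for d_row, e_row, t_row in zip(depth_map, estimate, truth):
        for d, e, t in zip(d_row, e_row, t_row):
            grad_w += 2.0 * (e - t) * d / n
            grad_b += 2.0 * (e - t) / n
    model.w -= lr * grad_w
    model.b -= lr * grad_b
    return loss


def train(model: ToySceneModel, data: Sequence[Tuple[Grid, Grid]],
          epochs: int = 200, lr: float = 0.2) -> float:
    """Loop over (depth map, ground-truth image) pairs; return the last loss."""
    if not data:
        raise ValueError("training data must not be empty")
    loss = float("inf")
    for _ in range(epochs):
        for depth_map, truth in data:
            loss = train_step(model, depth_map, truth, lr)
    return loss


if __name__ == "__main__":
    # Synthetic ground truth generated from intensity = 2 * depth + 1.
    maps = [[[0.0, 0.5], [1.0, 0.25]], [[0.75, 0.1], [0.3, 0.9]]]
    data = [(m, [[2 * d + 1 for d in row] for row in m]) for m in maps]
    model = ToySceneModel()
    final_loss = train(model, data)
    print(f"w={model.w:.3f}, b={model.b:.3f}, loss={final_loss:.2e}")
```

The ground truth in this toy data is noise-free and generated from w = 2, b = 1. The loop should therefore recover parameters close to those values. A real implementation would compute gradients with an autodiff framework, not by hand.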
CPC classification titles:
- using two or more images, e.g. averaging or subtraction
- Advertisement creation
- Image combination
- Colour editing, changing, or manipulating; Use of colour codes
- Generative networks