Scene-aware synthetic human motion generation using neural networks

US2025232506A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025232506-A1
Application number: US-202418415496-A
Country: US
Kind code: A1
Filing date: Jan 17, 2024
Priority date: Jan 17, 2024
Publication date: Jul 17, 2025
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A motion diffusion model may be pre-trained on motion data, and a scene-aware component (e.g., one or more layers of a neural network) may be connected and used to extract and inject a representation of scene information into the pre-trained motion diffusion model. For example, to predict orientations of joint waypoints along a path through a particular 3D scene, a scene-aware input channel that accepts a representation of the 3D structure of the scene may be added to a pre-trained motion diffusion model. To predict orientations of joint waypoints along a path that interacts with a 3D object in the 3D scene, a scene-aware input channel that accepts a representation of the 3D object and/or a surface thereof may be added to a pre-trained motion diffusion model. As such, the resulting scene-aware motion diffusion model(s) may be tuned on motion-scene data and used to generate human motion.
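The idea of attaching a scene-aware component to an already pre-trained motion diffusion model can be illustrated with a minimal sketch. Everything named below (`pretrained_denoiser`, `SceneAwareComponent`, `scene_aware_denoiser`) is hypothetical and not taken from the patent. The sketch also simplifies the injection to an additive residual on the denoiser output, and it assumes zero-initialized weights so that the combined model reproduces the pre-trained one before any tuning on motion-scene data; the abstract does not specify either choice.

```python
from typing import List

Vector = List[float]


def pretrained_denoiser(noisy_motion: Vector, t: int) -> Vector:
    """Stand-in for a pre-trained motion diffusion denoiser (hypothetical).

    Toy behavior: shrink the noisy sample toward zero, more strongly at
    higher noise levels t.
    """
    scale = 1.0 / (1.0 + 0.1 * t)
    return [scale * x for x in noisy_motion]


class SceneAwareComponent:
    """Hypothetical linear layer mapping scene features to a motion-sized residual."""

    def __init__(self, scene_dim: int, motion_dim: int) -> None:
        # Assumption: zero initialization, so the residual is zero until
        # the component is tuned on motion-scene data.
        self.weights = [[0.0] * scene_dim for _ in range(motion_dim)]

    def __call__(self, scene_features: Vector) -> Vector:
        return [
            sum(w * s for w, s in zip(row, scene_features))
            for row in self.weights
        ]


def scene_aware_denoiser(
    noisy_motion: Vector,
    t: int,
    scene_features: Vector,
    component: SceneAwareComponent,
) -> Vector:
    """Combine the pre-trained prediction with the scene-conditioned residual."""
    base = pretrained_denoiser(noisy_motion, t)
    residual = component(scene_features)
    return [b + r for b, r in zip(base, residual)]


noisy = [1.0, -2.0, 0.5]          # toy noisy motion sample
scene = [0.0, 0.3, 0.3, 1.2]      # toy scene features, e.g. flattened heights
component = SceneAwareComponent(scene_dim=4, motion_dim=3)

# Before tuning, the scene branch contributes nothing.
out_before = scene_aware_denoiser(noisy, 5, scene, component)

# Pretend tuning changed one weight; the scene now shifts the prediction.
component.weights[0][3] = 0.5
out_after = scene_aware_denoiser(noisy, 5, scene, component)
```

In this sketch only `component.weights` would be learned during tuning, which is one plausible reading of "adding" a component to a pre-trained model; a real system could equally inject scene features into intermediate layers.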

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: one or more processing units to generate, based at least on processing a representation of at least a portion of a three-dimensional (3D) scene using a diffusion model comprising a scene-aware component and a pre-trained motion diffusion model, a representation of scene-aware motion comprising one or more orientations of one or more joint waypoints along one or more paths of a character at least partially depicted in the 3D scene.

2. The processor of claim 1, wherein the one or more processing units are further to generate the diffusion model based at least on adding the scene-aware component to the pre-trained motion diffusion model and tuning the diffusion model using motion-scene training data.

3. The processor of claim 1, wherein the processing using the diffusion model comprises injecting a top-down height map of the 3D scene into the pre-trained motion diffusion model.

4. The processor of claim 1, wherein the processing using the diffusion model comprises injecting a 3D point cloud representing at least a portion of a 3D object in the 3D scene into the pre-trained motion diffusion model.

5. The processor of claim 1, wherein the processing using the diffusion model comprises injecting classification data representing one or more classified locations of one or more classified objects in the 3D scene into the pre-trained motion diffusion model.

6. The processor of claim 1, wherein the processing using the diffusion model comprises injecting classification data representing one or more classified locations of one or more other characters or one or more audio sources in the 3D scene into the pre-trained motion diffusion model.

7. The processor of claim 1, wherein the one or more processing units are further to update the diffusion model using training data generated based at least on retargeting motion data comprising one or more contact locations with a first object to one or more corresponding locations where a modeled body surface makes contact with a target object.

8. The processor of claim 1, wherein the processor is comprised in at least one of: a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

9. A system comprising one or more processing units to generate, based at least on processing a representation of at least a portion of a three-dimensional (3D) scene using a diffusion model, a representation of scene-aware motion corresponding to a character at least partially depicted in the 3D scene.

10. The system of claim 9, wherein the one or more processing units are further to generate the diffusion model based at least on adding a scene-aware component to a pre-trained motion diffusion model and tuning the diffusion model using motion-scene training data.

11. The system of claim 9, wherein the processing using the diffusion model comprises injecting a top-down height map of the 3D scene into a pre-trained motion diffusion model of the diffusion model.

12. The system of claim 9, wherein the processing using the diffusion model comprises injecting a 3D point cloud representing at least a portion of a 3D object in the 3D scene into a pre-trained motion diffusion model of the diffusion model.

13. The system of claim 9, wherein the processing using the diffusion model comprises injecting classification data representing one or more classified locations of one or more classified objects in the 3D scene into a pre-trained motion diffusion model of the diffusion model.

14. The system of claim 9, wherein the processing using the diffusion model comprises injecting classification data representing one or more classified locations of one or more other characters or one or more audio sources in the 3D scene into a pre-trained motion diffusion model of the diffusion model.

15. The system of claim 9, wherein the one or more processing units are further to update the diffusion model using training data generated based at least on retargeting motion data comprising one or more contact locations with a first object to one or more corresponding locations where a surface of a modeled body makes contact with a target object.

16. The system of claim 9, wherein the system is comprised in at least one of: a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for generating synthetic data; a system for generating synthetic data using AI; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

17. A method comprising: generating, based at least on injecting a representation of at least a portion of a three-dimensional (3D) scene into a pre-trained diffusion model, a representation of one or more orientations of one or more waypoints along one or more paths of a character in the 3D scene.

18. The method of claim 17, further comprising generating a diffusion model based at least on adding a scene-aware component to the pre-trained diffusion model and tuning the diffusion model using motion-scene training data.

19. The method of claim 17, further comprising updating a diffusion model comprising the pre-trained diffusion model using training data generated based at least on retargeting motion data comprising one or more contact locations with a first object to one or more corresponding locations where a modeled body surface makes contact with a target object.

20. The method of claim 17, wherein the method is performed by at least one of: a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for generating synthetic data;
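Several of the claims refer to injecting a top-down height map of the 3D scene. The claim text does not say how such a map is built; the following is a minimal sketch of one common approach, assuming the scene is available as a set of 3D points with z pointing up and that each grid cell stores the maximum height observed in it. The function name `top_down_height_map` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), with z assumed to point up


def top_down_height_map(
    points: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    resolution: int,
    floor: float = 0.0,
) -> List[List[float]]:
    """Rasterize scene points into a resolution x resolution grid of heights.

    Each cell holds the maximum z of the points falling inside it, or
    `floor` when no point (or only lower points) landed in the cell.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    cell_w = (x_max - x_min) / resolution
    cell_h = (y_max - y_min) / resolution
    grid = [[floor] * resolution for _ in range(resolution)]

    for x, y, z in points:
        # Skip points outside the mapped region.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x_min) / cell_w), resolution - 1)
        row = min(int((y - y_min) / cell_h), resolution - 1)
        grid[row][col] = max(grid[row][col], z)

    return grid


scene_points = [
    (0.5, 0.5, 1.0),  # e.g. a table top
    (0.6, 0.4, 2.0),  # a taller object in the same cell
    (1.5, 0.5, 0.4),  # a low step
    (5.0, 5.0, 9.0),  # outside the mapped region, ignored
]
height_map = top_down_height_map(scene_points, (0.0, 2.0), (0.0, 2.0), resolution=2)
```

A grid like `height_map` could then be flattened or encoded by a small network before being passed to the scene-aware component; that encoding step is not shown here.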

Assignees

Nvidia Corp

Inventors

Classifications

  • Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • using neural networks · CPC title

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025232506A1 cover?
A motion diffusion model may be pre-trained on motion data, and a scene-aware component (e.g., one or more layers of a neural network) may be connected and used to extract and inject a representation of scene information into the pre-trained motion diffusion model. For example, to predict orientations of joint waypoints along a path through a particular 3D scene, a scene-aware input channel tha…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06V40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 17, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).