Action-conditional implicit dynamics of deformable objects

US12165258B2 · US · B2

Patent metadata
Publication number: US-12165258-B2
Application number: US-202217691723-A
Country: US
Kind code: B2
Filing date: Mar 10, 2022
Priority date: Mar 10, 2022
Publication date: Dec 10, 2024
Grant date: Dec 10, 2024

Abstract

Official abstract text for this publication.

One or more machine learning models (MLMs) may learn implicit 3D representations of geometry of an object and of dynamics of the object from performing an action on the object. Implicit neural representations may be used to reconstruct high-fidelity full geometry of the object and predict a flow-based dynamics field from one or more images, which may provide a partial view of the object. Correspondences between locations of an object may be learned based at least on distances between the locations on a surface corresponding to the object, such as geodesic distances. The distances may be incorporated into a contrastive learning loss function to train one or more MLMs to learn correspondences between locations of the object, such as a correspondence embedding field. The correspondences may be used to evaluate state changes when evaluating one or more actions that may be performed on the object.
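The correspondence idea in the abstract can be illustrated with a small sketch. The abstract does not give the loss formula, so what follows is an assumed margin-based contrastive form: pairs of points that are close along the surface (small geodesic distance) are pulled together in embedding space, and all other pairs are pushed at least a margin apart. The function name, `pos_radius`, and `margin` are hypothetical choices for illustration only.

```python
import numpy as np

def geodesic_contrastive_loss(emb, geo, pos_radius=0.1, margin=1.0):
    """Assumed margin-based contrastive loss (not the patent's exact formula).

    emb: (N, D) array of per-point correspondence embeddings.
    geo: (N, N) array of geodesic distances between the same points.
    """
    # Pairwise Euclidean distances between embeddings.
    diff = emb[:, None, :] - emb[None, :, :]
    d = np.linalg.norm(diff, axis=-1)

    # Use each unordered pair once (upper triangle, no diagonal).
    iu = np.triu_indices(len(emb), k=1)
    d, g = d[iu], geo[iu]

    # Pairs that are close on the surface count as positives.
    pos = g <= pos_radius
    pos_term = np.where(pos, d ** 2, 0.0)
    # Negatives are penalized only when closer than the margin.
    neg_term = np.where(~pos, np.maximum(0.0, margin - d) ** 2, 0.0)
    return float((pos_term + neg_term).mean())
```

Using geodesic rather than Euclidean distance matters for deformable objects: two points on a folded cloth can touch in space while being far apart along the surface, and a correspondence embedding should keep them distinct.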

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: generating, using sensor data capturing at least a portion of an object in an environment, first features defining a three-dimensional (3D) representation of geometry of the object; generating, using the sensor data, second features defining a 3D representation of an action on the object; applying the first features to one or more first machine learning models (MLMs) trained to generate an implicit 3D representation of geometry of the object; applying the first features and the second features to one or more second MLMs trained to generate an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object; and performing one or more control operations for a machine based at least on applying using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using the implicit 3D representation of geometry.
2. The method of claim 1, wherein the generating of the first features is from a partial view of the object in the environment and the one or more first MLMs are trained to predict at least a portion of the object using the first features.
3. The method of claim 1, wherein the performing of the action is on a first physical state of the object, the applying of the flow dynamics to the locations produces a second physical state of the object caused by the performing of the action on the first physical state of the object, and the performing of the one or more control operations is based at least on comparing the second physical state of the object to a goal physical state for the object.
4. The method of claim 1, wherein the action represents a physical manipulation of the object by an external force.
5. The method of claim 1, wherein the one or more first MLMs are trained to generate the implicit 3D representation of geometry of the object as occupancy predictions of the object for the locations in the environment using at least the first features.
6. The method of claim 1, further comprising applying the first features to one or more third MLMs trained to generate an implicit 3D representation of correspondences between positions on the object using the first features based at least on distances between the positions along a surface corresponding to the object, wherein the performing of the one or more control operations is further based on the implicit 3D representation of correspondences.
7. The method of claim 1, wherein the implicit 3D representation of geometry of the object is jointly learned with the implicit 3D representation of flow dynamics of the object.
8. The method of claim 1, wherein the generating of the second features includes: determining, using the sensor data, one or more locations of the object; and computing one or more distances between one or more grasp locations associated with the action and the one or more locations of the object, wherein the 3D representation of the action on the object is based at least on the one or more distances.
9. A system comprising: one or more processing units to execute operations comprising: determining, using one or more images that depict an object in an environment, features defining a 3D representation of an action on the object and defining a 3D representation of geometry of the object; generating, using one or more machine learning models (MLMs) that operate on the features, an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object; and performing one or more control operations of a machine based at least on applying using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using an implicit 3D representation of geometry of the object.
10. The system of claim 9, wherein the 3D representation of geometry of the object is a partial 3D shape of the object perceived from the one or more images, and the one or more MLMs are trained to predict the flow dynamics for the locations at least a portion of the object that is separate from the partial 3D shape.
11. The system of claim 9, wherein the flow dynamics include a forward flow dynamics field corresponding to the locations on the object.
12. The system of claim 9, wherein the determining of the features includes back projecting the one or more images using color information and depth information of the one or more images.
13. The system of claim 9, wherein the operations further include generating occupancy predictions of the object for locations in the environment using at least some of the features corresponding to the 3D representation of geometry of the object, wherein the applying of the flow dynamics is to the occupancy predictions.
14. The system of claim 9, wherein the one or more MLMs are one or more first MLMs and the operations further include applying at least some features of the features corresponding to the 3D representation of geometry of the object to one or more second MLMs trained to generate an implicit 3D representation of correspondences between locations of the object using the at least some features based at least on distances between the locations along a surface corresponding to the object.
15. The system of claim 9, wherein the implicit 3D representation of geometry of the object is jointly learned with the implicit 3D representation of flow dynamics of the object.
16. The system of claim 9, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
17. At least one processor comprising: one or more circuits to: generate, using one or more machine learning models (MLMs) and a three-dimensional (3D) representation of an action on an object, an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object, and perform one or more control operations for a machine based at least on applying, using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using an implicit 3D representation of geometry of the object.
18. The at least one processor of claim 17, wherein the one or more MLMs are trained using training images and ground-truth data generated using a cloud-based platform that performs physical simulation and photorealistic rendering of one or more objects in one or more virtual environments.
19. The at least one processor of claim 17, wherein the one or more MLMs are jointly trained to decode the implicit 3D representation of flow dynamics, and to decode the implicit 3D representation of geometry of the object.
20. The at least one processor of claim 17, wherein the at least one processor is comprised in at least one of: a control system for an autonomous or semi-
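The data flow described across the claims (back projecting a depth image, encoding an action by distances to a grasp location, applying predicted flow to occupied locations, and comparing the result to a goal state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the trained MLMs are replaced by plain arrays, the pinhole camera parameters (`fx`, `fy`, `cx`, `cy`), the occupancy `threshold`, and the use of a Chamfer distance for the goal comparison are all hypothetical choices, and color information is omitted.

```python
import numpy as np

def back_project(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth image to (H*W, 3) camera-frame points (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel row (v) and column (u) indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def grasp_distances(points, grasp):
    """Per-point distance to a grasp location: one possible action encoding."""
    return np.linalg.norm(points - grasp, axis=-1)

def next_state(points, occupancy, flow, threshold=0.5):
    """Keep locations predicted as occupied and displace them by their flow vectors."""
    keep = occupancy > threshold
    return points[keep] + flow[keep]

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (assumed comparison metric)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def pick_action(points, occupancy, flows_by_action, goal):
    """Score each candidate action by how close its predicted state is to the goal."""
    costs = {name: chamfer(next_state(points, occupancy, flow), goal)
             for name, flow in flows_by_action.items()}
    return min(costs, key=costs.get), costs
```

In a real system the `occupancy` and `flow` arrays would come from the trained geometry and dynamics models queried at the same 3D locations; here they are inputs so the sketch stays self-contained.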

Assignees

Nvidia Corp

Inventors

Classifications

  • Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title

  • Ensemble learning · CPC title

  • Shape modification · CPC title

  • of extracted features · CPC title

  • using neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12165258B2 cover?
One or more machine learning models (MLMs) may learn implicit 3D representations of geometry of an object and of dynamics of the object from performing an action on the object. Implicit neural representations may be used to reconstruct high-fidelity full geometry of the object and predict a flow-based dynamics field from one or more images, which may provide a partial view of the object. …
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T17/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 10, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).