Action Recognition Using Implicit Pose Representations
US-2021073525-A1 · Mar 11, 2021 · US
US12165258B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12165258-B2 |
| Application number | US-202217691723-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 10, 2022 |
| Priority date | Mar 10, 2022 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Official abstract text for this publication.
One or more machine learning models (MLMs) may learn implicit 3D representations of geometry of an object and of dynamics of the object from performing an action on the object. Implicit neural representations may be used to reconstruct high-fidelity full geometry of the object and predict a flow-based dynamics field from one or more images, which may provide a partial view of the object. Correspondences between locations of an object may be learned based at least on distances between the locations on a surface corresponding to the object, such as geodesic distances. The distances may be incorporated into a contrastive learning loss function to train one or more MLMs to learn correspondences between locations of the object, such as a correspondence embedding field. The correspondences may be used to evaluate state changes when evaluating one or more actions that may be performed on the object.
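The abstract says surface distances "may be incorporated into a contrastive learning loss function" but gives no formula. The following is a minimal sketch of one way that could look, not the patented method: a margin-based pairwise contrastive loss in which geodesic distance decides whether two surface points form a positive or a negative pair. The function names, `pos_threshold`, and `margin` are assumptions introduced for illustration.

```python
import math


def embedding_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_contrastive_loss(embeddings, geodesic, pos_threshold=0.1, margin=1.0):
    """Mean pairwise contrastive loss over all point pairs.

    embeddings: one embedding vector per surface point.
    geodesic:   square matrix of geodesic (along-surface) distances.
    pos_threshold, margin: hypothetical hyperparameters, not from the source.
    """
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = embedding_distance(embeddings[i], embeddings[j])
            if geodesic[i][j] <= pos_threshold:
                # Points close along the surface: pull embeddings together.
                total += d ** 2
            else:
                # Points far along the surface: push embeddings at least
                # `margin` apart.
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

Using geodesic rather than Euclidean distance matters for deformable objects: two points on a folded cloth can be adjacent in space yet far apart along the surface, and this sketch would treat them as a negative pair.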
Opening claim text (preview).
What is claimed is:
1. A method comprising: generating, using sensor data capturing at least a portion of an object in an environment, first features defining a three-dimensional (3D) representation of geometry of the object; generating, using the sensor data, second features defining a 3D representation of an action on the object; applying the first features to one or more first machine learning models (MLMs) trained to generate an implicit 3D representation of geometry of the object; applying the first features and the second features to one or more second MLMs trained to generate an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object; and performing one or more control operations for a machine based at least on applying using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using the implicit 3D representation of geometry.
2. The method of claim 1, wherein the generating of the first features is from a partial view of the object in the environment and the one or more first MLMs are trained to predict at least a portion of the object using the first features.
3. The method of claim 1, wherein the performing of the action is on a first physical state of the object, the applying of the flow dynamics to the locations produces a second physical state of the object caused by the performing of the action on the first physical state of the object, and the performing of the one or more control operations is based at least on comparing the second physical state of the object to a goal physical state for the object.
4. The method of claim 1, wherein the action represents a physical manipulation of the object by an external force.
5. The method of claim 1, wherein the one or more first MLMs are trained to generate the implicit 3D representation of geometry of the object as occupancy predictions of the object for the locations in the environment using at least the first features.
6. The method of claim 1, further comprising applying the first features to one or more third MLMs trained to generate an implicit 3D representation of correspondences between positions on the object using the first features based at least on distances between the positions along a surface corresponding to the object, wherein the performing of the one or more control operations is further based on the implicit 3D representation of correspondences.
7. The method of claim 1, wherein the implicit 3D representation of geometry of the object is jointly learned with the implicit 3D representation of flow dynamics of the object.
8. The method of claim 1, wherein the generating of the second features includes: determining, using the sensor data, one or more locations of the object; and computing one or more distances between one or more grasp locations associated with the action and the one or more locations of the object, wherein the 3D representation of the action on the object is based at least on the one or more distances.
9. A system comprising: one or more processing units to execute operations comprising: determining, using one or more images that depict an object in an environment, features defining a 3D representation of an action on the object and defining a 3D representation of geometry of the object; generating, using one or more machine learning models (MLMs) that operate on the features, an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object; and performing one or more control operations of a machine based at least on applying using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using an implicit 3D representation of geometry of the object.
10. The system of claim 9, wherein the 3D representation of geometry of the object is a partial 3D shape of the object perceived from the one or more images, and the one or more MLMs are trained to predict the flow dynamics for the locations at least a portion of the object that is separate from the partial 3D shape.
11. The system of claim 9, wherein the flow dynamics include a forward flow dynamics field corresponding to the locations on the object.
12. The system of claim 9, wherein the determining of the features includes back projecting the one or more images using color information and depth information of the one or more images.
13. The system of claim 9, wherein the operations further include generating occupancy predictions of the object for locations in the environment using at least some of the features corresponding to the 3D representation of geometry of the object, wherein the applying of the flow dynamics is to the occupancy predictions.
14. The system of claim 9, wherein the one or more MLMs are one or more first MLMs and the operations further include applying at least some features of the features corresponding to the 3D representation of geometry of the object to one or more second MLMs trained to generate an implicit 3D representation of correspondences between locations of the object using the at least some features based at least on distances between the locations along a surface corresponding to the object.
15. The system of claim 9, wherein the implicit 3D representation of geometry of the object is jointly learned with the implicit 3D representation of flow dynamics of the object.
16. The system of claim 9, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.
17. At least one processor comprising: one or more circuits to: generate, using one or more machine learning models (MLMs) and a three-dimensional (3D) representation of an action on an object, an implicit 3D representation of flow dynamics that would result at locations of the object from performing the action on the object, and perform one or more control operations for a machine based at least on applying, using the implicit 3D representation of flow dynamics of the object, the flow dynamics to the locations using an implicit 3D representation of geometry of the object.
18. The at least one processor of claim 17, wherein the one or more MLMs are trained using training images and ground-truth data generated using a cloud-based platform that performs physical simulation and photorealistic rendering of one or more objects in one or more virtual environments.
19. The at least one processor of claim 17, wherein the one or more MLMs are jointly trained to decode the implicit 3D representation of flow dynamics, and to decode the implicit 3D representation of geometry of the object.
20. The at least one processor of claim 17, wherein the at least one processor is comprised in at least one of: a control system for an autonomous or semi-
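Several of the claimed steps are simple geometric operations around the learned models. The sketch below is illustrative only and is not taken from the patent text. It shows back projection of a depth image into 3D points (claim 12), per-point distances to a grasp location as action features (claim 8), and applying a predicted flow to obtain a second state that is compared to a goal state (claim 3). The pinhole intrinsics `fx`, `fy`, `cx`, `cy`, all function names, and the mean point-to-point cost are assumptions. The flow itself would come from the trained MLMs and is supplied here as plain input.

```python
import math


def back_project(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of depth values) into camera-frame
    3D points with a pinhole model. Pixels with non-positive depth are skipped."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def grasp_distance_features(points, grasp):
    """Distance from each object point to a grasp location, used here as a
    simple stand-in for the 3D representation of the action."""
    return [math.dist(p, grasp) for p in points]


def apply_flow(points, flow):
    """Move each point by its predicted flow vector to get the next state."""
    return [tuple(pi + fi for pi, fi in zip(p, f)) for p, f in zip(points, flow)]


def state_cost(points, goal):
    """Mean distance between corresponding points of two states.
    Assumes points and goal are listed in the same (corresponding) order."""
    return sum(math.dist(p, g) for p, g in zip(points, goal)) / len(points)
```

A planner could score candidate actions by predicting a flow for each, calling `apply_flow`, and keeping the action with the lowest `state_cost` against the goal. The one-to-one ordering assumed by `state_cost` is where learned correspondences would be needed in practice.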
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
Ensemble learning · CPC title
Shape modification · CPC title
of extracted features · CPC title
using neural networks · CPC title