Computer-based techniques for learning compositional representations of 3D point clouds

US11869149B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11869149-B2
Application number  US-202217744467-A
Country             US
Kind code           B2
Filing date         May 13, 2022
Priority date       May 13, 2022
Publication date    Jan 9, 2024
Grant date          Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, an unsupervised training application executes a neural network on a first point cloud to generate keys and values. The unsupervised training application generates output vectors based on a first query set, the keys, and the values and then computes spatial features based on the output vectors. The unsupervised training application computes quantized context features based on the output vectors and a first set of codes representing a first set of 3D geometry blocks. The unsupervised training application modifies the first neural network based on a likelihood of reconstructing the first point cloud, the quantized context features, and the spatial features to generate an updated neural network. A trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud to a representation of 3D geometry instances.
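
As an illustration of the step in which output vectors are generated from a query set, keys, and values, the sketch below computes each output vector as a weighted average of the values. The scaled dot-product scores, the softmax weighting, the function names, and the toy data are all assumptions made for illustration; the abstract does not specify how the output vectors are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Return one output vector per query as a weighted average of the values.

    Assumption: compatibility is a scaled dot product followed by a softmax.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Compatibility score between this query and every key.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Hypothetical toy data: 2 queries attending over 3 key/value pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs = attend(queries, keys, values)
```

Because the toy values are one-hot vectors, each output vector is simply the attention weights themselves, which makes it easy to see which keys a query matched most strongly.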

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a machine learning model to generate representations of point clouds, the method comprising: executing a first neural network on a first point cloud that represents a first three-dimensional (3D) scene to generate a key set and a value set; generating an output vector set based on a first query set, the key set, and the value set; computing a plurality of spatial features based on the output vector set; computing a plurality of quantized context features based on the output vector set and a first set of codes representing a first set of 3D geometry blocks; and modifying the first neural network based on a likelihood of reconstructing the first point cloud, the plurality of quantized context features, and the plurality of spatial features to generate an updated neural network, wherein a trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud representing a 3D scene to a representation of a plurality of 3D geometry instances.

2. The computer-implemented method of claim 1, wherein the output vector set is generated by executing a second neural network that includes a plurality of attention layers on the key set, the value set, and the first query set.

3. The computer-implemented method of claim 1, wherein generating the output vector set comprises: computing a first plurality of compatibility scores between the first query set and the key set; computing a first intermediate query set based on the value set and the first plurality of compatibility scores; and computing the output vector set based on the value set, the first intermediate query set, and the key set.

4. The computer-implemented method of claim 1, further comprising generating the second query set based on at least one of the output vector set, the first query set, the key set, or the value set.

5. The computer-implemented method of claim 1, wherein a first spatial feature included in the plurality of spatial features specifies at least one of a weight, a rotation matrix, a 3D scaling factor, or a translation.

6. The computer-implemented method of claim 1, wherein a first quantized context feature included in the plurality of quantized context features is computed by: computing a first context feature based on a first output vector included in the output vector set; computing a set of distances between the first context feature and the first set of codes; and setting the first quantized context feature equal to a first code included in the first set of codes based on the set of distances.

7. The computer-implemented method of claim 1, wherein modifying the first neural network comprises replacing a first value for a first weight included in the first neural network with a second value for the first weight that increases a likelihood associated with reconstructing the first point cloud.

8. The computer-implemented method of claim 7, further comprising executing one or more backpropagation operations on the first neural network to determine the second value for the first weight.

9. The computer-implemented method of claim 1, further comprising executing the trained machine learning model on a second point cloud to generate a first representation of a first plurality of 3D geometry instances that includes at least one instance of a first 3D geometry block included in the second set of 3D geometry blocks and at least one instance of a second 3D geometry block included in the second set of 3D geometry blocks.

10. The computer-implemented method of claim 9, wherein, for each 3D geometry instance included in the first plurality of 3D geometry instances, the first representation of the first plurality of 3D geometry instances includes a different quantized context feature and a different spatial feature.

11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to train a machine learning model to generate representations of point clouds by performing the steps of: executing a first neural network on a first point cloud that represents a first three-dimensional (3D) scene to generate a key set and a value set; generating an output vector set based on a first query set, the key set, and the value set; computing a plurality of spatial features based on the output vector set; computing a plurality of quantized context features based on the output vector set and a first set of codes representing a first set of 3D geometry blocks; and modifying the first neural network based on a likelihood of reconstructing the first point cloud, the plurality of quantized context features, and the plurality of spatial features to generate an updated neural network, wherein a trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud representing a 3D scene to a representation of a plurality of 3D geometry instances.

12. The one or more non-transitory computer readable media of claim 11, wherein a second neural network executes a plurality of weighted averaging operations on the value set based on the key set and the first query set to generate the output vector set.

13. The one or more non-transitory computer readable media of claim 11, wherein generating the output vector set comprises: computing a first plurality of compatibility scores between the first query set and the key set; computing a first intermediate query set based on the value set and the first plurality of compatibility scores; and computing the output vector set based on the value set, the first intermediate query set, and the key set.

14. The one or more non-transitory computer readable media of claim 11, further comprising generating the second query set based on at least one of the output vector set, the first query set, the key set, or the value set.

15. The one or more non-transitory computer readable media of claim 11, wherein each spatial feature included in the plurality of spatial features specifies spatial information associated with a different cluster of points within the first point cloud.

16. The one or more non-transitory computer readable media of claim 11, wherein a first quantized context feature included in the plurality of quantized context features is computed by: computing a first context feature based on a first output vector included in the output vector set; computing a set of distances between the first context feature and the first set of codes; and setting the first quantized context feature equal to a first code included in the first set of codes based on the set of distances.

17. The one or more non-transitory computer readable media of claim 16, further comprising: computing a second code based on the first context feature; and replacing the first code included in the first set of codes with the second code to generate the second set of codes.

18. The one or more non-transitory computer readable media of claim 11, wherein modifying the first neural network comprises replacing a first value for a first weight included in the first neural network with a second value for the first weight that increases a likelihood associated with reconstructing the first point cloud.

19. The one or more non-transitory computer readable media of claim 11, wherein a fir
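
The quantization recited in claims 6 and 16 (compute distances between a context feature and the set of codes, then set the quantized feature equal to one of the codes based on those distances) can be sketched as a nearest-code lookup. Choosing the code with the smallest squared Euclidean distance is an assumption here, since the claims say only that the selection is "based on the set of distances"; the function names and toy codebook are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(context_feature, codes):
    """Return (index, code) for the code closest to the context feature.

    Assumption: "based on the set of distances" means the minimum distance.
    """
    distances = [squared_distance(context_feature, code) for code in codes]
    index = min(range(len(codes)), key=distances.__getitem__)
    return index, codes[index]


# Hypothetical codebook of three 2-D codes, each standing for a 3D geometry block.
codes = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
index, quantized = quantize([0.9, 1.2], codes)
```

In this toy run the context feature `[0.9, 1.2]` lies closest to the second code, so `index` is 1 and `quantized` is `[1.0, 1.0]`.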

Assignees

Nvidia Corp

Inventors

Classifications

  • G06T17/10 (Primary)

    Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes · CPC title

  • Combinations of networks · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title

  • Rotation, translation, scaling · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11869149B2 cover?
In various embodiments, an unsupervised training application executes a neural network on a first point cloud to generate keys and values. The unsupervised training application generates output vectors based on a first query set, the keys, and the values and then computes spatial features based on the output vectors. The unsupervised training application computes quantized context features base…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T17/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).