Reconstructing three-dimensional models of objects from real images based on depth information
US-2023147722-A1 · May 11, 2023 · US
US11869149B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11869149-B2 |
| Application number | US-202217744467-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 13, 2022 |
| Priority date | May 13, 2022 |
| Publication date | Jan 9, 2024 |
| Grant date | Jan 9, 2024 |
Abstract
In various embodiments, an unsupervised training application executes a neural network on a first point cloud to generate keys and values. The unsupervised training application generates output vectors based on a first query set, the keys, and the values and then computes spatial features based on the output vectors. The unsupervised training application computes quantized context features based on the output vectors and a first set of codes representing a first set of 3D geometry blocks. The unsupervised training application modifies the first neural network based on a likelihood of reconstructing the first point cloud, the quantized context features, and the spatial features to generate an updated neural network. A trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud to a representation of 3D geometry instances.
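The abstract describes a pipeline in which a network turns a point cloud into keys and values, a query set attends over them to produce output vectors, and the output vectors are mapped to spatial features. The NumPy sketch below illustrates that forward pass under stated assumptions. It uses a single per-point linear layer as the encoder, standard scaled dot-product cross-attention, and linear heads for a translation, a per-axis scale, and a weight. The names `encode_points`, `cross_attention`, and `spatial_heads`, and all sizes, are hypothetical and not taken from the patent. Rotation, quantization, and training are left out here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode_points(points, w_k, w_v):
    """Hypothetical stand-in for the first neural network: per-point linear
    maps that turn an (N, 3) point cloud into N keys and N values."""
    keys = points @ w_k                # (N, d)
    values = np.tanh(points @ w_v)     # (N, d), bounded in [-1, 1]
    return keys, values


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each of the M queries returns a
    softmax-weighted average of the N values, giving an (M, d) output set."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (M, N)
    return softmax(scores, axis=-1) @ values


def spatial_heads(outputs, w_t, w_s, w_w):
    """Hypothetical linear heads: a translation (3,), a positive per-axis
    scale (3,), and a weight in [0, 1] for every output vector."""
    translation = outputs @ w_t
    scale = np.exp(outputs @ w_s)
    weight = 1.0 / (1.0 + np.exp(-(outputs @ w_w)))
    return translation, scale, weight[:, 0]


rng = np.random.default_rng(0)
d, n_points, n_queries = 16, 512, 8
points = rng.normal(size=(n_points, 3))
w_k, w_v = rng.normal(size=(3, d)), rng.normal(size=(3, d))
queries = rng.normal(size=(n_queries, d))  # stand-in for the learned query set

keys, values = encode_points(points, w_k, w_v)
outputs = cross_attention(queries, keys, values)
translation, scale, weight = spatial_heads(
    outputs,
    rng.normal(size=(d, 3)),
    0.1 * rng.normal(size=(d, 3)),
    rng.normal(size=(d, 1)),
)
print(outputs.shape, translation.shape, scale.shape, weight.shape)
# (8, 16) (8, 3) (8, 3) (8,)
```

With a fixed number of queries, the sketch yields one output vector per candidate geometry instance however many points the cloud contains. The patent text shown here does not specify the actual layer configuration.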
Claims (preview)
What is claimed is:

1. A computer-implemented method for training a machine learning model to generate representations of point clouds, the method comprising: executing a first neural network on a first point cloud that represents a first three-dimensional (3D) scene to generate a key set and a value set; generating an output vector set based on a first query set, the key set, and the value set; computing a plurality of spatial features based on the output vector set; computing a plurality of quantized context features based on the output vector set and a first set of codes representing a first set of 3D geometry blocks; and modifying the first neural network based on a likelihood of reconstructing the first point cloud, the plurality of quantized context features, and the plurality of spatial features to generate an updated neural network, wherein a trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud representing a 3D scene to a representation of a plurality of 3D geometry instances.

2. The computer-implemented method of claim 1, wherein the output vector set is generated by executing a second neural network that includes a plurality of attention layers on the key set, the value set, and the first query set.

3. The computer-implemented method of claim 1, wherein generating the output vector set comprises: computing a first plurality of compatibility scores between the first query set and the key set; computing a first intermediate query set based on the value set and the first plurality of compatibility scores; and computing the output vector set based on the value set, the first intermediate query set, and the key set.

4. The computer-implemented method of claim 1, further comprising generating the second query set based on at least one of the output vector set, the first query set, the key set, or the value set.

5. The computer-implemented method of claim 1, wherein a first spatial feature included in the plurality of spatial features specifies at least one of a weight, a rotation matrix, a 3D scaling factor, or a translation.

6. The computer-implemented method of claim 1, wherein a first quantized context feature included in the plurality of quantized context features is computed by: computing a first context feature based on a first output vector included in the output vector set; computing a set of distances between the first context feature and the first set of codes; and setting the first quantized context feature equal to a first code included in the first set of codes based on the set of distances.

7. The computer-implemented method of claim 1, wherein modifying the first neural network comprises replacing a first value for a first weight included in the first neural network with a second value for the first weight that increases a likelihood associated with reconstructing the first point cloud.

8. The computer-implemented method of claim 7, further comprising executing one or more backpropagation operations on the first neural network to determine the second value for the first weight.

9. The computer-implemented method of claim 1, further comprising executing the trained machine learning model on a second point cloud to generate a first representation of a first plurality of 3D geometry instances that includes at least one instance of a first 3D geometry block included in the second set of 3D geometry blocks and at least one instance of a second 3D geometry block included in the second set of 3D geometry blocks.

10. The computer-implemented method of claim 9, wherein, for each 3D geometry instance included in the first plurality of 3D geometry instances, the first representation of the first plurality of 3D geometry instances includes a different quantized context feature and a different spatial feature.

11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to train a machine learning model to generate representations of point clouds by performing the steps of: executing a first neural network on a first point cloud that represents a first three-dimensional (3D) scene to generate a key set and a value set; generating an output vector set based on a first query set, the key set, and the value set; computing a plurality of spatial features based on the output vector set; computing a plurality of quantized context features based on the output vector set and a first set of codes representing a first set of 3D geometry blocks; and modifying the first neural network based on a likelihood of reconstructing the first point cloud, the plurality of quantized context features, and the plurality of spatial features to generate an updated neural network, wherein a trained machine learning model includes the updated neural network, a second query set, and a second set of codes representing a second set of 3D geometry blocks and maps a point cloud representing a 3D scene to a representation of a plurality of 3D geometry instances.

12. The one or more non-transitory computer readable media of claim 11, wherein a second neural network executes a plurality of weighted averaging operations on the value set based on the key set and the first query set to generate the output vector set.

13. The one or more non-transitory computer readable media of claim 11, wherein generating the output vector set comprises: computing a first plurality of compatibility scores between the first query set and the key set; computing a first intermediate query set based on the value set and the first plurality of compatibility scores; and computing the output vector set based on the value set, the first intermediate query set, and the key set.

14. The one or more non-transitory computer readable media of claim 11, further comprising generating the second query set based on at least one of the output vector set, the first query set, the key set, or the value set.

15. The one or more non-transitory computer readable media of claim 11, wherein each spatial feature included in the plurality of spatial features specifies spatial information associated with a different cluster of points within the first point cloud.

16. The one or more non-transitory computer readable media of claim 11, wherein a first quantized context feature included in the plurality of quantized context features is computed by: computing a first context feature based on a first output vector included in the output vector set; computing a set of distances between the first context feature and the first set of codes; and setting the first quantized context feature equal to a first code included in the first set of codes based on the set of distances.

17. The one or more non-transitory computer readable media of claim 16, further comprising: computing a second code based on the first context feature; and replacing the first code included in the first set of codes with the second code to generate the second set of codes.

18. The one or more non-transitory computer readable media of claim 11, wherein modifying the first neural network comprises replacing a first value for a first weight included in the first neural network with a second value for the first weight that increases a likelihood associated with reconstructing the first point cloud.

19. The one or more non-transitory computer readable media of claim 11, wherein a fir
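Claims 3 and 13 compute the output vectors in two passes. First, compatibility scores between the query set and the key set weight the values to form an intermediate query set. That intermediate set is then scored against the keys again to produce the output vectors. The NumPy sketch below assumes "compatibility scores" means scaled dot products followed by a softmax. The claims do not fix the scoring function, any learned projections, or residual connections, and `attend` and `two_step_attention` are hypothetical names.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """One attention pass: compatibility scores between queries and keys,
    then a softmax-weighted average of the values for each query."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores, axis=-1) @ values, scores


def two_step_attention(queries, keys, values):
    # Pass 1: query/key scores weight the values, giving an intermediate
    # query set (this sketch assumes value dim == key dim).
    intermediate, scores = attend(queries, keys, values)
    # Pass 2: the intermediate queries are scored against the same keys and
    # again average the values, giving the output vector set.
    outputs, _ = attend(intermediate, keys, values)
    return outputs, intermediate, scores


rng = np.random.default_rng(1)
queries = rng.normal(size=(4, 8))
keys = rng.normal(size=(32, 8))
values = rng.normal(size=(32, 8))
outputs, intermediate, scores = two_step_attention(queries, keys, values)
print(scores.shape, intermediate.shape, outputs.shape)  # (4, 32) (4, 8) (4, 8)
```

Each pass is a softmax-weighted average, so every output vector stays inside the coordinate-wise range of the value set. This matches the "weighted averaging operations" wording of claim 12.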
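Claim 5 lists what a spatial feature may contain: a weight, a rotation matrix, a 3D scaling factor, or a translation. Claims 9 and 10 describe the output as a set of 3D geometry instances, each pairing a quantized context feature with its own spatial feature. The sketch below rests on three assumptions. Each quantized context feature simply selects a block template stored as points. Each instance is placed with x' = R (s * x) + t. The weight decides whether an instance appears at all. The block library, the `min_weight` threshold, and all function names are hypothetical.

```python
import numpy as np


def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def place_instance(block_points, rotation, scale, translation):
    """Apply x' = R (s * x) + t to each row of a (P, 3) block template."""
    return (block_points * scale) @ rotation.T + translation


def decode_scene(instances, blocks, min_weight=0.5):
    """Assemble a point cloud from geometry instances. Each instance names a
    block template and carries a weight, rotation, scale, and translation;
    instances whose weight is below min_weight are skipped (an assumption)."""
    parts = [
        place_instance(blocks[inst["block"]], inst["R"], inst["s"], inst["t"])
        for inst in instances
        if inst["w"] >= min_weight
    ]
    return np.concatenate(parts, axis=0)


# Hypothetical block library: the 8 corners of a unit cube and a single point.
cube = np.array([[x, y, z] for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
marker = np.array([[0.5, 0.0, 0.0]])
blocks = [cube, marker]

instances = [
    {"block": 0, "w": 0.9, "R": np.eye(3),
     "s": np.array([2.0, 2.0, 2.0]), "t": np.array([0.0, 0.0, 1.0])},
    {"block": 1, "w": 0.8, "R": rotation_z(np.pi / 2),
     "s": np.array([2.0, 1.0, 1.0]), "t": np.array([0.0, 0.0, 3.0])},
    # Low weight: this instance is dropped by decode_scene.
    {"block": 0, "w": 0.1, "R": np.eye(3), "s": np.ones(3), "t": np.zeros(3)},
]
scene = decode_scene(instances, blocks)
print(scene.shape)  # (9, 3)
```

The marker point (0.5, 0, 0) is scaled to (1, 0, 0), rotated a quarter turn about z to (0, 1, 0), and translated to approximately (0, 1, 3). The same block template can appear many times with different poses, which is how claim 9 allows multiple instances of the same block.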
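Claims 6 and 16 describe nearest-code quantization. A context feature is computed from an output vector, its distance to every code is measured, and the quantized feature is set to the code those distances select. Claim 17 then replaces a code with one computed from the context feature. The sketch below uses squared Euclidean distance and picks the closest code. The exponential-moving-average update in `update_codebook`, with its hypothetical `decay` parameter, is one common way to compute a replacement code. It is an assumption and is not a rule stated in the claims.

```python
import numpy as np


def quantize(context, codebook):
    """Replace each context feature (M, c) by its nearest code in the
    codebook (K, c) under squared Euclidean distance."""
    dists = ((context[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (M, K)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def update_codebook(context, idx, codebook, decay=0.5):
    """Hypothetical refresh rule: each selected code moves toward the mean of
    the context features assigned to it (exponential moving average). The
    returned array plays the role of the updated set of codes."""
    new_codebook = codebook.astype(float)  # astype returns a copy
    for k in np.unique(idx):
        target = context[idx == k].mean(axis=0)
        new_codebook[k] = decay * codebook[k] + (1.0 - decay) * target
    return new_codebook


codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
context = np.array([[1.0, 0.0], [9.0, 11.0], [0.0, 2.0]])
quantized, idx = quantize(context, codebook)
print(idx)  # [0 1 0]
new_codes = update_codebook(context, idx, codebook)
# Code 0 moves to [0.25, 0.5]; code 1 moves to [9.5, 10.5].
```

Here the first and third features snap to code 0 and the second to code 1. Each code is then pulled halfway toward the mean of its assigned features.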
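Claims 7 and 8 replace a weight with a value that increases the reconstruction likelihood, determined through backpropagation. The toy example below isolates that update using a hand-derived gradient. The "network" is a single hypothetical weight matrix `W` that maps a fixed feature `h` to a predicted center, and the likelihood is an isotropic Gaussian over the points. Both the model and the likelihood are assumptions chosen for brevity. Claim 1 ties the likelihood to the quantized context features and spatial features, which this sketch does not model.

```python
import numpy as np


def log_likelihood(points, mean, sigma=1.0):
    """Log-density of the points under an isotropic Gaussian centered at
    `mean`, dropping the constant normalization term."""
    return -0.5 * np.sum((points - mean) ** 2) / sigma**2


def gradient_step(points, h, W, lr=0.01, sigma=1.0):
    """One backpropagation + gradient-ascent update of the weight matrix W in
    the toy model mean = h @ W."""
    mean = h @ W                                        # forward pass, shape (3,)
    d_mean = np.sum(points - mean, axis=0) / sigma**2   # dlogL / dmean
    d_W = np.outer(h, d_mean)                           # chain rule: dlogL / dW
    return W + lr * d_W                                 # new value for the weights


rng = np.random.default_rng(2)
points = rng.normal(loc=[1.0, -2.0, 0.5], size=(50, 3))
h = np.array([1.0, 0.5, -0.5, 0.25])   # fixed hypothetical feature vector
W0 = np.zeros((4, 3))
W1 = gradient_step(points, h, W0)
print(log_likelihood(points, h @ W0) < log_likelihood(points, h @ W1))  # True
```

The example uses `lr=0.01`, 50 points, and `||h||^2 = 1.5625`. The step therefore shrinks the gap between the predicted center and the point mean by a factor of 1 - 0.01 * 50 * 1.5625 = 0.21875, so the likelihood strictly increases. A learning rate large enough to push that factor's magnitude above 1 would decrease it instead.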
Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes · CPC title
Combinations of networks · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
Rotation, translation, scaling · CPC title