US11257298B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11257298-B2 |
| Application number | US-202016822819-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 18, 2020 |
| Priority date | Mar 18, 2020 |
| Publication date | Feb 22, 2022 |
| Grant date | Feb 22, 2022 |
Official abstract text for this publication.
Methods, systems, and non-transitory computer readable storage media are disclosed for reconstructing three-dimensional meshes from two-dimensional images of objects with automatic coordinate system alignment. For example, the disclosed system can generate feature vectors for a plurality of images having different views of an object. The disclosed system can process the feature vectors to generate coordinate-aligned feature vectors aligned with a coordinate system associated with an image. The disclosed system can generate a combined feature vector from the feature vectors aligned to the coordinate system. Additionally, the disclosed system can then generate a three-dimensional mesh representing the object from the combined feature vector.
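The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `reconstruct_mesh`, `average_pool`, and the callables `encode`, `align`, and `generate_surface` are hypothetical names standing in for the neural network encoder, the coordinate transform neural network, and the mesh generation step. The sketch assumes the first image defines the target coordinate system and that the feature vectors are combined by averaging, which is one option the claims name (average pooling).

```python
from typing import Any, Callable, List, Sequence

Vector = List[float]


def average_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("feature vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def reconstruct_mesh(
    images: Sequence[Any],
    cameras: Sequence[Any],
    encode: Callable[[Any], Vector],               # hypothetical stand-in for the encoder
    align: Callable[[Vector, Any], Vector],        # hypothetical stand-in for the coordinate transform network
    generate_surface: Callable[[Vector], Any],     # hypothetical stand-in for mesh generation
) -> Any:
    """Encode each view, align later views to the first view, pool, then decode."""
    if not images or len(images) != len(cameras):
        raise ValueError("need one camera entry per image and at least one image")

    # Step 1: one feature vector per view.
    features = [encode(image) for image in images]

    # Step 2: assumption: the first view defines the target coordinate system,
    # so its feature vector is used unchanged; every other view is aligned
    # using that view's camera parameters.
    aligned = [features[0]]
    for feature, camera in zip(features[1:], cameras[1:]):
        aligned.append(align(feature, camera))

    # Step 3: combine the aligned vectors into a single feature vector.
    combined = average_pool(aligned)

    # Step 4: generate the mesh from the combined feature vector.
    return generate_surface(combined)
```

With trained networks, the three callables would be learned models; here they are left as parameters so the ordering of the steps is visible without committing to any architecture.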
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computing device to: generate, utilizing a neural network encoder, a first feature vector by encoding visual features from a first image comprising a first view of an object and a second feature vector by encoding visual features from a second image comprising a second view of the object; determine camera parameters associated with the second image based on a camera pose of a camera that captured the second image; generate a coordinate-aligned feature vector for the second image from the second feature vector by utilizing a coordinate transform neural network to process the second feature vector based on the camera parameters associated with the second image; combine the coordinate-aligned feature vector for the second image and the first feature vector for the first image to generate a combined feature vector representing the object; and generate a three-dimensional mesh representing the object from the combined feature vector representing the object.

2. The non-transitory computer readable storage medium as recited in claim 1, wherein the instructions that cause the computing device to: generate the three-dimensional mesh by using a surface generation neural network to generate the three-dimensional mesh from the combined feature vector.

3. The non-transitory computer readable storage medium as recited in claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to: utilize the neural network encoder to generate a third feature vector from a third image comprising a third view of the object; generate an additional coordinate-aligned feature vector for the third image by processing the third feature vector based on camera parameters associated with the third image utilizing the coordinate transform neural network; and combine the first feature vector, the coordinate-aligned feature vector for the second image, and the additional coordinate-aligned feature vector for the third image to generate the combined feature vector representing the object.

4. The non-transitory computer readable storage medium as recited in claim 3, wherein the instructions that cause the computing device to combine the coordinate-aligned feature vector further cause the computing device to use an average pooling layer to determine an average pooling of the first feature vector and the coordinate-aligned feature vector to generate the combined feature vector representing the object.

5. The non-transitory computer readable storage medium as recited in claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to: identify surface mapping coordinates comprising two-dimensional coordinates that map to a three-dimensional surface; and generate the three-dimensional mesh representing the object from the combined feature vector representing the object and the surface mapping coordinates using the surface generation neural network.

6. The non-transitory computer readable storage medium as recited in claim 5, wherein the instructions that cause the computing device to identify the surface mapping coordinates further cause the computing device to: determine a geometry classification for the object from the first image and the second image; and identify the surface mapping coordinates based on the geometry classification for the object.

7. The non-transitory computer readable storage medium as recited in claim 6, wherein the instructions that cause the computing device to generate the three-dimensional mesh further cause the computing device to modify, utilizing the surface generation neural network, the surface mapping coordinates to change a base shape of the geometry classification to a target shape corresponding to the object using the combined feature vector.

8. The non-transitory computer readable storage medium as recited in claim 2, further comprising instructions that, when executed by the at least one processor, cause the computing device to: generate, for a sequence of images of a ground truth object and ground truth camera parameters, an output mesh representing the ground truth object using the neural network encoder, the coordinate transform neural network, and the surface generation neural network; determine a chamfer loss based on three-dimensional coordinates in the output mesh; and learn parameters of the neural network encoder, the coordinate transform neural network, and the surface generation neural network using the chamfer loss.

9. The non-transitory computer readable storage medium as recited in claim 8, wherein the instructions that cause the computing device to determine the chamfer loss further cause the computing device to: calculate, for each three-dimensional coordinate in the output mesh, a Euclidean distance to a nearest mesh coordinate in a ground truth mesh for the ground truth object; and sum the Euclidean distance across the three-dimensional coordinates in the output mesh to determine the chamfer loss.

10. A system comprising: at least one computer memory device comprising a first image comprising a first view of an object and a second image comprising a second view of the object, wherein the first image corresponds to a first coordinate system and the second image corresponds to a second coordinate system; and one or more servers configured to cause the system to: utilize a neural network encoder to generate a first feature vector by encoding visual features from the first image and a second feature vector by encoding visual features from the second image; determine camera parameters associated with the second image based on a camera pose of a camera that captured the second image; generate a coordinate-aligned feature vector for the second image from the second feature vector by utilizing a coordinate transform neural network to process the second feature vector based on the camera parameters associated with the second image; combine, using a pooling layer that pools a plurality of feature vectors, the coordinate-aligned feature vector for the second image and the first feature vector for the first image to generate a combined feature vector representing the object; identify surface mapping coordinates comprising two-dimensional coordinates that map to a three-dimensional surface; and generate a three-dimensional mesh representing the object within the first coordinate system by processing the combined feature vector representing the object and the surface mapping coordinates using a surface generation neural network.

11. The system as recited in claim 10, wherein the one or more servers are further configured to: utilize the neural network encoder to generate a third feature vector by encoding visual features from a third image comprising a third view of the object; and generate an additional coordinate-aligned feature vector for the third image from the third feature vector by utilizing the coordinate transform neural network to process the third feature vector based on camera parameters associated with the third image utilizing the coordinate transform neural network.

12. The system as recited in claim 11, wherein the one or more servers are further configured to combine, using the pooling layer, the first feature vector, the coordinate-aligned feature vector for the second image, and the additional coordinate-aligned feature vector for the third image to generate the combined feature vector representing the object.

13. The system as recited i
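The chamfer loss recited in claim 9 can be illustrated with a short sketch. It follows the claim wording literally: for each three-dimensional coordinate in the output mesh, take the Euclidean distance to the nearest ground truth coordinate, then sum those distances. The function name `one_sided_chamfer_loss` and the point-tuple representation are assumptions made for illustration, and the brute-force nearest-neighbour search is a simplification; the patent text shown here does not specify how the search is carried out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def one_sided_chamfer_loss(
    output_points: Sequence[Point],
    ground_truth_points: Sequence[Point],
) -> float:
    """Sum, over output points, of the distance to the nearest ground truth point."""
    if not ground_truth_points:
        raise ValueError("ground truth mesh must contain at least one coordinate")
    total = 0.0
    for p in output_points:
        # Brute-force nearest neighbour: O(len(output) * len(ground truth)).
        total += min(math.dist(p, q) for q in ground_truth_points)
    return total


output_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth_mesh = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(one_sided_chamfer_loss(output_mesh, ground_truth_mesh))  # 1.0
```

In the example, the first output point coincides with a ground truth point (distance 0) and the second is one unit from its nearest ground truth point, so the summed loss is 1.0. The loss is one-directional as written in the claim: ground truth points that no output point lies near do not contribute.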
Three-dimensional [3D] modelling for computer graphics · CPC title
Camera pose · CPC title
from motion · CPC title
Artificial neural networks [ANN] · CPC title
Vector quantisation · CPC title