Self-supervised single-view 3d reconstruction via semantic consistency
US-2021287430-A1 · Sep 16, 2021 · US
US2022343601A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022343601-A1 |
| Application number | US-202217659449-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 15, 2022 |
| Priority date | Apr 21, 2021 |
| Publication date | Oct 27, 2022 |
| Grant date | — |
Official abstract text for this publication.
One or more two-dimensional images of a three-dimensional object may be analyzed to estimate a three-dimensional mesh representing the object and a mapping of the two-dimensional images to the three-dimensional mesh. Initially, a correspondence may be determined between the images and a UV representation of a three-dimensional template mesh by training a neural network. Then, the three-dimensional template mesh may be deformed to determine the representation of the object. The process may involve a reprojection loss cycle in which points from the images are mapped onto the UV representation, then onto the three-dimensional template mesh, and then back onto the two-dimensional images.
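The reprojection cycle described in the abstract (image point, to UV coordinate, to template surface point, back to the image) might be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the publication: a unit sphere stands in for the template mesh, the UV parameterization is a simple spherical mapping, the camera is an ideal pinhole, the loss is a mean squared displacement, and the names `uv_to_template`, `project`, and `reprojection_consistency_loss` are hypothetical. In practice the UV predictions would come from the trained neural network rather than a fixed array.

```python
import numpy as np

def uv_to_template(uv):
    """Map UV coordinates in [0, 1]^2 to 3D points on a unit sphere.

    Assumption: the sphere is only a stand-in for a real template mesh.
    """
    theta = 2.0 * np.pi * uv[:, 0]   # azimuth
    phi = np.pi * uv[:, 1]           # polar angle
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

def project(points, rotation, translation, focal):
    """Pinhole projection of 3D points for a virtual camera pose (assumed model)."""
    cam = points @ rotation.T + translation   # world frame -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3]   # perspective divide

def reprojection_consistency_loss(pixels, predicted_uv, rotation, translation, focal):
    """Mean squared 2D displacement between original and reprojected image points."""
    surface = uv_to_template(predicted_uv)                        # UV -> 3D surface
    reprojected = project(surface, rotation, translation, focal)  # 3D -> image
    return float(np.mean(np.sum((pixels - reprojected) ** 2, axis=1)))

# Usage: pixels generated from known UV values give zero loss for those UV values.
rotation = np.eye(3)
translation = np.array([0.0, 0.0, 3.0])   # camera placed 3 units from the sphere
true_uv = np.array([[0.1, 0.3], [0.5, 0.5], [0.8, 0.7]])
pixels = project(uv_to_template(true_uv), rotation, translation, focal=1.0)
loss_exact = reprojection_consistency_loss(pixels, true_uv, rotation, translation, 1.0)
loss_off = reprojection_consistency_loss(pixels, true_uv + 0.05, rotation, translation, 1.0)
```

A correct UV prediction closes the cycle and yields a loss of zero, while a shifted prediction lands elsewhere in the image and yields a positive loss, which is the signal a training procedure could minimize.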
Opening claim text (preview).
The invention claimed is:
1. A method comprising: determining via a processor a correspondence between one or more two-dimensional images of a three-dimensional object and a UV representation of a three-dimensional template mesh of the three-dimensional object by training a neural network, the three-dimensional template mesh including a plurality of points in three-dimensional space and a plurality of edges between the plurality of points; determining via the processor a deformation of the three-dimensional template mesh, the deformation displacing one or more of the plurality of points, wherein the deformation is determined so as to reduce reprojection consistency loss when mapping points from the two-dimensional images back onto the two-dimensional images through both the UV representation and the three-dimensional template mesh; and storing on a storage device a deformed three-dimensional template mesh.
2. The method recited in claim 1, wherein training the neural network comprises predicting, for a first location in a designated one of the two-dimensional images, a corresponding second location in the UV representation.
3. The method recited in claim 2, wherein training the neural network further comprises determining a third location in the three-dimensional template mesh by mapping the second location to the third location via the UV parameterization.
4. The method recited in claim 3, wherein training the neural network further comprises determining a fourth location in the designated two-dimensional image by projecting the third location onto a virtual camera pose associated with the designated two-dimensional image.
5. The method recited in claim 4, wherein training the neural network further comprises determining a reprojection consistency loss value representing a displacement in two-dimensional space between the first location and the fourth location.
6. The method recited in claim 5, wherein training the neural network further comprises updating the neural network based on the reprojection consistency loss value.
7. The method recited in claim 4, the method further comprising: determining the virtual camera pose by analyzing the two-dimensional image to identify a virtual camera position and virtual camera orientation for the two-dimensional image relative to the three-dimensional template mesh.
8. The method recited in claim 4, wherein the one or more two-dimensional images include at least the designated two-dimensional image and a proximate two-dimensional image, the proximate two-dimensional image being captured from a proximate virtual camera pose that is proximate to the virtual camera pose, wherein the reprojection consistency loss value depends in part on a proximate reprojection consistency loss value computed for a corresponding pixel in the proximate two-dimensional image.
9. The method recited in claim 1, wherein training the neural network comprises determining a visibility loss value representing occlusion of a designated portion of the three-dimensional object within a designated one of the two-dimensional images and update the neural network based on the visibility loss value.
10. The method recited in claim 1, the method further comprising: determining an object type corresponding to the three-dimensional object by analyzing one or more of the one or more two-dimensional images; and selecting the three-dimensional template mesh from a plurality of available three-dimensional template meshes, the three-dimensional template mesh corresponding with the object type.
11. The method recited in claim 10, wherein the object type is a vehicle, and wherein the three-dimensional template mesh provides a generic representation of vehicles.
12. The method recited in claim 10, wherein the object type is a vehicle sub-type, and wherein the three-dimensional template mesh provides a generic representation of the vehicle sub-type.
13. A computing system comprising a processor and a storage device, the computing system configured to perform a method comprising: determining via the processor a correspondence between one or more two-dimensional images of a three-dimensional object and a UV representation of a three-dimensional template mesh of the three-dimensional object by training a neural network, the three-dimensional template mesh including a plurality of points in three-dimensional space and a plurality of edges between the plurality of points; determining via the processor a deformation of the three-dimensional template mesh, the deformation displacing one or more of the plurality of points, wherein the deformation is determined so as to reduce reprojection consistency loss when mapping points from the two-dimensional images back onto the two-dimensional images through both the UV representation and the three-dimensional template mesh; and storing on the storage device a deformed three-dimensional template mesh.
14. The computing system recited in claim 13, wherein training the neural network comprises predicting, for a first location in a designated one of the two-dimensional images, a corresponding second location in the UV representation, wherein training the neural network further comprises determining a third location in the three-dimensional template mesh by mapping the second location to the third location via the UV parameterization, wherein training the neural network further comprises determining a fourth location in the designated two-dimensional image by projecting the third location onto a virtual camera pose associated with the designated two-dimensional image, wherein training the neural network further comprises determining a reprojection consistency loss value representing a displacement in two-dimensional space between the first location and the fourth location, wherein training the neural network further comprises updating the neural network based on the reprojection consistency loss value.
15. The computing system recited in claim 14, the method further comprising: determining the virtual camera pose by analyzing the two-dimensional image to identify a virtual camera position and virtual camera orientation for the two-dimensional image relative to the three-dimensional template mesh.
16. The computing system recited in claim 14, wherein the one or more two-dimensional images include at least the designated two-dimensional image and a proximate two-dimensional image, the proximate two-dimensional image being captured from a proximate virtual camera pose that is proximate to the virtual camera pose, wherein the reprojection consistency loss value depends in part on a proximate reprojection consistency loss value computed for a corresponding pixel in the proximate two-dimensional image.
17. The computing system recited in claim 13, wherein training the neural network comprises determining a visibility loss value representing occlusion of a designated portion of the three-dimensional object within a designated one of the two-dimensional images and update the neural network based on the visibility loss value.
18. The computing system recited in claim 13, the method further comprising: determining an object type corresponding to the three-dimensional object by analyzing one or more of the one or more two-dimensional images; and selecting the three-dimensional template mesh from a plurality of available three-dimensional template meshes, the three-dimensional template mesh corresponding with the object type.
19. One or more non-t
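The deformation step recited in claim 1 (displacing template points so as to reduce reprojection consistency loss) could be illustrated roughly as below. Everything here is an assumption made for the sake of a small runnable sketch: the camera is orthographic, the loss is half the sum of squared 2D residuals plus a small L2 penalty on the displacements, the optimizer is plain gradient descent, and `deform_template` is a hypothetical name. The claims do not specify any of these choices.

```python
import numpy as np

def deform_template(vertices, targets_2d, rotation, steps=50, lr=0.5, reg=1e-3):
    """Displace template vertices so their projections move toward 2D targets.

    Assumptions: orthographic camera (rotate, then keep x and y) and an
    L2 penalty `reg` that keeps displacements small.
    Returns the deformed vertices and the loss value at each step.
    """
    proj = rotation[:2, :]                 # 2x3 orthographic projection matrix
    offsets = np.zeros_like(vertices)      # per-vertex displacement, starts at zero
    history = []
    for _ in range(steps):
        residual = (vertices + offsets) @ proj.T - targets_2d   # (N, 2) image error
        loss = 0.5 * np.sum(residual ** 2) + 0.5 * reg * np.sum(offsets ** 2)
        history.append(float(loss))
        grad = residual @ proj + reg * offsets   # analytic gradient w.r.t. offsets
        offsets = offsets - lr * grad
    return vertices + offsets, history

# Usage: a four-point template whose image observations are shifted slightly.
template = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
observed = template[:, :2] + np.array([0.2, -0.1])
deformed, losses = deform_template(template, observed, np.eye(3))
```

With a single orthographic view, only the in-plane components of each vertex are constrained, so depth is left untouched in this sketch. Additional views or stronger shape priors would be needed to constrain the full 3D displacement.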
by matching two-dimensional images to three-dimensional objects · CPC title
using neural networks · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
Vehicle exterior or interior · CPC title