Method for 3d scene dense reconstruction based on monocular visual slam
US-2020273190-A1 · Aug 27, 2020 · US
US12586293B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586293-B2 |
| Application number | US-202318524803-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 30, 2023 |
| Priority date | Jan 19, 2023 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
A technique for reconstructing a three-dimensional scene from monocular video adaptively allocates an explicit sparse-dense voxel grid with dense voxel blocks around surfaces in the scene and sparse voxel blocks further from the surfaces. In contrast to conventional systems, the two-level voxel grid can be efficiently queried and sampled. In an embodiment, the scene surface geometry is represented as a signed distance field (SDF). Representation of the scene surface geometry can be extended to multi-modal data such as semantic labels and color. Because properties stored in the sparse-dense voxel grid structure are differentiable, the scene surface geometry can be optimized via differentiable volume rendering.
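The two-level layout described in the abstract can be illustrated with a minimal sketch. It assumes a Python dictionary stands in for the block index and that each allocated block holds a small dense NumPy array of SDF samples. The class name `SparseDenseGrid`, its parameters (`voxel_size`, `block_res`, `default_sdf`), and the trilinear query are hypothetical illustrations, not the patented implementation.

```python
import numpy as np


class SparseDenseGrid:
    """Sparse set of blocks, each holding a dense B x B x B array of SDF samples.

    Hypothetical illustration: a dict stands in for the block index.
    """

    def __init__(self, voxel_size=0.05, block_res=8, default_sdf=1.0):
        self.voxel_size = voxel_size
        self.block_res = block_res
        self.default_sdf = default_sdf  # value reported for unallocated space
        self.blocks = {}  # block coordinate (i, j, k) -> dense float32 array

    def _split(self, voxel):
        # Floor division and modulo keep negative coordinates consistent.
        block = tuple(int(v) // self.block_res for v in voxel)
        local = tuple(int(v) % self.block_res for v in voxel)
        return block, local

    def _ensure_block(self, block):
        if block not in self.blocks:
            shape = (self.block_res,) * 3
            self.blocks[block] = np.full(shape, self.default_sdf, dtype=np.float32)
        return self.blocks[block]

    def allocate_near(self, points):
        """Allocate a block for every 3D point assumed to lie near a surface."""
        for p in np.asarray(points, dtype=float):
            voxel = np.floor(p / self.voxel_size).astype(int)
            block, _ = self._split(voxel)
            self._ensure_block(block)

    def set_voxel(self, voxel, value):
        block, local = self._split(voxel)
        self._ensure_block(block)[local] = value

    def get_voxel(self, voxel):
        block, local = self._split(voxel)
        data = self.blocks.get(block)
        return self.default_sdf if data is None else float(data[local])

    def query(self, point):
        """Trilinearly interpolate the SDF at a continuous 3D point."""
        g = np.asarray(point, dtype=float) / self.voxel_size
        base = np.floor(g).astype(int)
        frac = g - base
        value = 0.0
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    w = ((frac[0] if dx else 1.0 - frac[0])
                         * (frac[1] if dy else 1.0 - frac[1])
                         * (frac[2] if dz else 1.0 - frac[2]))
                    value += w * self.get_voxel(base + np.array([dx, dy, dz]))
        return value
```

In this sketch, memory is spent only on blocks that `allocate_near` or `set_voxel` touches, while `query` still returns a value everywhere by falling back to `default_sdf` in unallocated space. Because the interpolated value is a weighted sum of stored samples, it is differentiable with respect to those samples, which is the property the abstract relies on for optimization via volume rendering.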
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: computing a depth scale for each predicted depth image in a set of predicted depth images corresponding to a set of monocular images comprising a video of a three-dimensional (3D) scene; calibrating each predicted depth image using the respective depth scale to produce calibrated depth values; constructing a volumetric grid for the 3D scene storing properties comprising the calibrated depth values and corresponding color values; rendering the volumetric grid to produce a set of predicted images; and adjusting the properties to reduce differences between the set of predicted images and the set of monocular images.
2. The computer-implemented method of claim 1, further comprising processing the set of monocular images using structure-from-motion supervision to compute the set of predicted depth images.
3. The computer-implemented method of claim 1, wherein the properties further comprise a set of predicted normal vector images and a set of semantic images.
4. The computer-implemented method of claim 1, wherein constructing the volumetric grid comprises allocating sparse voxel blocks near surfaces in the 3D scene and storing the properties in a dense voxel array within each of the sparse voxel blocks.
5. The computer-implemented method of claim 4, further comprising projecting the sparse voxel blocks to the set of monocular images to associate voxels of the dense voxel arrays with the properties.
6. The computer-implemented method of claim 4, wherein the sparse voxel blocks are indexed by a collision-free hash map.
7. The computer-implemented method of claim 1, wherein adjusting the volumetric grid comprises updating the calibrated depth values and the corresponding color values according to backpropagated gradients.
8. The computer-implemented method of claim 1, wherein the calibrating comprises: defining a scale function for each predicted depth image in the set of predicted depth images; and updating the set of predicted depth images to enforce local consistency between visually adjacent monocular images in the set of monocular images.
9. The computer-implemented method of claim 1, further comprising, before the rendering, applying a denoising filter to the volumetric grid.
10. The computer-implemented method of claim 1, wherein the adjusting of the properties is based on an integral of energy over a surface in the 3D scene that is computed by applying continuous conditional random field smoothing to the calibrated depth values.
11. The computer-implemented method of claim 1, wherein at least one of the steps of computing, calibrating, constructing, rendering, and adjusting is performed on a server or in a data center to generate the set of predicted images, and at least a portion of the properties stored in the volumetric grid for the 3D scene are streamed to a user device.
12. The computer-implemented method of claim 1, wherein at least one of the steps of computing, calibrating, constructing, rendering, and adjusting is performed within a cloud computing environment.
13. The computer-implemented method of claim 1, wherein at least one of the steps of computing, calibrating, constructing, rendering, and adjusting is performed for training, testing, or certifying a neural network employed in a machine, robot, or autonomous vehicle.
14. The computer-implemented method of claim 1, wherein at least one of the steps of computing, calibrating, constructing, rendering, and adjusting is performed on a virtual machine comprising a portion of a graphics processing unit.
15. A system, comprising: a memory that stores a set of monocular images comprising a video of a three-dimensional (3D) scene; and a processor that is connected to the memory, wherein the processor is configured to: compute a depth scale for each predicted depth image in a set of predicted depth images corresponding to the set of monocular images; calibrate each predicted depth image using the respective depth scale to produce calibrated depth values; construct a volumetric grid for the 3D scene storing properties comprising the calibrated depth values and corresponding color values; render the volumetric grid to produce a set of predicted images; and adjust the properties to reduce differences between the set of predicted images and the set of monocular images.
16. The system of claim 15, wherein the properties further comprise a set of predicted normal vector images and a set of semantic images.
17. The system of claim 15, wherein constructing the volumetric grid comprises allocating sparse voxel blocks near surfaces in the 3D scene and storing the properties in a dense voxel array within each of the sparse voxel blocks.
18. A non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: computing a depth scale for each predicted depth image in a set of predicted depth images corresponding to a set of monocular images comprising a video of a three-dimensional (3D) scene; calibrating each predicted depth image using the respective depth scale to produce calibrated depth values; constructing a volumetric grid for the 3D scene storing properties comprising the calibrated depth values and corresponding color values; rendering the volumetric grid to produce a set of predicted images; and adjusting the properties to reduce differences between the set of predicted images and the set of monocular images.
19. The non-transitory computer-readable media of claim 18, wherein the properties further comprise a set of predicted normal vector images and a set of semantic images.
20. The non-transitory computer-readable media of claim 18, wherein constructing the volumetric grid comprises allocating sparse voxel blocks near surfaces in the 3D scene and storing the properties in a dense voxel array within each of the sparse voxel blocks.
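The first two steps of claim 1 (computing a depth scale per predicted depth image, then calibrating with it) can be sketched as follows. The claims do not state how the scale is computed, so this sketch assumes one scalar per image, fitted by closed-form least squares against sparse reference depths such as those a structure-from-motion pipeline might supply. The function names `fit_depth_scale` and `calibrate_depths` and the `valid` mask are hypothetical, and claim 8 describes a scale function rather than a single scalar.

```python
import numpy as np


def fit_depth_scale(pred_depth, ref_depth, valid):
    """Least-squares scalar s minimizing ||s * pred - ref||^2 over valid pixels.

    Assumption: a single scalar per image; the patent may use a richer scale function.
    """
    p = pred_depth[valid].astype(np.float64)
    r = ref_depth[valid].astype(np.float64)
    denom = float(np.dot(p, p))
    if denom == 0.0:
        raise ValueError("no usable predicted depth under the valid mask")
    return float(np.dot(p, r) / denom)


def calibrate_depths(pred_depths, ref_depths, valid_masks):
    """Scale every predicted depth image by its own fitted depth scale."""
    scales, calibrated = [], []
    for pred, ref, valid in zip(pred_depths, ref_depths, valid_masks):
        s = fit_depth_scale(pred, ref, valid)
        scales.append(s)
        calibrated.append(s * pred)
    return scales, calibrated


# Usage: a 2 x 2 predicted depth image whose true scale is 2.5,
# with reference depth available at only two pixels.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
ref = 2.5 * pred
valid = np.array([[True, False], [False, True]])
scales, calibrated = calibrate_depths([pred], [ref], [valid])
```

A per-image scalar is the simplest choice that resolves the scale ambiguity of monocular depth prediction. It does not enforce the consistency between visually adjacent images that claim 8 recites, which would require coupling the scales of neighboring frames.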
Determination of colour characteristics · CPC title
Training; Learning · CPC title
Artificial neural networks [ANN] · CPC title
Range image; Depth image; 3D point clouds · CPC title
Collision detection, intersection · CPC title