A generic modular sparse three-dimensional (3D) convolution design utilizing sparse 3D group convolution
US-2022147791-A1 · May 12, 2022 · US
US12079970B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12079970-B2 |
| Application number | US-202117492261-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 1, 2021 |
| Priority date | Oct 1, 2021 |
| Publication date | Sep 3, 2024 |
| Grant date | Sep 3, 2024 |
Abstract
Methods and systems for performing semantic scene completion of sparse 3D data are described. A frame of sparse 3D data is preprocessed into a sparse 3D tensor and a sparse 2D tensor. A partially completed 3D tensor is generated from the sparse 3D tensor using a 3D prediction network, and a semantically completed 2D tensor is generated from the sparse 2D tensor using a 2D prediction network. The partially completed 3D tensor is completed to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor.
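The label-transfer step summarized in the abstract is recited in more detail in claims 2 and 3. The following is a minimal pure-Python sketch of one possible reading of that procedure, not the patented implementation. The assumptions are ours, not the patent's: sparse tensors are dicts keyed by integer coordinates with unlabeled voxels simply absent; the neighborhood is the 26-connected 3×3×3 cube; already-labeled voxels are never overwritten; ties for the slice with the "highest number of voxels" go to the lowest z; and the claim 3 fallback is applied per pixel, climbing one slice at a time. The names `complete_with_bev` and `has_labeled_neighbor` are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index into the 3D grid
Pixel = Tuple[int, int]       # (x, y) index into the BEV grid


def has_labeled_neighbor(grid: Dict[Voxel, int], v: Voxel, label: int) -> bool:
    """Return True if any voxel in the 3x3x3 cube around v (excluding v) has `label`."""
    x, y, z = v
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if (dx, dy, dz) != (0, 0, 0) and grid.get((x + dx, y + dy, z + dz)) == label:
                    return True
    return False


def complete_with_bev(grid3d: Dict[Voxel, int], bev: Dict[Pixel, int], depth: int) -> Dict[Voxel, int]:
    """Hypothetical sketch: copy BEV class labels into unlabeled voxels of a partial 3D grid."""
    out = dict(grid3d)  # work on a copy; the input grid is left untouched
    for label in sorted(set(bev.values())):
        # Count voxels carrying this label in each z slice of the 3D prediction.
        counts = Counter(z for (_, _, z), lab in out.items() if lab == label)
        if not counts:
            continue  # no slice holds this label, so there is nothing to anchor to
        # Slice with the most voxels of this label; ties resolved toward lower z.
        best_z = max(counts, key=lambda z: (counts[z], -z))
        for (x, y), pix_label in sorted(bev.items()):
            if pix_label != label:
                continue
            z = best_z
            while z < depth:
                v = (x, y, z)
                if v in out:
                    break  # already labeled; never overwrite
                if has_labeled_neighbor(out, v, label):
                    out[v] = label  # neighbor support found: accept the BEV label
                    break
                z += 1  # claim 3 fallback: try the next higher slice
    return out
```

Because the neighbor test reads the grid as it is being filled, a newly labeled voxel can support its own neighbors, so labels grow outward from the 3D prediction like a region-growing pass. Checking against a frozen copy of the input instead would be an equally valid reading that fills less aggressively.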
Claims (preview)
The invention claimed is:

1. A method comprising: obtaining a frame of sparse 3D data captured by a sensor; preprocessing the frame of sparse 3D data into a sparse 3D tensor and a sparse 2D tensor; generating a partially completed 3D tensor from the sparse 3D tensor using a 3D prediction network, the partially completed 3D tensor including voxels missing assigned class labels; generating a semantically completed 2D tensor from the sparse 2D tensor using a 2D prediction network; completing the partially completed 3D tensor to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor; and outputting the semantically completed 3D tensor; wherein preprocessing the frame of sparse 3D data into the sparse 3D tensor comprises: converting the sparse 3D data into a range image; performing depth completion on the range image to obtain a depth-completed range image; performing surface feature extraction on the depth-completed range image to obtain surface normal feature vectors corresponding to respective voxels of the sparse 3D tensor; performing a truncated signed distance function (TSDF) computation on the depth-completed range image to obtain TSDF vectors corresponding to respective voxels of the sparse 3D tensor; and concatenating the respective surface normal feature vector and the respective TSDF vector for each voxel of the sparse 3D tensor to obtain the sparse 3D tensor comprising a feature vector associated with each voxel.

2. The method of claim 1, wherein completing the partially completed 3D tensor comprises: dividing the partially completed 3D tensor into a plurality of 2D slices, each 2D slice comprising voxels in a x-y plane at a respective different z coordinate; for each given class label in a set of possible class labels: identifying a slice having a highest number of voxels that have been assigned the given class label; identifying all voxels in the identified slice that have x-y coordinates corresponding to x-y coordinates of pixels in the semantically completed 2D tensor that have been assigned the given class label; and for each identified voxel, assigning the given class label to the identified voxel conditional on the given class label being assigned to at least one neighboring voxel in a neighborhood of the identified voxel.

3. The method of claim 2, wherein when the given class label is not found in the neighborhood of the identified voxel, a next slice corresponding to a next higher z coordinate relative to the identified slice is identified; and wherein the steps of identifying voxels and assigning the given class label are repeated for the identified next slice.

4. The method of claim 1, wherein generating the partially completed 3D tensor comprises forward propagating the sparse 3D tensor through a sparse convolutional block, one or more encoder blocks, a dilation block, one or more decoder blocks, and a spatial propagation block, wherein the partially completed 3D tensor is outputted from the spatial propagation block; and wherein generating the semantically completed 2D tensor comprises forward propagating the sparse 2D tensor through another sparse convolutional block, one or more other encoder blocks, another dilation block, one or more other decoder blocks, and another spatial propagation block, wherein the semantically completed 2D tensor is outputted from the other spatial propagation block.

5. The method of claim 1, further comprising: performing 3D spatial propagation on the semantically completed 3D tensor; and outputting the semantically completed 3D tensor after the 3D spatial propagation.

6. The method of claim 1, wherein preprocessing the frame of sparse 3D data into a sparse 2D tensor comprises: projecting data points of the frame of sparse 3D data into pixels of a 2D bird's eye view (BEV) image in an x-y plane; and computing a feature vector for each pixel, each feature vector encoding intensity data projected from the data points of the sparse 3D data.

7. A computing system comprising a processing unit configured to execute instructions to cause the computing system to: obtain a frame of sparse 3D data captured by a sensor; preprocess the frame of sparse 3D data into a sparse 3D tensor and a sparse 2D tensor; generate a partially completed 3D tensor from the sparse 3D tensor using a 3D prediction network, the partially completed 3D tensor including voxels missing assigned class labels; generate a semantically completed 2D tensor from the sparse 2D tensor using a 2D prediction network; complete the partially completed 3D tensor to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor; and output the semantically completed 3D tensor; wherein the computer system preprocesses the frame of sparse 3D data into a sparse 3D tensor by: converting the sparse 3D data into a range image; performing depth completion on the range image to obtain a depth-completed range image; performing surface feature extraction on the depth-completed range image to obtain surface normal feature vectors corresponding to respective voxels of the sparse 3D tensor; performing a truncated signed distance function (TSDF) computation on the depth-completed range image to obtain TSDF vectors corresponding to respective voxels of the sparse 3D tensor; and concatenating the respective surface normal feature vector and the respective TSDF vector for each voxel of the sparse 3D tensor to obtain the sparse 3D tensor comprising a feature vector associated with each voxel.

8. The computing system of claim 7, wherein the processing unit is configured to execute instructions to cause the computing system to complete the partially completed 3D tensor by: dividing the partially completed 3D tensor into a plurality of 2D slices, each 2D slice comprising voxels in a x-y plane at a respective different z coordinate; for each given class label in a set of possible class labels: identifying a slice having a highest number of voxels that have been assigned the given class label; identifying all voxels in the identified slice that have x-y coordinates corresponding to x-y coordinates of pixels in the semantically completed 2D tensor that have been assigned the given class label; and for each identified voxel, assigning the given class label to the identified voxel conditional on the given class label being assigned to at least one neighboring voxel in a neighborhood of the identified voxel.

9. The computing system of claim 8, wherein when the given class label is not found in the neighborhood of the identified voxel, a next slice corresponding to a next higher z coordinate relative to the identified slice is identified; and wherein the steps of identifying voxels and assigning the given class label are repeated for the identified next slice.

10. The computing system of claim 7, wherein the 3D prediction network and the 2D prediction network are instances of a common neural network with different dimensionality.

11. The computing system of claim 10, wherein the common neural network comprises: a sparse convolutional block; one or more encoder blocks; a dilation block; one or more decoder blocks; and a spatial propagation block.

12. The computing system of claim 11, wherein: each encoder block comprises: at least one sparse convolutional block; a squeeze r
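Claim 6 recites projecting the points into a BEV image in the x-y plane and computing, for each pixel, a feature vector that encodes intensity. The sketch below is illustrative only: it assumes points arrive as `(x, y, z, intensity)` tuples, uses square cells of size `cell` anchored at `(x_min, y_min)`, drops points outside the window, and picks mean intensity, max intensity, and point count as the per-pixel features. None of these choices is specified in the claim text, and `bev_project` is a hypothetical name.

```python
import math
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int]


def bev_project(points: Iterable[Tuple[float, float, float, float]], x_min: float, y_min: float,
                cell: float, width: int, height: int) -> Dict[Pixel, Tuple[float, float, float]]:
    """Project (x, y, z, intensity) points onto an x-y grid and return sparse per-pixel
    features (mean intensity, max intensity, point count). Only occupied pixels appear."""
    buckets: Dict[Pixel, List[float]] = {}
    for x, y, _z, intensity in points:  # z is discarded by the top-down projection
        u = math.floor((x - x_min) / cell)
        v = math.floor((y - y_min) / cell)
        if 0 <= u < width and 0 <= v < height:  # ignore points outside the BEV window
            buckets.setdefault((u, v), []).append(intensity)
    return {pix: (sum(vals) / len(vals), max(vals), float(len(vals)))
            for pix, vals in buckets.items()}
```

Returning a dict of occupied pixels keeps the output sparse, which matches the claim's "sparse 2D tensor"; a dense array could be built from it if a downstream network needs one.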
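Claim 1 begins the 3D preprocessing by converting the sparse 3D data into a range image and then performing depth completion on it. As an illustration only, this sketch uses the spherical projection commonly applied to rotating LiDAR scans (column from azimuth, row from elevation within an assumed vertical field of view, keeping the nearest return per pixel) and a deliberately naive one-pass hole fill in which an empty pixel takes the smallest valid range in its 3×3 neighborhood. The claim text does not say which depth-completion method is used; `to_range_image` and `fill_holes` are hypothetical names.

```python
import math
from typing import Dict, Iterable, Tuple

RangeImage = Dict[Tuple[int, int], float]  # sparse: (row, col) -> range


def to_range_image(points: Iterable[Tuple[float, float, float]], width: int, height: int,
                   fov_up_deg: float, fov_down_deg: float) -> RangeImage:
    """Spherical projection; keeps the closest return when several points hit one pixel."""
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    img: RangeImage = {}
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw, pitch = math.atan2(y, x), math.asin(z / r)
        col = math.floor(0.5 * (1.0 - yaw / math.pi) * width)  # azimuth -> column
        row = math.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height)  # elevation -> row
        row, col = min(max(row, 0), height - 1), min(max(col, 0), width - 1)
        if (row, col) not in img or r < img[(row, col)]:
            img[(row, col)] = r
    return img


def fill_holes(img: RangeImage, width: int, height: int) -> RangeImage:
    """Naive one-pass depth completion: an empty pixel takes the smallest range among its
    valid 3x3 neighbours in the input image (no wrap-around at the azimuth seam)."""
    out = dict(img)
    for row in range(height):
        for col in range(width):
            if (row, col) in img:
                continue
            neighbours = [img[(row + dr, col + dc)]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (row + dr, col + dc) in img]
            if neighbours:
                out[(row, col)] = min(neighbours)
    return out
```

Reading neighbours from the unfilled input keeps the fill to a single pixel ring per call; repeated calls, or morphological operators on a dense image, would close larger gaps.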
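Claim 1 also computes a TSDF vector per voxel from the depth-completed range image and concatenates it with that voxel's surface-normal vector. One common way to do this, assumed here rather than taken from the patent, is a projective TSDF: project each voxel center into the range image, take the signed difference between the measured range on that ray and the voxel's own range, divide by a truncation distance, and clamp to [-1, 1]. For brevity the sketch reduces the TSDF "vector" to a single scalar and takes the normal as a given input; `project`, `projective_tsdf`, and `voxel_feature` are hypothetical names, and the spherical projection convention is an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

RangeImage = Dict[Tuple[int, int], float]  # sparse: (row, col) -> range


def project(x: float, y: float, z: float, width: int, height: int,
            fov_up_deg: float, fov_down_deg: float) -> Optional[Tuple[int, int, float]]:
    """Spherical projection of a sensor-frame point to (row, col, range); None at the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    yaw, pitch = math.atan2(y, x), math.asin(z / r)
    col = math.floor(0.5 * (1.0 - yaw / math.pi) * width)
    row = math.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height)
    return min(max(row, 0), height - 1), min(max(col, 0), width - 1), r


def projective_tsdf(center: Sequence[float], range_img: RangeImage, trunc: float,
                    width: int, height: int, fov_up_deg: float, fov_down_deg: float) -> Optional[float]:
    """Truncated signed distance of a voxel center along its sensor ray, scaled to [-1, 1]."""
    proj = project(center[0], center[1], center[2], width, height, fov_up_deg, fov_down_deg)
    if proj is None:
        return None
    row, col, r_voxel = proj
    r_meas = range_img.get((row, col))
    if r_meas is None:
        return None  # no (completed) measurement on this ray
    sdf = r_meas - r_voxel  # > 0: voxel lies in front of the surface; < 0: behind it
    return max(-1.0, min(1.0, sdf / trunc))


def voxel_feature(normal: Sequence[float], tsdf: float) -> List[float]:
    """Concatenate a surface-normal vector with the (scalar) TSDF value."""
    return list(normal) + [tsdf]
```

Voxels whose ray hits no measurement get `None` here; a real pipeline would need a policy for them, such as treating them as unobserved.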
CPC titles:

- Partitioning the feature space
- Classification techniques
- Syntactic or semantic context, e.g. balancing
- exterior to a vehicle by using sensors mounted on the vehicle
- Architecture, e.g. interconnection topology