Methods and systems for semantic scene completion for sparse 3D data

US12079970B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12079970-B2
Application number: US-202117492261-A
Country: US
Kind code: B2
Filing date: Oct 1, 2021
Priority date: Oct 1, 2021
Publication date: Sep 3, 2024
Grant date: Sep 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for performing semantic scene completion of sparse 3D data are described. A frame of sparse 3D data is preprocessed into a sparse 3D tensor and a sparse 2D tensor. A partially completed 3D tensor is generated from the sparse 3D tensor using a 3D prediction network, and a semantically completed 2D tensor is generated from the sparse 2D tensor using a 2D prediction network. The partially completed 3D tensor is completed to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor.
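
The final step of the abstract, copying a pixel's class label to the voxel at the same x-y coordinate, can be illustrated with a minimal sketch. This is not the patented implementation: the dictionary representation of the tensors, the integer class IDs, the use of None for a voxel missing a label, and the name transfer_labels are all assumptions made for illustration. The dependent claims describe a more selective rule than this unconditional copy.

```python
from typing import Dict, Optional, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) grid index (assumed representation)
Pixel = Tuple[int, int]       # (x, y) grid index (assumed representation)


def transfer_labels(
    partial_3d: Dict[Voxel, Optional[int]],
    completed_2d: Dict[Pixel, int],
) -> Dict[Voxel, Optional[int]]:
    """Give each unlabeled voxel the label of the pixel at the same x-y coordinate.

    Hypothetical helper: voxels that already carry a label are left untouched,
    and voxels with no matching pixel stay unlabeled (None).
    """
    completed = dict(partial_3d)
    for (x, y, z), label in partial_3d.items():
        if label is None and (x, y) in completed_2d:
            completed[(x, y, z)] = completed_2d[(x, y)]
    return completed


# Toy data: class IDs 1 and 3 are arbitrary placeholders.
partial = {(0, 0, 0): 1, (0, 0, 1): None, (1, 0, 0): None, (2, 2, 0): None}
bev = {(0, 0): 1, (1, 0): 3}
filled = transfer_labels(partial, bev)
```

In this toy input, the two unlabeled voxels above pixels (0, 0) and (1, 0) receive those pixels' labels, while the voxel at (2, 2, 0) stays unlabeled because no pixel covers it.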

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: obtaining a frame of sparse 3D data captured by a sensor; preprocessing the frame of sparse 3D data into a sparse 3D tensor and a sparse 2D tensor; generating a partially completed 3D tensor from the sparse 3D tensor using a 3D prediction network, the partially completed 3D tensor including voxels missing assigned class labels; generating a semantically completed 2D tensor from the sparse 2D tensor using a 2D prediction network; completing the partially completed 3D tensor to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor; and outputting the semantically completed 3D tensor; wherein preprocessing the frame of sparse 3D data into the sparse 3D tensor comprises: converting the sparse 3D data into a range image; performing depth completion on the range image to obtain a depth-completed range image; performing surface feature extraction on the depth-completed range image to obtain surface normal feature vectors corresponding to respective voxels of the sparse 3D tensor; performing a truncated signed distance function (TSDF) computation on the depth-completed range image to obtain TSDF vectors corresponding to respective voxels of the sparse 3D tensor; and concatenating the respective surface normal feature vector and the respective TSDF vector for each voxel of the sparse 3D tensor to obtain the sparse 3D tensor comprising a feature vector associated with each voxel.

2. The method of claim 1, wherein completing the partially completed 3D tensor comprises: dividing the partially completed 3D tensor into a plurality of 2D slices, each 2D slice comprising voxels in a x-y plane at a respective different z coordinate; for each given class label in a set of possible class labels: identifying a slice having a highest number of voxels that have been assigned the given class label; identifying all voxels in the identified slice that have x-y coordinates corresponding to x-y coordinates of pixels in the semantically completed 2D tensor that have been assigned the given class label; and for each identified voxel, assigning the given class label to the identified voxel conditional on the given class label being assigned to at least one neighboring voxel in a neighborhood of the identified voxel.

3. The method of claim 2, wherein when the given class label is not found in the neighborhood of the identified voxel, a next slice corresponding to a next higher z coordinate relative to the identified slice is identified; and wherein the steps of identifying voxels and assigning the given class label are repeated for the identified next slice.

4. The method of claim 1, wherein generating the partially completed 3D tensor comprises forward propagating the sparse 3D tensor through a sparse convolutional block, one or more encoder blocks, a dilation block, one or more decoder blocks, and a spatial propagation block, wherein the partially completed 3D tensor is outputted from the spatial propagation block; and wherein generating the semantically completed 2D tensor comprises forward propagating the sparse 2D tensor through another sparse convolutional block, one or more other encoder blocks, another dilation block, one or more other decoder blocks, and another spatial propagation block, wherein the semantically completed 2D tensor is outputted from the other spatial propagation block.

5. The method of claim 1, further comprising: performing 3D spatial propagation on the semantically completed 3D tensor; and outputting the semantically completed 3D tensor after the 3D spatial propagation.

6. The method of claim 1, wherein preprocessing the frame of sparse 3D data into a sparse 2D tensor comprises: projecting data points of the frame of sparse 3D data into pixels of a 2D bird's eye view (BEV) image in an x-y plane; and computing a feature vector for each pixel, each feature vector encoding intensity data projected from the data points of the sparse 3D data.

7. A computing system comprising a processing unit configured to execute instructions to cause the computing system to: obtain a frame of sparse 3D data captured by a sensor; preprocess the frame of sparse 3D data into a sparse 3D tensor and a sparse 2D tensor; generate a partially completed 3D tensor from the sparse 3D tensor using a 3D prediction network, the partially completed 3D tensor including voxels missing assigned class labels; generate a semantically completed 2D tensor from the sparse 2D tensor using a 2D prediction network; complete the partially completed 3D tensor to obtain a semantically completed 3D tensor by assigning a given class label, which has been assigned to a given pixel in the semantically completed 2D tensor, to a voxel at a corresponding x-y coordinate in the partially completed 3D tensor; and output the semantically completed 3D tensor; wherein the computer system preprocesses the frame of sparse 3D data into a sparse 3D tensor by: converting the sparse 3D data into a range image; performing depth completion on the range image to obtain a depth-completed range image; performing surface feature extraction on the depth-completed range image to obtain surface normal feature vectors corresponding to respective voxels of the sparse 3D tensor; performing a truncated signed distance function (TSDF) computation on the depth-completed range image to obtain TSDF vectors corresponding to respective voxels of the sparse 3D tensor; and concatenating the respective surface normal feature vector and the respective TSDF vector for each voxel of the sparse 3D tensor to obtain the sparse 3D tensor comprising a feature vector associated with each voxel.

8. The computing system of claim 7, wherein the processing unit is configured to execute instructions to cause the computing system to complete the partially completed 3D tensor by: dividing the partially completed 3D tensor into a plurality of 2D slices, each 2D slice comprising voxels in a x-y plane at a respective different z coordinate; for each given class label in a set of possible class labels: identifying a slice having a highest number of voxels that have been assigned the given class label; identifying all voxels in the identified slice that have x-y coordinates corresponding to x-y coordinates of pixels in the semantically completed 2D tensor that have been assigned the given class label; and for each identified voxel, assigning the given class label to the identified voxel conditional on the given class label being assigned to at least one neighboring voxel in a neighborhood of the identified voxel.

9. The computing system of claim 8, wherein when the given class label is not found in the neighborhood of the identified voxel, a next slice corresponding to a next higher z coordinate relative to the identified slice is identified; and wherein the steps of identifying voxels and assigning the given class label are repeated for the identified next slice.

10. The computing system of claim 7, wherein the 3D prediction network and the 2D prediction network are instances of a common neural network with different dimensionality.

11. The computing system of claim 10, wherein the common neural network comprises: a sparse convolutional block; one or more encoder blocks; a dilation block; one or more decoder blocks; and a spatial propagation block.

12. The computing system of claim 11, wherein: each encoder block comprises: at least one sparse convolutional block; a squeeze r
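
The slice-based completion recited in claims 2 and 3 (and mirrored in claims 8 and 9) can be sketched as follows. This is one possible reading, not the patented implementation. Several details are assumptions: tensors are dictionaries of labeled cells, the neighborhood is the 8 surrounding voxels within the same slice, voxels that already carry a label are never overwritten, ties between equally dense slices go to the lowest z, and the fallback to the next higher slice is applied per x-y position. The names complete_by_slices and has_labeled_neighbor are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) grid index (assumed representation)
Pixel = Tuple[int, int]       # (x, y) grid index (assumed representation)


def has_labeled_neighbor(labels: Dict[Voxel, int], x: int, y: int, z: int, label: int) -> bool:
    """True if any of the 8 in-slice neighbors of (x, y, z) carries `label`."""
    return any(
        labels.get((x + dx, y + dy, z)) == label
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )


def complete_by_slices(
    partial_3d: Dict[Voxel, int],
    completed_2d: Dict[Pixel, int],
    depth: int,
) -> Dict[Voxel, int]:
    """Fill unlabeled voxels from 2D labels, one class at a time (hypothetical sketch)."""
    completed = dict(partial_3d)
    for label in sorted(set(completed_2d.values())):
        # Count voxels of this class in each z slice of the 3D prediction.
        per_slice = Counter(z for (_, _, z), lab in partial_3d.items() if lab == label)
        if not per_slice:
            continue  # the 3D prediction holds no voxel of this class to anchor on
        # Slice with the highest count; ties go to the lowest z (assumption).
        start_z = max(per_slice, key=lambda z: (per_slice[z], -z))
        for (x, y), pixel_label in completed_2d.items():
            if pixel_label != label:
                continue
            for z in range(start_z, depth):
                if (x, y, z) in partial_3d:
                    break  # already labeled by the 3D network; keep it
                # Neighbors are checked against the original 3D prediction so
                # the result does not depend on iteration order.
                if has_labeled_neighbor(partial_3d, x, y, z, label):
                    completed[(x, y, z)] = label
                    break
                # Label not found in the neighborhood: try the next higher slice.
    return completed


# Toy data: class IDs 1 and 2 are arbitrary placeholders, grid depth is 3.
partial = {(0, 0, 0): 1, (1, 0, 0): 1, (5, 5, 1): 2, (6, 5, 1): 2, (0, 4, 2): 2}
bev = {(2, 0): 1, (9, 9): 1, (7, 5): 2, (1, 4): 2}
completed = complete_by_slices(partial, bev, depth=3)
```

In this toy input, pixel (1, 4) finds no class-2 neighbor in the densest slice (z = 1), so the voxel is filled one slice higher at (1, 4, 2). Pixel (9, 9) has no supporting neighbor in any slice, so no voxel above it is labeled.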

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • Partitioning the feature space · CPC title

  • Classification techniques · CPC title

  • Syntactic or semantic context, e.g. balancing · CPC title

  • G06V20/56 · Primary

    exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12079970B2 cover?
Methods and systems for performing semantic scene completion of sparse 3D data are described. A frame of sparse 3D data is preprocessed into a sparse 3D tensor and a sparse 2D tensor. A partially completed 3D tensor is generated from the sparse 3D tensor using a 3D prediction network, and a semantically completed 2D tensor is generated from the sparse 2D tensor using a 2D prediction network. Th…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/56. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 3, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).