US12205292B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12205292-B2 |
| Application number | US-202117378155-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 16, 2021 |
| Priority date | Jul 16, 2021 |
| Publication date | Jan 21, 2025 |
| Grant date | Jan 21, 2025 |
Official abstract text for this publication.
Systems, methods and apparatus for semantic segmentation of 3D point clouds using deep neural networks. The deep neural network generally has two primary subsystems: a multi-branch cascaded subnetwork that includes an encoder and a decoder, and is configured to receive a sparse 3D point cloud, and capture and fuse spatial feature information in the sparse 3D point cloud at multiple scales and multiple hierarchical levels; and a spatial feature transformer subnetwork that is configured to transform the cascaded features generated by the multi-branch cascaded subnetwork and fuse these scaled features using a shared decoder attention framework to assist in the prediction of semantic classes for the sparse 3D point cloud.
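To picture the sparse input that the abstract refers to, the following is a minimal sketch of voxelizing a point cloud into a coordinate list and a matching feature list, in the spirit of the voxel representation, coordinate matrix and feature matrix named in claims 2 to 4. The function name `voxelize`, the `voxel_size` parameter, the use of intensity as the only feature, and the per-voxel averaging are all assumptions made for illustration; the publication text shown here does not specify them.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group points into voxels and return (coords, feats).

    points: iterable of (x, y, z, intensity) tuples. Intensity is an
            assumed example feature, not taken from the patent text.
    coords: sorted list of integer voxel indices (one row per occupied voxel).
    feats:  one feature row per entry of coords (here: mean intensity).
    """
    buckets = defaultdict(list)
    for x, y, z, intensity in points:
        # Integer voxel index along each axis; floor handles negatives correctly.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append(intensity)

    # Only occupied voxels are stored, which is what makes the tensor sparse.
    coords = sorted(buckets)
    # Averaging the features inside a voxel is an illustrative choice.
    feats = [[sum(buckets[c]) / len(buckets[c])] for c in coords]
    return coords, feats


points = [
    (0.1, 0.2, 0.0, 1.0),
    (0.3, 0.1, 0.4, 3.0),
    (2.6, 0.0, -0.2, 5.0),
]
coords, feats = voxelize(points, voxel_size=0.5)
```

With these three points and a 0.5 voxel size, the first two points share a voxel and the third occupies its own, so only two rows are stored regardless of how large the bounding volume is.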
Opening claim text (preview).
The invention claimed is:

1. A method for semantic segmentation of a 3D point cloud, the method comprising: processing a 3D point cloud to produce a sparse tensor; feeding the sparse tensor as an input to each of a plurality of branches of an encoder of a neural network to produce a plurality of branch feature maps, N being a number of the plurality of branches, N being equal to or greater than 3, each ith branch respectively comprising i sequentially chained different encoder blocks to produce an ith branch feature map, i being an integer between 1 and N; feeding the plurality of branch feature maps to a plurality of hierarchical attention blocks to generate a plurality of emphasized feature maps, wherein, for each pth branch of a 3rd to Nth branches, a pth branch feature map and a (p−2)th emphasized feature map are fed to a corresponding (p−1)th hierarchical attention block, the (p−2)th emphasized feature map is output by a (p−2)th hierarchical attention block, and wherein a first branch feature map and a second branch feature map are fed to a first hierarchical attention block; feeding each emphasized feature map output by the plurality of hierarchical attention blocks to a spatial feature transformer to fuse each emphasized feature map of the plurality of hierarchical attention blocks and generate a fused feature map; and processing the fused feature map and a final decoder block of a decoder to predict a class label for a plurality of points in the 3D point cloud.

2. The method of claim 1, wherein processing the 3D point cloud to produce the sparse tensor is obtained by pre-processing the 3D point cloud to generate a voxel representation of the 3D point cloud.

3. The method of claim 2, wherein the sparse tensor comprises for each point in the 3D point cloud, a set of coordinates and one or more associated features corresponding to the set of coordinates.

4. The method of claim 3, wherein each set of coordinates is contained within a coordinate matrix, wherein the one or more associated features are contained within a feature matrix.

5. The method of claim 1, further comprising, feeding an (N−1)th emphasized feature map output by an (N−1)th hierarchical attention block to a first decoder block.

6. The method of claim 5, wherein the first decoder block is first of N decoder blocks.

7. The method of claim 6, further comprising, feeding (N−1) encoder-decoder skip connection outputs from a first through (N−1)th encoder blocks of N encoder blocks to the N decoder blocks, wherein encoder-decoder skip connection outputs are fed to the N decoder blocks by reverse order of respective depth.

8. The method of claim 7, wherein processing the fused feature map comprises feeding the fused feature map to an nth decoder block.

9. The method of claim 8, further comprising fusing the fused feature map, an output of an (N−1)th decoder block and the output of first encoder blocks, wherein the fusing comprises concatenation followed by a convolution operation.

10. The method of claim 1, further comprising scaling each emphasized feature map output by the plurality of hierarchical attention blocks to a common scale, prior to obtaining the fused feature map.

11. The method of claim 1, further comprising assigning a weight to each of a plurality of channels, the plurality of channels corresponding to each output of the plurality of hierarchical attention blocks, prior to obtaining the fused feature map.

12. The method of claim 11, wherein a kernel size of each encoder block is given according to: K = ⌊(N + 2 − p) / (2M)⌋ + 3, wherein K is the kernel size, and M is block depth, and ⌊ ⌋ is a floor operation that rounds a value of (N + 2 − p) / (2M) to a nearest integer value.

13. The method of claim 1, wherein, for the first hierarchical attention block of the plurality of hierarchical attention blocks, the first hierarchical attention block comprises a first convolutional operation and a second convolutional operation.

14. The method of claim 13, wherein, when a (p−1)th branch feature map and the pth branch feature map are fed to the corresponding (p−1)th hierarchical attention block, the pth branch feature map is fed to the second convolutional operation.

15. The method of claim 14, wherein, when the (p−1)th branch feature map and the pth branch feature map are fed to the corresponding (p−1)th hierarchical attention block, the (p−1)th branch feature map is fed to the first convolutional operation.

16. The method of claim 15, wherein, when the (p−1)th branch feature map and the pth branch feature map are fed to the corresponding (p−1)th hierarchical attention block, the pth branch feature map is upsampled and fed to the first convolutional operation.

17. The method of claim 16, wherein, when the (p−1)th branch feature map and the pth branch feature map are fed to the corresponding (p−1)th hierarchical attention block, the (p−1)th branch feature map is downsampled and fed to the second convolutional operation.

18. The method of any one of claim 17, further comprising: adding a first output and a second output from the first convolutional operation and the second convolutional operation, respectively, to obtain an emphasized feature map from a hierarchical attention block.

19. An apparatus for semantic segmentation of a 3D point cloud, the apparatus comprising: a memory storing executable instructions for implementing a neural network; and at least one processor configured to execute the executable instructions to: process a 3D point cloud to produce a sparse tensor; feed the sparse tensor as an input to each of a plurality of branches of an encoder of the neural network to produce a plurality of branch feature maps, N being a number of the plurality of branches, N being equal to or greater than 3, each ith branch respectively comprising i sequentially chained different encoder blocks to produce an ith branch feature map, i being an integer between 1 and N; feed the plurality of branch feature maps to a plurality of hierarchical attention blocks to generate a plurality of emphasized feature maps, wherein, for each pth branch of a 3rd to Nth branches, a pth branch feature map and a (p−2)th emphasized feature map are fed to a corresponding (p−1)th hierarchical attention block, the (p−2)th emphasized feature map is output by a (p−2)th hierarchical attention block, and wherein a first branch feature map and a second branch feature map are fed to a first hierarchical attention block; feed each emphasized feature map output by the plurality of hierarchical attention blocks to a spatial feature transformer to fuse each emphasized feature map of the plurality of hierarchical attention blocks and gene
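The indexing in claim 1 is easier to follow when written out for a concrete N. The sketch below only enumerates which inputs each hierarchical attention block receives and how many encoder blocks each branch chains; it does not implement any network layer. The function names `attention_wiring` and `encoder_blocks_per_branch` and the string labels are hypothetical, chosen for illustration.

```python
def attention_wiring(n_branches):
    """Map each hierarchical attention block index to its two inputs.

    Follows the indexing recited in claim 1:
      - block 1 receives branch feature maps 1 and 2;
      - for p = 3..N, block (p-1) receives branch feature map p and the
        emphasized feature map produced by block (p-2).
    """
    if n_branches < 3:
        raise ValueError("claim 1 recites N equal to or greater than 3")

    wiring = {1: ("branch_1", "branch_2")}
    for p in range(3, n_branches + 1):
        wiring[p - 1] = (f"branch_{p}", f"emphasized_{p - 2}")
    return wiring


def encoder_blocks_per_branch(n_branches):
    """Branch i chains i encoder blocks, for i = 1..N (claim 1)."""
    return {i: i for i in range(1, n_branches + 1)}


wiring = attention_wiring(4)
depths = encoder_blocks_per_branch(4)
```

For N = 4 this yields three attention blocks, so N branches produce N−1 emphasized feature maps. That matches claim 5, which feeds the (N−1)th emphasized feature map to the first decoder block.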
of extracted features · CPC title
Classification techniques · CPC title
for mapping or imaging · CPC title
Artificial neural networks [ANN] · CPC title
Range image; Depth image; 3D point clouds · CPC title