Methods and systems for semantic scene completion for sparse 3D data
US-12079970-B2 · Sep 3, 2024 · US
US12400333B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12400333-B2 |
| Application number | US-202318496532-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 27, 2023 |
| Priority date | Dec 13, 2022 |
| Publication date | Aug 26, 2025 |
| Grant date | Aug 26, 2025 |
Official abstract text for this publication.
Disclosed are a method, a device, and a computer system for detecting pedestrians based on 3D point clouds. The method includes: obtaining spatial radar point cloud data of an area to be detected; dividing the spatial radar point cloud data into a plurality of 3D voxel grid cells according to a preset voxel unit; encoding the plurality of 3D voxel grid cells to obtain voxel encoded data of the radar point cloud data; obtaining a first feature map from a predetermined sparse convolutional backbone network and a second feature map from a predetermined self-attention transformation network; fusing the first feature map and the second feature map into a fused feature map; and inputting the fused feature map into a predetermined pedestrian detection model to obtain pedestrian detection information for the area to be detected. The present disclosure enables more comprehensive pedestrian detection in the area to be detected, with improved accuracy.
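The voxel-division step in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patented implementation: the function name `voxelize`, the `origin` parameter, and the (x, y, z, reflectivity) column layout are hypothetical choices made for the example.

```python
import numpy as np

def voxelize(points, voxel_size, origin):
    """Assign each point to a 3D voxel grid cell.

    points:     (N, 4) array, assumed layout x, y, z, reflectivity.
    voxel_size: (3,) edge lengths of one voxel (the "preset unit of voxel").
    origin:     (3,) minimum corner of the area to be detected (assumption).

    Returns (coords, inverse): the integer grid indices of the occupied
    voxels, and for every point the row of `coords` it belongs to.
    """
    # Integer grid index of every point along each axis.
    idx = np.floor((points[:, :3] - origin) / voxel_size).astype(np.int64)
    # Keep only occupied voxels; `inverse` maps points back to them.
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    return coords, inverse.reshape(-1)

# Three points: the first two share a voxel, the third falls in its neighbour.
pts = np.array([[0.1, 0.1, 0.1, 0.5],
                [0.2, 0.3, 0.1, 0.7],
                [1.5, 0.1, 0.1, 0.2]])
coords, inverse = voxelize(pts, np.array([1.0, 1.0, 1.0]), np.zeros(3))
```

Only occupied voxels are stored, which matches the sparse setting the abstract describes; empty cells never appear in `coords`.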
Opening claim text (preview).
What is claimed:

1. A method for detecting pedestrian based on 3D point clouds, comprising steps of:
obtaining spatial radar point cloud data of area to be detected with a sensor, wherein the spatial radar point cloud data includes a plurality of radar point cloud data;
performing following steps with a processor:
dividing the spatial radar point cloud data to obtain a plurality of 3D voxel grid cells according to a preset unit of voxel, wherein the plurality of 3D voxel grid cells comprises a plurality of radar point cloud data;
encoding the plurality of 3D voxel grid cells and obtaining voxel encoded data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells;
inputting the voxel encoded data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells into a predetermined sparse convolutional backbone network for feature extraction to obtain a first feature map;
dividing the plurality of 3D voxel grid cells respectively according to a predetermined window to obtain a plurality of 3D sub-voxel grid cells corresponding to the plurality of 3D voxel grid cells and obtaining voxel encoded data of the plurality of radar point cloud data corresponding to the plurality of 3D sub-voxel grid cells;
obtaining a second feature map according to the voxel encoded data of the plurality of radar point cloud data corresponding to the plurality of 3D sub-voxel grid cells and a predetermined second feature extraction algorithm, wherein the second feature extraction algorithm is:

$$F_2 = W_2\big(\mathrm{LN}(\mathrm{MSA}(\mathrm{LN}(F_1), \mathrm{PE}(I)) + F_1)\big) + b_2 + \mathrm{MSA}(\mathrm{LN}(F_1), \mathrm{PE}(I)) + F_1,$$

wherein $F_1$ represents the voxel encoded data, $F_2$ represents the second feature map, $\mathrm{MSA}(\cdot)$ represents a multi-headed self-attention function, $\mathrm{LN}(\cdot)$ represents a layer normalization function, $\mathrm{PE}(\cdot)$ represents a position encoding function, $W_2$ represents a second trainable weight parameter, $b_2$ represents a second bias parameter, and $I$ represents coordinate data of the radar point cloud data corresponding to the plurality of 3D sub-voxel grid cells on the first feature map; and
fusing the first feature map and the second feature map to obtain a fused feature map, and inputting the fused feature map into a predetermined pedestrian detection model for pedestrian detection to obtain the pedestrian detection information for the area to be detected.

2. The method for detecting pedestrian of claim 1, wherein the step of encoding the plurality of 3D voxel grid cells and obtaining the voxel encoded data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells comprises:
obtaining coordinate data and reflectivity data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, wherein the coordinate data comprise original coordinate data, average coordinate difference data, and central coordinate difference data;
splicing the coordinate data and reflectivity data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, to obtain voxel spliced data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells thereof; and
processing the voxel spliced data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, to obtain the voxel encoded data thereof according to a predetermined encoding algorithm, wherein the predetermined encoding algorithm is:

$$F_1 = W_1(W_0 F_p + b_0) + b_1,$$

wherein, $F_1$ represents the voxel encoded data, $W_0$ represents a first trainable weight parameter, $W_1$ represents a second trainable weight parameter, $F_p$ represents the voxel spliced data, $b_0$ represents a first bias parameter, and $b_1$ represents a second bias parameter.

3. The method for detecting pedestrian of claim 2, wherein the step of obtaining coordinate data and reflectivity data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells comprises:
obtaining the average coordinate data of the plurality of 3D voxel grid cells, according to predetermined average coordinate algorithms and the original coordinate data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, wherein the predetermined average coordinate algorithms are:

$$\bar{x} = \sum_{i=0}^{k} x_i, \qquad \bar{y} = \sum_{i=0}^{k} y_i, \qquad \bar{z} = \sum_{i=0}^{k} z_i,$$

wherein, $k$ represents any of x, y, z coordinate axes, $i$ represents the i-th radar point cloud data in the 3D voxel grid cell, $x_i$ represents the original coordinate data of the i-th radar point cloud data on the x coordinate axis, $\bar{x}$ represents the average coordinate data on the x coordinate axis, $y_i$ represents the original coordinate data of the i-th radar point cloud data on the y coordinate axis, $\bar{y}$ represents the average coordinate data on the y coordinate axis, $z_i$ represents the original coordinate data of the i-th radar point cloud data on the z coordinate axis, $\bar{z}$ represents the average coordinate data on the z coordinate axis;
obtaining the original coordinate data, length data and initial offset data of the plurality of 3D voxel grid cells and then obtaining central coordinate data thereof according to a predetermined central coordinate algorithm, wherein the predetermined central coordinate algorithm is:

$$\bar{v}_k = \mathrm{coord}_k \cdot v_k + \mathrm{offset}_k,$$

wherein, $\bar{v}_k$ represents the central coordinate data of the 3D voxel grid cell on the k-th coordinate axis, $\mathrm{coord}_k$ represents the original coordinate data of the 3D voxel grid cell on the k-th coordinate axis, $v_k$ represents the length data of the 3D voxel grid cell on the k-th coordinate axis, $\mathrm{offset}_k$ represents the initial offset data of the 3D voxel grid cell on the k-th coordinate axis; and
obtaining average coordinate difference data and the central coordinate difference data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, according to the original coordinate data of the plurality of radar point cloud data corresponding to the plurality of 3D voxel grid cells, the average coordinate data and the central coordinate data of the plurality of 3D voxel grid cells.

4. The method for detecting pedestrian of claim 1, wherein the st
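The feature construction of claims 2 and 3, the two-layer encoding of claim 2, and the residual structure of the second feature extraction algorithm of claim 1 can be sketched in NumPy. This is an illustrative sketch only, with several assumptions that are not in the claim text: the average is computed as a per-voxel mean normalized by point count (the printed formula shows only a sum), the weights are applied as right-multiplied matrices, layer normalization has no learned scale or shift, and the attention function is passed in as a hypothetical callable `msa(x, pe)` because the claims do not specify how position encoding enters the attention. All function names are invented for the example.

```python
import numpy as np

def spliced_features(points, coords, inverse, voxel_size, offset):
    """Build per-point voxel spliced data F_p with 10 columns:
    xyz, xyz - voxel mean, xyz - voxel centre, reflectivity."""
    xyz, refl = points[:, :3], points[:, 3:4]
    sums = np.zeros((coords.shape[0], 3))
    np.add.at(sums, inverse, xyz)                 # per-voxel coordinate sums
    counts = np.bincount(inverse, minlength=coords.shape[0])[:, None]
    mean = sums / counts                          # assumption: normalized mean
    centre = coords * voxel_size + offset         # coord_k * v_k + offset_k
    return np.concatenate(
        [xyz, xyz - mean[inverse], xyz - centre[inverse], refl], axis=1)

def encode(f_p, w0, b0, w1, b1):
    """F_1 = W_1(W_0 F_p + b_0) + b_1, with weights as right-multiplied matrices."""
    return (f_p @ w0 + b0) @ w1 + b1

def layer_norm(x, eps=1e-5):
    """Layer normalization over the last axis, without learned parameters."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def second_feature_map(f1, pe, msa, w2, b2):
    """F_2 = W_2(LN(A)) + b_2 + A, where A = MSA(LN(F_1), PE(I)) + F_1."""
    a = msa(layer_norm(f1), pe) + f1
    return layer_norm(a) @ w2 + b2 + a

# Two points in one voxel of edge length 2; offset = half a voxel (assumption).
pts = np.array([[0.0, 0.0, 0.0, 0.5],
                [1.0, 1.0, 1.0, 0.7]])
coords = np.array([[0, 0, 0]])
inverse = np.array([0, 0])
f_p = spliced_features(pts, coords, inverse, np.full(3, 2.0), np.full(3, 1.0))
f_1 = encode(f_p, np.eye(10), np.zeros(10), np.ones((10, 1)), np.zeros(1))
```

Writing the second feature map in terms of the intermediate `a` makes the two residual connections in the formula explicit: one around the attention term and one around the linear layer.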
Obstacle · CPC title
Human being; Person · CPC title
Image fusion; Image merging · CPC title
Artificial neural networks [ANN] · CPC title
Probabilistic image processing · CPC title