Voxel-based feature learning network
US-10970518-B1 · Apr 6, 2021 · US
US11900618B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11900618-B2 |
| Application number | US-202318338328-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 20, 2023 |
| Priority date | Dec 2, 2021 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
A system and a method for detecting a moving target based on multi-frame point clouds. The system comprises a voxel feature extraction module; a transformer module used for matching and fusing the feature tensor sequence, fusing a first feature tensor with a second feature tensor, fusing the fused result with a third feature tensor, fusing the fused result with a fourth feature tensor, and repeating the fusing steps with a next feature tensor to obtain a final fused feature tensor; and an identification module used for extracting features from the final fused feature tensor and outputting detection information of a target. The method comprises the following steps: S1, constructing each system module; S2, training the model by the data in a training set; S3, predicting by the trained model.
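The chained fusion described in the abstract (first with second, the result with third, and so on) behaves like a left fold over the per-frame feature tensors. The following is a minimal Python sketch of that ordering only. The names `fuse_sequence` and `toy_fuse` are hypothetical, and `toy_fuse` is a stand-in that records the combination order rather than the attention-based fusion the patent claims.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fuse_sequence(features: Sequence[T], fuse: Callable[[T, T], T]) -> T:
    """Fold the sequence left to right: fuse(fuse(fuse(f1, f2), f3), f4), ..."""
    if len(features) < 2:
        raise ValueError("need at least two feature tensors to fuse")
    return reduce(fuse, features)


def toy_fuse(a: str, b: str) -> str:
    # Assumption for illustration: record which operands were combined.
    return f"({a}+{b})"


order = fuse_sequence(["F1", "F2", "F3", "F4"], toy_fuse)
print(order)  # (((F1+F2)+F3)+F4)
```

Because each step consumes the previous fused result, the fusion function must return something with the same shape as its inputs; any real fusion step plugged into `fuse` would need to satisfy that.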
Opening claim text (preview).
What is claimed is:

1. A system for detecting a moving target based on multi-frame point clouds, comprising: a voxel feature extraction module, a transformer module comprising a cross-modal attention module, and an identification module,

wherein the voxel feature extraction module is configured to voxelize a continuous frame point cloud sequence and extract a feature tensor sequence;

wherein the transformer module is configured to: acquire the feature tensor sequence, fuse a first feature tensor with a second feature tensor by the cross-modal attention module, fuse a fused result of the first feature tensor and the second feature tensor, with a third feature tensor, fuse a fused result of the fused result of the first feature tensor and the second feature tensor, and a third feature tensor, with a fourth feature tensor, and repeat the fusing steps with a next feature tensor, until a last feature tensor is fused, to obtain a final fused feature tensor;

wherein the cross-modal attention module is configured to: match and fuse two feature tensors according to an attention mechanism to obtain a fused feature tensor by convolution neural network fusion;

wherein the identification module is configured to extract features from the final fused feature tensor and output detection information of a target; and

wherein the matching and fusion of the cross-modal attention module is as follows:

$$Y(X_a, X_b) = \operatorname{softmax\_col}\left(\frac{Q_a * \operatorname{Trans}(K_b)}{\sqrt{d}}\right) * V_b$$

$$Y(X_b, X_a) = \operatorname{softmax\_col}\left(\frac{Q_b * \operatorname{Trans}(K_a)}{\sqrt{d}}\right) * V_a$$

where $Q_a = X_a * W_Q$ and $Q_b = X_b * W_Q$ represent Query in the attention mechanism, respectively; $K_a = X_a * W_K$ and $K_b = X_b * W_K$ represent Key in the attention mechanism, respectively; $V_a = X_a * W_V$ and $V_b = X_b * W_V$ represent Value in the attention mechanism, respectively; $X_a$ and $X_b$ represent two feature tensors to be fused, respectively; $W_Q$, $W_K$ and $W_V$ represent trainable weight matrices, respectively; $d$ represents the dimensions of $Q_a$ and $K_b$ and $Q_b$ and $K_a$, respectively; Trans( ) represents a matrix transposition operation; and softmax_col( ) represents a matrix normalization operation by column; and fuse $Y(X_a, X_b)$ and $Y(X_b, X_a)$ by a convolutional neural network to obtain the fused feature tensor:

$$\text{Crossmodal Attention}(X_a, X_b) = \operatorname{Conv}\big(Y(X_a, X_b),\, Y(X_b, X_a)\big)$$

where Conv( ) represents the convolutional neural network.

2. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the voxel feature extraction module transforms the continuous frame point cloud sequence into a geodetic coordinate system according to a pose corresponding to each frame, and voxelizes the transformed continuous frame point cloud sequence, wherein the geodetic coordinate system is a Cartesian orthogonal coordinate system with a fixed preset coordinate origin relative to the earth, with a forward direction of a first frame point cloud data being a positive direction of an X axis of the geodetic coordinate system, a right direction being a positive direction of a Y axis of the geodetic coordinate system, and an upward direction being a positive direction of a Z axis of the geodetic coordinate system.

3. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the voxelization takes an average value of points in each voxel as a voxelization feature by constructing a voxel size and a voxelization range.

4. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the feature tensor extraction is to extract features from the features obtained by voxelization by a sparse convolution module to obtain feature tensors; the sparse convolution module comprises a group of sub-convolution modules, and each sub-convolution module comprises a sub-popular convolution layer, a normalization layer and a Relu layer.

5. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the transformer module reshapes a feature tensor with a size of C*D*W*H into a feature tensor with a size of C*(D*W*H), where C represents a number of feature channels, D represents a height, W represents a width and H represents a length, and matches and fuses the reshaped feature tensor sequence.

6. The system for detecting the moving target based on multi-frame point clouds according to claim 5, wherein the recognition module reshapes the final fused feature tensor into a feature tensor with a size of (C*D)*W*H, and extracts features from the reshaped feature tensor to output the detection information of the target.

7. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the feature tensor sequence is {F_Base_seq[i], 0<i<=N}, where i represents a frame index and N represents a number of frames; the feature tensors in the sequence are matched and fused to obtain a fused feature tensor F_Base_fusion_seq[j,j+1], where j represents a frame index, 0<j<=N, and when j=1, a feature tensor F_Base_seq[j] and a feature tensor F_Base_seq[j+1] are fused; when 1<j<N, a fused feature tensor F_Base_fusion_seq[j−1,j] and a feature tensor F_Base_seq[j+1] are loop-fused, and a final fused feature tensor F_Base_fusion_seq[N−1,N] is output.

8. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the identification module obtains a coordinate of a center point of the target, a moving direction of the center point of the target, an offset of the center point of the target, a length, a width and a height of the target, a
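
The matching and fusion formula in claim 1 can be illustrated with a small NumPy sketch. It is not the patented implementation, and it makes several assumptions: each row of `X` is treated as one token, `softmax_col` is read literally as normalizing down each column (the conventional attention softmax runs along the other axis), the scores are divided by the square root of `d` as in standard scaled dot-product attention, and `Conv( )` is replaced by a 1x1 convolution, written here as a linear map `w_fuse` over the concatenated outputs. All function and variable names are hypothetical.

```python
import numpy as np


def softmax_col(m: np.ndarray) -> np.ndarray:
    """Softmax down each column (axis 0), so every column sums to 1."""
    shifted = m - m.max(axis=0, keepdims=True)  # subtract the max for stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


def attend(x_q, x_kv, w_q, w_k, w_v):
    """Y(x_q, x_kv): queries come from x_q, keys and values from x_kv."""
    q = x_q @ w_q
    k = x_kv @ w_k
    v = x_kv @ w_v
    d = q.shape[-1]
    return softmax_col(q @ k.T / np.sqrt(d)) @ v


def cross_modal_attention(x_a, x_b, w_q, w_k, w_v, w_fuse):
    y_ab = attend(x_a, x_b, w_q, w_k, w_v)  # Y(X_a, X_b)
    y_ba = attend(x_b, x_a, w_q, w_k, w_v)  # Y(X_b, X_a)
    # Assumption: a 1x1 convolution stands in for Conv( ), i.e. one linear
    # map applied to the concatenated features of every token.
    return np.concatenate([y_ab, y_ba], axis=1) @ w_fuse


rng = np.random.default_rng(0)
n_tokens, c, d = 6, 4, 8  # toy sizes chosen for illustration
x_a = rng.normal(size=(n_tokens, c))
x_b = rng.normal(size=(n_tokens, c))
w_q = rng.normal(size=(c, d))
w_k = rng.normal(size=(c, d))
w_v = rng.normal(size=(c, d))
w_fuse = rng.normal(size=(2 * d, c))

fused = cross_modal_attention(x_a, x_b, w_q, w_k, w_v, w_fuse)
print(fused.shape)  # (6, 4)
```

The fused output has the same shape as each input, which is what allows it to be fused again with the next frame's feature tensor, as the transformer module in claim 1 requires.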