System and method for detecting moving target based on multi-frame point cloud

US11900618B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11900618-B2
Application number  US-202318338328-A
Country             US
Kind code           B2
Filing date         Jun 20, 2023
Priority date       Dec 2, 2021
Publication date    Feb 13, 2024
Grant date          Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and a method for detecting a moving target based on multi-frame point clouds. The system comprises a voxel feature extraction module; a transformer module used for matching and fusing the feature tensor sequence, fusing a first feature tensor with a second feature tensor, fusing the fused result with a third feature tensor, fusing the fused result with a fourth feature tensor, and repeating the fusing steps with a next feature tensor to obtain a final fused feature tensor; and an identification module used for extracting features from the final fused feature tensor and outputting detection information of a target. The method comprises the following steps: S1, constructing each system module; S2, training the model by the data in a training set; S3, predicting by the trained model.
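The fusion order described in the abstract (first with second, that result with the third, and so on) has the shape of a left fold over the feature tensor sequence. The following is a minimal sketch of that ordering only, not the patented implementation: `fuse_sequence` and `record_fusion` are hypothetical names, and the pairwise step is a placeholder that records the order of operations instead of fusing real tensors.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fuse_sequence(features: Sequence[T], fuse: Callable[[T, T], T]) -> T:
    """Fold the sequence left to right: fuse(fuse(fuse(f1, f2), f3), f4), ..."""
    if len(features) < 2:
        raise ValueError("need at least two feature tensors to fuse")
    return reduce(fuse, features)


def record_fusion(a: str, b: str) -> str:
    # Placeholder pairwise step (assumption): it only records the call order.
    return f"fuse({a}, {b})"


order = fuse_sequence(["F1", "F2", "F3", "F4"], record_fusion)
print(order)  # fuse(fuse(fuse(F1, F2), F3), F4)
```

Any pairwise fusion function with the same two-argument signature could be passed in place of the placeholder.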

First claim

Opening claim text (preview).

What is claimed is:

1. A system for detecting a moving target based on multi-frame point clouds, comprising: a voxel feature extraction module, a transformer module comprising a cross-modal attention module, and an identification module,

wherein the voxel feature extraction module is configured to voxelize a continuous frame point cloud sequence and extract a feature tensor sequence;

wherein the transformer module is configured to: acquire the feature tensor sequence, fuse a first feature tensor with a second feature tensor by the cross-modal attention module, fuse a fused result of the first feature tensor and the second feature tensor, with a third feature tensor, fuse a fused result of the fused result of the first feature tensor and the second feature tensor, and a third feature tensor, with a fourth feature tensor, and repeat the fusing steps with a next feature tensor, until a last feature tensor is fused, to obtain a final fused feature tensor;

wherein the cross-modal attention module is configured to: match and fuse two feature tensors according to an attention mechanism to obtain a fused feature tensor by convolution neural network fusion;

wherein the identification module is configured to extract features from the final fused feature tensor and output detection information of a target; and

wherein the matching and fusion of the cross-modal attention module is as follows:

$$Y(X_a, X_b) = \mathrm{softmax\_col}\left(\frac{Q_a * \mathrm{Trans}(K_b)}{\sqrt{d}}\right) * V_b$$

$$Y(X_b, X_a) = \mathrm{softmax\_col}\left(\frac{Q_b * \mathrm{Trans}(K_a)}{\sqrt{d}}\right) * V_a$$

where Q_a=X_a*W_Q and Q_b=X_b*W_Q represent Query in the attention mechanism, respectively; K_a=X_a*W_K and K_b=X_b*W_K represent Key in the attention mechanism, respectively; V_a=X_a*W_V and V_b=X_b*W_V represent Value in the attention mechanism, respectively; X_a and X_b represent two feature tensors to be fused, respectively; W_Q, W_K and W_V represent trainable weight matrices, respectively; d represents the dimensions of Q_a and K_b and Q_b and K_a, respectively; Trans( ) represents a matrix transposition operation; and softmax_col( ) represents a matrix normalization operation by column;

and fuse Y(X_a, X_b) and Y(X_b, X_a) by a convolutional neural network to obtain the fused feature tensor:

$$\mathrm{CrossmodalAttention}(X_a, X_b) = \mathrm{Conv}(Y(X_a, X_b),\; Y(X_b, X_a))$$

where Conv( ) represents the convolutional neural network.

2. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the voxel feature extraction module transforms the continuous frame point cloud sequence into a geodetic coordinate system according to a pose corresponding to each frame, and voxelizes the transformed continuous frame point cloud sequence, wherein the geodetic coordinate system is a Cartesian orthogonal coordinate system with a fixed preset coordinate origin relative to the earth, with a forward direction of a first frame point cloud data being a positive direction of an X axis of the geodetic coordinate system, a right direction being a positive direction of a Y axis of the geodetic coordinate system, and an upward direction being a positive direction of a Z axis of the geodetic coordinate system.

3. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the voxelization takes an average value of points in each voxel as a voxelization feature by constructing a voxel size and a voxelization range.

4. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the feature tensor extraction is to extract features from the features obtained by voxelization by a sparse convolution module to obtain feature tensors; the sparse convolution module comprises a group of sub-convolution modules, and each sub-convolution module comprises a sub-popular convolution layer, a normalization layer and a Relu layer.

5. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the transformer module reshapes a feature tensor with a size of C*D*W*H into a feature tensor with a size of C*(D*W*H), where C represents a number of feature channels, D represents a height, W represents a width and H represents a length, and matches and fuses the reshaped feature tensor sequence.

6. The system for detecting the moving target based on multi-frame point clouds according to claim 5, wherein the recognition module reshapes the final fused feature tensor into a feature tensor with a size of (C*D)*W*H, and extracts features from the reshaped feature tensor to output the detection information of the target.

7. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the feature tensor sequence is {F_Base_seq[i],0<i<=N}, where i represents a frame index and N represents a number of frames; the feature tensors in the sequence are matched and fused to obtain a fused feature tensor F_Base_fusion_seq[j,j+1], where j represents a frame index, 0<j<=N, and when j=1, a feature tensor F_Base_seq[j] and a feature tensor F_Base_seq[j+1] are fused; when 1<j<N, a fused feature tensor F_Base_fusion_seq[j−1,j] and a feature tensor F_Base_seq[j+1] are loop-fused, and a final fused feature tensor F_Base_fusion_seq[N−1,N] is output.

8. The system for detecting the moving target based on multi-frame point clouds according to claim 1, wherein the identification module obtains a coordinate of a center point of the target, a moving direction of the center point of the target, an offset of the center point of the target, a length, a width and a height of the target, a
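The cross-modal attention step recited in claim 1 can be illustrated with a small pure-Python sketch. It is an approximation under stated assumptions, not the patented implementation: inputs are taken as token-by-channel matrices, the scores are assumed to be scaled by the square root of d as in standard scaled dot-product attention, and the claim's Conv( ) network is replaced by a fixed weighted mix of the two directed outputs. All function names and the `mix` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_col(m: Matrix) -> Matrix:
    """Normalize each column so that it sums to 1."""
    cols = []
    for col in zip(*m):
        peak = max(col)  # subtract the column maximum for numerical stability
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)


def directed_attention(x_q: Matrix, x_kv: Matrix,
                       w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Y(X_q, X_kv) = softmax_col(Q_q * Trans(K_kv) / sqrt(d)) * V_kv."""
    q = matmul(x_q, w_q)
    k = matmul(x_kv, w_k)
    v = matmul(x_kv, w_v)
    d = len(q[0])  # shared dimension of Q and K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_col(scores), v)


def cross_modal_attention(x_a: Matrix, x_b: Matrix,
                          w_q: Matrix, w_k: Matrix, w_v: Matrix,
                          mix: Tuple[float, float] = (0.5, 0.5)) -> Matrix:
    y_ab = directed_attention(x_a, x_b, w_q, w_k, w_v)
    y_ba = directed_attention(x_b, x_a, w_q, w_k, w_v)
    # Stand-in for Conv(Y(X_a, X_b), Y(X_b, X_a)): a fixed weighted mix (assumption).
    return [[mix[0] * p + mix[1] * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(y_ab, y_ba)]


x_a = [[1.0, 0.0], [0.0, 1.0]]
x_b = [[0.5, 0.5], [1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder weights
fused = cross_modal_attention(x_a, x_b, identity, identity, identity)
print(len(fused), len(fused[0]))  # 2 2
```

In a trained model the three weight matrices and the fusion network would be learned; the identity matrices here only keep the example small enough to check by hand.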

Assignees

Zhejiang Lab

Inventors

Classifications

Primary CPC classification: G06T7/251

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900618B2 cover?
A system and a method for detecting a moving target based on multi-frame point clouds. The system comprises a voxel feature extraction module; a transformer module used for matching and fusing the feature tensor sequence, fusing a first feature tensor with a second feature tensor, fusing the fused result with a third feature tensor, fusing the fused result with a fourth feature tensor, and repe…
Who is the assignee on this patent?
Zhejiang Lab
What technology area does this patent fall under?
Primary CPC classification G06T7/251. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 13, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).