US11783594B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11783594-B2 |
| Application number | US-201917267493-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 16, 2019 |
| Priority date | Mar 4, 2019 |
| Publication date | Oct 10, 2023 |
| Grant date | Oct 10, 2023 |
Official abstract text for this publication.
The present invention discloses a method for segmenting pedestrians in roadside images using a variable-scale multi-feature fusion convolutional network. To handle large variations in pedestrian scale, two parallel convolutional neural networks extract local and global features at different scales, and these features are fused to form a variable-scale multi-feature fusion convolutional neural network. The network is trained on roadside pedestrian images to achieve accurate pedestrian segmentation, avoiding the blurred boundaries and missing segments common in single-network methods.
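The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the patented network: it assumes 1-D signals in place of image feature maps, a plain averaging window in place of learned convolutions, and element-wise addition as the fusion operator, none of which are stated in the abstract. The names `box_filter`, `fuse`, and the sample `signal` are hypothetical.

```python
# Toy illustration only: 1-D lists stand in for image feature maps.

def box_filter(signal, width):
    """Average each sample over an odd-width window; indices clamp at the edges."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / width)
    return out


def fuse(local_features, global_features):
    """Element-wise sum. The fusion operator is an assumption, not taken from the patent."""
    if len(local_features) != len(global_features):
        raise ValueError("both branches must produce feature maps of the same length")
    return [a + b for a, b in zip(local_features, global_features)]


# Hypothetical input: a narrow bump (small object) and a wide bump (large object).
signal = [0, 0, 1, 1, 1, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 0]

local_branch = box_filter(signal, 3)   # small window: keeps fine detail
global_branch = box_filter(signal, 7)  # large window: wider context per sample
fused = fuse(local_branch, global_branch)
```

The small window responds sharply to the narrow bump, while the large window summarizes a wider neighborhood; the fused output carries both responses at every position.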
Opening claim text (preview).
The invention claimed is:
1. A roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network, comprising
(1) establishing a pedestrian segmentation dataset; and
(2) constructing a variable-scale multi-feature fusion convolutional neural network comprising the steps of firstly designing two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image; wherein a first network designs a fine feature extraction structure for small-scale pedestrians; a second network expands the receptive field of the network at the shallow level for large-scale pedestrians; secondly providing a two-level fusion strategy to fuse extracted features by the following steps first, fusing features of same level at different scales to obtain local and global features that are suitable for variable-scale pedestrians, and then constructing a jump connection structure to the fused local features and global features for the second time so as to obtain the complete local detailed information and global information of variable-scale pedestrians, and finally obtaining a variable-scale multi-feature fusion convolutional neural network which includes the following sub-steps:
Sub-step 1: designing the first convolutional neural network for small-scale pedestrians, including:
① designing pooling layers wherein a number of pooling layers is 2; the pooling layers use a maximum pooling operation, their sampling sizes are both 2×2, and their step length is both 2;
② designing standard convolutional layers such that a number of standard convolutional layers is 18, of which 8 layers all have a convolutional kernel size of 3×3 and a number of the convolutional kernels is 64, 64, 128, 128, 256, 256, 256 and 2, respectively, and the step length is 1; and the remaining 10 layers all have a convolutional kernel size of 1×1, the number of their convolutional kernels are 32, 32, 64, 64, 128, 128, 128, 128, 128 and 128, respectively, and their step length is 1;
③ designing deconvolutional layers; such that a number of deconvolutional layers is 2, the size of their convolutional kernels is all 3×3 and their step length is all 2, and the numbers of convolutional kernels are 2 and 2, respectively;
④ determining the network architecture in order to establish different network models according to the network layer parameters involved in ①~③ in sub-step 1 of step (2), and then use the dataset established in step (1) to verify these models, and filtering out the optimal network structure in terms of both accuracy and real-timeliness an optimal network structure is obtained as follows:
Standard convolutional layer 1_1: using 64 3×3 convolutional kernels and input samples with A×A pixels to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;
Standard convolutional layer 1_1_1: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;
Standard convolutional layer 1_1_2: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;
Standard convolutional layer 1_2: using 64 3×3 convolutional kernels and the feature map output by standard convolutional layer 1_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;
Pooling layer 1: using the feature map output by 2×2 verified standard convolutional layer 1_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/2 × A/2 × 64;
Standard convolutional layer 2_1: using 128 3×3 convolutional kernels and the feature map output by pooling layer 1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2 × A/2 × 128;
Standard convolutional layer 2_1_1: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2 × A/2 × 64;
Standard convolutional layer 2_1_2: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2 × A/2 × 64;
Standard convolutional layer 2_2: using 128 3×3 convolutional kernels and the feature map output by standard convolutional layer 2_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2 × A/2 × 128;
Pooling layer 2: using the feature map output by 2×2 verified standard convolutional layer 2_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/4 × A/4 × 128;
Standard convolutional layer 3_1: using 256 3×3 co
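The feature-map dimensions recited in the claim preview can be checked with a short shape trace. The sketch below is a reading aid, not the patented implementation. It assumes that the stride-1 convolutions use "same" padding (the claim reports an unchanged spatial size after each one) and that the two 3×3, stride-2 deconvolutional layers use padding 1 and output padding 1; the claim preview does not state padding values. The names `LAYERS`, `trace_shapes`, and `deconv_output_size` are hypothetical, and only the layers visible in the truncated preview are listed.

```python
# Layers of the first network as listed in the claim preview, up to pooling layer 2.
# Tuple fields: (name, kind, kernel size, number of kernels).
LAYERS = [
    ("conv 1_1",   "conv", 3, 64),
    ("conv 1_1_1", "conv", 1, 32),
    ("conv 1_1_2", "conv", 1, 32),
    ("conv 1_2",   "conv", 3, 64),
    ("pool 1",     "pool", 2, None),
    ("conv 2_1",   "conv", 3, 128),
    ("conv 2_1_1", "conv", 1, 64),
    ("conv 2_1_2", "conv", 1, 64),
    ("conv 2_2",   "conv", 3, 128),
    ("pool 2",     "pool", 2, None),
]


def trace_shapes(a):
    """Return [(name, height, width, channels)] for an a x a input image."""
    size, channels = a, None
    shapes = []
    for name, kind, kernel, out_channels in LAYERS:
        if kind == "conv":
            # Assumption: "same" padding with stride 1, so spatial size is unchanged.
            channels = out_channels
        else:
            # 2x2 max pooling with stride 2 and no padding halves the spatial size.
            size = (size - kernel) // 2 + 1
        shapes.append((name, size, size, channels))
    return shapes


def deconv_output_size(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Standard transposed-convolution size formula; padding values are assumptions."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


shapes = trace_shapes(224)  # A = 224 is an arbitrary example value
restored = deconv_output_size(deconv_output_size(shapes[-1][1]))
```

With A = 224, pooling layer 1 yields 112 × 112 × 64 and pooling layer 2 yields 56 × 56 × 128, matching the A/2 and A/4 dimensions in the claim. Under the stated padding assumptions, the two stride-2 deconvolutional layers bring 56 back to 224, which is consistent with a segmentation output at the input resolution.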
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
of extracted features · CPC title
Combinations of networks · CPC title