US2021303911A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021303911-A1 |
| Application number | US-201917267493-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 16, 2019 |
| Priority date | Mar 4, 2019 |
| Publication date | Sep 30, 2021 |
| Grant date | — |
Official abstract text for this publication.
The present invention discloses a roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network. For scenes in which the pedestrian scale changes significantly in images from an intelligent roadside terminal, the method designs two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image. It then fuses the local and global features extracted by the first network with the local and global features extracted by the second network at the same level, and fuses the resulting local and global features a second time to obtain a variable-scale multi-feature fusion convolutional neural network. The network is then trained, and roadside pedestrian images are input to it to perform pedestrian segmentation. The present invention effectively addresses the fuzzy segmentation boundaries and missed segmentation to which most current pedestrian segmentation methods based on a single network structure are prone.
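The two-level fusion described in the abstract can be pictured as a simple data flow. The sketch below is illustrative only: the abstract does not say which operator performs the fusion, so element-wise addition is an assumption, as are the names `add_fuse` and `two_level_fusion` and the use of flat lists in place of real feature tensors. It also assumes the local and global features have already been brought to the same shape.

```python
from typing import Callable, List

# A flattened feature map; real feature maps would be H x W x C tensors.
Feature = List[float]


def add_fuse(a: Feature, b: Feature) -> Feature:
    """Element-wise sum of two same-shaped feature maps (assumed fusion operator)."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same shape to be fused")
    return [x + y for x, y in zip(a, b)]


def two_level_fusion(
    local_1: Feature,
    global_1: Feature,
    local_2: Feature,
    global_2: Feature,
    fuse: Callable[[Feature, Feature], Feature] = add_fuse,
) -> Feature:
    """Hypothetical sketch of the two-level fusion data flow."""
    # Level 1: fuse same-level features across the two parallel networks.
    fused_local = fuse(local_1, local_2)
    fused_global = fuse(global_1, global_2)
    # Level 2: fuse the fused local features with the fused global features.
    return fuse(fused_local, fused_global)
```

Passing a different `fuse` callable (for example, concatenation) changes the operator without changing the two-level structure.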
Opening claim text (preview).
1. A roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network, comprising:

(1) establishing a pedestrian segmentation dataset; and

(2) constructing a variable-scale multi-feature fusion convolutional neural network, comprising the steps of: firstly, designing two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image, wherein a first network designs a fine feature extraction structure for small-scale pedestrians and a second network expands the receptive field of the network at the shallow level for large-scale pedestrians; secondly, providing a two-level fusion strategy to fuse the extracted features by first fusing features of the same level at different scales to obtain local and global features that are suitable for variable-scale pedestrians, and then constructing a jump connection structure to fuse the fused local features and global features for the second time so as to obtain the complete local detailed information and global information of variable-scale pedestrians; and finally obtaining a variable-scale multi-feature fusion convolutional neural network; the step includes the following sub-steps:

Sub-step 1: designing the first convolutional neural network for small-scale pedestrians, including:

① designing pooling layers: the number of pooling layers is 2; the pooling layers use a maximum pooling operation, their sampling sizes are both 2×2, and their step lengths are both 2;

② designing standard convolutional layers: the number of standard convolutional layers is 18, of which 8 layers have a convolutional kernel size of 3×3, with 64, 64, 128, 128, 256, 256, 256 and 2 convolutional kernels, respectively, and a step length of 1; the remaining 10 layers have a convolutional kernel size of 1×1, with 32, 32, 64, 64, 128, 128, 128, 128, 128 and 128 convolutional kernels, respectively, and a step length of 1;

③ designing deconvolutional layers: the number of deconvolutional layers is 2, their convolutional kernel sizes are both 3×3, their step lengths are both 2, and the numbers of convolutional kernels are 2 and 2, respectively;

④ determining the network architecture: establishing different network models according to the network layer parameters involved in ① to ③ of sub-step 1 of step (2), using the dataset established in step (1) to verify these models, and selecting the optimal network structure in terms of both accuracy and real-time performance; the optimal network structure obtained is as follows:

- Standard convolutional layer 1_1: using 64 3×3 convolutional kernels and input samples with A×A pixels to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;
- Standard convolutional layer 1_1_1: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;
- Standard convolutional layer 1_1_2: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;
- Standard convolutional layer 1_2: using 64 3×3 convolutional kernels and the feature map output by standard convolutional layer 1_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;
- Pooling layer 1: applying 2×2 maximum pooling with a step length of 2 to the feature map output by standard convolutional layer 1_2 to get a feature map with a dimension of A/2×A/2×64;
- Standard convolutional layer 2_1: using 128 3×3 convolutional kernels and the feature map output by pooling layer 1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;
- Standard convolutional layer 2_1_1: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;
- Standard convolutional layer 2_1_2: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;
- Standard convolutional layer 2_2: using 128 3×3 convolutional kernels and the feature map output by standard convolutional layer 2_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;
- Pooling layer 2: applying 2×2 maximum pooling with a step length of 2 to the feature map output by standard convolutional layer 2_2 to get a feature map with a dimension of A/4×A/4×128;
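The feature-map dimensions recited in the claim preview can be checked mechanically. The sketch below transcribes the ten layers listed in the preview and traces their output shapes with the standard convolution output-size formula. The claim does not state the padding; the sketch assumes a padding of (kernel − 1) / 2 for convolutions, which is what keeps a stride-1 3×3 convolution at A×A, and no padding for pooling. The names `LAYERS` and `trace_shapes` and the default of 3 input channels are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (name, kind, out_channels, kernel, stride), transcribed from the claim preview.
LAYERS: Tuple[Tuple[str, str, Optional[int], int, int], ...] = (
    ("conv1_1", "conv", 64, 3, 1),
    ("conv1_1_1", "conv", 32, 1, 1),
    ("conv1_1_2", "conv", 32, 1, 1),
    ("conv1_2", "conv", 64, 3, 1),
    ("pool1", "pool", None, 2, 2),
    ("conv2_1", "conv", 128, 3, 1),
    ("conv2_1_1", "conv", 64, 1, 1),
    ("conv2_1_2", "conv", 64, 1, 1),
    ("conv2_2", "conv", 128, 3, 1),
    ("pool2", "pool", None, 2, 2),
)


def trace_shapes(a: int, in_channels: int = 3) -> Dict[str, Tuple[int, int, int]]:
    """Return the (height, width, channels) output of each layer for an A x A input."""
    size, channels = a, in_channels  # in_channels=3 (RGB) is an assumption
    shapes: Dict[str, Tuple[int, int, int]] = {}
    for name, kind, out_channels, kernel, stride in LAYERS:
        # Assumption: convolutions pad by (kernel - 1) // 2; pooling uses no padding.
        pad = (kernel - 1) // 2 if kind == "conv" else 0
        size = (size + 2 * pad - kernel) // stride + 1
        if kind == "conv":
            channels = out_channels  # a convolution sets the channel count
        shapes[name] = (size, size, channels)
    return shapes
```

For an input of A = 224, `trace_shapes(224)["pool1"]` gives `(112, 112, 64)` and `trace_shapes(224)["pool2"]` gives `(56, 56, 128)`, matching the A/2×A/2×64 and A/4×A/4×128 dimensions in the claim.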
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
of extracted features · CPC title
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
using classification, e.g. of video objects · CPC title