Method of segmenting pedestrians in roadside image by using convolutional network fusing features at different scales

US2021303911A1 · US · A1

Patent metadata
Publication number: US-2021303911-A1
Application number: US-201917267493-A
Country: US
Kind code: A1
Filing date: May 16, 2019
Priority date: Mar 4, 2019
Publication date: Sep 30, 2021
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention discloses a roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network. For scenes where the pedestrian scale changes significantly in the intelligent roadside terminal image, the method designs two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image. It then fuses the local and global features extracted by the first network with the local and global features extracted by the second network at the same level, and fuses the fused local and global features a second time to obtain a variable-scale multi-feature fusion convolutional neural network. The network is then trained, and roadside pedestrian images are input to it to perform pedestrian segmentation. The present invention effectively solves the problem that most current pedestrian segmentation methods based on a single network structure are prone to fuzzy segmentation boundaries and missed segmentation.
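
The two-level fusion described in the abstract can be pictured with a small data-flow sketch. This is only an illustration under loud assumptions: the abstract does not say which operator performs the fusion, so the element-wise addition, the nearest-neighbour upsampling, the single-channel toy feature maps, and all function names below are hypothetical and are not taken from the patent.

```python
# Sketch of the two-level fusion data flow described in the abstract.
# Assumptions (not from the patent text): feature maps are single-channel
# 2-D lists, fusion is element-wise addition, and the global map is
# upsampled by nearest-neighbour repetition.

def fuse(a, b):
    """Element-wise sum of two same-sized 2-D feature maps."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def upsample2x(m):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def two_level_fusion(local_1, global_1, local_2, global_2):
    # Level 1: fuse same-level features from the two parallel networks.
    fused_local = fuse(local_1, local_2)
    fused_global = fuse(global_1, global_2)
    # Level 2: bring the fused global map to the local resolution and
    # combine it with the fused local map (a skip-style connection).
    return fuse(fused_local, upsample2x(fused_global))

# Hypothetical toy inputs: 4x4 local maps and 2x2 global maps.
local_1 = [[1] * 4 for _ in range(4)]
local_2 = [[2] * 4 for _ in range(4)]
global_1 = [[10, 20], [30, 40]]
global_2 = [[1, 1], [1, 1]]
result = two_level_fusion(local_1, global_1, local_2, global_2)
```

The point of the sketch is the order of operations: features from the two networks are first merged level by level, and only then are the merged local and global maps combined.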

First claim

Opening claim text (preview).

1. A roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network, comprising:

(1) establishing a pedestrian segmentation dataset; and

(2) constructing a variable-scale multi-feature fusion convolutional neural network, comprising the steps of: firstly, designing two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image, wherein a first network designs a fine feature extraction structure for small-scale pedestrians and a second network expands the receptive field of the network at the shallow level for large-scale pedestrians; secondly, providing a two-level fusion strategy to fuse the extracted features by the following steps: first, fusing features of the same level at different scales to obtain local and global features that are suitable for variable-scale pedestrians, and then constructing a jump connection structure to fuse the fused local features and global features for the second time, so as to obtain the complete local detailed information and global information of variable-scale pedestrians, and finally getting a variable-scale multi-feature fusion convolutional neural network; the step includes the following sub-steps:

Sub-step 1: designing the first convolutional neural network for small-scale pedestrians, including:

① designing pooling layers: the number of pooling layers is 2; the pooling layers use a maximum pooling operation, their sampling sizes are both 2×2, and their step lengths are both 2;

② designing standard convolutional layers: the number of standard convolutional layers is 18, of which 8 layers all have a convolutional kernel size of 3×3, the numbers of their convolutional kernels are 64, 64, 128, 128, 256, 256, 256 and 2, respectively, and their step length is 1; the remaining 10 layers all have a convolutional kernel size of 1×1, the numbers of their convolutional kernels are 32, 32, 64, 64, 128, 128, 128, 128, 128 and 128, respectively, and their step length is 1;

③ designing deconvolutional layers: the number of deconvolutional layers is 2, the size of their convolutional kernels is all 3×3, their step length is all 2, and the numbers of convolutional kernels are 2 and 2, respectively;

④ determining the network architecture: establishing different network models according to the network layer parameters involved in ① to ③ in sub-step 1 of step (2), then using the dataset established in step (1) to verify these models, and filtering out the optimal network structure in terms of both accuracy and real-time performance; an optimal network structure is obtained as follows:

Standard convolutional layer 1_1: using 64 3×3 convolutional kernels and input samples with A×A pixels to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;

Standard convolutional layer 1_1_1: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;

Standard convolutional layer 1_1_2: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;

Standard convolutional layer 1_2: using 64 3×3 convolutional kernels and the feature map output by standard convolutional layer 1_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;

Pooling layer 1: using a 2×2 kernel on the feature map output by standard convolutional layer 1_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_1: using 128 3×3 convolutional kernels and the feature map output by pooling layer 1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;

Standard convolutional layer 2_1_1: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_1_2: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_2: using 128 3×3 convolutional kernels and the feature map output by standard convolutional layer 2_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;

Pooling layer 2: using a 2×2 kernel on the feature map output by standard convolutional layer 2_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/4×A/4×128;
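
The feature-map dimensions stated in the claim preview can be checked with a short shape trace. The sketch below is not part of the patent: the layer names (`conv1_1`, `pool1`, and so on), the helper functions, and the example value A = 224 are illustrative choices. It also assumes that each 3×3 convolution pads its input by one pixel, which the claim does not state explicitly but which is consistent with the spatial size staying at A×A after a step length of 1.

```python
# Trace feature-map shapes through the layers listed in the claim preview.
# Assumption: 3x3 convolutions use a padding of 1 ("same" padding), because
# the claim reports an unchanged spatial size after each stride-1 convolution.

def conv_out(size, kernel, stride, padding):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window, stride):
    """Output size of an unpadded pooling window along one spatial axis."""
    return (size - window) // stride + 1

# (name, kind, kernel or window size, output channels; None for pooling)
LAYERS = [
    ("conv1_1", "conv", 3, 64),
    ("conv1_1_1", "conv", 1, 32),
    ("conv1_1_2", "conv", 1, 32),
    ("conv1_2", "conv", 3, 64),
    ("pool1", "pool", 2, None),
    ("conv2_1", "conv", 3, 128),
    ("conv2_1_1", "conv", 1, 64),
    ("conv2_1_2", "conv", 1, 64),
    ("conv2_2", "conv", 3, 128),
    ("pool2", "pool", 2, None),
]

def trace_shapes(a):
    """Return {layer name: (height, width, channels)} for an a x a input."""
    size, channels = a, None
    shapes = {}
    for name, kind, k, out_ch in LAYERS:
        if kind == "conv":
            size = conv_out(size, k, stride=1, padding=(k - 1) // 2)
            channels = out_ch
        else:
            # Max pooling halves the spatial size and keeps the channel count.
            size = pool_out(size, k, stride=2)
        shapes[name] = (size, size, channels)
    return shapes

shapes = trace_shapes(224)  # A = 224 is an illustrative value only
for name, shape in shapes.items():
    print(name, shape)
```

With these assumptions the trace reproduces the pattern in the claim: A×A×64 after layer 1_1, A/2×A/2×64 after pooling layer 1, and A/4×A/4×128 after pooling layer 2.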

Assignees

  • Univ Southeast

Inventors

Classifications

  • G06V20/58 (Primary)

    Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • of extracted features · CPC title

  • using neural networks · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021303911A1 cover?
The present invention discloses a roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network. For scenes where the pedestrian scale changes significantly in the intelligent roadside terminal image, this method designs two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in t…
Who is the assignee on this patent?
Univ Southeast
What technology area does this patent fall under?
Primary CPC classification G06V20/58. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 30, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).