Method of segmenting pedestrians in roadside image by using convolutional network fusing features at different scales

US11783594B2 · US · B2

Patent metadata
Publication number: US-11783594-B2
Application number: US-201917267493-A
Country: US
Kind code: B2
Filing date: May 16, 2019
Priority date: Mar 4, 2019
Publication date: Oct 10, 2023
Grant date: Oct 10, 2023

Abstract

Official abstract text for this publication.

The present invention discloses a method for segmenting pedestrians in roadside images using a variable-scale multi-feature fusion convolutional network. It addresses the challenge of significant changes in pedestrian scale by using two parallel convolutional neural networks to extract the local and global features at different scales, and then fusing them to obtain a variable-scale multi-feature fusion convolutional neural network. This network is trained on roadside pedestrian images to achieve accurate pedestrian segmentation, avoiding the blurred boundaries and missing segments commonly found in single-network methods.
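The fusion step mentioned in the abstract can be pictured with a toy example. The sketch below assumes that same-level features are fused by channel-wise concatenation; the text shown on this page does not say which operator the patent uses, so this is an illustration only. The function name `fuse_same_level` and the nested-list layout (height × width × channels) are hypothetical.

```python
def fuse_same_level(local_map, global_map):
    """Concatenate two H x W x C feature maps along the channel axis.

    Assumption: concatenation stands in for the fusion operator,
    which the page text does not specify.
    """
    if len(local_map) != len(global_map):
        raise ValueError("feature maps differ in height")
    fused = []
    for row_local, row_global in zip(local_map, global_map):
        if len(row_local) != len(row_global):
            raise ValueError("feature maps differ in width")
        # Each pixel is a list of channel values; list "+" appends channels.
        fused.append([px_l + px_g for px_l, px_g in zip(row_local, row_global)])
    return fused


# A 2 x 2 map with 2 channels (local branch) and one with 1 channel (global branch).
local_features = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
global_features = [[[9.0], [10.0]], [[11.0], [12.0]]]
fused_features = fuse_same_level(local_features, global_features)
print(fused_features[0][0])  # channels of the top-left pixel after fusion
```

Concatenation only works when both branches produce maps of the same height and width, which is why the sketch checks the spatial dimensions before fusing.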

First claim

Opening claim text (preview).

The invention claimed is:

1. A roadside image pedestrian segmentation method based on a variable-scale multi-feature fusion convolutional network, comprising

(1) establishing a pedestrian segmentation dataset; and

(2) constructing a variable-scale multi-feature fusion convolutional neural network comprising the steps of firstly designing two parallel convolutional neural networks to extract the local and global features of pedestrians at different scales in the image; wherein a first network designs a fine feature extraction structure for small-scale pedestrians; a second network expands the receptive field of the network at the shallow level for large-scale pedestrians; secondly providing a two-level fusion strategy to fuse extracted features by the following steps first, fusing features of same level at different scales to obtain local and global features that are suitable for variable-scale pedestrians, and then constructing a jump connection structure to the fused local features and global features for the second time so as to obtain the complete local detailed information and global information of variable-scale pedestrians, and finally obtaining a variable-scale multi-feature fusion convolutional neural network which includes the following sub-steps:

Sub-step 1: designing the first convolutional neural network for small-scale pedestrians, including:

① designing pooling layers wherein a number of pooling layers is 2; the pooling layers use a maximum pooling operation, their sampling sizes are both 2×2, and their step length is both 2;

② designing standard convolutional layers such that a number of standard convolutional layers is 18, of which 8 layers all have a convolutional kernel size of 3×3 and a number of the convolutional kernels is 64, 64, 128, 128, 256, 256, 256 and 2, respectively, and the step length is 1; and the remaining 10 layers all have a convolutional kernel size of 1×1, the number of their convolutional kernels are 32, 32, 64, 64, 128, 128, 128, 128, 128 and 128, respectively, and their step length is 1;

③ designing deconvolutional layers; such that a number of deconvolutional layers is 2, the size of their convolutional kernels is all 3×3 and their step length is all 2, and the numbers of convolutional kernels are 2 and 2, respectively;

④ determining the network architecture in order to establish different network models according to the network layer parameters involved in ①~③ in sub-step 1 of step (2), and then use the dataset established in step (1) to verify these models, and filtering out the optimal network structure in terms of both accuracy and real-timeliness an optimal network structure is obtained as follows:

Standard convolutional layer 1_1: using 64 3×3 convolutional kernels and input samples with A×A pixels to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;

Standard convolutional layer 1_1_1: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;

Standard convolutional layer 1_1_2: using 32 1×1 convolutional kernels and the feature map output by standard convolutional layer 1_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×32;

Standard convolutional layer 1_2: using 64 3×3 convolutional kernels and the feature map output by standard convolutional layer 1_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A×A×64;

Pooling layer 1: using the feature map output by 2×2 verified standard convolutional layer 1_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_1: using 128 3×3 convolutional kernels and the feature map output by pooling layer 1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;

Standard convolutional layer 2_1_1: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_1_2: using 64 1×1 convolutional kernels and the feature map output by standard convolutional layer 2_1_1 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×64;

Standard convolutional layer 2_2: using 128 3×3 convolutional kernels and the feature map output by standard convolutional layer 2_1_2 to make convolutions with a step length of 1, and then activating the convolutions with ReLU to obtain a feature map with a dimension of A/2×A/2×128;

Pooling layer 2: using the feature map output by 2×2 verified standard convolutional layer 2_2 to make the maximum pooling with a step length of 2 to get a feature map with a dimension of A/4×A/4×128;

Standard convolutional layer 3_1: using 256 3×3 co
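The feature-map dimensions recited in the claim preview can be checked with a small shape trace. The sketch below covers only the layers visible before the preview is cut off (standard convolutional layer 1_1 through pooling layer 2). It assumes that the stride-1 convolutions are padded so that height and width are preserved, which is what the claimed A×A outputs imply, and that the 2×2 pooling uses no padding. The names `LAYERS` and `trace_shapes` and the default of 3 input channels are hypothetical.

```python
# Layers of the first (small-scale) branch as listed in the claim preview.
# Each entry: (name, kind, kernel_size, num_kernels, stride)
LAYERS = [
    ("conv1_1",   "conv", 3, 64,   1),
    ("conv1_1_1", "conv", 1, 32,   1),
    ("conv1_1_2", "conv", 1, 32,   1),
    ("conv1_2",   "conv", 3, 64,   1),
    ("pool1",     "pool", 2, None, 2),
    ("conv2_1",   "conv", 3, 128,  1),
    ("conv2_1_1", "conv", 1, 64,   1),
    ("conv2_1_2", "conv", 1, 64,   1),
    ("conv2_2",   "conv", 3, 128,  1),
    ("pool2",     "pool", 2, None, 2),
]


def trace_shapes(a, in_channels=3):
    """Return [(name, height, width, channels)] for an a x a input image."""
    h = w = a
    c = in_channels  # assumption: RGB input; the preview does not state this
    shapes = []
    for name, kind, k, n, s in LAYERS:
        if kind == "conv":
            # Assumed size-preserving padding: output size is ceil(size / stride).
            h, w, c = -(-h // s), -(-w // s), n
        else:
            # Max pooling without padding; the channel count is unchanged.
            h, w = (h - k) // s + 1, (w - k) // s + 1
        shapes.append((name, h, w, c))
    return shapes


for name, h, w, c in trace_shapes(224):
    print(f"{name:10s} {h} x {w} x {c}")
```

With an input size A that is divisible by 4, such as 224, the trace matches the claimed A/2 and A/4 dimensions after the first and second pooling layers.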

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • G06V20/58 · Primary

    Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • G06F18/253 · Primary

    of extracted features · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11783594B2 cover?
The present invention discloses a method for segmenting pedestrians in roadside images using a variable-scale multi-feature fusion convolutional network. It addresses the challenge of significant changes in pedestrian scale by using two parallel convolutional neural networks to extract the local and global features at different scales, and then fusing them to obtain a variable-scale multi-featu…
Who is the assignee on this patent?
Univ Southeast
What technology area does this patent fall under?
Primary CPC classification G06V20/58. Mapped technology areas include Physics.
When was this patent published?
Publication date: October 10, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).