Object detection method and apparatus, electronic device, and storage medium
US-2021287037-A1 · Sep 16, 2021 · US
US12536825B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12536825-B2 |
| Application number | US-202218561616-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2022 |
| Priority date | Dec 23, 2022 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Abstract
The present disclosure provides a text matting method and apparatus based on a neural network, a device, and a storage medium. The text matting method includes: processing a first image with a feature extraction network to obtain feature maps; processing the feature maps with an intermediate processing network to obtain intermediate feature maps; and processing the intermediate feature maps with a feature fusion network to obtain a second image, wherein the second image includes a text feature extracted from the first image.
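The three-stage pipeline in the abstract, combined with the block-to-block wiring that claim 1 spells out, can be traced with a minimal pure-Python sketch. This is an illustration under stated assumptions, not the patented implementation: the blocks are placeholders that only record which maps they consume, and the names `trace_pipeline`, `E`/`S`/`U` (extraction, spatial, and fusion blocks), and `F`/`M`/`G` (feature, intermediate, and fusion maps) are hypothetical labels.

```python
def trace_pipeline(n):
    """Return (inputs, output_name) describing the data flow of claim 1.

    `inputs` maps each hypothetical block name to the list of maps it reads.
    No real convolutions are performed; this only traces the wiring.
    """
    if n < 2:
        raise ValueError("claim 1 requires n to be an integer greater than 1")
    inputs = {}

    # Feature extraction network: E1 reads the first image and outputs F1;
    # Ei reads the (i-1)-th feature map F(i-1) and outputs Fi.
    for i in range(1, n + 1):
        inputs[f"E{i}"] = ["image1"] if i == 1 else [f"F{i - 1}"]

    # Intermediate processing network: spatial block Si reads Fi and outputs Mi.
    for i in range(1, n + 1):
        inputs[f"S{i}"] = [f"F{i}"]

    # Feature fusion network: U1 reads Mn and outputs G1; Ui reads M(n-i+1)
    # plus the previous fusion map G(i-1) and outputs Gi.
    for i in range(1, n + 1):
        if i == 1:
            inputs["U1"] = [f"M{n}"]
        else:
            inputs[f"U{i}"] = [f"M{n - i + 1}", f"G{i - 1}"]

    # The n-th fusion map serves as the second image.
    return inputs, f"G{n}"


inputs, second_image = trace_pipeline(4)
print(inputs["U4"], second_image)  # ['M1', 'G3'] G4
```

With n = 4, the deepest intermediate map M4 starts the fusion chain and the shallowest map M1 joins last, which resembles the skip-connection pattern of a U-Net-style encoder-decoder.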
Claims (preview)
What is claimed is:
1. A text matting method based on a neural network, comprising: processing a first image with a feature extraction network to obtain feature maps, wherein the feature extraction network comprises n sequentially connected extraction convolutional network blocks, and wherein a 1st extraction convolutional network block processes the first image and outputs a 1st feature map, and an i-th extraction convolutional network block processes an (i−1)-th feature map and outputs an i-th feature map; processing the feature maps with an intermediate processing network to obtain intermediate feature maps, wherein the intermediate processing network comprises n spatial convolutional network blocks, a 1st spatial convolutional network block processes the 1st feature map and outputs a 1st intermediate feature map, and an i-th spatial convolutional network block processes the i-th feature map and outputs an i-th intermediate feature map; and processing the intermediate feature maps with a feature fusion network to obtain a second image, wherein the feature fusion network comprises n sequentially connected fusion convolutional network blocks, a 1st fusion convolutional network block processes an n-th intermediate feature map output by an n-th spatial convolutional network block to obtain a 1st fusion map, and an i-th fusion convolutional network block processes an (n−i+1)-th intermediate feature map output by an (n−i+1)-th spatial convolutional network block and an (i−1)-th fusion map output by an (i−1)-th fusion convolutional network block to obtain an i-th fusion map; wherein an n-th fusion map output by an n-th fusion convolutional network block serves as the second image, and the second image comprises a text feature extracted from the first image; and wherein n is an integer greater than 1, and i is an integer greater than 1 but less than or equal to n.
2. The method according to claim 1, wherein the n extraction convolutional network blocks in the feature extraction network are each composed of one or more of a convolutional layer, a pooling layer, and a residual convolutional block; and the n fusion convolutional network blocks in the feature fusion network are each composed of one or more of a convolutional layer, a residual convolutional block, and a first connection layer; wherein the first connection layer comprises a feature merging layer and a super-resolution layer.
3. The method according to claim 2, wherein the feature merging layer is used for merging a plurality of input images by increasing a number of image channels; the super-resolution layer is used for increasing image size by reducing the number of image channels; and the residual convolutional block is composed of a convolutional layer, a batch normalization layer, and an activation function.
4. The method according to claim 1, wherein n is equal to 4; in the feature extraction network, the 1st extraction convolutional network block is composed of a convolutional layer and a maximum pooling layer, a 2nd extraction convolutional network block is composed of a residual convolutional block, a convolutional layer, and a maximum pooling layer, a 3rd extraction convolutional network block is composed of a residual convolutional block, a convolutional layer, and a maximum pooling layer, and a 4th extraction convolutional network block is composed of a residual convolutional block and a convolutional layer; in the feature fusion network, a 1st fusion convolutional network block is composed of a convolutional layer and a residual convolutional block, a 2nd fusion convolutional network block is composed of a first connection layer, a convolutional layer, and a residual convolutional block, a 3rd fusion convolutional network block is composed of a first connection layer, a convolutional layer, and a residual convolutional block, and a 4th fusion convolutional network block is composed of a first connection layer and a convolutional layer; and the first connection layer comprises a feature merging layer and a super-resolution layer.
5. The method according to claim 4, wherein the feature merging layer is used for merging a plurality of input images by increasing a number of image channels; the super-resolution layer is used for increasing image size by reducing the number of image channels; and the residual convolutional block is composed of a convolutional layer, a batch normalization layer, and an activation function.
6. The method according to claim 1, wherein the n spatial convolutional network blocks have the same structure; each spatial convolutional network block of the n spatial convolutional network blocks comprises m parallel processing units and a second connection layer; the m processing units are respectively used for processing an input image; the second connection layer is used for merging processing results of the m processing units; the second connection layer comprises a feature merging layer, a convolutional layer, and a batch normalization layer; and the feature merging layer is used for merging m images respectively output by the m processing units by increasing a number of image channels, wherein m is an integer greater than 1.
7. The method according to claim 6, wherein m is equal to 5; in each spatial convolutional network block, a 1st processing unit is composed of a first convolutional layer and a batch normalization layer sequentially connected; a 2nd processing unit is composed of a second convolutional layer, a batch normalization layer, a third convolutional layer, and a batch normalization layer sequentially connected; a 3rd processing unit is composed of a fourth convolutional layer, a batch normalization layer, a third convolutional layer, and a batch normalization layer sequentially connected; a 4th processing unit is composed of a fifth convolutional layer, a batch normalization layer, a third convolutional layer, and a batch normalization layer sequentially connected; and a 5th processing unit is composed of a batch normalization layer, an adaptive average pooling layer, a third convolutional layer, and an upsampling layer sequentially connected; and wherein the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, and the fifth convolutional layer have different network parameters.
8. The method according to claim 1, wherein pixels of the n-th fusion map are represented as text probability values in the interval 0 to 1 and are quantized into a grayscale image with values in the interval 0 to 255.
9. The method according to claim 1, further comprising: generating a training image set and annotating training images in the training image set to generate corresponding mask images; and training, based on a training function, the feature extraction network, the intermediate processing network, and the feature fusion network with the training image set.
10. The method according to claim 9, wherein generating the training image set comprises: constructing a corpus composed of text characters, a font database composed of text fonts, a texture feature database composed of text colors, and a background database composed of background pictures; and randomly selecting a group of a text character, a text font, a text color, and a background picture from the corpus, the font database, the texture feature database, and the background database, and forming a training image, wherein the text characters comprise Chinese characters and English characters; the text fonts comprise Chinese fonts and English fonts; and the text colors comprise solid colors, textures, and patterns.
11. T
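Claims 3 and 5 say only that the feature merging layer merges inputs "by increasing a number of image channels" and that the super-resolution layer increases image size "by reducing the number of image channels". One plausible reading, assumed here rather than stated in the claims, is channel concatenation followed by a sub-pixel (pixel-shuffle) rearrangement that trades r² channels for an r-fold increase in height and width. This pure-Python sketch stores images as nested lists in channel, height, width order; `merge_channels` and `pixel_shuffle` are hypothetical names, and the channel ordering follows the common pixel-shuffle convention.

```python
def merge_channels(*images):
    """Feature merging layer (sketch): concatenate inputs along the channel axis.

    Each image is a list of channels; each channel is a list of equal-length rows.
    """
    height, width = len(images[0][0]), len(images[0][0][0])
    for image in images:
        for channel in image:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("all inputs must share the same height and width")
    # The output has as many channels as all inputs combined.
    return [channel for image in images for channel in image]


def pixel_shuffle(image, r):
    """Super-resolution layer (sketch): (C*r*r, H, W) -> (C, H*r, W*r).

    Input channel c*r*r + i*r + j fills offset (i, j) inside each r-by-r
    cell of output channel c.
    """
    c_in = len(image)
    if r < 1 or c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out, height, width = c_in // (r * r), len(image[0]), len(image[0][0])
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = image[c * r * r + i * r + j]
                for y in range(height):
                    for x in range(width):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Merge four 1x1 single-channel maps, then shuffle 4 channels into one 2x2 map.
merged = merge_channels([[[1]]], [[[2]]], [[[3]]], [[[4]]])
print(pixel_shuffle(merged, 2))  # [[[1, 2], [3, 4]]]
```

Merging raises the channel count and shuffling lowers it again while enlarging the map, which matches the two effects the claims attribute to the first connection layer.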
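Claim 7 orders the 5th processing unit as batch normalization, adaptive average pooling, convolution, then upsampling. The upsampling step matters because pooling collapses the spatial size, while the feature merging layer of claim 6 concatenates the five unit outputs along the channel axis, which requires matching height and width. The following single-channel sketch makes several assumptions that the claims do not state: batch normalization is reduced to a fixed inference-time affine transform, pooling targets a 1x1 output, the convolution is a 1x1 kernel with one scalar weight, and upsampling is nearest-neighbour. `pooling_branch` and its parameters are hypothetical.

```python
def pooling_branch(channel, bn_scale=1.0, bn_shift=0.0, conv_weight=1.0):
    """Sketch of claim 7's 5th processing unit applied to one channel."""
    height, width = len(channel), len(channel[0])
    # Batch normalization, assumed to be a fixed affine transform at inference.
    normalized = [[bn_scale * v + bn_shift for v in row] for row in channel]
    # Adaptive average pooling to an assumed 1x1 output: the global mean.
    pooled = sum(sum(row) for row in normalized) / (height * width)
    # Assumed 1x1 convolution on a single channel: one scalar weight.
    convolved = conv_weight * pooled
    # Nearest-neighbour upsampling from 1x1 repeats the value over H x W,
    # restoring the size needed for channel-wise merging.
    return [[convolved] * width for _ in range(height)]


print(pooling_branch([[1, 2], [3, 4]]))  # [[2.5, 2.5], [2.5, 2.5]]
```

The branch contributes one value per channel spread over the whole map, which is a common way to inject image-level context next to the convolutional branches.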
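Claim 8 states that the n-th fusion map holds text probabilities in the interval 0 to 1 and is quantized to a grayscale image in the interval 0 to 255, but it does not give the mapping. A minimal sketch, assuming linear scaling by 255 with round-to-nearest and clamping of out-of-range values; `probabilities_to_grayscale` is a hypothetical name.

```python
def probabilities_to_grayscale(prob_map):
    """Quantize a text-probability map (values in [0, 1]) to 8-bit grayscale.

    Assumption: linear scaling by 255 with round-to-nearest; claim 8 states
    only the two intervals, not the mapping.
    """
    gray = []
    for row in prob_map:
        # Clamp first so small numerical overshoots stay inside [0, 255].
        gray.append([int(round(min(max(p, 0.0), 1.0) * 255)) for p in row])
    return gray


print(probabilities_to_grayscale([[0.0, 0.5, 1.0], [1.2, -0.1, 0.2]]))
# [[0, 128, 255], [255, 0, 51]]
```

Python's `round()` rounds halves to the nearest even integer, so a probability of exactly 0.5 (127.5 after scaling) becomes 128.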
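The random selection step of claim 10 can be sketched as drawing one entry from each of four databases. The toy databases below are hypothetical placeholders, since this excerpt does not disclose the actual corpus, fonts, textures, or backgrounds. The sketch stops at the selection: rendering the text onto the background, and one plausible way to obtain the mask image of claim 9 (drawing the same text alone on a blank canvas), would need an imaging library and are left out. `TrainingSampleSpec` and `build_specs` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSampleSpec:
    """One randomly selected group from the four databases of claim 10."""
    text: str
    font: str
    color: str
    background: str


# Hypothetical toy databases covering the categories named in claim 10.
CORPUS = ["文字", "抠图", "text", "matting"]                  # Chinese and English text
FONTS = ["chinese_font_a.ttf", "english_font_a.ttf"]        # Chinese and English fonts
COLORS = ["solid:black", "texture:wood", "pattern:stripes"] # solid, texture, pattern
BACKGROUNDS = ["background_0001.jpg", "background_0002.jpg"]


def build_specs(count, seed=0):
    """Randomly select `count` (text, font, color, background) groups.

    A seeded random.Random instance keeps the selection reproducible.
    """
    rng = random.Random(seed)
    return [
        TrainingSampleSpec(
            text=rng.choice(CORPUS),
            font=rng.choice(FONTS),
            color=rng.choice(COLORS),
            background=rng.choice(BACKGROUNDS),
        )
        for _ in range(count)
    ]


for spec in build_specs(3, seed=42):
    print(spec)
```

Seeding the generator is a design choice for reproducible training sets; the claim itself only requires random selection.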
Statistical pre-processing, e.g. techniques for normalisation or restoring missing data · CPC title
Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion · CPC title
Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Image preprocessing · CPC title
using neural networks · CPC title