Text matting method and apparatus based on neural network, device, and storage medium

US12536825B2 · US · B2

Patent metadata

Field: Value
Publication number: US-12536825-B2
Application number: US-202218561616-A
Country: US
Kind code: B2
Filing date: Dec 23, 2022
Priority date: Dec 23, 2022
Publication date: Jan 27, 2026
Grant date: Jan 27, 2026

Abstract

Official abstract text for this publication.

The present disclosure provides a text matting method and apparatus based on a neural network, a device, and a storage medium. The text matting method based on a neural network includes: processing a first image with a feature extraction network to obtain feature maps; processing the feature maps with an intermediate processing network to obtain intermediate feature maps; and processing the intermediate feature maps with a feature fusion network to obtain a second image, wherein the second image includes a text feature extracted from the first image.
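
The data flow between the three networks can be traced with a small sketch. The index wiring below follows the wording of claim 1 (extraction block i reads feature map i−1, spatial block i reads feature map i, fusion block i reads intermediate map n−i+1 plus fusion map i−1). The function name `trace_wiring` and the string labels are hypothetical and chosen only for illustration; no actual convolution is performed, and this is not the patentee's implementation.

```python
def trace_wiring(n):
    """Return, for each block, the labels of the maps it consumes.

    Illustrative only: labels are made-up strings, and the indexing
    mirrors the wording of claim 1 (n > 1, blocks numbered from 1).
    """
    if n < 2:
        raise ValueError("claim 1 requires n to be an integer greater than 1")

    # Feature extraction: block 1 reads the first image,
    # block i reads the (i-1)-th feature map.
    extraction_inputs = {1: ["first_image"]}
    for i in range(2, n + 1):
        extraction_inputs[i] = [f"feature_map_{i - 1}"]

    # Intermediate processing: spatial block i reads the i-th feature map.
    spatial_inputs = {i: [f"feature_map_{i}"] for i in range(1, n + 1)}

    # Feature fusion: block 1 reads the n-th intermediate map; block i reads
    # the (n-i+1)-th intermediate map and the (i-1)-th fusion map.
    fusion_inputs = {1: [f"intermediate_map_{n}"]}
    for i in range(2, n + 1):
        fusion_inputs[i] = [f"intermediate_map_{n - i + 1}", f"fusion_map_{i - 1}"]

    return extraction_inputs, spatial_inputs, fusion_inputs


if __name__ == "__main__":
    # n = 4 is the value recited in claim 4.
    _, _, fusion = trace_wiring(4)
    for i, inputs in fusion.items():
        print(f"fusion block {i} <- {', '.join(inputs)}")
```

With n = 4, the last fusion block consumes the 1-st intermediate feature map and the 3-rd fusion map, and its output is the map that the claim treats as the second image.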

First claim

Opening claim text (preview).

What is claimed is:

1. A text matting method based on a neural network, comprising:
processing a first image with a feature extraction network to obtain feature maps, wherein, the feature extraction network comprises sequentially connected n extraction convolutional network blocks, and wherein a 1-st extraction convolutional network block processes the first image and outputs a 1-st feature map, and an i-th extraction convolutional network block processes an (i−1)-th feature map and outputs an i-th feature map;
processing the feature maps with an intermediate processing network to obtain intermediate feature maps, wherein, the intermediate processing network comprises n spatial convolutional network blocks, a 1-st spatial convolutional network block processes the 1-st feature map and outputs a 1-st intermediate feature map, and an i-th spatial convolutional network block processes the i-th feature map and outputs an i-th intermediate feature map; and
processing intermediate feature maps with a feature fusion network to obtain a second image, wherein, the feature fusion network comprises sequentially connected n fusion convolutional network blocks, a 1-st fusion convolutional network block processes an n-th intermediate feature map output by an n-th spatial convolutional network block to obtain a 1-st fusion map, an i-th fusion convolutional network block processes an (n−i+1)-th intermediate feature map output by an (n−i+1)-th spatial convolutional network block and an (i−1)-th fusion map output by an (i−1)-th fusion convolutional network block to obtain an i-th fusion map, wherein an n-th fusion map output by an n-th fusion convolutional network block serves as the second image, and the second image comprises a text feature extracted from the first image, and wherein n is an integer greater than 1, and i is an integer greater than 1 but less than or equal to n.

2. The method according to claim 1, wherein, the n extraction convolutional network blocks in the feature extraction network are each composed of one or more of a convolutional layer, a pooling layer and a residual convolutional block; and the n fusion convolutional network blocks in the feature fusion network are each composed of one or more of a convolutional layer, a residual convolutional block and a first connection layer; wherein the first connection layer comprises a feature merging layer and a super-resolution layer.

3. The method according to claim 2, wherein, the feature merging layer is used for merging a plurality of input images by increasing a number of image channels; and the super-resolution layer is used for increasing image size by reducing the number of image channels, the residual convolutional block is composed of a convolutional layer, a batch normalization layer and an activation function.

4. The method according to claim 1, wherein, n is equal to 4; in the feature extraction network, the 1-st extraction convolutional network block is composed of a convolutional layer and a maximum pooling layer, a 2-nd extraction convolutional network block is composed of a residual convolutional block, a convolutional layer and a maximum pooling layer, a 3-rd extraction convolutional network block is composed of a residual convolutional block, a convolutional layer and a maximum pooling layer, and a 4-th extraction convolutional network block is composed of a residual convolutional block and a convolutional layer, wherein in the feature fusion network, a 1-st fusion convolutional network block is composed of a convolutional layer and a residual convolutional block, a 2-nd fusion convolutional network block is composed of a first connection layer, a convolutional layer and a residual convolutional block, a 3-rd fusion convolutional network block is composed of a first connection layer, a convolutional layer and a residual convolutional block, and a 4-th fusion convolutional network block is composed of a first connection layer and a convolutional layer, the first connection layer comprises a feature merging layer and a super-resolution layer.

5. The method according to claim 4, wherein, the feature merging layer is used for merging a plurality of input images by increasing a number of image channels; and the super-resolution layer is used for increasing image size by reducing the number of image channels, the residual convolutional block is composed of a convolutional layer, a batch normalization layer and an activation function.

6. The method according to claim 1, wherein, the n spatial convolutional network blocks have a same structure; each spatial convolutional network block of the n spatial convolutional network blocks comprises m parallel processing units and a second connection layer; the m processing units are respectively used for processing an input image; the second connection layer is used for merging processing results of the m processing units; the second connection layer comprises a feature merging layer, a convolutional layer and a batch normalization layer; and the feature merging layer is used for merging m images respectively output by the m processing units by increasing a number of image channels, wherein m is an integer greater than 1.

7. The method according to claim 6, wherein, m is equal to 5; in each spatial convolutional network block, a 1-st processing unit is composed of a first convolutional layer and a batch normalization layer sequentially connected; a 2-nd processing unit is composed of a second convolutional layer, a batch normalization layer, a third convolutional layer and a batch normalization layer sequentially connected; a 3-rd processing unit is composed of a fourth convolutional layer, a batch normalization layer, a third convolutional layer and a batch normalization layer sequentially connected; a 4-th processing unit is composed of a fifth convolutional layer, a batch normalization layer, a third convolutional layer and a batch normalization layer sequentially connected; and a fifth processing unit is composed of a batch normalization layer, an adaptive average pooling layer, a third convolutional layer and an upsampling layer sequentially connected; and wherein the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, and the fifth convolutional layer have different network parameters.

8. The method according to claim 1, wherein, pixel points of the n-th fusion map are represented as text probability values located in an interval of 0 to 1, and are quantified as a grayscale image in an interval of 0 to 255.

9. The method according to claim 1, further comprising: generating a training image set and annotating training images in the training image set to generate corresponding mask images; and training, based on a training function, the feature extraction network, the intermediate processing network and the feature fusion network with the training image set.

10. The method according to claim 9, wherein, the generating the training image set comprises: constructing a corpus composed of text characters, a font database composed of text fonts, a texture feature database composed of text colors, and a background database composed of background pictures; and randomly selecting a group of a text character, a text font, a text color, and a background picture from the corpus, the font database, the texture feature database, and the background database, and forming a training image, wherein the text characters comprise Chinese characters and English characters; the text fonts comprise Chinese fonts and English fonts; and the text colors comprise solid colors, textures, and patterns.

11. T
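
Claims 3, 5 and 8 describe three small operations that a toy example may make more concrete: merging inputs by increasing the channel count, enlarging the image by reducing the channel count, and quantizing 0-to-1 text probabilities to a 0-to-255 grayscale image. The sketch below is one possible reading, not the claimed implementation. It assumes that merging is channel concatenation, that the super-resolution layer behaves like a depth-to-space (pixel shuffle) rearrangement, and that quantization rounds to the nearest integer; none of these specifics is stated in the claim text shown here. All function names are hypothetical, and feature maps are plain nested lists indexed as [channel][row][column].

```python
def merge_channels(*feature_maps):
    """Assumed reading of 'feature merging': stack inputs along the channel axis."""
    merged = []
    for fmap in feature_maps:
        merged.extend(fmap)  # channel count grows, spatial size is unchanged
    return merged


def depth_to_space(channels, r):
    """Assumed reading of the 'super-resolution layer' (pixel-shuffle style).

    Rearranges C*r*r channels of size H x W into C channels of size
    (H*r) x (W*r): the image grows while the channel count shrinks.
    """
    c_in = len(channels)
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(channels[0]), len(channels[0][0])
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # Source channel is picked by the position inside each r x r cell.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = channels[src][y // r][x // r]
    return out


def to_grayscale(prob_map):
    """Clamp probabilities to [0, 1] and scale to integers in [0, 255].

    Rounding to nearest is an assumption; the claim only names the ranges.
    """
    return [[int(round(min(1.0, max(0.0, p)) * 255)) for p in row] for row in prob_map]


if __name__ == "__main__":
    a = [[[0.0]], [[0.2]]]           # two 1x1 channels
    b = [[[0.6]], [[1.0]]]           # two more 1x1 channels
    merged = merge_channels(a, b)    # four 1x1 channels
    enlarged = depth_to_space(merged, 2)  # one 2x2 channel
    print(to_grayscale(enlarged[0]))
```

In this toy run, four single-pixel channels become one 2×2 channel, which is then mapped to grayscale values. A real network would apply learned convolutions between these steps.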

Assignees

BOE Technology Group Co Ltd

Inventors

Classifications

  • Statistical pre-processing, e.g. techniques for normalisation or restoring missing data

  • Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • Image preprocessing

  • G06V10/82 (Primary): using neural networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
BOE Technology Group Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).