Method of rectifying text image, training method, electronic device, and medium

US12518503B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12518503-B2
Application number: US-202218077026-A
Country: US
Kind code: B2
Filing date: Dec 7, 2022
Priority date: Dec 8, 2021
Publication date: Jan 6, 2026
Grant date: Jan 6, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of rectifying a text image, a training method, an electronic device, and a medium, which relate to a field of an artificial intelligence technology, in particular to fields of computer vision, deep learning technology, intelligent transportation and high-precision maps. An exemplary implementation includes: performing, based on a gating strategy, a plurality of first layer-wise processing on a text image to be rectified, so as to obtain respective feature maps of a plurality of layer levels, wherein each of the feature maps includes a text structural feature related to the text image to be rectified, and the gating strategy is configured to increase an attention to the text structural feature; and performing a plurality of second layer-wise processing on the respective feature maps of the plurality of layer levels, so as to obtain a rectified text image corresponding to the text image to be rectified.
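
The data flow in the abstract (gated layer-wise encoding followed by layer-wise decoding) can be illustrated with a toy sketch. This is not the patented model: the function names, the averaging down-sampler, the nearest-neighbour up-sampler, the sigmoid gate, the additive fusion, and the use of 1-D channels in place of 2-D image features are all assumptions chosen only to keep the example small and runnable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def downsample(fmap):
    # Assumption: halve each channel by averaging neighbouring pairs.
    return [[(ch[k] + ch[k + 1]) / 2.0 for k in range(0, len(ch) - 1, 2)]
            for ch in fmap]

def upsample(fmap):
    # Assumption: double each channel by repeating every value.
    return [[v for v in ch for _ in range(2)] for ch in fmap]

def channel_gate(fmap):
    # Assumption: one weight per channel from its mean, applied with a
    # residual connection so the original features are kept.
    out = []
    for ch in fmap:
        w = sigmoid(sum(ch) / len(ch))
        out.append([v + w * v for v in ch])
    return out

def fuse(a, b):
    # Assumption: fusion is element-wise addition of two same-shaped maps.
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]

def rectify_sketch(image, n_levels=2):
    """image: list of channels, each a list of floats whose length is
    divisible by 2 ** n_levels. Returns a map of the same shape."""
    skips = []          # down-sampling feature map of each layer level
    x = image
    for _ in range(n_levels):
        d = downsample(x)
        skips.append(d)
        x = channel_gate(d)   # gated map feeds the next encoder level
    out = x                   # deepest (gated) level starts the decoder
    for i in range(n_levels - 2, -1, -1):
        up = upsample(out)            # bring level i+1 up to level i
        out = fuse(skips[i], up)      # combine with encoder features
    return upsample(out)              # back to the input resolution

result = rectify_sketch([[1.0] * 8, [0.0] * 8], n_levels=2)
```

The point of the sketch is only the wiring: each encoder level stores its feature map, the gate re-weights channels before the next level, and the decoder fuses stored maps with up-sampled ones on the way back.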

First claim

Opening claim text (preview).

What is claimed is:

1. A method of rectifying a text image, the method comprising:

    performing, based on a gating strategy, a plurality of first layer-wise processing on a text image to be rectified, so as to obtain respective feature maps of a plurality of layer levels, wherein each of the feature maps comprises a text structural feature related to the text image to be rectified, and the gating strategy is configured to increase an attention to the text structural feature; and

    performing a plurality of second layer-wise processing on the respective feature maps of the plurality of layer levels, so as to obtain a rectified text image corresponding to the text image to be rectified,

    wherein the performing, based on a gating strategy, a plurality of first layer-wise processing comprises performing, based on a text image rectification model, a plurality of first layer-wise processing on the text image to be rectified, so as to obtain the respective feature maps of the plurality of layer levels, wherein the text image rectification model comprises a gating module created according to the gating strategy,

    wherein the text image rectification model further comprises an encoder and a decoder, the gating module comprises a plurality of channel layer units, and each of the channel layer units is configured to determine a channel weight of each channel in the feature map corresponding to the channel layer unit,

    wherein the performing, based on a text image rectification model, a plurality of first layer-wise processing comprises performing, based on the encoder and the plurality of channel layer units, a plurality of first layer-wise processing on the text image to be rectified, so as to obtain the respective feature maps of the plurality of layer levels,

    wherein the performing a plurality of second layer-wise processing comprises performing, based on the decoder, a plurality of second layer-wise processing on the respective feature maps of the plurality of layer levels, so as to obtain the rectified text image corresponding to the text image to be rectified,

    wherein the encoder comprises N down-sampling modules connected in cascade, the decoder comprises N up-sampling modules connected in cascade, and the gating module comprises N channel layer units, where N is an integer greater than 1;

    wherein the performing, based on the encoder and the plurality of channel layer units, a plurality of first layer-wise processing comprises: for 1<i≤N, processing a first down-sampling feature map of an (i−1)-th layer level by using an (i−1)-th channel layer unit, so as to obtain a channel weight feature map of the (i−1)-th layer level; and processing the channel weight feature map of the (i−1)-th layer level by using an i-th down-sampling module, so as to obtain a first down-sampling feature map of the i-th layer level; and

    wherein the performing, based on the decoder, a plurality of second layer-wise processing comprises: for 1≤i<N, processing a first output feature map of an (i+1)-th layer level by using an i-th up-sampling module, so as to obtain a first up-sampling feature map of an i-th layer level; fusing the first down-sampling feature map and the first up-sampling feature map of the i-th layer level to obtain a first fusion feature map of the i-th layer level; processing the first fusion feature map of the i-th layer level by using the i-th up-sampling module, so as to obtain a first output feature map of the i-th layer level; and determining, according to the first output feature map of a first layer level, the rectified text image corresponding to the text image to be rectified.

2. The method according to claim 1, wherein the gating module further comprises a fine-grain layer unit;

    further comprising processing a channel weight feature map of an N-th layer level by using the fine-grain layer unit, so as to obtain a first fine-grain feature map of the N-th layer level; and

    wherein the performing, based on the decoder, a plurality of second layer-wise processing on the respective feature maps of the plurality of layer levels, so as to obtain the rectified text image corresponding to the text image to be rectified comprises: for i=N, processing the first fine-grain feature map of the N-th layer level by using an N-th up-sampling module, so as to obtain a first up-sampling feature map of the N-th layer level; fusing the first up-sampling feature map and the first down-sampling feature map of the N-th layer level to obtain a first fusion feature map of the N-th layer level; and processing the first fusion feature map of the N-th layer level by using the N-th up-sampling module, so as to obtain a first output feature map of the N-th layer level.

3. The method according to claim 1, wherein the gating module further comprises N coarse-grain layer units;

    further comprising processing a first down-sampling feature map of an i-th layer level by using an i-th coarse-grain layer unit, so as to obtain a first coarse-grain feature map of the i-th layer level; and

    wherein the fusing the first down-sampling feature map of the i-th layer level and the first up-sampling feature map of the i-th layer level to obtain a first fusion feature map of the i-th layer level comprises fusing the first coarse-grain feature map of the i-th layer level and the first up-sampling feature map of the i-th layer level to obtain the first fusion feature map of the i-th layer level.

4. The method according to claim 1, wherein the (i−1)-th channel layer unit comprises M first processing layer combinations connected in cascade, each first processing layer combination comprises a first processing layer and a second processing layer connected in cascade, each first processing layer comprises Q pooling layers connected in parallel, and each second processing layer comprises U first convolution layers connected in cascade, where M, Q and U are integers greater than or equal to 1; and

    wherein the processing a first down-sampling feature map of the (i−1)-th layer level by using an (i−1)-th channel layer unit, so as to obtain a channel weight feature map of the (i−1)-th layer level comprises: processing a first down-sampling feature map of the (i−1)-th layer level by using the M first processing layer combinations connected in cascade of the (i−1)-th channel layer unit, so as to obtain first intermediate feature maps respectively corresponding to the Q first processing layers connected in parallel of the (i−1)-th layer level; obtaining a first gating map of the (i−1)-th layer level according to the Q first intermediate feature maps of the (i−1)-th layer level; performing a dot multiplication on the first down-sampling feature map of the (i−1)-th layer level and the first gating map of the (i−1)-th layer level to obtain a second intermediate feature map of the (i−1)-th layer level; and obtaining the channel weight feature map of the (i−1)-th layer level according to the first down-sampling feature map and the second intermediate feature map of the (i−1)-th layer level.

5. The method according to claim 1, wherein the fine-grain layer unit comprises P second processing layer combinations connected in parallel, each second processing layer combination comprises V third processing layers connected in parallel, and each third processing layer comprises S second convolution layers connected in cascade, where P, V and S are integers greater than or equal to 1; and

    wherein the processing a channel weight feature map of an N-th layer level by using the fine-grain layer unit, so as to obtain a first fine-grain feature map of the N-th layer level comprises: processing the channel weight feature map of the N-th layer level by using the P second processing layer combinations connected in parallel,
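
The channel layer unit of claim 4 (parallel pooling, cascaded convolutions, a gating map, dot multiplication, then a combination with the input) can be sketched as follows. This is a minimal illustration with M=1, Q=2 and U=2, not the patentee's implementation. The choice of average and max pooling, the ReLU between the two layers, the sigmoid over the summed branches, the residual addition, and the names `channel_layer_unit`, `W1` and `W2` are all assumptions; the claim does not specify them. On a pooled per-channel vector a 1x1 convolution reduces to a matrix-vector product, which is what the sketch uses.

```python
import math

def matvec(W, v):
    # A 1x1 convolution applied to a pooled channel vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def channel_layer_unit(fmap, W1, W2):
    """fmap: list of C channels, each a flat list of floats.
    W1, W2: hypothetical C x C weight matrices for the two cascaded layers.
    Returns (channel weight feature map, gating map)."""
    # First processing layer: Q = 2 pooling layers in parallel (assumed types).
    avg_pool = [sum(ch) / len(ch) for ch in fmap]
    max_pool = [max(ch) for ch in fmap]
    # Second processing layer: U = 2 convolution layers in cascade,
    # giving one intermediate vector per pooling branch.
    inter = [matvec(W2, relu(matvec(W1, p))) for p in (avg_pool, max_pool)]
    # Gating map from the Q intermediates (assumed: sum, then sigmoid).
    gate = [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(*inter)]
    # Dot multiplication of the input with the gating map.
    gated = [[g * v for v in ch] for g, ch in zip(gate, fmap)]
    # Combine input and gated map (assumed: residual addition).
    out = [[v + gv for v, gv in zip(ch, gch)] for ch, gch in zip(fmap, gated)]
    return out, gate

identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
weighted, gate = channel_layer_unit(features, identity, identity)
```

With identity weights, the first channel (large activations) receives a gate close to 1 and the all-zero channel receives a gate of 0.5, which shows how the gate scales each channel by its pooled statistics.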

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Music notations · CPC title

  • Correcting image deformation, e.g. trapezoidal deformation caused by perspective · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • using machine learning, e.g. neural networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12518503B2 cover?
A method of rectifying a text image, a training method, an electronic device, and a medium, which relate to a field of an artificial intelligence technology, in particular to fields of computer vision, deep learning technology, intelligent transportation and high-precision maps. An exemplary implementation includes: performing, based on a gating strategy, a plurality of first layer-wise process…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/243. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 6, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).