Character identification method and device
US-2020311460-A1 · Oct 1, 2020 · US
US11210546B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11210546-B2 |
| Application number | US-202016822085-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 18, 2020 |
| Priority date | Jul 5, 2019 |
| Publication date | Dec 28, 2021 |
| Grant date | Dec 28, 2021 |
Abstract
The present disclosure proposes an end-to-end text recognition method and apparatus, a computer device and a readable medium. The method comprises: obtaining a to-be-recognized picture containing a text region; and recognizing, with a pre-trained end-to-end text recognition model, the position of the text region in the to-be-recognized picture and the text content included in the text region, where the end-to-end text recognition model comprises a region of interest perspective transformation processing module for performing perspective transformation processing on the text region. Because the technical solution of the present disclosure does not need to arrange a plurality of steps in series, it may avoid introducing accumulated errors and may effectively improve the accuracy of text recognition.
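Read alongside claim 2, the single-model flow described in the abstract amounts to one forward pass that chains four modules: global features, region detection, region of interest perspective transformation, and text recognition. The sketch below illustrates only that data flow, under stated assumptions: the class name `EndToEndRecognizer`, its field names, and the tuple-of-vertices quadrangle type are hypothetical, and each module is a plain callable rather than a real network. Only the order of the stages follows the claim text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Quad = Tuple[Point, Point, Point, Point]  # four vertices of one text region


@dataclass
class EndToEndRecognizer:
    """Hypothetical wiring of the four modules named in the claims.

    Each module is a plain callable (an assumption) so that only the data
    flow is shown, not any particular network architecture.
    """

    global_features: Callable[[object], object]        # picture -> global feature expression
    detect_regions: Callable[[object], List[Quad]]     # global features -> text-region quadrangles
    roi_perspective: Callable[[object, Quad], object]  # (global features, quad) -> aligned ROI features
    recognize_text: Callable[[object], str]            # aligned ROI features -> text content

    def __call__(self, picture: object) -> List[Tuple[Quad, str]]:
        # The global features are computed once and reused by both detection
        # and ROI extraction, rather than cropping and re-reading the picture.
        features = self.global_features(picture)
        results: List[Tuple[Quad, str]] = []
        for quad in self.detect_regions(features):
            aligned = self.roi_perspective(features, quad)
            results.append((quad, self.recognize_text(aligned)))
        return results
```

Because detection and recognition both consume the same `features`, errors in either output can in principle be corrected by training the shared modules jointly, which is one way to read the abstract's point about avoiding a serial chain of separately built steps.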
Claims (preview)
What is claimed is:

1. An end-to-end text recognition method, wherein the method comprises: obtaining a to-be-recognized picture containing a text region; recognizing a position of the text region in the to-be-recognized picture and text content included in the text region with a pre-trained end-to-end text recognition model; the end-to-end text recognition model comprising a region of interest perspective transformation processing module for performing perspective transformation processing for the text region, a global feature obtaining module, a region detection module and a text recognition module, wherein the end-to-end text recognition model is trained by: collecting a plurality of training pictures, and marking a real position of a text region in each of the plurality of training pictures and real text content included in the text region in each of the plurality of training pictures; inputting a training picture of the plurality of training pictures into the end-to-end text recognition model, the end-to-end text recognition model outputting a predicted position of a text region in the training picture and predicted text content included in the text region; detecting whether the predicted position of the text region in the training picture is consistent with the real position of the text region in the training picture, and whether the predicted text content included in the text region is consistent with the real text content included in the text region; if the predicted position of the text region is not consistent with the real position, and the predicted text content included in the text region is not consistent with the real text content, adjusting one or more parameters for the global feature obtaining module, the region of interest perspective transformation processing module, the region detection module and the text recognition module in the end-to-end text recognition model, so that the predicted position of the text region tends to be consistent with the real position, and the predicted text content included in the text region tends to be consistent with the real text content; repeating the above steps until the number of times of training reaches a preset threshold of number of times, or in trainings of a preset successive number of times, the predicted positions of the text regions of training pictures output by the end-to-end text recognition model are consistent with the real positions respectively, and the predicted text contents included in the text regions are always consistent with the real text content respectively.

2. The method according to claim 1, wherein the recognizing a position of the text region in the to-be-recognized picture and text content included in the text region with a pre-trained end-to-end text recognition model comprises: inputting the to-be-recognized picture into the end-to-end text recognition model, the global feature obtaining module obtaining and outputting a global feature expression of the to-be-recognized picture; the region detection module detecting the position of the text region according to the global feature expression, and outputting the position; the region of interest perspective transformation processing module obtaining a feature expression of the text region from the global feature expression according to the position of the text region, and performing perspective transformation processing for the feature expression of the text region to obtain an aligned region of interest feature expression; the end-to-end text recognition module recognizing the text content included in the text region based on a spatial attention mechanism and according to the aligned region of interest feature expression, and outputting the text content.

3. The method according to claim 1, wherein the to-be-recognized picture is processed by a backbone network based on a full convolution to obtain a global feature expression of the to-be-recognized picture.

4. The method according to claim 3, wherein the position of the text region is represented by positional coordinates of four vertexes of a quadrangle obtained by using a full convolution and a Non-Maximum Suppression algorithm.

5. The method according to claim 4, wherein performing perspective transformation processing for the text region comprises: performing perspective transformation processing for the text region to obtain a plurality of region of interest feature expressions with a fixed height and variable lengths.

6. The method according to claim 5, wherein performing perspective transformation processing comprises calculation according to the following equation:

   $$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = T_\theta \begin{pmatrix} x_k^t \\ y_k^t \\ 1 \end{pmatrix}$$

   where $T_\theta$ is a matrix of perspective transformation; $(u, v, w)^\top$ represents a vector representation of input coordinates, and $(x_k^t, y_k^t, 1)^\top$ represents a vector representation of output coordinates; where $(x_k^t, y_k^t)$ are real output coordinates, $k$ represents the $k$-th pixel, and the superscript $t$ is a mark of the output coordinates and used to differentiate from the input coordinates $(x_k^s, y_k^s)$, where $x_k^s = u/w$, $y_k^s = v/w$, where $u$, $v$, $w$ are intermediate variables; $k$ represents any $k$-th pixel, $\forall k = 1, 2, \ldots$
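Claim 1's training procedure stops on whichever of two conditions comes first: a preset number of training iterations, or a preset number of successive iterations in which every predicted position and text content matches its label. The loop below sketches only that stopping logic. `train_until_consistent` and the `step` callable are hypothetical names, and what `step` does internally (in practice most likely a gradient update on detection and recognition losses) is an assumption rather than claim text.

```python
from typing import Callable


def train_until_consistent(
    step: Callable[[], bool],
    max_iterations: int,
    required_streak: int,
) -> int:
    """Run training iterations until one of claim 1's two stopping rules fires.

    `step` is a hypothetical callable: it runs one training iteration (predict,
    compare with the labelled positions and texts, adjust parameters when they
    disagree) and returns True only when every prediction matched its label.
    Returns the number of iterations actually performed.
    """
    streak = 0  # successive iterations in which all predictions were consistent
    for iteration in range(1, max_iterations + 1):
        if step():
            streak += 1
            if streak >= required_streak:
                return iteration  # rule 2: consistent for the preset number of successive iterations
        else:
            streak = 0  # any mismatch breaks the run of successive consistent iterations
    return max_iterations  # rule 1: preset iteration threshold reached
```

Resetting `streak` on any mismatch is what makes the second rule require *successive* consistent iterations rather than a cumulative count.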
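Claim 4 says the quadrangle vertices come from a full convolution followed by Non-Maximum Suppression, but it does not name the NMS variant. As an assumption, the sketch below uses plain greedy NMS with polygon IoU, computing the overlap of two convex quadrilaterals by Sutherland-Hodgman clipping and the shoelace formula. Some quadrilateral text detectors, such as EAST, instead use a locality-aware variant that merges overlapping boxes, so treat this as an illustration rather than the patented step. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _ccw(poly: Sequence[Point]) -> List[Point]:
    """Return the polygon with counter-clockwise vertex order."""
    return list(poly) if _signed_area(poly) >= 0 else list(reversed(poly))


def _clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` against convex, counter-clockwise `clipper`."""
    def inside(p: Point, a: Point, b: Point) -> bool:
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Point where segment p-q crosses the line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        candidates, output = output, []
        if not candidates:
            break  # intersection already empty
        prev = candidates[-1]
        for cur in candidates:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def quad_iou(q1: Sequence[Point], q2: Sequence[Point]) -> float:
    """Intersection over union of two convex quadrilaterals."""
    a, b = _ccw(q1), _ccw(q2)
    inter = _clip(a, b)
    inter_area = abs(_signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(_signed_area(a)) + abs(_signed_area(b)) - inter_area
    return inter_area / union if union > 0 else 0.0


def quad_nms(quads: Sequence[Sequence[Point]], scores: Sequence[float],
             iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring quad, drop later ones that overlap a kept quad too much."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(quad_iou(quads[i], quads[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Clipping is only exact for convex shapes, which suits the quadrangles of claim 4 as long as the predicted vertices do not self-intersect.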
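Claims 5 and 6 describe warping each text region to a fixed height and a variable length with a perspective matrix $T_\theta$ that maps output coordinates $(x_k^t, y_k^t)$ to input coordinates through $x_k^s = u/w$ and $y_k^s = v/w$. The sketch below computes one such $T_\theta$ from the four detected vertices and returns the input coordinates for every output pixel $k$. Assumptions not found in the claims: vertices are ordered top-left, top-right, bottom-right, bottom-left; the output width is chosen to preserve the quadrangle's average aspect ratio; $T_\theta$ is solved from the four corner correspondences with its last entry fixed to 1; and all function names are hypothetical. A full region of interest module would presumably then bilinearly sample the global feature map at these coordinates, similar to the grid sampling in spatial transformer networks.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def output_width(quad: Sequence[Point], out_h: int) -> int:
    """Variable output length for a fixed height, keeping the quad's aspect ratio (assumption)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    width = (math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x3, y2 - y3)) / 2
    height = (math.hypot(x3 - x0, y3 - y0) + math.hypot(x2 - x1, y2 - y1)) / 2
    return max(2, round(out_h * width / height))


def perspective_matrix(quad: Sequence[Point], out_w: int, out_h: int) -> Matrix3:
    """T_theta mapping output (rectified) coordinates to input (quad) coordinates.

    Requires out_w >= 2 and out_h >= 2 so the four output corners are distinct.
    """
    corners = [(0.0, 0.0), (out_w - 1.0, 0.0), (out_w - 1.0, out_h - 1.0), (0.0, out_h - 1.0)]
    rows, rhs = [], []
    for (xt, yt), (xs, ys) in zip(corners, quad):
        # xs * (h31*xt + h32*yt + 1) = h11*xt + h12*yt + h13, and likewise for ys.
        rows.append([xt, yt, 1.0, 0.0, 0.0, 0.0, -xt * xs, -yt * xs])
        rhs.append(xs)
        rows.append([0.0, 0.0, 0.0, xt, yt, 1.0, -xt * ys, -yt * ys])
        rhs.append(ys)
    h = _solve(rows, rhs)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def output_to_input(t: Matrix3, xt: float, yt: float) -> Point:
    """(u, v, w)^T = T_theta (x^t, y^t, 1)^T, then x^s = u / w and y^s = v / w."""
    u, v, w = (row[0] * xt + row[1] * yt + row[2] for row in t)
    return u / w, v / w


def sampling_grid(quad: Sequence[Point], out_h: int) -> List[List[Point]]:
    """Input coordinates for every pixel k of a fixed-height, variable-width output."""
    out_w = output_width(quad, out_h)
    t = perspective_matrix(quad, out_w, out_h)
    return [[output_to_input(t, x, y) for x in range(out_w)] for y in range(out_h)]
```

Mapping from output to input, as claim 6's equation does, means every output pixel gets exactly one source location, so the aligned feature has no holes regardless of how the quadrangle is skewed.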
CPC classifications:
- Classification techniques
- using neural networks
- Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- Character recognition
- Text, e.g. of license plates, overlay texts or captions on TV images