End-to-end text recognition method and apparatus, computer device and readable medium

US11210546B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11210546-B2
Application number: US-202016822085-A
Country: US
Kind code: B2
Filing date: Mar 18, 2020
Priority date: Jul 5, 2019
Publication date: Dec 28, 2021
Grant date: Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure proposes an end-to-end text recognition method and apparatus, computer device and readable medium. The method comprises: obtaining a to-be-recognized picture containing a text region; recognizing a position of the text region in the to-be-recognized picture and text content included in the text region with a pre-trained end-to-end text recognition model; the end-to-end text recognition model comprising a region of interest perspective transformation processing module for performing perspective transformation processing for the text region. The technical solution of the present disclosure does not need to serially arrange a plurality of steps, and may avoid introducing the accumulated errors and may effectively improve the accuracy of the text recognition.
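The data flow through a single model can be sketched as follows. This is a minimal illustration, not the patented implementation: the module roles come from the claims (global feature obtaining, region detection, region of interest perspective transformation, text recognition), while the class name `EndToEndTextRecognizer`, the field names, and the stand-in lambdas are all hypothetical. The point it shows is that every stage reads from one shared feature expression inside one model instead of chaining separately built systems.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A text region position: four (x, y) vertices of a quadrangle.
Quad = Tuple[Tuple[float, float], Tuple[float, float],
             Tuple[float, float], Tuple[float, float]]


@dataclass
class EndToEndTextRecognizer:
    """Hypothetical wiring of the four modules named in the claims."""
    global_features: Callable[[Any], Any]        # picture -> global feature expression
    detect_regions: Callable[[Any], List[Quad]]  # features -> text region positions
    roi_perspective: Callable[[Any, Quad], Any]  # features + position -> aligned RoI features
    recognize_text: Callable[[Any], str]         # aligned RoI features -> text content

    def __call__(self, picture: Any) -> List[Tuple[Quad, str]]:
        # The global features are computed once and shared by detection and recognition.
        features = self.global_features(picture)
        results = []
        for quad in self.detect_regions(features):
            aligned = self.roi_perspective(features, quad)
            results.append((quad, self.recognize_text(aligned)))
        return results


# Stand-in modules so the sketch runs; a real system would use trained networks.
demo_quad: Quad = ((0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0))
model = EndToEndTextRecognizer(
    global_features=lambda picture: {"map": picture},
    detect_regions=lambda features: [demo_quad],
    roi_perspective=lambda features, quad: (features["map"], quad),
    recognize_text=lambda aligned: "TEXT",
)
print(model("picture-bytes"))
```

Calling the model returns one (position, text) pair per detected region, matching the two outputs the abstract names.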

First claim

Opening claim text (preview).

What is claimed is:

1. An end-to-end text recognition method, wherein the method comprises:
    obtaining a to-be-recognized picture containing a text region;
    recognizing a position of the text region in the to-be-recognized picture and text content included in the text region with a pre-trained end-to-end text recognition model;
    the end-to-end text recognition model comprising a region of interest perspective transformation processing module for performing perspective transformation processing for the text region, a global feature obtaining module, a region detection module and a text recognition module,
    wherein the end-to-end text recognition model is trained by:
    collecting a plurality of training pictures, and marking a real position of a text region in each of the plurality of training pictures and real text content included in the text region in each of the plurality of training pictures,
    inputting a training picture of the plurality of training pictures into the end-to-end text recognition model, the end-to-end text recognition model outputting a predicted position of a text region in the training picture and predicted text content included in the text region;
    detecting whether the predicted position of the text region in the training picture is consistent with the real position of the text region in the training picture, and whether the predicted text content included in the text region is consistent with the real text content included in the text region;
    if the predicted position of the text region is not consistent with the real position, and the predicted text content included in the text region is not consistent with the real text content, adjusting one or more parameters for the global feature obtaining module, the region of interest perspective transformation processing module, the region detection module and the text recognition module in the end-to-end text recognition model, so that the predicted position of the text region tends to be consistent with the real position, and the predicted text content included in the text region tends to be consistent with the real text content;
    repeating the above steps until the number of times of training reaches a preset threshold of number of times, or in trainings of a preset successive number of times, the predicted positions of the text regions of training pictures output by the end-to-end text recognition model are consistent with the real positions respectively, and the predicted text contents included in the text regions are always consistent with the real text content respectively.

2. The method according to claim 1, wherein the recognizing a position of the text region in the to-be-recognized picture and text content included in the text region with a pre-trained end-to-end text recognition model comprises:
    inputting the to-be-recognized picture into the end-to-end text recognition model, the global feature obtaining module obtaining and outputting a global feature expression of the to-be-recognized picture;
    the region detection module detecting the position of the text region according to the global feature expression, and outputting the position;
    the region of interest perspective transformation processing module obtaining a feature expression of the text region from the global feature expression according to the position of the text region, and performing perspective transformation processing for the feature expression of the text region to obtain an aligned region of interest feature expression;
    the end-to-end text recognition module recognizing the text content included in the text region based on a spatial attention mechanism and according to the aligned region of interest feature expression, and output the text content.

3. The method according to claim 1, wherein the to-be-recognized picture is processed by a backbone network based on a full convolution to obtain a global feature expression of the to-be-recognized picture.

4. The method according to claim 3, wherein the position of the text region is represented by positional coordinates of four vertexes of a quadrangle obtained by using a full convolution and a Non-Maximum Suppression algorithm.

5. The method according to claim 4, wherein performing perspective transformation processing for the text region comprises: performing perspective transformation processing for the text region to obtain a plurality of region of interest feature expressions with a fixed height and variable lengths.

6. The method according to claim 5, wherein performing perspective transformation processing comprises calculation according to the following equation:

    $$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = T_\theta \begin{pmatrix} x_k^t \\ y_k^t \\ 1 \end{pmatrix}$$

    where $T_\theta$ is a matrix of perspective transformation; $(u, v, w)^\top$ represents a vector representation of input coordinates, $(x_k^t, y_k^t, 1)^\top$ represents a vector representation of output coordinates; where $(x_k^t, y_k^t)$ are real output coordinates, $k$ represents the $k$-th pixel, and the superscript $t$ is a mark of the output coordinates and used to differentiate from the input coordinates $(x_k^s, y_k^s)$, where $x_k^s = u/w$, $y_k^s = v/w$, where $u$, $v$, $w$ are intermediate variables; $k$ represents any $k$-th pixel, ∀k=1,2, .
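The equation in claim 6 maps each output (target) pixel back to a location in the input (source) feature map: multiply the homogeneous output coordinates by the perspective matrix, then divide by the third component. The sketch below works through that arithmetic in plain Python. The function names `source_coords` and `sampling_grid` and the example matrices are assumptions made for illustration; the claim preview does not say how the matrix is derived from the detected quadrangle or how the features are interpolated, so the matrix is simply taken as given here.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]  # 3x3 perspective matrix, row-major


def source_coords(T: Matrix3, x_t: float, y_t: float) -> Tuple[float, float]:
    """Map output coordinates (x_t, y_t) to input coordinates (x_s, y_s)."""
    # (u, v, w)^T = T * (x_t, y_t, 1)^T
    u, v, w = (row[0] * x_t + row[1] * y_t + row[2] for row in T)
    # x_s = u / w, y_s = v / w
    return u / w, v / w


def sampling_grid(T: Matrix3, out_h: int, out_w: int) -> List[List[Tuple[float, float]]]:
    """Source location for every pixel k of an out_h x out_w output region.

    Keeping out_h fixed and choosing out_w per region mirrors the
    "fixed height and variable lengths" wording of claim 5.
    """
    return [[source_coords(T, x, y) for x in range(out_w)] for y in range(out_h)]


# Illustrative matrix with a non-trivial bottom row, so w differs from 1.
T_demo = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.5, 0.0, 1.0]]
print(source_coords(T_demo, 2.0, 4.0))  # u=2, v=4, w=2 -> (1.0, 2.0)

grid = sampling_grid(T_demo, out_h=2, out_w=3)
print(len(grid), len(grid[0]))  # 2 3
```

Iterating over output pixels and looking up source locations, rather than pushing source pixels forward, guarantees that every output position receives a value; the feature at each non-integer source location would then be obtained by some interpolation scheme, which this sketch leaves out.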

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Classification techniques · CPC title

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Character recognition · CPC title

  • Text, e.g. of license plates, overlay texts or captions on TV images · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 28, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).