Method and apparatus for detecting text

US10762376B2 · US · B2

Patent metadata
Publication number: US-10762376-B2
Application number: US-201816175590-A
Country: US
Kind code: B2
Filing date: Oct 30, 2018
Priority date: Jan 30, 2018
Publication date: Sep 1, 2020
Grant date: Sep 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for detecting text are provided. The method includes: extracting features of a to-be-detected image; predicting using a character detection network a probability of each pixel point in the to-be-detected image being a character pixel point, and position information of each pixel point relative to a bounding box of a character including the pixel point when the pixel point is the character pixel point; determining position information of bounding boxes of candidate characters based on the prediction result of the character detection network; inputting the extracted features into a character map network, converting a feature map outputted by the character map network, and generating character vectors; determining a neighboring candidate character of each candidate character, and connecting each candidate character with an associated neighboring candidate character to form a character set; and determining a character area of the to-be-detected image.
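
The detection step in the abstract (thresholding per-pixel probabilities, turning per-pixel offsets into boxes, then removing repeated boxes, as claims 3 and 4 describe) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the nested-list input layout, the offset ordering (dx1, dy1, dx2, dy2), the IoU-based suppression rule, and both threshold values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): two diagonal vertexes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def decode_candidate_boxes(
    prob: Sequence[Sequence[float]],
    offsets: Sequence[Sequence[Tuple[float, float, float, float]]],
    prob_threshold: float = 0.5,   # hypothetical "preset probability threshold"
    iou_threshold: float = 0.5,    # hypothetical overlap limit for suppression
) -> List[Box]:
    """Turn per-pixel predictions into candidate character boxes.

    prob[y][x] is the predicted probability that pixel (x, y) is a character
    pixel. offsets[y][x] = (dx1, dy1, dx2, dy2) are the predicted offsets from
    that pixel to the two diagonal vertexes of its character's bounding box.
    """
    scored = []
    for y, row in enumerate(prob):
        for x, p in enumerate(row):
            if p > prob_threshold:  # keep only likely character pixels
                dx1, dy1, dx2, dy2 = offsets[y][x]
                scored.append((p, (x + dx1, y + dy1, x + dx2, y + dy2)))

    # Non-maximum suppression: visit boxes from most to least confident and
    # drop any box that overlaps an already kept box too strongly.
    scored.sort(key=lambda item: item[0], reverse=True)
    kept: List[Box] = []
    for _, box in scored:
        if all(iou(box, other) <= iou_threshold for other in kept):
            kept.append(box)
    return kept
```

In this sketch, several pixels inside the same character each vote for nearly the same box, and the suppression pass keeps only the highest-scoring copy.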

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting text, the method comprising: extracting features of a to-be-detected image in a plurality of abstract levels using a feature extraction network of a text detection model; predicting, using a character detection network of the text detection model based on the extracted features of the to-be-detected image, a probability of each pixel point in the to-be-detected image being a character pixel point, and position information of the each pixel point relative to a bounding box of a character including the each pixel point when the each pixel point is the character pixel point; determining position information of bounding boxes of candidate characters, based on the probability of the each pixel point being the character pixel point and the position information of the each pixel point relative to the bounding box of the character including the each pixel point when the each pixel point is the character pixel point; inputting the extracted features of the to-be-detected image into a character mapping network of the text detection model, converting a feature map outputted by the character mapping network based on the position information of the bounding boxes of the candidate characters, and generating character vectors characterizing features of the candidate characters; determining a neighboring candidate character of each candidate character in the to-be-detected image based on the position information of the bounding boxes of the candidate characters, and connecting the each candidate character with an associated neighboring candidate character to form a character set based on a degree of difference between the each candidate character and the neighboring candidate character, the degree of difference being calculated using the character vectors; and determining a character area of the to-be-detected image based on the position information of the bounding boxes of the candidate characters in the character set.

2. The method according to claim 1, wherein the extracting features of a to-be-detected image in a plurality of abstract levels using a feature extraction network of a text detection model comprises: inputting the to-be-detected image into the feature extraction network to extract outputs from a plurality of different convolutional layers of the feature extraction network, and using the outputs as features in the plurality of abstract levels; and splicing the features in the plurality of abstract levels, or processing the features in the plurality of abstract levels using a feature pyramid network, to generate the features of the to-be-detected image.

3. The method according to claim 1, wherein the position information of the each pixel point relative to the bounding box of the character including the each pixel point when the each pixel point is the character pixel point comprises: offsets of coordinates of the each pixel point relative to coordinates of two vertexes on a diagonal line of a rectangular bounding box of the character including the each pixel point when the each pixel point is the character pixel point.

4. The method according to claim 3, wherein the determining position information of bounding boxes of candidate characters based on the probability of the each pixel point being the character pixel point, and the position information of the each pixel point relative to the bounding box of the character including the each pixel point when the each pixel point is the character pixel point comprises: determining pixel points having the probability greater than a preset probability threshold as character pixel points; determining, based on the offsets of the coordinates of each of the determined character pixel points relative to the coordinates of the two vertexes on the diagonal line of the rectangular bounding box of the character including the each character pixel point, coordinates of bounding boxes of characters positioned by the character pixel points; and filtering out coordinates of a bounding box of a repeatedly positioned character from the coordinates of the bounding boxes of the characters positioned by the character pixel points using a non-maximal value suppression method, to obtain coordinates of the bounding boxes of the candidate characters.

5. The method according to claim 1, wherein the determining a neighboring candidate character of each candidate character in the to-be-detected image based on the position information of the bounding box of the each candidate character comprises: classifying the candidate characters using a k-nearest neighbors algorithm based on the position information of the bounding boxes of the candidate characters, and determining the neighboring candidate character of the each candidate character based on a classification result; and the connecting the each candidate character with the associated neighboring candidate character to form a character set based on a degree of difference between the each candidate character and the neighboring candidate character calculated using the character vector comprises: calculating a Euclidean distance between a character vector of the each candidate character and a character vector of each of the neighboring candidate character, and using the Euclidean distance as the degree of difference between the each candidate character and the each neighboring candidate character; and using a neighboring candidate character having the degree of the difference smaller than a preset difference degree threshold as a neighboring candidate character associated with the each candidate character, and connecting the each candidate character with the associated neighboring candidate character to form the character set.

6. The method according to claim 1, wherein the determining a character area of the to-be-detected image based on the position information of the bounding box of the each candidate character in the character set comprises: drawing a surrounding line surrounding all characters in the character set to form the character area of the to-be-detected image based on the position information of the bounding boxes of the candidate characters in the character set.

7. The method according to claim 1, further comprising: training the text detection model using a machine learning method based on a sample image.

8. The method according to claim 7, wherein the training the text detection model using a machine learning method based on a sample image comprises: acquiring the sample image including characters marked with bounding boxes; inputting the sample image into the text detection model to predict a character area in the sample image, and to obtain a prediction result on whether a pixel point in the sample image is a character pixel point, a prediction result on position information of a bounding box of a character including the character pixel point in the sample image, and a prediction result on a character set in the sample image; and calculating a value of a preset loss function, calculating a gradient of each parameter in the text detection model relative to the preset loss function, and updating the each parameter of the model using a back propagation algorithm, until the value of the preset loss function meets a preset convergence condition, wherein the preset loss function comprises a classification loss function, a bounding box regression loss function, and a character connecting loss function; wherein the value of the classification loss function is used for characterizing a difference between a prediction result of the character detection network on whether the pixel point in the sample image is the character pixel point and a marked result on whether the pixel point in the sample image is the character pi…
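
The character-grouping steps of claims 5 and 6 can be sketched as follows, under stated assumptions. Neighbors are taken here as the k candidates with the nearest box centers, which is one simple reading of the claimed k-nearest-neighbors step. The union-find merging, the rectangular surrounding line, the default values of `k` and `diff_threshold`, and all function names are illustrative choices, not details taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def k_nearest(boxes: Sequence[Box], k: int) -> List[List[int]]:
    """For each box, indices of the k other boxes with the closest centers."""
    centers = [box_center(b) for b in boxes]
    result = []
    for i, c in enumerate(centers):
        others = [j for j in range(len(boxes)) if j != i]
        others.sort(key=lambda j: math.dist(c, centers[j]))
        result.append(others[:k])
    return result


def group_characters(
    boxes: Sequence[Box],
    vectors: Sequence[Sequence[float]],
    k: int = 2,                    # hypothetical neighbor count
    diff_threshold: float = 1.0,   # hypothetical "preset difference degree threshold"
) -> List[List[int]]:
    """Connect each candidate to spatial neighbors whose character vectors are
    close in Euclidean distance; return the connected character sets."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, neighbors in enumerate(k_nearest(boxes, k)):
        for j in neighbors:
            # Degree of difference = Euclidean distance between character vectors.
            if math.dist(vectors[i], vectors[j]) < diff_threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def surrounding_box(boxes: Sequence[Box], members: Sequence[int]) -> Box:
    """Smallest axis-aligned rectangle enclosing every box in one character set."""
    return (
        min(boxes[i][0] for i in members),
        min(boxes[i][1] for i in members),
        max(boxes[i][2] for i in members),
        max(boxes[i][3] for i in members),
    )
```

A rectangle is the simplest surrounding line. For curved or rotated text, a polygon around the member boxes would follow the character set more tightly.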

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques · CPC title

  • using recognition of characters or words · CPC title

  • Distances to closest patterns, e.g. nearest neighbour classification · CPC title

  • Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10762376B2 cover?
A method and apparatus for detecting text are provided. The method includes: extracting features of a to-be-detected image; predicting using a character detection network a probability of each pixel point in the to-be-detected image being a character pixel point, and position information of each pixel point relative to a bounding box of a character including the pixel point when the pixel point…
Who is the assignee on this patent?
Baidu Online Network Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).