Method for text recognition, electronic device and storage medium

US12014275B2 · US · B2

Patent metadata
Publication number: US-12014275-B2
Application number: US-202017081758-A
Country: US
Kind code: B2
Filing date: Oct 27, 2020
Priority date: Mar 29, 2019
Publication date: Jun 18, 2024
Grant date: Jun 18, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for text recognition, an electronic device and a storage medium are provided. The method includes: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, each of the plurality of semantic vectors corresponds to one of a plurality of characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence.
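
The sequential recognition described in the abstract can be pictured as a loop over the semantic vectors, one per character. The following is a minimal sketch under stated assumptions, not the patented implementation: `recognize_sequence`, `toy_step`, the `START` marker and the three-letter `ALPHABET` are hypothetical names, and `toy_step` is only a stand-in for the convolutional neural network. Passing the previous result to the next step follows claim 1, which uses the previous recognition result as prior information.

```python
from typing import Callable, List, Sequence

Vector = List[float]
START = "<s>"  # hypothetical start marker used as the first prior


def recognize_sequence(
    semantic_vectors: Sequence[Vector],
    step: Callable[[Vector, str], str],
) -> str:
    """Decode one character per semantic vector, in reading order."""
    prior = START
    chars: List[str] = []
    for vec in semantic_vectors:
        ch = step(vec, prior)  # stand-in for the convolutional network
        chars.append(ch)
        prior = ch             # this result is the next step's prior
    return "".join(chars)


ALPHABET = "abc"  # hypothetical three-character alphabet


def toy_step(vec: Vector, prior: str) -> str:
    """Toy recognizer: return the character with the largest score.

    It ignores `prior`; a trained model would use it.
    """
    best = max(range(len(vec)), key=lambda i: vec[i])
    return ALPHABET[best]


vectors = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.7]]
print(recognize_sequence(vectors, toy_step))  # prints "abc"
```

Keeping the per-step recognizer as a plain callable separates the left-to-right control flow from whatever network produces each character.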

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for text recognition, comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector; performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and determining the weight parameter based on the first vector and the second vector; wherein the performing encoding processing on the priori information comprises: responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

2. The method of claim 1, wherein the performing encoding processing on the priori information comprises: encoding an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

3. The method of claim 1, wherein the determining a text recognition result corresponding to the target semantic vector comprises: obtaining an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decoding the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.

4. The method of claim 1, wherein the performing feature extraction processing comprises: performing feature extraction on the image to be detected to obtain feature information; and performing down-sampling processing on the feature information to obtain the plurality of semantic vectors.

5. An electronic device, comprising: a processor; and a memory, configured to store instructions that, when executed by the processor, cause the processor to perform the following operations comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector; performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and determining the weight parameter based on the first vector and the second vector; wherein the performing encoding processing on the priori information comprises: responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

6. The electronic device of claim 5, wherein the processor is configured to: encode an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

7. The electronic device of claim 5, wherein the processor is configured to: obtain an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decode the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.

8. The electronic device of claim 5, wherein the processor is configured to: perform feature extraction on the image to be detected to obtain feature information; and perform down-sampling processing on the feature information to obtain the plurality of semantic vectors.

9. A non-transitory computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor of an electronic device, cause the processor to perform the following operations comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vecto
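
One recognition step from claims 1 to 3 (two convolutional encoders, a weight parameter, an attention distribution vector) can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the claims only say the weight parameter is determined "based on" the first and second vectors, so the element-wise product followed by a softmax is one plausible choice, not the patent's stated formula. The `EMBEDDING` table, the kernels and the function names are hypothetical, and the de-convolutional decoding of claim 3 is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Vector:
    """1-D convolution (cross-correlation) with zero padding.

    Assumes an odd kernel length so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def softmax(v: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical word-embedding table. "<s>" holds the initial vector
# for the start character (claim 2).
EMBEDDING: Dict[str, Vector] = {
    "<s>": [0.0, 0.0, 0.0],
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
}


def attention_step(
    target: Vector,
    prior_char: str,
    kernel_first: Sequence[float],
    kernel_second: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Return (weight parameter, attention distribution vector)."""
    # First convolutional layer encodes the target semantic vector.
    first = conv1d_same(target, kernel_first)
    # Word embedding of the previous result, then the second layer.
    second = conv1d_same(EMBEDDING[prior_char], kernel_second)
    # Assumption: element-wise product, normalised with softmax.
    weights = softmax([a * b for a, b in zip(first, second)])
    # Weight the target semantic vector element by element.
    attended = [w * t for w, t in zip(weights, target)]
    return weights, attended


identity = [0.0, 1.0, 0.0]  # kernel that leaves its input unchanged
weights, attended = attention_step([1.0, 2.0, 3.0], "a", identity, identity)
```

With the identity kernels above, the prior character "a" raises the first weight above the other two, while the start marker "<s>" (an all-zero initial vector in this toy table) yields uniform weights.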

Assignees

Beijing Sensetime Tech Development Co Ltd

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Supervised learning · CPC title

  • using neural networks · CPC title

  • Extraction of image or video features · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12014275B2 cover?
A method for text recognition, an electronic device and a storage medium are provided. The method includes: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, each of the plurality of semantic vectors corresponds to one of a plurality of characters of a text sequence in the image to be detected; and sequentially performing recognition …
Who is the assignee on this patent?
Beijing Sensetime Tech Development Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 18, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).