Who is the assignee on this patent?

Beijing Sensetime Tech Development Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 18, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method for text recognition, electronic device and storage medium

US12014275B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12014275-B2
Application number	US-202017081758-A
Country	US
Kind code	B2
Filing date	Oct 27, 2020
Priority date	Mar 29, 2019
Publication date	Jun 18, 2024
Grant date	Jun 18, 2024
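
The publication number appears on this page in two spellings, US12014275B2 and US-12014275-B2. As a minimal sketch, assuming only these two spellings need handling, the Python below splits either form into country code, serial number and kind code. The names `PublicationNumber` and `parse_publication_number` are hypothetical, and the pattern is not a general parser for every patent office's numbering.

```python
import re
from typing import NamedTuple


class PublicationNumber(NamedTuple):
    """Hypothetical container for the parts of a publication number."""
    country: str  # two-letter country code, e.g. "US"
    serial: str   # numeric part, kept as a string to preserve any leading zeros
    kind: str     # kind code, e.g. "B2"


# Assumed shape: two letters, digits, then a letter with an optional digit,
# with optional hyphens between the three parts.
_PATTERN = re.compile(r"^([A-Z]{2})-?(\d+)-?([A-Z]\d?)$")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a publication number such as 'US-12014275-B2' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised publication number: {text!r}")
    return PublicationNumber(*match.groups())


hyphenated = parse_publication_number("US-12014275-B2")
compact = parse_publication_number("US12014275B2")
print(hyphenated, hyphenated == compact)
```

Keeping the serial number as a string avoids silently altering numbers that some offices pad with zeros.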

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for text recognition, an electronic device and a storage medium are provided. The method includes: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, each of the plurality of semantic vectors corresponds to one of a plurality of characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence.
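
The first step of the abstract, turning an image into one semantic vector per character, can be illustrated with a toy example. This is a sketch under stated assumptions, not the patented extractor: the abstract does not say how features are computed, so hand-crafted column statistics stand in for a learned network, and average pooling stands in for the down-sampling step. The names `extract_features` and `downsample` are hypothetical.

```python
from statistics import mean


def extract_features(image):
    """Stand-in feature extractor (assumption): one 2-D feature per pixel column.

    Each feature holds the column's mean intensity and its mean absolute
    vertical gradient. A real system would use learned convolutional features.
    """
    height, width = len(image), len(image[0])
    features = []
    for x in range(width):
        column = [image[y][x] for y in range(height)]
        gradient = mean(abs(column[y + 1] - column[y]) for y in range(height - 1))
        features.append([mean(column), gradient])
    return features


def downsample(features, num_vectors):
    """Average-pool consecutive column features into num_vectors semantic vectors.

    Columns left over when the width is not divisible by num_vectors are dropped.
    """
    step = len(features) // num_vectors
    if step == 0:
        raise ValueError("more semantic vectors requested than feature columns")
    pooled = []
    for i in range(num_vectors):
        chunk = features[i * step:(i + 1) * step]
        pooled.append([mean(f[d] for f in chunk) for d in range(len(chunk[0]))])
    return pooled


# A 4 x 12 checkerboard as a toy "image to be detected", pooled into 3 vectors,
# i.e. one vector for each of three assumed character positions.
image = [[(x + y) % 2 for x in range(12)] for y in range(4)]
semantic_vectors = downsample(extract_features(image), num_vectors=3)
print(semantic_vectors)
```

The point of the sketch is only the shape of the data: a wide feature map is reduced to a short sequence whose length matches the number of characters to recognise.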

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for text recognition, comprising:
performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and
sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence,
wherein the sequentially performing comprises:
processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and
determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector;
wherein the processing priori information comprises:
performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector;
performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and
determining the weight parameter based on the first vector and the second vector;
wherein the performing encoding processing on the priori information comprises:
responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and
encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

2. The method of claim 1, wherein the performing encoding processing on the priori information comprises: encoding an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

3. The method of claim 1, wherein the determining a text recognition result corresponding to the target semantic vector comprises: obtaining an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decoding the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.

4. The method of claim 1, wherein the performing feature extraction processing comprises: performing feature extraction on the image to be detected to obtain feature information; and performing down-sampling processing on the feature information to obtain the plurality of semantic vectors.

5. An electronic device, comprising:
a processor; and
a memory, configured to store instructions that, when executed by the processor, cause the processor to perform the following operations comprising:
performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and
sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence,
wherein the sequentially performing comprises:
processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and
determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector;
wherein the processing priori information comprises:
performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector;
performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and
determining the weight parameter based on the first vector and the second vector;
wherein the performing encoding processing on the priori information comprises:
responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and
encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

6. The electronic device of claim 5, wherein the processor is configured to: encode an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.

7. The electronic device of claim 5, wherein the processor is configured to: obtain an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decode the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.

8. The electronic device of claim 5, wherein the processor is configured to: perform feature extraction on the image to be detected to obtain feature information; and perform down-sampling processing on the feature information to obtain the plurality of semantic vectors.

9. A non-transitory computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor of an electronic device, cause the processor to perform the following operations comprising:
performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and
sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence,
wherein the sequentially performing comprises:
processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and
determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector;
wherein the processing priori information comprises:
performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vecto
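
The sequential step recited in claims 1 to 3 can be sketched in plain Python. This is an illustrative reading, not the patented implementation, and several choices are assumptions the claims leave open: the weight parameter is taken as a softmax over the element-wise product of the first and second vectors, the attention distribution vector is the weight parameter multiplied element-wise with the target semantic vector, and a linear projection stands in for the de-convolutional decoder. All parameters are random and untrained, and every name, the three-letter alphabet and the vector size are hypothetical.

```python
import math
import random

ALPHABET = "abc"        # hypothetical character set
DIM = 4                 # hypothetical semantic-vector size
rng = random.Random(0)  # fixed seed so the sketch is repeatable


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def conv1d(vec, kernel):
    """Same-length 1-D convolution (cross-correlation) with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(vec) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(vec))]


def softmax(vec):
    peak = max(vec)
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


# Untrained stand-ins for the learned parameters (all hypothetical).
FIRST_KERNEL = rand_vec(3)    # "first convolutional layer"
SECOND_KERNEL = rand_vec(3)   # "second convolutional layer"
START_VECTOR = rand_vec(DIM)  # initial vector for the start character (claim 2)
EMBEDDING = {ch: rand_vec(DIM) for ch in ALPHABET}  # word-embedding table
DECODER = [rand_vec(DIM) for _ in ALPHABET]         # linear stand-in for the decoder


def weight_parameter(target, prior_char):
    """Encode the target and its prior information, then combine them."""
    first = conv1d(target, FIRST_KERNEL)
    prior = START_VECTOR if prior_char is None else EMBEDDING[prior_char]
    second = conv1d(prior, SECOND_KERNEL)
    # Assumption: element-wise product followed by softmax normalisation.
    return softmax([a * b for a, b in zip(first, second)])


def recognize(semantic_vectors):
    """Process the semantic vectors in order, feeding each result forward."""
    result, prior_char = [], None
    for target in semantic_vectors:
        weights = weight_parameter(target, prior_char)
        attention = [w * t for w, t in zip(weights, target)]
        scores = [sum(d * a for d, a in zip(row, attention)) for row in DECODER]
        prior_char = ALPHABET[scores.index(max(scores))]
        result.append(prior_char)
    return "".join(result)


vectors = [rand_vec(DIM) for _ in range(5)]
text = recognize(vectors)
print(text)
```

Because the parameters are random, the printed string is meaningless. The sketch only shows the data flow: the first vector is processed with the start vector as prior information, and each later vector uses the embedding of the previously recognised character.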

Assignees

Beijing Sensetime Tech Development Co Ltd

Inventors

Liu Xuebo

Classifications

G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06N3/0455 · Auto-encoder networks; Encoder-decoder networks
G06N3/09 · Supervised learning
G06V10/82 · using neural networks
G06V10/40 · Extraction of image or video features
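
CPC symbols are hierarchical: a section letter, a two-digit class, a subclass letter, then a main group and a subgroup separated by a slash. Section G is titled Physics in the CPC scheme, which is why symbols beginning with G map to that technology area. The sketch below, with hypothetical names `CpcSymbol` and `parse_cpc`, splits a symbol into those levels. It assumes the compact spelling used on this page and covers only section titles A to H.

```python
import re
from dataclasses import dataclass

# Titles of CPC sections A to H; section Y (general tagging) is omitted here.
SECTION_TITLES = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Assumed shape: section letter, 2-digit class, subclass letter, group "/" subgroup.
_CPC = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


@dataclass(frozen=True)
class CpcSymbol:
    """Hypothetical container for the levels of a CPC symbol."""
    section: str     # e.g. "G"
    cpc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06N"
    main_group: str  # e.g. "3"
    subgroup: str    # e.g. "0464"

    @property
    def section_title(self) -> str:
        return SECTION_TITLES.get(self.section, "Unknown")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a compact CPC symbol such as 'G06N3/0464' into its levels."""
    match = _CPC.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognised CPC symbol: {symbol!r}")
    section, cls, sub, group, subgroup = match.groups()
    return CpcSymbol(section, section + cls, section + cls + sub, group, subgroup)


codes = ["G06N3/0464", "G06N3/0455", "G06N3/09", "G06V10/82", "G06V10/40"]
for code in codes:
    parsed = parse_cpc(code)
    print(code, parsed.subclass, parsed.section_title)
```

All five symbols listed for this patent fall under section G, split between subclasses G06N and G06V.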

Patent family

Related publications grouped by family.

View patent family 72664623

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12014275B2 cover?: A method for text recognition, an electronic device and a storage medium are provided. The method includes: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, each of the plurality of semantic vectors corresponds to one of a plurality of characters of a text sequence in the image to be detected; and sequentially performing recognition …