Multi-resolution convolutional neural networks for sequence modeling
US-2020151250-A1 · May 14, 2020 · US
US12014275B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014275-B2 |
| Application number | US-202017081758-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 27, 2020 |
| Priority date | Mar 29, 2019 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Official abstract text for this publication.
A method for text recognition, an electronic device and a storage medium are provided. The method includes: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, each of which corresponds to one of a plurality of characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence.
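The first step of the abstract (feature extraction producing one semantic vector per character), together with the down-sampling of claim 4, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the image is a grayscale matrix (list of rows), feature extraction is a single hypothetical convolution kernel followed by ReLU, and down-sampling is modeled as average pooling of the feature map's width into a fixed, assumed-known number of character slots. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation without padding (what CNN 'convolution' layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    """Element-wise ReLU: negative activations become 0.0."""
    return [[max(0.0, v) for v in row] for row in m]


def downsample_columns(features: Matrix, num_chars: int) -> List[List[float]]:
    """Average-pool the feature map's width into num_chars equal slices.

    Each result is one column vector (length = feature-map height), used here
    as a hypothetical 'semantic vector' for one character position. Trailing
    columns beyond num_chars * step are ignored.
    """
    height, width = len(features), len(features[0])
    step = width // num_chars
    if step == 0:
        raise ValueError("feature map is narrower than the number of characters")
    return [
        [sum(features[r][c] for c in range(t * step, (t + 1) * step)) / step for r in range(height)]
        for t in range(num_chars)
    ]


def extract_semantic_vectors(image: Matrix, kernel: Matrix, num_chars: int) -> List[List[float]]:
    # Assumed pipeline: one conv layer + ReLU (feature extraction), then down-sampling.
    return downsample_columns(relu(conv2d_valid(image, kernel)), num_chars)
```

For example, a 4x10 image of ones convolved with a 3x3 kernel of ones gives a 2x8 feature map of 9.0 values, which pools into four semantic vectors, each equal to `[9.0, 9.0]`. Splitting the width into equal slices is only for illustration; the patent text shown here does not say how semantic vectors are aligned to characters.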
Opening claim text (preview).
The invention claimed is:
1. A method for text recognition, comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector; performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and determining the weight parameter based on the first vector and the second vector; wherein the performing encoding processing on the priori information comprises: responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.
2. The method of claim 1, wherein the performing encoding processing on the priori information comprises: encoding an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.
3. The method of claim 1, wherein the determining a text recognition result corresponding to the target semantic vector comprises: obtaining an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decoding the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.
4. The method of claim 1, wherein the performing feature extraction processing comprises: performing feature extraction on the image to be detected to obtain feature information; and performing down-sampling processing on the feature information to obtain the plurality of semantic vectors.
5. An electronic device, comprising: a processor; and a memory, configured to store instructions that, when executed by the processor, cause the processor to perform the following operations comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vector of the target semantic vector; performing encoding processing on the priori information of the target semantic vector through at least one second convolutional layer of the convolutional neural network to obtain a second vector corresponding to the priori information; and determining the weight parameter based on the first vector and the second vector; wherein the performing encoding processing on the priori information comprises: responsive to the priori information comprising a text recognition result corresponding to a previous semantic vector of the target semantic vector, performing word embedding processing on the text recognition result corresponding to the previous semantic vector to obtain a feature vector corresponding to the priori information; and encoding the feature vector through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.
6. The electronic device of claim 5, wherein the processor is configured to: encode an initial vector corresponding to a start character in the priori information through the at least one second convolutional layer of the convolutional neural network to obtain the second vector.
7. The electronic device of claim 5, wherein the processor is configured to: obtain an attention distribution vector corresponding to the target semantic vector based on the weight parameter and the target semantic vector; and decode the attention distribution vector through at least one de-convolutional layer of the convolutional neural network to determine the text recognition result corresponding to the target semantic vector.
8. The electronic device of claim 5, wherein the processor is configured to: perform feature extraction on the image to be detected to obtain feature information; and perform down-sampling processing on the feature information to obtain the plurality of semantic vectors.
9. A non-transitory computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor of an electronic device, cause the processor to perform the following operations comprising: performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein each of the plurality of semantic vectors corresponds to a respective one of multiple characters of a text sequence in the image to be detected; and sequentially performing recognition processing on the plurality of semantic vectors through a convolutional neural network to obtain a recognition result of the text sequence, wherein the sequentially performing comprises: processing priori information of a target semantic vector through the convolutional neural network to obtain a weight parameter of the target semantic vector, wherein the target semantic vector is one of the plurality of semantic vectors; and determining a text recognition result corresponding to the target semantic vector according to the weight parameter and the target semantic vector; wherein the processing priori information comprises: performing encoding processing on the target semantic vector through at least one first convolutional layer of the convolutional neural network to obtain a first vecto
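The sequential recognition of claims 1 to 3 can be sketched as a greedy decoding loop. This is a hedged illustration, not the patented network; every concrete choice below is an assumption: each "convolutional layer" is a single zero-padded 1D kernel applied along the vector, the weight parameter is a softmax over the element-wise sum of the first and second vectors, the attention distribution vector is the element-wise product of that weight and the target semantic vector, and the de-convolutional layer is a stride-1 transposed convolution whose output length must equal the vocabulary size. `START`, the embedding table, the kernels, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
START = "<s>"  # hypothetical start-character token (cf. claim 2)


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Vector:
    """1D cross-correlation with zero padding; output length equals len(x)."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += k * x[idx]
        out.append(acc)
    return out


def conv_transpose1d(x: Sequence[float], kernel: Sequence[float]) -> Vector:
    """Stride-1 transposed convolution; output length is len(x) + len(kernel) - 1."""
    out = [0.0] * (len(x) + len(kernel) - 1)
    for i, xi in enumerate(x):
        for j, k in enumerate(kernel):
            out[i + j] += xi * k
    return out


def softmax(v: Sequence[float]) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(target: Vector, prev_char: str, embedding: Dict[str, Vector],
                k_first: Vector, k_second: Vector, k_dec: Vector, vocab: List[str]) -> str:
    first = conv1d_same(target, k_first)                      # encode the target semantic vector
    prior = embedding[prev_char]                              # word embedding of the prior result
    second = conv1d_same(prior, k_second)                     # encode the prior information
    weight = softmax([a + b for a, b in zip(first, second)])  # assumed weight parameter
    attention = [w * t for w, t in zip(weight, target)]       # assumed attention distribution vector
    logits = conv_transpose1d(attention, k_dec)               # de-convolutional decoding
    if len(logits) != len(vocab):
        raise ValueError("decoder output length must equal vocabulary size")
    best = max(range(len(logits)), key=lambda i: logits[i])   # greedy choice
    return vocab[best]


def recognize(semantic_vectors: List[Vector], embedding: Dict[str, Vector],
              k_first: Vector, k_second: Vector, k_dec: Vector, vocab: List[str]) -> str:
    chars: List[str] = []
    prev = START  # first step uses the start-character vector
    for target in semantic_vectors:
        ch = decode_step(target, prev, embedding, k_first, k_second, k_dec, vocab)
        chars.append(ch)
        prev = ch  # feed the decoded character back as the next step's prior information
    return "".join(chars)
```

With a decoding kernel of `[1.0]`, the transposed convolution leaves the attention vector unchanged, and because softmax weights are always positive, a semantic vector that is nonzero at only one position decodes to the vocabulary entry at that position. For instance, with `vocab = ["a", "b", "c", "d"]` and semantic vectors `[0, 0, 5, 0]`, `[5, 0, 0, 0]`, `[0, 0, 0, 5]`, `recognize` returns `"cad"` regardless of the encoder kernels chosen. Feeding each decoded character back as the next step's prior information is what makes the loop sequential.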
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
using neural networks · CPC title
Extraction of image or video features · CPC title