Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V30/148. Mapped technology areas include Physics.

When was this patent published?

Published Sep 23, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium

US12424010B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12424010-B2
Application number	US-202318168759-A
Country	US
Kind code	B2
Filing date	Feb 14, 2023
Priority date	Aug 16, 2022
Publication date	Sep 23, 2025
Grant date	Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition technology. The specific implementing solution is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; where the first training set includes a first sub-sample image with a visible attribute, and the second training set includes a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.
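The first two steps of the abstract (partitioning a sample into sub-sample images, then dividing them into a visible set and an invisible set) can be illustrated with a minimal sketch. The function names, the fixed patch size, the random mask with a `mask_ratio` parameter, and the use of a nested list as the image are all assumptions made for illustration; the abstract does not say how the partition or the split is carried out.

```python
import random

def partition(image, patch_h, patch_w):
    """Split a 2D grid (list of rows) into non-overlapping patches, row-major."""
    rows, cols = len(image), len(image[0])
    if rows % patch_h or cols % patch_w:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, rows, patch_h):
        for left in range(0, cols, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches

def split_visible_invisible(patches, mask_ratio, seed=0):
    """Return (visible, invisible) lists of (index, patch) pairs.

    Assumption: patches are hidden uniformly at random; the patent only
    states that one set is visible and the other invisible.
    """
    n = len(patches)
    n_masked = int(round(n * mask_ratio))
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), n_masked))
    visible = [(i, p) for i, p in enumerate(patches) if i not in masked]
    invisible = [(i, p) for i, p in enumerate(patches) if i in masked]
    return visible, invisible

# Hypothetical 4x8 "text line" image whose pixel values encode their position.
image = [[r * 8 + c for c in range(8)] for r in range(4)]
patches = partition(image, 4, 2)  # four patches, each 4 rows by 2 columns
visible, invisible = split_visible_invisible(patches, mask_ratio=0.5)
```

Keeping the original patch index alongside each patch lets a later stage know where in the sample the hidden patches were located.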

First claim

Opening claim text (preview).

What is claimed is:

1. A character recognition method being applied to a server and comprising: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; wherein the first training set comprises a first sub-sample image with a visible attribute, and the second training set comprises a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder; wherein the performing the self-supervised training on the to-be-trained encoder by taking the second training set as the tag of the first training set, to obtain the target encoder comprises: initializing the to-be-trained encoder to obtain a first encoder; extracting, based on the first encoder, a first visual feature of the first sub-sample image in the first training set and a second visual feature of the second sub-sample image in the second training set; performing mask query calculation on the first visual feature, to obtain a third visual feature; and updating the first encoder according to a feature error between the third visual feature and the second visual feature until the feature error satisfies a first error condition, and determining a latest updated first encoder as the target encoder; wherein the updating the first encoder according to the feature error between the third visual feature and the second visual feature until the feature error satisfies the first error condition, and the determining the latest updated first encoder as the target encoder comprise: initializing a to-be-trained decoder to obtain a first decoder; determining, based on the first decoder, an image error generated when image reconstruction is performed on the third visual feature; determining the feature error between the third visual feature and the second visual feature; and updating the first encoder based on the feature error and the image error and updating the first decoder based on the image error until the feature error satisfies the first error condition and the image error satisfies a second error condition, and determining a latest obtained first encoder as the target encoder; receiving a to-be-recognized image sent by a terminal device, and performing, based on the target encoder and the updated first decoder, image features extraction on the to-be-recognized image to obtain a target text; and sending the target text to the terminal device.

2. The method according to claim 1, wherein the determining, based on the first decoder, the image error generated when the image reconstruction is performed on the third visual feature comprises: performing decoding calculation processing on the third visual feature by using the first decoder, to obtain a first decoded feature; and obtaining the image error according to an image reconstruction result of the first decoded feature.

3. The method according to claim 2, wherein the obtaining the image error according to the image reconstruction result of the first decoded feature comprises: performing image reconstruction processing on the first decoded feature, to obtain a first prediction result; and performing image error calculation by using the second sub-sample image and the first prediction result, to obtain the image error.

4. The method according to claim 1, further comprising: dividing, based on a mask setting strategy, at least two query vectors into a first query vector and a second query vector; wherein the mask setting strategy comprises mask data generated based on a preset first mask ratio; the at least two query vectors are spatial transformation vectors corresponding to a basis character string; the performing the mask query calculation on the first visual feature, to obtain the third visual feature comprises: obtaining, based on feature prediction calculation of the second query vector and the first visual feature, a feature vector corresponding to an occurrence probability of the first visual feature in the second query vector; and performing vector combination on the feature vector corresponding to the first visual feature, to obtain the third visual feature.

5. The method according to claim 1, wherein the dividing the at least two sub-sample images into the first training set and the second training set comprises: dividing the at least two sub-sample images into the first training set and the second training set by using a mask setting strategy.

6. A character recognition apparatus comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein, the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor to cause the at least one processor to perform the method according to claim 1.

7. A non-transitory computer-readable storage medium storing a computer instruction, wherein the computer instruction is used to cause a computer to perform the method according to claim 1.

8. A character recognition model training method comprising: partitioning a synthetic sample into at least two sub-synthetic images, wherein the synthetic sample comprises a synthetic text tag; dividing the at least two sub-synthetic images as a first synthetic set and a second synthetic set; wherein the first synthetic set comprises a first sub-synthetic image with a visible attribute, and the second synthetic set comprises a second sub-synthetic image with an invisible attribute; and performing, based on the first synthetic set and the second synthetic set, supervised training on a to-be-trained decoder to obtain a target decoder corresponding to the to-be-trained decoder; wherein the performing, based on the first synthetic set and the second synthetic set, the supervised training on the to-be-trained decoder to obtain the target decoder corresponding to the to-be-trained decoder comprises: extracting, based on a target encoder, a first feature sequence of the first sub-synthetic image in the first synthetic set; wherein the target encoder is obtained by performing following steps: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; wherein the first training set comprises a first sub-sample image with a visible attribute, and the second training set comprises a second sub-sample image with an invisible attribute; and performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain the target encoder; performing feature completion on the first feature sequence according to an image position, in the synthetic sample, of the second sub-synthetic image in the second synthetic set, to obtain a second feature sequence; and training, by taking that a predictive text of the second feature sequence predicted by the to-be-trained decoder is the same as a synthetic text of the second sub-synthetic image in the synthetic text tag as a training objective, to obtain the target decoder corresponding to the to-be-trained decoder.

9. The method according to claim 8, wherein the training, by taking that the predictive text of the second feature sequence predicted by the to-be-trained decoder is same as the synthetic text of the second sub-synthetic image in the synthetic text tag as the training objective, to obtain the target decoder corresponding to the to-be-trained decoder comprises: initializing the to-be-trained decoder to obtain a second decoder
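Claim 1 describes two error terms and a two-part stopping rule: the encoder is updated from both the feature error and the image error, the decoder from the image error alone, and training ends once each error satisfies its own condition. The sketch below illustrates only that bookkeeping. Mean squared error as the error measure, an unweighted sum for the encoder, threshold comparisons as the "error conditions", and every function and parameter name are assumptions; the claim does not specify any of them.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def training_signals(third_feature, second_feature, reconstruction, hidden_patch):
    """Compute both errors and the quantity each model would be updated on.

    third_feature:  feature predicted from the visible patches
    second_feature: encoder feature of the invisible patch
    reconstruction: decoder output reconstructed from third_feature
    hidden_patch:   pixels of the invisible patch
    """
    feature_error = mse(third_feature, second_feature)
    image_error = mse(reconstruction, hidden_patch)
    return {
        "feature_error": feature_error,
        "image_error": image_error,
        # Assumption: the encoder objective is an unweighted sum of both errors.
        "encoder_loss": feature_error + image_error,
        # The decoder is updated based on the image error only.
        "decoder_loss": image_error,
    }

def should_stop(signals, max_feature_error, max_image_error):
    """Assumed form of the first and second error conditions: simple thresholds."""
    return (signals["feature_error"] <= max_feature_error
            and signals["image_error"] <= max_image_error)

# Hypothetical values for one training step.
signals = training_signals(
    third_feature=[1.0, 2.0],
    second_feature=[1.0, 4.0],
    reconstruction=[0.0, 0.0, 1.0, 1.0],
    hidden_patch=[0.0, 1.0, 1.0, 1.0],
)
```

In a real training loop these values would feed a gradient-based optimizer; the sketch stops short of that because the claim does not name one.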

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Classifications

G06F16/53: Querying
G06T2207/20021: Dividing image into blocks, subimages or windows
G06T2207/20112: Image segmentation details
G06T2207/20081: Training; Learning
G06F16/83: Querying

Patent family

Related publications grouped by family.

View patent family 84738387

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12424010B2 cover?: The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition …


Related patents

Unified Vision and Dialogue Transformer with BERT

Training and inferencing using a neural network to predict orientations of objects in images

System and method for controllable machine text generation architecture

Target detection in latent space
