Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V30/19147. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 29, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Vision processing and model training method, device, storage medium and program product

US12374140B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12374140-B2
Application number	US-202318170902-A
Country	US
Kind code	B2
Filing date	Feb 17, 2023
Priority date	Feb 25, 2022
Publication date	Jul 29, 2025
Grant date	Jul 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a vision processing and model training method, device, storage medium and program product. A specific implementation solution is as follows: establishing an image classification network with the same backbone network as the vision model, performing a self-monitoring training on the image classification network by using an unlabeled first data set; initializing a weight of a backbone network of the vision model according to a weight of a backbone network of the trained image classification network to obtain a pre-training model, the structure of the pre-training model being consistent with that of the vision model, and optimize the weight of the backbone network by using real data set in a current computer vision task scenario, so as to be more suitable for the current computer vision task; then, training the pre-training model by using a labeled second data set to obtain a trained vision model.
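The weight-initialization step in the abstract can be pictured with a small sketch. This is an illustration only, not the patent's implementation: weights are modeled as plain dictionaries, and the function name `init_backbone`, the `backbone.` key prefix, and the layer names are all hypothetical.

```python
def init_backbone(vision_weights, classifier_weights, prefix="backbone."):
    """Return a copy of vision_weights whose backbone entries are taken
    from the trained classifier. The key prefix is an assumed convention."""
    pretrain = dict(vision_weights)
    for name, value in classifier_weights.items():
        # Copy only shared backbone layers; the classifier's own head is skipped.
        if name.startswith(prefix) and name in pretrain:
            pretrain[name] = list(value)
    return pretrain


# Hypothetical toy weights: same backbone layer, different task heads.
vision = {"backbone.conv1": [0.0, 0.0], "head.detect": [0.1]}
classifier = {"backbone.conv1": [0.7, -0.2], "head.direction": [0.9]}

pretrain_model = init_backbone(vision, classifier)
print(pretrain_model)
```

The resulting dictionary keeps the vision model's structure (same keys), which mirrors the abstract's statement that the pre-training model is structurally consistent with the vision model; it would then be trained further on the labeled second data set.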

First claim

Opening claim text (preview).

What is claimed is:
1. A vision model training method, executed by a processor, comprising: establishing an image classification network, wherein the image classification network has the same backbone network as a vision model; performing a self-monitoring training on the image classification network by using an unlabeled first data set to obtain a trained image classification network; initializing a weight of the backbone network of the vision model according to a weight of the backbone network of the trained image classification network to obtain a pre-training model; training the pre-training model by using a labeled second data set to obtain a trained vision model; and applying the trained vision model to a computer vision task to perform a corresponding computer vision processing to obtain a processing result, wherein the computer vision task comprises target detection, image segmentation, and text recognition, and wherein performing the self-monitoring training on the image classification network by using the unlabeled first data set to obtain the trained image classification network comprises: obtaining the unlabeled first data set, the first data set comprising a plurality of groups of sample images and direction information of each sample image, wherein each group of sample images comprises a first sample image and a second sample image obtained by rotating the first sample image by a preset angle; extracting an image feature of each sample image in the first data set through the image classification network, and determining a direction prediction result of each sample image according to the image feature; calculating a first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images; and calculating a second loss according to real direction information and the direction prediction result of each sample image; and adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss.
2. The method according to claim 1, wherein the obtaining the unlabeled first data set comprises: obtaining an unlabeled first sample image and determining direction information of the first sample image as 0 degrees; rotating the first sample image by the preset angle to obtain the second sample image, and determining direction information of the second sample image as the preset angle.
3. The method according to claim 2, wherein the preset angle at least comprises 180 degrees, calculating the first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images comprises: calculating the first loss according to a difference between an image feature obtained by rotating an image feature of the first sample image by 180 degrees and an image feature of the second sample image obtained by rotating the first sample image by 180 degrees in each group of sample images.
4. The method according to claim 2, wherein the preset angle at least comprises a first angle and a second angle, the second angle is equal to the first angle plus 180 degrees, and the first angle is not 0 degrees; calculating the first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images comprises: calculating the first loss according to a difference between an image feature obtained by rotating an image feature of a sample image whose direction information is the first angle by 180 degrees and an image feature of a sample image whose direction information is the second angle in the same group of sample images.
5. The method according to claim 2, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.
6. The method according to claim 3, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.
7. The method according to claim 4, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.
8. The method according to claim 5, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.
9. The method according to claim 6, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.
10. The method according to claim 7, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.
11. The method according to claim 1, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image classification network according to the final loss.
12. The method according to claim 2, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image classification network according to the final loss.
13. The method according to claim 3, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image
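The loss structure described in claims 1, 2, 3 and 11 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented implementation: the claims only say the first loss is computed "according to a difference" and the second loss "according to" the real and predicted direction, so the mean squared error and cross-entropy used here are assumed choices. All function names are hypothetical, feature maps are plain nested lists, and the backbone is replaced by an identity stand-in.

```python
import math


def rotate180(grid):
    """Rotate a 2-D grid (list of rows) by 180 degrees."""
    return [row[::-1] for row in grid[::-1]]


def build_group(first_image):
    """Claim 2: the first image has direction 0, its rotated copy has the preset angle (180 here)."""
    return [(first_image, 0), (rotate180(first_image), 180)]


def first_loss(feat_first, feat_second):
    """Difference between the 180-degree-rotated feature of the first image
    and the feature of the second image (claim 3). MSE is an assumed distance."""
    rotated = rotate180(feat_first)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(rotated, feat_second)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def second_loss(logits, true_index):
    """Cross-entropy between direction logits and the real direction (assumed loss form)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[true_index]


def final_loss(l1, l2):
    """Claim 11: the final loss is the sum of the first and second losses."""
    return l1 + l2


image = [[1.0, 2.0], [3.0, 4.0]]
group = build_group(image)
feats = [img for img, _ in group]      # identity stand-in for the backbone
l1 = first_loss(feats[0], feats[1])    # zero for this rotation-consistent stand-in
l2 = second_loss([2.0, 0.5], 0)        # made-up logits for directions (0, 180)
total = final_loss(l1, l2)
print(l1, l2, total)
```

With the identity stand-in the first loss vanishes because rotating the feature of the first image reproduces the feature of its rotated copy; a real backbone would generally produce a nonzero value, which the training step would then reduce together with the direction-prediction loss.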

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

G06V30/19173
Classification techniques · CPC title
G06V30/18
Extraction of features or characteristics of the image · CPC title
G06V30/16
Image preprocessing · CPC title
G06V30/19147 · Primary
Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V10/82
using neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 86896885

External sources


Related patents

Upsampling and refining segmentation masks

System and method for training a model using localized textual supervision

Method for generating classification model, electronic device, and medium

Content-adaptive non-uniform image downsampling using predictive auxiliary convolutional neural network
