Upsampling and refining segmentation masks
US-2023132180-A1 · Apr 27, 2023 · US
US12374140B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12374140-B2 |
| Application number | US-202318170902-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 17, 2023 |
| Priority date | Feb 25, 2022 |
| Publication date | Jul 29, 2025 |
| Grant date | Jul 29, 2025 |
Official abstract text for this publication.
The present disclosure provides a vision processing and model training method, device, storage medium and program product. A specific implementation solution is as follows: establishing an image classification network with the same backbone network as the vision model, and performing a self-monitoring training on the image classification network by using an unlabeled first data set; initializing a weight of a backbone network of the vision model according to a weight of a backbone network of the trained image classification network to obtain a pre-training model, the structure of the pre-training model being consistent with that of the vision model, and optimizing the weight of the backbone network by using a real data set in a current computer vision task scenario, so as to be more suitable for the current computer vision task; then, training the pre-training model by using a labeled second data set to obtain a trained vision model.
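The weight-initialization step in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each model is a plain dictionary mapping parameter names to value lists and that backbone parameters share a hypothetical `backbone.` name prefix; neither convention comes from the patent, and the function name is likewise made up.

```python
import copy


def init_backbone_from_pretrained(vision_weights, classifier_weights, prefix="backbone."):
    """Return a copy of vision_weights whose backbone entries are taken
    from the trained classification network (hypothetical helper)."""
    pretrain = copy.deepcopy(vision_weights)
    for name, value in classifier_weights.items():
        if not name.startswith(prefix):
            continue  # the classifier's own head is not transferred
        if name not in pretrain:
            # The two networks are assumed to share the same backbone layout.
            raise KeyError(f"vision model has no backbone parameter {name!r}")
        pretrain[name] = copy.deepcopy(value)
    return pretrain


# Toy weights: the classifier was trained on unlabeled data, the vision
# model starts from placeholder values (all numbers are made up).
classifier = {
    "backbone.conv1": [0.5, -0.2],
    "backbone.conv2": [0.1],
    "direction_head.fc": [0.9, 0.9],
}
vision = {
    "backbone.conv1": [0.0, 0.0],
    "backbone.conv2": [0.0],
    "task_head.fc": [0.3],
}

pretrain_model = init_backbone_from_pretrained(vision, classifier)
```

The resulting `pretrain_model` keeps the vision model's structure, including its task head, and differs only in the backbone values. Training on the labeled second data set would then start from these values.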
Opening claim text (preview).
What is claimed is:

1. A vision model training method, executed by a processor, comprising: establishing an image classification network, wherein the image classification network has the same backbone network as a vision model; performing a self-monitoring training on the image classification network by using an unlabeled first data set to obtain a trained image classification network; initializing a weight of the backbone network of the vision model according to a weight of the backbone network of the trained image classification network to obtain a pre-training model; training the pre-training model by using a labeled second data set to obtain a trained vision model; and applying the trained vision model to a computer vision task to perform a corresponding computer vision processing to obtain a processing result, wherein the computer vision task comprises target detection, image segmentation, and text recognition, and wherein performing the self-monitoring training on the image classification network by using the unlabeled first data set to obtain the trained image classification network comprises: obtaining the unlabeled first data set, the first data set comprising a plurality of groups of sample images and direction information of each sample image, wherein each group of sample images comprises a first sample image and a second sample image obtained by rotating the first sample image by a preset angle; extracting an image feature of each sample image in the first data set through the image classification network, and determining a direction prediction result of each sample image according to the image feature; calculating a first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images; and calculating a second loss according to real direction information and the direction prediction result of each sample image; and adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss.

2. The method according to claim 1, wherein the obtaining the unlabeled first data set comprises: obtaining an unlabeled first sample image and determining direction information of the first sample image as 0 degrees; rotating the first sample image by the preset angle to obtain the second sample image, and determining direction information of the second sample image as the preset angle.

3. The method according to claim 2, wherein the preset angle at least comprises 180 degrees, calculating the first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images comprises: calculating the first loss according to a difference between an image feature obtained by rotating an image feature of the first sample image by 180 degrees and an image feature of the second sample image obtained by rotating the first sample image by 180 degrees in each group of sample images.

4. The method according to claim 2, wherein the preset angle at least comprises a first angle and a second angle, the second angle is equal to the first angle plus 180 degrees, and the first angle is not 0 degrees; calculating the first loss according to the image features of two sample images whose direction information differs by 180 degrees in the same group of sample images comprises: calculating the first loss according to a difference between an image feature obtained by rotating an image feature of a sample image whose direction information is the first angle by 180 degrees and an image feature of a sample image whose direction information is the second angle in the same group of sample images.

5. The method according to claim 2, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.

6. The method according to claim 3, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.

7. The method according to claim 4, wherein obtaining the unlabeled first sample image comprises: obtaining an original image, wherein the original image comprises at least one of a synthetic image and a real image; performing a preprocessing on the original image to obtain a sample image meeting a model training requirement; performing a random data augmentation on the sample image to obtain the first sample image.

8. The method according to claim 5, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.

9. The method according to claim 6, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.

10. The method according to claim 7, wherein if the vision model is applied to a text recognition scenario, performing the preprocessing on the original image to obtain the sample image meeting the model training requirement comprises: performing a text detection on the original image, and extracting an image of a region where text information is located; and performing an image correction on the image of the region where the text information is located to obtain the sample image meeting the model training requirement.

11. The method according to claim 1, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image classification network according to the final loss.

12. The method according to claim 2, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image classification network according to the final loss.

13. The method according to claim 3, wherein adjusting the weight of the backbone network of the image classification network according to the first loss and the second loss comprises: calculating a sum of the first loss and the second loss as a final loss; and adjusting the weight of the backbone network of the image
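As an illustration of the self-monitoring step in claims 1 to 3 and 11, the sketch below builds one sample group (an image and its 180-degree rotation), computes a first loss between the rotated feature of the first image and the feature of the second image, computes a second loss from the predicted and real direction, and sums them. The claims do not specify the loss forms, so mean squared error and cross-entropy are assumptions here. The toy feature extractor and direction head are hypothetical stand-ins for a real backbone.

```python
import math


def rotate180(grid):
    """Rotate a 2-D list by 180 degrees (reverse rows, then each row)."""
    return [row[::-1] for row in grid[::-1]]


def make_group(image):
    """One sample group: (image, direction in degrees) pairs, as in claim 2."""
    return [(image, 0), (rotate180(image), 180)]


def toy_features(image):
    # Hypothetical backbone stand-in: each pixel plus half its right neighbor.
    # It is deliberately not rotation-equivariant, so the first loss is nonzero.
    return [
        [row[j] + 0.5 * (row[j + 1] if j + 1 < len(row) else 0.0) for j in range(len(row))]
        for row in image
    ]


def toy_direction_logits(feat):
    # Hypothetical direction head: class 0 = 0 degrees, class 1 = 180 degrees.
    top, bottom = sum(feat[0]), sum(feat[-1])
    return [top - bottom, bottom - top]


def first_loss(feat_first, feat_second):
    """Assumed form: mean squared difference between the 180-degree-rotated
    feature of the first image and the feature of the second image."""
    rotated = rotate180(feat_first)
    diffs = [(a - b) ** 2 for ra, rb in zip(rotated, feat_second) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def second_loss(logits, true_index):
    """Assumed form: cross-entropy between predicted and real direction."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[true_index]


image = [[1.0, 2.0], [3.0, 4.0]]
group = make_group(image)
feats = [toy_features(img) for img, _ in group]

loss1 = first_loss(feats[0], feats[1])
loss2 = sum(
    second_loss(toy_direction_logits(f), direction // 180)
    for f, (_, direction) in zip(feats, group)
) / len(group)
final_loss = loss1 + loss2  # claim 11: the final loss is the sum of the two
```

In a real training loop, `final_loss` would drive the update of the backbone weights. Here it is only computed, since the toy extractor has no trainable parameters.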
Classification techniques · CPC title
Extraction of features or characteristics of the image · CPC title
Image preprocessing · CPC title
Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
using neural networks · CPC title