Model training method and apparatus for image recognition, network device, and storage medium

US12169875B2 · US · B2

Patent metadata
Publication number: US-12169875-B2
Application number: US-202017083180-A
Country: US
Kind code: B2
Filing date: Oct 28, 2020
Priority date: Oct 10, 2018
Publication date: Dec 17, 2024
Grant date: Dec 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A model training method and apparatus for image recognition, and a non-transitory storage medium are provided. The model training method includes: obtaining a multi-label image training set including a plurality of training images each annotated with a plurality of sample labels; selecting target training images from the multi-label image training set for training a current model; performing label prediction on each target training image using the current model, to obtain a plurality of predicted labels of the each target training image; obtaining a cross-entropy loss function corresponding to the plurality of sample labels of the each target training image, a positive label loss being greater than a negative label loss and having a weight greater than 1; converging the predicted labels and the sample labels of the each target training image according to the cross-entropy loss function, and updating parameters of the current model, to obtain a trained model.
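The weighted loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the per-label binary cross-entropy form below, the function name `weighted_multilabel_bce`, and the default `pos_weight=2.0` are assumptions chosen for illustration; the attenuation parameter mentioned in the claims is left out.

```python
import math


def weighted_multilabel_bce(probs, labels, pos_weight=2.0, eps=1e-12):
    """Mean weighted binary cross-entropy over the labels of one image.

    probs      -- predicted probabilities in [0, 1], one per label
    labels     -- 0/1 annotations, one per label
    pos_weight -- hypothetical weight (> 1) applied to the positive-label term
    """
    if pos_weight <= 1:
        raise ValueError("pos_weight must be greater than 1")
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        positive_term = pos_weight * y * math.log(p)
        negative_term = (1 - y) * math.log(1.0 - p)
        total -= positive_term + negative_term
    return total / len(probs)


# One present label and one absent label, both predicted at 0.5.
loss = weighted_multilabel_bce([0.5, 0.5], [1, 0], pos_weight=2.0)
```

With a weight above 1, missing a label that is present costs more than predicting a label that is absent at the same confidence, which is one way to counter the scarcity of positive labels in multi-label data.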

First claim

Opening claim text (preview).

What is claimed is:

1. A model training method for image recognition, performed by a network device, the method comprising:
    obtaining a multi-label image training set, the multi-label image training set comprising a plurality of batches of training images, and each training image being annotated with a plurality of sample labels;
    performing a plurality of times of batch training on an image recognition model based on the plurality of batches of training images, comprising: for a current batch training:
    selecting target training images of a current batch from the multi-label image training set for training a current model of the image recognition model;
    performing label prediction on each target training image by using the current model, to obtain a plurality of predicted labels of the each target training image;
    obtaining a first training image overall type corresponding to each sample label of target training images of an adjacent batch training, and a number of times that training images having labels the same as the sample label occur successively within the adjacent batch training, the first training image overall type corresponding to the sample label indicating whether one or more successive training images having labels the same as the sample label exist in the adjacent batch training;
    obtaining a second training image overall type corresponding to each sample label of the target training images of the current batch training, the second training image overall type corresponding to the each sample label indicating whether one or more successive training images having labels the same as the sample label exist in the current batch training;
    obtaining a cross-entropy loss function corresponding to the plurality of sample labels of the each target training image and updating a cross-entropy loss attenuation parameter of the cross-entropy loss function according to the first training image overall type, the second training image overall type and the number of times, a positive label loss in the cross-entropy loss function being provided with a weight greater than 1, and the positive label loss is greater than a negative label loss; and
    converging the predicted labels and the sample labels of the each target training image according to the cross-entropy loss function to update parameters of the current model, to obtain a trained model of the image recognition model corresponding to the current batch training, wherein:
    the method further comprises: before the performing label prediction on each target training image by using the current model, extracting a corresponding regional image from the target training image; scaling the regional image to a preset size, to obtain a scaled image; and performing random disturbance processing on the scaled image, to obtain a preprocessed training image; and
    the performing label prediction on each target training image by using the current model comprises: performing label prediction on each preprocessed training image by using the current model.

2. The model training method according to claim 1, wherein the updating the cross-entropy loss attenuation parameter according to the first training image overall type, the second training image overall type and the number of times comprises:
    comparing the first training image overall type with the second training image overall type, to obtain a comparison result;
    obtaining, according to the comparison result and the number of times, a target number of times that current training image having the sample label occur successively in the current batch training; and
    updating the cross-entropy loss attenuation parameter according to the target number of times, to obtain an updated cross-entropy loss function.

3. The model training method according to claim 1, wherein the current model comprises an output layer, the output layer comprising a plurality of output functions; and the performing label prediction on each target training image by using the current model, to obtain a plurality of predicted labels of the each target training image comprises:
    for each sample label of the each target training image, updating, according to a preset processing probability, a parameter in an output function corresponding to the sample label when the target training images are all negative training images without the sample label, to obtain an updated model; and
    performing label prediction on the each target training image by using the updated model, to obtain the plurality of predicted labels of the each target training image.

4. The model training method according to claim 1, wherein the performing label prediction on each target training image by using the current model, to obtain a plurality of predicted labels of the each target training image comprises:
    for each sample label of the each target training image, randomly downsampling negative training images without the sample label in the target training images when a positive training image with the sample label exists in the target training images, to obtain downsampled target training images; and
    performing label prediction on the downsampled target training images by using the current model, to obtain the plurality of prediction labels of the each target training image.

5. The model training method according to claim 4, wherein the randomly downsampling negative training images without the sample label in the target training images comprises: randomly downsampling the negative training images without the sample label in the target training images according to a preset positive-negative training image ratio corresponding to the sample label.

6. The model training method according to claim 1, wherein the performing random disturbance processing on the scaled image comprises at least one of:
    flipping the scaled image horizontally according to a first processing probability, to obtain a flipped image;
    rotating the scaled image with a random angle according to a second processing probability, to obtain a rotated image, the random angle being an angle randomly selected from a predetermined angle range;
    separately performing disturbance processing on one or more attributes of the scaled image according to a third processing probability, to obtain a processed image; or
    scaling a pixel value of the scaled image to a preset pixel value range.

7. The model training method according to claim 1, wherein the current model comprises a deep residual network model; the deep residual network model comprises a plurality of residual blocks that are sequentially connected, each residual block comprises a convolution branch and a residual branch, a convolution kernel size of a first convolutional layer in the convolution branch is less than a convolution kernel size of a second convolutional layer following the first convolutional layer, and a convolution step size of the second convolutional layer is greater than a convolution step size of the first convolutional layer and less than a convolution kernel width of the second convolutional layer.

8. The model training method according to claim 1, further comprising:
    replacing a plurality of output functions in an output layer of the trained model with single-label classifiers, to obtain a changed network model for each sample label;
    performing adaptive adjustment on a learning rate of each layer in the changed network model according to a principle that a learning rate of a higher layer is greater than a learning rate of a lower layer, to obtain an adjusted network model; and
    training parameters of the adjusted network model according to a single-label training image set, to obtain a single-label image classification model for each sample label.

9. The mo
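The per-label negative downsampling in claims 4 and 5 can be sketched roughly as follows. The claims do not specify a data layout or a ratio value, so the function name `downsample_negatives`, the list-of-label-vectors input, and the `neg_per_pos` parameter (negatives kept per positive) are assumptions made for this example.

```python
import random


def downsample_negatives(batch_labels, label_index, neg_per_pos=5, rng=None):
    """Return the indices of batch images kept for one sample label.

    batch_labels -- list of 0/1 label vectors, one vector per image
    label_index  -- position of the sample label being considered
    neg_per_pos  -- hypothetical preset ratio: negatives kept per positive
    rng          -- optional random.Random instance for reproducibility
    """
    if rng is None:
        rng = random.Random()

    positives = [i for i, y in enumerate(batch_labels) if y[label_index] == 1]
    negatives = [i for i, y in enumerate(batch_labels) if y[label_index] == 0]

    # Downsampling applies only when the batch holds a positive image for
    # this label; an all-negative batch is returned unchanged here.
    if not positives:
        return negatives

    keep = min(len(negatives), neg_per_pos * len(positives))
    return sorted(positives + rng.sample(negatives, keep))


# Six images, two labels; only image 0 is positive for label 0.
batch = [[1, 0], [0, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
kept = downsample_negatives(batch, label_index=0, neg_per_pos=2,
                            rng=random.Random(0))
```

Here `kept` holds the one positive image plus two randomly chosen negatives, so the loss for label 0 is computed on a 1:2 positive-to-negative mix instead of 1:5.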

Assignees

  • Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

CPC titles:

  • Supervised learning

  • Transfer learning

  • Convolutional networks [CNN, ConvNet]

  • using neural networks

  • Incorporation of unlabelled data, e.g. multiple instance learning [MIL]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12169875B2 cover?
A model training method and apparatus for image recognition, and a non-transitory storage medium are provided. The model training method includes: obtaining a multi-label image training set including a plurality of training images each annotated with a plurality of sample labels; selecting target training images from the multi-label image training set for training a current model; performing la…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F18/214. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 17, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).