Lightweight model training method, image processing method, electronic device, and storage medium

US12380328B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12380328-B2
Application number  US-202318108956-A
Country             US
Kind code           B2
Filing date         Feb 13, 2023
Priority date       Aug 30, 2022
Publication date    Aug 5, 2025
Grant date          Aug 5, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a lightweight model training method, an image processing method, a device and a medium. The lightweight model training method includes: acquiring first and second augmentation probabilities and a target weight adopted in an e-th iteration; performing data augmentation on a data set based on the first and second augmentation probabilities respectively, to obtain first and second data sets; obtaining a first output value of a student model and a second output value of a teacher model based on the first data set; obtaining a third output value and a fourth output value based on the second data set; determining a distillation loss function, a truth-value loss function and a target loss function; training the student model based on the target loss function; and determining a first augmentation probability or target weight to be adopted in an (e+1)-th iteration in a case of e is less than E.
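
The iteration flow in the abstract can be sketched as a small driver loop. This is a minimal illustration, not the patented implementation: the function name `run_iterations`, the callables `second_prob`, `next_prob`, `next_weight` and `train_step`, and the linear update rules in the usage part are all hypothetical. The abstract only says the next values are determined while e is less than E and does not give the update formulas; it also says the first augmentation probability "or" the target weight is updated, whereas this sketch updates both for simplicity.

```python
from typing import Callable, List, Tuple

Schedule = Callable[[float], float]


def run_iterations(
    E: int,
    p1: float,
    a: float,
    second_prob: Schedule,
    next_prob: Schedule,
    next_weight: Schedule,
    train_step: Callable[[int, float, float, float], None],
) -> List[Tuple[int, float, float, float]]:
    """Run E iterations and return the (e, p1, p2, a) used in each one."""
    if E <= 1:
        raise ValueError("E must be an integer greater than 1")
    used = []
    for e in range(1, E + 1):
        # Second augmentation probability is derived from the first one.
        p2 = second_prob(p1)
        # Placeholder for: augment twice, run student and teacher,
        # build the losses, and update the student model.
        train_step(e, p1, p2, a)
        used.append((e, p1, p2, a))
        # Values for iteration e+1 are determined only while e < E.
        if e < E:
            p1 = next_prob(p1)
            a = next_weight(a)
    return used


# Hypothetical linear schedules, chosen only to make the loop concrete.
E, P_MAX, A_MAX = 4, 0.8, 0.8
history = run_iterations(
    E,
    p1=0.0,
    a=0.0,
    second_prob=lambda p: min(P_MAX, p + P_MAX / E),
    next_prob=lambda p: min(P_MAX, p + P_MAX / E),
    next_weight=lambda w: min(A_MAX, w + A_MAX / E),
    train_step=lambda e, p1, p2, a: None,  # no-op stand-in for real training
)
```

Passing the schedules in as callables keeps the loop structure separate from the update rules, which this page does not specify.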

First claim

Opening claim text (preview).

What is claimed is:

1. A lightweight model training method, comprising: acquiring a first augmentation probability, a second augmentation probability and a target weight adopted in an e-th iteration, the target weight being a weight of a distillation loss value, e being a positive integer not greater than E, and E being a maximum quantity of iterations and being a positive integer greater than 1; performing data augmentation on a data set based on the first augmentation probability and the second augmentation probability respectively, to obtain a first data set and a second data set; obtaining a first output value of a student model and a second output value of a teacher model based on the first data set; obtaining a third output value of the student model and a fourth output value of the teacher model based on the second data set, and the student model being a lightweight model; determining a distillation loss function based on the first output value and the second output value; determining a truth-value loss function based on the third output value and the fourth output value; determining a target loss function based on the distillation loss function and the truth-value loss function; training the student model based on the target loss function; and determining a first augmentation probability or target weight to be adopted in an (e+1)-th iteration in a case of e is less than E.

2. The method of claim 1, further comprising: acquiring a maximum augmentation probability; and determining the second augmentation probability based on the maximum augmentation probability, the maximum quantity of iterations and the first augmentation probability.

3. The method of claim 2, wherein determining the first augmentation probability to be adopted in the (e+1)-th iteration, comprises: determining the first augmentation probability to be adopted in the (e+1)-th iteration based on the maximum augmentation probability, the maximum quantity of iterations and the first augmentation probability of the e-th iteration.

4. The method of claim 1, further comprising: acquiring a maximum target weight; wherein determining the target weight to be adopted in the (e+1)-th iteration, comprises: determining the target weight to be adopted in the (e+1)-th iteration based on the maximum target weight, the maximum quantity of iterations, and the target weight of the e-th iteration.

5. The method of claim 1, wherein determining the target loss function based on the distillation loss function and the truth-value loss function, comprises: determining the distillation loss function as the target loss function in a case of the target weight is not less than the maximum target weight or the distillation loss function is not less than the truth-value loss function; and determining the truth-value loss function as the target loss function in other cases.

6. The method of claim 1, wherein determining the distillation loss function based on the first output value and the second output value, comprises: determining the distillation loss function according to a formula as follow:

$$l_1 = (a + a_{dft} \times 2/E) \times L_{dist}(o_{1s}, o_{1t}) + (1 - a - a_{dft} \times 2/E) \times L_{gt}(o_{1s}, gt);$$

wherein $l_1$ represents the distillation loss function, $L_{dist}(o_{1s}, o_{1t})$ represents a distillation loss value determined according to the first output value and the second output value, $L_{gt}(o_{1s}, gt)$ represents a truth-value loss value determined according to the first output value and a truth-value, $a$ represents the target weight, $a_{dft}$ represents a maximum target weight, $E$ represents the maximum quantity of iterations, $gt$ represents the truth-value, $o_{1s}$ represents the first output value, and $o_{1t}$ represents the second output value.

7. The method of claim 1, wherein determining the truth-value loss function based on the third output value and the fourth output value, comprises: determining the truth-value loss function according to a formula as follow:

$$l_2 = a \times L_{dist}(o_{2s}, o_{2t}) + (1 - a) \times L_{gt}(o_{2s}, gt);$$

wherein $l_2$ represents the truth-value loss function, $L_{dist}(o_{2s}, o_{2t})$ represents a distillation loss value determined according to the third output value and the fourth output value, $L_{gt}(o_{2s}, gt)$ represents a truth-value loss value determined according to the third output value and a truth-value, $a$ represents the target weight, $gt$ represents the truth-value, $o_{2s}$ represents the third output value, and $o_{2t}$ represents the fourth output value.

8. An image processing method, comprising: receiving an image to be processed in a target scene; and inputting the image to be processed into a student model, to acquire a processed result of the image to be processed output by the student model; wherein the student model is obtained by adopting the lightweight model training method of claim 1.

9. The method of claim 8, wherein receiving the image to be processed in the target scene, comprises at least one of: acquiring an image to be processed in an image classification scene; acquiring an image to be processed in an image recognition scene; or acquiring an image to be processed in a target detection scene.

10. An electronic device, comprising: at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute operations, comprising: acquiring a first augmentation probability, a second augmentation probability and a target weight adopted in an e-th iteration, the target weight being a weight of a distillation loss value, e being a positive integer not greater than E, and E being a maximum quantity of iterations and being a positive integer greater than 1; performing data augmentation on a data set based on the first augmentation probability and the second augmentation probability respectively, to obtain a first data set and a second data set; obtaining a first output value of a student model and a second output value of a teacher model based on the first data set; obtaining a third output value of the student model and a fourth output value of the teacher model based on the second data set, and the student model being a lightweight model; determining a distillation loss function based on the first output value and the second output value; determining a truth-value loss function based on the third output value and the fourth output value; determining a target loss function based on the distillation loss function and the truth-value loss function; training the student model based on the target loss function; and determining a first augmentation probability or target weight to be adopted in an (e+1)-th iteration in a case of e is less than E.

11. The electronic device of claim 10, wherein the operations further comprise: acquiring a maximum augmentation probability; and determining the second augmentation probability based on the maximum augmentation probability, the maximum quantity of iterations and the first augmentation probability.

12. The electronic device of claim 11, wherein determining the first augmentation probability to be adopted in the (e+1)-th iteration, comprises: determining the first augmentation probability to be adopted in the (e+1)-th iteration based on the maximum augmentation probability, the maximum quantity of iterations and the first augmentation probability of the e-th iteration.

13. The electronic device of claim 10, wherein the operations further comprise: acquiring a maximum target weight; wherein de
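
The formulas in claims 6 and 7 and the selection rule in claim 5 can be written out as plain arithmetic on scalar loss values. The following is a sketch under stated assumptions rather than the patentee's code: the function names are hypothetical, the loss values are taken as already computed floats (the claims do not say which distillation or truth-value loss is used), and `a_dft * 2 / E` is read with ordinary operator precedence.

```python
def distillation_loss(l_dist: float, l_gt: float, a: float, a_dft: float, E: int) -> float:
    """Claim 6: l1 = (a + a_dft*2/E) * L_dist + (1 - a - a_dft*2/E) * L_gt."""
    w = a + a_dft * 2 / E  # weight on the distillation term
    return w * l_dist + (1 - w) * l_gt


def truth_value_loss(l_dist: float, l_gt: float, a: float) -> float:
    """Claim 7: l2 = a * L_dist + (1 - a) * L_gt."""
    return a * l_dist + (1 - a) * l_gt


def target_loss(l1: float, l2: float, a: float, a_dft: float) -> float:
    """Claim 5: pick l1 if a >= a_dft or l1 >= l2, otherwise pick l2."""
    if a >= a_dft or l1 >= l2:
        return l1
    return l2


# Illustrative numbers only; they are not taken from the patent.
a, a_dft, E = 0.2, 0.8, 10
l1 = distillation_loss(l_dist=1.0, l_gt=0.5, a=a, a_dft=a_dft, E=E)
l2 = truth_value_loss(l_dist=0.9, l_gt=0.6, a=a)
loss = target_loss(l1, l2, a, a_dft)
```

In a real training loop these scalars would typically be differentiable tensors from the first and second augmented data sets, so that the selected loss can be backpropagated through the student model.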

Assignees

  • Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • G06N3/082 (Primary)

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • the supervisor being an automated module, e.g. "intelligent oracle" · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12380328B2 cover?
Provided is a lightweight model training method, an image processing method, a device and a medium. The lightweight model training method includes: acquiring first and second augmentation probabilities and a target weight adopted in an e-th iteration; performing data augmentation on a data set based on the first and second augmentation probabilities respectively, to obtain first and second data…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 5, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).