Machine learning model search method, related apparatus, and device

US12475388B2 · US · B2

Patent metadata
Publication number: US-12475388-B2
Application number: US-202017758166-A
Country: US
Kind code: B2
Filing date: Nov 19, 2020
Priority date: Dec 31, 2019
Publication date: Nov 18, 2025
Grant date: Nov 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This application relates to the field of artificial intelligence technologies, and discloses a machine learning model search method, a related apparatus, and a device. In the method, before model search and quantization, a plurality of single bit models are generated based on a to-be-quantized model, and evaluation parameters of layer structures in the plurality of single bit models are obtained. Further, after a candidate model selected from a candidate set is trained and tested, to obtain a target model, a quantization weight of each layer structure in the target model may be determined based on a network structure of the target model and evaluation parameters of all layer structures in the target model, a layer structure with a maximum quantization weight in the target model is quantized, and a model obtained through quantization is added to the candidate set.
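The search loop summarized in the abstract can be sketched in Python as follows. This is a minimal, non-authoritative illustration under stated assumptions: the bit set (8, 4, 2), the per-layer latency table (standing in for on-device measurements of the single bit models), `train_and_test` (a toy accuracy proxy instead of real training), the "most accurate untried candidate" selection rule and the `layer_weight` formula are all hypothetical and are not taken from the patent.

```python
# Minimal sketch of the mixed bit search loop described in the abstract.
# Every number and helper below is a hypothetical stand-in, not from the patent.

BIT_SET = (8, 4, 2)  # assumed bit widths of the M = 3 single bit models
# Assumed per-layer inference time (ms) measured for each single bit model.
LATENCY_MS = {8: [4.0, 6.0, 2.0], 4: [2.5, 3.5, 1.5], 2: [1.8, 2.2, 1.2]}
NUM_PARAMS = [1000, 4000, 500]       # parameters per layer (shared structure)
SENSITIVITY = [0.010, 0.002, 0.020]  # assumed accuracy drop per removed bit


def model_metrics(bits):
    """Compose a mixed bit model's latency and size (in bits) from per-layer tables."""
    latency = sum(LATENCY_MS[b][i] for i, b in enumerate(bits))
    size = sum(NUM_PARAMS[i] * b for i, b in enumerate(bits))
    return latency, size


def train_and_test(bits):
    """Toy accuracy proxy standing in for training and testing on a dataset."""
    return 0.90 - sum(SENSITIVITY[i] * (8 - b) for i, b in enumerate(bits))


def layer_weight(bits, i):
    """Placeholder quantization weight: layer i's share of latency plus size."""
    latency, size = model_metrics(bits)
    return LATENCY_MS[bits[i]][i] / latency + NUM_PARAMS[i] * bits[i] / size


def search(target_latency, target_size, acc_threshold, max_rounds=20):
    candidates = [(max(BIT_SET),) * len(NUM_PARAMS)]  # start from the widest model
    tried = set()
    for _ in range(max_rounds):
        untried = [c for c in candidates if c not in tried]
        if not untried:
            return None
        # Assumed selection rule: the most accurate untried candidate.
        model = max(untried, key=train_and_test)
        tried.add(model)
        acc = train_and_test(model)
        latency, size = model_metrics(model)
        if acc <= acc_threshold:
            continue  # too inaccurate: reselect a different candidate
        if latency <= target_latency and size <= target_size:
            return model  # all budgets and accuracy met
        # Quantize the highest-weight layer that can still drop to a smaller bit width.
        lowerable = [i for i, b in enumerate(model) if b > min(BIT_SET)]
        if not lowerable:
            continue
        i = max(lowerable, key=lambda k: layer_weight(model, k))
        smaller = max(b for b in BIT_SET if b < model[i])
        new_model = model[:i] + (smaller,) + model[i + 1:]
        if new_model not in candidates:
            candidates.append(new_model)
    return None
```

With these toy tables, `search(9.0, 30000, 0.85)` lowers layer 1 from 8 to 4 and then to 2 bits and returns `(8, 2, 8)`. Composing a mixed bit model's cost from the single bit models' per-layer measurements is what lets the loop avoid re-measuring every candidate on the device.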

First claim

Opening claim text (preview).

What is claimed is:

1. A machine learning model search method, comprising: generating M single bit models based on a to-be-quantized model, wherein each of the M single bit models and the to-be-quantized model is a deep neural network with a same network structure, and M is a positive integer greater than 1; obtaining N evaluation parameters of all layer structures in the M single bit models through a measurement when a mobile terminal runs the M single bit models; and performing a model search process for at least one time, to output a target model whose N evaluation parameters and accuracy meet their respective requirements; wherein the model search process comprises: training and testing, by using a first dataset, a candidate model selected from a candidate set, to obtain the target model and accuracy of the target model, wherein the candidate model is a mixed bit model having the same network structure as the to-be-quantized model, and wherein the first dataset comprises a plurality of samples; and when at least one of the N evaluation parameters of the target model does not meet a predetermined threshold and the accuracy of the target model is greater than a target threshold, obtaining N evaluation parameters of all layer structures in the target model based on the N evaluation parameters of all the layer structures in the M single bit models, determining a quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model, quantizing a layer structure with a maximum quantization weight in the target model, and adding a model obtained through quantization to the candidate set.

2. The method according to claim 1, wherein the N evaluation parameters of the target model comprise an inference time and a parameter quantity, and the determining the quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model comprises: when the inference time of the target model is greater than a target inference time and the parameter quantity of the target model is not greater than a target parameter quantity, determining a quantization weight of a layer structure i in the target model based on an inference time of the layer structure i in the target model and a weight of the layer structure i in the target model, wherein i is an index of a layer structure in the target model, and i is a positive integer; when the inference time of the target model is not greater than the target inference time and the parameter quantity of the target model is greater than the target parameter quantity, determining the quantization weight of the layer structure i in the target model based on the parameter quantity of the layer structure i in the target model and the weight of the layer structure i in the target model; or when the inference time of the target model is greater than the target inference time and the parameter quantity of the target model is greater than the target parameter quantity, determining the quantization weight of the layer structure i in the target model based on the inference time of the layer structure i in the target model, the parameter quantity of the layer structure i in the target model, and the weight of the layer structure i in the target model.

3. The method according to claim 1, wherein before the training and testing the candidate model selected from the candidate set, the model search process further comprises: training and testing each candidate model in the candidate set by using a second dataset, to obtain a test accuracy of each candidate model in the candidate set, wherein a quantity of samples in the second dataset is less than a quantity of samples in the first dataset; and selecting the candidate model from the candidate set based on the test accuracy of each candidate model and a weight of each candidate model.

4. The method according to claim 3, wherein the weight of the candidate model is determined based on a total quantity of times of model search performed when the candidate model is added to the candidate set and a total quantity of times of current model search.

5. The method according to claim 1, wherein the quantizing the layer structure with the maximum quantization weight in the target model comprises: converting a model parameter of the layer structure with the maximum quantization weight in the target model into a model parameter represented by at least one bit quantity, wherein the at least one bit quantity is a bit quantity that is in a bit quantity set and that is less than a current bit quantity of the model parameter of the layer structure with the maximum quantization weight in the target model, wherein the bit quantity set comprises M values, and wherein the M values indicate bit quantities of model parameters of the M single bit models.

6. The method according to claim 1, wherein the model search process further comprises: when the accuracy of the target model is less than the target threshold, reselecting a different model from the candidate set, and performing the model search process.

7. The method according to claim 1, wherein the N evaluation parameters of all layer structures in the M single bit models comprise an inference time, and the obtaining the N evaluation parameters of all layer structures in the M single bit models comprises: sending the M single bit models to the mobile terminal, which runs the M single bit models and measures an inference time of each layer structure in the M single bit models; and receiving the inference time of each layer structure in the M single bit models from the mobile terminal.

8. The method according to claim 1, wherein during a model search performed for the first time, the candidate set comprises a single bit model with a largest bit quantity in the M single bit models.

9. A terminal device, comprising: at least one processor; and at least one memory, the at least one memory comprising instructions that when executed by the at least one processor, cause the terminal device to: generate M single bit models based on a to-be-quantized model, wherein each of the M single bit models and the to-be-quantized model is a deep neural network with a same network structure, and M is a positive integer greater than 1; obtain N evaluation parameters of all layer structures in the M single bit models through a measurement when a mobile terminal runs the M single bit models; and perform a model search process for at least one time, to output a target model whose N evaluation parameters and accuracy meet their respective requirements, wherein the model search process comprises: training and testing, by using a first dataset, a candidate model selected from a candidate set, to obtain the target model and accuracy of the target model, wherein the candidate model is a mixed bit model having the same network structure as the to-be-quantized model, and wherein the first dataset comprises a plurality of samples; and when at least one of the N evaluation parameters of the target model does not meet a predetermined threshold and the accuracy of the target model is greater than a target threshold, obtaining N evaluation parameters of all layer structures in the target model based on the N evaluation parameters of all the layer structures in the M single bit models, determining a quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model, quantizin
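The three-case quantization weight of claim 2 and the bit reduction of claim 5 can be illustrated with the hedged Python sketch below. The claims say which quantities the weight depends on but give no formula, so the normalized-share scoring, the `structure_weight` input (standing in for the claimed "weight of the layer structure i"), returning one candidate per smaller bit quantity, and skipping layers already at the smallest bit width are all assumptions for illustration only.

```python
def quantization_weights(layer_latency, layer_params, structure_weight,
                         target_latency, target_params):
    """Per-layer quantization weight following the three cases of claim 2.

    Assumed formula: the layer's share of each violated budget, summed,
    then scaled by a hypothetical per-layer structural weight.
    """
    total_latency = sum(layer_latency)
    total_params = sum(layer_params)
    too_slow = total_latency > target_latency
    too_big = total_params > target_params
    weights = []
    for t, p, w in zip(layer_latency, layer_params, structure_weight):
        if too_slow and too_big:
            score = t / total_latency + p / total_params  # both budgets violated
        elif too_slow:
            score = t / total_latency  # only inference time violated
        elif too_big:
            score = p / total_params  # only parameter quantity violated
        else:
            score = 0.0  # both budgets met: nothing to quantize
        weights.append(w * score)
    return weights


def lower_bit_options(current_bits, bit_set):
    """Claim 5: bit quantities in the set that are smaller than the current one."""
    return sorted((b for b in bit_set if b < current_bits), reverse=True)


def quantize_max_weight_layer(bits, weights, bit_set):
    """Return one new mixed bit model per smaller bit option of the max-weight layer."""
    # Assumption: only layers that can still go lower are eligible.
    eligible = [i for i, b in enumerate(bits) if lower_bit_options(b, bit_set)]
    if not eligible:
        return []
    i = max(eligible, key=lambda k: weights[k])
    return [bits[:i] + (b,) + bits[i + 1:]
            for b in lower_bit_options(bits[i], bit_set)]
```

For example, with per-layer inference times of [4.0, 3.5, 2.0] ms against a 9.0 ms target and a parameter budget that is already met, only the inference-time case applies, layer 0 gets the largest weight, and `quantize_max_weight_layer((8, 4, 8), weights, (8, 4, 2))` yields the candidates `(4, 4, 8)` and `(2, 4, 8)`.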

Assignees

  • Huawei Tech Co Ltd

Classifications

  • Learning methods · CPC title

  • using electronic means · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12475388B2 cover?
This application relates to the field of artificial intelligence technologies, and discloses a machine learning model search method, a related apparatus, and a device. In the method, before model search and quantization, a plurality of single bit models are generated based on a to-be-quantized model, and evaluation parameters of layer structures in the plurality of single bit models are obtaine…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 18, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).