Method and apparatus with neural network parameter quantization
US-2019347550-A1 · Nov 14, 2019 · US
US12475388B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12475388-B2 |
| Application number | US-202017758166-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 19, 2020 |
| Priority date | Dec 31, 2019 |
| Publication date | Nov 18, 2025 |
| Grant date | Nov 18, 2025 |
Abstract
This application relates to the field of artificial intelligence technologies, and discloses a machine learning model search method, a related apparatus, and a device. In the method, before model search and quantization, a plurality of single bit models are generated based on a to-be-quantized model, and evaluation parameters of layer structures in the plurality of single bit models are obtained. Further, after a candidate model selected from a candidate set is trained and tested, to obtain a target model, a quantization weight of each layer structure in the target model may be determined based on a network structure of the target model and evaluation parameters of all layer structures in the target model, a layer structure with a maximum quantization weight in the target model is quantized, and a model obtained through quantization is added to the candidate set.
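The search loop in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Every name and number is a hypothetical placeholder: the bit set `BIT_SET`, the per-layer inference times `LAYER_TIME` (standing in for measurements taken on a mobile terminal), the per-layer weights `LAYER_IMPORTANCE`, the accuracy stand-in `train_and_test`, and the first-in-first-out candidate selection. The sketch also tracks only one evaluation parameter (inference time) and simplifies the quantization weight to layer time multiplied by a layer weight.

```python
"""Minimal sketch of the mixed-bit search loop from the abstract.

All tables, names, and the accuracy model are hypothetical placeholders.
"""

BIT_SET = (8, 4, 2)  # hypothetical bit quantities of the M = 3 single bit models

# Hypothetical per-layer inference times (ms) measured for each single bit model:
# LAYER_TIME[bits][layer_index]
LAYER_TIME = {
    8: [4.0, 6.0, 2.0],
    4: [2.5, 3.5, 1.2],
    2: [1.5, 2.0, 0.8],
}
LAYER_IMPORTANCE = [1.0, 1.0, 0.5]  # hypothetical weight of each layer structure


def layer_times(config):
    """Look up each layer's time in the single bit model with the same bit quantity."""
    return [LAYER_TIME[bits][i] for i, bits in enumerate(config)]


def train_and_test(config):
    """Placeholder for training/testing on the first dataset (made-up accuracy model)."""
    return 0.95 - 0.01 * sum(8 - bits for bits in config)


def quantize_max_weight_layer(config):
    """Lower the bit quantity of the layer with the largest (simplified) quantization weight."""
    times = layer_times(config)
    # Only layers that can still move to a smaller bit quantity are eligible.
    eligible = [i for i, bits in enumerate(config) if bits > min(BIT_SET)]
    if not eligible:
        return None
    i = max(eligible, key=lambda j: times[j] * LAYER_IMPORTANCE[j])
    smaller = max(b for b in BIT_SET if b < config[i])  # next lower bit quantity
    return config[:i] + (smaller,) + config[i + 1:]


def search(num_layers, target_time, target_accuracy, max_rounds=50):
    # First search: the candidate set holds the single bit model with the most bits.
    candidates = [tuple([max(BIT_SET)] * num_layers)]
    for _ in range(max_rounds):
        if not candidates:
            return None
        model = candidates.pop(0)  # simplified selection: first in, first out
        accuracy = train_and_test(model)
        total_time = sum(layer_times(model))
        if accuracy < target_accuracy:
            continue  # accuracy too low: reselect a different candidate
        if total_time <= target_time:
            return model, total_time, accuracy  # all requirements met
        quantized = quantize_max_weight_layer(model)
        if quantized is not None:
            candidates.append(quantized)  # add the quantized model to the candidate set
    return None
```

With the hypothetical tables above, `search(3, 9.0, 0.85)` visits `(8, 8, 8)`, then `(8, 4, 8)`, and returns `(4, 4, 8)` with a total time of 8.0 ms. Deriving the mixed-bit model's per-layer times from the single bit models, rather than re-measuring each candidate, mirrors the abstract's use of precomputed layer evaluation parameters.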
Claims (preview)
What is claimed is:

1. A machine learning model search method, comprising: generating M single bit models based on a to-be-quantized model, wherein each of the M single bit models and the to-be-quantized model is a deep neural network with a same network structure, and M is a positive integer greater than 1; obtaining N evaluation parameters of all layer structures in the M single bit models through a measurement when a mobile terminal runs the M single bit models; and performing a model search process for at least one time, to output a target model whose N evaluation parameters and accuracy meet their respective requirements; wherein the model search process comprises: training and testing, by using a first dataset, a candidate model selected from a candidate set, to obtain the target model and accuracy of the target model, wherein the candidate model is a mixed bit model having the same network structure as the to-be-quantized model, and wherein the first dataset comprises a plurality of samples; and when at least one of the N evaluation parameters of the target model does not meet a predetermined threshold and the accuracy of the target model is greater than a target threshold, obtaining N evaluation parameters of all layer structures in the target model based on the N evaluation parameters of all the layer structures in the M single bit models, determining a quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model, quantizing a layer structure with a maximum quantization weight in the target model, and adding a model obtained through quantization to the candidate set.

2. The method according to claim 1, wherein the N evaluation parameters of the target model comprise an inference time and a parameter quantity, and the determining the quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model comprises: when the inference time of the target model is greater than a target inference time and the parameter quantity of the target model is not greater than a target parameter quantity, determining a quantization weight of a layer structure i in the target model based on an inference time of the layer structure i in the target model and a weight of the layer structure i in the target model, wherein i is an index of a layer structure in the target model, and i is a positive integer; when the inference time of the target model is not greater than the target inference time and the parameter quantity of the target model is greater than the target parameter quantity, determining the quantization weight of the layer structure i in the target model based on the parameter quantity of the layer structure i in the target model and the weight of the layer structure i in the target model; or when the inference time of the target model is greater than the target inference time and the parameter quantity of the target model is greater than the target parameter quantity, determining the quantization weight of the layer structure i in the target model based on the inference time of the layer structure i in the target model, the parameter quantity of the layer structure i in the target model, and the weight of the layer structure i in the target model.

3. The method according to claim 1, wherein before the training and testing the candidate model selected from the candidate set, the model search process further comprises: training and testing each candidate model in the candidate set by using a second dataset, to obtain a test accuracy of each candidate model in the candidate set, wherein a quantity of samples in the second dataset is less than a quantity of samples in the first dataset; and selecting the candidate model from the candidate set based on the test accuracy of each candidate model and a weight of each candidate model.

4. The method according to claim 3, wherein the weight of the candidate model is determined based on a total quantity of times of model search performed when the candidate model is added to the candidate set and a total quantity of times of current model search.

5. The method according to claim 1, wherein the quantizing the layer structure with the maximum quantization weight in the target model comprises: converting a model parameter of the layer structure with the maximum quantization weight in the target model into a model parameter represented by at least one bit quantity, wherein the at least one bit quantity is a bit quantity that is in a bit quantity set and that is less than a current bit quantity of the model parameter of the layer structure with the maximum quantization weight in the target model, wherein the bit quantity set comprises M values, and wherein the M values indicate bit quantities of model parameters of the M single bit models.

6. The method according to claim 1, wherein the model search process further comprises: when the accuracy of the target model is less than the target threshold, reselecting a different model from the candidate set, and performing the model search process.

7. The method according to claim 1, wherein the N evaluation parameters of all layer structures in the M single bit models comprise an inference time, and the obtaining the N evaluation parameters of all layer structures in the M single bit models comprises: sending the M single bit models to the mobile terminal, which runs the M single bit models and measures an inference time of each layer structure in the M single bit models; and receiving the inference time of each layer structure in the M single bit models from the mobile terminal.

8. The method according to claim 1, wherein during a model search performed for the first time, the candidate set comprises a single bit model with a largest bit quantity in the M single bit models.

9. A terminal device, comprising: at least one processor; and at least one memory, the at least one memory comprising instructions that when executed by the at least one processor, cause the terminal device to: generate M single bit models based on a to-be-quantized model, wherein each of the M single bit models and the to-be-quantized model is a deep neural network with a same network structure, and M is a positive integer greater than 1; obtain N evaluation parameters of all layer structures in the M single bit models through a measurement when a mobile terminal runs the M single bit models; and perform a model search process for at least one time, to output a target model whose N evaluation parameters and accuracy meet their respective requirements, wherein the model search process comprises: training and testing, by using a first dataset, a candidate model selected from a candidate set, to obtain the target model and accuracy of the target model, wherein the candidate model is a mixed bit model having the same network structure as the to-be-quantized model, and wherein the first dataset comprises a plurality of samples; and when at least one of the N evaluation parameters of the target model does not meet a predetermined threshold and the accuracy of the target model is greater than a target threshold, obtaining N evaluation parameters of all layer structures in the target model based on the N evaluation parameters of all the layer structures in the M single bit models, determining a quantization weight of each layer structure in the target model based on the network structure of the target model and the N evaluation parameters of all the layer structures in the target model, quantizin
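Claims 2 to 5 name the inputs to each decision but give no formulas. The Python sketch below is one hypothetical reading, not the claimed implementation. The combination rules are assumptions: each layer's weight multiplied by its share of the violated target(s) for claim 2, and a recency weight equal to the round in which a candidate was added divided by the current round for claim 4. The function and field names (`quantization_weights`, `proxy_accuracy`, `added_round`, and so on) are illustrative only.

```python
"""Hypothetical reading of claims 2 to 5; every formula here is an assumption."""


def quantization_weights(layer_time, layer_params, layer_weight,
                         model_time, model_params, target_time, target_params):
    """One quantization weight per layer, following the three cases of claim 2.

    Assumed rule: layer weight times the layer's share of the violated target(s).
    """
    time_over = model_time > target_time
    params_over = model_params > target_params
    weights = []
    for t, p, a in zip(layer_time, layer_params, layer_weight):
        if time_over and not params_over:      # only inference time too high
            w = a * t / target_time
        elif params_over and not time_over:    # only parameter quantity too high
            w = a * p / target_params
        elif time_over and params_over:        # both too high
            w = a * (t / target_time + p / target_params)
        else:                                  # both targets met: nothing to quantize
            w = 0.0
        weights.append(w)
    return weights


def layer_to_quantize(weights):
    """Index of the layer structure with the maximum quantization weight."""
    return max(range(len(weights)), key=weights.__getitem__)


def smaller_bit_quantities(current_bits, bit_set):
    """Claim 5: bit quantities in the set below the layer's current bit quantity."""
    return sorted((b for b in bit_set if b < current_bits), reverse=True)


def select_candidate(candidates, current_round):
    """Claims 3-4: rank by small-dataset test accuracy times an assumed recency weight.

    Assumed weight: round in which the candidate was added / current round,
    both counted from 1, so recently added candidates are favoured.
    """
    def score(c):
        return c["proxy_accuracy"] * c["added_round"] / current_round
    return max(candidates, key=score)
```

For example, with per-layer times `[4.0, 6.0, 2.0]`, layer weights `[1.0, 1.0, 0.5]`, a model time of 12.0 against a target of 9.0, and the parameter quantity within its target, only the first case of claim 2 applies and `layer_to_quantize` picks layer index 1.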
Learning methods · CPC title
using electronic means · CPC title
Architecture, e.g. interconnection topology · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Convolutional networks [CNN, ConvNet] · CPC title