US2021286688A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021286688-A1 |
| Application number | US-201916622541-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 19, 2019 |
| Priority date | Jun 12, 2019 |
| Publication date | Sep 16, 2021 |
| Grant date | — |
Official abstract text for this publication.
The present disclosure relates to a neural network quantization parameter determination method and related products. A board card in the related products includes a memory device, an interface device, a control device, and an artificial intelligence chip, in which the artificial intelligence chip is connected with the memory device, the control device, and the interface device respectively. The memory device is configured to store data, and the interface device is configured to transmit data between the artificial intelligence chip and an external device. The control device is configured to monitor the state of the artificial intelligence chip. The board card can be used to perform an artificial intelligence computation.
Opening claim text (preview).
1. A method for determining neural network quantization parameters, comprising: obtaining an analyzing result of each type of data to be quantized, wherein the data to be quantized includes at least one type of neurons, weights, gradients, and biases of the neural network; and determining a corresponding quantization parameter according to the analyzing result of each type of the data to be quantized and a data bit width, wherein the quantization parameter is used by an artificial intelligence processor to perform corresponding quantization on data involved in a process of neural network operation.
2-3. (canceled)
4. The method of claim 1, wherein the neural network operation process includes at least one operation of neural network training, neural network inference, and neural network fine-tuning, wherein the analyzing result includes a maximum value and a minimum value, or includes a maximum absolute value, of each type of data to be quantized, wherein the maximum absolute value is determined according to the maximum value and the minimum value of each type of data to be quantized, and wherein the quantization parameter is determined according to the data bit width along with either the maximum value and the minimum value of each type of the data to be quantized or the maximum absolute value of each type of the data to be quantized.
5-10. (canceled)
11. The method of claim 1, wherein the data bit width is adjusted according to a corresponding quantization error by comparing the quantization error with a threshold to obtain a comparison result, and adjusting the data bit width according to the comparison result, wherein the quantization error is determined according to the quantized data and corresponding pre-quantized data, wherein the threshold includes at least one of a first threshold and a second threshold.
12. (canceled)
13. The method of claim 12, wherein the adjusting of the data bit width includes: increasing the data bit width if the quantization error is greater than or equal to the first threshold, or reducing the data bit width if the quantization error is less than or equal to the second threshold, or keeping the data bit width unchanged if the quantization error is between the first threshold and the second threshold.
14. (canceled)
15. (canceled)
16. The method of claim 11, wherein a method for obtaining the quantization error includes: determining a quantization interval according to the data bit width, and determining the quantization error according to the quantization interval, the number of the quantized data, and the corresponding pre-quantized data.
17. The method of claim 11, wherein a method for obtaining the quantization error includes: performing inverse quantization on the quantized data to obtain inverse quantized data, wherein a data format of the inverse quantized data is the same as that of the corresponding pre-quantized data, and determining the quantization error according to the quantized data and the corresponding inverse quantized data.
18. (canceled)
19. The method of claim 11, wherein the pre-quantized data is data to be quantized involved in weight update iteration within a target iteration interval, wherein the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.
20. The method of claim 19, wherein the determining of the target iteration interval includes: at a predicted time point, determining a variation trend value of a point position parameter of data to be quantized involved in the weight update iteration, wherein the predicted time point is configured to determine whether the data bit width needs to be adjusted or not, and the predicted time point corresponds to the time point when the weight update iteration is completed, and determining the corresponding target iteration interval according to the variation trend value of the point position parameter.
21. The method of claim 19, wherein the determining of the target iteration interval includes: at a predicted time point, determining a variation trend value of a point position parameter and a variation trend value of data bit width corresponding to the data to be quantized involved in the weight iteration process, wherein the predicted time point is configured to determine whether the data bit width needs to be adjusted, and the predicted time point corresponds to the time point when the weight update iteration is completed, and determining the corresponding target iteration interval according to the variation trend value of the point position parameter and the variation trend value of the data bit width.
22. The method of claim 20, wherein the predicted time point includes a first predicted time point, wherein the first predicted time point is determined according to the target iteration interval.
23. The method of claim 22, wherein the predicted time point further includes a second predicted time point, wherein the second predicted time point is determined according to a curve of data variation range, wherein the curve of data variation range is obtained by analyzing the data variation range in the process of weight update iteration.
24. The method of claim 20, wherein the variation trend value of the point position parameter is determined according to a moving average value of the point position parameter corresponding to a current predicted time point and a moving average value of the point position parameter corresponding to a previous predicted time point, or is determined according to the point position parameter corresponding to the current predicted time point and the moving average value of the corresponding point position parameter corresponding to the previous predicted time point.
25. (canceled)
26. The method of claim 24, wherein the determining of a moving average value of a point position parameter corresponding to the current predicted time point includes: determining the point position parameter corresponding to the current predicted time point according to a point position parameter corresponding to a previous predicted time point and an adjusted value of the data bit width, adjusting a moving average value of a point position parameter corresponding to the previous predicted time point according to the adjusted value of the data bit width to obtain an adjusted result, and determining the moving average value of the point position parameter corresponding to the current predicted time point according to the point position parameter corresponding to the current predicted time point and the adjusted result.
27. The method of claim 24, wherein the determining of the moving average value of the point position parameter corresponding to the current predicted time point includes: determining an intermediate result of the moving average value of the point position parameter corresponding to the current predicted time point according to the point position parameter corresponding to the previous predicted time point and the moving average value of the point position parameter corresponding to the previous predicted time point, and determining the moving average value of the point position parameter corresponding to the current predicted time point according to the intermediate result of the moving average value of the point position parameter corresponding to the current predicted time
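Illustrative sketch (not part of the publication). The flow described in claims 1, 4, 11, 13 and 17 can be pictured with a small Python example. The claims preview gives no formulas, so everything concrete here is an assumption made for illustration: the symmetric power-of-two scaling used as the "point position", the relative L1 error metric, the step size of 4 bits, and all function names. The error is measured between the pre-quantized data and the inverse quantized data, which is one possible reading of claims 11 and 17. It should not be read as the applicant's actual method.

```python
import math

def max_abs(data):
    """Analyzing result: maximum absolute value, derived from max and min."""
    return max(abs(max(data)), abs(min(data)))

def point_position(z, bit_width):
    """Assumed quantization parameter: smallest exponent s such that
    z / 2**s fits in a signed integer of the given bit width.
    Assumes z > 0 (data is not all zeros)."""
    q_max = 2 ** (bit_width - 1) - 1
    return math.ceil(math.log2(z / q_max))

def quantize(data, s, bit_width):
    """Scale by 2**s, round, and clamp to the signed integer range."""
    q_max = 2 ** (bit_width - 1) - 1
    return [max(-q_max, min(q_max, round(x / 2 ** s))) for x in data]

def dequantize(quantized, s):
    """Inverse quantization back to the format of the pre-quantized data."""
    return [q * 2 ** s for q in quantized]

def quantization_error(data, restored):
    """Assumed error metric: relative L1 distance between the
    pre-quantized data and the inverse quantized data."""
    total = sum(abs(x) for x in data)
    return sum(abs(x - r) for x, r in zip(data, restored)) / total

def adjust_bit_width(bit_width, error, first_threshold, second_threshold, step=4):
    """Increase, reduce, or keep the bit width based on two thresholds.
    The step size is a hypothetical choice; the claims do not specify one."""
    if error >= first_threshold:
        return bit_width + step
    if error <= second_threshold:
        return bit_width - step
    return bit_width

# Usage: one pass over a small list of weights with a 4-bit width.
weights = [0.3, -0.7]
s = point_position(max_abs(weights), 4)
restored = dequantize(quantize(weights, s, 4), s)
err = quantization_error(weights, restored)
new_width = adjust_bit_width(4, err, first_threshold=0.05, second_threshold=0.01)
```

In this run the error is at or above the first threshold, so the sketch widens the bit width. Under claim 19, the chosen width would then be held fixed for every weight update iteration in the same target iteration interval.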