Neural Network Quantization Parameter Determination Method and Related Products

US2021286688A1 · US · A1

Patent metadata
Publication number: US-2021286688-A1
Application number: US-201916622541-A
Country: US
Kind code: A1
Filing date: Sep 19, 2019
Priority date: Jun 12, 2019
Publication date: Sep 16, 2021
Grant date: (none listed)

Abstract

Official abstract text for this publication.

The present disclosure relates to a neural network quantization parameter determination method and related products. A board card in the related products includes a memory device, an interface device, a control device, and an artificial intelligence chip, in which the artificial intelligence chip is connected with the memory device, the control device, and the interface device respectively. The memory device is configured to store data, and the interface device is configured to transmit data between the artificial intelligence chip and an external device. The control device is configured to monitor the state of the artificial intelligence chip. The board card can be used to perform an artificial intelligence computation.

First claim

Opening claim text (preview).

1. A method for determining neural network quantization parameters, comprising: obtaining an analyzing result of each type of data to be quantized, wherein the data to be quantized includes at least one type of neurons, weights, gradients, and biases of the neural network; and determining a corresponding quantization parameter according to the analyzing result of each type of the data to be quantized and a data bit width, wherein the quantization parameter is used by an artificial intelligence processor to perform corresponding quantization on data involved in a process of neural network operation.

2.-3. (canceled)

4. The method of claim 1, wherein the neural network operation process includes at least one operation of neural network training, neural network inference, and neural network fine-tuning, wherein the analyzing result includes a maximum value and a minimum value, or includes a maximum absolute value, of each type of data to be quantized, wherein the maximum absolute value is determined according to the maximum value and the minimum value of each type of data to be quantized, and wherein the quantization parameter is determined according to the data bit width along with either the maximum value and the minimum value of each type of the data to be quantized or the maximum absolute value of each type of the data to be quantized.

5.-10. (canceled)

11. The method of claim 1, wherein the data bit width is adjusted according to a corresponding quantization error by comparing the quantization error with a threshold to obtain a comparison result, and adjusting the data bit width according to the comparison result, wherein the quantization error is determined according to the quantized data and corresponding pre-quantized data, wherein the threshold includes at least one of a first threshold and a second threshold.

12. (canceled)

13. The method of claim 12, wherein the adjusting of the data bit width includes: increasing the data bit width if the quantization error is greater than or equal to the first threshold, or reducing the data bit width if the quantization error is less than or equal to the second threshold, or keeping the data bit width unchanged if the quantization error is between the first threshold and the second threshold.

14. (canceled)

15. (canceled)

16. The method of claim 11, wherein a method for obtaining the quantization error includes: determining a quantization interval according to the data bit width, and determining the quantization error according to the quantization interval, the number of the quantized data, and the corresponding pre-quantized data.

17. The method of claim 11, wherein a method for obtaining the quantization error includes: performing inverse quantization on the quantized data to obtain inverse quantized data, wherein a data format of the inverse quantized data is the same as that of the corresponding pre-quantized data, and determining the quantization error according to the quantized data and the corresponding inverse quantized data.

18. (canceled)

19. The method of claim 11, wherein the pre-quantized data is data to be quantized involved in weight update iteration within a target iteration interval, wherein the target iteration interval includes at least one weight update iteration, and the same data bit width is used in the quantization process within the same target iteration interval.

20. The method of claim 19, wherein the determining of the target iteration interval includes: at a predicted time point, determining a variation trend value of a point position parameter of data to be quantized involved in the weight update iteration, wherein the predicted time point is configured to determine whether the data bit width needs to be adjusted or not, and the predicted time point corresponds to the time point when the weight update iteration is completed, and determining the corresponding target iteration interval according to the variation trend value of the point position parameter.

21. The method of claim 19, wherein the determining of the target iteration interval includes: at a predicted time point, determining a variation trend value of a point position parameter and a variation trend value of data bit width corresponding to the data to be quantized involved in the weight iteration process, wherein the predicted time point is configured to determine whether the data bit width needs to be adjusted, and the predicted time point corresponds to the time point when the weight update iteration is completed, and determining the corresponding target iteration interval according to the variation trend value of the point position parameter and the variation trend value of the data bit width.

22. The method of claim 20, wherein the predicted time point includes a first predicted time point, wherein the first predicted time point is determined according to the target iteration interval.

23. The method of claim 22, wherein the predicted time point further includes a second predicted time point, wherein the second predicted time point is determined according to a curve of data variation range, wherein the curve of data variation range is obtained by analyzing the data variation range in the process of weight update iteration.

24. The method of claim 20, wherein the variation trend value of the point position parameter is determined according to a moving average value of the point position parameter corresponding to a current predicted time point and a moving average value of the point position parameter corresponding to a previous predicted time point, or is determined according to the point position parameter corresponding to the current predicted time point and the moving average value of the corresponding point position parameter corresponding to the previous predicted time point.

25. (canceled)

26. The method of claim 24, wherein the determining of a moving average value of a point position parameter corresponding to the current predicted time point includes: determining the point position parameter corresponding to the current predicted time point according to a point position parameter corresponding to a previous predicted time point and an adjusted value of the data bit width, adjusting a moving average value of a point position parameter corresponding to the previous predicted time point according to the adjusted value of the data bit width to obtain an adjusted result, and determining the moving average value of the point position parameter corresponding to the current predicted time point according to the point position parameter corresponding to the current predicted time point and the adjusted result.

27. The method of claim 24, wherein the determining of the moving average value of the point position parameter corresponding to the current predicted time point include: determining an intermediate result of the moving average value of the point position parameter corresponding to the current predicted time point according to the point position parameter corresponding to the previous predicted time point and the moving average value of the point position parameter corresponding to the previous predicted time point, and determining the moving average value of the point position parameter corresponding to the current predicted time point according to the intermediate result of the moving average value of the point position parameter corresponding to the current predicted time
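
Illustrative sketch (not part of the patent text). The flow described in claims 1, 4, 11, 13, and 17 can be pictured roughly as follows: derive a maximum absolute value from the data, determine a quantization parameter from it and the bit width, quantize, inverse-quantize, measure the error, and adjust the bit width against two thresholds. The claim preview above gives no formulas, so everything concrete here is an assumption made for illustration: the power-of-two "point position" formula, the relative L1 error metric, the adjustment step of 2 bits, and all function names.

```python
import math


def max_abs(data):
    """Analyzing result: maximum absolute value, derived from max and min."""
    return max(abs(max(data)), abs(min(data)))


def point_position(z, bit_width):
    """Assumed quantization parameter: exponent s of a power-of-two step 2**s,
    chosen so that z fits in a signed integer of the given bit width."""
    if z == 0:
        return 0
    q_max = 2 ** (bit_width - 1) - 1
    return math.ceil(math.log2(z / q_max))


def quantize(data, s, bit_width):
    """Map floats to signed integers with step 2**s, clamped to the range.
    Note: Python's round() rounds halves to the nearest even integer."""
    q_max = 2 ** (bit_width - 1) - 1
    step = 2.0 ** s
    return [max(-q_max, min(q_max, round(x / step))) for x in data]


def dequantize(quantized, s):
    """Inverse quantization back to the float format of the original data."""
    step = 2.0 ** s
    return [q * step for q in quantized]


def quantization_error(data, restored):
    """Assumed error metric: relative L1 difference between the
    pre-quantized data and the inverse-quantized data."""
    denominator = sum(abs(x) for x in data)
    if denominator == 0:
        return 0.0
    return sum(abs(x - r) for x, r in zip(data, restored)) / denominator


def adjust_bit_width(bit_width, error, first_threshold, second_threshold, step=2):
    """Increase, reduce, or keep the bit width by comparing the error with
    two thresholds. The step size is a hypothetical choice."""
    if error >= first_threshold:
        return bit_width + step
    if error <= second_threshold:
        return bit_width - step
    return bit_width


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.75]
    bits = 8
    s = point_position(max_abs(weights), bits)
    restored = dequantize(quantize(weights, s, bits), s)
    err = quantization_error(weights, restored)
    print(s, err, adjust_bit_width(bits, err, 0.1, 0.01))
```

A power-of-two step is used here only because the claims refer to a "point position parameter"; a real implementation may define the parameter, the error, and the adjustment step differently.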

Assignees

Shanghai Cambricon Inf Tech Co Ltd

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Activation functions · CPC title

  • Supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021286688A1 cover?
The present disclosure relates to a neural network quantization parameter determination method and related products. A board card in the related products includes a memory device, an interface device, a control device, and an artificial intelligence chip, in which the artificial intelligence chip is connected with the memory device, the control device, and the interface device respectively. The…
Who is the assignee on this patent?
Shanghai Cambricon Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 16, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).