Universal Loss-Error-Aware Quantization for Deep Neural Networks with Flexible Ultra-Low-Bit Weights and Activations
US-2022129759-A1 · Apr 28, 2022 · US
US2020082269A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020082269-A1 |
| Application number | US-201916373447-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 2, 2019 |
| Priority date | Sep 12, 2018 |
| Publication date | Mar 12, 2020 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
One embodiment of a method includes performing one or more activation functions in a neural network using weights that have been quantized from floating point values to values that are represented using fewer bits than the floating point values. The method further includes performing a first quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a first number of forward-backward passes of the neural network using training data. The method further includes performing a second quantization of the weights from the floating point values to the values that are represented using fewer bits than the floating point values after the floating point values are updated using a second number of forward-backward passes of the neural network following the first quantization of the weights.
Opening claim text (preview).
What is claimed is: 1 . A processor comprising: one or more arithmetic logic units (ALUs) to perform one or more activation functions in a neural network using weights that have been converted from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation. 2 . The processor of claim 1 , wherein the one or more ALUs further perform one or more activation functions in the neural network by applying the weights to activation inputs that have been converted from the first floating point value representation to the second floating point value representation. 3 . The processor of claim 1 , wherein the weights are converted by: performing a first quantization of the weights from the first floating point value representation to the second floating point value representation after the weights are updated using a first number of forward-backward passes of training the neural network; and performing a second quantization of the weights from the first floating point value representation to the second floating point value representation after the weights are updated using a second number of forward-backward passes of training the neural network following the first quantization of the weight. 4 . The processor of claim 3 , wherein the first number of forward-backward passes is determined based on an offset hyperparameter associated with training the neural network. 5 . The processor of claim 3 , wherein the second number of forward-backward passes is determined based on a frequency hyperparameter associated with training the neural network. 6 . The processor of claim 1 , wherein the weights are converted by: freezing a first portion of the weights in a first one or more layers of the neural network; and modifying a second portion of the weights in a second one or more layers of the neural network. 7 . The processor of claim 6 , wherein an output of the first one or more layers is quantized prior to modifying the second portion of the weights in the second one or more layers. 8 . The processor of claim 6 , wherein the weights are converted by: after the second portion of the weights is modified, freezing the second portion of the weights in the second one or more layers of the neural network; and modifying a third portion of the weights in a third one or more layers of the neural network following the second one or more layers. 9 . The processor of claim 6 , wherein modifying the second portion of the weights comprises: updating the floating point values in the second portion of the weights based at least on an output of the first one or more layers; and converting the second portion of the weights from the first floating point value representation to the second floating point value representation. 10 . A method, comprising: training one or more neural networks, wherein training the one or more neural networks includes converting weight parameters from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation. 11 . The method of claim 10 , wherein converting the weight parameters comprises: performing a first quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a first number of forward-backward passes of training the one or more neural networks; and performing a second quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a second number of forward-backward passes of training the one or more neural networks following the first quantization of the weight parameters. 12 . The method of claim 11 , further comprising: determining the first number of forward-backward passes based on an offset hyperparameter associated with the training of the one or more neural networks. 13 . The method of claim 11 , further comprising: determining the second number of forward-backward passes based on a frequency hyperparameter associated with the training of the one or more neural networks. 14 . The method of claim 10 , wherein converting the weight parameters comprises: freezing a first portion of the weight parameters in a first one or more layers of the one or more neural networks; and modifying a second portion of the weight parameters in a second one or more layers of the one or more neural networks that follow the first one or more layers. 15 . The method of claim 14 , further comprising quantizing an output of the first one or more layers prior to modifying the second portion of the weight parameters in the second one or more layers. 16 . The method of claim 14 , further comprising: after the second portion of the weight parameters is modified, freezing the second portion of the weight parameters in the second one or more layers of the one or more neural networks; and modifying a third portion of the weight parameters in a third one or more layers of the one or more neural networks that follow the second one or more layers. 17 . The method of claim 14 , wherein modifying the second portion of the weight parameters comprises: updating the floating point values in the second portion of the weight parameters based at least on an output of the first one or more layers; and converting the second portion of the weight parameters from the first floating point value representation to the second floating point value representation. 18 . The method of claim 14 , wherein the first one or more layers of the neural network comprise a convolutional layer, a batch normalization layer, and an activation layer. 19 . The method of claim 10 , wherein the weight parameters are associated with a fully connected layer in the neural network. 20 . A system comprising: one or more computers including one or more processors to train one or more neural networks, wherein training the one or more neural networks includes converting weight parameters from a first floating point value representation to a second floating point value representation having fewer bits than the first floating point value representation. 21 . The system of claim 20 , wherein converting the weight parameters comprises: performing a first quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a first number of forward-backward passes of training the one or more neural networks; and performing a second quantization of the weight parameters from the first floating point value representation to the second floating point value representation after the weight parameters are updated using a second number of forward-backward passes of training the one or more neural networks following the first quantization of the weight parameters. 22 . The system of claim 21 , wherein the first number of forward-backward passes is based on an offset hyperparameter associated with the training of the one or more neural networks. 23 . The system of claim 21 , wherein the second number of forward-backward passes is based on a frequency hyperparameter associated with the training of the one or more n
Neural networks · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Inference or reasoning models · CPC title
Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)} · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.