Computation method and device used in a convolutional neural network
US-2018349758-A1 · Dec 6, 2018 · US
US11270187B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11270187-B2 |
| Application number | US-201815914229-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 7, 2018 |
| Priority date | Nov 7, 2017 |
| Publication date | Mar 8, 2022 |
| Grant date | Mar 8, 2022 |
Official abstract text for this publication.
A method is provided. The method includes selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modifying the neural network model by inserting a plurality of quantization layers within the neural network model; associating a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and training the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.
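The training idea in the abstract (a regularization coefficient that grows until every weight sits on the quantization grid) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the patent's implementation: the names `quantize` and `train_toy`, the quadratic stand-in for the task loss, the fixed scale, the doubling schedule for the coefficient, and the step size chosen as the inverse curvature so that the update stays stable as the coefficient grows.

```python
def quantize(x, scale, bits):
    """Snap x to the nearest level of a signed uniform grid with step `scale`."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    level = max(qmin, min(qmax, round(x / scale)))
    return level * scale


def train_toy(targets, scale, bits, alpha=0.01, growth=2.0,
              alpha_max=1000.0, tol=1e-3, max_iters=100):
    """Toy loop: minimize 0.5*(w - t)^2 + alpha*(w - Q(w))^2 per weight.

    `alpha` plays the role of the first regularization coefficient: it
    starts at a pre-defined value and is increased every iteration.
    Training stops once all weights are within `tol` of the grid AND
    alpha has reached the pre-defined threshold `alpha_max`.
    """
    weights = list(targets)
    for _ in range(max_iters):
        grid = [quantize(w, scale, bits) for w in weights]
        all_quantized = all(abs(w - q) < tol for w, q in zip(weights, grid))
        if all_quantized and alpha >= alpha_max:
            break
        # Gradient step; Q(w) is treated as a constant in the gradient.
        step = 1.0 / (1.0 + 2.0 * alpha)  # assumed step size (inverse curvature)
        weights = [
            w - step * ((w - t) + 2.0 * alpha * (w - q))
            for w, t, q in zip(weights, targets, grid)
        ]
        alpha *= growth  # assumed schedule: multiply the coefficient each step
    return weights, alpha


# Hypothetical example values: three weights, 4-bit grid with step 0.25.
final_weights, final_alpha = train_toy([0.3, -0.6, 0.9], scale=0.25, bits=4)
final_grid = [quantize(w, 0.25, 4) for w in final_weights]
```

While the coefficient is small the weights stay near their task-loss optimum; as it grows, the regularizer dominates and pulls each weight onto its nearest grid point. The scale is held fixed in this sketch, whereas the abstract also describes optimizing the weight and activation scaling factors.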
Opening claim text (preview).
What is claimed is:
1. A method, comprising: selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modifying the neural network model by inserting a plurality of quantization layers within the neural network model; associating a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and training the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.
2. The method of claim 1, further comprising optimizing the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).
3. The method of claim 1, further comprising inserting each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.
4. The method of claim 1, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.
5. The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights, a bias addition operation configured to perform addition on an output of the convolution operation and biases, a first multiplying operation configured to perform multiplication on an output of the bias addition operation and a first scale factor, an activation operation configured to apply an activation function to an output of the first multiplying operation, a second multiplying operation configured to perform multiplication on an output of the activation operation and a second scale factor, and a quantization operation configured to quantize an output of the second multiplying operation.
6. The method of claim 5, wherein the weights are fixed-point weights.
7. The method of claim 5, wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.
8. The method of claim 5, wherein the activation operation is a non-linear activation function.
9. The method of claim 1, wherein training the neural network comprises: updating the weights by a stochastic gradient descent method; updating the weight scaling factor by the stochastic gradient descent method; updating the activation scaling factor by the stochastic gradient descent method; if the weight scaling factor and the activation scaling factor are of a power of two, including additional gradients of the stochastic descent method; updating regularization coefficients by the stochastic gradient descent method; and terminating the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations of the method is greater than a predetermined limit.
10. The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights, a bias addition operation configured to perform addition on an output of the convolution operation and biases, a rectified linear unit (ReLU) activation operation configured to apply an ReLU activation function to an output of the bias addition operation, a scale-factor multiplying operation configured to perform multiplication on an output of the ReLU activation operation and a scale factor, and a quantization operation configured to quantize an output of the scale-factor multiplying operation.
11. The method of claim 10, wherein the scale factor is a product of a weight scale factor and a quantization scale factor.
12. An apparatus, comprising: a memory storing instructions; and a processor, wherein the processor is configured to execute the instructions causing the processor to: select a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modify the neural network model by inserting a plurality of quantization layers within the neural network model; associate a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and train the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, and optimize a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, wherein the quantized weights are quantized using the optimized weight scaling factor.
13. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to optimize the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).
14. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to insert each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.
15. The apparatus of claim 12, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.
16. The apparatus of claim 12, wherein the neural network model is a fixed-point neural network to which the quantized weights, the weight scaling factor, and the activation scaling factor are applied, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers is configured to perform a convolution operation on feature maps and the quantized weights, and wherein the processor is further configured to execute the instructions to: perform addition on an output of the convolution operation and biases, perform multiplication on an output of the addition and a first scale factor, apply an activation function to an output of the first multiplication, perform multiplication on an output of the activation function and a second scale factor, and quantize an output of the second multiplication.
17. The apparatus of claim 16, wherein the weights are fixed-point weights.
18. The apparatus of claim 16, wherein the fi
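The order of operations recited for a fixed-point convolutional layer (convolution, bias addition, first scale multiplication, activation, second scale multiplication, quantization) can be traced with a small sketch. This is a minimal illustration under stated assumptions rather than the claimed implementation: the convolution is 1-D with no padding, the activation is ReLU, the output is clamped to unsigned integer levels, and the second scale factor is taken to be the reciprocal of a hypothetical output scale, which the claim text shown here does not define. All function and variable names are made up for the example.

```python
def conv1d_valid(feature_map, kernel):
    """1-D sliding dot product without padding (assumed stand-in for convolution)."""
    k = len(kernel)
    return [
        sum(feature_map[i + j] * kernel[j] for j in range(k))
        for i in range(len(feature_map) - k + 1)
    ]


def relu(x):
    """Non-linear activation; ReLU is assumed for this example."""
    return x if x > 0 else 0


def to_unsigned_level(x, bits):
    """Round to the nearest integer level and clamp to [0, 2**bits - 1]."""
    return max(0, min(2 ** bits - 1, round(x)))


def fixed_point_layer(x_int, w_int, bias, first_scale, second_scale, bits):
    conv = conv1d_valid(x_int, w_int)                    # convolution operation
    biased = [c + bias for c in conv]                    # bias addition
    scaled = [b * first_scale for b in biased]           # first multiplying operation
    activated = [relu(s) for s in scaled]                # activation operation
    rescaled = [a * second_scale for a in activated]     # second multiplying operation
    return [to_unsigned_level(r, bits) for r in rescaled]  # quantization operation


# Hypothetical values. Per the dependent claim, the first scale factor is the
# product of the weight scaling factor and the activation scaling factor.
weight_scale = 0.5
activation_scale = 0.25
output_scale = 0.0625  # assumption: not defined in the claim text
layer_output = fixed_point_layer(
    x_int=[1, 2, 3, 1],
    w_int=[2, -1],
    bias=-1,
    first_scale=weight_scale * activation_scale,
    second_scale=1.0 / output_scale,
    bits=4,
)
```

Power-of-two scale factors, as in the example values, are convenient because each multiplication could then be replaced by a bit shift in fixed-point hardware.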
Activation functions · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title