What technology area does this patent fall under?

Primary CPC classification: G06N3/0495 (Quantised networks; Sparse networks; Compressed networks). Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization

US11270187B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11270187-B2
Application number	US-201815914229-A
Country	US
Kind code	B2
Filing date	Mar 7, 2018
Priority date	Nov 7, 2017
Publication date	Mar 8, 2022
Grant date	Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is provided. The method includes selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modifying the neural network model by inserting a plurality of quantization layers within the neural network model; associating a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and training the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.
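The training step described in the abstract can be illustrated with a small sketch. It is not the patented procedure: the function names `quantize` and `train_quantized`, the toy squared-distance task loss, the fixed scale factor, and the geometric schedule for the regularization coefficient `alpha` are all assumptions made for illustration. The abstract also optimizes the weight and activation scaling factors, which this sketch leaves out.

```python
def quantize(w, scale, bits=4):
    """Uniform symmetric quantizer: snap w to the nearest multiple of
    `scale` inside the signed `bits`-bit range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    level = max(qmin, min(qmax, round(w / scale)))
    return level * scale


def train_quantized(weights, targets, scale=0.25, bits=4, alpha=0.1,
                    growth=1.5, alpha_max=40.0, lr=0.01, tol=0.01,
                    max_iters=1000):
    """Gradient descent on  sum (w - t)^2 + alpha * sum (w - Q(w))^2.

    The first term is a toy stand-in for the network loss (assumption).
    The second term is a quantization-error regularizer whose coefficient
    `alpha` grows each step until it reaches `alpha_max` and every weight
    lies within `tol` of a quantization level.
    """
    w = list(weights)
    for _ in range(max_iters):
        # Q(w) is treated as a constant when taking the gradient.
        w = [wi - lr * (2 * (wi - ti)
                        + 2 * alpha * (wi - quantize(wi, scale, bits)))
             for wi, ti in zip(w, targets)]
        all_quantized = all(abs(wi - quantize(wi, scale, bits)) < tol
                            for wi in w)
        if alpha >= alpha_max and all_quantized:
            break
        # Hypothetical schedule: geometric growth, capped at the threshold.
        alpha = min(alpha * growth, alpha_max)
    # Snap the nearly quantized weights onto the grid.
    return [quantize(wi, scale, bits) for wi in w], alpha
```

Calling `train_quantized([0.30, -0.62, 0.90], [0.30, -0.62, 0.90])` with the defaults returns the weights `[0.25, -0.5, 1.0]` and a final coefficient of `40.0`. The step size and `alpha_max` are chosen so that `2 * lr * (1 + alpha_max)` stays below 1, which keeps the plain gradient step stable as the coefficient grows.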

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modifying the neural network model by inserting a plurality of quantization layers within the neural network model; associating a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and training the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, further including optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor.

2. The method of claim 1, further comprising optimizing the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).

3. The method of claim 1, further comprising inserting each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.

4. The method of claim 1, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.

5. The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights, a bias addition operation configured to perform addition on an output of the convolution operation and biases, a first multiplying operation configured to perform multiplication on an output of the bias addition operation and a first scale factor, an activation operation configured to apply an activation function to an output of the first multiplying operation, a second multiplying operation configured to perform multiplication on an output of the activation operation and a second scale factor, and a quantization operation configured to quantize an output of the second multiplying operation.

6. The method of claim 5, wherein the weights are fixed-point weights.

7. The method of claim 5, wherein the first scale factor is a product of the weight scaling factor and the activation scaling factor.

8. The method of claim 5, wherein the activation operation is a non-linear activation function.

9. The method of claim 1, wherein training the neural network comprises: updating the weights by a stochastic gradient descent method; updating the weight scaling factor by the stochastic gradient descent method; updating the activation scaling factor by the stochastic gradient descent method; if the weight scaling factor and the activation scaling factor are of a power of two, including additional gradients of the stochastic descent method; updating regularization coefficients by the stochastic gradient descent method; and terminating the training if either the regularization coefficient is greater than a pre-determined constant or a number of iterations of the method is greater than a predetermined limit.

10. The method of claim 1, further comprising applying the quantized weights, the weight scaling factor, and the activation scaling factor to a fixed-point neural network, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers includes a convolution operation configured to perform convolution on feature maps and the quantized weights, a bias addition operation configured to perform addition on an output of the convolution operation and biases, a rectified linear unit (ReLU) activation operation configured to apply an ReLU activation function to an output of the bias addition operation, a scale-factor multiplying operation configured to perform multiplication on an output of the ReLU activation operation and a scale factor, and a quantization operation configured to quantize an output of the scale-factor multiplying operation.

11. The method of claim 10, wherein the scale factor is a product of a weight scale factor and a quantization scale factor.

12. An apparatus, comprising: a memory storing instructions; and a processor, wherein the processor is configured to execute the instructions causing the processor to: select a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modify the neural network model by inserting a plurality of quantization layers within the neural network model; associate a cost function with the modified neural network model, wherein the cost function includes a first coefficient corresponding to a first regularization term, and wherein an initial value of the first coefficient is pre-defined; and train the modified neural network model to generate quantized weights for a layer by increasing the first coefficient until all weights are quantized and the first coefficient satisfies a pre-defined threshold, and optimize a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, wherein the quantized weights are quantized using the optimized weight scaling factor.

13. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to optimize the weight scaling factor and the activation scaling factor based on minimizing a mean square quantization error (MSQE).

14. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to insert each quantization layer of the plurality of quantization layers after each activation output in each layer within the neural network model.

15. The apparatus of claim 12, wherein the cost function includes a second coefficient corresponding to a second regularization term based on the weight scaling factor and the activation scaling factor being power-of-two numbers.

16. The apparatus of claim 12, wherein the neural network model is a fixed-point neural network to which the quantized weights, the weight scaling factor, and the activation scaling factor are applied, wherein the fixed-point neural network includes a plurality of convolutional layers, wherein each of the plurality of convolutional layers is configured to perform a convolution operation on feature maps and the quantized weights, and wherein the processor is further configured to execute the instructions to: perform addition on an output of the convolution operation and biases, perform multiplication on an output of the addition and a first scale factor, apply an activation function to an output of the first multiplication, perform multiplication on an output of the activation function and a second scale factor, and quantize an output of the second multiplication.

17. The apparatus of claim 16, wherein the weights are fixed-point weights.

18. The apparatus of claim 16, wherein the fi
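The layer pipeline recited in claim 5 (convolution, bias addition, first scale multiplication, activation, second scale multiplication, quantization) can be sketched in a few lines. This is an illustrative reading, not the patent's implementation: the names `conv1d` and `fixed_point_layer`, the 1-D convolution, ReLU as the non-linear activation, an integer bias expressed in accumulator units, and the unsigned output range are all assumptions.

```python
def conv1d(x, w):
    """Valid 1-D cross-correlation over integer inputs and integer weights."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def fixed_point_layer(x_int, w_int, bias_int, scale1, scale2, bits=8):
    """Apply the six operations of the claimed layer in order."""
    qmax = 2 ** bits - 1                    # unsigned range (assumption)
    acc = conv1d(x_int, w_int)              # convolution with quantized weights
    acc = [a + bias_int for a in acc]       # bias addition
    real = [a * scale1 for a in acc]        # first scale-factor multiplication
    act = [max(0.0, r) for r in real]       # activation (ReLU chosen here)
    scaled = [a * scale2 for a in act]      # second scale-factor multiplication
    return [max(0, min(qmax, round(s))) for s in scaled]  # quantization


if __name__ == "__main__":
    weight_scale, act_scale = 0.5, 0.25     # hypothetical scaling factors
    scale1 = weight_scale * act_scale       # product, as in claim 7
    scale2 = 8.0                            # hypothetical second scale factor
    print(fixed_point_layer([3, 1, 4, 6], [2, -1], 1, scale1, scale2))
```

With these inputs the accumulator values are `[6, -1, 3]` after the bias is added, and the layer outputs `[6, 0, 3]`. Using power-of-two scale factors, as claim 4 encourages, means each multiplication could be replaced by a bit shift in a real fixed-point implementation.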

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

G06N3/048: Activation functions
G06N3/0495 (Primary): Quantised networks; Sparse networks; Compressed networks
G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/09: Supervised learning
G06N3/082 (Primary): modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Patent family

Related publications grouped by family.

View patent family 66327406

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11270187B2 cover?: A method is provided. The method includes selecting a neural network model, wherein the neural network model includes a plurality of layers, and wherein each of the plurality of layers includes weights and activations; modifying the neural network model by inserting a plurality of quantization layers within the neural network model; associating a cost function with the modified neural network m…
Who is the assignee on this patent?: Samsung Electronics Co Ltd

Related patents

Computation method and device used in a convolutional neural network

Quantized neural network training and inference

Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same

Fixed point neural network based on floating point neural network quantization

Bit width selection for fixed point neural networks
