Training method for quantizing the weights and inputs of a neural network

US12288163B2 · US · B2

Patent metadata
Publication number: US-12288163-B2
Application number: US-202017031817-A
Country: US
Kind code: B2
Filing date: Sep 24, 2020
Priority date: Sep 24, 2019
Publication date: Apr 29, 2025
Grant date: Apr 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and processing unit for training a neural network to selectively quantize weights of a filter of the neural network as either binary weights or ternary weights. A plurality of training iterations are performed that each comprise: quantizing a set of real-valued weights of a filter to generate a corresponding set of quantized weights; generating an output feature tensor based on matrix multiplication of an input feature tensor and the set of quantized weights; computing, based on the output feature tensor, a loss based on a regularization function that is configured to move the loss towards a minimum value when either: (i) the quantized weights move towards binary weights, or (ii) the quantized weights move towards ternary weights; computing a gradient with an objective of minimizing the loss; and updating the real-valued weights based on the computed gradient. When the training iterations are complete, a set of weights quantized from the updated real-valued weights is stored as either a set of binary weights or a set of ternary weights.
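The iteration described in the abstract can be sketched in plain Python for a single filter treated as a weight vector. This is a minimal illustration under stated assumptions, not the patented implementation: the ternary threshold of 0.5, the penalty min(|w|, ||w| - 1|) applied to the real-valued weights, the squared-error data term, the straight-through gradient through the quantizer, and every function name are hypothetical choices made for this example.

```python
import random

def quantize_ternary(w, threshold=0.5):
    """Map a real weight to -1, 0 or +1 (the threshold is an assumed hyperparameter)."""
    if w > threshold:
        return 1.0
    if w < -threshold:
        return -1.0
    return 0.0

def regularizer(w):
    """Assumed penalty that is zero exactly at -1, 0 and +1."""
    a = abs(w)
    return min(a, abs(a - 1.0))

def regularizer_grad(w):
    """Subgradient of the assumed penalty with respect to w."""
    a = abs(w)
    if a == 0.0 or a == 1.0:
        return 0.0                      # already at a minimum
    sign = 1.0 if w > 0 else -1.0
    if a < 0.5:
        return sign                     # descent pulls w towards 0
    return sign if a > 1.0 else -sign   # descent pulls |w| towards 1

def train_step(weights, x, target, lr=0.05, lam=0.01):
    """One iteration: quantize, forward pass, loss, gradient, update."""
    q = [quantize_ternary(w) for w in weights]
    y = sum(qi * xi for qi, xi in zip(q, x))   # one row of a matrix multiplication
    err = y - target
    loss = 0.5 * err ** 2 + lam * sum(regularizer(w) for w in weights)
    # Straight-through estimator (assumption): treat dq/dw as 1 so the
    # gradient of the data term reaches the real-valued weights.
    grads = [err * xi + lam * regularizer_grad(w) for w, xi in zip(weights, x)]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, loss

# Toy run: try to recover a ternary filter from binarized (+1/-1) inputs.
random.seed(0)
true_filter = [1.0, 0.0, -1.0, 1.0]            # hypothetical target filter
weights = [random.uniform(-1, 1) for _ in true_filter]
for _ in range(500):
    x = [random.choice([-1.0, 1.0]) for _ in true_filter]
    target = sum(t * xi for t, xi in zip(true_filter, x))
    weights, loss = train_step(weights, x, target)

# After training, only the quantized weights would be stored.
final = [quantize_ternary(w) for w in weights]
```

The real-valued weights exist only during training; the stored result is the quantized list `final`, which mirrors the last sentence of the abstract.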

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for training a convolutional neural network (CNN) to selectively quantize weights of a filter of the CNN as either binary weights or ternary weights, the method comprising: performing a plurality of training iterations for training the CNN that each comprise: quantizing a set of real-valued weights of the filter to generate a corresponding set of quantized weights; generating an output feature tensor based on matrix multiplication of an input feature tensor and the set of quantized weights; computing, based on the output feature tensor, a loss based on a regularization function that is configured to move the loss towards a minimum value when either: (i) the quantized weights move towards binary weights, or (ii) the quantized weights move towards ternary weights, wherein the regularization function includes a learnable shape parameter, wherein changing a magnitude of the shape parameter in one direction causes the regularization function to approximate a binary regularization function and changing the magnitude of the shape parameter in an opposite direction causes the regularization function to approximate a ternary regularization function; computing a gradient with an objective of minimizing the loss; and backpropagating the loss through the CNN to update values of the real-valued weights based on the computed gradient; when the training iterations are complete, storing a final set of weights quantized from the updated real-valued weights as either a set of binary weights or a set of ternary weights, the final set of weights representing a learned set of quantized weights; and deploying the CNN model including the final set of weights to a computationally constrained hardware device, to cause the device to perform forward inference using the CNN model with respect to input data, the CNN model including the final set of weights representing a compressed CNN.

2. The method of claim 1 wherein the loss is further based on a difference between one or more values predicted by the CNN with respect to an original input feature tensor from a training set and corresponding one or more true values known for the original input feature tensor.

3. The method of claim 1 comprising sampling an initial set of real-valued weights from a bimodal distribution to use as the set of real-valued weights for a first iteration of the plurality of iterations.

4. The method of claim 1 wherein the matrix multiplication is part of a convolution operation, and the method comprises training a plurality of filters.

5. The method of claim 1 wherein the input feature tensor is a binarized input feature tensor.

6. The method of claim 5 comprising binarizing a real-valued input feature tensor to provide the binarized input feature tensor.

7. The method of claim 1 wherein generating the output feature tensor comprises applying an activation function that binarizes an output provided by the matrix multiplication of the input feature tensor and the set of quantized weights.

8. The method of claim 1 wherein each element in a set of binary weights has a value of either −1 or +1, and each element in a set of ternary weights has a value of either −1, or 0, or +1.

9. A processing unit for training a convolutional neural network (CNN) to selectively quantize weights of a filter of the CNN as either binary weights or ternary weights, the processing unit comprising a processor device and a persistent storage coupled to the processor device storing instructions that when executed by the processor device cause the processing unit to: perform a plurality of training iterations for training the CNN that each comprise: quantizing a set of real-valued weights of the filter to generate a corresponding set of quantized weights; generating an output feature tensor based on matrix multiplication of an input feature tensor and the set of quantized weights; computing, based on the output feature tensor, a loss based on a regularization function that is configured to move the loss towards a minimum value when either: (i) the quantized weights move towards binary weights, or (ii) the quantized weights move towards ternary weights, wherein the regularization function includes a learnable shape parameter, wherein changing a magnitude of the shape parameter in one direction causes the regularization function to approximate a binary regularization function and changing the magnitude of the shape parameter in an opposite direction causes the regularization function to approximate a ternary regularization function; computing a gradient with an objective of minimizing the loss; and backpropagating the loss through the CNN to update values of the real-valued weights based on the computed gradient; when the training iterations are complete, store a final set of weights quantized from the updated real-valued weights as either a set of binary weights or a set of ternary weights, the final set of weights representing a learned set of quantized weights; and deploy the CNN model including the final set of weights to a computationally constrained hardware device, to cause the device to perform forward inference using the CNN model with respect to input data, the CNN model including the final set of weights representing a compressed CNN.

10. The processing unit of claim 9 wherein the loss is further based on a difference between one or more values predicted by the CNN with respect to an original input feature tensor from a training set and corresponding one or more true values known for the original input feature tensor.

11. The processing unit of claim 9 wherein the processing unit is caused to sample an initial set of real-valued weights from a bimodal distribution to use as the set of real-valued weights for a first iteration of the plurality of iterations.

12. The processing unit of claim 9 wherein the matrix multiplication is part of a convolution operation, and the method comprises training a plurality of filters.

13. The processing unit of claim 9 wherein the input feature tensor is a binarized input feature tensor.

14. The processing unit of claim 13 wherein the processing unit is caused to binarize a real-valued input feature tensor to provide the binarized input feature tensor.

15. The processing unit of claim 9 wherein generating the output feature tensor comprises applying an activation function that binarizes an output provided by the matrix multiplication of the input feature tensor and the set of quantized weights.

16. The processing unit of claim 9 wherein each element in a set of binary weights has a value of either −1 or +1, and each element in a set of ternary weights has a value of either −1, or 0, or +1.

17. A non-transitory computer readable medium that persistently stores software instructions for training a convolutional neural network (CNN) to selectively quantize weights of a filter of the CNN as either binary weights or ternary weights, the software instructions including instructions for causing a processing unit to: perform a plurality of training iterations for training the CNN that each comprise: quantizing a set of real-valued weights of the filter to generate a corresponding set of quantized weights; generating an output feature tensor based on matrix multiplication of an input feature tensor and the set of quantized weights; computing, based on the output feature tensor, a loss based on a regularization function that is configured to move the loss towards a minimum value when either: (i) the quantized weights move towards binary weights
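The claims refer to a regularization function with a learnable shape parameter, to binary and ternary value sets, and to bimodal initialization, but this preview does not give formulas for any of them. The Python sketch below shows one hypothetical family with the claimed limiting behavior, R(w, beta) = min(||w| - 1|, tan(beta)·|w|) with beta between pi/4 and pi/2. The formula, the parameter range, the Gaussian-mixture initializer, the ternary threshold, and all names are assumptions made for illustration and may differ from what the full patent description specifies.

```python
import math
import random

def shaped_regularizer(w, beta):
    """Hypothetical penalty with shape parameter beta in [pi/4, pi/2).

    beta near pi/4: tan(beta) is about 1, so the penalty is close to
        min(|w|, ||w| - 1|), which is zero at -1, 0 and +1 (ternary-like).
    beta near pi/2: tan(beta) is very large, so the penalty is close to
        ||w| - 1| away from w = 0, with minima at -1 and +1 (binary-like).
    """
    a = abs(w)
    return min(abs(a - 1.0), math.tan(beta) * a)

def quantize_binary(w):
    """Binary weights take values in {-1, +1}; zero is mapped to +1 by convention here."""
    return 1.0 if w >= 0 else -1.0

def quantize_ternary(w, threshold=0.5):
    """Ternary weights take values in {-1, 0, +1}; the threshold is assumed."""
    if w > threshold:
        return 1.0
    if w < -threshold:
        return -1.0
    return 0.0

def sample_bimodal(n, spread=0.1, rng=random):
    """Draw n initial weights from an assumed two-Gaussian mixture centered at -1 and +1."""
    return [rng.gauss(rng.choice([-1.0, 1.0]), spread) for _ in range(n)]

# Example: the same weight is penalized differently as beta changes.
w = 0.25
ternary_like = shaped_regularizer(w, math.pi / 4)   # close to |w|
binary_like = shaped_regularizer(w, 1.57)           # close to ||w| - 1|

random.seed(0)
initial_weights = sample_bimodal(8)
```

In a full training setup beta would be updated by gradient descent together with the weights; here it is passed in by hand only to show how the two ends of its range change the shape of the penalty.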

Assignees

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • Activation functions

  • Combinations of networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12288163B2 cover?
A method and processing unit for training a neural network to selectively quantize weights of a filter of the neural network as either binary weights or ternary weights. A plurality of training iterations are performed that each comprise: quantizing a set of real-valued weights of a filter to generate a corresponding set of quantized weights; generating an output feature tensor based on matrix mu…
Who is the assignee on this patent?
Partovi Nia Vahid, Razani Ryan, Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 29, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.