Quantization method and device for weights of batch normalization layer

US11455539B2 · US · B2

Patent metadata
Publication number: US-11455539-B2
Application number: US-201916541275-A
Country: US
Kind code: B2
Filing date: Aug 15, 2019
Priority date: Nov 12, 2018
Publication date: Sep 27, 2022
Grant date: Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of the present invention provides a quantization method for weights of a plurality of batch normalization layers, including: receiving a plurality of previously learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of first weights; performing a first quantization on the plurality of first weights using the first distribution information to obtain a plurality of second weights; obtaining second distribution information of the plurality of second weights; and performing a second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights, and thereby reducing an error that may occur when quantizing the weight of the batch normalization layer.
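The two-pass flow in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented method: the page does not say how the distribution information drives each quantization, so the clip-to-mean-plus-or-minus-k-standard-deviations rule, the uniform grid, the parameter `k`, and every function name below are assumptions. The mean and variance as distribution information and the 4-bit default follow claims 3 and 2.

```python
import statistics


def distribution_info(weights):
    """Return (mean, variance) of the weights, as named in claim 3."""
    return statistics.fmean(weights), statistics.pvariance(weights)


def quantize_uniform(weights, mean, variance, bits=4, k=3.0):
    """Stand-in quantizer (an assumption, not taken from the patent):
    clip to mean +/- k*std, then snap to 2**bits evenly spaced levels."""
    std = variance ** 0.5
    if std == 0.0:
        return list(weights)  # all weights equal, nothing to quantize
    lo, hi = mean - k * std, mean + k * std
    step = (hi - lo) / (2 ** bits - 1)
    out = []
    for w in weights:
        clipped = min(max(w, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def two_pass_quantize(first_weights, bits=4):
    """Receive weights, measure, quantize, measure again, quantize again."""
    mean1, var1 = distribution_info(first_weights)
    second_weights = quantize_uniform(first_weights, mean1, var1, bits)
    mean2, var2 = distribution_info(second_weights)
    return quantize_uniform(second_weights, mean2, var2, bits)


# Hypothetical learned batch normalization weights, for illustration only.
weights = [0.91, 1.07, 0.45, 1.32, 0.88, 1.15, 0.02, 1.01]
final = two_pass_quantize(weights)
print(final)
```

The point of the sketch is the ordering of the steps: the second distribution information is computed from the output of the first quantization, not from the original weights.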

First claim

Opening claim text (preview).

What is claimed is:

1. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

2. The quantization method of claim 1, wherein the first bit width and the second bit width are 4 bits.

3. The quantization method of claim 1, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

4. The quantization method of claim 1, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.

5. The quantization method of claim 1, further comprising repeating the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.

6. The quantization method of claim 1, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.

7. A batch normalization layer quantization device for performing a quantization, comprising: an input part that receives a plurality of previously-learned first weights of a plurality of batch normalization layers implemented in hardware in a neural network, and input data of the plurality of batch normalization layers; a processor that, for the plurality of batch normalization layers, performs operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the second plurality of weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on the input data using the plurality of final weights; and a memory that stores the plurality of final weights using the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

8. The batch normalization layer quantization device of claim 7, wherein the first bit width and the second bit width are 4 bits.

9. The batch normalization layer quantization device of claim 7, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.

10. The batch normalization layer quantization device of claim 7, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

11. The batch normalization layer quantization device of claim 7, wherein the processor repeats the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.

12. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on an input data using the plurality of final weights; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.

13. The quantization method of claim 12, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.

14. The quantization method of claim 13, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.
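Claims 4 and 9 name an integer power-of-two quantization as the first quantization, and claims 2 and 8 fix the bit width at 4 bits. The sketch below shows one common way such a quantizer can be built, rounding each weight to a signed power of two. It is an assumption-laden illustration, not the claimed procedure: the code layout (one sign bit, the remaining codes split between exponents and a reserved zero), the choice of the largest exponent from the biggest magnitude, and the function name `quantize_power_of_two` are all hypothetical.

```python
import math


def quantize_power_of_two(weights, bits=4):
    """Round each weight to sign * 2**e with e in a limited exponent window.

    Assumed layout (not from the patent): 1 sign bit, and 2**(bits-1) - 1
    exponent codes, with one code reserved to represent zero.
    """
    magnitudes = [abs(w) for w in weights if w != 0.0]
    if not magnitudes:
        return list(weights)
    # Anchor the exponent window at the largest magnitude (assumption).
    e_max = round(math.log2(max(magnitudes)))
    n_levels = 2 ** (bits - 1) - 1
    e_min = e_max - n_levels + 1
    out = []
    for w in weights:
        if w == 0.0:
            out.append(0.0)  # reserved zero code
            continue
        e = round(math.log2(abs(w)))   # nearest exponent in the log domain
        e = min(max(e, e_min), e_max)  # clamp into the representable window
        out.append(math.copysign(2.0 ** e, w))
    return out


# Hypothetical weights, for illustration only.
weights = [0.9, -0.3, 1.4, 0.0, 0.06]
q = quantize_power_of_two(weights)
print(q)  # [1.0, -0.25, 1.0, 0.0, 0.0625]
```

If the learned weights were stored as 32-bit floats (an assumption; the claims only say the bit width is reduced), 4-bit codes would cut weight storage by a factor of 8, which is the kind of memory saving the independent claims refer to.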

Assignees

Electronics & Telecommunications Res Inst

Inventors

Classifications

  • G06N3/048 (Primary)

    Activation functions · CPC title

  • G06N3/0495 (Primary)

    Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11455539B2 cover?
An embodiment of the present invention provides a quantization method for weights of a plurality of batch normalization layers, including: receiving a plurality of previously learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of first weights; performing a first quantization on the plurality of first weights using the …
Who is the assignee on this patent?
Electronics & Telecommunications Res Inst
What technology area does this patent fall under?
Primary CPC classification G06N3/048. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 27, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).