Information processing method and terminal device
US-2021182077-A1 · Jun 17, 2021 · US
US11455539B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11455539-B2 |
| Application number | US-201916541275-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 15, 2019 |
| Priority date | Nov 12, 2018 |
| Publication date | Sep 27, 2022 |
| Grant date | Sep 27, 2022 |
Official abstract text for this publication.
An embodiment of the present invention provides a quantization method for weights of a plurality of batch normalization layers, including: receiving a plurality of previously learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of first weights; performing a first quantization on the plurality of first weights using the first distribution information to obtain a plurality of second weights; obtaining second distribution information of the plurality of second weights; and performing a second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights, thereby reducing an error that may occur when quantizing the weights of the batch normalization layers.
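The two-pass flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the distribution information drives each quantization, so the sketch assumes a uniform quantizer that clips to `mean ± k·std` and snaps to `2**bits` levels. The function names and the parameter `k` are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def distribution_info(weights):
    """Return (mean, variance) of the weights: the 'distribution information'."""
    return fmean(weights), pvariance(weights)


def quantize_uniform(weights, mean, variance, bits, k=3.0):
    """Assumed quantizer: clip to mean +/- k*std, then snap to 2**bits levels."""
    std = sqrt(variance)
    if std == 0.0:
        return list(weights)  # all weights identical; nothing to quantize
    lo, hi = mean - k * std, mean + k * std
    step = (hi - lo) / (2 ** bits - 1)
    out = []
    for w in weights:
        clipped = min(max(w, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def two_stage_quantize(first_weights, bits=4):
    """First quantization, then a second one driven by the new statistics."""
    mean1, var1 = distribution_info(first_weights)
    second_weights = quantize_uniform(first_weights, mean1, var1, bits)
    mean2, var2 = distribution_info(second_weights)
    return quantize_uniform(second_weights, mean2, var2, bits)


# Example: 40 pretend batch-normalization weights reduced to at most 16 values.
learned = [0.1 * i - 1.0 for i in range(40)]
final = two_stage_quantize(learned, bits=4)
```

With `bits=4` the final weights take at most 16 distinct values, so each one could be stored as a 4-bit index plus a small shared table of levels.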
Opening claim text (preview).
What is claimed is:
1. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.
2. The quantization method of claim 1, wherein the first bit width and the second bit width are 4 bits.
3. The quantization method of claim 1, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.
4. The quantization method of claim 1, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.
5. The quantization method of claim 1, further comprising repeating the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.
6. The quantization method of claim 1, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.
7. A batch normalization layer quantization device for performing a quantization, comprising: an input part that receives a plurality of previously-learned first weights of a plurality of batch normalization layers implemented in hardware in a neural network, and input data of the plurality of batch normalization layers; a processor that, for the plurality of batch normalization layers, performs operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the second plurality of weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on the input data using the plurality of final weights; and a memory that stores the plurality of final weights using the first bit width and the second bit width; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.
8. The batch normalization layer quantization device of claim 7, wherein the first bit width and the second bit width are 4 bits.
9. The batch normalization layer quantization device of claim 7, wherein the first quantization is an integer power-of-two quantization, and the second quantization is a dynamic range floating point quantization.
10. The batch normalization layer quantization device of claim 7, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.
11. The batch normalization layer quantization device of claim 7, wherein the processor repeats the receiving, the obtaining of the first distribution information, and the first quantizing, for the plurality of previously-learned first weights a predetermined number of times.
12. A quantization method for performing a quantization, comprising: for a plurality of batch normalization layers implemented in hardware in a neural network, performing operations for quantizing weights of the plurality of batch normalization layers to reduce a bit-width requirement amount and to reduce a memory capacity required to store the weights, the operations including: receiving a plurality of previously-learned first weights of the plurality of batch normalization layers; obtaining first distribution information of the plurality of previously-learned first weights; performing a first quantization on the plurality of previously-learned first weights using the first distribution information to obtain a plurality of second weights; assigning a first bit width, which is a part of all bit widths assigned to the quantization, to the first quantization; obtaining second distribution information of the plurality of second weights; assigning a second bit width, which is a part of all bit widths assigned to the quantization, to a second quantization; performing the second quantization on the plurality of second weights using the second distribution information to obtain a plurality of final weights having the first bit width and the second bit width; and performing normalization on an input data using the plurality of final weights; wherein the first bit width and the second bit width are a same bit width that is reduced from bit widths of the previously-learned first weights before the quantization.
13. The quantization method of claim 12, wherein the first distribution information includes an average value and a variance value of the plurality of previously-learned first weights, and the second distribution information includes an average value and a variance value of the plurality of second weights.
14. The quantization method of claim 13, further comprising repeatedly applying a quantization process for a first layer of remaining layers among the plurality of batch normalization layers.
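Claims 4 and 9 name an integer power-of-two quantization as the first quantization without defining it. One common reading, assumed here and not confirmed by the claims, is that each weight is replaced by a signed power of two, with one bit for the sign and the remaining bits indexing the exponent. The function name and the way the exponent range is chosen (from the largest magnitude) are hypothetical.

```python
from math import floor, log2


def quantize_power_of_two(weights, bits=4):
    """Snap each nonzero weight to sign * 2**e, allowing 2**(bits-1) exponents.

    Assumption: the top exponent comes from the largest |w|. Zero is passed
    through unchanged for simplicity; a real encoding would need a code for it.
    """
    magnitudes = [abs(w) for w in weights if w != 0.0]
    if not magnitudes:
        return list(weights)
    e_max = floor(log2(max(magnitudes)))
    e_min = e_max - (2 ** (bits - 1) - 1)  # 1 sign bit, the rest pick the exponent
    out = []
    for w in weights:
        if w == 0.0:
            out.append(0.0)
            continue
        e = min(max(round(log2(abs(w))), e_min), e_max)
        out.append((1.0 if w > 0 else -1.0) * 2.0 ** e)
    return out


print(quantize_power_of_two([0.3, -0.6, 1.0, 0.0]))  # [0.25, -0.5, 1.0, 0.0]
```

Powers of two are attractive in hardware because multiplying by them reduces to a bit shift. The second stage in claims 4 and 9, the dynamic range floating point quantization, is not sketched here because the claims give no detail on its form.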
Activation functions · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Learning methods · CPC title