Parametric Power-Of-2 Clipping Activations for Quantization for Convolutional Neural Networks
US-2021224658-A1 · Jul 22, 2021 · US
US2022398430A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022398430-A1 |
| Application number | US-202217659067-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 13, 2022 |
| Priority date | Jun 9, 2021 |
| Publication date | Dec 15, 2022 |
| Grant date | — |
Abstract
A method for quantizing a deep neural network is provided, which includes extracting first statistical information on output values of a first normalization layer included in the deep neural network, determining a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information, and quantizing the input values of the subsequent layer into discretized values having the determined discretization interval.
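The three steps in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the applicant's implementation. It assumes the normalization layer is batch normalization whose per-channel output is roughly Normal(beta, gamma²), and it treats those per-channel (gamma, beta) pairs as the "first statistical information". An illustrative multiplier `k` turns them into a clipping value. The sketch also assumes the subsequent layer only sees non-negative inputs (for example, a ReLU sits between the two layers), so an unsigned n-bit grid applies. The names `clipping_value`, `discretization_interval`, and `quantize` are hypothetical.

```python
def clipping_value(gammas, betas, k=3.0):
    """Hypothetical rule: the largest per-channel mean + k * std of the BN output."""
    # Assumes channel c of the normalization output is ~ Normal(beta_c, gamma_c**2),
    # so beta_c + k * |gamma_c| upper-bounds most of that channel's values.
    clip = max(b + k * abs(g) for g, b in zip(gammas, betas))
    if clip <= 0:
        raise ValueError("clipping value must be positive for an unsigned grid")
    return clip


def discretization_interval(clip, n_bits):
    # Unsigned grid: 2**n_bits evenly spaced levels 0, step, ..., clip.
    return clip / (2 ** n_bits - 1)


def quantize(values, step, n_bits):
    # Map each value to the nearest grid point, saturating at 0 and at the clipping value.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    q_max = 2 ** n_bits - 1
    return [min(max(round(v / step), 0), q_max) * step for v in values]


# Statistics come from the layer's parameters only; no activations are sampled.
gammas, betas = [0.5, 1.0, 0.8], [0.1, 0.0, 0.2]
clip = clipping_value(gammas, betas)  # the second channel dominates: 0.0 + 3 * 1.0
step = discretization_interval(clip, n_bits=8)
print(quantize([-0.5, 0.0, 1.0, 5.0], step, n_bits=8))
```

The multiplier `k` trades clipping error (values above the clip are saturated) against rounding error (a larger clip means a coarser step). The value 3.0 is arbitrary here.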
Claims
1. A method for quantizing a deep neural network used for inference, the method performed by one or more processors and comprising: extracting first statistical information on output values of a first normalization layer included in the deep neural network; determining a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information; and quantizing the input values of the subsequent layer into discretized values having the determined discretization interval, wherein the extracting the first statistical information includes extracting a first scale factor for one or more channels associated with the first normalization layer from information indicative of a distribution of the output values of the first normalization layer, the method further comprising extracting, from information indicative of a distribution of output values of a second normalization layer disposed on a shortcut path included in the deep neural network and disposed before the subsequent layer, a second scale factor for one or more channels associated with the second normalization layer, wherein the determining the discretization interval includes calculating a clipping value using the extracted first scale factor and the extracted second scale factor.
2. The method according to claim 1, wherein the deep neural network is a model trained by using training data, and the determining the discretization interval includes: calculating the clipping value using the extracted first scale factor and the extracted second scale factor without requiring use of at least a portion of the training data for the deep neural network.
3. The method according to claim 1, wherein the second normalization layer is indirectly connected to the subsequent layer, while there is no separate normalization layer disposed between the subsequent layer and the second normalization layer.
4. The method according to claim 1, wherein the determining the discretization interval includes determining the discretization interval associated with the input values of the subsequent layer by using the calculated clipping value and a number of bits of data used for inference in the deep neural network.
5. The method according to claim 1, wherein the calculating the clipping value includes: selecting a maximum value from among values calculated based on the first scale factor and the second scale factor for each of the one or more channels associated with the first normalization layer and the one or more channels associated with the second normalization layer; and calculating the clipping value by using the selected maximum value and a preset value corresponding to a performance equal to or greater than a predetermined reference.
6. The method according to claim 1, wherein the output values of the first normalization layer and the output values of the second normalization layer have a normal distribution.
7. The method according to claim 1, wherein a number of bits of data used for training the deep neural network is greater than a number of bits of data used for inference of the deep neural network.
8. A computer program stored in a non-transitory computer-readable recording medium for executing, on a computer, the method according to claim 1.
9. A computing device, comprising: a memory storing one or more instructions; and a processor configured to execute the stored one or more instructions to: extract first statistical information on output values of a first normalization layer included in a deep neural network; determine a discretization interval associated with input values of a subsequent layer of the first normalization layer by using the extracted first statistical information; and quantize the input values of the subsequent layer into discretized values having the determined discretization interval, wherein the processor is further configured to: extract a first scale factor for one or more channels associated with the first normalization layer from information indicative of a distribution of the output values of the first normalization layer; extract, from information indicative of a distribution of output values of a second normalization layer disposed on a shortcut path included in the deep neural network and disposed before the subsequent layer, a second scale factor for one or more channels associated with the second normalization layer; and calculate a clipping value using the extracted first scale factor and the extracted second scale factor.
10. The computing device according to claim 9, wherein the deep neural network is a model trained by using training data, and the processor is further configured to calculate the clipping value using the extracted first scale factor and the extracted second scale factor without requiring use of at least a portion of the training data for the deep neural network.
11. The computing device according to claim 9, wherein the second normalization layer is indirectly connected to the subsequent layer, while there is no separate normalization layer disposed between the subsequent layer and the second normalization layer.
12. The computing device according to claim 9, wherein the processor is further configured to determine the discretization interval associated with the input values of the subsequent layer by using the calculated clipping value and a number of bits of data used for inference in the deep neural network.
13. The computing device according to claim 9, wherein the processor is further configured to: select a maximum value from among values calculated based on the first scale factor and the second scale factor for each of the one or more channels associated with the first normalization layer and the one or more channels associated with the second normalization layer; and calculate the clipping value by using the selected maximum value and a preset value corresponding to a performance equal to or greater than a predetermined reference.
14. The computing device according to claim 9, wherein the output values of the first normalization layer and the output values of the second normalization layer have a normal distribution.
15. The computing device according to claim 9, wherein a number of bits of data used for training the deep neural network is greater than a number of bits of data used for inference of the deep neural network.
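Claims 1, 2, 4 and 5 add a shortcut path. The subsequent layer's input depends on both a main-path normalization layer and a normalization layer on the shortcut. The clipping value is computed from both layers' scale factors, without running training data through the network. The Python sketch below shows one way this could look. It is an illustration under stated assumptions, not the claimed formula. It treats each layer's scale factor as the standard deviation of a zero-mean normal output (cf. claim 6), and it assumes the two branches are independent and summed channel by channel. The per-channel combination uses `math.hypot`, which is an assumed rule. Following claim 5, the sketch takes the maximum over channels and multiplies it by a hypothetical `preset` value. Following claim 4, it derives the interval from an assumed inference bit width. The function names `residual_clipping_value` and `interval_from_bits` are hypothetical.

```python
import math


def residual_clipping_value(scales_main, scales_shortcut, preset=4.0):
    """Hypothetical clipping value for the input of a layer fed by a residual addition."""
    if len(scales_main) != len(scales_shortcut):
        raise ValueError("both normalization layers must have the same number of channels")
    # Assumption: channel c of each branch is ~ Normal(0, s_c**2) and the branches are
    # independent, so their element-wise sum has standard deviation sqrt(s1**2 + s2**2).
    per_channel = [math.hypot(s1, s2) for s1, s2 in zip(scales_main, scales_shortcut)]
    # Claim 5: select the maximum per-channel value, then scale it by a preset value
    # (the claim ties it to performance at or above a predetermined reference; 4.0 is arbitrary).
    return preset * max(per_channel)


def interval_from_bits(clip, n_bits, signed):
    # Claim 4: the interval depends on the clipping value and the inference bit width.
    # Signed grids are symmetric, [-(2**(n-1) - 1), 2**(n-1) - 1]; unsigned grids are [0, 2**n - 1].
    levels = 2 ** (n_bits - 1) - 1 if signed else 2 ** n_bits - 1
    return clip / levels


# Only the two layers' scale factors are used, so no training data is needed (cf. claim 2).
clip = residual_clipping_value([0.6, 1.2], [0.8, 0.5])
print(clip, interval_from_bits(clip, n_bits=8, signed=True))
```

If a ReLU follows the residual addition, the unsigned grid is the natural choice. Claim 7 only requires that inference use fewer bits than training, for example 32-bit floating point for training and 8-bit integers for inference.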
CPC classification titles
- Architecture, e.g. interconnection topology
- Learning methods
- Quantised networks; Sparse networks; Compressed networks
- Combinations of networks