Method and apparatus with neural network parameter quantization
US-11531893-B2 · Dec 20, 2022 · US
US12468946B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12468946-B2 |
| Application number | US-202217987079-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 15, 2022 |
| Priority date | Jun 3, 2019 |
| Publication date | Nov 11, 2025 |
| Grant date | Nov 11, 2025 |
Official abstract text for this publication.
A processor-implemented method includes determining a first quantization value by performing log quantization on a parameter from one of input activation values and weight values in a layer of a neural network, comparing a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter, determining a second quantization value by performing log quantization on the error in response to the error being greater than the threshold value as a result of the comparing, and quantizing the parameter to a value in which the first quantization value and the second quantization value are grouped.
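The procedure in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the function names, the base-2 levels, the exponent range `[-8, 0]`, the explicit sign, and the use of the error's magnitude in the threshold comparison are all assumptions introduced here.

```python
def log_quantize(x, min_exp=-8, max_exp=0):
    """Return (sign, exponent) of the power-of-two level closest to x.

    Assumption: the quantization levels are +/- 2**e for integer e
    in [min_exp, max_exp]; zero is not representable in this sketch.
    """
    if x == 0:
        raise ValueError("zero has no log-domain level in this sketch")
    sign = 1 if x > 0 else -1
    exponents = range(min_exp, max_exp + 1)
    # Pick the level with the smallest linear distance to |x|.
    exp = min(exponents, key=lambda e: abs(abs(x) - 2.0 ** e))
    return sign, exp


def log_dequantize(q):
    """Map a (sign, exponent) pair back to a real value."""
    sign, exp = q
    return sign * 2.0 ** exp


def quantize_parameter(x, threshold):
    """Quantize x to one value, or to a group of two if the error is large."""
    first = log_quantize(x)
    error = x - log_dequantize(first)
    # Assumption: the comparison uses the magnitude of the error.
    if abs(error) > threshold:
        second = log_quantize(error)
        return [first, second]
    return [first]


if __name__ == "__main__":
    group = quantize_parameter(0.3, threshold=0.01)
    approx = sum(log_dequantize(q) for q in group)
    print(group, approx)
```

With a larger threshold, for example `quantize_parameter(0.3, threshold=0.1)`, only the first value is kept, which trades accuracy for a smaller encoding.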
Opening claim text (preview).
What is claimed is:
1. A processor-implemented method, the method comprising: determining a first quantization value by performing log quantization on a parameter processed in a layer of a neural network; comparing a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter; and quantizing the parameter into two or more quantization values including the first quantization value based on the result of the comparing to avoid degradation of the neural network.
2. The method of claim 1, wherein the determining of the first quantization value comprises: determining the first quantization value by performing log quantization on a value corresponding to a quantization level closest to the parameter, from among a plurality of quantization levels.
3. The method of claim 1, wherein the quantizing of the parameter comprises: determining a second quantization value by performing log quantization on the error in response to the error being greater than the threshold value as a result of the comparing; and quantizing the parameter to a value in which the first quantization value and the second quantization value are grouped.
4. The method of claim 3, wherein the determining of the second quantization value comprises: determining the second quantization value by performing log quantization on a value corresponding to a quantization level closest to the error, from among the plurality of quantization levels.
5. The method of claim 3, wherein the second quantization value is represented by a same number of bits as a number of bits representing the first quantization value.
6. The method of claim 3, wherein the quantizing comprises: adding a tag bit to each of the first quantization value and the second quantization value.
7. The method of claim 6, wherein the adding comprises: adding a first tag bit, indicating that there is the second quantization value subsequent to the first quantization value, before a first bit of bits representing the first quantization value or after a last bit of the bits; and adding a second tag bit, indicating that there is no quantization value subsequent to the second quantization value, before a first bit of bits representing the second quantization value or after a last bit of the bits.
8. The method of claim 3, wherein the quantizing comprises: adding a code value, indicating that the first quantization value and the second quantization value are consecutive values, before a first bit of bits representing the first quantization value or after a last bit of bits representing the second quantization value.
9. The method of claim 3, further comprising: dequantizing the value in which the first quantization value and the second quantization value are grouped; and performing a convolution operation between a dequantization value obtained by dequantizing the value and input activation values.
10. The method of claim 9, wherein the dequantizing of the value comprises: calculating each of a first dequantization value, which is a value obtained by dequantization of the first quantization value, and a second dequantization value, which is a value obtained by dequantization of the second quantization value; and obtaining the dequantization value by adding the first dequantization value and the second dequantization value.
11. The method of claim 1, wherein the threshold value is determined based on a predetermined trade-off relationship between a recognition rate of the neural network and a size of data according to the quantization of the parameter.
12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.
13. An apparatus, the apparatus comprising: one or more processors configured to: determine a first quantization value by performing log quantization on a parameter processed in a layer of a neural network; compare a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter; and quantize the parameter into two or more quantization values including the first quantization value based on the result of the comparing to avoid degradation of the neural network.
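One possible reading of the tag-bit grouping in claims 6 and 7 and the additive dequantization in claim 10 is sketched below. The bit layout is hypothetical and not taken from the patent: `CODE_BITS`, the 1-bit sign plus 3-bit exponent encoding, and all function names are assumptions chosen for illustration. The tag bit is placed before the first bit of each code, which is one of the two placements claim 7 allows.

```python
CODE_BITS = 4  # hypothetical width: 1 sign bit + 3 bits for the exponent magnitude


def encode_value(sign, exp):
    """Pack (sign, exp) into CODE_BITS bits; exp must lie in [-7, 0]."""
    if not -7 <= exp <= 0:
        raise ValueError("exponent out of range for this sketch")
    sign_bit = 1 if sign < 0 else 0
    return (sign_bit << 3) | (-exp)


def decode_value(code):
    """Dequantize a CODE_BITS-wide code back to sign * 2**exp."""
    sign = -1 if (code >> 3) & 1 else 1
    exp = -(code & 0b111)
    return sign * 2.0 ** exp


def pack_group(values):
    """Prefix each code with a tag bit: 1 = another value follows, 0 = last."""
    words = []
    for i, (sign, exp) in enumerate(values):
        tag = 1 if i < len(values) - 1 else 0
        words.append((tag << CODE_BITS) | encode_value(sign, exp))
    return words


def unpack_stream(words):
    """Recover one parameter per tagged group by summing its dequantized members."""
    params = []
    total = 0.0
    for word in words:
        total += decode_value(word & ((1 << CODE_BITS) - 1))
        if (word >> CODE_BITS) == 0:  # tag 0 closes the current group
            params.append(total)
            total = 0.0
    return params


if __name__ == "__main__":
    stream = pack_group([(1, -2), (1, -4)]) + pack_group([(1, -1)])
    print(stream, unpack_stream(stream))
```

Every code in the stream has the same width regardless of whether it is a first or second quantization value, which matches the same-bit-count condition in claim 5; the tag bit is what lets a decoder tell a single value from a grouped pair.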
Architecture, e.g. interconnection topology · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title