Method and apparatus with neural network parameter quantization

US12468946B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12468946-B2
Application number: US-202217987079-A
Country: US
Kind code: B2
Filing date: Nov 15, 2022
Priority date: Jun 3, 2019
Publication date: Nov 11, 2025
Grant date: Nov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor-implemented method includes determining a first quantization value by performing log quantization on a parameter from one of input activation values and weight values in a layer of a neural network, comparing a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter, determining a second quantization value by performing log quantization on the error in response to the error being greater than the threshold value as a result of the comparing; and quantizing the parameter to a value in which the first quantization value and the second quantization value are grouped.
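
The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes power-of-two quantization levels, a hypothetical exponent range `E_MIN..E_MAX`, a `(sign, exponent)` pair as the stored quantization value, and a comparison on the absolute value of the error. None of these details are stated on this page, and all function names are hypothetical.

```python
# Hypothetical exponent range for the power-of-two levels (assumption).
E_MIN, E_MAX = -7, 0


def log_quantize(x):
    """Return (sign, exponent) of the power-of-two level closest to x."""
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Pick the exponent whose level 2**e is nearest to |x| (assumed rule).
    exp = min(range(E_MIN, E_MAX + 1), key=lambda e: abs(mag - 2.0 ** e))
    return sign, exp


def dequantize(q):
    """Map a (sign, exponent) pair back to a real value."""
    sign, exp = q
    return sign * 2.0 ** exp


def quantize_parameter(x, threshold):
    """Quantize x to one value, or to two grouped values if the error is large."""
    first = log_quantize(x)
    error = x - dequantize(first)
    if abs(error) > threshold:  # absolute error is an assumption
        second = log_quantize(error)
        return [first, second]
    return [first]


def dequantize_group(group):
    """Dequantize grouped values by summing their dequantized parts."""
    return sum(dequantize(q) for q in group)


if __name__ == "__main__":
    group = quantize_parameter(0.6, threshold=0.05)
    print(group, dequantize_group(group))
```

In this sketch a larger threshold keeps more parameters at a single quantization value (smaller data), while a smaller threshold spends a second value on more parameters (smaller error).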

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method, the method comprising: determining a first quantization value by performing log quantization on a parameter processed in a layer of a neural network; comparing a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter; and quantizing the parameter into two or more quantization values including the first quantization value based on the result of the comparing to avoid degradation of the neural network.

2. The method of claim 1, wherein the determining of the first quantization value comprises: determining the first quantization value by performing log quantization on a value corresponding to a quantization level closest to the parameter, from among a plurality of quantization levels.

3. The method of claim 1, wherein the quantizing of the parameter comprises: determining a second quantization value by performing log quantization on the error in response to the error being greater than the threshold value as a result of the comparing; and quantizing the parameter to a value in which the first quantization value and the second quantization value are grouped.

4. The method of claim 3, wherein the determining of the second quantization value comprises: determining the second quantization value by performing log quantization on a value corresponding to a quantization level closest to the error, from among the plurality of quantization levels.

5. The method of claim 3, wherein the second quantization value is represented by a same number of bits as a number of bits representing the first quantization value.

6. The method of claim 3, wherein the quantizing comprises: adding a tag bit to each of the first quantization value and the second quantization value.

7. The method of claim 6, wherein the adding comprises: adding a first tag bit, indicating that there is the second quantization value subsequent to the first quantization value, before a first bit of bits representing the first quantization value or after a last bit of the bits; and adding a second tag bit, indicating that there is no quantization value subsequent to the second quantization value, before a first bit of bits representing the second quantization value or after a last bit of the bits.

8. The method of claim 3, wherein the quantizing comprises: adding a code value, indicating that the first quantization value and the second quantization value are consecutive values, before a first bit of bits representing the first quantization value or after a last bit of bits representing the second quantization value.

9. The method of claim 3, further comprising: dequantizing the value in which the first quantization value and the second quantization value are grouped; and performing a convolution operation between a dequantization value obtained by dequantizing the value and input activation values.

10. The method of claim 9, wherein the dequantizing of the value comprises: calculating each of a first dequantization value, which is a value obtained by dequantization of the first quantization value, and a second dequantization value, which is a value obtained by dequantization of the second quantization value; and obtaining the dequantization value by adding the first dequantization value and the second dequantization value.

11. The method of claim 1, wherein the threshold value is determined based on a predetermined trade-off relationship between a recognition rate of the neural network and a size of data according to the quantization of the parameter.

12. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

13. An apparatus, the apparatus comprising: one or more processors configured to: determine a first quantization value by performing log quantization on a parameter processed in a layer of a neural network; compare a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter; and quantize the parameter into two or more quantization values including the first quantization value based on the result of the comparing to avoid degradation of the neural network.
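
The tag-bit grouping of claims 6 and 7 and the additive dequantization of claim 10 might look something like the sketch below. It is an illustration under stated assumptions rather than the claimed encoding: the 4-bit code layout (1 sign bit plus a 3-bit exponent index), the placement of the tag bit before the code, the meaning of tag `1` as "another value follows", and all function names are hypothetical choices made for this example.

```python
CODE_BITS = 4  # hypothetical layout: 1 sign bit + 3-bit exponent index
E_MAX = 0      # exponent index k stands for the level 2**(E_MAX - k) (assumption)


def encode_value(sign, exp):
    """Pack a (sign, exponent) pair into a 4-bit code."""
    k = E_MAX - exp
    if not 0 <= k < 8:
        raise ValueError("exponent outside the assumed 3-bit range")
    return ((1 if sign < 0 else 0) << 3) | k


def decode_value(code):
    """Dequantize a 4-bit code back to a signed power of two."""
    sign = -1 if code >> 3 else 1
    return sign * 2.0 ** (E_MAX - (code & 0b111))


def pack_group(values):
    """Prefix each code with a tag bit: 1 = another value follows, 0 = last."""
    words = []
    for i, (sign, exp) in enumerate(values):
        tag = 1 if i < len(values) - 1 else 0
        words.append((tag << CODE_BITS) | encode_value(sign, exp))
    return words


def unpack_stream(words):
    """Decode tagged words, summing the dequantized values of each group."""
    params, acc = [], 0.0
    for w in words:
        acc += decode_value(w & 0b1111)
        if (w >> CODE_BITS) == 0:  # tag 0 closes the current group
            params.append(acc)
            acc = 0.0
    return params


if __name__ == "__main__":
    # One parameter stored as two grouped values, then one stored as a single value.
    stream = pack_group([(1, -1), (1, -3)]) + pack_group([(-1, -2)])
    print(stream, unpack_stream(stream))
```

Both values in a group use the same code width here, in line with claim 5, so a decoder can walk the stream in fixed-size words and rely only on the tag bit to find group boundaries.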

Assignees

Samsung Electronics Co Ltd, Ulsan Nat Inst Science & Tech Unist

Inventors

Classifications

CPC titles:

  • Architecture, e.g. interconnection topology

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12468946B2 cover?
A processor-implemented method includes determining a first quantization value by performing log quantization on a parameter from one of input activation values and weight values in a layer of a neural network, comparing a threshold value with an error between a first dequantization value obtained by dequantization of the first quantization value and the parameter, determining a second quantiza…
Who is the assignee on this patent?
Samsung Electronics Co Ltd, Ulsan Nat Inst Science & Tech Unist
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 11, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).