Neural network method and apparatus

US12511535B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12511535-B2
Application number: US-202117551572-A
Country: US
Kind code: B2
Filing date: Dec 15, 2021
Priority date: May 25, 2017
Publication date: Dec 30, 2025
Grant date: Dec 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a neural network method and an apparatus, the method including obtaining a set of floating point data processed in a layer included in a neural network, determining a weighted entropy based on data values included in the set of floating point data, adjusting quantization levels assigned to the data values based on the weighted entropy, and quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels.
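The weighted entropy named in the abstract can be illustrated with a short sketch. The page does not give a formula, so the form used here is an assumption: each quantization level contributes its relative frequency times the log of that frequency, scaled by the mean importance of the values assigned to that level. The function name `weighted_entropy` and its arguments are hypothetical and not taken from the patent.

```python
import math
from collections import defaultdict


def weighted_entropy(levels, importances):
    """Weighted entropy of a quantization assignment (assumed form).

    levels[i]      -- index of the quantization level assigned to value i
    importances[i] -- non-negative importance assigned to value i

    Assumed formula: S = -sum_n I_n * P_n * log(P_n), where P_n is the
    fraction of values in level n and I_n is their mean importance.
    """
    if len(levels) != len(importances):
        raise ValueError("levels and importances must have the same length")
    if not levels:
        raise ValueError("at least one value is required")

    total = len(levels)
    counts = defaultdict(int)
    importance_sums = defaultdict(float)
    for level, importance in zip(levels, importances):
        counts[level] += 1
        importance_sums[level] += importance

    entropy = 0.0
    for level, count in counts.items():
        p = count / total                              # relative frequency
        mean_importance = importance_sums[level] / count
        entropy -= mean_importance * p * math.log(p)
    return entropy
```

With all importances equal to 1 this reduces to ordinary Shannon entropy (natural log), so two equally populated levels give `log(2)`. Raising the importance of a level increases the reward for keeping that level well populated.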

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented neural network method, the method comprising: obtaining a set of weights and a set of activations from a set of floating point data processed in a layer included in a neural network; quantizing the set of weights using a clustering based quantization method; quantizing the set of activations using a logarithm data representation-based quantization method, which is different from the clustering based quantization method, in consideration that, while the set of weights is fixed after training the neural network, the set of activations varies in accordance with input data in an inference process implementing the neural network, with the quantizing the set of activations comprising, using one or more processors: repeating, for a plurality of iterations, for a current iteration among the plurality of iterations, quantizing the set of activations into a corresponding plurality of log-based quantization levels based on a corresponding one or more parameters that control a corresponding first size of a first value corresponding to a first quantization level among the corresponding plurality of log-based quantization levels and a corresponding first interval size between the corresponding plurality of log-based quantization levels, for the current iteration, determining the activation weighted entropy for the corresponding plurality of log-based quantization levels of the current iteration by applying corresponding determined importance weights for the corresponding plurality of log-based quantization levels of the current iteration to relative, compared to a total number of activations in the set of activations, activation frequencies of each of the corresponding plurality of log-based quantization levels of the current iteration, and setting the corresponding one or more parameters of a next iteration among the plurality of iterations, while the determined activation weighted entropy of the current iteration is not determined to be maximized and the current iteration is not a final iteration among the plurality of iterations; quantizing the set of activations into a final plurality of log-based quantization levels that are based on the set corresponding one or more parameters of one of the plurality of iterations that has a determined maximized activation weighted entropy; executing, based on input data provided to the neural network, the neural network using the set of activations that have been quantized into the final plurality of log-based quantization levels; and indicating a result of the executing of the neural network.

2. The method of claim 1, wherein, the obtaining of the set of activations, the repeating, the quantizing of the set of activations into the final plurality of log-based quantization levels, the executing of the neural network, and the indicating are performed for each of a plurality of layers included in the neural network.

3. The method of claim 1, further comprising training the layer of the neural network based on the set of activations that have been quantized into the final plurality of log-based quantization levels, wherein the executing of the neural network and the indicating of the result are operations of the training of the layer of the neural network.

4. The method of claim 1, wherein the floating point data includes weight quantization levels, assigned for the set of weights, that are adjusted based on a weight weighted entropy, and wherein the quantizing of the set of weights using the clustering based quantization method is performed in accordance with the adjusted weight quantization levels.

5. The method of claim 1, wherein the corresponding determined importance weights of each of the plurality of iterations are based on respective set importances of the each of the activations.

6. A neural network apparatus, the apparatus comprising: one or more processors; and one or more memories comprising code, which when executed by the one or more processors configures the one or more processors to: obtain a set of weights and a set of activations from a set of floating point data processed in a layer included in a neural network; quantize the set of weights using a clustering based quantization method; quantize the set of activations using a logarithm data representation-based quantization method, which is different from the clustering based quantization method, in consideration that, while the set of weights is fixed after training the neural network, the set of activations varies in accordance with input data in an inference process implementing the neural network, with the quantization of the set of activations comprising a repetition, for a plurality of iterations, for a current iteration among the plurality of iterations, a quantization of the set of activations into a corresponding plurality of log-based quantization levels based on a corresponding one or more parameters that control a corresponding first size of a first value corresponding to a first quantization level among the corresponding plurality of log-based quantization levels and a corresponding first interval size between the corresponding plurality of log-based quantization levels, for the current iteration, a determination of the activation weighted entropy for the corresponding plurality of log-based quantization levels of the current iteration through an application of corresponding determined importance weights for the corresponding plurality of log-based quantization levels of the current iteration to relative, compared to a total number of activations in the set of activations, activation frequencies of each of the corresponding plurality of log-based quantization levels of the current iteration, and set the corresponding one or more parameters of a next iteration among the plurality of iterations, while the determined activation weighted entropy of the current iteration is not determined to be maximized and the current iteration is not a final iteration among the plurality of iterations; quantize the quantized set of activations into a final plurality of log-based quantization levels that are based on the set corresponding one or more parameters of one of the plurality of iterations that has a determined maximized activation weighted entropy; execute, based on input data provided to the neural network, the neural network using the set of activations that have been quantized into the final plurality of log-based quantization levels; and indicate a result of the execution of the neural network.

7. The apparatus of claim 6, wherein the execution of the code configures the one or more processors to perform the obtaining of the set of activations, the repeating, the quantizing of the set of activations into the final plurality of log-based quantization levels, the executing of the neural network, and the indicating for each of a plurality of layers included in the neural network.

8. The apparatus of claim 6, wherein the floating point data includes weight quantization levels, assigned for the set of weights, that are adjusted based on a weight weighted entropy, and wherein the quantizing of the set of weights using the clustering based quantization method is performed in accordance with the adjusted weight quantization levels.

9. The apparatus of claim 6, wherein the corresponding determined importance weights of each of the plurality of iterations are based on respective set importances of the each of the activations.

10. The apparatus of claim 6, wherein the execution of the code configures the one or more processors to train the layer of the neural network based on the set of activations quantized into the corresponding
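The iterative activation quantization recited in claims 1 and 6 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed procedure: the two controlling parameters are modelled as `first_log` (log2 of the first nonzero level's value) and `step` (spacing between levels in the log2 domain), the importance of a level is assumed to equal its index, and the "repeat until maximized" loop is replaced by an exhaustive search over caller-supplied candidates. All function and parameter names are hypothetical.

```python
import math
from itertools import product


def log_quantize(x, first_log, step, num_levels):
    """Map an activation to a level index.

    Index 0 represents the value 0; index k >= 1 represents
    2 ** (first_log + (k - 1) * step). Assumed scheme, for illustration.
    """
    if x <= 0.0:
        return 0
    k = round((math.log2(x) - first_log) / step)
    if k < 0:
        return 0                       # too small: falls into the zero level
    return min(k, num_levels - 2) + 1  # clip to the largest level


def level_value(index, first_log, step):
    """Representative value of a level index."""
    return 0.0 if index == 0 else 2.0 ** (first_log + (index - 1) * step)


def weighted_entropy(indices, level_importance):
    """-sum_n I_n * P_n * log(P_n) over occupied levels (assumed form)."""
    total = len(indices)
    counts = {}
    for i in indices:
        counts[i] = counts.get(i, 0) + 1
    return -sum(
        level_importance(i) * (c / total) * math.log(c / total)
        for i, c in counts.items()
    )


def search_parameters(activations, first_logs, steps, num_levels=8,
                      level_importance=lambda i: float(i)):
    """Try each (first_log, step) candidate and keep the one with the
    highest weighted entropy. Returns (score, first_log, step)."""
    best = None
    for first_log, step in product(first_logs, steps):
        indices = [log_quantize(a, first_log, step, num_levels)
                   for a in activations]
        score = weighted_entropy(indices, level_importance)
        if best is None or score > best[0]:
            best = (score, first_log, step)
    return best
```

A call such as `search_parameters([0.0, 0.5, 1.0, 2.0, 4.0, 8.0], [-1.0, 0.0], [1.0], num_levels=4)` returns the winning score together with the selected `first_log` and `step`; the activations would then be quantized once more with those parameters via `log_quantize` and `level_value`. An exhaustive search is used here only because it is easy to verify; the claims describe setting the parameters of each next iteration until the weighted entropy is determined to be maximized or the final iteration is reached.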

Assignees

Samsung Electronics Co Ltd, Seoul Nat Univ R&Db Foundation

Inventors

Classifications

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Combinations of networks · CPC title

  • G06N3/0495 · Primary

    Quantised networks; Sparse networks; Compressed networks · CPC title

  • using electronic means · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12511535B2 cover?
Provided are a neural network method and an apparatus, the method including obtaining a set of floating point data processed in a layer included in a neural network, determining a weighted entropy based on data values included in the set of floating point data, adjusting quantization levels assigned to the data values based on the weighted entropy, and quantizing the data values included in the…
Who is the assignee on this patent?
Samsung Electronics Co Ltd, Seoul Nat Univ R&Db Foundation
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published Dec 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).