Neural network method and apparatus

US11250320B2 · US · B2

Patent metadata
Publication number: US-11250320-B2
Application number: US-201815880690-A
Country: US
Kind code: B2
Filing date: Jan 26, 2018
Priority date: May 25, 2017
Publication date: Feb 15, 2022
Grant date: Feb 15, 2022

Abstract

Official abstract text for this publication.

Provided are a neural network method and an apparatus, the method including obtaining a set of floating point data processed in a layer included in a neural network, determining a weighted entropy based on data values included in the set of floating point data, adjusting quantization levels assigned to the data values based on the weighted entropy, and quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels.
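The weight-side steps summarized above (cluster the weights, compute a weighted entropy, move the cluster boundaries to raise it, then quantize) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. The formula S = -sum(I_n * P_n * log(P_n)) is an assumption: the text on this page says only that the weighted entropy is "based on" the relative frequencies P_n and the representative importances I_n. The function names, the greedy one-index boundary search, and the sign handling in `quantize` are likewise hypothetical choices made for the example.

```python
import math


def weighted_entropy(sorted_w, bounds):
    """Weighted entropy of weights that are already sorted by magnitude.

    bounds holds cluster index boundaries [0, b1, ..., N]; cluster n is
    sorted_w[bounds[n]:bounds[n + 1]].
    Assumed form: S = -sum(I_n * P_n * log(P_n)).
    """
    total = len(sorted_w)
    s = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        if hi == lo:
            continue  # an empty cluster contributes nothing
        p = (hi - lo) / total  # relative frequency of the cluster
        # Representative importance: mean of per-weight importances,
        # each importance being quadratic in the weight's size.
        imp = sum(w * w for w in sorted_w[lo:hi]) / (hi - lo)
        s -= imp * p * math.log(p)
    return s


def adjust_boundaries(sorted_w, bounds, max_passes=50):
    """Greedily shift inner boundaries by one index while entropy increases."""
    bounds = list(bounds)
    best = weighted_entropy(sorted_w, bounds)
    for _ in range(max_passes):
        improved = False
        for i in range(1, len(bounds) - 1):
            for step in (-1, 1):
                cand = bounds[i] + step
                # Keep every cluster non-empty.
                if not bounds[i - 1] < cand < bounds[i + 1]:
                    continue
                trial = bounds[:i] + [cand] + bounds[i + 1:]
                s = weighted_entropy(sorted_w, trial)
                if s > best:
                    bounds, best, improved = trial, s, True
        if not improved:
            break
    return bounds, best


def quantize(sorted_w, bounds):
    """Replace each weight by its cluster's representative weight.

    The representative weight is taken as sqrt(I_n), with the original sign
    restored; the sign handling is an assumption of this sketch.
    """
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        if hi == lo:
            continue
        imp = sum(w * w for w in sorted_w[lo:hi]) / (hi - lo)
        rep = math.sqrt(imp)
        out.extend(math.copysign(rep, w) for w in sorted_w[lo:hi])
    return out


# Usage: four weights, two clusters, starting from an even split.
weights = sorted([0.3, -0.2, 0.1, -0.4], key=abs)
start = [0, 2, 4]
final_bounds, final_entropy = adjust_boundaries(weights, start)
quantized = quantize(weights, final_bounds)
```

Sorting by magnitude first keeps each cluster contiguous, so a cluster boundary is a single index and "adjusting boundaries" reduces to moving that index left or right.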

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented neural network method, the method comprising: obtaining a set of floating point data processed in a layer included in a neural network; determining a weighted entropy based on data values included in the set of floating point data; adjusting quantization levels assigned to the data values based on the weighted entropy; quantizing the data values included in the set of floating point data in accordance with the adjusted quantization levels; implementing the neural network using the quantized data values and based on input data provided to the neural network; and indicating a result of the implementation, wherein, the set of floating point data includes a set of weights, and the determining of the weighted entropy comprises: grouping the set of weights into a plurality of clusters; determining respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights; determining respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and determining the weighted entropy based on the respective relative frequencies and the respective representative importances, wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, and wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.

2. The method of claim 1, wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.

3. The method of claim 1, wherein the quantizing comprises: determining respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters; and quantizing the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters.

4. The method of claim 1, wherein the adjusting comprises adjusting the quantization levels assigned to the data values by adjusting boundaries of each of the clusters in a direction that increases the weighted entropy.

5. The method of claim 1, wherein the determining of the activation weighted entropy comprises: determining respective relative activation frequencies for each of the activation quantization levels by dividing a total number of activations included in each of the respective activation quantization levels by a total number of activations included in the set of activations; determining respective activation data values corresponding to each of the activation quantization levels as respective representative activation importances of each of the activation quantization levels; and determining the activation weighted entropy based on the respective relative activation frequencies and the respective representative activation importances.

6. The method of claim 5, wherein the adjusting of the activation quantization levels comprises adjusting the activation quantization levels assigned to the respective activation data values by adjusting a value corresponding to a first activation quantization level among the activation quantization levels and a size of an interval between the activation quantization levels in a direction of increasing the activation weighted entropy.

7. The method of claim 5, wherein the adjusting of the activation quantization levels comprises adjusting a log base, which is controlling of the activation quantization levels, in a direction that maximizes the activation weighted entropy.

8. The method of claim 1, wherein, the obtaining, determining, adjusting, and quantizing are performed with respect to each of a plurality of layers included in the neural network, with respective adjusted quantization levels being optimized and assigned for each of the plurality of layers.

9. The method of claim 1, wherein the implementing of the neural network comprises training the neural network based on the quantized data values.

10. A non-transitory computer-readable medium storing instructions, which when executed by a processor, cause the processor to implement the method of claim 1.

11. A neural network apparatus, the apparatus comprising: a processor configured to: obtain a set of floating point data processed in a layer included in a neural network; determine a weighted entropy based on data values included in the set of floating point data; adjust quantization levels assigned to the data values based on the weighted entropy; quantize the data values included in the set of floating point data in accordance with the adjusted quantization levels; implement the neural network using the quantized data values and based on input data provided to the neural network; and indicate a result of the implementation, wherein, the set of floating point data includes a set of weights, and the processor is further configured to: group the set of weights into a plurality of clusters; determine respective relative frequencies for each of the grouped clusters by dividing a total number of weights included in each of the respective grouped clusters by a total number of weights included in the set of weights; determine respective representative importances of each of the grouped clusters based on sizes of weights included in each of the grouped clusters; and determine the weighted entropy based on the respective relative frequencies and the respective representative importances, wherein the respective representative importances are average values of importances corresponding to the weights included in each of the grouped clusters, each of the importances being quadratically proportional to the size of corresponding weight, and wherein, the set of floating point data includes a set of activations, activation quantization levels assigned, using an entropy-based logarithm data representation-based quantization method, to data values corresponding to the set of activations are adjusted based on an activation weighted entropy, and the data values corresponding to the set of activations are quantized in accordance with the adjusted activation quantization levels.

12. The apparatus of claim 11, wherein the determining of the weighted entropy includes applying a weighting factor based on determined sizes of the data values to a determined distribution of the data values included in the set of floating point data.

13. The apparatus of claim 11, wherein the processor is further configured to: determine respective weights corresponding to the respective representative importances of each of the grouped clusters as a corresponding representative weight for each of the grouped clusters; and quantize the weights included in each of the grouped clusters respectively into the corresponding representative weight for each of the grouped clusters.

14. T
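The activation-side steps in claims 5 to 7 (logarithmically spaced levels, per-level relative frequencies, level values used as importances, and a search over the first level and log base) might look roughly like the sketch below. It is a hedged illustration, not the patented method. The level layout (one zero level followed by `first * base**k`), the cutoff below which an activation maps to zero, the entropy form S = -sum(I_k * P_k * log(P_k)), and the plain grid search are all assumptions, and every function and parameter name is hypothetical.

```python
import math


def level_values(first, base, num_levels):
    """Level 0 is zero; level k >= 1 has value first * base**(k - 1)."""
    return [0.0] + [first * base ** k for k in range(num_levels - 1)]


def level_index(x, first, base, num_levels):
    """Map a non-negative activation to the nearest level in the log domain."""
    # Assumed cutoff: anything below the midpoint (in log scale) between
    # "nothing" and the first level is sent to the zero level.
    if x < first / math.sqrt(base):
        return 0
    k = round(math.log(x / first, base))
    k = max(0, min(k, num_levels - 2))  # clamp to the available levels
    return k + 1


def activation_weighted_entropy(acts, first, base, num_levels):
    """Assumed form: S = -sum(I_k * P_k * log(P_k)), with I_k the level value."""
    counts = [0] * num_levels
    for x in acts:
        counts[level_index(x, first, base, num_levels)] += 1
    values = level_values(first, base, num_levels)
    s = 0.0
    for count, value in zip(counts, values):
        if count == 0:
            continue
        p = count / len(acts)  # relative frequency of this level
        s -= value * p * math.log(p)
    return s


def search_levels(acts, firsts, bases, num_levels):
    """Grid search for the (first, base) pair with the largest entropy."""
    return max(
        (activation_weighted_entropy(acts, f, b, num_levels), f, b)
        for f in firsts
        for b in bases
    )


def quantize_activations(acts, first, base, num_levels):
    """Replace each activation by the value of its assigned level."""
    values = level_values(first, base, num_levels)
    return [values[level_index(x, first, base, num_levels)] for x in acts]


# Usage: pick a base from two candidates, then quantize with it.
activations = [0.0, 1.0, 2.0, 4.0]
entropy, first, base = search_levels(activations, [1.0], [2.0, 4.0], 4)
quantized = quantize_activations(activations, first, base, 4)
```

A grid search is used here only because it is easy to read; the claims speak of adjusting "in a direction" that increases the entropy, which an incremental search would match more closely.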

Assignees

Samsung Electronics Co Ltd; Seoul National University R&DB Foundation

Inventors

Classifications

  • G06N3/08 (Primary): Learning methods

  • Architecture, e.g. interconnection topology

  • Combinations of networks

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • G06N3/0495 (Primary): Quantised networks; Sparse networks; Compressed networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11250320B2 cover?
Provided are a neural network method and an apparatus, the method including obtaining a set of floating point data processed in a layer included in a neural network, determining a weighted entropy based on data values included in the set of floating point data, adjusting quantization levels assigned to the data values based on the weighted entropy, and quantizing the data values included in the…
Who is the assignee on this patent?
Samsung Electronics Co Ltd and Seoul National University R&DB Foundation.
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).