Neural network computation circuit, control circuit therefor, and control method therefor
US-2024411520-A1 · Dec 12, 2024 · US
US2019102673A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019102673-A1 |
| Application number | US-201715720298-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 29, 2017 |
| Priority date | Sep 29, 2017 |
| Publication date | Apr 4, 2019 |
| Grant date | — |
Abstract
Methods and apparatus relating to online activation compression with K-means are described. In one embodiment, logic (e.g., in a processor) compresses one or more activation functions for a convolutional network based on non-uniform quantization. The non-uniform quantization for each layer of the convolutional network is performed offline, and an activation function for a specific layer of the convolutional network is quantized during runtime. Other embodiments are also disclosed and claimed.
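The abstract splits the work into an offline step and a runtime step. The sketch below illustrates that split under stated assumptions. It runs plain one-dimensional K-means (Lloyd's algorithm) offline on a hypothetical calibration sample of one layer's activations, producing a per-layer codebook of 16 centroids so that each index fits in 4 bits. At runtime, each activation is mapped to the index of its nearest centroid. The function names, quantile initialization, iteration count and calibration data are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def fit_codebook(samples, k=16, iters=25):
    """Offline step (assumed): 1-D K-means (Lloyd's algorithm) on one layer's activations."""
    x = np.asarray(samples, dtype=np.float64).ravel()
    # Start centroids at evenly spaced quantiles so dense regions receive more centroids.
    centroids = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assignment: index of the nearest centroid for every sample.
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        # Update: move each centroid to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = x[idx == j]
            if members.size:
                centroids[j] = members.mean()
    return np.sort(centroids)

def quantize(activations, centroids):
    """Runtime step (assumed): replace each activation by its nearest-centroid index."""
    a = np.asarray(activations, dtype=np.float64)
    return np.abs(a[..., None] - centroids).argmin(axis=-1).astype(np.uint8)

def dequantize(indices, centroids):
    """Decompression: read each index back through the per-layer codebook (a lookup table)."""
    return centroids[indices]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    calib = np.maximum(rng.normal(size=10_000), 0.0)  # hypothetical ReLU-like activations
    cb = fit_codebook(calib)                          # offline, once per layer
    codes = quantize(np.maximum(rng.normal(size=8), 0.0), cb)  # runtime, per inference
    print(codes, dequantize(codes, cb))
```

Because the centroids follow the empirical distribution, the quantization levels cluster where activations are dense (for example near zero after a ReLU). This is what makes the scheme non-uniform, and no retraining of the network is involved.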
Claims
1. An apparatus comprising: logic, at least a portion of which is in hardware, to compress one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is to be performed offline, wherein an activation function for a specific layer of the convolutional network is to be quantized during runtime.
2. The apparatus of claim 1, further comprising memory to store an index corresponding to the quantized activation function during runtime.
3. The apparatus of claim 2, wherein the index is to be stored in a lookup table.
4. The apparatus of claim 1, wherein compression of the one or more activation functions is to reduce memory bandwidth usage for processing information between layers of the convolutional network.
5. The apparatus of claim 1, wherein compression of the one or more activation functions is to reduce representation size for each of the one or more activation functions to 4 bits.
6. The apparatus of claim 1, wherein distribution of each layer of the convolutional network is to be determined offline.
7. The apparatus of claim 1, wherein the quantized activation function is to be decompressed during runtime.
8. The apparatus of claim 1, wherein the logic is to compress the one or more activation functions without retraining the convolutional network.
9. The apparatus of claim 1, wherein the convolutional network is to assist in image processing.
10. The apparatus of claim 1, wherein the convolutional network is to comprise a Convolutional Neural Network (CNN) or a Deep Convolutional Network (DCN).
11. The apparatus of claim 1, wherein a processor comprises the logic.
12. The apparatus of claim 11, wherein the processor comprises a Graphics Processing Unit (GPU) or a General-Purpose GPU (GPGPU), wherein the GPU or the GPGPU comprises one or more graphics processing cores.
13. The apparatus of claim 11, wherein the processor comprises one or more processor cores.
14. The apparatus of claim 1, wherein one or more of: a processor, the logic, and memory are on a single integrated circuit die.
15. A method comprising: compressing one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is performed offline, wherein an activation function for a specific layer of the convolutional network is quantized during runtime.
16. The method of claim 15, further comprising storing an index corresponding to the quantized activation function in memory during runtime.
17. The method of claim 16, further comprising storing the index in a lookup table.
18. The method of claim 15, further comprising reducing memory bandwidth usage for processing information between layers of the convolutional network in response to the compression of the one or more activation functions.
19. The method of claim 15, further comprising reducing representation size for each of the one or more activation functions to 4 bits in response to the compression of the one or more activation functions.
20. The method of claim 15, further comprising determining the distribution of each layer of the convolutional network offline.
21. The method of claim 15, further comprising decompressing the quantized activation function during runtime.
22. The method of claim 15, further comprising compressing the one or more activation functions without retraining the convolutional network.
23. One or more computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: compress one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is performed offline, wherein an activation function for a specific layer of the convolutional network is quantized during runtime.
24. The computer-readable medium of claim 23, further comprising one or more instructions that when executed on the at least one processor configure the at least one processor to perform one or more operations to cause compressing of the one or more activation functions without retraining the convolutional network.
25. The computer-readable medium of claim 23, further comprising one or more instructions that when executed on the at least one processor configure the at least one processor to perform one or more operations to cause determination of the distribution of each layer of the convolutional network offline.
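Claims 2 to 5 and 7 connect the 4-bit index to storage, memory bandwidth and runtime decompression. The sketch below illustrates why inter-layer traffic shrinks, assuming two 4-bit indices are packed into each byte (the claims do not specify a packing layout). A float32 activation occupies 4 bytes, while its index occupies half a byte, an 8x reduction before counting the small per-layer codebook. The helper names and the codebook values are hypothetical.

```python
import numpy as np

def pack_nibbles(indices):
    """Pack 4-bit indices (values 0..15) two per byte; low nibble first (assumed layout)."""
    idx = np.asarray(indices, dtype=np.uint8).ravel()
    if idx.size % 2:
        idx = np.append(idx, np.uint8(0))  # pad to an even count
    return (idx[0::2] | (idx[1::2] << 4)).astype(np.uint8)

def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit indices from the packed bytes."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(p.size * 2, dtype=np.uint8)
    out[0::2] = p & 0x0F  # low nibble
    out[1::2] = p >> 4    # high nibble
    return out[:count]

def decompress(packed, count, codebook):
    """Runtime decompression: unpack indices, then read values from the lookup table."""
    return np.asarray(codebook, dtype=np.float32)[unpack_nibbles(packed, count)]

if __name__ == "__main__":
    codebook = np.linspace(0.0, 3.0, 16)               # hypothetical per-layer codebook
    codes = np.array([0, 15, 7, 3, 9], dtype=np.uint8)  # indices from runtime quantization
    packed = pack_nibbles(codes)                        # 3 bytes vs 20 bytes as float32
    print(packed.nbytes, decompress(packed, codes.size, codebook))
```

Only the packed indices would need to travel between layers. The 16-entry codebook is fixed per layer after the offline step, so it can stay resident next to the compute units.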