Online activation compression with k-means

US2019102673A1 · US · A1

Patent metadata
Publication number: US-2019102673-A1
Application number: US-201715720298-A
Country: US
Kind code: A1
Filing date: Sep 29, 2017
Priority date: Sep 29, 2017
Publication date: Apr 4, 2019
Grant date: none listed

Abstract

Official abstract text for this publication.

Methods and apparatus relating to online activation compression with K-means are described. In one embodiment, logic (e.g., in a processor) compresses one or more activation functions for a convolutional network based on non-uniform quantization. The non-uniform quantization for each layer of the convolutional network is performed offline, and an activation function for a specific layer of the convolutional network is quantized during runtime. Other embodiments are also disclosed and claimed.
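Apart from naming K-means in the title, the abstract does not describe the clustering procedure. The Python below is a minimal sketch under stated assumptions. One-dimensional Lloyd's k-means is run offline on sample activations from a single layer to build a 16-entry codebook (16 entries because claim 5 mentions 4-bit representations). At runtime, each activation is replaced by the index of its nearest codebook entry. The function names, the linear initialization, the iteration cap, and the synthetic calibration data are all hypothetical, and this is not presented as the patent's own implementation.

```python
import bisect
import random
from typing import List, Sequence


def nearest_index(centroids: Sequence[float], x: float) -> int:
    """Index of the centroid closest to x; `centroids` must be sorted ascending."""
    i = bisect.bisect_left(centroids, x)
    if i == 0:
        return 0
    if i == len(centroids):
        return len(centroids) - 1
    # Pick the closer neighbour; ties go to the lower index.
    return i if centroids[i] - x < x - centroids[i - 1] else i - 1


def kmeans_1d(samples: Sequence[float], k: int = 16, iters: int = 50) -> List[float]:
    """Offline step (hypothetical): fit k sorted centroids to one layer's activations."""
    lo, hi = min(samples), max(samples)
    # Linear initialization: k evenly spaced points across the observed range.
    centroids = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for x in samples:  # assignment step
            j = nearest_index(centroids, x)
            sums[j] += x
            counts[j] += 1
        # Update step; an empty cluster keeps its previous centroid.
        updated = sorted(s / c if c else old for s, c, old in zip(sums, counts, centroids))
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return centroids


def quantize(activations: Sequence[float], centroids: Sequence[float]) -> List[int]:
    """Runtime step: replace each activation by its 4-bit codebook index."""
    return [nearest_index(centroids, a) for a in activations]


def dequantize(indices: Sequence[int], centroids: Sequence[float]) -> List[float]:
    """Runtime decompression: look each index up in the layer's codebook."""
    return [centroids[i] for i in indices]


# Usage with synthetic, ReLU-like calibration data for one layer.
rng = random.Random(0)
calibration = [max(0.0, rng.gauss(0.0, 1.0)) for _ in range(2000)]
codebook = kmeans_1d(calibration, k=16)
idx = quantize([0.0, 0.5, 2.3], codebook)
approx = dequantize(idx, codebook)
```

The codebook is fitted only from observed activation statistics, so no network weights are touched. That fits the "without retraining" language of claims 8 and 22, though the claims do not say how the patent itself builds its per-layer distributions.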

Claims

Full claim text (claims 1 to 25).

1. An apparatus comprising: logic, at least a portion of which is in hardware, to compress one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is to be performed offline, wherein an activation function for a specific layer of the convolutional network is to be quantized during runtime.

2. The apparatus of claim 1, further comprising memory to store an index corresponding to the quantized activation function during runtime.

3. The apparatus of claim 2, wherein the index is to be stored in a lookup table.

4. The apparatus of claim 1, wherein compression of the one or more activation functions is to reduce memory bandwidth usage for processing information between layers of the convolutional network.

5. The apparatus of claim 1, wherein compression of the one or more activation functions is to reduce representation size for each of the one or more activation functions to 4 bits.

6. The apparatus of claim 1, wherein distribution of each layer of the convolutional network is to be determined offline.

7. The apparatus of claim 1, wherein the quantized activation function is to be decompressed during runtime.

8. The apparatus of claim 1, wherein the logic is to compress the one or more activation functions without retraining the convolutional network.

9. The apparatus of claim 1, wherein the convolutional network is to assist in image processing.

10. The apparatus of claim 1, wherein the convolutional network is to comprise a Convolutional Neural Network (CNN) or a Deep Convolutional Network (DCN).

11. The apparatus of claim 1, wherein a processor comprises the logic.

12. The apparatus of claim 11, wherein the processor comprises a Graphics Processing Unit (GPU) or a General-Purpose GPU (GPGPU), wherein the GPU or the GPGPU comprises one or more graphics processing cores.

13. The apparatus of claim 11, wherein the processor comprises one or more processor cores.

14. The apparatus of claim 1, wherein one or more of: a processor, the logic, and memory are on a single integrated circuit die.

15. A method comprising: compressing one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is performed offline, wherein an activation function for a specific layer of the convolutional network is quantized during runtime.

16. The method of claim 15, further comprising storing an index corresponding to the quantized activation function in memory during runtime.

17. The method of claim 16, further comprising storing the index in a lookup table.

18. The method of claim 15, further comprising reducing memory bandwidth usage for processing information between layers of the convolutional network in response to the compression of the one or more activation functions.

19. The method of claim 15, further comprising reducing representation size for each of the one or more activation functions to 4 bits in response to the compression of the one or more activation functions.

20. The method of claim 15, further comprising determining the distribution of each layer of the convolutional network offline.

21. The method of claim 15, further comprising decompressing the quantized activation function during runtime.

22. The method of claim 15, further comprising compressing the one or more activation functions without retraining the convolutional network.

23. One or more computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: compress one or more activation functions for a convolutional network based on non-uniform quantization, wherein the non-uniform quantization for each layer of the convolutional network is performed offline, wherein an activation function for a specific layer of the convolutional network is quantized during runtime.

24. The computer-readable medium of claim 23, further comprising one or more instructions that when executed on the at least one processor configure the at least one processor to perform one or more operations to cause compressing of the one or more activation functions without retraining the convolutional network.

25. The computer-readable medium of claim 23, further comprising one or more instructions that when executed on the at least one processor configure the at least one processor to perform one or more operations to cause determination of the distribution of each layer of the convolutional network offline.
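Claims 2 to 5 and 7 cover several runtime steps: storing a per-activation index, keeping the representation at 4 bits, and decompressing at runtime to save memory bandwidth between layers. The sketch below illustrates one possible storage layout, built on assumptions the claims do not state. Two 4-bit indices are packed into each byte (low nibble first), the per-layer lookup table holds 16 float32 values, and the uncompressed baseline is float32 activations. The helper names and the sample data are hypothetical.

```python
from typing import List, Sequence


def pack_nibbles(indices: Sequence[int]) -> bytes:
    """Pack 4-bit indices two per byte, low nibble first (hypothetical layout)."""
    out = bytearray((len(indices) + 1) // 2)
    for n, i in enumerate(indices):
        if not 0 <= i < 16:
            raise ValueError(f"index {i} does not fit in 4 bits")
        out[n // 2] |= i << (4 * (n % 2))
    return bytes(out)


def unpack_nibbles(packed: bytes, count: int) -> List[int]:
    """Recover `count` 4-bit indices from a packed buffer."""
    return [(packed[n // 2] >> (4 * (n % 2))) & 0xF for n in range(count)]


def decompress(packed: bytes, count: int, lut: Sequence[float]) -> List[float]:
    """Runtime decompression: each index selects an entry of the layer's lookup table."""
    return [lut[i] for i in unpack_nibbles(packed, count)]


# Usage with a made-up 16-entry lookup table and 1,000 activation indices.
lut = [0.25 * j for j in range(16)]
indices = [j % 16 for j in range(1000)]
packed = pack_nibbles(indices)
values = decompress(packed, len(indices), lut)

fp32_bytes = 4 * len(indices)  # uncompressed float32 activations
packed_bytes = len(packed)     # 4-bit indices, two per byte
lut_bytes = 4 * len(lut)       # one float32 lookup table per layer
print(fp32_bytes, packed_bytes, lut_bytes)  # 4000 500 64
```

Under these assumptions, the activation data moved between layers shrinks by a factor of 8 (4,000 bytes to 500 bytes). The 64-byte table is a fixed cost per layer.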

Assignees

Intel Corp

Inventors

Not listed on this page.

Classifications (CPC titles)

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Probabilistic graphical models, e.g. probabilistic networks

  • Activation functions

  • Probabilistic or stochastic networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019102673A1 cover?
It covers compressing activation functions in a convolutional network by non-uniform quantization (K-means, per the title). The quantization for each layer is determined offline, and each layer's activation function is quantized at runtime. The dependent claims add storing the resulting index in a lookup table, reducing the representation to 4 bits, decompressing at runtime, and doing all of this without retraining the network.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Published Apr 4, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).