Decompression and compression of neural network data using different compression schemes

US11537853B1 · US · B1

Patent metadata
Publication number: US-11537853-B1
Application number: US-201916455258-A
Country: US
Kind code: B1
Filing date: Jun 27, 2019
Priority date: Nov 28, 2018
Publication date: Dec 27, 2022
Grant date: Dec 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a neural network accelerator (NNA) with a decompression unit that can be configured to perform multiple types of decompression. The decompression unit may include a separate subunit for each decompression type. The subunits can be coupled to form a pipeline in which partially decompressed results generated by one subunit are input for further decompression by another subunit. Depending on which types of compression were applied to incoming data, any number of the subunits may be used to produce a decompressed output. In some embodiments, the decompression unit is configured to decompress data that has been compressed using a zero value compression scheme, a shared value compression scheme, or both. The NNA can also include a compression unit implemented in a manner similar to that of the decompression unit.
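
As a rough illustration of the pipeline idea in the abstract, the Python sketch below chains a zero value stage and a shared value stage and bypasses whichever stage is not needed. It is a software model only, not the patented hardware. The function names (`restore_zeros`, `lookup_shared`, `decompress`) are hypothetical, and reserving cluster index 0 for the value 0.0 is an assumption of this sketch so that re-inserted zeros stay unambiguous when both schemes are combined.

```python
from typing import List, Optional, Sequence


def restore_zeros(stream: list, bitmap: Sequence[int], filler) -> list:
    """Zero value stage: bitmap[i] == 1 marks a position removed at compression time."""
    remaining = iter(stream)
    return [filler if removed else next(remaining) for removed in bitmap]


def lookup_shared(stream: Sequence[int], table: Sequence[float]) -> List[float]:
    """Shared value stage: replace each cluster index with the cluster's value."""
    return [table[index] for index in stream]


def decompress(stream: list,
               bitmap: Optional[Sequence[int]] = None,
               table: Optional[Sequence[float]] = None) -> list:
    """Run only the stages whose metadata is supplied; a missing item bypasses that stage."""
    if bitmap is not None:
        # Assumption: when a lookup table follows, index 0 is reserved for 0.0,
        # so the zero stage inserts index 0; otherwise it inserts a literal 0.0.
        filler = 0 if table is not None else 0.0
        stream = restore_zeros(stream, bitmap, filler)
    if table is not None:
        stream = lookup_shared(stream, table)
    return stream


if __name__ == "__main__":
    table = [0.0, -0.5, 0.75]  # hypothetical cluster values
    print(decompress([2, 1, 2], bitmap=[0, 1, 0, 1, 0], table=table))  # both stages
    print(decompress([1, 2, 2], table=table))                          # zero stage bypassed
    print(decompress([0.3, -0.2], bitmap=[1, 0, 0]))                   # lookup stage bypassed
```

Passing only a table models data that went through shared value compression alone: the first stage hands the stream through untouched and the second stage does all the work.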

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system, comprising: a first memory storing compressed data, wherein the compressed data corresponds to original data that was compressed by applying a zero value compression scheme in combination with a shared value compression scheme, the original data comprising a first data value, a second data value, and a third data value, wherein the first data value, the second data value, and the third data value each represent a different weight of a neural network, wherein the zero value compression scheme involves removal of the first data value based on the first data value being within a first range around zero, and wherein the shared value compression scheme involves: assigning the second data value and the third data value to a first cluster in a set of clusters based on proximity of the second data value and the third data value to a value associated with the first cluster, substituting a binary-encoded value for the second data value, the binary-encoded value being an index to the first cluster, and substituting the binary-encoded value for the third data value; a second memory; a data move engine configured to generate decompressed data by processing the compressed data through a first decompression unit and a second decompression unit, wherein the first decompression unit is configured to introduce a zero value in place of the first data value by: identifying a position of the first data value within the original data, the position of the first data value being indicated by a compression map that contains information on a position of any data value that was removed from the original data as a result of zero value compression, and inserting, into the compressed data, a zero at the position of the first data value, wherein the second decompression unit is configured to: identify the value associated with the first cluster by referencing a lookup table that maps the index to the value associated with the first cluster, and replace each instance of the binary-encoded value within the compressed data with the value associated with the first cluster, and wherein the data move engine is further configured to store the decompressed data into the second memory; and a processing unit that performs a computation using the decompressed data stored in the second memory.

2. The computing system of claim 1, wherein: the computation using the decompressed data stored in the second memory produces activation values for the neural network, the activation values including a first activation value, a second activation value, and a third activation value; the data move engine is configured to generate additional compressed data by processing the activation values through a first compression unit and a second compression unit; processing of the activation values by the first compression unit comprises removing the first activation value based on determining that the first activation value is within a second range around zero; and processing of the activation values by the second compression unit comprises: assigning the second activation value and the third activation value to a second cluster in a second set of clusters based on proximity of the second activation value and the third activation value to a value associated with the second cluster, wherein the second set of clusters is different from the first set of clusters; substituting a second binary-encoded value for the second activation value, the second binary-encoded value being an index to the second cluster; and substituting the second binary-encoded value for the third activation value.

3. The computing system of claim 1, wherein the first decompression unit receives additional compressed data, the additional compressed data corresponding to data compressed by substituting a cluster index for at least one data value, but without removal of any data values within a range around zero, wherein the first decompression unit is directly coupled to the second decompression unit, and wherein the first decompression unit passes the additional compressed data to the second decompression unit without performing any decompression processing on the additional compressed data.

4. A computing system, comprising: a host processor configured to generate first compressed data by compressing first data according to a first compression scheme; and a neural network processor configured to execute a neural network, the neural network processor comprising: a memory; a processing unit; a first decompression unit operable to perform decompression in accordance with the first compression scheme, wherein the first decompression unit is configured to: receive first compression information from the host processor or as part of the first compressed data; and generate first decompressed data by decompressing the first compressed data using the first compression information; and a second decompression unit configured to: receive the first decompressed data from the first decompression unit; and send the first decompressed data to the memory or the processing unit; wherein the memory is configured to: store the first compressed data prior to decompression of the first compressed data by the first decompression unit; or store the first decompressed data; and wherein the processing unit generates inferences using the first decompressed data.

5. The computing system of claim 4, wherein to generate the first compressed data, the host processor: determines a first range around zero, wherein removal of values from the first data that fall within the first range results in a threshold amount of compression or a threshold level of inference accuracy; and removes a first value from the first data, wherein the first value is within the first range.

6. The computing system of claim 4, wherein the first compression information includes a compression map that indicates a position, within the first data, of a first value that is not included in the first compressed data, and wherein to generate the first decompressed data, the first decompression unit: identifies the position of the first value based on the compression map; and replaces the first value with a zero.

7. The computing system of claim 4, wherein the first compression information includes a binary bit-map, the binary bit-map comprising a plurality of bits, each bit of the plurality of bits representing a different position within the first data, and wherein the plurality of bits includes a first bit indicating that the first compressed data was generated by removing, from the first data, a value at a position represented by the first bit.

8. The computing system of claim 4, wherein to generate the first compressed data, the host processor: assigns a first value from the first data and a second value from the first data to a first cluster in a group of clusters based on proximity of the first value and the second value to a third value associated with the first cluster, the proximity being indicated by a first difference between the first value and the third value and a second difference between the second value and the third value, wherein each cluster in the group of clusters represents a different value, and wherein the first value and the second value differ from each other in addition to differing from the third value; substitutes an index of the first cluster for the first value, wherein the index identifies the first cluster; and substitutes the index of the first cluster for the second value.

9. The computing system of claim 8, wherein the host processor assigns the first value and the second value to the first cluster by minimizing a cost function, wherein the co
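
The host-side compression steps recited in claims 5, 7, and 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names are hypothetical, the range around zero is modeled as a symmetric radius chosen from a caller-supplied candidate list to hit a target fraction of removed values (only the "threshold amount of compression" branch of claim 5), and the cluster values are passed in rather than learned, since the claim preview is cut off before the cost function is described.

```python
from typing import List, Sequence, Tuple


def pick_radius(values: Sequence[float],
                candidates: Sequence[float],
                target_fraction: float) -> float:
    """Smallest candidate radius that removes at least target_fraction of the values.

    Falls back to the largest candidate if none reaches the target (assumption).
    """
    for radius in sorted(candidates):
        removed = sum(abs(v) <= radius for v in values)
        if removed / len(values) >= target_fraction:
            return radius
    return max(candidates)


def zero_value_compress(values: Sequence[float],
                        radius: float) -> Tuple[List[float], List[int]]:
    """Drop values within [-radius, radius]; a 1 bit in the bitmap marks a removed position."""
    kept: List[float] = []
    bitmap: List[int] = []
    for v in values:
        removed = abs(v) <= radius
        bitmap.append(1 if removed else 0)
        if not removed:
            kept.append(v)
    return kept, bitmap


def shared_value_compress(values: Sequence[float],
                          cluster_values: Sequence[float]) -> List[int]:
    """Replace each value with the index of the closest cluster value."""
    return [
        min(range(len(cluster_values)), key=lambda i: abs(v - cluster_values[i]))
        for v in values
    ]


if __name__ == "__main__":
    weights = [0.02, 0.48, -0.01, 0.52, -0.9]        # hypothetical weights
    radius = pick_radius(weights, [0.01, 0.05, 0.1], target_fraction=0.3)
    kept, bitmap = zero_value_compress(weights, radius)
    indices = shared_value_compress(kept, [0.5, -0.9])  # hypothetical cluster values
    print(radius, kept, bitmap, indices)
```

In this example 0.48 and 0.52 differ from each other and from the cluster value 0.5, yet both are replaced by the same index, which mirrors the situation claim 8 describes.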

Assignees

  • Amazon Tech Inc

Inventors

Classifications

  • Inference or reasoning models · CPC title

  • for shifting, e.g. justifying, scaling, normalising {(digital stores in which the information is moved stepwise, e.g. shift-registers G11C19/00; digital stores in which the information circulates G11C21/00)} · CPC title

  • based on threshold decision · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • Activation functions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11537853B1 cover?
Described herein is a neural network accelerator (NNA) with a decompression unit that can be configured to perform multiple types of decompression. The decompression unit may include a separate subunit for each decompression type. The subunits can be coupled to form a pipeline in which partially decompressed results generated by one subunit are input for further decompression by another subunit. Dep…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/05. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 27, 2022 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).