Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism
US-2019205746-A1 · Jul 4, 2019 · US
US11562247B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11562247-B2 |
| Application number | US-201916256998-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 24, 2019 |
| Priority date | Jan 24, 2019 |
| Publication date | Jan 24, 2023 |
| Grant date | Jan 24, 2023 |
Official abstract text for this publication.
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
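The flow described in the abstract (quantize a block of activations to a shared exponent, remap the mantissas onto a smaller non-uniform set of levels, store, then expand again for back propagation) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the 3-bit mantissa width, and the `LEVELS` codebook are all assumptions made for the example.

```python
import math

# Hypothetical non-uniform levels for a 3-bit mantissa magnitude (0..7),
# stored as a 2-bit code. Levels are denser near zero. Illustration only.
LEVELS = [0, 1, 3, 7]


def to_block_fp(values, mantissa_bits=3):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e becomes the shared exponent.
    shared_exp = math.frexp(max_abs)[1]
    scale = 2.0 ** (shared_exp - mantissa_bits)
    limit = 2 ** mantissa_bits - 1
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exp, mantissas


def from_block_fp(shared_exp, mantissas, mantissa_bits=3):
    """Expand shared-exponent mantissas back to floats."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]


def compress(mantissas):
    """Map each mantissa to (is_negative, code) for the nearest non-uniform level."""
    codes = []
    for m in mantissas:
        mag = abs(m)
        code = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - mag))
        codes.append((m < 0, code))
    return codes


def decompress(codes):
    """Recover approximate mantissas from (is_negative, code) pairs."""
    return [-LEVELS[c] if neg else LEVELS[c] for neg, c in codes]


# Forward pass output for one block (made-up activation values).
activations = [0.875, -0.5, 0.125, 0.0]
exp, man = to_block_fp(activations)
stored = compress(man)  # what would be written to bulk memory
restored = from_block_fp(exp, decompress(stored))  # what back propagation would read
```

In this sketch the compressed form needs two code bits per value instead of three mantissa bits, at the cost of `-0.5` coming back as `-0.375`. A real design would choose the level set and bit widths to suit the layer.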
Opening claim text (preview).
What is claimed is:
1. A computing system comprising: one or more processors; bulk memory comprising computer-readable storage devices and/or memory; a floating-point compressor formed from at least one of the processors, the floating-point compressor being in communication with the bulk memory; and the computing system being configured to: with at least one of the processors, perform forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format having a normal mantissa format; with the floating-point compressor, convert at least one of the activation values to a second floating-point format to produce compressed activation values by mapping activation value mantissas to a non-uniform mantissa format; and with at least one of the processors, storing the compressed activation values in the bulk memory.
2. The computing system of claim 1, wherein the second floating-point format has a lower-precision mantissa than the first floating-point format.
3. The computing system of claim 1, wherein the mapping comprises: mapping mantissas having two or more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format.
4. The computing system of claim 1, wherein the mapping comprises: mapping first mantissas having one or more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format; and mapping second mantissas having at least one more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format.
5. The computing system of claim 1, wherein the compressor is further configured to further compress the compressed activation values prior to the storing by performing at least one or more of the following: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.
6. The computing system of claim 1, wherein the computing system is further configured to: perform backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format to produce uncompressed activation values; and perform a gradient operation with the uncompressed activation values.
7. The computing system of claim 1, wherein the layer is a first layer, the compressed activation values are first compressed activation values, the non-uniform mantissa format is a first non-uniform mantissa format, and wherein the computing system is further configured to: with at least one of the processors, perform forward propagation for a different, second layer of a neural network to produce second activation values in the first floating-point format; with the floating-point compressor, for at least one of the second activation values, convert the at least one of the second activation values to a third floating-point format to produce second compressed activation values, the third floating-point format having a activation value mantissas in a second non-uniform mantissa format different than the first non-uniform mantissa format; and with at least one of the processors, storing the second compressed activation values in the bulk memory.
8. The computing system of claim 1, wherein: the processors comprise at least one of the following: a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the bulk memory is situated on a different integrated circuit than the processors.
9. The computing system of claim 1, wherein the bulk memory includes dynamic random access memory (DRAM) or embedded DRAM and the system further comprises a hardware accelerator including a memory temporarily storing the first activation values for at least a portion of only one layer of the neural network, the hardware accelerator memory including static RAM (SRAM) or a register file.
10. A method of operating a computing system implementing a neural network, the method comprising: with the computing system: forward propagating a layer of the neural network to generate activation values in a first floating-point format; converting at least one of the activation values to a second, block floating-point format having non-uniform mantissas, generating compressed activation values; and storing the compressed activation values in a computer-readable memory or storage device.
11. The method of claim 10, wherein the second block floating-point format has one of the following mantissa formats: lite lossy format, normal lossy format, or aggressive lossy format.
12. The method of claim 10, wherein the second block floating-point format has a lite lossy mantissa format, the lite lossy mantissa format comprising: a one-to-one mapping for a selected lowest value mantissa in the first floating-point format; a one-to-one mapping for a selected highest value mantissa in the first floating-point format; and a two or more-to-one mapping for at least two other mantissa values in the first floating-point format.
13. The method of claim 10, wherein the second block floating-point format has an aggressive lossy mantissa format, the aggressive lossy mantissa format comprising: a one-to-one mapping for a selected lowest value mantissa in the first floating-point format; and a mapping for all other mantissa values besides the selected lowest value mantissa in the first floating-point format.
14. The method of claim 10, further comprising: prior to the storing, further compressing the compressed activation values stored in the computer-readable memory or storage device by one or more of the following techniques: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.
15. The method of claim 10, wherein the second block floating-point format has a two or more-to-one mapping for at least two mantissa values in the first floating-point format, the method further comprising: with the computing system, dequantizing the compressed activation values by converting at least one mantissa of the compressed activation values to an average value based on the at least two mantissa values.
16. The method of claim 10, wherein the second block floating-point format has a two or more-to-one mapping for at least two mantissa values in the first floating-point format, the method further comprising: with the computing system, dequantizing the compressed activation values by converting at least one mantissa of the compressed activation values to a randomly-selected value of the at least two mantissa values.
17. The method of claim 10, further comprising: with the computing system, performing backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format to uncompressed activation values; and with the computing system, performing a gradient operation with the uncompressed activation values; and with the computing system, updating weights for at least one node of the neural network based on the uncompressed activation values.
18. A computer system comprising: at least one processor; a bulk memory or storage device; and the computing system being configured to, with the at least one processor and memory: implement a first layer of neural network using first weights and
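
The many-to-one mantissa mappings in claims 12 and 13, and the two dequantization options in claims 15 and 16, can be illustrated with a small sketch. The group tables below are hypothetical choices for a 3-bit mantissa magnitude (0..7); the claims do not fix specific tables, and collapsing every non-lowest value into one code is only one possible reading of the "aggressive lossy" mapping. All names are assumptions made for illustration.

```python
import random

# Hypothetical "lite lossy" table: lowest (0) and highest (7) values map
# one-to-one; the values in between are merged two-to-one.
LITE_LOSSY = [[0], [1, 2], [3, 4], [5, 6], [7]]

# Hypothetical "aggressive lossy" table: lowest value maps one-to-one and
# every other value shares a single code (an assumed interpretation).
AGGRESSIVE_LOSSY = [[0], [1, 2, 3, 4, 5, 6, 7]]


def encode(mantissa, groups):
    """Return the code of the group that contains this mantissa value."""
    for code, group in enumerate(groups):
        if mantissa in group:
            return code
    raise ValueError(f"mantissa {mantissa} is outside the table")


def decode_average(code, groups):
    """Dequantize to the average of the values merged into this code."""
    group = groups[code]
    return sum(group) / len(group)


def decode_random(code, groups, rng=random):
    """Dequantize to a randomly selected value among those merged into this code."""
    return rng.choice(groups[code])


code = encode(4, LITE_LOSSY)                    # 3 and 4 share code 2
approx = decode_average(code, LITE_LOSSY)       # 3.5
sample = decode_random(code, LITE_LOSSY, random.Random(0))  # either 3 or 4
```

Averaging gives a deterministic reconstruction; random selection introduces noise on each individual value while keeping every reconstructed mantissa one that the original format could actually hold.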
Software · CPC title
Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title
Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression · CPC title
Implementation provisions of register files, e.g. ports · CPC title
Backpropagation, e.g. using gradient descent · CPC title