Neural network activation compression with non-uniform mantissas

US11562247B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11562247-B2
Application number  US-201916256998-A
Country             US
Kind code           B2
Filing date         Jan 24, 2019
Priority date       Jan 24, 2019
Publication date    Jan 24, 2023
Grant date          Jan 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
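The flow in the abstract (forward propagation, compression to a block floating-point format, storage, retrieval for back propagation) can be sketched roughly as follows. This is a minimal illustration, not the patented format: the function names `to_block_fp`, `from_block_fp`, `forward_and_stash`, and `fetch_for_backward`, the 4-bit mantissa width, and the `stash` dictionary standing in for bulk memory are all assumptions made for the example. It uses a plain narrower (lossy but uniform) mantissa rather than a non-uniform one.

```python
import math

def to_block_fp(values, mantissa_bits=4):
    """Quantize a block of floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e becomes the shared exponent.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - mantissa_bits)
    limit = (1 << mantissa_bits) - 1
    # Round each value to the nearest step and clamp to the mantissa range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exp, mantissas

def from_block_fp(shared_exp, mantissas, mantissa_bits=4):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]

# Hypothetical stand-in for the bulk memory that holds compressed activations.
stash = {}

def forward_and_stash(layer_id, activations, mantissa_bits=4):
    """Compress a layer's activations after the forward pass and store them."""
    stash[layer_id] = to_block_fp(activations, mantissa_bits)

def fetch_for_backward(layer_id, mantissa_bits=4):
    """Retrieve and decompress a layer's activations for the backward pass."""
    shared_exp, mantissas = stash[layer_id]
    return from_block_fp(shared_exp, mantissas, mantissa_bits)
```

Sharing one exponent across the block means only the small integer mantissas vary per value, which is where the storage saving comes from; the cost is that small values in a block dominated by a large one lose precision.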

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: one or more processors; bulk memory comprising computer-readable storage devices and/or memory; a floating-point compressor formed from at least one of the processors, the floating-point compressor being in communication with the bulk memory; and the computing system being configured to: with at least one of the processors, perform forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format having a normal mantissa format; with the floating-point compressor, convert at least one of the activation values to a second floating-point format to produce compressed activation values by mapping activation value mantissas to a non-uniform mantissa format; and with at least one of the processors, storing the compressed activation values in the bulk memory.

2. The computing system of claim 1, wherein the second floating-point format has a lower-precision mantissa than the first floating-point format.

3. The computing system of claim 1, wherein the mapping comprises: mapping mantissas having two or more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format.

4. The computing system of claim 1, wherein the mapping comprises: mapping first mantissas having one or more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format; and mapping second mantissas having at least one more mantissa values in the normal mantissa format to a single mantissa value in the non-uniform mantissa format.

5. The computing system of claim 1, wherein the compressor is further configured to further compress the compressed activation values prior to the storing by performing at least one or more of the following: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.

6. The computing system of claim 1, wherein the computing system is further configured to: perform backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format to produce uncompressed activation values; and perform a gradient operation with the uncompressed activation values.

7. The computing system of claim 1, wherein the layer is a first layer, the compressed activation values are first compressed activation values, the non-uniform mantissa format is a first non-uniform mantissa format, and wherein the computing system is further configured to: with at least one of the processors, perform forward propagation for a different, second layer of a neural network to produce second activation values in the first floating-point format; with the floating-point compressor, for at least one of the second activation values, convert the at least one of the second activation values to a third floating-point format to produce second compressed activation values, the third floating-point format having a activation value mantissas in a second non-uniform mantissa format different than the first non-uniform mantissa format; and with at least one of the processors, storing the second compressed activation values in the bulk memory.

8. The computing system of claim 1, wherein: the processors comprise at least one of the following: a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the bulk memory is situated on a different integrated circuit than the processors.

9. The computing system of claim 1, wherein the bulk memory includes dynamic random access memory (DRAM) or embedded DRAM and the system further comprises a hardware accelerator including a memory temporarily storing the first activation values for at least a portion of only one layer of the neural network, the hardware accelerator memory including static RAM (SRAM) or a register file.

10. A method of operating a computing system implementing a neural network, the method comprising: with the computing system: forward propagating a layer of the neural network to generate activation values in a first floating-point format; converting at least one of the activation values to a second, block floating-point format having non-uniform mantissas, generating compressed activation values; and storing the compressed activation values in a computer-readable memory or storage device.

11. The method of claim 10, wherein the second block floating-point format has one of the following mantissa formats: lite lossy format, normal lossy format, or aggressive lossy format.

12. The method of claim 10, wherein the second block floating-point format has a lite lossy mantissa format, the lite lossy mantissa format comprising: a one-to-one mapping for a selected lowest value mantissa in the first floating-point format; a one-to-one mapping for a selected highest value mantissa in the first floating-point format; and a two or more-to-one mapping for at least two other mantissa values in the first floating-point format.

13. The method of claim 10, wherein the second block floating-point format has an aggressive lossy mantissa format, the aggressive lossy mantissa format comprising: a one-to-one mapping for a selected lowest value mantissa in the first floating-point format; and a mapping for all other mantissa values besides the selected lowest value mantissa in the first floating-point format.

14. The method of claim 10, further comprising: prior to the storing, further compressing the compressed activation values stored in the computer-readable memory or storage device by one or more of the following techniques: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.

15. The method of claim 10, wherein the second block floating-point format has a two or more-to-one mapping for at least two mantissa values in the first floating-point format, the method further comprising: with the computing system, dequantizing the compressed activation values by converting at least one mantissa of the compressed activation values to an average value based on the at least two mantissa values.

16. The method of claim 10, wherein the second block floating-point format has a two or more-to-one mapping for at least two mantissa values in the first floating-point format, the method further comprising: with the computing system, dequantizing the compressed activation values by converting at least one mantissa of the compressed activation values to a randomly-selected value of the at least two mantissa values.

17. The method of claim 10, further comprising: with the computing system, performing backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format to uncompressed activation values; and with the computing system, performing a gradient operation with the uncompressed activation values; and with the computing system, updating weights for at least one node of the neural network based on the uncompressed activation values.

18. A computer system comprising: at least one processor; a bulk memory or storage device; and the computing system being configured to, with the at least one processor and memory: implement a first layer of neural network using first weights and
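The many-to-one mantissa mappings in claims 12 and 13, and the two dequantization options in claims 15 and 16, can be illustrated with a small sketch. The groupings below are hypothetical: this page does not give the actual mapping tables, so the 3-bit input mantissas, the bucket boundaries, and the names `LITE_LOSSY`, `AGGRESSIVE_LOSSY`, `compress`, `decompress_average`, and `decompress_random` are assumptions chosen only to satisfy the claim wording. Sign and exponent handling are omitted.

```python
import random

# Hypothetical groupings of unsigned 3-bit mantissas (0..7).
# The index of each bucket is the compressed code that gets stored.
# Lite lossy: lowest and highest values map one-to-one, the rest many-to-one.
LITE_LOSSY = [[0], [1, 2, 3], [4, 5, 6], [7]]
# Aggressive lossy: lowest value maps one-to-one, all others share one code.
AGGRESSIVE_LOSSY = [[0], [1, 2, 3, 4, 5, 6, 7]]

def compress(mantissa, buckets):
    """Return the code of the bucket that contains the mantissa."""
    for code, members in enumerate(buckets):
        if mantissa in members:
            return code
    raise ValueError(f"mantissa {mantissa} not covered by mapping")

def decompress_average(code, buckets):
    """Dequantize to the average of the mantissas sharing this code."""
    members = buckets[code]
    return sum(members) / len(members)

def decompress_random(code, buckets, rng=random):
    """Dequantize to a randomly selected mantissa sharing this code."""
    return rng.choice(buckets[code])
```

With these assumed tables, the lite mapping needs 2 bits per mantissa and the aggressive mapping 1 bit, compared with 3 bits uncompressed. Averaging gives a deterministic reconstruction, while random selection trades per-value accuracy for an unbiased spread across the bucket.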

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Software

  • Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind

  • Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression

  • Implementation provisions of register files, e.g. ports

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11562247B2 cover?
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 24, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).