Neural network activation compression with non-uniform mantissas

US12067495B2 · US · B2

Patent metadata
Field | Value
Publication number | US-12067495-B2
Application number | US-202318092876-A
Country | US
Kind code | B2
Filing date | Jan 3, 2023
Priority date | Jan 24, 2019
Publication date | Aug 20, 2024
Grant date | Aug 20, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
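
The flow described in the abstract (forward propagation, conversion to a narrower mantissa, storage, and later retrieval for back propagation) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the 8-bit and 4-bit mantissa widths (sign included), the one-exponent-per-block convention, and the round-to-nearest rule are all assumptions chosen for illustration.

```python
import math

def quantize_block(values, mantissa_bits):
    """Return (signed integer mantissas, shared exponent) for one block."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so every |v| < 2**e.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return mantissas, shared_exp

def dequantize_block(mantissas, shared_exp, mantissa_bits):
    """Turn integer mantissas and a shared exponent back into floats."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

def narrow_mantissas(mantissas, bits_in, bits_out):
    """Lossy step: round each mantissa to bits_out bits (bits_out < bits_in)."""
    shift = bits_in - bits_out
    limit = 2 ** (bits_out - 1) - 1
    out = []
    for m in mantissas:
        magnitude = min((abs(m) + (1 << (shift - 1))) >> shift, limit)
        out.append(-magnitude if m < 0 else magnitude)
    return out

def widen_mantissas(mantissas, bits_in, bits_out):
    """Restore the wider mantissa width; the dropped low bits come back as zeros."""
    shift = bits_out - bits_in
    return [m << shift for m in mantissas]

# Forward pass output for one block (made-up numbers).
activations = [0.7, -0.3, 0.11, 0.05]
first, exp = quantize_block(activations, 8)        # first format: 8-bit mantissas
stored = (narrow_mantissas(first, 8, 4), exp)      # second format, kept in memory
# Back propagation: retrieve and convert back to the first format.
restored = dequantize_block(widen_mantissas(stored[0], 4, 8), stored[1], 8)
print(first, stored[0], restored)
```

In this sketch the stored mantissas take half the bits of the originals, and the restored values differ from the inputs by the rounding error introduced in `narrow_mantissas`.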

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: one or more processors; at least one memory coupled to the one or more processors; and one or more computer-readable storage media storing computer-executable instructions that, when executed, cause the computing system to perform operations comprising: performing forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floating-point format; converting at least one of the activation values to a second floating-point format to produce compressed activation values by representing activation value mantissas using a second number of bits, the second number of bits being less than the first number of bits; and storing the compressed activation values in the at least one memory.

2. The computing system of claim 1, wherein values representable by the first number of bits are evenly distributed in the second floating-point format using the second number of bits.

3. The computing system of claim 1, wherein values representable by the first number of bits are unevenly distributed in the second floating-point format using the second number of bits.

4. The computing system of claim 3, wherein an uneven distribution scheme is defined by specifying discrete sets of mantissa values representable by the first number of bits to be represented by a bit value of the second number of bits.

5. The computing system of claim 1, wherein the converting comprises: mapping two or more mantissa values in the first floating-point format to a single mantissa value in the second floating-point format.

6. The computing system of claim 1, wherein the converting comprises: mapping a first set of one or more mantissa values in the first floating-point to a single mantissa value in the second floating-point format; and mapping a second set of a plurality of mantissa values in the first floating-point format to a single mantissa value in the second floating-point format, wherein the second set has a greater number of elements than the first set.

7. The computing system of claim 1, the operations further comprising: further compressing the compressed activation values prior to the storing by performing one or more of entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.

8. The computing system of claim 1, the operations further comprising: performing backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format thereby producing uncompressed activation values; and perform a gradient operation with the uncompressed activation values.

9. The computing system of claim 8, wherein the converting the stored, compressed activation values to activation values in the first floating-point format comprises deterministically selecting a dequantized value for a quantized mantissa value of a compressed activation value of the stored, compressed activation values.

10. The computing system of claim 8, wherein the converting the stored, compressed activation values to activation values in the first floating-point format comprises randomly selecting a dequantized value for a quantized mantissa value of a compressed activation value of the stored, compressed activation values.

11. The computing system of claim 10, wherein the randomly selecting is performed according to a uniform distribution or probability distribution.

12. The computing system of claim 1, wherein the layer is a first layer, the operations further comprising: performing forward propagation for a different, second layer of the neural network to produce second activation values in the first floating-point format; for at least one of the second activation values, converting the at least one of the second activation values to a third floating-point format to produce second compressed activation values, the third floating-point format representing activation value mantissas using a third number of bits, the third number of bits being different than the first number of bits and the second number of bits; and storing the second compressed activation values in the at least one memory.

13. The computing system of claim 12, wherein the third floating-point format is selected as a compression method based at least in part on an aspect of the second layer.

14. The computing system of claim 12, wherein the second activation values are produced using first activation values propagated to the second layer from the first layer.

15. The computing system of claim 1, wherein the one or more processors comprise at least one of a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the at least one memory is situated on a different integrated circuit than the processors.

16. The computing system of claim 1, wherein the at least one memory comprises dynamic random access memory (DRAM) or embedded DRAM and the computing system further comprises a hardware accelerator including a memory temporarily storing the first activation values for at least a portion of only one layer of the neural network, the hardware accelerator memory comprising static RAM (SRAM) or a register file.

17. The computing system of claim 1, wherein the first floating-point format has a higher-precision exponent that the second floating-point format.

18. The computing system of claim 1, wherein the first floating-point format uses a first technique to determine exponent sharing between at least a portion of the activation values and the second floating-point format uses a second technique, different than the first technique, to determine exponent sharing between at least a portion of the activation values as represented in the second floating-point format.

19. A method, implemented in a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, the method comprising: performing forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floating-point format; converting at least one of the activation values to a second floating-point format to produce compressed activation values by representing activation value mantissas using a second number of bits, the second number of bits being less than the first number of bits; and storing the compressed activation values in the at least one memory.

20. One or more computer-readable storage media comprising: computer-executable instructions that, when executed by a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, cause the computing system to perform forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floati…
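
The uneven mantissa mapping in claims 3 to 6 and the two dequantization options in claims 9 to 11 can be sketched in Python as follows. This is one possible reading, not the patent's implementation. The 3-bit to 2-bit widths, the `BUCKETS` table, the choice of representative value, and all function names are hypothetical.

```python
import random

# Hypothetical uneven scheme: 3-bit mantissa magnitudes (0..7) are grouped
# into four sets, one per 2-bit code. Small values get their own code,
# larger values share one, so the sets differ in size (compare claim 6).
BUCKETS = [(0,), (1,), (2, 3), (4, 5, 6, 7)]
ENCODE = {m: code for code, bucket in enumerate(BUCKETS) for m in bucket}

def compress_mantissa(mantissa):
    """Map a 3-bit mantissa magnitude to its 2-bit code."""
    return ENCODE[mantissa]

def decompress_deterministic(code):
    """Always return the same fixed representative of the code's set."""
    bucket = BUCKETS[code]
    return bucket[len(bucket) // 2]

def decompress_random(code, rng):
    """Pick a member of the code's set uniformly at random."""
    return rng.choice(BUCKETS[code])

codes = [compress_mantissa(m) for m in range(8)]
fixed = [decompress_deterministic(c) for c in range(4)]
sampled = decompress_random(3, random.Random(0))
print(codes, fixed, sampled)
```

Swapping `BUCKETS` for four equal-sized sets would give the even distribution of claim 2 without changing the rest of the sketch.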

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Activation functions

  • Quantised networks; Sparse networks; Compressed networks

  • Software

  • Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind

  • Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
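
The run-length classification above matches one of the optional second-stage compressors listed in claim 7. As a rough illustration only, the Python sketch below run-length encodes a list of already-quantized mantissas. The function names and the (value, count) pair layout are assumptions and are not taken from the patent.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Quantized activations often contain runs of zeros (made-up numbers).
mantissas = [0, 0, 0, 5, 0, 0, 3, 3]
encoded = rle_encode(mantissas)
print(encoded)
```

Unlike the mantissa narrowing step, this stage is lossless: decoding the pairs reproduces the quantized mantissas exactly.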

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12067495B2 cover?
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system …
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 20, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).