Automated methods for conversions to a lower precision data format
US-2018211152-A1 · Jul 26, 2018 · US
US12443848B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12443848-B2 |
| Application number | US-201816237197-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2018 |
| Priority date | Dec 31, 2018 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Abstract
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a narrower numerical precision than the first block floating-point format. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
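The abstract's core step narrows forward-pass activations from one block floating-point format to a lower-precision one, stores the narrow copy, and widens it again for back propagation. The Python sketch below illustrates that step. It is only a minimal sketch, and it rests on assumptions the abstract does not state. Each block shares the smallest exponent that covers its largest element. Mantissas are signed integers of fixed width. Narrowing drops low-order mantissa bits with round-to-nearest and keeps the exponent unchanged. The names `BFPBlock`, `quantize`, `compress`, `decompress`, and `stash`, and the 8-bit and 4-bit widths, are hypothetical illustrations, not the patent's implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class BFPBlock:
    """Hypothetical block floating-point block: one shared exponent, signed integer mantissas.

    Each element decodes as mantissa * 2**(exponent - (mantissa_bits - 1)).
    """
    exponent: int
    mantissas: List[int]
    mantissa_bits: int  # includes the sign bit


def _limit(bits: int) -> int:
    # Largest magnitude a signed mantissa of `bits` bits may hold (symmetric range).
    return (1 << (bits - 1)) - 1


def quantize(values: List[float], mantissa_bits: int) -> BFPBlock:
    """Pick the smallest shared exponent e with |v| < 2**e for all v, then round each mantissa."""
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    exponent = max(exps) if exps else 0
    scale_exp = exponent - (mantissa_bits - 1)
    lim = _limit(mantissa_bits)
    # Round to nearest (ties to even, via Python's round) and clamp overflow:
    # e.g. 0.999 alone in an 8-bit block rounds to 128 and is clamped to 127.
    mantissas = [max(-lim, min(lim, round(math.ldexp(v, -scale_exp)))) for v in values]
    return BFPBlock(exponent, mantissas, mantissa_bits)


def dequantize(block: BFPBlock) -> List[float]:
    scale_exp = block.exponent - (block.mantissa_bits - 1)
    return [math.ldexp(m, scale_exp) for m in block.mantissas]


def compress(block: BFPBlock, narrow_bits: int) -> BFPBlock:
    """Drop low-order mantissa bits, keeping the shared exponent (the narrower second format)."""
    shift = block.mantissa_bits - narrow_bits
    lim = _limit(narrow_bits)
    mantissas = [max(-lim, min(lim, round(m / (1 << shift)))) for m in block.mantissas]
    return BFPBlock(block.exponent, mantissas, narrow_bits)


def decompress(block: BFPBlock, wide_bits: int) -> BFPBlock:
    """Widen back to the first format; the discarded low-order bits come back as zeros."""
    shift = wide_bits - block.mantissa_bits
    return BFPBlock(block.exponent, [m << shift for m in block.mantissas], wide_bits)


# Forward pass: quantize a layer's activations, keep the wide copy for the next layer,
# and stash only the narrow copy for backpropagation.
activations = [0.75, -0.3, 0.1]
wide = quantize(activations, mantissa_bits=8)
stash = {"layer0": compress(wide, narrow_bits=4)}

# Backward pass: retrieve and widen the stored activations.
restored = decompress(stash["layer0"], wide_bits=8)
print(dequantize(restored))  # [0.75, -0.25, 0.125]
```

`decompress` only shifts the mantissas back up, so the restored values keep exactly the precision of the narrow copy. The forward pass itself continues with `wide`. The sketch does not show narrowing the exponent field as well.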
Claims (opening text, preview)
What is claimed is:

1. A computing system comprising: one or more hardware processors; at least one memory coupled to the one or more hardware processors; and one or more computer-readable storage media storing computer-executable instructions, or hardware comprising logic implementing the computer-executable instructions, that, when executed by the computing system, cause the computing system to perform operations comprising: for each of multiple layers of a neural network comprising a plurality of layers: performing forward propagation for a given layer of the multiple layers of the neural network using a set of input data to produce activation values in a first quantized block floating-point format for the given layer, the first quantized block floating-point format having a first numerical precision; converting at least one of the produced activation values in the first quantized block floating-point format for the given layer to a second, different, quantized block floating-point format to produce compressed activation values for the given layer, the second quantized block floating-point format having a second numerical precision less than the first numerical precision; storing the compressed activation values to provide stored compressed activation values for the given layer; and for layers of the multiple layers that are not a final layer of the neural network, forward propagating the activation values in the first quantized block-floating point format for the given layer to a next layer of the multiple layers; calculating a measure of loss for an initial set of input data provided to the neural network based on results of forward propagation through the multiple layers using the activation values in the first quantized block floating-point format provided by an output layer of the neural network, wherein the output layer receives a set of input values of a prior layer of the multiple layers in the first quantized block floating-point format; retrieving the stored compressed activation values for the multiple layers; decompressing the stored compressed activation values for given layers of the multiple layers from the second quantized block floating-point format to the first block floating point format to provide decompressed activation values for the given layers of the multiple layers; and performing backpropagation for the multiple layers of the neural network using the measure of loss and respective decompressed activation values for respective given layers of the multiple layers.

2. The computing system of claim 1, wherein the second quantized block floating-point format has a lower-precision mantissa and/or a lower-precision exponent than the first quantized block floating-point format.

3. The computing system of claim 1, the operations further comprising: converting the activation values in the first quantized block floating-point format to a normal-precision format, producing converted normal-precision values; and converting the converted normal-precision values to the second quantized block floating-point format.

4. The computing system of claim 1, wherein the second quantized block floating-point format has a different sharing format of a common exponent than the first quantized block floating-point format, the different sharing format being different based on per-row, per-column, or per-tile sharing of a common exponent for the compressed activation values.

5. The computing system of claim 1, the operations further comprising: further compressing the compressed activation values prior to the storing by performing at least one or more of: entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.

6. The computing system of claim 1, the operations further comprising: during the backpropagation, performing a gradient operation with the decompressed activation values for the respective given layers.

7. The computing system of claim 1, the operations further comprising: performing forward propagation for a layer of the plurality of layers, the layer of the plurality of layers not being a layer of the multiple layers, to provide further activation values in the first quantized block floating-point format; converting the further activation values to a third quantized block floating-point format, the third quantized block floating-point format having a numerical precision lower than the second numerical precision; and storing the further activation values in the third quantized block floating-point format for use during the backpropagation.

8. The computing system of claim 1, wherein: the one or more hardware processors comprise a first processor comprising at least one of: a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the at least one memory comprises bulk memory, the bulk memory being situated on a different integrated circuit than the first processor.

9. The computing system of claim 8, wherein the bulk memory includes dynamic random access memory (DRAM) or embedded DRAM and the computing system further comprises a hardware accelerator including a memory temporarily storing the activation values for at least a portion of only one layer of the multiple layers of the neural network, the at least one memory comprising hardware accelerator memory, the hardware accelerator memory comprising static RAM (SRAM) or a register file.

10. A method, implemented in a computing environment comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, the method comprising: for each of multiple layers of a neural network comprising a plurality of layers: performing forward propagation for a given layer of the multiple layers of the neural network using a set of input data to produce activation values in a first quantized block floating-point format for the given layer, the first quantized block floating-point format having a first numerical precision; converting at least one of the produced activation values in the first quantized block floating-point format for the given layer to a second, different, quantized block floating-point format to produce compressed activation values for the given layer, the second quantized block floating-point format having a second numerical precision less than the first numerical precision; storing the compressed activation values to provide stored compressed activation values for the given layer; and for layers of the multiple layers that are not a final layer of the neural network, forward propagating the activation values in the first quantized block-floating point format for the given layer to a next layer of the multiple layers; calculating a measure of loss for an initial set of input data provided to the neural network based on results of forward propagation through the multiple layers using the activation values in the first quantized block floating-point format provided by an output layer of the neural network, wherein the output layer receives a set of input values of a prior layer of the multiple layers in the first quantized block floating-point format; retrieving the stored compressed activation values for the multiple layers; decompressing the stored compressed activation values for given layers of the multiple layers from the second quantized block floating-point format to the first quantized block floating point format to provide decompressed activation values for the given layers of the multiple layers; and performing backpropagation for the multiple layers of the neural network using the measure of los…
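Two of the claims add detail to the basic scheme. Claim 4 varies how the common exponent is shared (per row, per column, or per tile). Claim 5 adds further compression before the values are stored. The Python sketch below illustrates both. It is only a sketch, and it rests on assumptions the claims do not state. Each group takes the smallest exponent covering its largest element, an all-zero group gets exponent 0, and mantissas are 4 bits wide. The 2x2 tile size, the sample matrix, and every function name are hypothetical. Run-length encoding is just one of the options listed in claim 5. It pays off here because activations with many zeros, such as those after a ReLU, form long runs of identical mantissas.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def shared_exponent(values: List[float]) -> int:
    """Smallest e with |v| < 2**e for every v; 0 for an all-zero group (assumed convention)."""
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return max(exps) if exps else 0


def quantize_group(values: List[float], exponent: int, mantissa_bits: int) -> List[int]:
    """Signed, clamped integer mantissas for one group that shares `exponent`."""
    limit = (1 << (mantissa_bits - 1)) - 1
    scale_exp = exponent - (mantissa_bits - 1)
    return [max(-limit, min(limit, round(math.ldexp(v, -scale_exp)))) for v in values]


def per_row_exponents(m: Matrix) -> List[int]:
    return [shared_exponent(row) for row in m]


def per_column_exponents(m: Matrix) -> List[int]:
    return [shared_exponent(list(col)) for col in zip(*m)]


def per_tile_exponents(m: Matrix, tile: int) -> Dict[Tuple[int, int], int]:
    """One exponent per tile x tile sub-block, keyed by (tile_row, tile_col)."""
    rows, cols = len(m), len(m[0])
    exps = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            block = [m[r][c]
                     for r in range(r0, min(r0 + tile, rows))
                     for c in range(c0, min(c0 + tile, cols))]
            exps[(r0 // tile, c0 // tile)] = shared_exponent(block)
    return exps


def run_length_encode(mantissas: List[int]) -> List[Tuple[int, int]]:
    """Lossless: collapse runs of equal mantissas into (value, count) pairs."""
    pairs: List[Tuple[int, int]] = []
    for v in mantissas:
        if pairs and pairs[-1][0] == v:
            pairs[-1] = (v, pairs[-1][1] + 1)
        else:
            pairs.append((v, 1))
    return pairs


def run_length_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    return [v for v, count in pairs for _ in range(count)]


# Hypothetical sparse activations, e.g. after a ReLU.
acts = [[0.0, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [4.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]

row_exps = per_row_exponents(acts)    # [0, 0, 3, 0]
print(per_column_exponents(acts))     # [3, 0, 0, 0]
print(per_tile_exponents(acts, 2))    # {(0, 0): 0, (0, 1): 0, (1, 0): 3, (1, 1): 0}

flat = [q for row, e in zip(acts, row_exps) for q in quantize_group(row, e, mantissa_bits=4)]
encoded = run_length_encode(flat)     # 5 pairs instead of 16 mantissas
assert run_length_decode(encoded) == flat
```

The sharing granularity matters here. With a single exponent for the whole matrix (e = 3), the 0.5 in row 0 would round to mantissa 0 at 4 bits. Per-row sharing keeps it as mantissa 4, at the cost of storing one exponent per row.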
CPC titles:
- Forward inferencing; Production systems
- the resource being a machine, e.g. CPUs, Servers, Terminals
- Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
- Mantissa overflow or underflow in handling floating-point numbers
- Machine learning