US12067495B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12067495-B2 |
| Application number | US-202318092876-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 3, 2023 |
| Priority date | Jan 24, 2019 |
| Publication date | Aug 20, 2024 |
| Grant date | Aug 20, 2024 |
Official abstract text for this publication.
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
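The conversion the abstract describes, from a first block floating-point format to a second one with a narrower mantissa, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the choice of one shared exponent per block derived from the largest magnitude, signed integer mantissas, and truncation toward zero as the lossy step.

```python
import math

def to_block_fp(values, mantissa_bits):
    """Quantize a block of floats to signed integer mantissas with one shared exponent.

    Assumption: the shared exponent comes from the largest magnitude in the block.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1,
    # so every value in the block has magnitude below 2**e.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - shared_exp)
    limit = (1 << mantissa_bits) - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def from_block_fp(shared_exp, mantissas, mantissa_bits):
    """Reconstruct floats from a shared exponent and integer mantissas."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]

def compress(values, first_bits, second_bits):
    """Narrow mantissas from first_bits to second_bits (lossy, truncating toward zero)."""
    shared_exp, wide = to_block_fp(values, first_bits)
    shift = first_bits - second_bits
    narrow = [(m >> shift) if m >= 0 else -((-m) >> shift) for m in wide]
    return shared_exp, narrow

# Hypothetical activations from one layer's forward pass.
activations = [0.75, -0.5, 0.1, 0.0]
exp, packed = compress(activations, first_bits=8, second_bits=4)
restored = from_block_fp(exp, packed, mantissa_bits=4)
```

With these inputs the 4-bit mantissas reproduce 0.75 and -0.5 exactly, while 0.1 comes back as 0.0625, which shows the precision traded away for the smaller stored footprint.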
Opening claim text (preview).
What is claimed is:
1. A computing system comprising: one or more processors; at least one memory coupled to the one or more processors; and one or more computer-readable storage media storing computer-executable instructions that, when executed, cause the computing system to perform operations comprising: performing forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floating-point format; converting at least one of the activation values to a second floating-point format to produce compressed activation values by representing activation value mantissas using a second number of bits, the second number of bits being less than the first number of bits; and storing the compressed activation values in the at least one memory.
2. The computing system of claim 1, wherein values representable by the first number of bits are evenly distributed in the second floating-point format using the second number of bits.
3. The computing system of claim 1, wherein values representable by the first number of bits are unevenly distributed in the second floating-point format using the second number of bits.
4. The computing system of claim 3, wherein an uneven distribution scheme is defined by specifying discrete sets of mantissa values representable by the first number of bits to be represented by a bit value of the second number of bits.
5. The computing system of claim 1, wherein the converting comprises: mapping two or more mantissa values in the first floating-point format to a single mantissa value in the second floating-point format.
6. The computing system of claim 1, wherein the converting comprises: mapping a first set of one or more mantissa values in the first floating-point to a single mantissa value in the second floating-point format; and mapping a second set of a plurality of mantissa values in the first floating-point format to a single mantissa value in the second floating-point format, wherein the second set has a greater number of elements than the first set.
7. The computing system of claim 1, the operations further comprising: further compressing the compressed activation values prior to the storing by performing one or more of entropy compression, zero compression, run length encoding, compressed sparse row compression, or compressed sparse column compression.
8. The computing system of claim 1, the operations further comprising: performing backward propagation for a layer of the neural network by converting the stored, compressed activation values to activation values in the first floating-point format thereby producing uncompressed activation values; and perform a gradient operation with the uncompressed activation values.
9. The computing system of claim 8, wherein the converting the stored, compressed activation values to activation values in the first floating-point format comprises deterministically selecting a dequantized value for a quantized mantissa value of a compressed activation value of the stored, compressed activation values.
10. The computing system of claim 8, wherein the converting the stored, compressed activation values to activation values in the first floating-point format comprises randomly selecting a dequantized value for a quantized mantissa value of a compressed activation value of the stored, compressed activation values.
11. The computing system of claim 10, wherein the randomly selecting is performed according to a uniform distribution or probability distribution.
12. The computing system of claim 1, wherein the layer is a first layer, the operations further comprising: performing forward propagation for a different, second layer of the neural network to produce second activation values in the first floating-point format; for at least one of the second activation values, converting the at least one of the second activation values to a third floating-point format to produce second compressed activation values, the third floating-point format representing activation value mantissas using a third number of bits, the third number of bits being different than the first number of bits and the second number of bits; and storing the second compressed activation values in the at least one memory.
13. The computing system of claim 12, wherein the third floating-point format is selected as a compression method based at least in part on an aspect of the second layer.
14. The computing system of claim 12, wherein the second activation values are produced using first activation values propagated to the second layer from the first layer.
15. The computing system of claim 1, wherein the one or more processors comprise at least one of a tensor processing unit, a neural network accelerator, a graphics processing unit, or a processor implemented in a reconfigurable logic array; and the at least one memory is situated on a different integrated circuit than the processors.
16. The computing system of claim 1, wherein the at least one memory comprises dynamic random access memory (DRAM) or embedded DRAM and the computing system further comprises a hardware accelerator including a memory temporarily storing the first activation values for at least a portion of only one layer of the neural network, the hardware accelerator memory comprising static RAM (SRAM) or a register file.
17. The computing system of claim 1, wherein the first floating-point format has a higher-precision exponent than the second floating-point format.
18. The computing system of claim 1, wherein the first floating-point format uses a first technique to determine exponent sharing between at least a portion of the activation values and the second floating-point format uses a second technique, different than the first technique, to determine exponent sharing between at least a portion of the activation values as represented in the second floating-point format.
19. A method, implemented in a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, the method comprising: performing forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floating-point format; converting at least one of the activation values to a second floating-point format to produce compressed activation values by representing activation value mantissas using a second number of bits, the second number of bits being less than the first number of bits; and storing the compressed activation values in the at least one memory.
20. One or more computer-readable storage media comprising: computer-executable instructions that, when executed by a computing system comprising at least one hardware processor and at least one memory coupled to the at least one hardware processor, cause the computing system to perform forward propagation for a layer of a neural network to produce first activation values in a first floating-point format, the first floating-point format using a first number of bits to represent values of a mantissa of respective activation values of the first activation values in the first floati
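The uneven mantissa mapping in claims 3 to 6 and the two dequantization styles in claims 9 to 11 can be sketched as follows. The bucket table, the 3-bit to 2-bit widths, and all function names are hypothetical choices made for illustration; the claims do not specify any particular mapping or representative value.

```python
import random

# Hypothetical non-uniform scheme: 3-bit source mantissas (0..7) map to 2-bit codes.
# Each code stands for a discrete set of source mantissas, and the sets differ in size.
BUCKETS = {
    0: [0],        # one source value
    1: [1, 2],
    2: [3, 4],
    3: [5, 6, 7],  # three source values, a larger set than code 0's
}
ENCODE = {m: code for code, members in BUCKETS.items() for m in members}

def quantize(mantissa):
    """Map a source mantissa to its narrower code (many-to-one, hence lossy)."""
    return ENCODE[mantissa]

def dequantize_deterministic(code):
    """Always return the same representative: here, the bucket's middle element."""
    members = BUCKETS[code]
    return members[len(members) // 2]

def dequantize_random(code, rng):
    """Pick uniformly among the source mantissas that share this code."""
    return rng.choice(BUCKETS[code])

rng = random.Random(0)  # seeded only so that repeated runs behave the same
code = quantize(6)
fixed = dequantize_deterministic(code)
sampled = dequantize_random(code, rng)
```

In this sketch, random selection always returns a mantissa from the same bucket as the original, so requantizing the sampled value yields the same code. A weighted draw (for example `rng.choices` with per-member weights) would be one way to model the non-uniform probability distribution mentioned in claim 11.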
Activation functions · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Software · CPC title
Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title
Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression · CPC title