ISA enhancements for accelerated deep learning
US-2021255860-A1 · Aug 19, 2021 · US
US11449574B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11449574-B2 |
| Application number | US-201816603789-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 13, 2018 |
| Priority date | Apr 14, 2017 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.
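To illustrate why the rounding mode matters over a long dependency chain, here is a minimal software sketch, not the patent's hardware. It assumes a toy number format whose representable values are multiples of 0.125, and the names `round_nearest`, `round_stochastic`, and `accumulate` are hypothetical. Under round-to-nearest every small increment is discarded, so the running sum never moves; under stochastic rounding the running sum tracks the exact sum on average.

```python
import math
import random

def round_nearest(x, step):
    """Deterministic round-to-nearest onto a grid with spacing `step`."""
    return math.floor(x / step + 0.5) * step

def round_stochastic(x, step, rng):
    """Round up with probability equal to the fraction that would be discarded."""
    scaled = x / step
    low = math.floor(scaled)
    frac = scaled - low  # distance above the lower grid point, in [0, 1)
    return (low + (1 if rng.random() < frac else 0)) * step

def accumulate(increment, count, step, rounder):
    """Long dependency chain: each rounded sum feeds the next addition."""
    acc = 0.0
    for _ in range(count):
        acc = rounder(acc + increment, step)
    return acc

STEP = 0.125              # assumed grid spacing (illustrative only)
rng = random.Random(0)    # seeded pseudo-random source for repeatability

nearest_total = accumulate(0.01, 10_000, STEP, round_nearest)
stochastic_total = accumulate(
    0.01, 10_000, STEP, lambda x, s: round_stochastic(x, s, rng)
)
print(nearest_total, stochastic_total)
```

With these assumed parameters the exact sum is 100. The deterministic accumulator stays at 0 because each increment is below half a grid step, whereas the stochastic accumulator lands near 100, with some variation that depends on the seed.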
Opening claim text (preview).
What is claimed is:
1. A method of selective control of systematic biases arising in hardware acceleration of floating-point computations comprising long dependency chains, the method comprising: performing a floating-point operation of the floating-point computations, the performing using multiply-accumulate and normalizer logic to generate a floating-point result comprising a mantissa, the mantissa comprising a portion subject to rounding coupled to a first input of an incrementer and a portion subject to discarding in accordance with the rounding; selecting a pseudo-random number source from a plurality of pseudo-random number sources, the selecting being responsive to a selector control coupled to an input of the plurality of pseudo-random number sources; adding a pseudo-random number derived from the selected pseudo-random number source coupled to a first input of an adder with at least a more significant portion of the mantissa portion subject to discarding coupled to a second input of the adder; conditionally incrementing by the incrementer the mantissa portion subject to rounding, based at least on a carry out of the adding coupled to a second input of the incrementer, the conditionally incrementing resulting in a stochastically rounded mantissa output from the incrementer and ready for providing to a selected destination and use in at least a subsequent dependent one of the floating-point computations; and wherein the multiply-accumulate and normalizer logic, the pseudo-random number sources, the adder, the incrementer, and the selected destination are implemented in hardware and each of the coupled inputs are physically coupled hardware inputs.
2. The method of claim 1, wherein the floating-point result further comprises an exponent, and further comprising conditionally adjusting the exponent in response to the conditionally incrementing.
3. The method of claim 2, wherein the floating-point operation is a first floating-point operation, the floating-point result is a first floating-point result, and further comprising performing a second floating-point operation that generates an optionally rounded second floating-point result to use as an input operand to the first floating-point operation.
4. The method of claim 3, wherein the optionally rounding comprises deterministically rounding.
5. The method of claim 1, wherein at least one of the pseudo-random number sources is enabled to generate a first pseudo-random number in response to an instruction.
6. The method of claim 5, wherein the instruction specifies an operation and the first pseudo-random number is an input operand to the operation.
7. The method of claim 1, wherein with respect to the mantissa, the mantissa portion subject to rounding is contiguous with and more significant than the mantissa portion subject to discarding, the mantissa portion subject to discarding comprises mantissa bits subject to adding and zero or more mantissa bits subject to ignoring, the mantissa portion subject to rounding is contiguous with and more significant than the mantissa bits subject to adding, with respect to the mantissa portion subject to discarding, the mantissa bits subject to adding are contiguous with and more significant than the mantissa bits subject to ignoring, and the adding comprises adding the derived pseudo-random number with the mantissa bits subject to adding.
8. The method of claim 7, wherein the mantissa bits subject to adding is three bits in length.
9. The method of claim 2, further comprising writing a value to storage, the value comprising the conditionally incremented mantissa portion subject to rounding, the conditionally adjusted exponent, and a sign bit.
10. The method of claim 1, wherein a result of the adding is an unbiased estimate of the at least a more significant portion of the mantissa portion subject to discarding.
11. The method of claim 1, wherein the performing, the selecting, the adding, and the conditionally incrementing comprise portions of training a neural network, and further comprising initializing with a seed value the selected pseudo-random number source prior to the training the neural network.
12. The method of claim 1, wherein a first processing element of a plurality of processing elements coupled by a fabric carries out the performing, the selecting, the adding, and the conditionally incrementing.
13. The method of claim 12, further comprising a second processing element of the plurality of processing elements providing an input for the floating-point operation via the fabric.
14. The method of claim 13, wherein the plurality of processing elements and the fabric are implemented via wafer-scale integration.
15. A system of selective control of systematic biases arising in hardware acceleration of floating-point computations comprising long dependency chains, the system comprising: means for performing a floating-point operation of the floating-point computations to generate a floating-point result comprising a mantissa, the mantissa comprising a portion subject to rounding and a portion subject to discarding in accordance with the rounding; means for selecting a pseudo-random number source from a plurality of pseudo-random number sources, the means for selecting being responsive to a selector control input; means for adding a pseudo-random number derived from the selected pseudo-random number source coupled to a first input of the means for adding with at least a more significant portion of the mantissa portion subject to discarding coupled to a second input of the means for adding; means for conditionally incrementing the mantissa portion subject to rounding coupled to a first input of the means for conditionally incrementing, based at least on a carry out of the means for adding coupled to a second input of the means for conditionally incrementing, a resulting stochastically rounded mantissa being output from the means for conditionally incrementing and ready for providing to a selected destination and use in at least a subsequent dependent one of the floating-point computations; and wherein the means for performing, the means for selecting, the means for adding, and the means for conditionally incrementing are implemented in hardware and each of the coupled inputs are physically coupled hardware inputs.
16. The system of claim 15, wherein the floating-point result further comprises an exponent, and further comprising means for conditionally adjusting the exponent in response to the means for conditionally incrementing.
17. The system of claim 16, wherein the floating-point operation is a first floating-point operation, the floating-point result is a first floating-point result, and further comprising means for performing a second floating-point operation that generates an optionally rounded second floating-point result to use as an input operand to the first floating-point operation.
18. The system of claim 17, wherein the means for performing the second floating-point operation comprises means for deterministically rounding.
19. The system of claim 15, wherein at least one of the pseudo-random number sources is enabled to generate a first pseudo-random number in response to an instruction.
20. The system of claim 19, wherein the instruction specifies an operation and the first pseudo-random number is an input operand to the operation.
21. The system of claim 15, wherein with respect to the mantissa, the mantissa portion subject to rounding is contiguous with and more significa
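The rounding datapath recited in claims 1, 2, 7, and 8 can be modeled in a few lines of software. The sketch below is an illustration under stated assumptions, not the claimed hardware: the 8-bit mantissa, the 4-bit kept portion, the use of Python's `random.Random` objects as stand-ins for the plurality of pseudo-random number sources, and every function and constant name are hypothetical. Only the three-bit adding field comes from the claim text (claim 8).

```python
import random

TOTAL_BITS = 8    # assumed width of the normalized mantissa (illustrative)
KEEP_BITS = 4     # assumed width of the portion subject to rounding
ADD_BITS = 3      # mantissa bits subject to adding (three bits, per claim 8)
DISCARD_BITS = TOTAL_BITS - KEEP_BITS    # portion subject to discarding
IGNORE_BITS = DISCARD_BITS - ADD_BITS    # least significant bits, ignored

def stochastic_round(mantissa, exponent, rand):
    """Model of the adder + incrementer path; `rand` is an ADD_BITS-wide integer."""
    kept = mantissa >> DISCARD_BITS                   # portion subject to rounding
    discarded = mantissa & ((1 << DISCARD_BITS) - 1)  # portion subject to discarding
    add_field = discarded >> IGNORE_BITS              # more significant discarded bits
    carry = (add_field + rand) >> ADD_BITS            # carry out of the adder
    kept += carry                                     # conditional increment
    if kept >> KEEP_BITS:                             # mantissa overflowed its width
        kept >>= 1                                    # renormalize the mantissa
        exponent += 1                                 # conditional exponent adjustment
    return kept, exponent

def draw(sources, selector):
    """Pick one of several pseudo-random sources via a selector control."""
    return sources[selector].getrandbits(ADD_BITS)

# Two seeded sources stand in for the plurality of pseudo-random number sources.
sources = [random.Random(1), random.Random(2)]
rounded = stochastic_round(0b10100110, 0, draw(sources, selector=0))
print(rounded)
```

For the mantissa `0b10100110`, the adding field is `0b011`, so three of the eight possible 3-bit random values (5, 6, and 7) produce a carry, and the kept portion `0b1010` rounds up to `0b1011` with probability 3/8. That matches the discarded fraction `0b0110 / 16 = 0.375`. When the kept portion is all ones and a carry arrives, as with `stochastic_round(0b11111000, 3, 4)`, the sketch shifts the mantissa right and increments the exponent.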
using electronic means · CPC title
Rounding · CPC title
Mantissa overflow or underflow in handling floating-point numbers · CPC title
Implementation of IEEE-754 Standard · CPC title
with variable precision · CPC title