Floating-point unit stochastic rounding for accelerated deep learning

US11449574B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11449574-B2
Application number  US-201816603789-A
Country             US
Kind code           B2
Filing date         Apr 13, 2018
Priority date       Apr 14, 2017
Publication date    Sep 20, 2022
Grant date          Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.
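The systematic bias mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it models a low-precision accumulator as a fixed grid of representable values and repeatedly adds an increment smaller than half a grid step. The function names, the grid spacing, and the increment are all assumptions chosen for illustration. With round-to-nearest every addition is rounded away, so the sum never moves; with stochastic rounding each addition rounds up with probability equal to the discarded fraction, so the sum is correct on average.

```python
import math
import random


def round_nearest(x, step):
    """Deterministically round x to the nearest multiple of `step`."""
    return round(x / step) * step


def round_stochastic(x, step, rng):
    """Round x to a neighbouring multiple of `step`.

    Rounds up with probability equal to the fractional distance of x
    above the lower grid point, so the expected result equals x.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


def accumulate(increment, count, step, rounder):
    """Add `increment` to a running total `count` times, rounding each time."""
    total = 0.0
    for _ in range(count):
        total = rounder(total + increment, step)
    return total


# Illustrative values (assumptions): powers of two keep the float math exact.
rng = random.Random(0)
step = 2.0 ** -6        # spacing of the representable grid
increment = 2.0 ** -9   # one eighth of a grid step
count = 10_000

exact = increment * count
nearest = accumulate(increment, count, step, round_nearest)
stochastic = accumulate(
    increment, count, step, lambda x, s: round_stochastic(x, s, rng)
)

print(f"exact sum:        {exact}")
print(f"round-to-nearest: {nearest}")
print(f"stochastic:       {stochastic}")
```

In this setup the round-to-nearest total stays at zero because each partial sum lies one eighth of a step above a grid point, while the stochastic total lands close to the exact sum. The exact stochastic value depends on the seed.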

First claim

Opening claim text (preview).

What is claimed is:

1. A method of selective control of systematic biases arising in hardware acceleration of floating-point computations comprising long dependency chains, the method comprising: performing a floating-point operation of the floating-point computations, the performing using multiply-accumulate and normalizer logic to generate a floating-point result comprising a mantissa, the mantissa comprising a portion subject to rounding coupled to a first input of an incrementer and a portion subject to discarding in accordance with the rounding; selecting a pseudo-random number source from a plurality of pseudo-random number sources, the selecting being responsive to a selector control coupled to an input of the plurality of pseudo-random number sources; adding a pseudo-random number derived from the selected pseudo-random number source coupled to a first input of an adder with at least a more significant portion of the mantissa portion subject to discarding coupled to a second input of the adder; conditionally incrementing by the incrementer the mantissa portion subject to rounding, based at least on a carry out of the adding coupled to a second input of the incrementer, the conditionally incrementing resulting in a stochastically rounded mantissa output from the incrementer and ready for providing to a selected destination and use in at least a subsequent dependent one of the floating-point computations; and wherein the multiply-accumulate and normalizer logic, the pseudo-random number sources, the adder, the incrementer, and the selected destination are implemented in hardware and each of the coupled inputs are physically coupled hardware inputs.

2. The method of claim 1, wherein the floating-point result further comprises an exponent, and further comprising conditionally adjusting the exponent in response to the conditionally incrementing.

3. The method of claim 2, wherein the floating-point operation is a first floating-point operation, the floating-point result is a first floating-point result, and further comprising performing a second floating-point operation that generates an optionally rounded second floating-point result to use as an input operand to the first floating-point operation.

4. The method of claim 3, wherein the optionally rounding comprises deterministically rounding.

5. The method of claim 1, wherein at least one of the pseudo-random number sources is enabled to generate a first pseudo-random number in response to an instruction.

6. The method of claim 5, wherein the instruction specifies an operation and the first pseudo-random number is an input operand to the operation.

7. The method of claim 1, wherein with respect to the mantissa, the mantissa portion subject to rounding is contiguous with and more significant than the mantissa portion subject to discarding, the mantissa portion subject to discarding comprises mantissa bits subject to adding and zero or more mantissa bits subject to ignoring, the mantissa portion subject to rounding is contiguous with and more significant than the mantissa bits subject to adding, with respect to the mantissa portion subject to discarding, the mantissa bits subject to adding are contiguous with and more significant than the mantissa bits subject to ignoring, and the adding comprises adding the derived pseudo-random number with the mantissa bits subject to adding.

8. The method of claim 7, wherein the mantissa bits subject to adding is three bits in length.

9. The method of claim 2, further comprising writing a value to storage, the value comprising the conditionally incremented mantissa portion subject to rounding, the conditionally adjusted exponent, and a sign bit.

10. The method of claim 1, wherein a result of the adding is an unbiased estimate of the at least a more significant portion of the mantissa portion subject to discarding.

11. The method of claim 1, wherein the performing, the selecting, the adding, and the conditionally incrementing comprise portions of training a neural network, and further comprising initializing with a seed value the selected pseudo-random number source prior to the training the neural network.

12. The method of claim 1, wherein a first processing element of a plurality of processing elements coupled by a fabric carries out the performing, the selecting, the adding, and the conditionally incrementing.

13. The method of claim 12, further comprising a second processing element of the plurality of processing elements providing an input for the floating-point operation via the fabric.

14. The method of claim 13, wherein the plurality of processing elements and the fabric are implemented via wafer-scale integration.

15. A system of selective control of systematic biases arising in hardware acceleration of floating-point computations comprising long dependency chains, the system comprising: means for performing a floating-point operation of the floating-point computations to generate a floating-point result comprising a mantissa, the mantissa comprising a portion subject to rounding and a portion subject to discarding in accordance with the rounding; means for selecting a pseudo-random number source from a plurality of pseudo-random number sources, the means for selecting being responsive to a selector control input; means for adding a pseudo-random number derived from the selected pseudo-random number source coupled to a first input of the means for adding with at least a more significant portion of the mantissa portion subject to discarding coupled to a second input of the means for adding; means for conditionally incrementing the mantissa portion subject to rounding coupled to a first input of the means for conditionally incrementing, based at least on a carry out of the means for adding coupled to a second input of the means for conditionally incrementing, a resulting stochastically rounded mantissa being output from the means for conditionally incrementing and ready for providing to a selected destination and use in at least a subsequent dependent one of the floating-point computations; and wherein the means for performing, the means for selecting, the means for adding, and the means for conditionally incrementing are implemented in hardware and each of the coupled inputs are physically coupled hardware inputs.

16. The system of claim 15, wherein the floating-point result further comprises an exponent, and further comprising means for conditionally adjusting the exponent in response to the means for conditionally incrementing.

17. The system of claim 16, wherein the floating-point operation is a first floating-point operation, the floating-point result is a first floating-point result, and further comprising means for performing a second floating-point operation that generates an optionally rounded second floating-point result to use as an input operand to the first floating-point operation.

18. The system of claim 17, wherein the means for performing the second floating-point operation comprises means for deterministically rounding.

19. The system of claim 15, wherein at least one of the pseudo-random number sources is enabled to generate a first pseudo-random number in response to an instruction.

20. The system of claim 19, wherein the instruction specifies an operation and the first pseudo-random number is an input operand to the operation.

21. The system of claim 15, wherein with respect to the mantissa, the mantissa portion subject to rounding is contiguous with and more significa
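The datapath recited in claims 1, 2, 7, and 8 can be modelled in a few lines of integer arithmetic. What follows is a software sketch under stated assumptions, not the claimed hardware: the function names, the bit widths in the example, and the use of Python's `random.Random` objects as stand-ins for the plurality of pseudo-random number sources are all hypothetical. The sketch splits a normalized mantissa into a kept portion and a discarded portion, adds a pseudo-random number to the most significant discarded bits, uses the adder's carry out to conditionally increment the kept portion, and adjusts the exponent if the increment overflows the mantissa.

```python
import random


def draw_prn(sources, selector, add_bits):
    """Pick one pseudo-random source by index and draw an add_bits-wide value.

    `sources` is a hypothetical list of random.Random objects standing in
    for hardware pseudo-random number sources; `selector` models the
    selector control.
    """
    return sources[selector].getrandbits(add_bits)


def stochastic_round_mantissa(mantissa, exponent, total_bits,
                              keep_bits, add_bits, prn):
    """Stochastically round a total_bits-wide mantissa down to keep_bits.

    mantissa  : unsigned integer holding the normalized result
    keep_bits : width of the portion subject to rounding
    add_bits  : how many of the most significant discarded bits feed the adder
    prn       : pseudo-random integer in [0, 2**add_bits)
    Returns (rounded_mantissa, adjusted_exponent).
    """
    discard_bits = total_bits - keep_bits
    ignore_bits = discard_bits - add_bits  # low-order bits that are ignored

    kept = mantissa >> discard_bits
    discarded = mantissa & ((1 << discard_bits) - 1)
    add_field = discarded >> ignore_bits   # bits subject to adding

    # Adder: only the carry out matters.
    carry = (add_field + prn) >> add_bits

    # Incrementer: conditionally increment the kept portion.
    kept += carry

    # Mantissa overflow: renormalize and adjust the exponent.
    if kept >> keep_bits:
        kept >>= 1
        exponent += 1
    return kept, exponent


# Example: 4 kept bits, 5 discarded bits, of which the top 3 are added.
# Mantissa 1011|101|10 has kept = 0b1011 and add_field = 0b101 (5).
sources = [random.Random(1), random.Random(2)]  # hypothetical sources
prn = draw_prn(sources, selector=0, add_bits=3)
print(stochastic_round_mantissa(0b101110110, 0, 9, 4, 3, prn))

# Over all eight possible 3-bit random values, the kept portion is
# incremented exactly add_field times, i.e. with probability 5/8.
ups = sum(
    stochastic_round_mantissa(0b101110110, 0, 9, 4, 3, p)[0] == 0b1100
    for p in range(8)
)
print(f"rounded up for {ups} of 8 random values")
```

Because a uniformly distributed 3-bit random value produces a carry in exactly `add_field` of the eight cases, the probability of rounding up equals the value of the added bits divided by eight, which is the sense in which the result is unbiased with respect to those bits. Any ignored low-order bits do not influence the outcome in this sketch.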

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Rounding · CPC title

  • Mantissa overflow or underflow in handling floating-point numbers · CPC title

  • Implementation of IEEE-754 Standard · CPC title

  • with variable precision · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11449574B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floatin…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 20, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).