Numerical representation for neural networks

US11062202B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11062202-B2
Application number: US-201917262717-A
Country: US
Kind code: B2
Filing date: Jul 17, 2019
Priority date: Jul 25, 2018
Publication date: Jul 13, 2021
Grant date: Jul 13, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to optionally and/or selectively perform floating-point operations in accordance with a programmable exponent bias and/or various floating-point computation variations. In some circumstances, the programmable exponent bias and/or the floating-point computation variations enable neural network processing with improved accuracy, decreased training time, decreased inference latency, and/or increased energy efficiency.
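To illustrate what a programmable exponent bias means in practice, here is a minimal Python sketch, not taken from the patent. It decodes a small floating-point bit pattern laid out as sign, biased exponent, mantissa, with the bias supplied by the caller rather than fixed by the format. The function name `decode_with_bias`, its parameters, and the simplification that every exponent code (including all-ones) is an ordinary number are assumptions made for the example.

```python
def decode_with_bias(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode sign | biased exponent | mantissa using a caller-supplied bias.

    Hypothetical helper for illustration only. Assumes the all-ones exponent
    is an ordinary number (no infinity/NaN encoding).
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:
        # Subnormal: no implicit leading one, exponent fixed at (1 - bias).
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal: implicit leading one, true exponent is (exp - bias).
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# The same 16-bit pattern read with two different biases.
pattern = 0x3C00  # exponent field = 15, mantissa = 0
as_bias_15 = decode_with_bias(pattern, exp_bits=5, man_bits=10, bias=15)
as_bias_20 = decode_with_bias(pattern, exp_bits=5, man_bits=10, bias=20)
```

Raising the bias shifts the whole representable range toward smaller magnitudes without changing the bit layout, which is one way a fixed-width format could be matched to the value range of a particular tensor.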

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a programmable processor enabled to execute instructions, the instructions comprising floating-point instructions, and first and second load control resource instructions; a first control resource of the programmable processor enabled to receive an exponent bias responsive to an operation corresponding to the first load control resource instruction; a floating-point unit of the programmable processor enabled to perform floating-point operations corresponding to the floating-point instructions, the performing floating-point operations being in accordance with floating-point operands and floating-point results each comprising a respective biased exponent, the performing floating-point operations being responsive to the exponent bias, and compatible, according to the exponent bias, with interpreting the biased exponents of the floating-point operands and producing the biased exponents of the floating-point results; and wherein the apparatus further comprises a second control resource of the programmable processor enabled to receive an exponent size specifier responsive to an operation corresponding to the second load control resource instruction, the floating-point unit is further enabled to perform the floating-point operations responsive to the exponent size specifier specifying one of a plurality of floating-point formats, and a first of the plurality of floating-point formats comprises a biased exponent of N bits and a mantissa of M bits, and a second of the plurality of floating-point formats comprises a biased exponent of (N+delta) bits and a mantissa of (M-delta) bits, and the performing floating-point operations is further compatible, according to the exponent size specifier, with interpreting the biased exponents of the floating-point operands and producing the biased exponents of the floating-point results.

2. The apparatus of claim 1, wherein the programmable processor is one of a plurality of like programmable processors fabricated on a wafer in accordance with wafer-scale integration, and a datacenter element enabled to perform neural network processing comprises the wafer.

3. The apparatus of claim 1, wherein the programmable processor comprises one or more hardware registers comprising at least one of the first and the second control resources, and at least one of the first and the second load control resource instructions is a load hardware register instruction.

4. The apparatus of claim 1, wherein at least one of the first and the second control resources is memory-mapped, and at least one of the first and the second load control resource instructions is a memory store instruction.

5. The apparatus of claim 1, wherein: the instructions further comprise a third load control resource instruction, the apparatus further comprises a third control resource of the programmable processor enabled to receive a rounding mode specifier responsive to an operation corresponding to the third load control resource instruction, and the floating-point unit is further enabled to perform the floating-point operations in accordance with one of a plurality of rounding modes specified by the rounding mode specifier, and at least one of the rounding modes comprises saturating the floating-point results to a value comprising a maximum biased exponent.

6. The apparatus of claim 1, wherein: the instructions further comprise a third load control resource instruction, the apparatus further comprises a third control resource of the programmable processor enabled to receive a flush-to-zero specifier responsive to an operation corresponding to the third load control resource instruction, and the floating-point unit is further enabled to perform the floating-point operations responsive to the flush-to-zero specifier specifying a flush-to-zero mode, and the flush-to-zero mode comprises flushing subnormal results to zero.

7. An apparatus comprising: a programmable processor enabled to execute instructions, the instructions comprising floating-point instructions, and first and second load control resource instructions; a first control resource of the programmable processor enabled to receive an exponent bias responsive to an operation corresponding to the first load control resource instruction; a floating-point unit of the programmable processor enabled to perform floating-point operations corresponding to the floating-point instructions, the performing floating-point operations being in accordance with floating-point operands and floating-point results each comprising a respective biased exponent, the performing floating-point operations being responsive to the exponent bias, and compatible, according to the exponent bias, with interpreting the biased exponents of the floating-point operands and producing the biased exponents of the floating-point results; and wherein the apparatus further comprises a second control resource of the programmable processor enabled to receive a maximum biased exponent normal specifier responsive to an operation corresponding to the second load control resource instruction, the floating-point unit is further enabled to perform the floating-point operations responsive to the maximum biased exponent normal specifier specifying one of a plurality of floating-point modes, and a first of the plurality of floating-point modes comprises using a maximum biased exponent to represent a biased exponent, and the performing floating-point operations is further compatible, according to the maximum biased exponent normal specifier, with interpreting the biased exponents of the floating-point operands and producing the biased exponents of the floating-point results.

8. The apparatus of claim 7, wherein: the instructions further comprise a third load control resource instruction, the apparatus further comprises a third control resource of the programmable processor enabled to receive a rounding mode specifier responsive to an operation corresponding to the third load control resource instruction, and the floating-point unit is further enabled to perform the floating-point operations in accordance with one of a plurality of rounding modes specified by the rounding mode specifier, and at least one of the rounding modes comprises saturating the floating-point results to a value comprising a maximum biased exponent.

9. The apparatus of claim 7, wherein a second of the plurality of floating-point modes comprises using the maximum biased exponent to represent at least one of: infinite numbers and IEEE compatible NaN.

10. The apparatus of claim 7, wherein the programmable processor is one of a plurality of like programmable processors fabricated on a wafer in accordance with wafer-scale integration, and a datacenter element enabled to perform neural network processing comprises the wafer.

11. The apparatus of claim 7, wherein the programmable processor comprises one or more hardware registers comprising at least one of the first and the second control resources, and at least one of the first and the second load control resource instructions is a load hardware register instruction.

12. The apparatus of claim 7, wherein at least one of the first and the second control resources is memory-mapped, and at least one of the first and the second load control resource instructions is a memory store instruction.

13. An apparatus comprising: a programmable processor enabled to execute instructions, the instructions comprising floating-point instructions, and first and second load control resource instructions; a first control resource of the programmable processor enabled to receive an exponent bias re
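The control resources named in the claims above (exponent size specifier, maximum biased exponent normal specifier, saturating rounding, flush-to-zero) can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the patented hardware: the `FpMode` record and the helper names are hypothetical, the widths 5/10 and 6/9 are one possible choice of N, M, and delta = 1, and saturation is modeled simply as clamping to the largest finite magnitude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpMode:
    """Hypothetical bundle of the control-resource settings (illustration only)."""
    exp_bits: int         # N, or N + delta when the second format is selected
    man_bits: int         # M, or M - delta when the second format is selected
    bias: int             # programmable exponent bias
    max_exp_normal: bool  # True: the all-ones exponent encodes ordinary numbers


def split_fields(bits: int, mode: FpMode) -> tuple[int, int, int]:
    """Split a bit pattern into (sign, biased exponent, mantissa) for the mode."""
    sign = (bits >> (mode.exp_bits + mode.man_bits)) & 1
    exp = (bits >> mode.man_bits) & ((1 << mode.exp_bits) - 1)
    man = bits & ((1 << mode.man_bits) - 1)
    return sign, exp, man


def max_finite(mode: FpMode) -> float:
    """Largest finite magnitude; one exponent code larger if all-ones is normal."""
    all_ones = (1 << mode.exp_bits) - 1
    top_exp = all_ones if mode.max_exp_normal else all_ones - 1
    return (2 - 2.0 ** -mode.man_bits) * 2.0 ** (top_exp - mode.bias)


def saturate(x: float, mode: FpMode) -> float:
    """Clamp a result to the largest finite magnitude instead of overflowing."""
    limit = max_finite(mode)
    return max(-limit, min(limit, x))


def flush_subnormal(x: float, mode: FpMode) -> float:
    """Return zero for any result smaller in magnitude than the smallest normal."""
    smallest_normal = 2.0 ** (1 - mode.bias)
    return 0.0 if abs(x) < smallest_normal else x


# Two 16-bit formats that trade one mantissa bit for one exponent bit (delta = 1).
FORMAT_A = FpMode(exp_bits=5, man_bits=10, bias=15, max_exp_normal=False)
FORMAT_B = FpMode(exp_bits=6, man_bits=9, bias=31, max_exp_normal=False)
FORMAT_A_WIDE = FpMode(exp_bits=5, man_bits=10, bias=15, max_exp_normal=True)
```

In this sketch the same 16 bits are split differently depending on the selected format, and treating the all-ones exponent as a normal code doubles the largest finite magnitude at the cost of having no infinity or NaN encodings.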

Assignees

Cerebras Systems Inc

Inventors

Classifications (CPC titles)

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Quantised networks; Sparse networks; Compressed networks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11062202B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floatin…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 13, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).