Supercomputer using wafer scale integration
US-2016246337-A1 · Aug 25, 2016 · US
US10726329B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10726329-B2 |
| Application number | US-201816089261-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 17, 2018 |
| Priority date | Apr 17, 2017 |
| Publication date | Jul 28, 2020 |
| Grant date | Jul 28, 2020 |
Official abstract text for this publication.
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
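The three memory-vector kinds named in the abstract differ mainly in how element addresses are generated. The following is a minimal sketch of that idea, not the patent's actual descriptor encoding: the class names, field names (`base`, `dims`, `strides`, `head`, and so on), and the `addresses` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Vector1D:
    """Hypothetical one-dimensional memory vector descriptor."""
    base: int        # starting address, in element units
    length: int      # number of elements
    stride: int = 1  # address step between consecutive elements


@dataclass(frozen=True)
class Vector4D:
    """Hypothetical four-dimensional memory vector descriptor.

    In the abstract's terms, dims and strides stand in for the parameters
    that an extended data structure descriptor would carry.
    """
    base: int
    dims: Tuple[int, int, int, int]     # extent per dimension, outermost first
    strides: Tuple[int, int, int, int]  # address step per dimension


@dataclass(frozen=True)
class CircularBuffer:
    """Hypothetical circular buffer vector descriptor."""
    start: int   # first address of the buffer region
    size: int    # number of elements in the region
    head: int    # offset of the first element to read
    length: int  # number of elements to read (may wrap around)


def addresses(desc) -> Iterator[int]:
    """Yield element addresses in the order the descriptor describes."""
    if isinstance(desc, Vector1D):
        for i in range(desc.length):
            yield desc.base + i * desc.stride
    elif isinstance(desc, Vector4D):
        # Walk all four indices, innermost dimension varying fastest.
        for index in product(*(range(n) for n in desc.dims)):
            yield desc.base + sum(i * s for i, s in zip(index, desc.strides))
    elif isinstance(desc, CircularBuffer):
        for i in range(desc.length):
            yield desc.start + (desc.head + i) % desc.size
    else:
        raise TypeError(f"unsupported descriptor: {desc!r}")
```

For example, `list(addresses(CircularBuffer(start=32, size=4, head=3, length=5)))` wraps past the end of the region and revisits the starting slot on the fifth read.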
Opening claim text (preview).
What is claimed is:
1. A compute element comprising: a memory; means for decoding an instruction, the instruction comprising an operand field; means for accessing an operand descriptor based at least in part on the operand field; means for decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; means for accessing an operand in accordance with the operand descriptor and the particular type; means for performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via the memory; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
2. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
3. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein the operand comprises at least a portion of one or more of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
4. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in part on the operand field, decode the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to, the plurality of types comprising a fabric type and a memory type, access an operand in accordance with the operand descriptor and the particular type, wherein the access of the operand is via the respective fabric router coupled to the compute element when the particular type is the fabric type, and wherein the access of the operand is via the memory when the particular type is the memory type; wherein the operand descriptor identifies an access pattern as one of a one-dimensional memory vector access pattern, a four-dimensional memory vector access pattern, and a circular memory buffer access pattern; wherein the operand descriptor is enabled to specify one of a plurality of extended operand descriptors; and wherein the extended operand descriptors are enabled to specify one or more of stride information and dimension information of a four-dimensional memory vector.
5. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in
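The decode-and-dispatch sequence recited in the claims (operand field, then operand descriptor, then a fabric or memory access depending on the descriptor's type) can be sketched in software as follows. This is a simplified model under stated assumptions, not the claimed hardware: `ComputeElement`, `OperandDescriptor`, the `channel` field, the use of a `deque` as a stand-in for data arriving over the fabric, and the `multiply_accumulate` operation are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List

FABRIC = "fabric"
MEMORY = "memory"


@dataclass
class OperandDescriptor:
    """Hypothetical descriptor: its kind selects how the operand is accessed."""
    kind: str         # FABRIC or MEMORY
    base: int = 0     # memory type: next address to read
    length: int = 0   # memory type: elements remaining in the vector
    stride: int = 1   # memory type: address step between elements
    channel: int = 0  # fabric type: assumed id of an input queue


class ComputeElement:
    def __init__(self, memory: List[float], queues: Dict[int, Deque[float]]):
        self.memory = memory    # local memory of this compute element
        self.queues = queues    # stand-in for data delivered by the fabric router
        self.descriptors: Dict[int, OperandDescriptor] = {}  # register -> descriptor

    def read_operand(self, operand_field: int, count: int) -> List[float]:
        """Fetch `count` elements for one iteration of an instruction."""
        d = self.descriptors[operand_field]  # operand field selects a descriptor
        if d.kind == FABRIC:
            queue = self.queues[d.channel]
            return [queue.popleft() for _ in range(count)]
        if d.kind == MEMORY:
            if count > d.length:
                raise ValueError("not enough elements left in memory vector")
            values = [self.memory[d.base + i * d.stride] for i in range(count)]
            d.base += count * d.stride  # advance so the next iteration continues
            d.length -= count
            return values
        raise ValueError(f"unknown operand type: {d.kind!r}")

    def multiply_accumulate(self, src_a: int, src_b: int, count: int) -> float:
        """One iteration of an illustrative partial-sum instruction."""
        a = self.read_operand(src_a, count)
        b = self.read_operand(src_b, count)
        return sum(x * y for x, y in zip(a, b))
```

A usage sketch: register one fabric descriptor and one memory descriptor, then call `multiply_accumulate` with their register numbers. The same instruction body works for either source because the descriptor, not the instruction, decides where the data comes from.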