Data structure descriptors for deep learning acceleration

US10726329B2 · US · B2

Patent metadata
Publication number: US-10726329-B2
Application number: US-201816089261-A
Country: US
Kind code: B2
Filing date: Apr 17, 2018
Priority date: Apr 17, 2017
Publication date: Jul 28, 2020
Grant date: Jul 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
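The operand resolution summarized in the abstract can be illustrated with a small sketch. This is not the patented implementation: the `Descriptor` fields, the `ComputeElement` class, and the modeling of the fabric as a queue are all hypothetical names and simplifications chosen for illustration. It only shows the idea that an operand field selects a data structure register, and the descriptor stored there decides whether the operand is read from the fabric or from local memory.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class OperandType(Enum):
    FABRIC = "fabric"
    MEMORY = "memory"


@dataclass
class Descriptor:
    """Hypothetical descriptor; field names are illustrative, not from the patent."""
    kind: OperandType
    length: int      # number of elements in the vector
    base: int = 0    # start address (memory vectors only)
    stride: int = 1  # address step (memory vectors only)


class ComputeElement:
    """Toy model: local memory is a list, the fabric is a queue of payloads."""

    def __init__(self, memory, fabric_queue, dsr_file):
        self.memory = memory
        self.fabric = fabric_queue
        self.dsrs = dsr_file  # data structure registers: index -> Descriptor

    def read_operand(self, operand_field):
        """Look up the descriptor named by the operand field, then read by type."""
        d = self.dsrs[operand_field]
        if d.kind is OperandType.FABRIC:
            # Fabric vector: consume the next `length` payloads as they arrive.
            return [self.fabric.popleft() for _ in range(d.length)]
        # Memory vector: strided read from local memory.
        return [self.memory[d.base + i * d.stride] for i in range(d.length)]


ce = ComputeElement(
    memory=[10, 11, 12, 13, 14, 15],
    fabric_queue=deque([1.0, 2.0, 3.0]),
    dsr_file={
        0: Descriptor(OperandType.FABRIC, length=3),
        1: Descriptor(OperandType.MEMORY, length=3, base=1, stride=2),
    },
)
fabric_vec = ce.read_operand(0)
memory_vec = ce.read_operand(1)
print(fabric_vec, memory_vec)
```

In this sketch the same `read_operand` call serves both operand kinds, so an instruction does not need separate opcodes for fabric and memory sources; that design point is an interpretation of the abstract, not a statement from it.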

First claim

Opening claim text (preview).

What is claimed is:
1. A compute element comprising: a memory; means for decoding an instruction, the instruction comprising an operand field; means for accessing an operand descriptor based at least in part on the operand field; means for decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; means for accessing an operand in accordance with the operand descriptor and the particular type; means for performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via the memory; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
2. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein execution of the instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
3. A method comprising: in a compute element, decoding an instruction, the instruction comprising an operand field; in the compute element, accessing an operand descriptor based at least in part on the operand field; in the compute element, decoding the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to; in the compute element, accessing an operand in accordance with the operand descriptor and the particular type; performing an iteration of the instruction via accessing, in accordance with an access pattern described by the operand descriptor, sufficient data elements of a vector for the iteration; wherein the types comprise a fabric type and a memory type; wherein the compute element is comprised in a processing element that comprises a fabric router, the processing element is one of a fabric of processing elements each comprising a respective compute element and a respective fabric router; wherein the processing elements are interconnected via a fabric coupled to the respective fabric routers; wherein the fabric of processing elements is enabled to perform dataflow-based and instruction-based processing; wherein the fabric of processing elements is implemented via wafer-scale integration; wherein when the particular type is the fabric type, the operand is accessed via the fabric; wherein when the particular type is the memory type, the operand is accessed via a memory of the compute element; and wherein the operand comprises at least a portion of one or more of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
4. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in part on the operand field, decode the operand descriptor to determine a particular one of a plurality of types the operand descriptor refers to, the plurality of types comprising a fabric type and a memory type, access an operand in accordance with the operand descriptor and the particular type, wherein the access of the operand is via the respective fabric router coupled to the compute element when the particular type is the fabric type, and wherein the access of the operand is via the memory when the particular type is the memory type; wherein the operand descriptor identifies an access pattern as one of a one-dimensional memory vector access pattern, a four-dimensional memory vector access pattern, and a circular memory buffer access pattern; wherein the operand descriptor is enabled to specify one of a plurality of extended operand descriptors; and wherein the extended operand descriptors are enabled to specify one or more of stride information and dimension information of a four-dimensional memory vector.
5. A system comprising: a fabric of processing elements, each processing element comprising a fabric router coupled to a compute element, the fabric of processing elements enabled to perform dataflow-based processing and instruction-based processing, the fabric of processing elements implemented via wafer-scale integration; wherein each processing element is enabled to selectively communicate fabric packets with others of the processing elements at least in part via the fabric router of the respective processing element; wherein each compute element comprises a memory and is enabled to decode an instruction, the instruction comprising an operand field, access an operand descriptor based at least in
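Claim 4 names three memory access patterns: one-dimensional, four-dimensional, and circular buffer. The sketch below shows one plausible way such patterns translate into address sequences. The function names, the parameter layout (base, dimensions, strides, buffer size), and the outermost-first ordering are assumptions made for illustration; the claim states only that extended descriptors can carry stride and dimension information for a four-dimensional vector.

```python
from itertools import product


def addresses_1d(base, length, stride=1):
    """One-dimensional vector: `length` addresses spaced by `stride`."""
    return [base + i * stride for i in range(length)]


def addresses_4d(base, dims, strides):
    """Four-dimensional vector: dims and strides are 4-tuples, outermost first."""
    return [
        base + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(d) for d in dims))
    ]


def addresses_circular(base, size, start, length):
    """Circular buffer: walk `length` elements from offset `start`,
    wrapping so every address stays inside [base, base + size)."""
    return [base + (start + i) % size for i in range(length)]


one_d = addresses_1d(base=100, length=4, stride=2)
four_d = addresses_4d(base=0, dims=(1, 2, 2, 2), strides=(0, 16, 4, 1))
ring = addresses_circular(base=200, size=4, start=3, length=5)

print(one_d)   # [100, 102, 104, 106]
print(four_d)  # [0, 1, 4, 5, 16, 17, 20, 21]
print(ring)    # [203, 200, 201, 202, 203]
```

Keeping the per-dimension strides and sizes outside the base descriptor, as this sketch does by passing them separately, mirrors the claim's split between an operand descriptor and an extended operand descriptor, though the actual register layout is not given in the text shown here.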

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks

  • Probabilistic or stochastic networks

  • Activation functions

  • Combinations of networks

  • Learning methods

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10726329B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data st…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 28, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).