Data structure descriptors for deep learning acceleration

US11727257B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11727257-B2
Application number  US-202217582904-A
Country             US
Kind code           B2
Filing date         Jan 24, 2022
Priority date       Apr 17, 2017
Publication date    Aug 15, 2023
Grant date          Aug 15, 2023

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
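As a rough illustration of the descriptor scheme the abstract describes, the Python sketch below models an operand specifier that names a data structure register, whose descriptor marks the operand as a fabric vector or one of the three memory vector forms and may point at an extended descriptor. Every class name, field name, and register index here is a hypothetical choice made for illustration; the abstract does not give encodings or field layouts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VectorKind(Enum):
    FABRIC = "fabric"          # operand streamed through the routing element
    MEMORY_1D = "memory_1d"    # one-dimensional vector in local memory
    MEMORY_4D = "memory_4d"    # four-dimensional vector in local memory
    CIRCULAR = "circular"      # circular buffer vector in local memory


@dataclass(frozen=True)
class ExtendedDescriptor:
    """Hypothetical extra parameters for 4D or circular buffer operands."""
    dims: tuple = ()
    strides: tuple = ()
    start: int = 0
    end: int = 0


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical data structure descriptor held in a data structure register."""
    kind: VectorKind
    length: int
    base: int = 0
    xdsr: Optional[int] = None  # optional index of an extended register


class RegisterFile:
    """Holds descriptors and extended descriptors, keyed by register index."""

    def __init__(self):
        self.dsr = {}
        self.xdsr = {}

    def resolve(self, operand_specifier):
        """Return the descriptor an operand specifier names, plus its
        extended descriptor when the descriptor points at one."""
        dsd = self.dsr[operand_specifier]
        ext = self.xdsr[dsd.xdsr] if dsd.xdsr is not None else None
        return dsd, ext


regs = RegisterFile()
regs.xdsr[0] = ExtendedDescriptor(start=0x100, end=0x140)
regs.dsr[3] = Descriptor(VectorKind.CIRCULAR, length=16, base=0x100, xdsr=0)
regs.dsr[5] = Descriptor(VectorKind.FABRIC, length=16)

circular_dsd, circular_ext = regs.resolve(3)
fabric_dsd, fabric_ext = regs.resolve(5)
```

In this sketch the instruction carries only a small register index, and the operand's shape lives in the register file, so the same instruction can read a fabric stream or a memory buffer depending on what the register holds.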

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a plurality of processing elements, each processing element respectively comprising a coupled fabric router and compute element, the processing elements being interconnected at least in part via the respective fabric routers; each compute element comprising a memory, an instruction decoder enabled to decode each of a plurality of vector instructions comprising at least one respective operand identifier of at least one respective operand comprising an operand type of a mutually exclusive one of a memory operand type accessible via the memory and a fabric operand type accessible via the coupled fabric router, data structure registers enabled to store data structure descriptors, each stored data structure descriptor specifying the operand type and other attributes of a vector instruction operand, and a data sequencer enabled for each of the plurality of vector instructions to determine locations of one or more data elements of the at least one respective operand, the determination being based at least in part on at least one of the data structure descriptors read from the data structure registers, the reading being based at least in part on the at least one respective operand identifier; and wherein the compute element is enabled for each of the plurality of vector instructions to access the one or more data elements of the at least one respective operand in accordance with the at least one of the data structure descriptors.

2. The system of claim 1, wherein each compute element is further enabled, responsive to the operand type being the fabric operand type and the at least one respective operand being a source, to access the operand via reading data elements from an input queue of the compute element, the input queue comprising at least in part the coupling with the fabric router.

3. The system of claim 1, wherein each compute element is further enabled, responsive to the operand type being the fabric operand type and the at least one respective operand being a destination, to access the operand via writing data elements to an output queue of the compute element, the output queue comprising at least in part the coupling with the fabric router.

4. The system of claim 1, wherein the at least one of the data structure descriptors is enabled to identify one of a plurality of extended operand descriptors.

5. The system of claim 4, wherein the extended operand descriptors are enabled to specify one or more of stride information and dimension information of a four-dimensional memory vector.

6. The system of claim 4, wherein the extended operand descriptors are enabled to specify one or more of a start address and an end address of a circular memory buffer.

7. The system of claim 4, wherein the extended operand descriptors are enabled to specify FIFO or non-FIFO operation of a circular memory buffer.

8. The system of claim 1, wherein execution of one or more of the plurality of vector instructions implements at least a portion of any one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.

9. The system of claim 1, wherein the at least one respective operand comprises at least a portion of any one or more of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.

10. The system of claim 1, wherein a substantially whole wafer comprises the processing elements.

11. The system of claim 1, wherein the compute element is enabled to perform an iteration one or more of the plurality of vector instructions via accessing, in accordance with an access pattern described by the at least one of the data structure descriptors, sufficient data elements of a vector for the iteration.

12. The system of claim 11, wherein the access pattern is one of a fabric vector, a one-dimensional memory vector, a four-dimensional memory vector, and a circular memory buffer.

13. The system of claim 1, wherein the compute element is enabled to read from the memory when the operand type is the memory operand type and the operand is a source.

14. The system of claim 1, wherein the compute element is enabled to write to the memory when the operand type is the memory operand type and the operand is a destination.

15. The system of claim 1, wherein the other attributes comprises information describing a length of the vector instruction operand.

16. The system of claim 1, wherein the compute element is enabled to execute a first of the plurality of vector instructions, and the at least one of the data structure descriptors comprises microthreading information describing how the compute element is to operate when there is a stall accessing the one or more data elements of the at least one respective operand.

17. The system of claim 16, wherein responsive to the stall and the microthreading information indicating microthreading not enabled, the compute element stalling.

18. The system of claim 16, wherein responsive to the stall and the microthreading information indicating microthreading is enabled, the compute element suspending processing of the first of the plurality of vector instructions and selecting a first of one or more other instructions for processing.

19. The system of claim 18, wherein the first of the plurality of vector instructions is associated with a first task and the first of one or more other instructions is associated with a second task.

20. The system of claim 1, wherein the at least one respective operand comprises at least a portion of one or more of: the vector, a matrix, and a tensor.

21. The system of claim 1, wherein when the operand type is the fabric operand type, the at least one of the data structure descriptors is associated with a fabric virtual channel of the interconnected processing elements.

22. The system of claim 1, wherein the at least one of the data structure descriptors indicates how many of the data elements of the at least one respective operand to process in parallel.

23. The system of claim 1, wherein the at least one of the data structure descriptors comprises an indicator of whether to terminate processing responsive at least in part to a control fabric packet, conveying one of the data elements of the at least one respective operand, being received via the fabric.

24. The system of claim 1, wherein the at least one of the data structure descriptors comprises an indicator of a virtual channel to selectively activate responsive to completion of a first of the plurality of vector instructions.

25. The system of claim 1, wherein the interconnected processing elements are enabled to perform dataflow-based and instruction-based processing.

26. The system of claim 1, wherein the interconnection of the processing elements is via a fabric, wherein the fabric is at least in part a collection of couplings between the processing elements, the couplings comprising one or more of logical couplings and physical couplings.

27. The system of claim 1, wherein each processing element is enabled to selectively communicate fabric packets with others of the processing el
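
The data sequencer of claim 1 determines element locations from a descriptor, and claims 5, 6, and 12 name the memory access patterns involved. The Python sketch below shows one plausible way such locations could be generated for the three memory patterns. The function names, the use of word addresses, the row-major ordering of the four dimensions, and the exclusive end address are all assumptions made for illustration; the claims do not specify them.

```python
from itertools import product


def addresses_1d(base, length, stride=1):
    """Locations of a one-dimensional memory vector (stride is assumed)."""
    return [base + i * stride for i in range(length)]


def addresses_4d(base, dims, strides):
    """Locations of a four-dimensional memory vector.

    Walks the four dimensions in row-major order (an assumption), adding
    each index times its stride to the base address.
    """
    if len(dims) != 4 or len(strides) != 4:
        raise ValueError("expected exactly four dimensions and four strides")
    index_space = product(*(range(d) for d in dims))
    return [base + sum(i * s for i, s in zip(idx, strides))
            for idx in index_space]


def addresses_circular(start, end, position, length):
    """Locations of a circular memory buffer access.

    Starts at `position` and wraps back to `start` on reaching `end`,
    which is treated as exclusive (an assumption).
    """
    size = end - start
    if size <= 0:
        raise ValueError("end must be greater than start")
    offset = position - start
    return [start + (offset + i) % size for i in range(length)]


print(addresses_1d(8, 3))                            # [8, 9, 10]
print(addresses_4d(0, (1, 1, 2, 3), (0, 0, 4, 1)))   # [0, 1, 2, 4, 5, 6]
print(addresses_circular(0x10, 0x14, 0x12, 6))       # [18, 19, 16, 17, 18, 19]
```

The four-dimensional call reads two rows of three elements from a buffer whose rows are four words apart, which is the kind of strided access that stride and dimension information would make expressible without a software loop.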

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • Probabilistic or stochastic networks · CPC title

  • Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Generative networks · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727257B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data st…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 15, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).