Accelerated deep learning
US-11580394-B2 · Feb 14, 2023 · US
US11727257B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11727257-B2 |
| Application number | US-202217582904-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 24, 2022 |
| Priority date | Apr 17, 2017 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
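The descriptor kinds named in the abstract can be illustrated with a small model. The following is a minimal sketch, not the patented hardware encoding: the class names, field names (`base`, `stride`, `dims`, `strides`, `start`, `end`, `head`, `virtual_channel`), and the `element_addresses` helper are all hypothetical, chosen only to show how a descriptor might determine which memory locations a vector operand touches.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class FabricVector:
    """Operand delivered over the fabric rather than read from memory."""
    length: int
    virtual_channel: int  # hypothetical field name


@dataclass(frozen=True)
class Memory1D:
    """One-dimensional memory vector."""
    base: int
    length: int
    stride: int = 1


@dataclass(frozen=True)
class Memory4D:
    """Four-dimensional memory vector; dims and strides stand in for
    the parameters an extended descriptor would carry."""
    base: int
    dims: Tuple[int, int, int, int]
    strides: Tuple[int, int, int, int]


@dataclass(frozen=True)
class CircularBuffer:
    """Circular memory buffer over the half-open range [start, end)."""
    start: int
    end: int
    head: int    # address of the first element to access
    length: int  # number of elements to access


def element_addresses(desc) -> Iterator[int]:
    """Yield the memory address of each data element, in access order."""
    if isinstance(desc, Memory1D):
        for i in range(desc.length):
            yield desc.base + i * desc.stride
    elif isinstance(desc, Memory4D):
        d0, d1, d2, d3 = desc.dims
        s0, s1, s2, s3 = desc.strides
        for a in range(d0):
            for b in range(d1):
                for c in range(d2):
                    for e in range(d3):
                        yield desc.base + a * s0 + b * s1 + c * s2 + e * s3
    elif isinstance(desc, CircularBuffer):
        size = desc.end - desc.start
        for i in range(desc.length):
            # Wrap back to start when the access runs past end.
            yield desc.start + (desc.head - desc.start + i) % size
    else:
        raise TypeError("fabric vectors have no memory addresses")
```

In this sketch the instruction itself never encodes an access pattern; it only names a descriptor, and the address generator reads the pattern from there. A fabric vector is deliberately rejected by `element_addresses`, since its elements arrive over the fabric rather than from addressable memory.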
Opening claim text (preview).
What is claimed is:
1. A system comprising: a plurality of processing elements, each processing element respectively comprising a coupled fabric router and compute element, the processing elements being interconnected at least in part via the respective fabric routers; each compute element comprising a memory, an instruction decoder enabled to decode each of a plurality of vector instructions comprising at least one respective operand identifier of at least one respective operand comprising an operand type of a mutually exclusive one of a memory operand type accessible via the memory and a fabric operand type accessible via the coupled fabric router, data structure registers enabled to store data structure descriptors, each stored data structure descriptor specifying the operand type and other attributes of a vector instruction operand, and a data sequencer enabled for each of the plurality of vector instructions to determine locations of one or more data elements of the at least one respective operand, the determination being based at least in part on at least one of the data structure descriptors read from the data structure registers, the reading being based at least in part on the at least one respective operand identifier; and wherein the compute element is enabled for each of the plurality of vector instructions to access the one or more data elements of the at least one respective operand in accordance with the at least one of the data structure descriptors.
2. The system of claim 1, wherein each compute element is further enabled, responsive to the operand type being the fabric operand type and the at least one respective operand being a source, to access the operand via reading data elements from an input queue of the compute element, the input queue comprising at least in part the coupling with the fabric router.
3. The system of claim 1, wherein each compute element is further enabled, responsive to the operand type being the fabric operand type and the at least one respective operand being a destination, to access the operand via writing data elements to an output queue of the compute element, the output queue comprising at least in part the coupling with the fabric router.
4. The system of claim 1, wherein the at least one of the data structure descriptors is enabled to identify one of a plurality of extended operand descriptors.
5. The system of claim 4, wherein the extended operand descriptors are enabled to specify one or more of stride information and dimension information of a four-dimensional memory vector.
6. The system of claim 4, wherein the extended operand descriptors are enabled to specify one or more of a start address and an end address of a circular memory buffer.
7. The system of claim 4, wherein the extended operand descriptors are enabled to specify FIFO or non-FIFO operation of a circular memory buffer.
8. The system of claim 1, wherein execution of one or more of the plurality of vector instructions implements at least a portion of any one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
9. The system of claim 1, wherein the at least one respective operand comprises at least a portion of any one or more of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
10. The system of claim 1, wherein a substantially whole wafer comprises the processing elements.
11. The system of claim 1, wherein the compute element is enabled to perform an iteration one or more of the plurality of vector instructions via accessing, in accordance with an access pattern described by the at least one of the data structure descriptors, sufficient data elements of a vector for the iteration.
12. The system of claim 11, wherein the access pattern is one of a fabric vector, a one-dimensional memory vector, a four-dimensional memory vector, and a circular memory buffer.
13. The system of claim 1, wherein the compute element is enabled to read from the memory when the operand type is the memory operand type and the operand is a source.
14. The system of claim 1, wherein the compute element is enabled to write to the memory when the operand type is the memory operand type and the operand is a destination.
15. The system of claim 1, wherein the other attributes comprises information describing a length of the vector instruction operand.
16. The system of claim 1, wherein the compute element is enabled to execute a first of the plurality of vector instructions, and the at least one of the data structure descriptors comprises microthreading information describing how the compute element is to operate when there is a stall accessing the one or more data elements of the at least one respective operand.
17. The system of claim 16, wherein responsive to the stall and the microthreading information indicating microthreading not enabled, the compute element stalling.
18. The system of claim 16, wherein responsive to the stall and the microthreading information indicating microthreading is enabled, the compute element suspending processing of the first of the plurality of vector instructions and selecting a first of one or more other instructions for processing.
19. The system of claim 18, wherein the first of the plurality of vector instructions is associated with a first task and the first of one or more other instructions is associated with a second task.
20. The system of claim 1, wherein the at least one respective operand comprises at least a portion of one or more of: the vector, a matrix, and a tensor.
21. The system of claim 1, wherein when the operand type is the fabric operand type, the at least one of the data structure descriptors is associated with a fabric virtual channel of the interconnected processing elements.
22. The system of claim 1, wherein the at least one of the data structure descriptors indicates how many of the data elements of the at least one respective operand to process in parallel.
23. The system of claim 1, wherein the at least one of the data structure descriptors comprises an indicator of whether to terminate processing responsive at least in part to a control fabric packet, conveying one of the data elements of the at least one respective operand, being received via the fabric.
24. The system of claim 1, wherein the at least one of the data structure descriptors comprises an indicator of a virtual channel to selectively activate responsive to completion of a first of the plurality of vector instructions.
25. The system of claim 1, wherein the interconnected processing elements are enabled to perform dataflow-based and instruction-based processing.
26. The system of claim 1, wherein the interconnection of the processing elements is via a fabric, wherein the fabric is at least in part a collection of couplings between the processing elements, the couplings comprising one or more of logical couplings and physical couplings.
27. The system of claim 1, wherein each processing element is enabled to selectively communicate fabric packets with others of the processing el
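
The stall behavior recited in claims 16 through 18 can be sketched as a toy scheduler step. This is an illustrative model under stated assumptions, not the claimed implementation: `VectorInstr`, `advance`, and the use of a Python `deque` as a stand-in for the fabric input queue of claim 2 are all hypothetical. The choice to stall when microthreading is enabled but no other instruction is pending is also an assumption; the claims do not address that case.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class VectorInstr:
    """A vector instruction with a fabric source operand (hypothetical model)."""
    name: str
    needed: int            # data elements required to finish
    queue: Deque[int]      # stand-in for the fabric input queue
    microthreading: bool   # flag read from the operand's descriptor
    consumed: List[int] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.consumed) >= self.needed


def advance(current: VectorInstr,
            pending: Deque[VectorInstr]) -> Tuple[str, VectorInstr]:
    """Run one step and return (action, instruction to run next)."""
    if current.queue:
        # A data element is available: consume it and keep going.
        current.consumed.append(current.queue.popleft())
        return "progress", current
    if current.microthreading and pending:
        # Stall with microthreading enabled: suspend the current
        # instruction and select another one for processing.
        pending.append(current)
        return "switch", pending.popleft()
    # Stall with microthreading disabled (or, by assumption here,
    # with nothing else to run): the compute element stalls.
    return "stall", current
```

A caller would loop on `advance` until the returned instruction reports `done()`, feeding the queues as data elements arrive over the fabric.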
Probabilistic or stochastic networks · CPC title
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Generative networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title