Processors, methods, and systems with a configurable spatial accelerator
US-2018189063-A1 · Jul 5, 2018 · US
US11727254B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11727254-B2 |
| Application number | US-202017005140-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 27, 2020 |
| Priority date | Apr 17, 2017 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.
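The dispatch rule in the abstract can be sketched as a small Python model. This is a minimal illustration, not the patented implementation: the names `Wavelet`, `ComputeElement`, `CONTROL_FIRST`, and `CONTROL_SECOND`, the numeric encodings, and the two lookup tables standing in for compute-element memory are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical encodings: the abstract only speaks of "a first value" and
# "a second value" of the control specifier.
CONTROL_FIRST = 1
CONTROL_SECOND = 0


@dataclass(frozen=True)
class Wavelet:
    control: int          # control specifier
    index: int            # index specifier
    virtual_channel: int  # virtual channel specifier


class ComputeElement:
    def __init__(self, by_index, by_channel):
        # Both tables are assumed stand-ins for the compute element's memory:
        # they map an index or a virtual channel to a list of instructions.
        self.by_index = by_index
        self.by_channel = by_channel

    def fetch(self, wavelet):
        """Select the instructions to start, based on the control specifier."""
        if wavelet.control == CONTROL_FIRST:
            # First value: read in accordance with the index specifier.
            return self.by_index[wavelet.index]
        if wavelet.control == CONTROL_SECOND:
            # Second value: read in accordance with the virtual channel.
            return self.by_channel[wavelet.virtual_channel]
        raise ValueError(f"unknown control specifier: {wavelet.control}")


ce = ComputeElement(by_index={3: ["load", "mac"]}, by_channel={7: ["add"]})
print(ce.fetch(Wavelet(control=CONTROL_FIRST, index=3, virtual_channel=7)))
print(ce.fetch(Wavelet(control=CONTROL_SECOND, index=3, virtual_channel=7)))
```

The same wavelet fields yield different instruction sequences depending only on the control specifier, which is the behavior the abstract describes.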
Opening claim text (preview).
What is claimed is:

1. A method comprising: sending a packet by a sending processing element to a fabric, the packet comprising a specifier and an indicator, the specifier being one of a plurality of specifiers each associated with a respective set of one or more sets of packets, the indicator being enabled to selectively indicate that the packet is a control packet, and the sending comprising executing a first instruction that comprises a destination operand that specifies a destination operand descriptor that is usable to determine at least in part the specifier and the indicator; routing the packet from the sending processing element to one or more receiving processing elements, the routing being in accordance with the specifier and via the fabric and one or more routing elements; in at least one of the receiving processing elements, receiving the packet from the fabric and processing the packet; wherein the receiving comprises associating the packet with the respective set associated with the specifier; wherein the processing comprises executing a second instruction that comprises a source operand that specifies a source operand descriptor that is usable to indicate at least in part the specifier and to selectively indicate termination upon receipt of a control packet; and wherein the executing the second instruction comprises terminating the second instruction, responsive to (a) the packet being older than any other packets associated with the respective set associated with the specifier, (b) the indicator selectively indicating the packet is a control packet, and (c) the source operand descriptor selectively indicating to terminate upon receipt of a control packet.

2. The method of claim 1, wherein the packet is a first packet, the specifier is a first specifier, the source operand descriptor is further usable to indicate at least in part a second specifier of the plurality of specifiers, and further comprising selecting for processing a second packet associated with the respective set associated with the second specifier.

3. The method of claim 1, wherein: the packet further comprises an index; the destination operand descriptor is usable to determine at least in part the index; the processing further comprises the at least one receiving processing element reading and executing one or more instructions; and the reading is from a memory of the at least one receiving processing element at an address determined based at least in part on the index.

4. The method of claim 3, wherein the address is based at least in part on a portion of the index plus a base register of the at least one receiving processing element.

5. The method of claim 1, wherein the specifier is a first specifier, the source operand descriptor is further usable to indicate at least in part a second specifier of the plurality of specifiers, the first specifier and the second specifier are respectively associated with a first task and a second task, and the first task and the second task respectively implement a first portion of a neural network and a second portion of the neural network.

6. The method of claim 5, wherein the first portion of the neural network and the second portion of the neural network implement portions of one or more of: receiving activations of a neuron of a neural network, computing activations of a neuron of a neural network, transmitting activations of a neuron of a neural network, computing partial sums of activations of a neural network, receiving deltas of a neural network, computing deltas of a neural network, transmitting deltas of a neural network, receiving errors of a neural network, computing errors of a neural network, transmitting errors of a neural network, computing gradient estimates of a neural network, and updating weights of a neural network.

7. The method of claim 6, wherein the sending processing element is one of a fabric of processing elements, each processing element of the fabric of processing elements comprises a fabric router and a compute engine collectively enabled to perform dataflow-based and instruction-based processing, and the fabric of processing elements is implemented via wafer-scale integration.

8. The method of claim 5, wherein the second portion of the neural network is dependent on the first portion of the neural network.

9. The method of claim 1, wherein one or more of the sending, the routing, the receiving, and the processing comprise transitioning between implementing a first portion of a neural network and implementing a second portion of the neural network.

10. The method of claim 1, wherein the packet further comprises an index and further comprising: reading one or more instructions from a memory of the at least one receiving processing element; initiating execution of one of the one or more instructions; wherein, responsive to the indicator selectively indicating the packet is a control packet, the reading is in accordance with the index; and wherein, responsive to the indicator selectively indicating the packet is other than a control packet, the reading is in accordance with the specifier.

11. The method of claim 10, wherein the sending processing element is one of a fabric of processing elements, and each processing element of the fabric of processing elements comprises a fabric router and a compute engine enabled to perform dataflow-based and instruction-based processing.

12. The method of claim 11, wherein the fabric of processing elements is implemented via wafer-scale integration.

13. A method comprising: sending a first packet by a sending processing element to a fabric, the first packet comprising a first specifier and an indicator, the first specifier being one of a plurality of specifiers each associated with a respective set of one or more sets of packets, the indicator being enabled to selectively indicate that the first packet is a control packet, and the sending comprising executing a first instruction that comprises a destination operand that specifies a destination operand descriptor that is usable to determine at least in part the first specifier and the indicator; routing the first packet from the sending processing element to one or more receiving processing elements, the routing being in accordance with the first specifier and via the fabric and one or more routing elements; in at least one of the receiving processing elements, receiving the first packet from the fabric and processing the first packet; wherein the receiving comprises associating the first packet with the respective set associated with the first specifier; wherein the processing comprises executing a second instruction that comprises a source operand that specifies a source operand descriptor that is usable to indicate at least in part the first specifier, to indicate at least in part a second specifier of the plurality of specifiers, and to selectively indicate termination upon receipt of a control packet; and further comprising selecting for processing a second packet associated with the respective set associated with the second specifier, responsive to (a) the first packet being older than any other packets associated with the respective set associated with the first specifier, (b) the indicator selectively indicating the first packet is a control packet, and (c) the source operand descriptor selectively indicating to terminate upon receipt of a control packet.

14. The method of claim 13, further comprising terminating the second instruction.

15. The method of claim 13, wherein the first packet further comprises an index and further comprising: reading one or
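
The three-part termination condition in claim 1 can be illustrated with a simplified Python model. Everything named here (`Packet`, `SourceDescriptor`, `Receiver`, the per-specifier FIFO queues) is a hypothetical stand-in chosen for illustration. How a control packet is handled when the descriptor does not request termination is also an assumption, since the claim does not say.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    specifier: int
    is_control: bool  # the "indicator" of claim 1
    data: int = 0


@dataclass(frozen=True)
class SourceDescriptor:
    specifier: int
    terminate_on_control: bool


class Receiver:
    def __init__(self):
        # One FIFO per specifier models "the respective set" of packets;
        # the left end of each deque holds the oldest packet.
        self.queues = {}

    def receive(self, packet):
        # Associate the packet with the set for its specifier.
        self.queues.setdefault(packet.specifier, deque()).append(packet)

    def run_instruction(self, desc):
        """Consume packets for desc.specifier; return (data, terminated)."""
        consumed = []
        queue = self.queues.get(desc.specifier, deque())
        while queue:
            oldest = queue.popleft()  # (a) oldest packet in its set
            # (b) the packet is a control packet and
            # (c) the descriptor asks to terminate on control packets.
            if oldest.is_control and desc.terminate_on_control:
                return consumed, True
            # Assumption: otherwise the packet is consumed as ordinary data.
            consumed.append(oldest.data)
        return consumed, False


rx = Receiver()
for p in (Packet(5, False, 10), Packet(5, False, 20), Packet(5, True), Packet(5, False, 30)):
    rx.receive(p)
print(rx.run_instruction(SourceDescriptor(specifier=5, terminate_on_control=True)))
```

In this model the instruction stops only once the control packet has become the oldest entry in its queue, so data packets that arrived before it are still processed and the packet that arrived after it stays queued.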
Probabilistic or stochastic networks · CPC title
Learning methods · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title