Scaling multi-core neurosynaptic networks across chip boundaries
US-2016224889-A1 · Aug 4, 2016 · US
US11580394B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11580394-B2 |
| Application number | US-202016911203-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 24, 2020 |
| Priority date | Feb 23, 2017 |
| Publication date | Feb 14, 2023 |
| Grant date | Feb 14, 2023 |
Abstract
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
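The abstract names reverse checkpoint as a way to cut memory use during training, but it does not describe the mechanism. The Python below is a sketch that assumes reverse checkpoint works like conventional activation (gradient) checkpointing. In the forward pass it keeps only every `every`-th layer input. In the backward pass it recomputes each segment's activations from the nearest stored one. The toy chain of scalar `tanh` layers, the squared-error loss, and all function names are illustrative assumptions, not taken from the patent.

```python
import math
from typing import Dict, List, Tuple


def layer(w: float, x: float) -> float:
    """Toy scalar layer (assumption): y = tanh(w * x)."""
    return math.tanh(w * x)


def backprop_step(w: float, x: float, y: float, delta: float) -> Tuple[float, float]:
    """Backward through one layer; given dL/dy = delta, return (dL/dw, dL/dx)."""
    d_pre = delta * (1.0 - y * y)  # tanh'(pre-activation) expressed via its output y
    return d_pre * x, d_pre * w


def grads_store_all(weights: List[float], x0: float, target: float) -> List[float]:
    """Baseline: keep every activation from the forward pass (memory grows with depth)."""
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    delta = acts[-1] - target  # dL/dy for L = 0.5 * (y - target) ** 2
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i], delta = backprop_step(weights[i], acts[i], acts[i + 1], delta)
    return grads


def grads_checkpointed(weights: List[float], x0: float, target: float,
                       every: int) -> Tuple[List[float], int]:
    """Keep only every `every`-th layer input; recompute each segment when going backward.

    Returns the gradients and the number of activations kept from the forward pass.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    n = len(weights)
    checkpoints: Dict[int, float] = {0: x0}  # layer index -> that layer's input activation
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        if (i + 1) % every == 0 and i + 1 < n:
            checkpoints[i + 1] = x
    delta = x - target  # x now holds the network output
    grads = [0.0] * n
    starts = sorted(checkpoints)
    for k in reversed(range(len(starts))):  # walk segments from the output back to the input
        start = starts[k]
        end = starts[k + 1] if k + 1 < len(starts) else n
        seg = [checkpoints[start]]  # recompute activations for layers start .. end-1
        for i in range(start, end):
            seg.append(layer(weights[i], seg[-1]))
        for i in reversed(range(start, end)):
            j = i - start
            grads[i], delta = backprop_step(weights[i], seg[j], seg[j + 1], delta)
    return grads, len(checkpoints)
```

With six layers and `every=2`, only three forward activations are kept: the inputs to layers 0, 2 and 4. The baseline keeps seven. The gradients still match the store-everything baseline. The price is one extra forward computation per segment during the backward pass.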
Claims (preview)
What is claimed is:
1. A method comprising: training a neural network comprising a plurality of ordered, connected layers; wherein the order identifies for each respective layer which others of the layers are prior to the respective layer and which others of the layers are subsequent to the respective layer; wherein each layer comprises one or more neurons, each neuron comprising weights, and connected to at least one of at least one prior neuron of a prior layer and at least one subsequent neuron of a subsequent layer; and wherein each neuron is implemented by one or more processing elements, each processing element comprising at least one coupling to a fabric, the processing element being enabled to communicate via the fabric via a plurality of virtual channels, a first memory enabled to store instructions corresponding to at least computations of the neuron, a second memory enabled to store the weights, and hardware execution resources enabled to execute instructions from the respective first memory and access data from the respective second memory.
2. The method of claim 1, wherein each of the layers is a respective internal layer of the neural network, and the neural network further comprises an input layer and an output layer.
3. The method of claim 1, wherein the training comprises: determining a second activation based on a first activation and first weights; determining and saving second weights based on a first delta and the first weights; determining a fourth activation based on a third activation and selected weights, wherein the selected weights are dynamically selected from the first weights and the second weights; and determining and saving third weights based on a second delta and the selected weights.
4. The method of claim 3, wherein the determining the second activation comprises: receiving the first activation via the fabric from the at least one prior neuron; computing the second activation based at least in part on the first activation and the first weights by at least executing first instructions stored in the first memory and accessing the first weights in the second memory; and selectively transmitting the second activation via the fabric to the at least one subsequent neuron.
5. The method of claim 4, wherein the determining the fourth activation comprises: receiving the third activation via the fabric from the at least one prior neuron; computing the fourth activation based at least in part on the third activation and the selected weights by at least executing the first instructions and accessing the selected weights in the second memory; and selectively transmitting the fourth activation via the fabric to the at least one subsequent neuron.
6. The method of claim 5, wherein the determining and saving the second weights comprises: receiving the first delta that is partially based on the second activation via the fabric from the at least one subsequent neuron; computing a first gradient based at least in part on the first delta and the second activation by at least executing second instructions stored in the first memory; computing the second weights based at least in part on the first gradient, a learning rule, and the first weights by at least executing third instructions stored in the first memory and accessing the first weights in the second memory; and storing the second weights in the second memory.
7. The method of claim 6, wherein the determining and saving the third weights comprises: receiving the second delta that is partially based on the fourth activation via the fabric from the at least one subsequent neuron; computing a second gradient based at least in part on a third delta and the fourth activation by at least executing the second instructions stored in the first memory; computing the third weights based at least in part on the second gradient, the learning rule and the selected weights by at least executing the third instructions stored in the first memory and accessing the selected weights in the second memory; and storing the third weights in the second memory.
8. The method of claim 7, wherein the computing the second gradient additionally comprises optionally recomputing the fourth activation based at least in part upon the selected weights.
9. The method of claim 7, wherein the computing the first gradient additionally comprises optionally recomputing the second activation based at least in part upon the first weights.
10. The method of claim 5, wherein the selectively transmitting the second activation and the selectively transmitting the fourth activation are selectively based upon the respective values of the second activation and the fourth activation.
11. The method of claim 5, wherein the selectively transmitting the second activation and the selectively transmitting the fourth activation are selectively based upon the respective absolute values of the second activation and the fourth activation exceeding respective first and second thresholds.
12. The method of claim 3, wherein the determining and saving the second weights comprises: receiving the first delta that is partially based on the second activation via the fabric from the at least one subsequent neuron; computing a first gradient based at least in part on the first delta and the second activation by at least executing second instructions stored in the first memory; computing the second weights based at least in part on the first gradient, a learning rule, and the first weights by at least executing third instructions stored in the first memory and accessing the first weights in the second memory; and storing the second weights in the second memory.
13. The method of claim 12, wherein the determining and saving the third weights comprises: receiving the second delta that is partially based on the fourth activation via the fabric from the at least one subsequent neuron; computing a second gradient based at least in part on a third delta and the fourth activation by at least executing the second instructions stored in the first memory; computing the third weights based at least in part on the second gradient, the learning rule and the selected weights by at least executing the third instructions stored in the first memory and accessing the selected weights in the second memory; and storing the third weights in the second memory.
14. The method of claim 13, wherein the determining the fourth activation additionally comprises storing the fourth activation in the second memory and the computing the second gradient additionally comprises accessing the fourth activation in the second memory.
15. The method of claim 12, wherein the selected weights are dynamically selected in accordance with which of the first weights and the second weights were stored most recently.
16. The method of claim 1, wherein the method is carried out via a substantially whole wafer comprising the processing elements.
17. An apparatus comprising: a plurality of processing elements; a training workload comprising a set of machine codes selected from a predefined native instruction set of codes for performing training of a neural network comprising a plurality of ordered, connected layers; wherein the order identifies for each respective layer which others of the layers are prior to the respective layer and which others of the layers are subsequent to the respective layer; wherein each layer comprises one or more neurons, each neuron comprising weights, and connected to at least one of at least one prior neuron of a prior layer an
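Claims 3, 11 and 15 together describe a per-neuron sequence:

- Compute an activation with the first weights.
- Save updated second weights.
- Compute the next activation with whichever weights were stored most recently.
- Forward an activation only when its absolute value exceeds a threshold.

The Python below is a sketch that walks one neuron through that sequence. Several parts are hypothetical:

- The linear neuron and the plain SGD learning rule (gradient = delta times the input activation) are assumptions. The claims leave both unspecified, and claim 6 words the gradient as based on the delta and the neuron's own activation.
- A single threshold stands in for claim 11's "respective first and second thresholds".
- The names `WeightMemory`, `forward`, `maybe_transmit`, `update` and `claim3_sequence` are made up for illustration.
- The deltas are passed in directly rather than received from subsequent neurons over a fabric.

```python
from typing import List, Optional, Tuple

Vector = List[float]


class WeightMemory:
    """Hypothetical stand-in for the claimed 'second memory': saved weight versions in store order."""

    def __init__(self, initial: Vector) -> None:
        self._versions: List[Vector] = [list(initial)]

    def save(self, weights: Vector) -> None:
        self._versions.append(list(weights))

    def select(self) -> Vector:
        # Claim 15 wording: use whichever saved weights were stored most recently.
        return list(self._versions[-1])


def forward(x: Vector, w: Vector) -> float:
    """Assumed linear neuron: weighted sum of incoming activations."""
    return sum(xi * wi for xi, wi in zip(x, w))


def maybe_transmit(activation: float, threshold: float) -> Optional[float]:
    """Claim 11 wording: send only if |activation| exceeds the threshold; None means 'not sent'."""
    return activation if abs(activation) > threshold else None


def update(w: Vector, x: Vector, delta: float, lr: float) -> Vector:
    """Assumed learning rule: plain SGD with gradient = delta * input activation."""
    return [wi - lr * delta * xi for wi, xi in zip(w, x)]


def claim3_sequence(w1: Vector, a1: Vector, a3: Vector, delta1: float, delta2: float,
                    lr: float = 0.1, threshold: float = 0.1
                    ) -> Tuple[Optional[float], Optional[float], Vector, Vector]:
    mem = WeightMemory(w1)
    a2 = forward(a1, w1)                   # second activation: first activation + first weights
    sent_a2 = maybe_transmit(a2, threshold)
    w2 = update(w1, a1, delta1, lr)        # second weights: first delta + first weights
    mem.save(w2)
    selected = mem.select()                # dynamically selected; here w2, since it was stored last
    a4 = forward(a3, selected)             # fourth activation: third activation + selected weights
    sent_a4 = maybe_transmit(a4, threshold)
    w3 = update(selected, a3, delta2, lr)  # third weights: second delta + selected weights
    mem.save(w3)
    return sent_a2, sent_a4, w2, w3
```

For example, take `w1=[0.5, -0.25]`, `a1=[1.0, 2.0]` and `threshold=0.1`. The second activation is 0.0, so it is suppressed and returned as `None`. The fourth activation is then computed with the just-saved second weights. Because this sketch runs strictly in sequence, `select()` always returns `w2`. The dynamic choice only matters when weight updates and forward passes can interleave, and the sketch does not model that.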
Distributed learning, e.g. federated learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title