Accelerated deep learning

US11580394B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11580394-B2
Application number: US-202016911203-A
Country: US
Kind code: B2
Filing date: Jun 24, 2020
Priority date: Feb 23, 2017
Publication date: Feb 14, 2023
Grant date: Feb 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
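The abstract names reverse checkpoint as a way to reduce memory usage during training but does not define it. The sketch below assumes it resembles the generally known activation checkpointing idea: keep activations only at selected layers during the forward pass, then recompute the missing ones segment by segment during the backward pass. The scalar tanh chain, the loss L = 0.5 * a_n^2, and the names `grads_full`, `grads_checkpointed`, and `every` are illustrative assumptions, not taken from the patent.

```python
import math


def forward_full(weights, x):
    """Run a[i+1] = tanh(w[i] * a[i]) and keep every activation."""
    acts = [x]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def grads_full(weights, x):
    """Reference backward pass that stores all activations."""
    acts = forward_full(weights, x)
    delta = acts[-1]  # dL/da_n for the assumed loss L = 0.5 * a_n**2
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        slope = 1.0 - acts[i + 1] ** 2  # tanh'(z) = 1 - tanh(z)**2
        grads[i] = delta * slope * acts[i]
        delta = delta * slope * weights[i]
    return grads


def grads_checkpointed(weights, x, every=2):
    """Backward pass that stores only every `every`-th activation."""
    n = len(weights)
    checkpoints = {0: x}
    a = x
    for i, w in enumerate(weights):
        a = math.tanh(w * a)
        if (i + 1) % every == 0:
            checkpoints[i + 1] = a  # keep a sparse subset of activations
    delta = a  # dL/da_n
    grads = [0.0] * n
    seg_end = n
    while seg_end > 0:
        # Walk segments from last to first; each starts at a checkpoint.
        seg_start = ((seg_end - 1) // every) * every
        acts = forward_full(weights[seg_start:seg_end], checkpoints[seg_start])
        for j in reversed(range(seg_end - seg_start)):
            i = seg_start + j
            slope = 1.0 - acts[j + 1] ** 2
            grads[i] = delta * slope * acts[j]
            delta = delta * slope * weights[i]
        seg_end = seg_start
    return grads
```

Both functions yield the same gradients. The checkpointed version holds roughly n / `every` activations plus one segment at a time, at the cost of a second forward computation per segment.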

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training a neural network comprising a plurality of ordered, connected layers; wherein the order identifies for each respective layer which others of the layers are prior to the respective layer and which others of the layers are subsequent to the respective layer; wherein each layer comprises one or more neurons, each neuron comprising weights, and connected to at least one of at least one prior neuron of a prior layer and at least one subsequent neuron of a subsequent layer; and wherein each neuron is implemented by one or more processing elements, each processing element comprising at least one coupling to a fabric, the processing element being enabled to communicate via the fabric via a plurality of virtual channels, a first memory enabled to store instructions corresponding to at least computations of the neuron, a second memory enabled to store the weights, and hardware execution resources enabled to execute instructions from the respective first memory and access data from the respective second memory.

2. The method of claim 1, wherein each of the layers is a respective internal layer of the neural network, and the neural network further comprises an input layer and an output layer.

3. The method of claim 1, wherein the training comprises: determining a second activation based on a first activation and first weights; determining and saving second weights based on a first delta and the first weights; determining a fourth activation based on a third activation and selected weights, wherein the selected weights are dynamically selected from the first weights and the second weights; and determining and saving third weights based on a second delta and the selected weights.

4. The method of claim 3, wherein the determining the second activation comprises: receiving the first activation via the fabric from the at least one prior neuron; computing the second activation based at least in part on the first activation and the first weights by at least executing first instructions stored in the first memory and accessing the first weights in the second memory; and selectively transmitting the second activation via the fabric to the at least one subsequent neuron.

5. The method of claim 4, wherein the determining the fourth activation comprises: receiving the third activation via the fabric from the at least one prior neuron; computing the fourth activation based at least in part on the third activation and the selected weights by at least executing the first instructions and accessing the selected weights in the second memory; and selectively transmitting the fourth activation via the fabric to the at least one subsequent neuron.

6. The method of claim 5, wherein the determining and saving the second weights comprises: receiving the first delta that is partially based on the second activation via the fabric from the at least one subsequent neuron; computing a first gradient based at least in part on the first delta and the second activation by at least executing second instructions stored in the first memory; computing the second weights based at least in part on the first gradient, a learning rule, and the first weights by at least executing third instructions stored in the first memory and accessing the first weights in the second memory; and storing the second weights in the second memory.

7. The method of claim 6, wherein the determining and saving the third weights comprises: receiving the second delta that is partially based on the fourth activation via the fabric from the at least one subsequent neuron; computing a second gradient based at least in part on a third delta and the fourth activation by at least executing the second instructions stored in the first memory; computing the third weights based at least in part on the second gradient, the learning rule and the selected weights by at least executing the third instructions stored in the first memory and accessing the selected weights in the second memory; and storing the third weights in the second memory.

8. The method of claim 7, wherein the computing the second gradient additionally comprises optionally recomputing the fourth activation based at least in part upon the selected weights.

9. The method of claim 7, wherein the computing the first gradient additionally comprises optionally recomputing the second activation based at least in part upon the first weights.

10. The method of claim 5, wherein the selectively transmitting the second activation and the selectively transmitting the fourth activation are selectively based upon the respective values of the second activation and the fourth activation.

11. The method of claim 5, wherein the selectively transmitting the second activation and the selectively transmitting the fourth activation are selectively based upon the respective absolute values of the second activation and the fourth activation exceeding respective first and second thresholds.

12. The method of claim 3, wherein the determining and saving the second weights comprises: receiving the first delta that is partially based on the second activation via the fabric from the at least one subsequent neuron; computing a first gradient based at least in part on the first delta and the second activation by at least executing second instructions stored in the first memory; computing the second weights based at least in part on the first gradient, a learning rule, and the first weights by at least executing third instructions stored in the first memory and accessing the first weights in the second memory; and storing the second weights in the second memory.

13. The method of claim 12, wherein the determining and saving the third weights comprises: receiving the second delta that is partially based on the fourth activation via the fabric from the at least one subsequent neuron; computing a second gradient based at least in part on a third delta and the fourth activation by at least executing the second instructions stored in the first memory; computing the third weights based at least in part on the second gradient, the learning rule and the selected weights by at least executing the third instructions stored in the first memory and accessing the selected weights in the second memory; and storing the third weights in the second memory.

14. The method of claim 13, wherein the determining the fourth activation additionally comprises storing the fourth activation in the second memory and the computing the second gradient additionally comprises accessing the fourth activation in the second memory.

15. The method of claim 12, wherein the selected weights are dynamically selected in accordance with which of the first weights and the second weights were stored most recently.

16. The method of claim 1, wherein the method is carried out via a substantially whole wafer comprising the processing elements.

17. An apparatus comprising: a plurality of processing elements; a training workload comprising a set of machine codes selected from a predefined native instruction set of codes for performing training of a neural network comprising a plurality of ordered, connected layers; wherein the order identifies for each respective layer which others of the layers are prior to the respective layer and which others of the layers are subsequent to the respective layer; wherein each layer comprises one or more neurons, each neuron comprising weights, and connected to at least one of at least one prior neuron of a prior layer an
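The per-neuron training steps recited in claims 3 through 15 can be illustrated with a small sketch. It is a loose software model under stated assumptions, not the claimed hardware: the class name `NeuronSketch`, the ReLU activation, and the plain gradient step used as the "learning rule" are all hypothetical choices. The list `weight_versions` stands in for weights held in the second memory, and "selected weights" are taken to be the most recently stored version, as in claim 15.

```python
class NeuronSketch:
    """Toy model of one neuron: forward, selective transmit, weight update."""

    def __init__(self, weights, lr=0.1, threshold=0.0):
        self.weight_versions = [list(weights)]  # stands in for the second memory
        self.lr = lr                            # assumed learning-rule parameter
        self.threshold = threshold              # assumed transmit threshold
        self.saved = None                       # state kept for the backward step

    def selected_weights(self):
        # Assumption: select whichever weights were stored most recently.
        return self.weight_versions[-1]

    def forward(self, activations_in):
        """Compute an activation; return it only if it exceeds the threshold."""
        w = self.selected_weights()
        z = sum(wi * ai for wi, ai in zip(w, activations_in))
        out = max(0.0, z)  # ReLU is an assumption, not from the claims
        self.saved = (list(activations_in), out, w)
        # Selective transmission: None models "nothing sent on the fabric".
        return out if abs(out) > self.threshold else None

    def backward(self, delta):
        """Turn a received delta into a gradient, then store updated weights."""
        inputs, out, w = self.saved
        local = delta if out > 0 else 0.0        # ReLU derivative uses the output
        grad = [local * a for a in inputs]
        new_w = [wi - self.lr * gi for wi, gi in zip(w, grad)]
        self.weight_versions.append(new_w)       # "storing" the new weights
        return new_w


neuron = NeuronSketch([0.5, -0.5], lr=0.1, threshold=0.2)
first_out = neuron.forward([1.0, 0.5])   # uses the initial weights
neuron.backward(1.0)                     # stores a second weight version
second_out = neuron.forward([1.0, 0.5])  # uses the newer weights
```

In this run the first forward pass produces 0.25, which exceeds the threshold and is transmitted. After the update, the same input produces a value below the threshold, so the second forward pass returns `None` to model a suppressed transmission.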

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • G06N3/098 (Primary)

    Distributed learning, e.g. federated learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580394B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/098. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).