Control wavelet for accelerated deep learning

US11727254B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11727254-B2
Application number  US-202017005140-A
Country             US
Kind code           B2
Filing date         Aug 27, 2020
Priority date       Apr 17, 2017
Publication date    Aug 15, 2023
Grant date          Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.
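
As an illustration only, the dispatch rule in the abstract can be sketched in Python. The names Wavelet, instruction_address, index_base, vc_table, and INDEX_MASK are hypothetical and do not appear in the patent. The sketch assumes the mapping given in claim 10 (a control wavelet selects instructions by index, any other wavelet by its specifier) and the base-plus-index addressing of claim 4; the 8-bit mask width is an arbitrary assumption.

```python
from dataclasses import dataclass

INDEX_MASK = 0xFF  # assumption: only the low 8 bits of the index are used


@dataclass(frozen=True)
class Wavelet:
    control: bool  # control specifier (True = control wavelet)
    index: int     # index specifier
    vc: int        # virtual channel specifier
    data: int = 0  # payload, unused by the dispatch rule


def instruction_address(w: Wavelet, index_base: int, vc_table: dict) -> int:
    """Return the memory address from which instructions would be read."""
    if w.control:
        # Control wavelet: a portion of the index plus a base register.
        return index_base + (w.index & INDEX_MASK)
    # Non-control wavelet: address looked up by virtual channel.
    # (A per-channel lookup table is an assumption of this sketch.)
    return vc_table[w.vc]
```

For example, with index_base = 0x100 and vc_table = {2: 0x40}, a control wavelet with index 3 resolves to 0x103, while a non-control wavelet on virtual channel 2 resolves to 0x40 regardless of its index.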

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: sending a packet by a sending processing element to a fabric, the packet comprising a specifier and an indicator, the specifier being one of a plurality of specifiers each associated with a respective set of one or more sets of packets, the indicator being enabled to selectively indicate that the packet is a control packet, and the sending comprising executing a first instruction that comprises a destination operand that specifies a destination operand descriptor that is usable to determine at least in part the specifier and the indicator; routing the packet from the sending processing element to one or more receiving processing elements, the routing being in accordance with the specifier and via the fabric and one or more routing elements; in at least one of the receiving processing elements, receiving the packet from the fabric and processing the packet; wherein the receiving comprises associating the packet with the respective set associated with the specifier; wherein the processing comprises executing a second instruction that comprises a source operand that specifies a source operand descriptor that is usable to indicate at least in part the specifier and to selectively indicate termination upon receipt of a control packet; and wherein the executing the second instruction comprises terminating the second instruction, responsive to (a) the packet being older than any other packets associated with the respective set associated with the specifier, (b) the indicator selectively indicating the packet is a control packet, and (c) the source operand descriptor selectively indicating to terminate upon receipt of a control packet.

2. The method of claim 1, wherein the packet is a first packet, the specifier is a first specifier, the source operand descriptor is further usable to indicate at least in part a second specifier of the plurality of specifiers, and further comprising selecting for processing a second packet associated with the respective set associated with the second specifier.

3. The method of claim 1, wherein: the packet further comprises an index; the destination operand descriptor is usable to determine at least in part the index; the processing further comprises the at least one receiving processing element reading and executing one or more instructions; and the reading is from a memory of the at least one receiving processing element at an address determined based at least in part on the index.

4. The method of claim 3, wherein the address is based at least in part on a portion of the index plus a base register of the at least one receiving processing element.

5. The method of claim 1, wherein the specifier is a first specifier, the source operand descriptor is further usable to indicate at least in part a second specifier of the plurality of specifiers, the first specifier and the second specifier are respectively associated with a first task and a second task, and the first task and the second task respectively implement a first portion of a neural network and a second portion of the neural network.

6. The method of claim 5, wherein the first portion of the neural network and the second portion of the neural network implement portions of one or more of: receiving activations of a neuron of a neural network, computing activations of a neuron of a neural network, transmitting activations of a neuron of a neural network, computing partial sums of activations of a neural network, receiving deltas of a neural network, computing deltas of a neural network, transmitting deltas of a neural network, receiving errors of a neural network, computing errors of a neural network, transmitting errors of a neural network, computing gradient estimates of a neural network, and updating weights of a neural network.

7. The method of claim 6, wherein the sending processing element is one of a fabric of processing elements, each processing element of the fabric of processing elements comprises a fabric router and a compute engine collectively enabled to perform dataflow-based and instruction-based processing, and the fabric of processing elements is implemented via wafer-scale integration.

8. The method of claim 5, wherein the second portion of the neural network is dependent on the first portion of the neural network.

9. The method of claim 1, wherein one or more of the sending, the routing, the receiving, and the processing comprise transitioning between implementing a first portion of a neural network and implementing a second portion of the neural network.

10. The method of claim 1, wherein the packet further comprises an index and further comprising: reading one or more instructions from a memory of the at least one receiving processing element; initiating execution of one of the one or more instructions; wherein, responsive to the indicator selectively indicating the packet is a control packet, the reading is in accordance with the index; and wherein, responsive to the indicator selectively indicating the packet is other than a control packet, the reading is in accordance with the specifier.

11. The method of claim 10, wherein the sending processing element is one of a fabric of processing elements, and each processing element of the fabric of processing elements comprises a fabric router and a compute engine enabled to perform dataflow-based and instruction-based processing.

12. The method of claim 11, wherein the fabric of processing elements is implemented via wafer-scale integration.

13. A method comprising: sending a first packet by a sending processing element to a fabric, the first packet comprising a first specifier and an indicator, the first specifier being one of a plurality of specifiers each associated with a respective set of one or more sets of packets, the indicator being enabled to selectively indicate that the first packet is a control packet, and the sending comprising executing a first instruction that comprises a destination operand that specifies a destination operand descriptor that is usable to determine at least in part the first specifier and the indicator; routing the first packet from the sending processing element to one or more receiving processing elements, the routing being in accordance with the first specifier and via the fabric and one or more routing elements; in at least one of the receiving processing elements, receiving the first packet from the fabric and processing the first packet; wherein the receiving comprises associating the first packet with the respective set associated with the first specifier; wherein the processing comprises executing a second instruction that comprises a source operand that specifies a source operand descriptor that is usable to indicate at least in part the first specifier, to indicate at least in part a second specifier of the plurality of specifiers, and to selectively indicate termination upon receipt of a control packet; and further comprising selecting for processing a second packet associated with the respective set associated with the second specifier, responsive to (a) the first packet being older than any other packets associated with the respective set associated with the first specifier, (b) the indicator selectively indicating the first packet is a control packet, and (c) the source operand descriptor selectively indicating to terminate upon receipt of a control packet.

14. The method of claim 13, further comprising terminating the second instruction.

15. The method of claim 13, wherein the first packet further comprises an index and further comprising: reading one or
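
The termination condition of claim 1 can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patented implementation: Packet, SourceDescriptor, ReceivingElement, and accumulate are hypothetical names, each "set of packets" is modeled as a FIFO queue keyed by specifier, the "second instruction" is modeled as a running sum, and the control packet's own payload is assumed to be consumed before the instruction terminates.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    specifier: int      # selects the set (queue) the packet joins
    control: bool       # indicator: True marks a control packet
    data: float = 0.0   # payload


@dataclass(frozen=True)
class SourceDescriptor:
    specifier: int              # which set the instruction reads from
    terminate_on_control: bool  # stop when a control packet is consumed


class ReceivingElement:
    def __init__(self):
        self.queues = {}  # specifier -> FIFO of received packets

    def receive(self, packet):
        # Associate the packet with the set for its specifier.
        self.queues.setdefault(packet.specifier, deque()).append(packet)

    def accumulate(self, desc):
        """Sum payloads oldest-first; return (total, terminated_early)."""
        total = 0.0
        queue = self.queues.get(desc.specifier, deque())
        while queue:
            packet = queue.popleft()  # oldest packet in the set
            total += packet.data      # assumption: control payload counts
            if packet.control and desc.terminate_on_control:
                return total, True    # conditions (a), (b), (c) all hold
        return total, False
```

Popping from the head of the queue models condition (a), since the packet consumed is always older than every other packet in its set. With the descriptor flag cleared, the same control packet is treated like ordinary data and the instruction runs until the queue is empty.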

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • Probabilistic or stochastic networks

  • Learning methods

  • Auto-encoder networks; Encoder-decoder networks

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727254B2 cover?
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A c…
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).