ISA enhancements for accelerated deep learning

US11321087B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11321087-B2
Application number   US-201917271532-A
Country              US
Kind code            B2
Filing date          Aug 27, 2019
Priority date        Aug 29, 2018
Publication date     May 3, 2022
Grant date           May 3, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element is enabled to execute instructions in accordance with an ISA. The ISA is enhanced in accordance with improvements with respect to deep learning acceleration.
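The abstract splits each processing element into a routing element and a compute element. Claim 1 describes how a vector instruction consumes a fabric vector. Both ideas can be illustrated with a small Python model. This is a sketch under stated assumptions, not Cerebras' design. The names `Wavelet`, `OperandDescriptor`, `ProcessingElement`, and `run_vector_sum` are hypothetical, as are the use of a sum as the vector operation and the boolean flag that marks control wavelets.

The model works in four steps:
1. It routes wavelets into per-virtual-channel input queues.
2. It checks that the source operand descriptor names a fabric vector.
3. It consumes elements until the expected length is reached or, in terminate-on-control mode, a control wavelet is the oldest entry.
4. It activates or unblocks the virtual channels the descriptor names.

```python
# Hypothetical model; names and encodings are illustrative, not from the patent.
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VectorType(Enum):
    MEMORY = "memory"
    FABRIC = "fabric"


@dataclass
class Wavelet:
    channel: int            # virtual channel the wavelet travels on
    data: float = 0.0       # one input data element
    control: bool = False   # hypothetical flag marking a control wavelet


@dataclass
class OperandDescriptor:
    vtype: VectorType
    channel: int = 0                          # receive virtual channel (fabric vectors)
    length: int = 0                           # elements the instruction expects
    terminate_on_control: bool = False        # terminate-on-control mode
    activate_channel: Optional[int] = None    # activated when the task terminates
    second_channel: Optional[int] = None      # second channel named by the descriptor
    second_is_unblock: bool = False           # indicator: unblock rather than activate it


class ProcessingElement:
    """Toy PE: a routing element and a compute element sharing per-channel queues."""

    def __init__(self, local_channels):
        self.input_queues = {c: deque() for c in local_channels}
        self.forwarded = []             # wavelets passed on towards other PEs
        self.active_channels = set()    # channels whose tasks are ready to run
        self.blocked_channels = set()   # channels currently blocked

    def route(self, wavelet):
        """Routing element: queue wavelets for local channels, forward the rest."""
        queue = self.input_queues.get(wavelet.channel)
        if queue is None:
            self.forwarded.append(wavelet)
        else:
            queue.append(wavelet)

    def run_vector_sum(self, desc):
        """Compute element: sum a fabric-vector source operand.

        Returns (total, terminated). terminated is False if the queue ran dry
        first; hardware would stall the task there instead of returning.
        """
        if desc.vtype is not VectorType.FABRIC:   # type-determining step
            raise NotImplementedError("only fabric-vector operands are modelled")
        queue = self.input_queues[desc.channel]   # virtual input queue for the channel
        total, consumed, terminated = 0.0, 0, False
        while not terminated and queue:
            wavelet = queue.popleft()             # oldest wavelet first
            if wavelet.control:
                # Assumption: outside terminate-on-control mode a control wavelet is dropped.
                terminated = desc.terminate_on_control
                continue
            total += wavelet.data                 # one iteration of the operation
            consumed += 1
            terminated = consumed == desc.length
        if terminated:
            self._on_terminate(desc)
        return total, terminated

    def _on_terminate(self, desc):
        """Activate or unblock the virtual channels named by the descriptor."""
        if desc.activate_channel is not None:
            self.active_channels.add(desc.activate_channel)
        if desc.second_channel is not None:
            if desc.second_is_unblock:
                self.blocked_channels.discard(desc.second_channel)
            else:
                self.active_channels.add(desc.second_channel)


if __name__ == "__main__":
    pe = ProcessingElement(local_channels=[3])
    for x in (1.0, 2.0, 3.0):
        pe.route(Wavelet(channel=3, data=x))
    pe.route(Wavelet(channel=5, data=9.0))    # no local queue for channel 5: forwarded
    desc = OperandDescriptor(VectorType.FABRIC, channel=3, length=3, activate_channel=7)
    print(pe.run_vector_sum(desc))            # (6.0, True)
    print(pe.active_channels)                 # {7}
```

The `second_is_unblock` flag loosely mirrors the indicator in claim 6. That indicator decides whether the second virtual channel named by the descriptor is activated or unblocked.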

First claim

Opening claim text (preview): claims 1 to 16 are shown; claims 1, 8, and 14 are independent.

What is claimed is:

1. A system comprising: means for decoding a vector instruction in a receiving processing element, the vector instruction comprising an operation and an instruction source operand specifying a source operand descriptor stored in a register of the receiving processing element, wherein the source operand descriptor is enabled to specify one of at least a memory vector type and a fabric vector type; means for type-determining that the instruction source operand, as specified by the source operand descriptor, is an instance of the fabric vector type and corresponds to a fabric vector; means for associating, responsive to the means for type-determining, a virtual input queue and a respective receive virtual channel with the fabric vector; means for receiving at least one fabric packet comprising a number of input data elements of the fabric vector in the virtual input queue via a fabric and in accordance with the receive virtual channel; means for selectively reading the number of input data elements from the virtual input queue; means for selectively performing an iteration of the operation using the number of input data elements; means for conditionally terminating a task comprising the vector instruction; and means for carrying out one of conditionally activating and conditionally unblocking one of a plurality of virtual channels specified by the source operand descriptor.

2. The system of claim 1, wherein a first of the plurality of virtual channels is an activate virtual channel and the means for carrying out comprises means for activating the activate virtual channel responsive to terminating the task.

3. The system of claim 2, wherein a second of the plurality of virtual channels is an unblock virtual channel and the means for carrying out comprises means for unblocking the unblock virtual channel responsive to terminating the task.

4. The system of claim 2, wherein the activate virtual channel is a first activate virtual channel, a second of the plurality of virtual channels is a second activate virtual channel, and the means for carrying out comprises means for activating the second activate virtual channel responsive to terminating the task.

5. The system of claim 2, wherein the conditionally activating is responsive to the source operand descriptor specifying a terminate-on-control mode, the at least one fabric packet comprises a control fabric packet, and the control fabric packet is an oldest fabric packet in the virtual input queue.

6. The system of claim 1, wherein: a first of the plurality of virtual channels is unconditionally interpreted as a first activate virtual channel, a second of the plurality of virtual channels is selectively interpreted as a second activate virtual channel responsive to an indicator being in a first state, the second of the plurality of virtual channels is selectively interpreted as an unblock virtual channel responsive to the indicator being in a second state, and the source operand descriptor specifies the indicator.

7. The system of claim 1, wherein a programmable processor comprises the receiving processing element, the programmable processor is one of a plurality of like programmable processors fabricated on a wafer in accordance with wafer-scale integration, and a datacenter element enabled to perform neural network processing comprises the wafer.

8. A system comprising: means for decoding a vector instruction in a programmable processor, the vector instruction comprising an operation and an instruction operand specifying an operand descriptor stored in a register file of the programmable processor, wherein the operand descriptor is enabled to specify one of at least a memory vector type and a fabric vector type; means for type-determining that the instruction operand, as specified by the operand descriptor, is an instance of the memory vector type and corresponds to a FIFO memory buffer; means for performing an iteration of the operation; means for accessing the FIFO memory buffer in accordance with the means for performing; means for setting an indicator, responsive to the means for accessing producing a FIFO event, to a required number of data elements; means for suspending further iterations of the operation until the required number of data elements have been processed; and wherein the operand descriptor comprises the indicator.

9. The system of claim 8, wherein the means for performing comprises means for popping data elements from the FIFO memory buffer, the means for accessing comprises means for reading the FIFO memory buffer, the FIFO event corresponds to the FIFO memory buffer becoming empty, and the processing of the required number of data elements corresponds to pushing data elements onto the FIFO memory buffer.

10. The system of claim 8, wherein the means for performing comprises means for pushing data elements onto the FIFO memory buffer, the means for accessing comprises means for writing the FIFO memory buffer, the FIFO event corresponds to the FIFO memory buffer becoming full, and the processing of the required number of data elements corresponds to popping data elements from the FIFO memory buffer.

11. The system of claim 8, wherein the operand descriptor specifies a virtual channel to conditionally activate responsive to the means for performing.

12. The system of claim 11, wherein: the conditionally activate comprises responsive to a specifier being in a first state, activate on any iteration, and responsive to the specifier being in a second state, activate only on an iteration resulting in the indicator transitioning to zero; and the operand descriptor comprises the specifier.

13. The system of claim 8, wherein the programmable processor is one of a plurality of like programmable processors fabricated on a wafer in accordance with wafer-scale integration, and a datacenter element enabled to perform neural network processing comprises the wafer.

14. A method comprising: decoding a vector instruction in a programmable processor, the vector instruction comprising an operation and an instruction operand specifying an operand descriptor stored in a register file of the programmable processor, wherein the operand descriptor is enabled to specify one of at least a memory vector type and a fabric vector type; type-determining that the instruction operand, as specified by the operand descriptor, is an instance of the memory vector type and corresponds to a FIFO memory buffer; performing an iteration of the operation; accessing the FIFO memory buffer in accordance with the performing; responsive to the accessing resulting in a FIFO event, setting an indicator to a required number of data elements; suspending further iterations of the operation until the required number of data elements have been processed; and wherein the operand descriptor comprises the indicator.

15. The method of claim 14, wherein the performing comprises popping data elements from the FIFO memory buffer, the accessing comprises reading the FIFO memory buffer, the FIFO event corresponds to the FIFO memory buffer becoming empty, and the processing of the required number of data elements corresponds to pushing data elements onto the FIFO memory buffer.

16. The method of claim 14, wherein the performing comprises pushing data elements onto the FIFO memory buffer, the accessing comprises writing the FIFO memory buffer, the FIFO event corresponds to the FIFO memory buffer becoming full, and the processing of the required number of data elements corresponds to popping data elements from the FIFO memory buffer.
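Claims 8 to 16 describe a memory-vector operand backed by a FIFO buffer. When a pop empties the buffer, or a push fills it, an indicator in the operand descriptor is set to a required number of elements. Further iterations are then suspended until the other side has processed that many elements.

The Python sketch below models one reading of that behaviour. It includes the claim 12 specifier, which chooses between activating a virtual channel on every iteration or only when the indicator reaches zero. The following are hypothetical choices, not details taken from the patent:
- the class `FifoOperand`;
- the `resume_count` parameter, which stands in for the required number;
- the use of one shared object for both producer and consumer.

```python
# Hypothetical model of a FIFO memory-vector operand; not the patented design.
from collections import deque
from typing import Optional


class FifoOperand:
    """Toy FIFO operand whose descriptor carries an indicator and a specifier.

    One object is shared by a consumer (pop) and a producer (push).
    """

    def __init__(self, capacity: int, resume_count: int,
                 activate_channel: Optional[int] = None,
                 activate_on_zero_only: bool = True):
        # resume_count <= capacity ensures a suspension clears before the
        # opposite FIFO event (full after empty, or empty after full) can occur.
        if not 0 < resume_count <= capacity:
            raise ValueError("resume_count must be in 1..capacity")
        self.capacity = capacity
        self.resume_count = resume_count      # hypothetical stand-in for the "required number"
        self.activate_channel = activate_channel
        self.activate_on_zero_only = activate_on_zero_only  # the specifier's state
        self.buffer = deque()
        self.indicator = 0        # elements the other side must still process
        self.suspended = None     # "pop", "push", or None
        self.activated = []       # log of virtual-channel activations

    def _after_iteration(self, side: str) -> None:
        """Count down the indicator and apply the activation rule."""
        reached_zero = False
        if self.suspended is not None and self.suspended != side:
            self.indicator -= 1
            if self.indicator == 0:
                self.suspended = None         # the suspended side may resume
                reached_zero = True
        if self.activate_channel is not None and (
                reached_zero or not self.activate_on_zero_only):
            self.activated.append(self.activate_channel)

    def pop(self):
        """One consumer iteration; returns None while suspended or empty."""
        if self.suspended == "pop" or not self.buffer:
            return None
        value = self.buffer.popleft()
        self._after_iteration("pop")
        if not self.buffer:                   # FIFO event: buffer became empty
            self.indicator = self.resume_count
            self.suspended = "pop"
        return value

    def push(self, value) -> bool:
        """One producer iteration; returns False while suspended or full."""
        if self.suspended == "push" or len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(value)
        self._after_iteration("push")
        if len(self.buffer) == self.capacity:  # FIFO event: buffer became full
            self.indicator = self.resume_count
            self.suspended = "push"
        return True


if __name__ == "__main__":
    fifo = FifoOperand(capacity=4, resume_count=2, activate_channel=9)
    fifo.push(1.0)
    print(fifo.pop())       # 1.0: buffer becomes empty, consumer suspends
    print(fifo.pop())       # None: still suspended
    fifo.push(2.0)          # indicator 2 -> 1
    fifo.push(3.0)          # indicator 1 -> 0: consumer resumes, channel 9 activated
    print(fifo.activated)   # [9]
    print(fifo.pop())       # 2.0
```

Because `resume_count` is at most `capacity`, a single `suspended` field is enough in this model. The indicator always reaches zero before the opposite side can trigger its own FIFO event.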

Assignees

Cerebras Systems Inc

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Activation functions · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Feedforward networks · CPC title

Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11321087B2 cover?
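Instruction set architecture (ISA) enhancements for a neural network accelerator. The accelerator is built from an array of processing elements, each with a compute element and a routing element, that performs flow-based computations on wavelets of data. The claims cover vector instructions whose operand descriptors can specify either a memory vector or a fabric vector:
- Fabric-vector operands are received through a virtual input queue on a receive virtual channel. Task termination can activate or unblock other virtual channels (claims 1 to 7).
- Memory-vector operands can be FIFO buffers that suspend the instruction until a required number of elements has been pushed or popped (claims 8 to 16).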
Who is the assignee on this patent?
Cerebras Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Published May 3, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).