US11475282B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11475282-B2 |
| Application number | US-201816603950-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 17, 2018 |
| Priority date | Apr 17, 2017 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
Official abstract text for this publication.
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.
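The suspend-and-resume behavior described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the patented hardware: the names `OperandDescriptor`, `SavedState`, `run_iterations`, and the `allow_switch` flag are hypothetical stand-ins for the operand descriptor, the saved instruction information, and the per-operand indicator. It models only the source-unavailable case (an empty input queue) and assumes one queue entry may arrive per time step.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class OperandDescriptor:
    channel: int        # hypothetical: virtual channel the source operand reads from
    length: int         # number of data elements, one iteration per element
    allow_switch: bool  # hypothetical indicator: may other work run while blocked?


@dataclass
class SavedState:
    channel: int  # channel identifier kept with the suspended instruction
    done: int     # iterations completed before suspension
    acc: int      # partial result at suspension


def run_iterations(desc, arrivals, other_work):
    """Sum desc.length elements arriving on one input queue.

    arrivals: one entry per time step, a value or None (nothing arrives).
    other_work: labels of unrelated work items that can fill blocked steps.
    Returns (sum, log of what happened at each step).
    """
    queue, pending, other = deque(), deque(arrivals), deque(other_work)
    acc, done, saved, log = 0, 0, None, []
    while done < desc.length:
        if pending:
            value = pending.popleft()
            if value is not None:
                queue.append(value)
        elif not queue:
            raise RuntimeError("source data never arrives")
        if queue:
            if saved is not None:
                # Sources are available again: restore the saved information.
                acc, done, saved = saved.acc, saved.done, None
                log.append("resume")
            acc += queue.popleft()  # perform one iteration
            done += 1
            log.append(f"iter {done}")
        elif desc.allow_switch:
            if saved is None:
                # Indicator permits switching: save state once, then do other work.
                saved = SavedState(desc.channel, done, acc)
                log.append("save")
            log.append(other.popleft() if other else "idle")
        else:
            log.append("stall")  # indicator forbids switching: processing stalls
    return acc, log


desc = OperandDescriptor(channel=3, length=3, allow_switch=True)
total, log = run_iterations(desc, [5, None, None, 7, 9], ["other A", "other B"])
print(total, log)
```

With `allow_switch=False` the same input produces two `stall` steps in place of the save, other-work, and resume steps, which corresponds to the stalling scenario in the abstract.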
Opening claim text (preview).
What is claimed is:

1. A processing element comprising:

a plurality of point-to-point physical couplings connecting respective processing element neighbors comprised in an interconnected fabric of instances of the processing element, wherein each processing element is enabled to transmit one or more fabric packets over a selected one of a plurality of virtual channels comprising configurable groups of one or more of the point-to-point couplings, an identifier of each transmitted fabric packet selecting one of the virtual channels, the virtual channels being associated with a plurality of virtual queues enabled to store a respective number of the fabric packets;

a decoder enabled to decode instructions comprising non-iteration type instructions associated with a single instruction-operation event and iteration type instructions associated with one or more instruction-operation events, each of the iteration type instructions being associated with a respective one or more data elements, each of the instruction-operation events of each of the iteration type instructions being associated with at least a respective one of the data elements;

scheduling and picking logic enabled to select one or more of the instructions for providing to the decoder;

data structure registers dedicated to storing data structure descriptors;

iteration instruction state storage dedicated to state information for partially executed iteration type instructions, the iteration instruction state storage being separate from and coupled to the scheduling and picking logic and further separate from and coupled to the data structure registers;

first means for determining that at least one of one or more operands of a current instruction of the iteration type instructions is unavailable;

second means for examining respective indicators associated with each of the operands, and responsive to at least one of the indicators being in a first of at least two states, conditionally saving information about the current instruction in the iteration instruction state storage, and responsive to neither of the indicators being in the first state, waiting until all the operands become available, the second means being responsive to the first means;

third means for executing one or more next instructions until at least all the operands become available, subsequently restoring at least a portion of the information, and performing an iteration of the current instruction, the third means being responsive to the second means, wherein the iteration comprises performing one of the instruction-operation events of the iteration type instructions; and

wherein the current instruction specifies a particular one of a plurality of operand descriptors and the particular operand descriptor comprises fields encoding a plurality of attributes and parameters, the attributes and parameters comprising a data structure type, the number of the data elements, and at least one identifier identifying one of the virtual channels, the particular operand descriptor is stored in one of the data structure registers, and the information comprises the identifier.

2. The processing element of claim 1, wherein the processing element is one of a fabric of processing elements, each processing element comprising a fabric router and a compute engine enabled to perform dataflow-based and instruction-based processing.

3. The processing element of claim 2, wherein the fabric of processing elements is implemented via wafer-scale integration.

4. The processing element of claim 1, wherein responsive to the at least one of the operands being a source operand, the source operand is inaccessible responsive to data for the source operand being unavailable from a portion of the virtual queues operating as one or more virtual input queues.

5. The processing element of claim 1, wherein responsive to the at least one of the operands being a destination operand, the destination operand is inaccessible responsive to space to store the destination operand being unavailable in a portion of the virtual queues operating as one or more virtual output queues.

6. The processing element of claim 1, wherein the current instruction implements at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.

7. The processing element of claim 1, wherein the one of the virtual channels implements at least a portion of a connection between a plurality of portions of a neuron of a neural network.

8. A processing element comprising:

a plurality of point-to-point physical couplings connecting respective processing element neighbors comprised in interconnected fabric of instances of the processing element, wherein each processing element is enabled to transmit one or more fabric packets over a selected one of a plurality of virtual channels comprising configurable groups of one or more of the point-to-point couplings, an identifier of each transmitted fabric packet selecting one of the virtual channels;

a decoder enabled to decode instructions comprising non-iteration type instructions associated with a single instruction-operation event and iteration type instructions associated with one or more instruction-operation events, each of the iteration type instructions being associated with a respective one or more data elements, each of the instruction-operation events of each of the iteration type instructions being associated with at least a respective one of the data elements;

scheduling and picking logic enabled to select one or more of the instructions for providing to the decoder;

data structure registers dedicated to storing data structure descriptors;

iteration instruction state storage dedicated to state information for partially executed iteration type instructions, the iteration instruction state storage being separate from and coupled to the scheduling and picking logic and further separate from and coupled to the data structure registers;

means for at least partially executing a particular instruction of the iteration type instructions, the at least partially executing comprising performing less than all of the instruction-operation events of the particular instruction, wherein the particular instruction comprises an instruction operand specifying a particular one of a plurality of operand descriptors, the particular operand descriptor comprises fields encoding a plurality of attributes and parameters, the attributes and parameters comprising a data structure type, the number of the data elements, and at least one identifier identifying one of the virtual channels associated with the data elements, and the particular operand descriptor is stored in one of the data structure registers, and the means for at least partially executing comprises means for determining accessibility of the number of the data elements in accordance with the particular operand descriptor;

means for conditionally saving in the iteration instruction state storage at least the identifier and at least a portion of the particular instruction, and the means for conditionally saving is responsive to the means for determining accessibility;

means for resuming execution of the particular instruction, the resuming execution comprising performing at least some of the remaining instruction-operation events of the plurality of instruction-operation events of the particular instruction; and

means for conditionally at least partially executing one or more instructions other than the particular instruction after the conditionally saving and before the resuming.

9. The p