US9959498B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9959498-B1 |
| Application number | US-201615336216-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 27, 2016 |
| Priority date | Oct 27, 2016 |
| Publication date | May 1, 2018 |
| Grant date | May 1, 2018 |
Official abstract text for this publication.
A computer-implemented method includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
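The idea in the abstract, that one instruction carries a layer type and data values which together fix the shape of a loop nest, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Instruction` class, the `"fully_connected"` and `"conv1d"` type strings, the `dims` keys, and the `loop_bounds` and `execute` helpers are assumptions chosen only to show how a layer type could select a two-level loop nest.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Instruction:
    layer_type: str  # hypothetical encoding of the neural network layer type
    dims: dict       # hypothetical data values used as loop bounds


def loop_bounds(instr):
    """Return [outer_bound, inner_bound]; the nest depends on the layer type."""
    d = instr.dims
    if instr.layer_type == "fully_connected":
        # outer loop over output neurons, inner loop over inputs
        return [d["out"], d["in"]]
    if instr.layer_type == "conv1d":
        # outer loop over output positions, inner loop over kernel taps
        return [d["length"] - d["kernel"] + 1, d["kernel"]]
    raise ValueError(f"unsupported layer type: {instr.layer_type}")


def execute(instr, activations, weights):
    """Run the loop nest derived from the instruction and return the outputs."""
    bounds = loop_bounds(instr)
    result = [0] * bounds[0]
    # product() enumerates the nest with the last loop varying fastest
    for outer, inner in product(*(range(b) for b in bounds)):
        if instr.layer_type == "fully_connected":
            result[outer] += weights[outer][inner] * activations[inner]
        else:
            result[outer] += weights[inner] * activations[outer + inner]
    return result
```

In this sketch the same `execute` entry point serves both layer types; only the instruction contents change, which is the property the abstract attributes to the loop nest structure.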
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method for accelerating inference computations for a neural network having a plurality of neural network layers and using a system comprising multiple hardware compute units, the method comprising: providing, by a controller of the system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the system to the hardware compute unit; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.
2. The method of claim 1, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.
3. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.
4. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.
5. The method of claim 4, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.
6. The method of claim 4, further comprising, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to the TTU to cause an array reference of the TTU to generate an address for a referenced array element used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.
7. The method of claim 6, wherein the single instruction indicates a first TTU counter that is summed with a second TTU counter to generate an address for an array reference associated with the TTU.
8. The method of claim 1, wherein performing the subset of computations for the partition of the neural network layer comprises, executing, by the respective processing unit in each of the hardware compute units, a sync procedure that manages operands associated with performance of the subset of computations, wherein managing an operand comprises stalling one or more loop nests based on a sync flag condition.
9. An electronic system comprising multiple hardware compute units for accelerating inference computations for a neural network having a plurality of neural network layers, the electronic system comprising: at least one processing unit, the at least one processing unit including one or more processing devices; and one or more non-transitory machine-readable storage devices for storing instructions that are executable by the one or more processing devices to cause performance of operations comprising: providing, by a controller of the electronic system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the electronic system; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.
10. The electronic system of claim 9, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.
11. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.
12. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.
13. The electronic system of claim 12, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.
1
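Claims 5 through 7 refer to a tensor traversal unit (TTU) that emits loop indices and forms an array address by summing TTU counters. The Python sketch below is one possible reading of that behavior, not the patented hardware design: the function name `ttu_addresses`, the per-loop `strides`, the `base` offset, and the odometer-style update are all assumptions introduced for illustration.

```python
def ttu_addresses(bounds, strides, base=0):
    """Yield (loop_indices, address) pairs for a loop nest, innermost loop last.

    Hypothetical model: each loop level keeps its own offset counter
    (index * stride), and the address is the base plus the sum of counters.
    """
    if any(b <= 0 for b in bounds):
        return  # an empty loop at any level means no iterations at all
    indices = [0] * len(bounds)
    counters = [0] * len(bounds)
    while True:
        yield tuple(indices), base + sum(counters)
        # Advance like an odometer, starting from the innermost loop.
        level = len(bounds) - 1
        while level >= 0:
            indices[level] += 1
            counters[level] += strides[level]
            if indices[level] < bounds[level]:
                break
            # This loop finished: reset it and carry into the enclosing loop.
            indices[level] = 0
            counters[level] = 0
            level -= 1
        if level < 0:
            return  # the outermost loop has wrapped, so the nest is done
```

For a row-major 2 x 3 array, `list(ttu_addresses((2, 3), (3, 1)))` walks addresses 0 through 5 in order. Keeping one counter per loop level means each step needs only additions, which is one plausible motivation for summing counters rather than recomputing a full index expression.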
Combinations of networks · CPC title
using electronic means · CPC title
using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title
Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title
Neural networks · CPC title