Neural network instruction set architecture

US9959498B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9959498-B1
Application number  US-201615336216-A
Country             US
Kind code           B1
Filing date         Oct 27, 2016
Priority date       Oct 27, 2016
Publication date    May 1, 2018
Grant date          May 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include, performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
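
The idea in the abstract, that one instruction's data values determine the shape of the loop nest, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Instruction` fields, the layer-type strings `"fully_connected"` and `"conv1d"`, the loop names, and the choice of loop orders.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction: a layer type plus data values (here, loop bounds)."""
    layer_type: str   # illustrative names only, e.g. "fully_connected" or "conv1d"
    bounds: dict      # loop name -> extent


def build_loop_nest(instr):
    """Return ordered (loop_name, extent) pairs; the nest depth depends on layer type."""
    if instr.layer_type == "fully_connected":
        order = ["out", "in"]                          # 2-deep nest
    elif instr.layer_type == "conv1d":
        order = ["out_pos", "out_ch", "k", "in_ch"]    # 4-deep nest
    else:
        raise ValueError(f"unsupported layer type: {instr.layer_type}")
    return [(name, instr.bounds[name]) for name in order]


def run_fully_connected(instr, weights, activations):
    """Execute the 2-deep nest as a matrix-vector multiply-accumulate."""
    nest = build_loop_nest(instr)
    out = [0] * instr.bounds["out"]
    # product() walks the loops outermost-first, like nested for statements.
    for o, i in product(*(range(extent) for _, extent in nest)):
        out[o] += weights[o][i] * activations[i]
    return out


fc = Instruction("fully_connected", {"out": 2, "in": 3})
result = run_fully_connected(fc, [[1, 2, 3], [4, 5, 6]], [1, 1, 1])
```

In this sketch the same `build_loop_nest` call yields a two-loop nest for the fully connected case and a four-loop nest for the convolution case, so the layer type alone changes the loop structure while the bounds come from the instruction's other data values.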

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for accelerating inference computations for a neural network having a plurality of neural network layers and using a system comprising multiple hardware compute units, the method comprising: providing, by a controller of the system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the system to the hardware compute unit; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.

2. The method of claim 1, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.

3. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

4. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.

5. The method of claim 4, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

6. The method of claim 4, further comprising, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to the TTU to cause an array reference of the TTU to generate an address for a referenced array element used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

7. The method of claim 6, wherein the single instruction indicates a first TTU counter that is summed with a second TTU counter to generate an address for an array reference associated with the TTU.

8. The method of claim 1, wherein performing the subset of computations for the partition of the neural network layer comprises, executing, by the respective processing unit in each of the hardware compute units, a sync procedure that manages operands associated with performance of the subset of computations, wherein managing an operand comprises stalling one or more loop nests based on a sync flag condition.

9. An electronic system comprising multiple hardware compute units for accelerating inference computations for a neural network having a plurality of neural network layers, the electronic system comprising: at least one processing unit, the at least one processing unit including one or more processing devices; and one or more non-transitory machine-readable storage devices for storing instructions that are executable by the one or more processing devices to cause performance of operations comprising: providing, by a controller of the electronic system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the electronic system; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.

10. The electronic system of claim 9, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.

11. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

12. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.

13. The electronic system of claim 12, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

1…
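
Two mechanisms named in the claim text lend themselves to a short sketch: a tensor traversal unit (TTU) that emits loop indices and forms an address by summing counters, and the split of one layer into partitions handled by separate compute units. The Python below is a rough model under stated assumptions, not the patented hardware. The class name, the "one counter per loop holding index times stride" representation, the row-major strides, and the contiguous partitioning scheme are all hypothetical choices made for illustration.

```python
class TensorTraversalUnit:
    """Hypothetical software model of a TTU: one counter per loop in the nest."""

    def __init__(self, extents, strides, base=0):
        self.extents = list(extents)    # loop bounds, outermost first
        self.strides = list(strides)    # address step per loop (assumed layout)
        self.base = base
        self.indices = [0] * len(self.extents)

    def counters(self):
        """Each counter holds its loop index scaled by that loop's stride."""
        return [i * s for i, s in zip(self.indices, self.strides)]

    def address(self):
        """Sum the counters to form the address of the referenced element."""
        return self.base + sum(self.counters())

    def step(self):
        """Advance the innermost loop and carry outward; False when exhausted."""
        for d in reversed(range(len(self.indices))):
            self.indices[d] += 1
            if self.indices[d] < self.extents[d]:
                return True
            self.indices[d] = 0
        return False


def traverse(ttu):
    """Collect every address the nest visits, in loop order."""
    addrs = []
    while True:
        addrs.append(ttu.address())
        if not ttu.step():
            return addrs


def partition(extent, num_units):
    """Split range(extent) into contiguous chunks, one per compute unit (assumed scheme)."""
    size, extra = divmod(extent, num_units)
    chunks, start = [], 0
    for unit in range(num_units):
        length = size + (1 if unit < extra else 0)
        chunks.append(range(start, start + length))
        start += length
    return chunks


# A 2 x 3 row-major tensor stored at base address 100.
addresses = traverse(TensorTraversalUnit(extents=[2, 3], strides=[3, 1], base=100))
# Ten output rows shared among four compute units.
row_chunks = [list(r) for r in partition(10, 4)]
```

Here `addresses` walks the six elements in order, and `row_chunks` gives each of the four units its own slice of the rows, so every unit could run the same loop nest over a different partition. The sync procedure of claim 8 is not modeled.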

Assignees

Google LLC

Classifications

  • G06N3/045 (Primary)

    Combinations of networks · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

  • using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title

  • Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title

  • G06N3/02 (Primary)

    Neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9959498B1 cover?
A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include, performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on o…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Published May 1, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).