What technology area does this patent fall under?

Primary CPC classification G06N3/045. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 1, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

This page lists 9 related publications: citations found in our corpus and other publications sharing the same primary CPC classification.

Neural network instruction set architecture

US9959498B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9959498-B1
Application number	US-201615336216-A
Country	US
Kind code	B1
Filing date	Oct 27, 2016
Priority date	Oct 27, 2016
Publication date	May 1, 2018
Grant date	May 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include, performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
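
The abstract describes an instruction whose data values, including a field for the neural network layer type, determine the structure of the loop nest that the processing unit executes. The Python sketch below illustrates that idea under stated assumptions. The `Instruction` fields, the `LayerType` codes, the dimension names, and the specific loop orders are hypothetical and do not come from the patent. The sketch only shows one way a layer-type code could select a two-level loop nest for a fully connected layer or a four-level loop nest for a simple 2D convolution.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class LayerType(Enum):
    # Hypothetical layer-type codes; the patent text here does not define an encoding.
    FULLY_CONNECTED = 0
    CONV2D = 1


@dataclass
class Instruction:
    # Hypothetical instruction format: a layer type plus the tensor dimensions
    # ("data values") that bound each loop in the nest.
    layer_type: LayerType
    dims: Dict[str, int]


def fully_connected(inst: Instruction, x: List[float], w: List[List[float]]) -> List[float]:
    # Two-level loop nest: outer loop over outputs, inner loop over inputs.
    n_out, n_in = inst.dims["n_out"], inst.dims["n_in"]
    y = [0.0] * n_out
    for o in range(n_out):
        for i in range(n_in):
            y[o] += w[o][i] * x[i]
    return y


def conv2d(inst: Instruction, x: List[List[float]], k: List[List[float]]) -> List[List[float]]:
    # Four-level loop nest over output rows/cols and kernel rows/cols
    # (single channel, stride 1, no padding, to keep the sketch short).
    height, width = inst.dims["h"], inst.dims["w"]
    kh, kw = inst.dims["kh"], inst.dims["kw"]
    out_h, out_w = height - kh + 1, width - kw + 1
    y = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            for i in range(kh):
                for j in range(kw):
                    y[r][c] += x[r + i][c + j] * k[i][j]
    return y


def execute(inst: Instruction, *operands):
    # The layer-type field selects which loop-nest structure runs.
    if inst.layer_type is LayerType.FULLY_CONNECTED:
        return fully_connected(inst, *operands)
    if inst.layer_type is LayerType.CONV2D:
        return conv2d(inst, *operands)
    raise ValueError(f"unsupported layer type: {inst.layer_type}")
```

For example, `execute(Instruction(LayerType.FULLY_CONNECTED, {"n_out": 2, "n_in": 3}), [1.0, 2.0, 3.0], [[1, 0, 0], [0, 1, 1]])` returns `[1.0, 5.0]`. In hardware the loop bounds would presumably be loaded into counters rather than interpreted in software; the dispatch here is only a stand-in for that selection.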

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for accelerating inference computations for a neural network having a plurality of neural network layers and using a system comprising multiple hardware compute units, the method comprising: providing, by a controller of the system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the system to the hardware compute unit; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.

2. The method of claim 1, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.

3. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

4. The method of claim 1, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.

5. The method of claim 4, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

6. The method of claim 4, further comprising, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to the TTU to cause an array reference of the TTU to generate an address for a referenced array element used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

7. The method of claim 6, wherein the single instruction indicates a first TTU counter that is summed with a second TTU counter to generate an address for an array reference associated with the TTU.

8. The method of claim 1, wherein performing the subset of computations for the partition of the neural network layer comprises, executing, by the respective processing unit in each of the hardware compute units, a sync procedure that manages operands associated with performance of the subset of computations, wherein managing an operand comprises stalling one or more loop nests based on a sync flag condition.

9. An electronic system comprising multiple hardware compute units for accelerating inference computations for a neural network having a plurality of neural network layers, the electronic system comprising: at least one processing unit, the at least one processing unit including one or more processing devices; and one or more non-transitory machine-readable storage devices for storing instructions that are executable by the one or more processing devices to cause performance of operations comprising: providing, by a controller of the electronic system, a respective single instruction to each hardware compute unit of the multiple hardware compute units, wherein each single instruction provided to the hardware compute units: encodes a neural network layer type of a particular neural network layer of the plurality of neural network layers and identifies data values for performing a tensor computation for the particular neural network layer; receiving, by a respective processing unit in each of the hardware compute units, the respective single instruction provided by the controller of the electronic system; generating, by the respective processing unit in each of the hardware compute units, a respective loop nest comprising a plurality of loops, wherein the loop nest is generated based on the neural network layer type encoded by the single instruction; and performing, by the respective processing unit in each of the hardware compute units, a respective portion of the tensor computation by executing the loop nest comprising the plurality of loops generated by the processing unit, wherein the respective portion of the tensor computation performed by each of the hardware compute units is at least a subset of computations for a respective partition of the particular neural network layer.

10. The electronic system of claim 9, wherein: a structure of the loop nest is defined at least in part by the neural network layer type of the particular neural network layer; and executing the loop nest comprises accessing operands that correspond to elements of at least one multi-dimensional tensor.

11. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one element of a particular dimension of a tensor that includes at least three distinct dimensions, the element being a part of at least one index used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

12. The electronic system of claim 9, wherein the single instruction causes the respective processing unit in each of the hardware compute units to access at least one memory address of an array in a storage medium, the memory address of the array comprising a variable that is read by the respective processing unit during performance of the subset of computations for the respective partition of the particular neural network layer.

13. The electronic system of claim 12, wherein performing the tensor computation comprises, providing, by the respective processing unit in each of the hardware compute units, at least one control signal to a tensor traversal unit (TTU) of the hardware compute unit to cause the TTU to emit loop indices used in executing the loop nest during performance of the subset of computations for the respective partition of the particular neural network layer.

1
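
Claims 1 and 5 to 7 describe each hardware compute unit executing its own loop nest over a partition of a layer, with a tensor traversal unit (TTU) emitting loop indices and generating array addresses, including an address formed by summing TTU counters. The Python sketch below models that behavior in software under stated assumptions. The `TensorTraversalUnit` class, its base-plus-strided-counters address formula, the row-major layout, the `partition_rows` split, and the matrix-vector workload are all hypothetical illustrations, not the patented hardware design.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple


class TensorTraversalUnit:
    """Hypothetical software model of a TTU: nested loop counters plus strides."""

    def __init__(self, bounds: Sequence[int], strides: Sequence[int], base: int = 0):
        self.bounds = list(bounds)    # loop bound for each nesting level
        self.strides = list(strides)  # address step per level (row-major layout assumed)
        self.base = base              # starting address of the referenced array

    def indices(self) -> Iterator[Tuple[int, ...]]:
        # Emit loop indices in nesting order; the last counter varies fastest.
        return product(*(range(b) for b in self.bounds))

    def address(self, idx: Sequence[int]) -> int:
        # Address = base + sum of each counter's contribution (index * stride).
        return self.base + sum(i * s for i, s in zip(idx, self.strides))


def partition_rows(n_rows: int, n_units: int) -> List[range]:
    # Split output rows as evenly as possible across the compute units.
    q, r = divmod(n_rows, n_units)
    parts, start = [], 0
    for u in range(n_units):
        size = q + (1 if u < r else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def matvec_partition(memory: Sequence[float], rows: range, n_cols: int,
                     w_base: int, x_base: int) -> List[float]:
    # One compute unit's share of y = W x, reading operands via TTU-generated addresses.
    w_ttu = TensorTraversalUnit([len(rows), n_cols], [n_cols, 1], w_base + rows.start * n_cols)
    x_ttu = TensorTraversalUnit([n_cols], [1], x_base)
    y = [0.0] * len(rows)
    for r, c in w_ttu.indices():
        y[r] += memory[w_ttu.address((r, c))] * memory[x_ttu.address((c,))]
    return y
```

As a usage example, with a 4x2 matrix `[[1, 2], [3, 4], [5, 6], [7, 8]]` stored at addresses 0 to 7 and `x = [1, 1]` at addresses 8 and 9, two units handle `range(0, 2)` and `range(2, 4)` and return `[3.0, 7.0]` and `[11.0, 15.0]`. The sync-flag stalling of claim 8 is not modeled; the units here are simply run one after another.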

Assignees

Google LLC

Inventors

Classifications

G06N3/045 (primary): Combinations of networks
G06N3/063 (primary): using electronic means
G06F13/28: using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence)
G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22)
G06N3/02 (primary): Neural networks

Patent family

Related publications grouped by family.

Patent family ID: 60452227

