Accessing data in multi-dimensional tensors
US-9875104-B2 · Jan 23, 2018 · US
US11422801B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11422801-B2 |
| Application number | US-201916239760-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 4, 2019 |
| Priority date | Oct 27, 2016 |
| Publication date | Aug 23, 2022 |
| Grant date | Aug 23, 2022 |
Official abstract text for this publication.
A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
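The data flow in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class names `TraversalUnit` and `MacCell`, the function `run_computing_unit`, the base/stride address pattern, and the use of a second traversal unit for the parameter bank are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TraversalUnit:
    """Hypothetical model of a traversal unit: it only generates addresses."""
    base: int
    stride: int
    count: int

    def addresses(self) -> Iterator[int]:
        # Assumed address pattern: base, base + stride, base + 2*stride, ...
        for i in range(self.count):
            yield self.base + i * self.stride


@dataclass
class MacCell:
    """One multiply-accumulate (MAC) operator holding a running sum."""
    accumulator: float = 0.0

    def step(self, activation: float, parameter: float) -> None:
        self.accumulator += activation * parameter


def run_computing_unit(
    activation_bank: List[float],
    parameter_bank: List[float],
    activation_tu: TraversalUnit,
    parameter_tu: TraversalUnit,
) -> float:
    cell = MacCell()
    for a_addr, p_addr in zip(activation_tu.addresses(), parameter_tu.addresses()):
        # The traversal unit's address selects which activation reaches the bus.
        data_bus = activation_bank[a_addr]
        # The MAC operator multiplies the bus value by a parameter and accumulates.
        cell.step(data_bus, parameter_bank[p_addr])
    return cell.accumulator


activation_bank = [1.0, 2.0, 3.0, 4.0]    # first memory bank (input activations)
parameter_bank = [0.5, 0.25, 0.125, 2.0]  # second memory bank (parameters)
result = run_computing_unit(
    activation_bank,
    parameter_bank,
    TraversalUnit(base=0, stride=1, count=4),
    TraversalUnit(base=0, stride=1, count=4),
)
print(result)  # 9.375
```

The model keeps address generation separate from arithmetic, mirroring the abstract's split between the traversal unit (control signal to the memory bank) and the MAC operator (multiply and accumulate).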
Opening claim text (preview).
What is claimed is:

1. A method for accessing multi-dimensional tensors to accelerate computations for a neural network implemented on a hardware integrated circuit and having a plurality of neural network layers, the method comprising: receiving activations of an activation tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding activation of the activation tensor; storing, in a first memory at a hardware compute tile of the integrated circuit, each activation at a distinct address location of the first memory, wherein each address location corresponds to an element of the plurality of elements, where the element is along a particular dimension of the activation tensor; fetching, from the first memory, a first plurality of inputs to a first neural network layer, wherein each input of the first plurality of inputs is an activation that corresponds to a respective element of the plurality of elements, wherein the respective element is among a sequence of elements along a first dimension of the activation tensor, and wherein the first plurality of inputs are activations fetched from non-contiguous address locations of the first memory and sent directly to cells at the compute tile; and in response to fetching the first plurality of inputs, processing, using the cells at the compute tile, each input of the first plurality of inputs through the first neural network layer to generate a neural network output for the first neural network layer.

2. The method of claim 1, further comprising: fetching, from the first memory, a second plurality of inputs to a second neural network layer, wherein each input of the second plurality of inputs corresponds to a respective element of the plurality of elements, wherein the respective element is one of multiple elements in a sequence of elements along a second, different dimension of the activation tensor, and wherein the second plurality of inputs are fetched from non-contiguous address locations of the first memory.

3. The method of claim 2, further comprising: processing, through the second neural network layer, each input of the second plurality of inputs fetched from the non-contiguous address locations of the first memory to generate a neural network output for the second neural network layer, wherein the second neural network layer is different than the first neural network layer.

4. The method of claim 3, wherein the activation tensor includes inputs for processing through the first neural network layer to generate a set of output activations that correspond to the neural network output for the first neural network layer.

5. The method of claim 4, wherein: the first neural network layer is a convolutional layer of the neural network; and the second neural network layer is a pooling layer of the neural network.

6. The method of claim 1, further comprising: receiving a plurality of weights for a second tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding weight of the received weights; and storing, in a second memory, each weight for the second tensor at a distinct address location of the second memory, wherein each address location that stores a respective weight corresponds to an element along a particular dimension of the second tensor.

7. The method of claim 6, further comprising: fetching, using a tensor traversal unit, the first plurality of inputs from the non-contiguous address locations of the first memory that correspond to the sequence of elements along the first dimension of the activation tensor; and concurrently fetching, using the tensor traversal unit, the plurality of weights from address locations of memory banks of the second memory that correspond to elements along a dimension of the second tensor.

8. The method of claim 7, wherein the inputs and the plurality of weights represent operands used to perform a neural network computation and the method further comprises: storing one or more operands in the first memory and in the second memory such that a dimensional layout of the activation tensor and a dimensional layout of the second tensor enables accelerated performance of a plurality of neural network computations for computing an inference.

9. A system for accessing multi-dimensional tensors to accelerate computations for a neural network implemented on a hardware integrated circuit and having a plurality of neural network layers, the system comprising: one or more processors; and one or more non-transitory machine-readable storage mediums for storing instructions that are executable by the one or more processors to cause performance of operations comprising: receiving activations of an activation tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding activation of the activation tensor; storing, in a first memory at a hardware compute tile of the integrated circuit, each activation at a distinct address location of the first memory, wherein each address location corresponds to an element of the plurality of elements, where the element is along a particular dimension of the activation tensor; fetching, from the first memory, a first plurality of inputs to a first neural network layer, wherein each input of the first plurality of inputs is an activation that corresponds to a respective element of the plurality of elements, wherein the respective element is among a sequence of elements along a first dimension of the activation tensor, and wherein the first plurality of inputs are activations fetched from non-contiguous address locations of the first memory and sent directly to cells at the compute tile; and in response to fetching the first plurality of inputs, processing, using the cells at the compute tile, each input of the first plurality of inputs through the first neural network layer to generate a neural network output for the first neural network layer.

10. The system of claim 9, wherein the operations further comprise: fetching, from the first memory, a second plurality of inputs to a second neural network layer, wherein each input of the second plurality of inputs corresponds to a respective element of the plurality of elements, wherein the respective element is one of multiple elements in a sequence of elements along a second, different dimension of the activation tensor, and wherein the second plurality of inputs are fetched from non-contiguous address locations of the first memory.

11. The system of claim 10, wherein the operations further comprise: processing, through the second neural network layer, each input of the second plurality of inputs fetched from the non-contiguous address locations of the first memory to generate a neural network output for the second neural network layer, wherein the second neural network layer is different than the first neural network layer.

12. The system of claim 11, wherein the activation tensor includes inputs for processing through the first neural network layer to generate a set of output activations that correspond to the neural network output for the first neural network layer.

13. The system of claim 12, wherein: the first neural network layer is a convolutional layer of the neural network; and the second neural network layer is a pooling layer of the neural network.

14. The system of claim 9, wherein the operations further comprise: receiving a plurality of weights for a second tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding weight of the received weights; and storing, in a second memory, each weight for the second tensor at a distinct address location of the second memory, wherein each address location that stores a respective weight corresponds to an element along a particular dimension of the second tensor. 7. 
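
To make the "non-contiguous address locations" language of claims 1 and 2 concrete, the sketch below computes the flat memory addresses visited when stepping along one dimension of a tensor. It assumes a row-major layout in a flat memory, which the claim preview does not specify; the helper names `row_major_strides` and `addresses_along` and the example shape are hypothetical.

```python
from typing import List, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Element strides for a row-major layout (an assumed layout)."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)


def addresses_along(shape: Sequence[int], start: Sequence[int], dim: int) -> List[int]:
    """Flat addresses of the elements from `start` to the end of dimension `dim`."""
    strides = row_major_strides(shape)
    base = sum(i * s for i, s in zip(start, strides))
    return [base + k * strides[dim] for k in range(shape[dim] - start[dim])]


shape = (2, 3, 4)                           # hypothetical activation tensor shape
memory = [float(a) for a in range(24)]      # each activation at a distinct address

# First plurality of inputs: a sequence along dimension 0.
first_addrs = addresses_along(shape, start=(0, 1, 2), dim=0)
first_inputs = [memory[a] for a in first_addrs]

# Second plurality of inputs: a sequence along a different dimension (1).
second_addrs = addresses_along(shape, start=(0, 0, 2), dim=1)
second_inputs = [memory[a] for a in second_addrs]

print(first_addrs)   # [6, 18]
print(second_addrs)  # [2, 6, 10]
```

In both cases consecutive addresses differ by more than one element (12 and 4 respectively), so the fetched activations are not adjacent in memory even though they are adjacent along the chosen tensor dimension.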
Combinations of networks · CPC title
Architecture, e.g. interconnection topology · CPC title
Feedforward networks · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
using electronic means · CPC title