Neural network compute tile

US11422801B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11422801-B2
Application number: US-201916239760-A
Country: US
Kind code: B2
Filing date: Jan 4, 2019
Priority date: Oct 27, 2016
Publication date: Aug 23, 2022
Grant date: Aug 23, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
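The data flow in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `TraversalUnit`, `MacCell`, and `run_tile` are hypothetical, the memory banks are modeled as plain Python lists, and the start/stride/count addressing scheme is an illustrative choice that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class TraversalUnit:
    """Hypothetical stand-in for the first traversal unit.

    Assumption: it generates addresses from a start, stride, and count.
    """
    start: int
    stride: int
    count: int

    def addresses(self) -> Iterator[int]:
        # Each address models one control signal sent to the activation bank.
        for i in range(self.count):
            yield self.start + i * self.stride


@dataclass
class MacCell:
    """A cell with a single multiply-accumulate operator and an accumulator."""
    acc: float = 0.0

    def step(self, activation: float, parameter: float) -> None:
        # Multiply the activation by the parameter and accumulate the product.
        self.acc += activation * parameter


def run_tile(activation_bank: List[float],
             parameter_bank: List[float],
             traversal: TraversalUnit) -> float:
    """Drive one MAC cell from two memory banks (modeled as lists)."""
    cell = MacCell()
    for k, addr in enumerate(traversal.addresses()):
        data_bus = activation_bank[addr]        # activation placed on the data bus
        cell.step(data_bus, parameter_bank[k])  # parameter read from the second bank
    return cell.acc


activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # first memory bank
parameters = [0.5, -1.0, 2.0]                  # second memory bank
result = run_tile(activations, parameters,
                  TraversalUnit(start=0, stride=2, count=3))
print(result)  # 1*0.5 + 3*(-1) + 5*2 = 7.5
```

In this model the traversal unit, not the MAC cell, decides which activation reaches the data bus, which mirrors the separation of addressing and arithmetic that the abstract describes.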

First claim

Opening claim text (preview).

What is claimed is:

1. A method for accessing multi-dimensional tensors to accelerate computations for a neural network implemented on a hardware integrated circuit and having a plurality of neural network layers, the method comprising: receiving activations of an activation tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding activation of the activation tensor; storing, in a first memory at a hardware compute tile of the integrated circuit, each activation at a distinct address location of the first memory, wherein each address location corresponds to an element of the plurality of elements, where the element is along a particular dimension of the activation tensor; fetching, from the first memory, a first plurality of inputs to a first neural network layer, wherein each input of the first plurality of inputs is an activation that corresponds to a respective element of the plurality of elements, wherein the respective element is among a sequence of elements along a first dimension of the activation tensor, and wherein the first plurality of inputs are activations fetched from non-contiguous address locations of the first memory and sent directly to cells at the compute tile; and in response to fetching the first plurality of inputs, processing, using the cells at the compute tile, each input of the first plurality of inputs through the first neural network layer to generate a neural network output for the first neural network layer.

2. The method of claim 1, further comprising: fetching, from the first memory, a second plurality of inputs to a second neural network layer, wherein each input of the second plurality of inputs corresponds to a respective element of the plurality of elements, wherein the respective element is one of multiple elements in a sequence of elements along a second, different dimension of the activation tensor, and wherein the second plurality of inputs are fetched from non-contiguous address locations of the first memory.

3. The method of claim 2, further comprising: processing, through the second neural network layer, each input of the second plurality of inputs fetched from the non-contiguous address locations of the first memory to generate a neural network output for the second neural network layer, wherein the second neural network layer is different than the first neural network layer.

4. The method of claim 3, wherein the activation tensor includes inputs for processing through the first neural network layer to generate a set of output activations that correspond to the neural network output for the first neural network layer.

5. The method of claim 4, wherein: the first neural network layer is a convolutional layer of the neural network; and the second neural network layer is a pooling layer of the neural network.

6. The method of claim 1, further comprising: receiving a plurality of weights for a second tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding weight of the received weights; and storing, in a second memory, each weight for the second tensor at a distinct address location of the second memory, wherein each address location that stores a respective weight corresponds to an element along a particular dimension of the second tensor.

7. The method of claim 6, further comprising: fetching, using a tensor traversal unit, the first plurality of inputs from the non-contiguous address locations of the first memory that correspond to the sequence of elements along the first dimension of the activation tensor; and concurrently fetching, using the tensor traversal unit, the plurality of weights from address locations of memory banks of the second memory that correspond to elements along a dimension of the second tensor.

8. The method of claim 7, wherein the inputs and the plurality of weights represent operands used to perform a neural network computation and the method further comprises: storing one or more operands in the first memory and in the second memory such that a dimensional layout of the activation tensor and a dimensional layout of the second tensor enables accelerated performance of a plurality of neural network computations for computing an inference.

9. A system for accessing multi-dimensional tensors to accelerate computations for a neural network implemented on a hardware integrated circuit and having a plurality of neural network layers, the system comprising: one or more processors; and one or more non-transitory machine-readable storage mediums for storing instructions that are executable by the one or more processors to cause performance of operations comprising: receiving activations of an activation tensor comprising a plurality of elements, wherein each element represents a respective dimensional location for a corresponding activation of the activation tensor; storing, in a first memory at a hardware compute tile of the integrated circuit, each activation at a distinct address location of the first memory, wherein each address location corresponds to an element of the plurality of elements, where the element is along a particular dimension of the activation tensor; fetching, from the first memory, a first plurality of inputs to a first neural network layer, wherein each input of the first plurality of inputs is an activation that corresponds to a respective element of the plurality of elements, wherein the respective element is among a sequence of elements along a first dimension of the activation tensor, and wherein the first plurality of inputs are activations fetched from non-contiguous address locations of the first memory and sent directly to cells at the compute tile; and in response to fetching the first plurality of inputs, processing, using the cells at the compute tile, each input of the first plurality of inputs through the first neural network layer to generate a neural network output for the first neural network layer.

10. The system of claim 9, wherein the operations further comprise: fetching, from the first memory, a second plurality of inputs to a second neural network layer, wherein each input of the second plurality of inputs corresponds to a respective element of the plurality of elements, wherein the respective element is one of multiple elements in a sequence of elements along a second, different dimension of the activation tensor, and wherein the second plurality of inputs are fetched from non-contiguous address locations of the first memory.

11. The system of claim 10, wherein the operations further comprise: processing, through the second neural network layer, each input of the second plurality of inputs fetched from the non-contiguous address locations of the first memory to generate a neural network output for the second neural network layer, wherein the second neural network layer is different than the first neural network layer.

12. The system of claim 11, wherein the activation tensor includes inputs for processing through the first neural network layer to generate a set of output activations that correspond to the neural network output for the first neural network layer.

13. The system of claim 12, wherein: the first neural network layer is a convolutional layer of the neural network; and the second neural network layer is a pooling layer of the neural network.

14. The system of claim 9, wherein the operations further comprise: receiving a plurality of weights for a second tensor comprising a plurality of elements, wherein each element represents a respe
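The storing and fetching steps recited in claims 1 and 2 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the claimed hardware: the row-major layout, the function names `strides_for`, `store`, and `addresses_along`, and the toy 2×3×4 tensor are all hypothetical choices made for the example. It shows why a sequence of elements along one dimension of a tensor generally maps to non-contiguous addresses in a flat memory.

```python
from itertools import product


def strides_for(shape):
    """Row-major strides (an assumed layout): the last dimension is contiguous."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def store(elements, shape):
    """Store each activation at a distinct address derived from its element index."""
    strides = strides_for(shape)
    memory = [None] * (strides[0] * shape[0])
    for index, value in elements.items():
        memory[sum(i * s for i, s in zip(index, strides))] = value
    return memory, strides


def addresses_along(dim, fixed_index, shape, strides):
    """Addresses of the elements along `dim`, with all other coordinates held fixed."""
    base = sum(i * s for d, (i, s) in enumerate(zip(fixed_index, strides)) if d != dim)
    return [base + k * strides[dim] for k in range(shape[dim])]


shape = (2, 3, 4)
# Toy activations whose value encodes their own index: 100*x + 10*y + z.
elements = {(x, y, z): 100 * x + 10 * y + z
            for x, y, z in product(*(range(n) for n in shape))}
memory, strides = store(elements, shape)

# First plurality of inputs: the sequence along dimension 1 at x=1, z=2.
first_addrs = addresses_along(1, (1, 0, 2), shape, strides)
first_inputs = [memory[a] for a in first_addrs]

# Second plurality of inputs: the sequence along dimension 0 at y=1, z=3.
second_addrs = addresses_along(0, (0, 1, 3), shape, strides)
second_inputs = [memory[a] for a in second_addrs]

print(first_addrs, first_inputs)    # [14, 18, 22] [102, 112, 122]
print(second_addrs, second_inputs)  # [7, 19] [13, 113]
```

Both fetches step through memory with a stride greater than one, so the addresses are non-contiguous even though the elements are adjacent along their tensor dimension.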

Assignees

Google LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Feedforward networks · CPC title

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11422801B2 cover?
A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 23, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).