Neural network compute tile

US10175980B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10175980-B2
Application number: US-201615335769-A
Country: US
Kind code: B2
Filing date: Oct 27, 2016
Priority date: Oct 27, 2016
Publication date: Jan 8, 2019
Grant date: Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
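The data flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class `ComputeTile` and its methods `traverse`, `mac`, and `run` are hypothetical names, the memory banks are modeled as plain Python lists, and the "data bus" is simply a value passed between methods.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeTile:
    """Toy model of the computing unit in the abstract (all names hypothetical)."""

    activations: List[float]  # first memory bank: input activations
    parameters: List[float]   # second memory bank: parameters
    accumulator: float = 0.0  # running sum held by the MAC operator

    def traverse(self, address: int) -> float:
        """Traversal unit: select an address so the first memory bank
        places that input activation on the data bus."""
        return self.activations[address]

    def mac(self, bus_value: float, parameter: float) -> None:
        """MAC operator: multiply the bus value by a parameter, then accumulate."""
        self.accumulator += bus_value * parameter

    def run(self) -> float:
        """Step through every address and accumulate the products."""
        self.accumulator = 0.0
        for address in range(len(self.activations)):
            bus_value = self.traverse(address)
            self.mac(bus_value, self.parameters[address])
        return self.accumulator


tile = ComputeTile(activations=[1.0, 2.0, 3.0], parameters=[0.5, -1.0, 2.0])
result = tile.run()
print(result)
```

Each pass through the loop corresponds to one multiply operation of an input activation taken from the bus and a parameter taken from the second memory bank, which is the computation the abstract attributes to the MAC operator.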

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: multiple sets of hardware computing units for accelerating inference computations for a plurality of layers of a neural network, wherein each set of hardware computing units comprises: a first computing unit configured to: receive instructions for performing inference computations for a first layer of the plurality of layers of the neural network, layer inputs for the first layer, and a respective set of weights for the first layer; and perform at least a subset of the inference computations for the first layer based on execution of a first loop nest to access the layer inputs for the first layer and the respective set of weights for the first layer; and a second computing unit configured to: receive instructions for performing inference computations for a second layer of the plurality of layers of the neural network, layer inputs for the second layer, and the respective set of weights for the second layer; and perform at least a subset of the inference computations for the second layer based on execution of a second loop nest to access the layer inputs for the second layer and the respective set of weights for the second layer.

2. The system of claim 1, wherein execution of the first loop nest comprises: executing, by a processor of the first computing unit, a plurality of nested loops included in the first loop nest; and accessing, memory of the first computing unit, to retrieve data corresponding to elements of a tensor, wherein the data includes at least one of: the layer inputs for the first layer or weights for the first layer.

3. The system of claim 2, wherein the first computing unit comprises: at least one traversal unit that uses the first loop nest to access the elements of the tensor; wherein a structure of the first loop nest indicates a manner in which the at least one traversal unit traverses dimensions of the tensor.

4. The system of claim 1, wherein execution of the second loop nest comprises: executing, by a processor of the second computing unit, a plurality of nested loops included in the second loop nest; accessing, memory of the second computing unit, to retrieve data corresponding to elements of a tensor, wherein the data includes at least one of: the layer inputs for the second layer or weights for the second layer.

5. The system of claim 4, wherein the second computing unit comprises: at least one traversal unit that uses the second loop nest to access particular elements of the tensor; wherein a structure of the second loop nest indicates a manner in which the at least one traversal unit traverses dimensions of the tensor.

6. The system of claim 1, wherein: the first layer is a neural network layer of a first layer type; and the second layer is a neural network layer of a second layer type that is different than the first layer type.

7. The system of claim 1, further comprising: a data communications instruction bus configured to: receive one or more instructions from an external source; provide, to the first computing unit, the instructions for performing the subset of inference computations for the first layer; and provide, to the second computing unit, the instructions for performing the subset of inference computations for the second layer.

8. The system of claim 7, further comprising: a data communications ring bus configured to: receive multiple inputs and multiple weights from an external source; provide, to the first computing unit, the layer inputs for the first layer, and the respective set of weights for the first layer; and provide, to the second computing unit, the layer inputs for the second layer, and the respective set of weights for the second layer.

9. A method of accelerating inference computations for a plurality of layers of a neural network using a system comprising multiple sets of hardware computing units, the method comprising: receiving, by a first computing unit, instructions for performing inference computations for a first layer of the plurality of layers of the neural network, layer inputs for the first layer, and a respective set of weights for the first layer; performing, by the first computing unit, at least a subset of the inference computations for the first layer based on execution of a first loop nest to access the layer inputs for the first layer and the respective set of weights for the first layer; receiving, by a second computing unit, instructions for performing inference computations for a second layer of the plurality of layers of the neural network, layer inputs for the second layer, and a respective set of weights for the second layer; and performing, by the second computing unit, at least a subset of the inference computations for the second layer based on execution of a second loop nest to access the layer inputs for the second layer and the respective set of weights for the second layer.

10. The method of claim 9, wherein execution of the first loop nest comprises: executing, by a processor of the first computing unit, a plurality of nested loops included in the first loop nest; and accessing, memory of the first computing unit, to retrieve data corresponding to elements of a tensor, wherein the data includes at least one of: the layer inputs for the first layer or weights for the first layer.

11. The method of claim 10, wherein the first computing unit comprises: at least one traversal unit that uses the first loop nest to access the elements of the tensor; wherein a structure of the first loop nest indicates a manner in which the at least one traversal unit traverses one or more dimensions of the tensor.

12. The method of claim 9, wherein execution of the second loop nest comprises: executing, by a processor of the second computing unit, a plurality of nested loops included in the second loop nest; accessing, memory of the second computing unit, to retrieve data corresponding to elements of a tensor, wherein the data includes at least one of: the layer inputs for the second layer or weights for the second layer.

13. The method of claim 12, wherein the second computing unit comprises: at least one traversal unit that uses the second loop nest to access particular elements of the tensor; wherein a structure of the second loop nest indicates a manner in which the at least one traversal unit traverses dimensions of the tensor.

14. The method of claim 9, wherein: the first layer is a neural network layer of a first layer type; and the second layer is a neural network layer of a second layer type that is different than the first layer type.

15. The method of claim 9, further comprising: receiving, at an instruction bus, one or more instructions from an external source, wherein the instruction bus is configured to provide data communications to the multiple hardware computing units; providing, to the first computing unit and by the instruction bus, the instructions for performing the subset of inference computations for the first layer; and providing, to the second computing unit and by the instruction bus, the instructions for performing the subset of inference computations for the second layer.

16. The method of claim 15, further comprising: receiving, at a ring bus, multiple inputs and multiple weights from an external source, wherein the ring bus is configured to provide data communications to the multiple hardware computing units; providing, to the first computing unit and by the ring bus, the layer inputs for the first layer, and the respective set of weights for the first layer; and
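The loop-nest language in the claims can be made concrete with a short sketch. Everything here is an illustrative assumption rather than the claimed design: a loop nest is modeled as a list of hypothetical `(trip count, address stride)` pairs, `traverse` stands in for a traversal unit, and `ComputingUnit` performs a small matrix-vector product as a stand-in for a layer's inference computations. Feeding the first unit's outputs to the second unit is also an assumption made for the example; the claims only require that each unit receives its own layer inputs and weights.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

# One (trip count, address stride) pair per loop, outermost loop first.
LoopNest = Sequence[Tuple[int, int]]


def traverse(loop_nest: LoopNest) -> Iterator[Tuple[Tuple[int, ...], int]]:
    """Yield (loop indices, flat memory address) for every iteration of the nest."""
    bounds = [range(count) for count, _ in loop_nest]
    for indices in product(*bounds):
        address = sum(i * stride for i, (_, stride) in zip(indices, loop_nest))
        yield indices, address


class ComputingUnit:
    """Hypothetical unit holding one layer's inputs, weights, and loop nest."""

    def __init__(self, inputs: List[float], weights: List[float], loop_nest: LoopNest):
        self.inputs = inputs        # layer inputs for this unit's layer
        self.weights = weights      # flat, row-major weight tensor
        self.loop_nest = loop_nest  # structure decides the traversal order

    def run(self) -> List[float]:
        """Compute outputs[o] = sum over i of weights[o, i] * inputs[i]."""
        outputs = [0.0] * self.loop_nest[0][0]
        for (o, i), address in traverse(self.loop_nest):
            outputs[o] += self.weights[address] * self.inputs[i]
        return outputs


# First unit: a 2x3 weight tensor traversed by a two-level loop nest.
first = ComputingUnit([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0, 1.0, 1.0], [(2, 3), (3, 1)])
layer1_out = first.run()

# Second unit: a 1x2 weight tensor with its own loop nest.
second = ComputingUnit(layer1_out, [2.0, -1.0], [(1, 2), (2, 1)])
layer2_out = second.run()
print(layer1_out, layer2_out)
```

Changing the trip counts or strides in a loop nest changes which tensor elements are visited and in what order, which is one way to read the statement that the structure of the loop nest indicates how the traversal unit traverses the dimensions of the tensor.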

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title

  • G06F13/28 · Primary

    using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence) · CPC title

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10175980B2 cover?
A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 8, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
This page lists 9 related publications: citations in our corpus, or other publications that share the same primary CPC.