Neural network compute tile

US9710265B1 · US · B1

Patent metadata
Publication number: US-9710265-B1
Application number: US-201715462180-A
Country: US
Kind code: B1
Filing date: Mar 17, 2017
Priority date: Oct 27, 2016
Publication date: Jul 18, 2017
Grant date: Jul 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
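
The data flow in the abstract can be sketched in software. The following is a minimal, hypothetical Python model, not the patented hardware: the names `MacCell`, `ComputeTile`, `traversal_unit`, and `run` are illustrative assumptions, and a real tile would operate on fixed-width hardware words rather than Python floats. It shows a traversal unit selecting an address in the first memory bank, the activation at that address being placed on a shared bus, and each MAC operator multiplying it by a parameter from the second memory bank.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class MacCell:
    """One cell holding a single multiply-accumulate (MAC) operator."""
    accumulator: float = 0.0

    def step(self, activation: float, parameter: float) -> None:
        # Multiply the activation from the data bus by the parameter, then accumulate.
        self.accumulator += activation * parameter


@dataclass
class ComputeTile:
    """Hypothetical software model of the computing unit in the abstract."""
    activations: List[float]        # first memory bank: input activations
    parameters: List[List[float]]   # second memory bank: one parameter row per cell
    cells: List[MacCell] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: one MAC cell per parameter row.
        self.cells = [MacCell() for _ in self.parameters]

    def traversal_unit(self) -> Iterator[int]:
        # Stands in for the traversal unit's control signal: it yields the
        # address of the next activation to drive onto the data bus.
        for address in range(len(self.activations)):
            yield address

    def run(self) -> List[float]:
        for address in self.traversal_unit():
            bus_value = self.activations[address]  # activation on the data bus
            for cell, row in zip(self.cells, self.parameters):
                cell.step(bus_value, row[address])
        return [cell.accumulator for cell in self.cells]


tile = ComputeTile(
    activations=[1.0, 2.0, 3.0],
    parameters=[[1.0, 0.0, 1.0], [0.5, 0.5, 0.5]],
)
outputs = tile.run()
print(outputs)
```

Each activation is read once and broadcast to every cell, which is the reason the sketch routes it through a single `bus_value` rather than letting each cell read the first memory bank itself.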

First claim

Opening claim text (preview).

What is claimed is:

1. A computing unit for accelerating tensor computations, comprising: a first memory bank having a first data width for storing at least one of input activations or output activations; a second memory bank having a second data width that is larger than the first data width for storing one or more parameters used in performing computations; a direct memory access traversal unit configured to: access at least one memory location of the first memory bank to obtain at least one input activation, or at least one output activation, based on instructions received from a source external to the computing unit; and access at least one memory location of the second memory bank to obtain the one or more parameters based on the received instructions; at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters obtained from the second memory bank and performs the computations; a tensor traversal unit in data communication with at least the first memory bank, the tensor traversal unit configured to provide a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the at least one MAC operator; and wherein the computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the at least one MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.

2. The computing unit of claim 1, wherein the computing unit performs the one or more computations by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest indicates a manner in which the tensor traversal unit traverses one or more dimensions of the data array.

3. The computing unit of claim 2, wherein the one or more computations are performed based, in part, on a tensor operation provided by the tensor traversal unit, the tensor operation including a loop nest structure for accessing one or more elements of the data array.

4. The computing unit of claim 1, wherein the computing unit includes a non-linear unit and a first portion of the computations comprises producing one or more output activations based on the multiply operation and a second portion of the computations comprises applying, by the non-linear unit, a non-linear function to the one or more output activations.

5. The computing unit of claim 4, wherein the one or more computations performed by the computing unit comprises using a shift register to shift the output activations to the first memory bank.

6. The computing unit of claim 1, further comprising a portion of a ring bus that extends outside of the computing unit, wherein the ring bus provides a data path between the first memory bank and a memory bank of another adjacent computing unit, and between the second memory bank and a memory bank of another adjacent computing unit.

7. The computing unit of claim 1, wherein the second memory bank is configured to store at least one of partial sums or one or more pooling layer inputs.

8. A computer-implemented method for accelerating tensor computations, comprising: receiving instructions from a source external to a computing unit; accessing, by a direct memory access traversal unit, memory locations based on the received instructions, the memory locations being: at least one memory location of a first memory bank that is accessed to obtain at least one input activation or at least one output activation, the first memory bank being disposed in the computing unit and having a first data width; and at least one memory location of a second memory bank that is accessed to obtain one or more parameters, the second memory bank being disposed in the computing unit and having a second data width that is larger than the first data width; providing, from the first memory bank, a first input activation in response to the first memory bank receiving a control signal from a tensor traversal unit, wherein the first input activation is provided to a data bus that is accessible by at least one cell of the computing unit; receiving, by the at least one cell, one or more parameters from the second memory bank, wherein the at least one cell comprises at least one multiply accumulate (“MAC”) operator; and performing, by the at least one MAC operator, one or more computations associated with at least one element of a data array, wherein the one or more computations comprise, in part, a multiply operation of at least the first input activation accessed from the data bus and at least one parameter received from the second memory bank.

9. The computer-implemented method of claim 8, wherein the one or more computations are performed based, in part, on the computing unit executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest indicates a manner in which the tensor traversal unit traverses one or more dimensions of the data array.

10. The computer-implemented method of claim 9, further comprising, providing, by the tensor traversal unit, a tensor operation that includes a loop nest structure for accessing one or more elements of the data array.

11. The computer-implemented method of claim 8, further comprising, performing, a first portion of the one or more computations by producing at least one output activation based on the multiply operation.

12. The computer-implemented method of claim 11, further comprising, performing, a second portion of the one or more computations by applying a non-linear function to one or more output activations.

13. A non-transitory computer-readable storage medium comprising instructions executable by one or more processors which, upon such execution, causes the one or more processors to perform operations comprising: receiving instructions from a source external to a computing unit; accessing, by a direct memory access traversal unit, memory locations based on the received instructions, the memory locations being: at least one memory location of a first memory bank that is accessed to obtain at least one input activation or at least one output activation, the first memory bank being disposed in the computing unit and having a first data width; and at least one memory location of a second memory bank that is accessed to obtain one or more parameters, the second memory bank being disposed in the computing unit and having a second data width that is larger than the first data width; providing, from the first memory bank, a first input activation in response to the first memory bank receiving a control signal from a tensor traversal unit, wherein the first input activation is provided to a data bus that is accessible by at least one cell of the computing unit; receiving, by the at least one cell, one or more parameters from the second memory bank, wherein the at least one cell comprises at least one multiply accumulate (“MAC”) operator; and performing, by the at least one MAC operator, one or more computations associated with at least one element of a data array, wherein the one or more computations comprise, in part, a multiply operation of at least the first input activation accessed from the data bus and at least one parameter received from the second memory bank.

14. The non-transitory computer-readable medium of claim 13, wherein the one or more computations are performed based, in part, on the computing unit executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest indicates a manner in which the tensor
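
Claims 2 through 4 describe a loop nest whose structure determines how the tensor traversal unit walks the data array, followed by a non-linear unit applied to the output activations. A rough Python sketch of that idea follows. Everything in it is an assumption made for illustration: the helper names `loop_nest_addresses` and `tile_layer`, the bounds-and-strides address scheme, and the choice of ReLU as the non-linear function (the claims say only "a non-linear function").

```python
from itertools import product
from typing import Callable, Iterator, List, Sequence, Tuple


def loop_nest_addresses(
    bounds: Sequence[int], strides: Sequence[int]
) -> Iterator[Tuple[Tuple[int, ...], int]]:
    """Yield (loop indices, flat address) in loop-nest order, outermost loop first."""
    for indices in product(*(range(b) for b in bounds)):
        yield indices, sum(i * s for i, s in zip(indices, strides))


def relu(x: float) -> float:
    # Assumed non-linear function; the claims do not name a specific one.
    return max(0.0, x)


def tile_layer(
    activations: List[float],
    parameters: List[float],  # row-major (rows x cols) parameter array, flattened
    rows: int,
    cols: int,
    non_linear: Callable[[float], float] = relu,
) -> List[float]:
    outputs = [0.0] * rows
    # Two-deep loop nest: the outer loop walks output rows, the inner loop
    # walks input columns. The strides (cols, 1) give row-major addresses.
    for (r, c), address in loop_nest_addresses((rows, cols), (cols, 1)):
        outputs[r] += activations[c] * parameters[address]  # MAC step
    # Second portion of the computation: apply the non-linear function.
    return [non_linear(v) for v in outputs]


# The same 2 x 3 array visited with two different loop-nest structures.
row_major = [addr for _, addr in loop_nest_addresses((2, 3), (3, 1))]
column_major = [addr for _, addr in loop_nest_addresses((3, 2), (1, 3))]

result = tile_layer([1.0, -2.0], [1.0, 1.0, 2.0, 0.5], rows=2, cols=2)
print(row_major, column_major, result)
```

Swapping the loop order and strides changes only the sequence of addresses, not the set of elements visited, which is one way to read "a structure of the loop nest indicates a manner in which the tensor traversal unit traverses one or more dimensions of the data array."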

Assignees

Google Inc

Inventors

Classifications

  • Combinations of networks

  • G06N3/063 (Primary): using electronic means

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT

  • G06F9/3001 (Primary): Arithmetic instructions

  • G06F13/28 (Primary): using burst mode transfer, e.g. direct memory access {DMA}, cycle steal (G06F13/32 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9710265B1 cover?
It covers a computing unit with a first memory bank for storing input activations, a second memory bank for storing parameters, at least one cell containing a multiply accumulate (“MAC”) operator, and a traversal unit that signals the first memory bank to place an input activation on a data bus accessible by the MAC operator. The full abstract appears above.
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Published on Jul 18, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).