Two-stage vector reduction using two-dimensional and one-dimensional systolic arrays
US-2016267111-A1 · Sep 15, 2016 · US
US9747546B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9747546-B2 |
| Application number | US-201514844524-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 3, 2015 |
| Priority date | May 21, 2015 |
| Publication date | Aug 29, 2017 |
| Grant date | Aug 29, 2017 |
Official abstract text for this publication.
A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
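The dataflow in the abstract can be illustrated with a small software sketch. It is not the patented circuit. It only mirrors the two stages named above: a matrix computation unit that turns weight inputs and activation inputs into accumulated values, and a vector computation unit that applies an activation function to each accumulated value. The function names, the choice of ReLU, and the list-of-lists weight layout are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matrix_computation_unit(weights: Matrix, activations: List[float]) -> List[float]:
    """Return one accumulated value per output column.

    Assumed layout (hypothetical): weights[i][j] connects input i to output j,
    so accumulated[j] = sum over i of activations[i] * weights[i][j].
    """
    n_out = len(weights[0])
    return [
        sum(a * row[j] for a, row in zip(activations, weights))
        for j in range(n_out)
    ]


def vector_computation_unit(
    accumulated: List[float], activation_fn: Callable[[float], float]
) -> List[float]:
    """Apply the activation function to each accumulated value."""
    return [activation_fn(v) for v in accumulated]


def relu(x: float) -> float:
    """ReLU is an assumed example; the abstract only says 'an activation function'."""
    return max(0.0, x)


def run_layers(
    layers: List[Matrix],
    inputs: List[float],
    activation_fn: Callable[[float], float] = relu,
) -> List[float]:
    """For each layer, run the matrix stage and then the vector stage."""
    values = inputs
    for weights in layers:
        accumulated = matrix_computation_unit(weights, values)
        values = vector_computation_unit(accumulated, activation_fn)
    return values


if __name__ == "__main__":
    single_layer = [[[1.0, -1.0], [2.0, 0.5]]]
    print(run_layers(single_layer, [1.0, 2.0]))
```

In this sketch the activated values of one layer become the activation inputs of the next, which matches the "for each of the plurality of neural network layers" wording of the abstract.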
Opening claim text (preview).
What is claimed is:
1. A system for performing neural network computations for a neural network having a plurality of neural network layers, the system comprising: a hardware circuit comprising at least a first circuit portion that includes a matrix computation unit comprising M×N cells, wherein M and N are positive integers that are greater than one, and wherein each cell of the M×N cells of the first circuit portion includes respective circuitry configured to: obtain, from an adjacent cell along a first dimension of the matrix computation unit, a respective weight input the respective weight input being a weight input for a neural network layer of the plurality of neural network layers; obtain, from an adjacent cell along a second dimension of the matrix computation unit, a respective activation input for the neural network layer; determine a respective multiplication product based on the respective weight input and the respective activation input; determine a respective accumulated value based at least on the respective multiplication product; provide, to another adjacent cell along the first dimension of the matrix computation unit, the respective accumulated value for determining an output for the neural network layer; and provide, to an adjacent cell along the second dimension of the matrix computation unit, the respective activation input for the neural network layer.
2. The system of claim 1, wherein, for each of M cells along the first dimension, the adjacent cell along the second dimension obtains the respective activation input from a respective value loader.
3. The system of claim 2, wherein, for each of N cells along the second dimension, the adjacent cell along the first dimension obtains the respective weight input from a weight fetcher interface.
4. The system of claim 1, wherein the respective weight input is obtained from the adjacent cell along the first dimension and the respective activation input is obtained from the adjacent cell along the second dimension periodically over a predetermined number of clock cycles.
5. The system of claim 1, wherein for each cell of (M−1)×N cells of the M×N cells, the respective circuitry is further configured to obtain a respective second accumulated value shifted from another cell of the M×N cells, wherein determining the respective accumulated value comprises determining the respective accumulated value based at least on the respective multiplication product and the respective second accumulated value.
6. The system of claim 1, wherein determining the respective multiplication product comprises determining the respective multiplication product based on a control signal.
7. The system of claim 1, wherein, for each of N cells along the second dimension, the another adjacent cell long the first dimension provides one or more values to a respective accumulator unit.
8. The system of claim 1, wherein providing the respective accumulated value comprises providing the respective accumulated value based on a control signal.
9. The system of claim 1, wherein providing the respective activation input comprises providing the respective activation input based on a control signal.
10. The system of claim 1, further comprising: a first memory configured to provide activation inputs for the plurality of neural network layers; and a second memory configured to provide weight inputs for the plurality of neural network layers.
11. The system of claim 10, wherein the hardware circuit further comprises a second circuit portion that includes: vector computation circuitry configured to: determine an activation vector based on one or more accumulated values received from the matrix computation unit; and provide the activation vector to the first memory.
12. The system of claim 11, further comprising: sequencer circuitry configured to provide one or more control signals to the first memory, the second memory, the vector computation circuitry, or the matrix computation unit to control a dataflow of the system.
13. A method for performing neural network computations for a neural network having a plurality of neural network layers, the method comprising: for each cell of M×N cells of a matrix computation unit that is disposed within at least a first circuit portion of a hardware circuit comprising the neural network: obtaining, from an adjacent cell along a first dimension of the matrix computation unit, a respective weight input for a neural network layer of the plurality of neural network layers; obtaining, from an adjacent cell along a second dimension of the matrix computation unit, a respective activation input for the neural network layer; determining a respective multiplication product based on the respective weight input and the respective activation input; determining a respective accumulated value based at least on the respective multiplication product; providing, to another adjacent cell along the first dimension of the matrix computation unit, the respective accumulated value for determining an output for the neural network layer, wherein M and N are positive integers that are greater than one; and providing, to an adjacent cell along the second dimension of the matrix computation unit, the respective activation input for the neural network layer.
14. The method of claim 13, wherein, for each of M cells along the first dimension, the adjacent cell along the second dimension obtains the respective activation input from a respective value loader.
15. The method of claim 14, wherein, for each of the N cells along the second dimension, the adjacent cell along the first dimension obtains the respective weight input from a weight fetcher interface.
16. The method of claim 1, further comprising: for each cell of (M−1)×N cells of the M×N cells, obtaining a respective second accumulated value shifted from another cell of the M×N cells, wherein determining the respective accumulated value comprises determining the respective accumulated value based at least on the respective multiplication product and the respective second accumulated value.
17. The method of claim 13, wherein, for each of N cells along the second dimension, the another adjacent cell long the first dimension provides one or more values to a respective accumulator unit.
18. A matrix computation unit configured to be disposed within at least a first circuit portion of a hardware circuit, the matrix computation unit for performing neural network computations for a neural network having a plurality of neural network layers, the matrix computation unit comprising M×N cells, wherein M and N are positive integers that are greater than one, and wherein each cell of the M×N cells of the first circuit portion includes respective circuitry configured to: obtain, from an adjacent cell along a first dimension of the matrix computation unit, a respective weight input for a neural network layer of the plurality of neural network layers; obtain, from an adjacent cell along a second dimension of the matrix computation unit, a respective activation input for the neural network layer; determine a respective multiplication product based on the respective weight input and the respective activation input; determine a respective accumulated value based at least on the respective multiplication product; provide, to another adjacent cell along the second dimension of the matrix computation unit, the respective accumulated value for determining an output for the neural network layer; and prov
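The per-cell behavior recited in claims 1 and 5 can be sketched as a cycle-by-cycle software simulation of an M×N array. This is a hypothetical model under stated assumptions, not the claimed hardware: the first dimension is taken to be the row index `i` (M cells) and the second dimension the column index `j` (N cells); weights are assumed to have already been shifted into place, so the weight-loading phase is not modeled; row `i` receives its activation input from a value loader at cycle `i`; and the function name `systolic_matvec` is invented for illustration.

```python
from typing import List


def systolic_matvec(weights: List[List[float]], activations: List[float]) -> List[float]:
    """Simulate one activation vector passing through an M x N cell grid.

    Assumptions (not from the claims): weights[i][j] is already resident in
    cell (i, j); activations move one cell per cycle along j; accumulated
    values move one cell per cycle along i.
    """
    m, n = len(weights), len(weights[0])
    act = [[0.0] * n for _ in range(m)]  # activation register of each cell
    acc = [[0.0] * n for _ in range(m)]  # accumulated-value register of each cell
    out = [0.0] * n

    for t in range(m + n - 1):
        new_act = [[0.0] * n for _ in range(m)]
        new_acc = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                # Activation input: from the value loader at the edge (skewed
                # by i cycles), otherwise from the adjacent cell along j.
                if j == 0:
                    a_in = activations[i] if t == i else 0.0
                else:
                    a_in = act[i][j - 1]
                # Second accumulated value shifted from the adjacent cell
                # along i (the first row has none, hence (M-1) x N cells).
                s_in = acc[i - 1][j] if i > 0 else 0.0
                # Multiplication product, then accumulation.
                new_acc[i][j] = s_in + weights[i][j] * a_in
                # The activation is held so the next cell along j can read it.
                new_act[i][j] = a_in
        act, acc = new_act, new_acc
        # Column j finishes at cycle (m - 1) + j; its last cell would hand
        # the value to an accumulator unit.
        j_done = t - (m - 1)
        if 0 <= j_done < n:
            out[j_done] = acc[m - 1][j_done]
    return out


if __name__ == "__main__":
    print(systolic_matvec([[1, 2, 3], [4, 5, 6]], [1, 10]))
```

With these assumptions the result equals the ordinary vector-matrix product of the activation vector and the weight matrix, reached after M + N − 1 cycles. The staggered injection is what keeps each product aligned with the accumulated value arriving from the neighboring cell.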