Neural network computation circuit, control circuit therefor, and control method therefor
US-2024411520-A1 · Dec 12, 2024 · US
US2016342891A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016342891-A1 |
| Application number | US-201514844524-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 3, 2015 |
| Priority date | May 21, 2015 |
| Publication date | Nov 24, 2016 |
| Grant date | — |
Official abstract text for this publication.
A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
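The per-layer flow in the abstract (the matrix computation unit produces accumulated values, then the vector computation unit applies an activation function) can be illustrated with a small software model. This is a minimal sketch, not the circuit itself: the function names `matrix_unit`, `vector_unit`, and `run_network` are hypothetical, ReLU is an assumed choice of activation function (the abstract does not name one), and passing each layer's activated values to the next layer mirrors the unified buffer feedback described in claim 2.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: one row of weights per output value


def matrix_unit(weights: Matrix, activations: Sequence[float]) -> Vector:
    """Model of the matrix computation unit: one accumulated value per weight row."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


def vector_unit(accumulated: Sequence[float], fn: Callable[[float], float]) -> Vector:
    """Model of the vector computation unit: apply the activation function to each value."""
    return [fn(v) for v in accumulated]


def relu(x: float) -> float:
    # Assumed activation function; the source text does not specify one.
    return max(0.0, x)


def run_network(layers: Sequence[Matrix], inputs: Sequence[float]) -> Vector:
    """Process the layers in order; each layer's activated values feed the next layer."""
    activations = list(inputs)
    for weights in layers:
        accumulated = matrix_unit(weights, activations)
        activations = vector_unit(accumulated, relu)
    return activations
```

Calling `run_network` with a list of weight matrices and an input vector returns the activated values of the last layer. In hardware the two units are separate blocks that operate on streams of values; the sketch only shows the order of operations.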
Opening claim text (preview).
What is claimed is:
1. A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
2. The circuit of claim 1, further comprising: a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit, where the unified buffer is configured to receive and store output from the vector computation unit, and the unified buffer is configured to send the received output as input to the matrix computation unit.
3. The circuit of claim 2, further comprising: a sequencer configured to receive instructions from a host device and generate a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; and a direct memory access engine communicatively coupled to the unified buffer and the sequencer, where the direct memory access engine is configured to send the plurality of activation inputs to the unified buffer, where the unified buffer is configured to send the plurality of activation inputs to the matrix computation unit, and where the direct memory access engine is configured to read result data from the unified buffer.
4. The circuit of claim 3, further comprising: a memory unit configured to send the plurality of weight inputs to the matrix computation unit, and where the direct memory access engine is configured to send the plurality of weight inputs to the memory unit.
5. The circuit of claim 1, where the matrix computation unit is configured as a two dimensional systolic array comprising a plurality of cells.
6. The circuit of claim 5, where the two dimensional systolic array is a square array.
7. The circuit of claim 5, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.
8. The circuit of claim 7, where, for a given layer in the plurality of layers, a count of the plurality of activation inputs is greater than a size of the second dimension of the systolic array, and where the systolic array is configured to: divide the plurality of activation inputs into portions, where each portion has a size less than or equal to the size of the second dimension; generating, for each portion, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.
9. The circuit of claim 7, where, for a given layer in the plurality of layers, a count of the plurality of weight inputs is greater than a size of the first dimension of the systolic array, and where the systolic array is configured to: divide the plurality of weight inputs into portions, where each portion has a size less than or equal to the size of the first dimension; generating, for each portion, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.
10. The circuit of claim 7, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry communicatively coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry communicatively coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.
11. The circuit of claim 10, where one or more cells in the plurality of cells are each configured to store the respective sum in a respective accumulator unit, where the respective sum is an accumulated value.
12. The circuit of claim 7, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.
13. The circuit of claim 1, where the vector computation unit normalizes each activated value to generate a plurality of normalized values.
14. The circuit of claim 1, where the vector computation unit pools one or more activated values to generate a plurality of pooled values.
15. A method for performing neural network computations for a neural network comprising a plurality of neural network layers using a circuit comprising a matrix computation unit and a vector computation unit coupled to the matrix computation unit, the method comprising, for each of the plurality of neural network layers: providing a plurality of weight inputs and a plurality of activation inputs for the neural network layer to the matrix computation unit; generating, using the matrix computation unit, a plurality of accumulated values, wherein the matrix computation unit is configured to receive the plurality of weight inputs and the plurality of activation inputs for the neural network layer and generate the plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and generating, using the vector computation unit, a plurality of activated values for the neural network layer, wherein the matrix computation unit is configured to apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
16. The method of claim 15, further comprising: receiving, by a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit; storing output from the vector computation unit at the unified buffer; sending, from the unified buffer, the received output as input to the matrix computation unit.
17. The method of claim 16, further comprising: receiving, at a sequencer, instructions from a host device and generating a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; sending, from a direct memory access engine communicatively coupled to the unified buffer and the sequencer, the plurality of activation inputs to the unified buffer; sending, from the unified buffer, the plurality of activation inputs to the matrix computation unit; and reading, at the direct me
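The cell structure in claim 10 and the portion-wise processing in claim 8 can be illustrated with a small functional model. This is a sketch under stated assumptions, not the claimed hardware: `Cell`, `array_pass`, and `tiled_pass` are hypothetical names, the model is untimed (it ignores the cycle-by-cycle shifting of values through registers), each row of the array is assumed to receive one activation input, and "combining" the portions is assumed to mean elementwise addition of the partial accumulated values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cell:
    """One cell: a weight register plus multiply and add circuitry (cf. claim 10)."""
    weight: float = 0.0

    def step(self, activation: float, sum_in: float) -> Tuple[float, float]:
        # The activation is forwarded unchanged to the neighbour along the row;
        # the updated partial sum is forwarded to the neighbour along the column.
        return activation, sum_in + self.weight * activation


def array_pass(weights: Sequence[Sequence[float]],
               activations: Sequence[float]) -> List[float]:
    """Untimed model of one pass through the array.

    weights[r][c] is held by the cell in row r, column c, and activations[r]
    enters row r. Returns one accumulated value per column.
    """
    cells = [[Cell(w) for w in row] for row in weights]
    sums = [0.0] * len(weights[0])
    for r, row in enumerate(cells):
        act = activations[r]
        for c, cell in enumerate(row):
            act, sums[c] = cell.step(act, sums[c])
    return sums


def tiled_pass(weights: Sequence[Sequence[float]],
               activations: Sequence[float],
               lanes: int) -> List[float]:
    """Split the activations into portions of at most `lanes` entries and
    add the per-portion accumulated values (assumed combining rule)."""
    total = [0.0] * len(weights[0])
    for start in range(0, len(activations), lanes):
        part = array_pass(weights[start:start + lanes],
                          activations[start:start + lanes])
        total = [t + p for t, p in zip(total, part)]
    return total
```

With three activation inputs and `lanes=2`, `tiled_pass` runs two passes (two inputs, then one) and produces the same accumulated values as a single pass through an array with three rows, which is the property the portion-wise scheme relies on.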