Two-stage vector reduction using two-dimensional and one-dimensional systolic arrays
US-2016267111-A1 · Sep 15, 2016 · US
US9697463B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9697463-B2 |
| Application number | US-201615389303-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2016 |
| Priority date | May 21, 2015 |
| Publication date | Jul 4, 2017 |
| Grant date | Jul 4, 2017 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.
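The mapping described in the abstract can be illustrated with a small software sketch. This is not the patented hardware and not the authors' implementation: it is a plain-Python reference that assumes stride 1, no padding, and the usual cross-correlation convention, and the function name `conv_layer_sketch` and the `cell_weights[d][k]` layout (one row per depth level, one column per kernel) are hypothetical choices made for illustration. For each kernel offset, every (depth level, kernel) pair plays the role of one cell, and each kernel's column accumulates the products over all depth levels.

```python
def conv_layer_sketch(activations, kernels):
    """Reference convolution organized as depth-by-kernel accumulation.

    activations[d][y][x]: one matrix of activation inputs per depth level.
    kernels[k][d][i][j]:  one weight structure per kernel.
    Assumes stride 1 and no padding (illustrative assumptions only).
    """
    depth = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    num_kernels = len(kernels)
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = height - kh + 1, width - kw + 1

    output = [[[0] * out_w for _ in range(out_h)] for _ in range(num_kernels)]

    for i in range(kh):
        for j in range(kw):
            # Hypothetical layout: row = depth level, column = kernel.
            cell_weights = [
                [kernels[k][d][i][j] for k in range(num_kernels)]
                for d in range(depth)
            ]
            for y in range(out_h):
                for x in range(out_w):
                    for k in range(num_kernels):
                        # Accumulate over the depth levels for kernel k.
                        acc = 0
                        for d in range(depth):
                            acc += cell_weights[d][k] * activations[d][y + i][x + j]
                        output[k][y][x] += acc
    return output


if __name__ == "__main__":
    acts = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # two depth levels, 2x2 each
    kerns = [[[[1]], [[1]]], [[[2]], [[0]]]]      # two 1x1 kernels over two depth levels
    print(conv_layer_sketch(acts, kerns))
```

The sketch only shows which products are summed together; it says nothing about clock cycles, data movement between cells, or the normalizing and pooling steps mentioned in the claims.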
Opening claim text (preview).
What is claimed is:
1. A method for computing a layer output for a convolutional neural network layer from a layer input using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the method comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.
2. The method of claim 1, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.
3. The method of claim 1, further comprising: determining that a count of the plurality of depth levels is less than a size of the second dimension of the systolic array; sending one or more duplicate matrices of distinct activation inputs to unused cells along the second dimension of the systolic array.
4. The method of claim 1, further comprising: determining that a count of the plurality of kernels is less than a size of the first dimension of the systolic array; sending one or more duplicate kernel matrix structures to unused cells along the first dimension of the systolic array.
5. The method of claim 1, where a stride parameter for the convolutional neural network is greater than one, the method further comprising: remapping, for each kernel structure, weights in the respective matrix to cause the matrix to have an increased number of depth levels.
6. The method of claim 1, where generating the layer output from the accumulated output comprises normalizing and pooling the accumulated output to generate the layer output.
7. The method of claim 1, where sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array comprises: at a given clock cycle, storing a first element in the kernel matrix structure in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first element in the first cell to a second cell that is adjacent to the first cell and storing a second element in the kernel matrix structure in the first cell.
8. The method of claim 1, where the systolic array comprises a plurality of cells, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.
9. The method of claim 8, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.
10. A system for computing a layer output for a convolutional neural network layer from a layer input using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.
11. The system of claim 10, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.
12. The system of claim 10, further comprising: determining that a count of the plurality of depth levels is less than a size of the second dimension of the systolic array; sending one or more duplicate matrices of distinct activation inputs to unused cells along the second dimension of the systolic array.
13. The system of claim 10, further comprising: determining that a count of the plurality of kernels is less than a size of the first dimension of the systolic array; sending one or more duplicate kernel matrix structures to unused cells along the first dimension of the systolic array.
14. The system of claim 10, where a stride parameter for the convolutional neural network is greater than one, the method further comprising: remapping, for each kernel structure, weights in the respective matrix to cause the matrix to have an increased number of depth levels.
15. The system of claim 10, where generating the layer output from the accumulated output comprises normalizing and pooling the accumulated output to generate the layer output.
16. The system of claim 10, where sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array comprises: at a given clock cycle, storing a first element in the kernel matrix structure in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first element in the first cell to a second cell that is adjacent to the first cell and storing a second element in the kernel matrix structure in the first cell.
17. The system of claim 10, where the systolic array comprises a plurality of cells, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.
18. The system of claim 17, where each cell in t
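The per-cell datapath recited in claim 9 and the clock-by-clock weight loading recited in claim 7 can be mimicked in a few lines of software. The following is a minimal sketch under stated assumptions, not a model of the actual circuit: the names `Cell`, `run_chain`, and `shift_in` are hypothetical, integers stand in for whatever number format the hardware uses, and the timing of activation movement between cells is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    weight: int = 0      # weight register
    activation: int = 0  # activation register
    sum_in: int = 0      # sum-in register (previously summed value)

    def compute(self):
        product = self.weight * self.activation  # multiplication circuitry
        return self.sum_in + product             # summation circuitry


def run_chain(weights, activations):
    """Pass a running sum through a chain of adjacent cells.

    Each cell adds its own weight * activation to the sum it receives
    and hands the result to the next cell's sum-in register.
    """
    cells = [Cell(weight=w, activation=a) for w, a in zip(weights, activations)]
    carried = 0
    for cell in cells:
        cell.sum_in = carried
        carried = cell.compute()
    return carried


def shift_in(registers, element):
    """One clock cycle of loading: every stored value moves to the adjacent
    cell and the new element is stored in the first cell."""
    return [element] + registers[:-1]


if __name__ == "__main__":
    print(run_chain([1, 2, 3], [4, 5, 6]))
    regs = [None, None, None]
    regs = shift_in(regs, "w1")  # first element stored in the first cell
    regs = shift_in(regs, "w2")  # first element moves over, second takes its place
    print(regs)
```

The chain computes a dot product one cell at a time, which is the accumulation a line of such cells would produce once the weights and activations are in place.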
using electronic means · CPC title
Combinations of networks · CPC title
Inference or reasoning models · CPC title
Neural networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title