Computing convolutions using a neural network processor

US9697463B2 · US · B2

Patent metadata
Publication number: US-9697463-B2
Application number: US-201615389303-A
Country: US
Kind code: B2
Filing date: Dec 22, 2016
Priority date: May 21, 2015
Publication date: Jul 4, 2017
Grant date: Jul 4, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.
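
As a rough illustration of the computation the abstract describes, the following Python sketch computes a layer output by accumulating, for every kernel, the contributions of each depth level. It is a functional model only, not the patented hardware method: the function name conv_layer_output and the nested-list layouts acts[d][y][x] and kernels[k][d][i][j] are assumptions made for this example, as are the stride of one, the absence of padding, and the CNN-style (unflipped) kernel. The comments about columns and rows follow the mapping stated in claim 2.

```python
def conv_layer_output(acts, kernels):
    """Functional sketch of a convolutional layer output.

    acts[d][y][x]       activation input at depth level d (hypothetical layout)
    kernels[k][d][i][j] weight of kernel k at depth level d (hypothetical layout)
    Assumes stride 1 and no padding.
    """
    depth = len(acts)
    height, width = len(acts[0]), len(acts[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = height - kh + 1, width - kw + 1

    # One accumulated output matrix per kernel.
    out = [[[0] * out_w for _ in range(out_h)] for _ in kernels]

    for k, kernel in enumerate(kernels):   # conceptually one array column per kernel
        for d in range(depth):             # conceptually one array row per depth level
            for y in range(out_h):
                for x in range(out_w):
                    for i in range(kh):
                        for j in range(kw):
                            out[k][y][x] += kernel[d][i][j] * acts[d][y + i][x + j]
    return out


# Two depth levels of 2x2 activations and two 1x1 kernels.
acts = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
kernels = [[[[1]], [[1]]], [[[1]], [[-1]]]]
print(conv_layer_output(acts, kernels))
# [[[6, 8], [10, 12]], [[-4, -4], [-4, -4]]]
```

The two outer loops correspond to the two array dimensions named in the abstract. In hardware those iterations would run in parallel across cells rather than sequentially.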

First claim

Opening claim text (preview).

What is claimed is:

1. A method for computing a layer output for a convolutional neural network layer from a layer input using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the method comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.

2. The method of claim 1, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.

3. The method of claim 1, further comprising: determining that a count of the plurality of depth levels is less than a size of the second dimension of the systolic array; sending one or more duplicate matrices of distinct activation inputs to unused cells along the second dimension of the systolic array.

4. The method of claim 1, further comprising: determining that a count of the plurality of kernels is less than a size of the first dimension of the systolic array; sending one or more duplicate kernel matrix structures to unused cells along the first dimension of the systolic array.

5. The method of claim 1, where a stride parameter for the convolutional neural network is greater than one, the method further comprising: remapping, for each kernel structure, weights in the respective matrix to cause the matrix to have an increased number of depth levels.

6. The method of claim 1, where generating the layer output from the accumulated output comprises normalizing and pooling the accumulated output to generate the layer output.

7. The method of claim 1, where sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array comprises: at a given clock cycle, storing a first element in the kernel matrix structure in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first element in the first cell to a second cell that is adjacent to the first cell and storing a second element in the kernel matrix structure in the first cell.

8. The method of claim 1, where the systolic array comprises a plurality of cells, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.

9. The method of claim 8, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.

10. A system for computing a layer output for a convolutional neural network layer from a layer input using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, each depth level being a respective matrix of distinct activation inputs from the plurality of activation inputs; sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array; for each depth level, sending the respective matrix of distinct activation inputs to a distinct cell along a second dimension of the systolic array; causing the systolic array to generate an accumulated output from the respective matrices sent to the cells; and generating the layer output from the accumulated output.

11. The system of claim 10, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.

12. The system of claim 10, further comprising: determining that a count of the plurality of depth levels is less than a size of the second dimension of the systolic array; sending one or more duplicate matrices of distinct activation inputs to unused cells along the second dimension of the systolic array.

13. The system of claim 10, further comprising: determining that a count of the plurality of kernels is less than a size of the first dimension of the systolic array; sending one or more duplicate kernel matrix structures to unused cells along the first dimension of the systolic array.

14. The system of claim 10, where a stride parameter for the convolutional neural network is greater than one, the method further comprising: remapping, for each kernel structure, weights in the respective matrix to cause the matrix to have an increased number of depth levels.

15. The system of claim 10, where generating the layer output from the accumulated output comprises normalizing and pooling the accumulated output to generate the layer output.

16. The system of claim 10, where sending each respective kernel matrix structure to a distinct cell along a first dimension of the systolic array comprises: at a given clock cycle, storing a first element in the kernel matrix structure in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first element in the first cell to a second cell that is adjacent to the first cell and storing a second element in the kernel matrix structure in the first cell.

17. The system of claim 10, where the systolic array comprises a plurality of cells, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.

18. The system of claim 17, where each cell in t
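
The cell structure in claim 9 and the weight-loading steps in claim 7 can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuit the patent discloses: the names Cell, load_weights, and accumulate_along_chain are hypothetical, the cells form a single chain rather than a two-dimensional array, and the sums are passed down the chain in one function call instead of one cell per clock cycle.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Software stand-in for one systolic-array cell (registers from claim 9)."""
    weight: int = 0      # weight register
    activation: int = 0  # activation register
    sum_in: int = 0      # sum-in register (previously summed value)

    def sum_out(self):
        # Multiplication circuitry followed by summation circuitry.
        return self.sum_in + self.weight * self.activation


def load_weights(cells, weights):
    """Shift weights into a chain of cells, one element per clock cycle (claim 7)."""
    for w in weights:
        # Shift every stored element to the adjacent cell...
        for idx in range(len(cells) - 1, 0, -1):
            cells[idx].weight = cells[idx - 1].weight
        # ...then store the new element in the first cell.
        cells[0].weight = w


def accumulate_along_chain(cells, activations):
    """Feed one activation to each cell and pass the running sum cell to cell."""
    running = 0
    for cell, a in zip(cells, activations):
        cell.activation = a
        cell.sum_in = running      # sum received from the previous cell
        running = cell.sum_out()   # sum sent on to the next cell
    return running


cells = [Cell() for _ in range(3)]
load_weights(cells, [1, 2, 3])
print([c.weight for c in cells])                    # [3, 2, 1]
print(accumulate_along_chain(cells, [1, 10, 100]))  # 123
```

Because each new element pushes the earlier ones one cell further along, the first weight loaded ends up in the cell farthest from the entry point. The accumulated value leaving the last cell is the dot product of the stored weights and the supplied activations.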

Assignees

Inventors

Classifications

  • G06N3/063 (Primary): using electronic means

  • Combinations of networks

  • Inference or reasoning models

  • Neural networks

  • Convolutional networks [CNN, ConvNet]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9697463B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving the layer input, the layer input comprising a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix comprising a plurality of depth levels, ea…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 4, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).