Rotating data for neural network computations

US2016342893A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016342893-A1
Application number: US-201514845022-A
Country: US
Kind code: A1
Filing date: Sep 3, 2015
Priority date: May 21, 2015
Publication date: Nov 24, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving a plurality of activation inputs; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix; sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array; generating a plurality of rotated kernel structures from each of the plurality of kernels; sending each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array; causing the systolic array to generate an accumulated output based on the plurality of value inputs and the plurality of kernels; and generating the layer output from the accumulated output.
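To make the abstract's first and last steps concrete, here is a minimal Python sketch of forming vector inputs from regions of an activation matrix and accumulating products against kernel weights. It is a generic sliding-window illustration, not the patent's hardware method: the function names `form_vector_inputs` and `accumulate`, the stride of 1, the overlapping windows, and the absence of padding are all assumptions made for this example, and the systolic array and kernel rotation are not modeled.

```python
def form_vector_inputs(activations, k):
    """Flatten every k x k region of a 2-D activation matrix into one vector.

    Assumption: stride 1, no padding, regions taken in row-major order.
    """
    rows, cols = len(activations), len(activations[0])
    vectors = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            vectors.append([activations[r + i][c + j]
                            for i in range(k) for j in range(k)])
    return vectors


def accumulate(vectors, kernels):
    """Return one sum of products per (vector input, kernel) pair."""
    # Flatten each kernel in the same row-major order as the regions.
    flat_kernels = [[w for row in kernel for w in row] for kernel in kernels]
    return [[sum(a * w for a, w in zip(vec, fk)) for fk in flat_kernels]
            for vec in vectors]


# Hypothetical example data: a 3 x 3 activation matrix and two 2 x 2 kernels.
activations = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
vectors = form_vector_inputs(activations, 2)
outputs = accumulate(vectors, kernels)
```

Each row of `outputs` corresponds to one region and each column to one kernel. In the arrangement the abstract describes, the vector inputs would travel along one dimension of the array and the kernel weights along the other; this sketch only reproduces the arithmetic result.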

First claim

Opening claim text (preview).

What is claimed is:

1. A method for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the method comprising: receiving a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix; sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array; generating a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension; sending each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array; causing the systolic array to generate an accumulated output based on the plurality of value inputs and the plurality of kernels; and generating the layer output from the accumulated output.

2. The method of claim 1, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.

3. The method of claim 2, where sending the plurality of vector inputs to one or more cells comprises: sending, for a particular row of the systolic array, a respective element from each vector input to the particular row; and selecting, at each cell in the particular row, one of the respective elements for use in a register in the cell based on a multiplexor control signal.

4. The method of claim 2, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises: sending each vector input to a distinct series of shift registers, each shift register shifting an element of the vector input to a subsequent shift register on a subsequent clock cycle, each shift register corresponding to a respective row in the systolic array; and selecting, for each row, an output from the corresponding shift registers for use in the row.

5. The method of claim 1, where forming a plurality of vector inputs from the plurality of activation inputs is based on a size of a particular kernel structure, further comprising: overlapping the particular kernel structure with the matrix representation of the plurality of activation inputs to form a first vector input from elements in the matrix representation; forming one or more other vector inputs from other elements that surround the overlapped particular kernel structure.

6. The method of claim 1, where generating the layer output from the accumulated output comprises normalizing the accumulated output, pooling the accumulated output, or both, to generate the layer output.

7. The method of claim 1, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises: at a particular clock cycle, storing a first vector input in the plurality of vector inputs in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first vector input in the first cell to a second cell that is adjacent to the first cell and storing a second vector input in the plurality of vector inputs in the first cell.

8. A system for computing a layer output for a convolutional neural network layer from a layer input for the convolutional neural network layer using a two-dimensional systolic array, the convolutional neural network layer having a plurality of kernels, each kernel having a respective matrix structure of weights, the system comprising: one or more computers; and computer-readable medium coupled to the one or more computers and having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving a plurality of activation inputs, the plurality of activation inputs represented as a multi-dimensional matrix; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimensional matrix; sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array; generating a plurality of rotated kernel structures from each of the plurality of kernels, where generating a particular rotated kernel structure comprises shifting elements in the respective matrix structure for the kernel along one dimension; sending each kernel structure and each rotated kernel structure to one or more cells along a second dimension of the systolic array; causing the systolic array to generate an accumulated output based on the plurality of value inputs and the plurality of kernels; and generating the layer output from the accumulated output.

9. The system of claim 8, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.

10. The system of claim 9, where sending the plurality of vector inputs to one or more cells comprises: sending, for a particular row of the systolic array, a respective element from each vector input to the particular row; and selecting, at each cell in the particular row, one of the respective elements for use in a register in the cell based on a multiplexor control signal.

11. The system of claim 9, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises: sending each vector input to a distinct series of shift registers, each shift register shifting an element of the vector input to a subsequent shift register on a subsequent clock cycle, each shift register corresponding to a respective row in the systolic array; and selecting, for each row, an output from the corresponding shift registers for use in the row.

12. The system of claim 8, where forming a plurality of vector inputs from the plurality of activation inputs is based on a size of a particular kernel structure, further comprising: overlapping the particular kernel structure with the matrix representation of the plurality of activation inputs to form a first vector input from elements in the matrix representation; forming one or more other vector inputs from other elements that surround the overlapped particular kernel structure.

13. The system of claim 8, where generating the layer output from the accumulated output comprises normalizing the accumulated output, pooling the accumulated output, or both, to generate the layer output.

14. The system of claim 8, where sending the plurality of vector inputs to one or more cells along a first dimension of the systolic array comprises: at a particular clock cycle, storing a first vector input in the plurality of vector inputs in a first cell of the systolic array; and at a subsequent clock cycle, shifting the first vector input in the first cell to a second cell that is adjacent to the first cell and storing a second vector input in the plurality of vector inputs in the first cell.

15. A computer-readable medium having instructions stored thereon, which, wh
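
Claims 1 and 8 describe generating a rotated kernel structure by "shifting elements in the respective matrix structure for the kernel along one dimension". The Python sketch below shows one possible reading of that phrase. It assumes the shift is cyclic and is applied along the column dimension of each row; the claim preview on this page does not state either detail, and the names `rotate_kernel` and `rotated_structures` are hypothetical.

```python
def rotate_kernel(kernel, shift):
    """Cyclically shift the elements of every row by `shift` positions.

    Assumption: "shifting along one dimension" is read here as a
    wrap-around shift to the right within each row.
    """
    width = len(kernel[0])
    s = shift % width
    if s == 0:
        return [list(row) for row in kernel]
    return [row[-s:] + row[:-s] for row in kernel]


def rotated_structures(kernel):
    """Return every distinct non-trivial rotation of the kernel."""
    width = len(kernel[0])
    return [rotate_kernel(kernel, s) for s in range(1, width)]


# Hypothetical 3 x 3 kernel used only for illustration.
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotations = rotated_structures(kernel)
# rotations[0] == [[3, 1, 2], [6, 4, 5], [9, 7, 8]]
# rotations[1] == [[2, 3, 1], [5, 6, 4], [8, 9, 7]]
```

Under this reading, a kernel with n columns yields n - 1 rotated structures in addition to the original, and the claim then sends both the original and the rotated structures along the second dimension of the systolic array.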

Assignees

Google Inc

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Multidimensional correlation or convolution · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016342893A1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing a layer output for a convolutional neural network layer, the method comprising: receiving a plurality of activation inputs; forming a plurality of vector inputs from the plurality of activation inputs, each vector input comprising values from a distinct region within the multi-dimension…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 24, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).