Block operations for an image processor having a two-dimensional execution lane array and a two-dimensional shift register

US9986187B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9986187-B2
Application numberUS-201715628527-A
CountryUS
Kind codeB2
Filing dateJun 20, 2017
Priority dateJul 1, 2016
Publication dateMay 29, 2018
Grant dateMay 29, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.

First claim

Opening claim text (preview).

The invention claimed is: 1. A processor comprising: a two-dimensional array of processing elements; and multiple shift-register planes, each shift-register plane comprising a separate two-dimensional shift-register array, wherein each shift register of each shift-register array is dedicated to one of the processing elements, wherein the processor is configured to execute instructions to perform a matrix multiplication operation on a first matrix stored in a first shift-register plane and a second matrix stored in a second shift-register plane to generate a result of the matrix multiplication operation stored in a third shift-register plane, wherein the first matrix comprises N columns, and wherein the instructions cause each processing element to perform, for N iterations, operations on shift registers dedicated to the processing element, the operations comprising: performing a multiplication operation between a first value stored in a first shift register of the first shift-register plane and a second value stored in a second shift register of the second shift-register plane, performing an addition operation between a result of the multiplication operation and a summation value stored in a third shift register of the third shift-register plane, updating the summation value stored in the third shift register of the third shift-register plane using a result of the addition operation, and based on determining that a current iteration is not a last iteration: shifting data in the first shift-register plane one unit in a first dimension corresponding to rows of the first matrix, and shifting data in the second shift-register plane one unit in a second dimension corresponding to columns of the second matrix. 2. The processor of claim 1 , wherein the first matrix comprises R rows and the second matrix comprises S columns, and wherein the operations further comprise: performing a row-wise rotational shearing transformation on the first matrix including shifting each row i, from row 0 to row R−1, by i units; and performing a column-wise rotational shearing transformation on the second matrix including shifting each column j, from column 0 to column S−1, by j units. 3. The processor of claim 1 , wherein the operations further comprise: loading an initial condition for the matrix multiplication by loading initial values into shift registers of the third shift-register plane. 4. The processor of claim 1 , wherein the operations further comprise computing a two-dimensional discrete Fourier transform (2D DFT) using a first matrix of real values, a first matrix of imaginary values, a second matrix of real values, and a second matrix of imaginary values including: computing a real portion of the 2D DFT including: performing a first matrix multiplication operation between the first matrix of real values and the second matrix of real values, performing a second matrix multiplication operation between the first matrix of imaginary values and the second matrix of imaginary values, and subtracting a result of the second matrix multiplication operation from the first matrix multiplication operation; and computing an imaginary portion of the 2D DFT including: performing a third matrix multiplication operation between the first matrix of real values and the second matrix of imaginary values, performing a fourth matrix multiplication operation between the second matrix of real values and the first matrix of imaginary values, and adding a result of the third matrix multiplication operation and a result of the fourth matrix multiplication operation. 5. The processor of claim 4 , wherein the operations further comprise: moving the computed real portion of the 2D DFT to another shift-register plane or to memory; reloading the first matrix of real values and the first matrix of imaginary values back into respective shift-register planes; and performing a row-wise rotational shearing transformation on the first matrix of real values and the first matrix of imaginary values. 6. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by a processor comprising: a two-dimensional array of processing elements, and multiple shift-register planes, each shift-register plane comprising a separate two-dimensional shift-register array, wherein each shift register of each shift-register array is dedicated to one of the processing elements, cause the processor to perform a matrix multiplication operation on a first matrix stored in a first shift-register plane and a second matrix stored in a second shift-register plane to generate a result of the matrix multiplication operation stored in a third shift-register plane, wherein the first matrix comprises N columns, and wherein the instructions cause each processing element to perform, for N iterations, operations on shift registers dedicated to the processing element, the operations comprising: performing a multiplication operation between a first value stored in a first shift register of the first shift-register plane and a second value stored in a second shift register of the second shift-register plane, performing an addition operation between a result of the multiplication operation and a summation value stored in a third shift register of the third shift-register plane, updating the summation value stored in the third shift register of the third shift-register plane using a result of the addition operation, and based on determining that a current iteration is not a last iteration: shifting data in the first shift-register plane one unit in a first dimension corresponding to rows of the first matrix, and shifting data in the second shift-register plane one unit in a second dimension corresponding to columns of the second matrix. 7. The computer program product of claim 6 , wherein the first matrix comprises R rows and the second matrix comprises S columns, and wherein the operations further comprise: performing a row-wise rotational shearing transformation on the first matrix including shifting each row i, from row 0 to row R−1, by i units; and performing a column-wise rotational shearing transformation on the second matrix including shifting each column j, from column 0 to column S−1, by j units. 8. The computer program product of claim 6 , wherein the operations further comprise: loading an initial condition for the matrix multiplication by loading initial values into shift registers of the third shift-register plane. 9. The computer program product of claim 6 , wherein the operations further comprise computing a two-dimensional discrete Fourier transform (2D DFT) using a first matrix of real values, a first matrix of imaginary values, a second matrix of real values, and a second matrix of imaginary values including: computing a real portion of the 2D DFT including: performing a first matrix multiplication operation between the first matrix of real values and the second matrix of real values, performing a second matrix multiplication operation between the first matrix of imaginary values and the second matrix of imaginary values, and subtracting a result of the second matrix multiplication operation from the first matrix multiplication operation; and computing an imaginary portion of the 2D DFT including: performing a third matrix multiplication operation between the first matrix of real values and the second matrix of imaginary values, performing a fourth matrix multiplication operation between the second matrix of real values and the first matrix of imaginary values, and adding a result of the third matrix multiplication operation and a result of the fourth matrix multiplication operat

Assignees

Inventors

Classifications

  • G06T1/20Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • H04N25/767Primary

    Horizontal readout lines, multiplexers or registers · CPC title

  • Arithmetic instructions · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Two dimensional arrays, e.g. mesh, torus · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9986187B2 cover?
A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resi…
Who is the assignee on this patent?
Google Llc
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 29 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).