Convolutional neural network on programmable two dimensional image processor

US10789505B2 · US · B2

Patent metadata
Publication number: US-10789505-B2
Application number: US-201715631906-A
Country: US
Kind code: B2
Filing date: Jun 23, 2017
Priority date: Jul 1, 2016
Publication date: Sep 29, 2020
Grant date: Sep 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
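The shift-multiply-accumulate loop that the abstract describes can be sketched in plain Python. This is only an illustrative model, not the patented hardware or its instruction set: the names `rotate`, `shift_convolve`, and `direct_convolve` are hypothetical, the "shift register" is a list of lists, shifts wrap around at the edges (a simplifying assumption so that no pixel is lost), and the coefficients are visited in a serpentine order chosen here so that each step needs a single one-position shift. As is common for CNN layers, the kernel is applied without flipping.

```python
def rotate(reg, dy, dx):
    """Move every value in reg by (dy, dx) in the same direction, with wraparound."""
    h, w = len(reg), len(reg[0])
    return [[reg[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def shift_convolve(plane, coeffs):
    """2D convolution done as broadcast multiply-add steps separated by shifts."""
    h, w = len(plane), len(plane[0])
    kh, kw = len(coeffs), len(coeffs[0])
    reg = [row[:] for row in plane]        # plane loaded into the modeled shift register
    acc = [[0] * w for _ in range(h)]      # one running sum per lane
    for i in range(kh):
        # Serpentine order: even kernel rows left to right, odd rows right to left.
        cols = range(kw) if i % 2 == 0 else range(kw - 1, -1, -1)
        for n, j in enumerate(cols):
            k = coeffs[i][j]               # the same coefficient is used by every lane
            for r in range(h):
                for c in range(w):
                    acc[r][c] += k * reg[r][c]   # multiply-add in each lane
            if n < kw - 1:
                # Shift so each lane sees its next horizontal neighbor.
                reg = rotate(reg, 0, -1 if i % 2 == 0 else 1)
        if i < kh - 1:
            reg = rotate(reg, -1, 0)       # move on to the next kernel row
    # Keep only lanes whose whole stencil fit inside the plane.
    return [row[: w - kw + 1] for row in acc[: h - kh + 1]]


def direct_convolve(plane, coeffs):
    """Reference result computed stencil by stencil, for comparison."""
    h, w = len(plane), len(plane[0])
    kh, kw = len(coeffs), len(coeffs[0])
    return [[sum(coeffs[i][j] * plane[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
assert shift_convolve(plane, kernel) == direct_convolve(plane, kernel)
```

The point of the sketch is that the data moves past the lanes rather than each lane fetching its neighbors: every lane runs the same multiply-add with the same coefficient, and a single whole-array shift realigns all stencils at once.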

First claim

Opening claim text (preview).

The invention claimed is:

1. A processor comprising: a two-dimensional shift-register array; and a two-dimensional array of processing elements, wherein each shift register of the two-dimensional shift-register array is dedicated to one of the processing elements in the two-dimensional array of processing elements, wherein the processor is configured to execute instructions to perform a stencil function on each of one or more pixel values of a two-dimensional sheet of image data loaded into the two-dimensional array of processing elements, wherein the stencil function is associated with a plurality of coefficients, and wherein the instructions cause the processor to repeatedly shift data in the two-dimensional shift-register array to perform a traversal of a stencil of image data for the stencil function and, after each shift, to perform operations comprising: concurrently multiplying, by the processing elements in the two-dimensional array of processing elements, a same next coefficient of the stencil function by every respective pixel value in the two-dimensional sheet of image data stored in the two-dimensional shift-register array, performing, by each processing element in the two-dimensional array of processing elements, an addition of (i) a respective result of the multiplication with (ii) a respective current sum for the processing element to update the respective current sum for the processing element, and based on determining that all coefficients for the stencil function have not been processed, performing a next shift in the traversal of the stencil of image data including shifting, by the processor, each pixel value of image data stored in the two-dimensional shift-register array in a same direction.

2. The processor of claim 1, wherein the operations further comprise based on determining that all coefficient values for the stencil function have been processed, outputting a result of the stencil function computed for each of the one or more pixel values in the two-dimensional sheet of image data.

3. The processor of claim 1, wherein the operations further comprise: loading, by the processor, the two-dimensional sheet of image data into the two-dimensional shift-register array; and loading, by the processor, the plurality of coefficients for the stencil function into memory local to the processor.

4. The processor of claim 1, wherein concurrently multiplying and performing the addition comprises executing, by each processing element of the two-dimensional array of processing elements, a multiply-add instruction using (i) the same next coefficient of the stencil function, (ii) a respective pixel value in a shift register of the one or more respective shift registers that are dedicated to the processing element, and (iii) a respective current sum for the processing element.

5. The processor of claim 4, wherein the processor comprises a scalar processing element that is configured to issue instructions to each processing element in the two-dimensional array of processing elements, and wherein the operations further comprise: issuing, by the scalar processing element to each processing element in the two-dimensional array of processing elements, a multiply-add instruction having the same next coefficient of the stencil function.

6. The processor of claim 5, wherein the multiply-add instruction comprises multiple operands including a current coefficient value of the stencil function.

7. The processor of claim 1, wherein the operations further comprise storing, by each processing element, a result of each addition in register space that is local to the processing element.

8. The processor of claim 1, wherein the processor is a stencil processor of a computing device having multiple stencil processors, and wherein the stencil processors are configured to execute instructions that cause each stencil processor of the multiple stencil processors to perform the stencil function using a respective coefficient set on each of multiple planes of image data for a single image.

9. The processor of claim 1, wherein the processor is a stencil processor of a computing device having multiple stencil processors, and wherein the stencil processors are configured to execute instructions that cause each stencil processor of the multiple stencil processors to perform the stencil function using a respective coefficient set on a respective plane of image data for a single image having multiple planes of image data.

10. The processor of claim 1, wherein the processor is a stencil processor of a computing device having multiple stencil processors, and wherein the stencil processors are configured to execute instructions that cause each stencil processor of the multiple stencil processors to perform the stencil function on a different respective portion of an image plane.

11. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by a processor comprising: a two-dimensional shift-register array; and a two-dimensional array of processing elements, wherein each shift register of the two-dimensional shift-register array is dedicated to one of the processing elements in the two-dimensional array of processing elements, cause the processor to perform a stencil function on each of one or more pixel values of a two-dimensional sheet of image data loaded into the two-dimensional sheet of processing elements, wherein the stencil function is associated with a plurality of coefficients, and wherein the instructions cause the processor to repeatedly shift data in the two-dimensional shift-register array to perform a traversal of a stencil of image data for the stencil function and, after each shift, to perform operations comprising: concurrently multiplying, by the processing elements in the two-dimensional array of processing elements, a same next coefficient of the stencil function by every respective pixel value stored in the two-dimensional shift-register array, performing, by each processing element in the two-dimensional array of processing elements, an addition of (i) a respective result of the multiplication with (ii) a respective current sum for the processing element to update the respective current sum for the processing element, and based on determining that all coefficients for the stencil function have not been processed, performing a next shift in the traversal of the stencil of image data including shifting, by the processor, each pixel value of image data stored in the two-dimensional shift-register array in a same direction.

12. The computer program product of claim 11, wherein the operations further comprise based on determining that all coefficient values for the stencil function have been processed, outputting a result of the stencil function computed for each of the one or more pixel values in the two-dimensional sheet of image data.

13. The computer program product of claim 11, wherein the operations further comprise: loading, by the processor, the two-dimensional sheet of image data into the two-dimensional shift-register array; and loading, by the processor, the plurality of coefficients for the stencil function into memory local to the processor.

14. The computer program product of claim 11, wherein concurrently multiplying and performing the addition comprises executing, by each processing element of the two-dimensional array of processing elements, a multiply-add instruction using (i) the same next coefficient of the stencil function, (ii) a respective pixel value in a shift register of the one or more respective shift registers that are dedic
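The per-plane arrangement in the dependent claims (each stencil processor applying its own coefficient set to its own plane of a multi-plane image) can be illustrated with a short Python sketch. The function names `convolve_plane` and `convolve_block` are hypothetical, and the final summation across planes is the usual definition of a CNN layer output, assumed here for illustration; the claim text shown on this page does not say how per-plane results are combined.

```python
def convolve_plane(plane, coeffs):
    """2D stencil result for one plane (kernel applied without flipping)."""
    h, w = len(plane), len(plane[0])
    kh, kw = len(coeffs), len(coeffs[0])
    return [[sum(coeffs[i][j] * plane[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


def convolve_block(block, coeff_sets):
    """Apply one coefficient set per plane, then sum the partial results.

    Each convolve_plane call is independent, so in this model each one
    could be handed to a different stencil processor.
    """
    partials = [convolve_plane(p, k) for p, k in zip(block, coeff_sets)]
    out_h, out_w = len(partials[0]), len(partials[0][0])
    return [[sum(p[r][c] for p in partials) for c in range(out_w)]
            for r in range(out_h)]


# Two 3x3 planes of one image, each with its own 2x2 coefficient set.
block = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
]
coeff_sets = [
    [[1, 1], [1, 1]],
    [[1, 0], [0, 0]],
]
print(convolve_block(block, coeff_sets))  # [[5, 6], [8, 9]]
```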

Assignees

Inventors

Classifications

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10789505B2 cover?
A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural netw…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 29, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).