Convolutional neural network on programmable two dimensional image processor

US10546211B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10546211-B2
Application number  US-201615201204-A
Country             US
Kind code           B2
Filing date         Jul 1, 2016
Priority date       Jul 1, 2016
Publication date    Jan 28, 2020
Grant date          Jan 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The two-dimensional shift register provides local respective register space for the execution lanes. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
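The multiply, accumulate, and shift sequence in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware: the function names `shift` and `conv2d_by_shifting`, the wrap-around at the array edges, and the row-major order in which coefficients are visited are all assumptions made for illustration. Each list cell stands in for one execution lane with its own register and accumulator.

```python
def shift(plane, dy, dx):
    """Return a copy in which lane (r, c) holds the value formerly at (r + dy, c + dx).

    Edges wrap around (an assumption of this sketch), so only outputs whose
    stencil fits inside the plane are meaningful.
    """
    h, w = len(plane), len(plane[0])
    return [[plane[(r + dy) % h][(c + dx) % w] for c in range(w)] for r in range(h)]


def conv2d_by_shifting(plane, coeffs):
    """Convolve one plane with a k x k coefficient array using shifts only."""
    h, w = len(plane), len(plane[0])
    k = len(coeffs)
    acc = [[0] * w for _ in range(h)]   # one running sum per lane (one per stencil)
    reg = [row[:] for row in plane]     # contents of the simulated 2D shift register

    for i in range(k):
        for j in range(k):
            coeff = coeffs[i][j]        # the same coefficient is used by every lane
            # "Concurrent" multiply and sum: every lane updates its own accumulator.
            for r in range(h):
                for c in range(w):
                    acc[r][c] += coeff * reg[r][c]
            # Align the next pixel under each lane by shifting the register content.
            if j < k - 1:
                reg = shift(reg, 0, 1)
        # Return to the first stencil column and move down one stencil row.
        reg = shift(reg, 1, -(k - 1))
    return acc


# Usage: a 4 x 4 plane and a 2 x 2 coefficient array.
plane = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
coeffs = [[1, 0], [0, 2]]
result = conv2d_by_shifting(plane, coeffs)
print(result[0][0])  # 1*0 + 2*5 = 10
```

After the loops finish, `acc[r][c]` holds the sum of `coeffs[i][j] * plane[r + i][c + j]` for the stencil whose top-left pixel is `(r, c)`. No lane ever reads a neighbor's register directly; the data comes to the lane through shifts, which is the point the abstract makes.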

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: executing a convolutional neural network layer having a plurality of coefficient sets on an image processor comprising: a plurality of stencil processors, wherein each stencil processor comprises a respective array of execution lanes and a respective two-dimensional shift-register array, wherein the two-dimensional shift-register array of the stencil processor comprises local respective register space accessible by the array of execution lanes of the stencil processor, a line buffer unit, and a plurality of sheet generator units, one respective sheet generator unit for each of the plurality of stencil processors, wherein executing the convolutional neural network layer comprises: loading different respective coefficient sets of the plurality of coefficient sets into each stencil processor of the plurality of stencil processors, each coefficient set comprising respective coefficient values; receiving, by the line buffer unit, a frame of image data; parsing, by the line buffer unit, the frame of image data into a plurality of line groups; sending each line group to each of the plurality of sheet generator units; loading, by each sheet generator unit from a received line group, a same plane of image data of a three-dimensional block of image data comprising pixel values into a respective two-dimensional shift-register array of a respective stencil processor of the plurality of stencil processors; and performing, by each stencil processor of the plurality of stencil processors, a two-dimensional convolution of the same plane of image data with a respective coefficient set loaded into the stencil processor including sequentially: concurrently multiplying within the execution lanes respective pixel values and coefficient values of the coefficient set loaded into the stencil processor to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional shift-register array for different stencils defining sub-regions of pixel values within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift-register array.

2. The method of claim 1 wherein the concurrently multiplying further comprises concurrently multiplying a same coefficient value against image data within the two-dimensional shift-register array.

3. The method of claim 1 wherein the effecting of alignment of values comprises shifting image data within the two-dimensional shift-register array prior to multiplying the image data by a next coefficient value along a same direction of the two-dimensional shift-register array.

4. The method of claim 1, wherein the local respective register space for the execution lanes comprises respective dedicated register space for each execution lane, and wherein executing the convolutional neural network layer further comprises: whenever the content is shifted in the two-dimensional shift-register array, multiplying within the execution lanes the respective pixel values and coefficient values by reading respective shifted pixel values from the respective dedicated register space for each execution lane.

5. The method of claim 1 wherein the image processor is configured to use an output from the convolutional neural network layer as an input for a next convolutional neural network layer to be computed by the image processor.

6. The method of claim 1 wherein the image processor is configured to multiplex the convolutional neural network layer and a second convolutional neural network layer with the image data remaining local to the execution lanes between processing of the convolutional neural network layer and the second convolutional neural network layer.

7. One or more non-transitory machine readable storage media having stored thereon program code that when processed by an image processor comprising: a plurality of stencil processors, wherein each stencil processor comprises a respective array of execution lanes and a respective two-dimensional shift-register array, wherein the two-dimensional shift-register array of the stencil processor comprises local respective register space accessible by the array of execution lanes of the stencil processor, a line buffer unit, and a plurality of sheet generator units, one respective sheet generator unit for each of the plurality of stencil processors, causes the image processor to execute a convolutional neural network layer including performing operations comprising: loading different respective coefficient sets of the plurality of coefficient sets into each stencil processor of the plurality of stencil processors, each coefficient set comprising respective coefficient values; receiving, by the line buffer unit, a frame of image data; parsing, by the line buffer unit, the frame of image data into a plurality of line groups; sending each line group to each of the plurality of sheet generator units; loading, by each sheet generator unit from a received line group, a same plane of image data of a three-dimensional block of image data comprising pixel values into a respective two-dimensional shift-register array of a respective stencil processor of the plurality of stencil processors; performing, by each stencil processor of the plurality of stencil processors, a two-dimensional convolution of the same plane of image data with a respective coefficient set loaded into the stencil processor including sequentially: concurrently multiplying within the execution lanes respective pixel values and coefficient values of the coefficient set loaded into the stencil processor to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional shift-register array for different stencils defining sub-regions of pixel values within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift-register array.

8. The one or more non-transitory machine readable storage media of claim 7 wherein the concurrently multiplying further comprises concurrently multiplying a same coefficient value against image data within the two-dimensional shift-register array.

9. The one or more non-transitory machine readable storage media of claim 7 wherein the effecting of alignment of values comprises shifting image data within the two-dimensional shift-register array prior to multiplying the image data by a next coefficient value along a same direction of the two-dimensional shift-register array.

10. The one or more non-transitory machine readable storage media of claim 7 wherein the image processor is configured to use an output from the convolutional neural network layer as an input for a next convolutional neural network layer to be computed by the image processor.

11. The one or more non-transitory machine readable storage media of claim 7 wherein the image processor is configured to multiplex the convolutional neural network layer and a second convolutional neural network layer with the image data remaining local to the execution lanes between processing of the convolutional neural network layer and the second convolutional neural network layer.

12. The one or more non-transitory machine readable storage media of claim 7, wherein the local respective register space for the execution lanes comprises respective dedicated register s …
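The data flow recited in claim 1 (a frame parsed into line groups, every line group sent to every sheet generator, and each stencil processor convolving the same plane with its own coefficient set) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed apparatus: `parse_line_groups`, `convolve_valid`, and `run_layer` are hypothetical names, the direct convolution is only a stand-in for the shift-register procedure, and the sketch ignores any overlap between neighboring line groups that stencils spanning a group boundary would require.

```python
def parse_line_groups(frame, rows_per_group):
    """Split a frame (a list of pixel rows) into consecutive groups of full rows."""
    return [frame[i:i + rows_per_group] for i in range(0, len(frame), rows_per_group)]


def convolve_valid(plane, coeffs):
    """Direct 2D convolution without padding; stand-in for one stencil processor."""
    k = len(coeffs)
    h, w = len(plane), len(plane[0])
    return [
        [
            sum(coeffs[i][j] * plane[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(w - k + 1)
        ]
        for r in range(h - k + 1)
    ]


def run_layer(frame, coefficient_sets, rows_per_group):
    """Apply every coefficient set to every line group of the same frame.

    outputs[p][g] is the result of hypothetical stencil processor p
    (holding coefficient_sets[p]) on line group g.
    """
    line_groups = parse_line_groups(frame, rows_per_group)
    outputs = []
    for coeffs in coefficient_sets:      # one iteration per simulated stencil processor
        outputs.append([convolve_valid(group, coeffs) for group in line_groups])
    return outputs


# Usage: two coefficient sets, a 4 x 4 frame, line groups of two rows each.
frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
sets = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
outputs = run_layer(frame, sets, rows_per_group=2)
print(outputs[0][0])  # [[5, 7, 9]]
```

Because each simulated processor receives the same image data and differs only in its coefficient set, the processors produce different output planes from one pass over the input, which is the arrangement the claim describes.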

Assignees

Inventors

Classifications

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • using a plurality of independent parallel functional units · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • using local operators · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10546211B2 cover?
A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The two-dimensional shift register provides local respective register space for the execution lanes. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 28, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).