Compiler techniques for mapping program code to a high performance, power efficient, programmable image processing hardware platform
US-2017287103-A1 · Oct 5, 2017 · US
US10304156B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10304156-B2 |
| Application number | US-201715625972-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 16, 2017 |
| Priority date | Feb 26, 2016 |
| Publication date | May 28, 2019 |
| Grant date | May 28, 2019 |
Official abstract text for this publication.
A method is described. The method includes repeatedly loading a next sheet of image data from a first location of a memory into a two-dimensional shift register array. The memory is locally coupled to the two-dimensional shift register array and an execution lane array having a smaller dimension than the two-dimensional shift register array along at least one array axis. The loaded next sheet of image data keeps within an image area of the two-dimensional shift register array. The method also includes repeatedly determining output values for the next sheet of image data through execution of program code instructions along respective lanes of the execution lane array, wherein a stencil size used in determining the output values encompasses only pixels that reside within the two-dimensional shift register array.
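The method in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the patented hardware or its compiler: the sizes (`LANES`, `HALO`), the helper names, and the 3x3 box-sum stencil are all assumptions chosen for illustration. It models a register array that is larger than the lane array along both axes, loads one sheet at a time, and computes each lane's output by shifting the whole register array, so the stencil only ever reads pixels that are already in the registers.

```python
# Hypothetical sizes (not from the patent): a 4x4 execution lane array
# backed by a 6x6 shift register array, i.e. a one-register halo per side.
LANES = 4
HALO = 1
REGS = LANES + 2 * HALO


def load_sheet(image, top, left):
    """Copy a REGS x REGS window into the register array.

    (top, left) is the image coordinate handled by lane (0, 0).
    Pixels outside the image are read as 0 (an assumed border policy).
    """
    rows, cols = len(image), len(image[0])
    regs = []
    for r in range(REGS):
        row = []
        for c in range(REGS):
            y, x = top - HALO + r, left - HALO + c
            inside = 0 <= y < rows and 0 <= x < cols
            row.append(image[y][x] if inside else 0)
        regs.append(row)
    return regs


def shift(regs, dy, dx):
    """Shift the whole array so position (r, c) holds its (dy, dx) neighbor."""
    out = [[0] * REGS for _ in range(REGS)]
    for r in range(REGS):
        for c in range(REGS):
            sr, sc = r + dy, c + dx
            if 0 <= sr < REGS and 0 <= sc < REGS:
                out[r][c] = regs[sr][sc]
    return out


def box_sum_3x3(regs):
    """Every lane runs the same steps: shift, then accumulate its own register."""
    acc = [[0] * LANES for _ in range(LANES)]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            shifted = shift(regs, dy, dx)
            for r in range(LANES):
                for c in range(LANES):
                    acc[r][c] += shifted[r + HALO][c + HALO]
    return acc


def process_image(image):
    """Repeatedly load the next sheet and compute its outputs.

    Assumes the image height and width are multiples of LANES.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for top in range(0, rows, LANES):
        for left in range(0, cols, LANES):
            sums = box_sum_3x3(load_sheet(image, top, left))
            for r in range(LANES):
                out[top + r][left:left + LANES] = sums[r]
    return out
```

With `HALO = 1`, a 3x3 stencil never reaches outside the register array, which is the constraint the abstract places on stencil size. A wider stencil would need a wider halo in this model.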
Opening claim text (preview).
The invention claimed is:
1. A processor comprising: a two-dimensional array of processing elements; a two-dimensional shift-register array having a first portion of registers that are each dedicated to one of the processing elements in the two-dimensional array of processing elements and having a halo portion of registers that borders the first portion of registers on one or more sides of the first portion; and a sheet generator configured to load sheets of image data into the two-dimensional shift register array, wherein each sheet of image data has at least as many pixels as processing elements in the two-dimensional array of processing elements, wherein the processor is configured to execute instructions to load input data to perform a stencil function requiring data from multiple sheets of image data, wherein the instructions cause the processor to perform operations comprising: initially loading a first sheet of image data and a second sheet of image into a local random access memory (RAM) that is local to the processor, assigning a first pointer that references a first address of the first sheet of image data loaded into the local RAM, assigning a second pointer that references a second address of the second sheet of image data loaded into the local RAM, loading the first sheet of image data into the first portion of the two-dimensional shift-register array using the first pointer, loading a portion of the second sheet of image data into the halo portion of the two-dimensional shift-register array using the second pointer, performing a first iteration of the stencil function using the first sheet of image data loaded into the first portion of the two-dimensional shift-register array and using the portion of the second sheet of image data loaded into the halo portion of the two-dimensional shift-register array, after performing the first iteration of the stencil function, updating the first pointer to reference the second address of the second sheet of image data loaded into the local RAM, loading the second sheet of image data into the first portion of the two-dimensional shift-register array using the first pointer, loading a portion of a third sheet of image data into the halo portion of the two-dimensional shift-register array, and performing a second iteration of the stencil function using the second sheet of image data loaded into the first portion of the two-dimensional shift-register array and using the portion of the third sheet of image data loaded into the halo portion of the two-dimensional shift-register array.
2. The processor of claim 1, wherein loading the portion of a second sheet of image data into the halo portion of the two-dimensional shift-register array comprises loading the portion of the second sheet of image data from the local RAM.
3. The processor of claim 1, wherein the operations further comprise: loading, into the local RAM, the third sheet of image data at least partially concurrently with performing the first iteration of the stencil function.
4. The processor of claim 1, wherein loading the second sheet of image data from the local RAM into the first portion of the two-dimensional shift-register array comprises executing a load instruction that references the updated first pointer.
5. The processor of claim 1, wherein the instructions include an offset instruction that, that when executed by a particular processing element having a particular location in the two-dimensional array of processing elements, causes the processing element to compute, given the particular location, an offset representing a particular sheet of data in the local RAM from which to load data.
6. The processor of claim 1, wherein the operations further comprise providing an output sheet of data to the sheet generator to be provided by the sheet generator to one or more other components of the processor.
7. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by a processor comprising: a two-dimensional array of processing elements; a two-dimensional shift-register array having a first portion of registers that are each dedicated to one of the processing elements in the two-dimensional array of processing elements and having a halo portion of registers that borders the first portion of registers on one or more sides of the first portion; and a sheet generator configured to load sheets of image data into the two-dimensional shift register array, wherein each sheet of image data has at least as many pixels as processing elements in the two-dimensional array of processing elements, wherein the processor is configured to execute instructions to load input data to perform a stencil function requiring data from multiple sheets of image data, wherein the instructions cause the processor to perform operations comprising: initially loading a first sheet of image data and a second sheet of image into a local random access memory (RAM) that is local to the processor, assigning a first pointer that references a first address of the first sheet of image data loaded into the local RAM, assigning a second pointer that references a second address of the second sheet of image data loaded into the local RAM, loading the first sheet of image data into the first portion of the two-dimensional shift-register array using the first pointer, loading a portion of the second sheet of image data into the halo portion of the two-dimensional shift-register array using the second pointer, performing a first iteration of the stencil function using the first sheet of image data loaded into the first portion of the two-dimensional shift-register array and using the portion of the second sheet of image data loaded into the halo portion of the two-dimensional shift-register array, after performing the first iteration of the stencil function, updating the first pointer to reference the second address of the second sheet of image data loaded into the local RAM, loading the second sheet of image data into the first portion of the two-dimensional shift-register array using the first pointer, loading a portion of a third sheet of image data into the halo portion of the two-dimensional shift-register array, and performing a second iteration of the stencil function using the second sheet of image data loaded into the first portion of the two-dimensional shift-register array and using the portion of the third sheet of image data loaded into the halo portion of the two-dimensional shift-register array.
8. The computer program product of claim 7, wherein loading the portion of a second sheet of image data into the halo portion of the two-dimensional shift-register array comprises loading the portion of the second sheet of image data from the local RAM.
9. The computer program product of claim 7, wherein the operations further comprise: loading, into the local RAM, the third sheet of image data at least partially concurrently with performing the first iteration of the stencil function.
10. The computer program product of claim 7, wherein loading the second sheet of image data from the local RAM into the first portion of the two-dimensional shift-register array comprises executing a load instruction that references the updated first pointer.
11. The computer program product of claim 7, wherein the instructions include an offset instruction that, that when executed by a particular processing element having a particular location in the two-dimensional array of processing elements, causes the processing element to compute, given the particular location, an offset representing a particular sheet of data in the local R
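The pointer handling recited in claims 1 and 7 can be sketched in software. This is a simplified illustration under stated assumptions, not the claimed implementation: the sheet sizes, the two-slot `ram` dictionary, the halo placed only on the right-hand side, and the toy stencil (each pixel plus its right neighbor) are all hypothetical. The sketch loads the first portion through `first_ptr`, loads part of the following sheet into the halo through `second_ptr`, and after each iteration updates `first_ptr` to the address `second_ptr` held.

```python
# Hypothetical sheet geometry (not from the patent).
SHEET_H = 2
SHEET_W = 4
HALO_W = 1  # halo columns on the right side of the first portion


def run_stencil(sheets):
    """Process sheets left to right, returning one output sheet per input.

    `sheets` is a list of SHEET_H x SHEET_W nested lists. Sheets past the
    end of the list are treated as all zeros (an assumed border policy).
    """
    zero = [[0] * SHEET_W for _ in range(SHEET_H)]

    # Toy model of local RAM: address -> sheet. Two slots are enough here
    # because this sketch runs sequentially and does not model the
    # concurrent load of the third sheet.
    ram = {0: sheets[0], 1: sheets[1] if len(sheets) > 1 else zero}
    first_ptr, second_ptr = 0, 1

    outputs = []
    for k in range(len(sheets)):
        # First portion comes from first_ptr, halo from part of second_ptr.
        regs = [
            ram[first_ptr][r] + ram[second_ptr][r][:HALO_W]
            for r in range(SHEET_H)
        ]

        # Toy stencil: each output is a pixel plus its right-hand neighbor,
        # so the last column needs data from the next sheet (the halo).
        outputs.append(
            [[regs[r][c] + regs[r][c + 1] for c in range(SHEET_W)]
             for r in range(SHEET_H)]
        )

        # Update the first pointer to the second sheet's address, then
        # reuse the freed slot for the sheet after that.
        freed = first_ptr
        first_ptr = second_ptr
        nxt = k + 2
        ram[freed] = sheets[nxt] if nxt < len(sheets) else zero
        second_ptr = freed
    return outputs
```

Swapping pointers instead of copying sheets means the second sheet is read from the same RAM address in both iterations: first as halo data, then as the first portion.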
Memory management · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Extension of register space, e.g. register cache · CPC title
LOAD or STORE instructions; Clear instruction · CPC title