Compiler Techniques for Mapping Program Code to a High Performance, Power Efficient, Programmable Image Processing Hardware Platform
US-2017249716-A1 · Aug 31, 2017 · US
US10387989B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10387989-B2 |
| Application number | US-201715628480-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 20, 2017 |
| Priority date | Feb 26, 2016 |
| Publication date | Aug 20, 2019 |
| Grant date | Aug 20, 2019 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for restructuring an image processing pipeline. The method includes compiling program code targeted for an image processor having programmable stencil processors composed of respective two-dimensional execution lane and shift register circuit structures. The program code is to implement a directed acyclic graph and is composed of multiple kernels that are to execute on respective ones of the stencil processors, wherein the compiling includes performing any of: horizontal fusion of kernels; vertical fusion of kernels; fission of one of the kernels into multiple kernels; spatial partitioning of a kernel into multiple spatially partitioned kernels; or splitting the directed acyclic graph into smaller graphs.
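The abstract describes the program code as a directed acyclic graph of kernels. As a rough illustration only, the sketch below models such a graph as a plain dictionary and derives one valid execution order. The kernel names (`K1` to `K4`) and the `execution_order` helper are hypothetical and do not come from the patent; the standard-library `graphlib` module stands in for whatever graph representation a real compiler would use.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each kernel maps to the set of kernels whose
# output it consumes (its producers). K1 feeds K2 and K3; both feed K4.
pipeline = {
    "K1": set(),
    "K2": {"K1"},
    "K3": {"K1"},
    "K4": {"K2", "K3"},
}

def execution_order(dag):
    """Return one valid kernel order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"pipeline is not acyclic: {exc.args[1]}") from exc

order = execution_order(pipeline)
```

In this toy graph, `K2` and `K3` share a producer and do not depend on each other, while `K1` followed by `K2` forms a producer-consumer chain. These are the kinds of relationships that the listed transformations (horizontal fusion, vertical fusion, and so on) would inspect.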
Opening claim text (preview).
The invention claimed is:
1. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.
2. The one or more computer storage media of claim 1, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).
3. The one or more computer storage media of claim 1, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.
4. A computer-implemented method comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.
5. The method of claim 4, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).
6. The method of claim 4, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.
7. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.
8. The system of claim 7, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).
9. The system of claim 7, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.
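The vertical fusion recited in the claims can be pictured with a toy instruction model. The sketch below is one simplified reading of the claim language, not the patented compiler. The `Instr` and `Kernel` types, the `local_store`/`local_load` op names, the buffer names `LB0` to `LB2`, and the use of instruction count as the "measure of complexity" (with an arbitrary threshold of 8) are all assumptions made for illustration. It also assumes the simple case where the consumer's only line-buffer load reads the producer's output.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str           # "load"/"store" touch a line buffer; other ops are compute
    buffer: str = ""  # line buffer name for load/store, "local" for local memory

@dataclass
class Kernel:
    name: str
    instrs: list

def should_fuse(producer, consumer, num_kernels, num_processors, max_instrs=8):
    """Assumed criteria: the pair is small enough, or kernels outnumber processors."""
    small = len(producer.instrs) + len(consumer.instrs) <= max_instrs
    oversubscribed = num_kernels > num_processors
    return small or oversubscribed

def vertical_fuse(producer, consumer):
    """Merge producer and consumer, routing the intermediate data via local memory."""
    # The intermediate line buffer is the one the producer stores to.
    intermediate = next(i.buffer for i in producer.instrs if i.op == "store")
    fused = []
    for i in producer.instrs:
        if i.op == "store" and i.buffer == intermediate:
            fused.append(Instr("local_store", "local"))  # keep output on-processor
        else:
            fused.append(Instr(i.op, i.buffer))
    for i in consumer.instrs:
        if i.op == "load" and i.buffer == intermediate:
            fused.append(Instr("local_load", "local"))   # read it back locally
        else:
            fused.append(Instr(i.op, i.buffer))
    return Kernel(f"{producer.name}+{consumer.name}", fused)

blur = Kernel("blur", [Instr("load", "LB0"), Instr("blur"), Instr("store", "LB1")])
sharpen = Kernel("sharpen", [Instr("load", "LB1"), Instr("sharpen"), Instr("store", "LB2")])

if should_fuse(blur, sharpen, num_kernels=2, num_processors=1):
    fused = vertical_fuse(blur, sharpen)
    load_buffers = {i.buffer for i in fused.instrs if i.op == "load"}
    store_buffers = {i.buffer for i in fused.instrs if i.op == "store"}
```

Before fusion the two kernels load from different line buffers (`LB0` and `LB1`). After fusion the only remaining line-buffer load reads `LB0`, the only line-buffer store writes `LB2`, and the intermediate result never leaves processor-local memory.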
Target code generation · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455) · CPC title
involving image processing hardware · CPC title