Compiler techniques for mapping program code to a high performance, power efficient, programmable image processing hardware platform

US10387989B2 · US · B2

Patent metadata
Publication number: US-10387989-B2
Application number: US-201715628480-A
Country: US
Kind code: B2
Filing date: Jun 20, 2017
Priority date: Feb 26, 2016
Publication date: Aug 20, 2019
Grant date: Aug 20, 2019

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for restructuring an image processing pipeline. The method includes compiling program code targeted for an image processor having programmable stencil processors composed of respective two-dimensional execution lane and shift register circuit structures. The program code is to implement a directed acyclic graph and is composed of multiple kernels that are to execute on respective ones of the stencil processors, wherein the compiling includes performing any of: horizontal fusion of kernels; vertical fusion of kernels; fission of one of the kernels into multiple kernels; spatial partitioning of a kernel into multiple spatially partitioned kernels; or splitting the directed acyclic graph into smaller graphs.
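The abstract treats the program as a directed acyclic graph of kernels that communicate through line buffers. The minimal Python sketch below shows that representation. It assumes each kernel can be reduced to the names of the line buffers it reads and writes. The `Kernel` class, the buffer names, and the chunking heuristic in `split_graph` are hypothetical and are not taken from the patent. The code derives producer-consumer edges, orders the kernels topologically, and shows one naive way to split the graph into smaller graphs that each fit on a given number of stencil processors.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical kernel node, reduced to the line buffers it uses."""
    name: str
    reads: list[str] = field(default_factory=list)   # input line buffers
    writes: list[str] = field(default_factory=list)  # output line buffers


def producer_consumer_edges(kernels):
    """Return (producer, consumer) name pairs linked through a line buffer."""
    writer_of = {buf: k.name for k in kernels for buf in k.writes}
    return {
        (writer_of[buf], k.name)
        for k in kernels
        for buf in k.reads
        if buf in writer_of
    }


def topological_order(kernels):
    """Kahn's algorithm; raises ValueError if the pipeline is not acyclic."""
    names = [k.name for k in kernels]
    indegree = {n: 0 for n in names}
    successors = defaultdict(list)
    for producer, consumer in producer_consumer_edges(kernels):
        successors[producer].append(consumer)
        indegree[consumer] += 1
    ready = deque(n for n in names if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in sorted(successors[n]):
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(names):
        raise ValueError("pipeline contains a cycle; it is not a DAG")
    return order


def split_graph(order, num_processors):
    """Naive heuristic (an assumption): cut the topological order into
    consecutive sub-graphs that each fit on the available processors."""
    return [order[i:i + num_processors] for i in range(0, len(order), num_processors)]


pipeline = [
    Kernel("blur_x", reads=["input"], writes=["lb0"]),
    Kernel("blur_y", reads=["lb0"], writes=["lb1"]),
    Kernel("sharpen", reads=["lb1", "input"], writes=["output"]),
]
order = topological_order(pipeline)
print(sorted(producer_consumer_edges(pipeline)))
# [('blur_x', 'blur_y'), ('blur_y', 'sharpen')]
print(order)                  # ['blur_x', 'blur_y', 'sharpen']
print(split_graph(order, 2))  # [['blur_x', 'blur_y'], ['sharpen']]
```

In a topological order every producer precedes its consumers. Cutting that order into consecutive chunks therefore never places a consumer in an earlier sub-graph than its producer. The heuristic makes no attempt to balance work across stencil processors.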

Claims

The invention claimed is:

1. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.

2. The one or more computer storage media of claim 1, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).

3. The one or more computer storage media of claim 1, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.

4. A computer-implemented method comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.

5. The method of claim 4, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).

6. The method of claim 4, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.

7. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving instructions that define an original processing pipeline for a plurality of processors of a computing device, the original processing pipeline comprising a plurality of kernels to be executed in a particular order, each kernel comprising respective instructions to be performed by one of the plurality of processors, wherein the original processing pipeline specifies which of the kernels generate output to be used as input to one or more other kernels in the original processing pipeline; determining that the original processing pipeline comprises two independent kernels that have a consumer-producer relationship and that one or more vertical fusion criteria are satisfied, wherein the two independent kernels comprise load instructions that read from different respective line buffers of the computing device; and in response, generating a modified processing pipeline including generating instructions of a vertically fused kernel having instructions from both of the two independent kernels, and including modifying a respective line buffer reference by one or more load instructions from the two independent kernels so that all load instructions of the vertically fused kernel read from a same line buffer, wherein instructions of the original processing pipeline cause a producer kernel of the two independent kernels to write output to a particular line buffer and cause a consumer kernel of the two independent kernels to read the output from the particular line buffer, and wherein the instructions of the vertically fused kernel cause the output of the producer kernel to be stored in memory local to a processor in the plurality of processors and cause the vertically fused kernel to read the output stored in the memory local to the processor.

8. The system of claim 7, wherein determining that the one or more vertical fusion criteria are satisfied comprises: (i) determining that a measure of complexity of the two kernels satisfies a threshold, (ii) determining that the original processing pipeline has more kernels than processors of the computing device, or both (i) and (ii).

9. The system of claim 7, wherein generating the modified processing pipeline further comprises modifying one or more store instructions from the two independent kernels so that all store instructions of the vertically fused kernel write to the same line buffer.
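The vertical fusion recited in claims 1 to 3 can be illustrated with a small Python sketch. Several pieces are assumptions rather than the patent's own method:

- a toy instruction model (`Instr` with `load`, `store`, and `compute` ops);
- the combined instruction count as the "measure of complexity" of claim 2, with a hypothetical limit of 64;
- a `<local>` marker standing in for memory local to a processor.

The producer's store to the intermediate line buffer and the consumer's load from it are both redirected to local memory. As a result, the fused kernel loads from only one line buffer and stores to only one.

```python
from dataclasses import dataclass, replace

LOCAL = "<local>"  # hypothetical marker for memory local to a processor


@dataclass(frozen=True)
class Instr:
    op: str           # "load", "store", or "compute"
    buffer: str = ""  # line buffer (or LOCAL) named by a load/store


@dataclass
class Kernel:
    name: str
    instrs: list[Instr]


def line_buffers(kernel, op):
    """Line buffers (not local memory) referenced by `op` instructions."""
    return {i.buffer for i in kernel.instrs if i.op == op and i.buffer != LOCAL}


def fusion_criteria_met(k1, k2, num_kernels, num_processors, max_instrs=64):
    """Claim 2: (i) a complexity measure satisfies a threshold, or (ii) the
    pipeline has more kernels than processors. Using the combined instruction
    count and the limit of 64 are assumptions of this sketch."""
    simple_enough = len(k1.instrs) + len(k2.instrs) <= max_instrs
    return simple_enough or num_kernels > num_processors


def vertically_fuse(producer, consumer):
    """Fuse a producer/consumer pair that communicates through one line buffer.

    Simplifying assumptions: the producer writes only the intermediate buffer,
    the consumer reads only that buffer, and no other kernel reads it.
    """
    shared = line_buffers(producer, "store") & line_buffers(consumer, "load")
    if len(shared) != 1:
        raise ValueError("not a producer/consumer pair linked by one line buffer")
    (mid,) = shared
    if line_buffers(producer, "store") != {mid} or line_buffers(consumer, "load") != {mid}:
        raise ValueError("sketch only handles a single intermediate buffer")

    fused = []
    for i in producer.instrs + consumer.instrs:
        if i.op in ("load", "store") and i.buffer == mid:
            # Keep the intermediate in local memory instead of a line buffer.
            fused.append(replace(i, buffer=LOCAL))
        else:
            fused.append(i)

    result = Kernel(f"{producer.name}+{consumer.name}", fused)
    # Claims 1 and 3: all remaining line-buffer loads (and stores) use one buffer.
    if len(line_buffers(result, "load")) > 1 or len(line_buffers(result, "store")) > 1:
        raise ValueError("fused kernel would still touch several line buffers")
    return result


blur_x = Kernel("blur_x", [Instr("load", "lb_in"), Instr("compute"), Instr("store", "lb_mid")])
blur_y = Kernel("blur_y", [Instr("load", "lb_mid"), Instr("compute"), Instr("store", "lb_out")])
if fusion_criteria_met(blur_x, blur_y, num_kernels=5, num_processors=4):
    fused = vertically_fuse(blur_x, blur_y)
    print(fused.name, line_buffers(fused, "load"), line_buffers(fused, "store"))
    # blur_x+blur_y {'lb_in'} {'lb_out'}
```

The sketch rejects pair shapes it does not support instead of rewriting them, which keeps it short. A real compiler pass would also need to confirm that no other kernel reads the intermediate buffer before dropping it.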

Assignees

Google LLC

Inventors

Classifications

  • Target code generation · CPC title

  • G06T1/20 (primary) · Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455) · CPC title

  • involving image processing hardware · CPC title

Frequently asked questions

What does patent US10387989B2 cover?
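It covers compiler techniques for restructuring an image processing pipeline compiled for an image processor that has programmable stencil processors. The abstract lists five transformations:
  • horizontal fusion of kernels
  • vertical fusion of kernels
  • fission of a kernel into multiple kernels
  • spatial partitioning of a kernel
  • splitting the directed acyclic graph into smaller graphs

The granted claims are narrower. They are directed to vertically fusing a producer-consumer pair of kernels so that the producer's output is kept in memory local to a processor rather than passed through a line buffer.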
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 20, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).