Automated compute kernel fusion, resizing, and interleave

US2016267622A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016267622-A1
Application number: US-201514656074-A
Country: US
Kind code: A1
Filing date: Mar 12, 2015
Priority date: Mar 12, 2015
Publication date: Sep 15, 2016
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a pipelined application having different stages of processing, such as a graphics application or an image processing application, there may be a dependence of one compute kernel upon another. Data associated with individual kernels needs to be written and read. A technique to minimize the need to read and write kernel data to external memory utilizes at least one of fusing kernels, resizing workgroups, and performing interleaving of kernels.
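
The effect of fusing kernels described in the abstract can be illustrated with a toy model. The sketch below is not taken from the patent: `ExternalMemory`, `run_unfused`, `run_fused`, and `traffic` are hypothetical names, and the two arithmetic stages are placeholders for real kernels. It only counts how many elements cross the external-memory boundary when a producer and a consumer run separately versus as one fused kernel.

```python
class ExternalMemory:
    """Toy model of off-chip memory that counts element reads and writes."""

    def __init__(self):
        self.buffers = {}
        self.reads = 0
        self.writes = 0

    def write(self, name, values):
        self.buffers[name] = list(values)
        self.writes += len(values)

    def read(self, name):
        values = self.buffers[name]
        self.reads += len(values)
        return list(values)


def run_unfused(mem):
    # Producer kernel: reads the input and writes an intermediate buffer.
    src = mem.read("input")
    mem.write("intermediate", [x * 2 for x in src])
    # Consumer kernel: reads the intermediate back and writes the output.
    tmp = mem.read("intermediate")
    mem.write("output", [x + 1 for x in tmp])


def run_fused(mem):
    # Fused kernel: the intermediate value lives only in a local expression,
    # standing in for registers or on-chip memory (an assumption of this model).
    src = mem.read("input")
    mem.write("output", [(x * 2) + 1 for x in src])


def traffic(runner, data):
    """Run one variant and return (output, total external element accesses)."""
    mem = ExternalMemory()
    mem.buffers["input"] = list(data)  # preload the input without counting it
    runner(mem)
    return mem.buffers["output"], mem.reads + mem.writes
```

In this model both variants produce the same output, while the fused variant skips the write and the read of the intermediate buffer, so its external traffic is half that of the unfused pair.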

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing a pipelined process utilizing compute kernels to perform stages of processing in a processing unit having a plurality of processors, an on-chip memory, and access to an external memory, comprising: identifying dependencies between producer kernels and consumer kernels in a command queue; and generating an interleaved set of dispatch calls for at least one set of interdependent kernels in the command queue in which intermediate data results are maintained in the on-chip memory.

2. The method of claim 1, wherein identifying dependencies comprises determining whether a set of kernels has a producer-consumer relationship.

3. The method of claim 1, wherein identifying dependencies comprises analyzing dependencies of input and output addresses of candidate kernels.

4. The method of claim 3, wherein identifying dependencies comprises generating a kernel dependency graph defining a relationship between producer kernels and consumer kernels.

5. The method of claim 1, further comprising redefining work groups associated with at least two kernels to match inputs and outputs of dependent kernels.

6. The method of claim 5, wherein the redefining is further selected to maintain intermediate data results in the on-chip memory.

7. The method of claim 6, further comprising selecting a traversal order to maintain intermediate data results in the on-chip memory.

8. The method of claim 1, wherein an image is divided into adjacent strips and each strip assigned to a different processor and the method further comprises synchronizing data reuse with an adjacent workgroup.

9. The method of claim 1, further comprising identifying kernels that can be fused and generating dispatch calls for fused kernels.

10. The method of claim 1, further comprising scanning code instructions, analyzing access patterns and range information, and tagging kernels as candidates for interleaving or fusing.

11. A method of performing a pipelined process utilizing compute kernels to perform stages of processing in a processing unit having a plurality of processors, an on-chip memory, and access to an external memory, comprising: identifying dependencies between producer kernels and consumer kernels; determining if pairs of kernels can be interleaved or fused; replacing at least two kernels with a fused kernel and dispatching the fused kernel; and generating an interleaved set of dispatch calls for at least one set of interdependent kernels in which intermediate data results are maintained in the on-chip memory.

12. The method of claim 11, wherein identifying dependencies comprises generating a kernel dependency graph defining a relationship between producer kernels and consumer kernels.

13. The method of claim 11, further comprising redefining work groups associated with at least two kernels to match inputs and outputs of dependent kernels.

14. The method of claim 13, wherein the redefining is further selected to maintain intermediate data results in the on-chip memory.

15. The method of claim 14, further comprising selecting a traversal order to maintain intermediate data results in the on-chip memory.

16. The method of claim 11, wherein an image is divided into adjacent strips and each strip assigned to a different processor and the method further comprises synchronizing data reuse with an adjacent workgroup.

17. A system, comprising: a graphics processing unit having a plurality of processors and an on-chip memory; and a driver and a compiler adapted to: identify dependencies between producer kernels and consumer kernels associated with a graphics application; determine if pairs of kernels can be interleaved or fused; replace at least two kernels with a fused kernel and dispatch the fused kernel; and generate an interleaved set of dispatch calls for at least one set of interdependent kernels in which intermediate data results are maintained in the on-chip memory.

18. The system of claim 17, wherein the system is configured to redefine work groups associated with at least two kernels to match inputs and outputs of dependent kernels and maintain intermediate data results in on-chip memory.
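
As a rough illustration of two steps named in the claims (building a kernel dependency graph from input and output addresses, and generating an interleaved set of dispatch calls), here is a minimal sketch. It is not the patented implementation. The `Kernel` descriptor, the buffer names, and both function names are hypothetical, and the interleaving assumes each consumer tile depends only on the matching producer tile, which a real driver would have to verify.

```python
from collections import namedtuple

# Hypothetical kernel descriptor: a name plus the buffers it reads and writes.
Kernel = namedtuple("Kernel", ["name", "inputs", "outputs"])


def build_dependency_graph(queue):
    """Map each kernel to the earlier kernels that produced its inputs."""
    last_writer = {}  # buffer name -> name of the most recent kernel writing it
    graph = {}
    for k in queue:
        producers = {last_writer[b] for b in k.inputs if b in last_writer}
        graph[k.name] = sorted(producers)
        for b in k.outputs:
            last_writer[b] = k.name
    return graph


def interleaved_dispatch(queue, num_tiles):
    """Emit (kernel, tile) dispatch calls tile by tile, not kernel by kernel.

    Assumes tile t of a consumer needs only tile t of its producer, so the
    intermediate tile can be consumed while it is still in on-chip memory.
    """
    return [(k.name, t) for t in range(num_tiles) for k in queue]
```

For a queue of a producer `A` writing `tmp` and a consumer `B` reading `tmp`, `build_dependency_graph` reports that `B` depends on `A`, and `interleaved_dispatch` alternates `A` and `B` per tile instead of running all of `A` before any of `B`.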

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors · CPC title

  • Color image · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • using local operators · CPC title

  • Electricity · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016267622A1 cover?
In a pipelined application having different stages of processing, such as a graphics application or an image processing application, there may be a dependence of one compute kernel upon another. Data associated with individual kernels needs to be written and read. A technique to minimize the need to read and write kernel data to external memory utilizes at least one of fusing kernels, resizing wor…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 15, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).