What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 15, 2016 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Automated compute kernel fusion, resizing, and interleave

US2016267622A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016267622-A1
Application number	US-201514656074-A
Country	US
Kind code	A1
Filing date	Mar 12, 2015
Priority date	Mar 12, 2015
Publication date	Sep 15, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a pipelined application having different stages of processing, such as a graphics application or an image processing application, there may be a dependence of one compute kernel upon another. Data associated with individual kernels needs to be written and read. A technique to minimize a need to read and write kernel data to external memory utilize at least one of fusing kernels, resizing workgroups, and performing interleaving of kernels.
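The fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the stage functions `scale` and `offset`, the helpers `run_unfused`, `fuse`, and `run_fused`, and the use of a Python list as a stand-in for external memory are all assumptions made for illustration. The sketch contrasts two dependent stages that pass a full intermediate buffer with a fused stage that keeps each intermediate value local.

```python
from typing import Callable, List, Tuple

Stage = Callable[[int], int]


def scale(x: int) -> int:
    """Hypothetical producer stage."""
    return x * 2


def offset(x: int) -> int:
    """Hypothetical consumer stage that depends on the producer's output."""
    return x + 1


def run_unfused(data: List[int], producer: Stage, consumer: Stage) -> Tuple[List[int], int]:
    """Run both stages separately, materializing the whole intermediate buffer.

    The intermediate list stands in for data written to and read back from
    external memory. Returns the result and the number of intermediate
    elements stored.
    """
    intermediate = [producer(x) for x in data]
    result = [consumer(y) for y in intermediate]
    return result, len(intermediate)


def fuse(producer: Stage, consumer: Stage) -> Stage:
    """Compose two dependent stages into a single stage."""
    return lambda x: consumer(producer(x))


def run_fused(data: List[int], producer: Stage, consumer: Stage) -> Tuple[List[int], int]:
    """Run the fused stage; no intermediate buffer is stored."""
    fused = fuse(producer, consumer)
    return [fused(x) for x in data], 0
```

Calling `run_unfused([1, 2, 3], scale, offset)` and `run_fused([1, 2, 3], scale, offset)` yields the same result list, but only the unfused version stores intermediate elements. On real hardware the saving would come from keeping those values in registers or on-chip memory, which this sketch only models by counting.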

First claim

Opening claim text (preview).

What is claimed is:
1. A method of performing a pipelined process utilizing compute kernels to perform stages of processing in a processing unit having a plurality of processors, an on-chip memory, and access to an external memory, comprising: identifying dependencies between producer kernels and consumer kernels in a command queue; and generating an interleaved set of dispatch calls for at least one set of interdependent kernels in the command queue in which intermediate data results are maintained in the on-chip memory.
2. The method of claim 1, wherein identifying dependencies comprises determining whether a set of kernels has a producer-consumer relationship.
3. The method of claim 1, wherein identifying dependencies comprises analyzing dependencies of input and output addresses of candidate kernels.
4. The method of claim 3, wherein identifying dependencies comprises generating a kernel dependency graph defining a relationship between producer kernels and consumer kernels.
5. The method of claim 1, further comprising redefining work groups associated with at least two kernels to match inputs and outputs of dependent kernels.
6. The method of claim 5, wherein the redefining is further selected to maintain intermediate data results in the on-chip memory.
7. The method of claim 6, further comprising selecting a traversal order to maintain intermediate data results in the on-chip memory.
8. The method of claim 1, wherein an image is divided into adjacent strips and each strip assigned to a different processor and the method further comprises synchronizing data reuse with an adjacent workgroup.
9. The method of claim 1, further comprising identifying kernels that can be fused and generating dispatch calls for fused kernels.
10. The method of claim 1, further comprising scanning code instructions, analyzing access patterns and range information, and tagging kernels as candidates for interleaving or fusing.
11. A method of performing a pipelined process utilizing compute kernels to perform stages of processing in a processing unit having a plurality of processors, an on-chip memory, and access to an external memory, comprising: identifying dependencies between producer kernels and consumer kernels; determining if pairs of kernels can be interleaved or fused; replacing at least two kernels with a fused kernel and dispatching the fused kernel; and generating an interleaved set of dispatch calls for at least one set of interdependent kernels in which intermediate data results are maintained in the on-chip memory.
12. The method of claim 11, wherein identifying dependencies comprises generating a kernel dependency graph defining a relationship between producer kernels and consumer kernels.
13. The method of claim 11, further comprising redefining work groups associated with at least two kernels to match inputs and outputs of dependent kernels.
14. The method of claim 13, wherein the redefining is further selected to maintain intermediate data results in the on-chip memory.
15. The method of claim 14, further comprising selecting a traversal order to maintain intermediate data results in the on-chip memory.
16. The method of claim 11, wherein an image is divided into adjacent strips and each strip assigned to a different processor and the method further comprises synchronizing data reuse with an adjacent workgroup.
17. A system, comprising: a graphics processing unit having a plurality of processors and an on-chip memory; and a driver and a compiler adapted to: identify dependencies between producer kernels and consumer kernels associated with a graphics application; determine if pairs of kernels can be interleaved or fused; replace at least two kernels with a fused kernel and dispatch the fused kernel; and generate an interleaved set of dispatch calls for at least one set of interdependent kernels in which intermediate data results are maintained in the on-chip memory.
18. The system of claim 17, wherein the system is configured to redefine work groups associated with at least two kernels to match inputs and outputs of dependent kernels and maintain intermediate data results in on-chip memory.
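The dependency analysis and interleaved dispatch described in claims 1, 3, and 4 can be sketched roughly as follows. This is an illustrative Python model only, not the claimed method: the `Kernel` record, the buffer names, and the functions `build_dependency_graph` and `interleave_dispatches` are hypothetical, and the sketch assumes that a dependency exists whenever an earlier kernel's output buffer name matches a later kernel's input buffer name. It ignores overwritten buffers and any overlap (halo) between neighboring tiles.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Kernel:
    """Hypothetical command-queue entry: a name plus the buffers it reads and writes."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def build_dependency_graph(queue: List[Kernel]) -> Dict[str, List[str]]:
    """Map each kernel name to the later kernels that read one of its outputs."""
    graph: Dict[str, List[str]] = {k.name: [] for k in queue}
    for i, producer in enumerate(queue):
        for consumer in queue[i + 1:]:
            # Shared buffer between an earlier output and a later input
            # is treated as a producer-consumer relationship.
            if producer.outputs & consumer.inputs:
                graph[producer.name].append(consumer.name)
    return graph


def interleave_dispatches(chain: List[str], num_tiles: int) -> List[Tuple[str, int]]:
    """Emit (kernel, tile) dispatches tile by tile.

    Every kernel in the chain runs on a tile before the next tile starts,
    so the tile's intermediate results could stay in on-chip memory.
    """
    return [(name, tile) for tile in range(num_tiles) for name in chain]
```

For a queue with a `blur` kernel writing buffer `tmp` and a `sharpen` kernel reading `tmp`, the graph records `sharpen` as a consumer of `blur`, and `interleave_dispatches(["blur", "sharpen"], 2)` orders the calls as blur on tile 0, sharpen on tile 0, blur on tile 1, sharpen on tile 1, rather than running blur over the whole image first.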

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

H04N23/741
by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors · CPC title
G06T2207/10024
Color image · CPC title
G06T1/20 · Primary
Processor architectures; Processor configuration, e.g. pipelining · CPC title
G06T5/20
using local operators · CPC title
H04N5/232
Electricity · mapped topic

Patent family

Related publications grouped by family.

View patent family 56888114

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016267622A1 cover?: In a pipelined application having different stages of processing, such as a graphics application or an image processing application, there may be a dependence of one compute kernel upon another. Data associated with individual kernels needs to be written and read. A technique to minimize a need to read and write kernel data to external memory utilize at least one of fusing kernels, resizing wor…
Who is the assignee on this patent?: Samsung Electronics Co Ltd