Facilitating dynamic pipelining of workload executions on graphics processing units on computing devices

US10068306B2 · US · B2

Patent metadata
Publication number: US-10068306-B2
Application number: US-201414574606-A
Country: US
Kind code: B2
Filing date: Dec 18, 2014
Priority date: Dec 18, 2014
Publication date: Sep 4, 2018
Grant date: Sep 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is described for facilitating dynamic pipelining of workload executions at graphics processing units on computing devices. A method of embodiments, as described herein, includes generating a command buffer having a plurality of kernels relating to a plurality of workloads to be executed at a graphics processing unit (GPU), and pipelining the workloads to be processed at the GPU, where pipelining includes scheduling each kernel to be executed on the GPU based on at least one of availability of resource threads and status of one or more dependency events relating to each kernel in relation to other kernels of the plurality of kernels.
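The abstract's scheduling rule can be illustrated with a toy discrete-time simulation. Under that rule, each kernel is dispatched once its dependency events are resolved and enough resource threads are free. This is a minimal Python sketch under our own assumptions, not Intel's implementation. `Kernel`, `schedule`, the integer thread counts, and the unit time steps are all hypothetical. The `in_order=True` mode models a stalling queue, in which a kernel blocked on a dependency also holds back every kernel queued behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical kernel record; all fields are illustrative assumptions."""
    name: str
    threads: int    # resource threads the kernel occupies while running
    duration: int   # run time in abstract time steps (assumed >= 1)
    deps: list[str] = field(default_factory=list)  # kernels that must finish first


def schedule(kernels, total_threads, in_order=False):
    """Simulate dispatch; return (makespan, {kernel name: start time}).

    in_order=False: any kernel whose dependencies are resolved and whose
    thread demand fits in the idle threads may start (dynamic pipelining).
    in_order=True: a blocked kernel stalls everything queued behind it.
    """
    waiting = list(kernels)  # command-buffer order
    running = []             # (end_time, kernel) pairs
    done = set()
    free = total_threads
    start = {}
    t = 0
    while waiting or running:
        # Retire kernels ending now; this resolves their dependency events.
        for end, k in [r for r in running if r[0] == t]:
            running.remove((end, k))
            done.add(k.name)
            free += k.threads
        # Dispatch every kernel that is ready and fits in the idle threads.
        for k in list(waiting):
            if all(d in done for d in k.deps) and k.threads <= free:
                waiting.remove(k)
                start[k.name] = t
                free -= k.threads
                running.append((t + k.duration, k))
            elif in_order:
                break  # head-of-line kernel blocks the rest of the buffer
        # Nothing running and nothing dispatchable means no progress is possible.
        if waiting and not running:
            raise ValueError("unsatisfiable dependencies or thread demand")
        if waiting or running:
            t += 1
    return t, start


kernels = [
    Kernel("A", threads=4, duration=3),
    Kernel("B", threads=4, duration=2, deps=["A"]),  # waits on A
    Kernel("C", threads=4, duration=4),              # independent of A and B
]
print(schedule(kernels, total_threads=8))                 # (5, {'A': 0, 'C': 0, 'B': 3})
print(schedule(kernels, total_threads=8, in_order=True))  # (7, {'A': 0, 'B': 3, 'C': 3})
```

With 8 threads, the independent kernel C runs on threads that would otherwise sit idle while B waits for A. The dynamic schedule therefore finishes at t = 5 rather than t = 7. The toy model ignores preemption, memory, and other hardware details.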

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a processing device coupled to memory, wherein the processing device facilitates: coalescing kernels logic to generate a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; dependency checking logic to check, in runtime, availability of resource threads of the graphics processor that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the command buffer; and state management logic to acquire, in runtime, one or more idle resource threads to facilitate pipelining of other workloads in parallel execution based on the other kernels that do not have one or more pending dependency contingencies, wherein the other workloads are processed by the graphics processor using the one or more idle resource threads without having to stall the graphics processor for processing of the one or more workloads based on the one or more kernels and without seeking resolution of the one or more pending dependency contingencies associated with the one or more kernels, and wherein the state management logic is further to set up interface descriptors for the plurality of kernels prior to processing the respective kernel and wherein the dependency checking logic checks availability, in part, by loading a descriptor by returning the data last written by the respective kernel, and wherein the state management logic pushes a write cycle in a pipelined workload based on a subsequent read as determined using a respective interface descriptor.

2. The apparatus of claim 1, wherein the processing device facilitates the dependency checking logic to determine status of one or more dependency events to determine, in runtime, whether a dependency event remains unresolved such that a base kernel is to be executed, at least partially, to resolve the one or more pending dependency contingencies to allow for initiation of execution of the one or more subsequent kernels of the plurality of kernels.

3. The apparatus of claim 2, wherein the processing device facilitates data coherency management logic to detect the one or more idle resource threads of the resource threads, wherein the one or more idle resource threads remain unused due to the one or more dependency events that remain unresolved.

4. The apparatus of claim 2, wherein the one or more dependency events include at least one of incompletion of processing of a base command associated with the base kernel, and unavailability of the one or more idle resource threads, wherein one or more subsequent kernels are associated with one or more subsequent commands depending from the base command.

5. The apparatus of claim 1, wherein the resource threads comprise one or more of graphics processor hardware threads, command buffers, executable code, and memory heaps.

6. The apparatus of claim 5, wherein a command buffer comprises a plurality of commands associated with the plurality of workloads, wherein the plurality of commands include one or more processing commands relating to the plurality of workloads and further include data having status data relating to the plurality of workloads, wherein the processing commands and the data are dispatched to the graphics processor using a pipeline.

7. The apparatus of claim 1, wherein the coalescing kernels logic, upon receiving an indication of one or more pending dependency contingencies associated with one or more kernels of one or more workloads, generates an additional command buffer having a first group of the plurality of kernels and the command buffer having a second group of the plurality of kernels with these command buffers to be submitted to pipelines for processing by the graphics processor while overcoming the one or more dependency contingencies.

8. A method comprising: generating a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; setting up interface descriptors for the plurality of kernels prior to processing the respective kernels; checking, in runtime, availability of resource threads that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the command buffer, wherein the availability is checked, in part, by loading a descriptor by returning the data last written by the respective kernel; and acquiring, in runtime, one or more idle resource threads based on the availability of resource threads to facilitate pipelining of other workloads in parallel execution based on the other kernels that do not have one or more pending dependency contingencies, wherein the other workloads are processed by the graphics processor using the one or more idle resource threads without having to stall the graphics processor for processing of the one or more workloads based on the one or more kernels and without seeking resolution of the one or more pending dependency contingencies associated with the one or more kernels, and wherein a write cycle is pushed in a pipelined workload based on a subsequent read as determined using a respective interface descriptor.

9. The method of claim 8, further comprising determining status of one or more dependency events to determine, in runtime, whether a dependency event remains unresolved such that a base kernel is to be executed, at least partially, to resolve the one or more pending dependency contingencies to allow for initiation of execution of the one or more subsequent kernels of the plurality of kernels.

10. The method of claim 9, further comprising detecting the one or more idle resource threads of the resource threads, wherein the one or more idle resource threads that remain unused due to the one or more dependency events that remain unresolved.

11. The method of claim 9, wherein the one or more dependency events include at least one of incompletion of processing of a base command associated with the base kernel, and unavailability of the one or more idle resource threads, wherein one or more subsequent kernels are associated with one or more subsequent commands depending from the base command.

12. The method of claim 8, wherein the resource threads comprise one or more of graphics processor hardware threads, command buffers, executable code, and memory heaps.

13. The method of claim 12, wherein a command buffer comprises a plurality of commands associated with the plurality of workloads, wherein the plurality of commands include one or more processing commands relating to the plurality of workloads and further include data having status data relating to the plurality of workloads, wherein the processing commands and the data are dispatched to the graphics processor using a pipeline to the graphics processor.

14. At least one non-transitory machine-readable medium comprising a plurality of instructions, executed on a computing device, to facilitate the computing device to perform one or more operations comprising: generating a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; setting up interface descriptors for the plurality of kernels prior to processing the respective kernels; checking, in runtime, availability of resource threads that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the
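Claim 7 describes generating an additional command buffer when pending dependency contingencies are indicated. Claim 4 names two kinds of dependency events: an unfinished base command and unavailable resource threads. The claims shown here do not say how kernels are assigned to the two groups. The Python sketch below therefore assumes one simple rule: kernels with no unresolved events go into a ready buffer, and the rest go into a deferred buffer. `Kernel`, `pending_contingencies`, and `split_command_buffer` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical kernel record; fields are illustrative assumptions."""
    name: str
    threads: int  # resource threads the kernel needs
    deps: list[str] = field(default_factory=list)  # base kernels it depends on


def pending_contingencies(kernel, completed, idle_threads):
    """Return the unresolved dependency events for `kernel`.

    Models the two event kinds listed in claim 4: a base command that has
    not finished, and resource threads that are not available.
    """
    events = [f"base {d} incomplete" for d in kernel.deps if d not in completed]
    if kernel.threads > idle_threads:
        events.append("threads unavailable")
    return events


def split_command_buffer(buffer, completed, idle_threads):
    """Split one command buffer into (ready, deferred) buffers, keeping order.

    Each ready kernel claims its threads as it is placed, so a later kernel
    may be deferred simply because earlier ready kernels took the idle threads.
    """
    ready, deferred = [], []
    for k in buffer:
        if pending_contingencies(k, completed, idle_threads):
            deferred.append(k)
        else:
            ready.append(k)
            idle_threads -= k.threads
    return ready, deferred


buffer = [
    Kernel("A", 4),
    Kernel("B", 2, deps=["A"]),  # A has not completed yet
    Kernel("C", 2),
    Kernel("D", 4),              # no threads left once A and C are placed
]
ready, deferred = split_command_buffer(buffer, completed=set(), idle_threads=6)
print([k.name for k in ready], [k.name for k in deferred])  # ['A', 'C'] ['B', 'D']
```

The ready buffer can be submitted immediately. The deferred buffer is resubmitted once its base kernels complete or threads free up. In this sketch that amounts to calling `split_command_buffer` again with an updated `completed` set and thread count.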

Assignees

Intel Corp

Classifications

  • G06T1/20 (primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • involving image processing hardware · CPC title

  • Subject matter not provided for in other groups of this subclass · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10068306B2 cover?
A mechanism is described for facilitating dynamic pipelining of workload executions at graphics processing units on computing devices. A method of embodiments, as described herein, includes generating a command buffer having a plurality of kernels relating to a plurality of workloads to be executed at a graphics processing unit (GPU), and pipelining the workloads to be processed at the GPU, whe…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Published Sep 4, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).