Method and apparatus for a highly efficient graphics processing unit (GPU) execution model
US-2016093012-A1 · Mar 31, 2016 · US
US10068306B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10068306-B2 |
| Application number | US-201414574606-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 18, 2014 |
| Priority date | Dec 18, 2014 |
| Publication date | Sep 4, 2018 |
| Grant date | Sep 4, 2018 |
Official abstract text for this publication.
A mechanism is described for facilitating dynamic pipelining of workload executions at graphics processing units on computing devices. A method of embodiments, as described herein, includes generating a command buffer having a plurality of kernels relating to a plurality of workloads to be executed at a graphics processing unit (GPU), and pipelining the workloads to be processed at the GPU, where pipelining includes scheduling each kernel to be executed on the GPU based on at least one of availability of resource threads and status of one or more dependency events relating to each kernel in relation to other kernels of the plurality of kernels.
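The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Kernel` class, the `pipeline` function, and the lock-step "one batch per step" model are all hypothetical simplifications, and kernel names are assumed to be unique. The sketch only shows how kernels without pending dependencies can occupy threads that would otherwise sit idle while other kernels wait.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    # Hypothetical stand-in for a kernel entry in a command buffer.
    name: str
    depends_on: set = field(default_factory=set)  # kernels that must finish first


def pipeline(command_buffer, num_threads):
    """Return scheduling steps; each step lists the kernels dispatched together.

    Assumption: every kernel takes exactly one step, and a kernel is ready
    once all kernels it depends on have finished in an earlier step.
    """
    finished = set()
    pending = list(command_buffer)
    steps = []
    while pending:
        # Dependency status check: which kernels have nothing left to wait on?
        ready = [k for k in pending if k.depends_on <= finished]
        if not ready:
            raise ValueError("unresolvable dependency among remaining kernels")
        # Thread availability check: dispatch at most num_threads kernels.
        batch = ready[:num_threads]
        steps.append([k.name for k in batch])
        finished.update(k.name for k in batch)
        pending = [k for k in pending if k.name not in finished]
    return steps


# B waits on A and D waits on C, so A and C can share the first step.
buffer = [Kernel("A"), Kernel("B", {"A"}), Kernel("C"), Kernel("D", {"C"})]
print(pipeline(buffer, num_threads=2))  # [['A', 'C'], ['B', 'D']]
```

With two threads, the independent kernel `C` runs alongside `A` rather than waiting behind `B`, which is the effect the abstract attributes to scheduling on thread availability and dependency status.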
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a processing device coupled to memory, wherein the processing device facilitates: coalescing kernels logic to generate a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; dependency checking logic to check, in runtime, availability of resource threads of the graphics processor that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the command buffer; and state management logic to acquire, in runtime, one or more idle resource threads to facilitate pipelining of other workloads in parallel execution based on the other kernels that do not have one or more pending dependency contingencies, wherein the other workloads are processed by the graphics processor using the one or more idle resource threads without having to stall the graphics processor for processing of the one or more workloads based on the one or more kernels and without seeking resolution of the one or more pending dependency contingencies associated with the one or more kernels, and wherein the state management logic is further to set up interface descriptors for the plurality of kernels prior to processing the respective kernel and wherein the dependency checking logic checks availability, in part, by loading a descriptor by returning the data last written by the respective kernel, and wherein the state management logic pushes a write cycle in a pipelined workload based on a subsequent read as determined using a respective interface descriptor.
2. The apparatus of claim 1, wherein the processing device facilitates the dependency checking logic to determine status of one or more dependency events to determine, in runtime, whether a dependency event remains unresolved such that a base kernel is to be executed, at least partially, to resolve the one or more pending dependency contingencies to allow for initiation of execution of the one or more subsequent kernels of the plurality of kernels.
3. The apparatus of claim 2, wherein the processing device facilitates data coherency management logic to detect the one or more idle resource threads of the resource threads, wherein the one or more idle resource threads remain unused due to the one or more dependency events that remain unresolved.
4. The apparatus of claim 2, wherein the one or more dependency events include at least one of incompletion of processing of a base command associated with the base kernel, and unavailability of the one or more idle resource threads, wherein one or more subsequent kernels are associated with one or more subsequent commands depending from the base command.
5. The apparatus of claim 1, wherein the resource threads comprise one or more of graphics processor hardware threads, command buffers, executable code, and memory heaps.
6. The apparatus of claim 5, wherein a command buffer comprises a plurality of commands associated with the plurality of workloads, wherein the plurality of commands include one or more processing commands relating to the plurality of workloads and further include data having status data relating to the plurality of workloads, wherein the processing commands and the data are dispatched to the graphics processor using a pipeline.
7. The apparatus of claim 1, wherein the coalescing kernels logic, upon receiving an indication of one or more pending dependency contingencies associated with one or more kernels of one or more workloads, generates an additional command buffer having a first group of the plurality of kernels and the command buffer having a second group of the plurality of kernels with these command buffers to be submitted to pipelines for processing by the graphics processor while overcoming the one or more dependency contingencies.
8. A method comprising: generating a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; setting up interface descriptors for the plurality of kernels prior to processing the respective kernels; checking, in runtime, availability of resource threads that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the command buffer, wherein the availability is checked, in part, by loading a descriptor by returning the data last written by the respective kernel; and acquiring, in runtime, one or more idle resource threads based on the availability of resource threads to facilitate pipelining of other workloads in parallel execution based on the other kernels that do not have one or more pending dependency contingencies, wherein the other workloads are processed by the graphics processor using the one or more idle resource threads without having to stall the graphics processor for processing of the one or more workloads based on the one or more kernels and without seeking resolution of the one or more pending dependency contingencies associated with the one or more kernels, and wherein a write cycle is pushed in a pipelined workload based on a subsequent read as determined using a respective interface descriptor.
9. The method of claim 8, further comprising determining status of one or more dependency events to determine, in runtime, whether a dependency event remains unresolved such that a base kernel is to be executed, at least partially, to resolve the one or more pending dependency contingencies to allow for initiation of execution of the one or more subsequent kernels of the plurality of kernels.
10. The method of claim 9, further comprising detecting the one or more idle resource threads of the resource threads, wherein the one or more idle resource threads that remain unused due to the one or more dependency events that remain unresolved.
11. The method of claim 9, wherein the one or more dependency events include at least one of incompletion of processing of a base command associated with the base kernel, and unavailability of the one or more idle resource threads, wherein one or more subsequent kernels are associated with one or more subsequent commands depending from the base command.
12. The method of claim 8, wherein the resource threads comprise one or more of graphics processor hardware threads, command buffers, executable code, and memory heaps.
13. The method of claim 12, wherein a command buffer comprises a plurality of commands associated with the plurality of workloads, wherein the plurality of commands include one or more processing commands relating to the plurality of workloads and further include data having status data relating to the plurality of workloads, wherein the processing commands and the data are dispatched to the graphics processor using a pipeline to the graphics processor.
14. At least one non-transitory machine-readable medium comprising a plurality of instructions, executed on a computing device, to facilitate the computing device to perform one or more operations comprising: generating a command buffer having a plurality of kernels relating to a plurality of workloads to be processed by a graphics processor; setting up interface descriptors for the plurality of kernels prior to processing the respective kernels; checking, in runtime, availability of resource threads that remain idle due to one or more pending dependency contingencies associated with one or more kernels of one or more workloads in relation to other kernels of the plurality of kernels of the
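Claim 7 describes splitting the kernels into two command buffers once a pending dependency is indicated. The claim does not say how the groups are chosen, so the following is only a sketch under stated assumptions: the function name `split_command_buffer` is hypothetical, the first group is assumed to hold kernels that are flagged as pending or that depend (directly or transitively) on a flagged kernel, and the input order is assumed to list each kernel after everything it depends on.

```python
def split_command_buffer(order, depends_on, pending):
    """Split kernel names into (blocked_group, free_group).

    order:      kernel names, each listed after the kernels it depends on
                (an assumption of this sketch)
    depends_on: dict mapping a kernel name to the set of names it waits on
    pending:    names flagged as having an unresolved dependency
    """
    blocked = set(pending)
    for name in order:
        # A kernel that waits on a blocked kernel is itself blocked.
        if depends_on.get(name, set()) & blocked:
            blocked.add(name)
    blocked_group = [n for n in order if n in blocked]
    free_group = [n for n in order if n not in blocked]
    return blocked_group, free_group


order = ["A", "B", "C", "D"]
depends_on = {"B": {"A"}, "D": {"C"}}
first, second = split_command_buffer(order, depends_on, pending={"A"})
print(first)   # ['A', 'B']
print(second)  # ['C', 'D']
```

In this reading, the second group can be submitted to a pipeline right away while the first group waits for its dependency to resolve.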
Processor architectures; Processor configuration, e.g. pipelining · CPC title
involving image processing hardware · CPC title
Subject matter not provided for in other groups of this subclass · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title