Techniques for managing the execution order of multiple nested tasks executing on a parallel processor

US9436504B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9436504-B2
Application numberUS-201213467574-A
CountryUS
Kind codeB2
Filing dateMay 9, 2012
Priority dateMay 9, 2012
Publication dateSep 6, 2016
Grant dateSep 6, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present disclosure sets forth an enhanced way for GPUs to queue new computational tasks into a task metadata descriptor queue (TMDQ). Specifically, memory for context data is pre-allocated when a new TMDQ is created. A new TMDQ may be integrated with an existing TMDQ, where computational tasks within that TMDQ include task from each of the original TMDQs. A scheduling operation is executed on completion of each computational task in order to preserve sequential execution of tasks without the use of atomic locking operations. One advantage of the disclosed technique is that GPUs are enabled to queue computational tasks within TMDQs, and also create an arbitrary number of new TMDQs to any arbitrary nesting level, without intervention by the CPU. Processing efficiency is enhanced where the GPU does not wait while the CPU creates and queues tasks.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for processing a plurality of tasks being executed by a first group of threads and stored within a plurality of task metadata descriptor queues (TMDQs), the method comprising: receiving a notification that a first task included in the plurality of tasks has completed; determining within a co-processing unit whether all tasks included in a subset of the plurality of tasks and associated with a first TMDQ have executed; if all tasks included in the subset of the plurality of tasks have not executed, then launching a second task included in the plurality of tasks; and if all tasks included in the subset of the plurality of tasks have executed, then: updating a pointer in a first data structure associated with the first TMDQ; determining that a third task included in the plurality of tasks is about to be queued in the first TMDQ; and executing the third task. 2. The method of claim 1 , further comprising decrementing a counter related to a quantity of uncompleted tasks included in the plurality of tasks after receiving the notification that the first task has completed. 3. The method of claim 2 , further comprising determining that all tasks being executed by the first group of threads have completed based on the counter having a value of zero. 4. The method of claim 1 , wherein executing the third task further comprises waiting for a memory location within a second data structure associated with the third task to be updated to reflect where the third task is located. 5. The method of claim 1 , wherein the first TMDQ has been created at the request of a task generated prior to the first task and associated with a second group of threads. 6. The method of claim 1 , wherein the first task was inserted into the first TMDQ by a task generated prior to the first task and associated with a second group of threads. 7. The method of claim 1 , wherein the first TMDQ includes a first set of tasks corresponding to a first thread included in the first group of threads, and a second TMDQ includes a second set of tasks corresponding to a second thread included in the first group of threads. 8. The method of claim 1 , wherein the first TMDQ includes a first set of tasks corresponding to a first thread included in the first group of threads and a second set of tasks corresponding to a second thread included in the first group of threads. 9. The method of claim 1 , wherein determining whether all tasks included in the subset of the plurality of tasks have executed comprises determining whether a pointer to a last task stored in the first TMDQ points to the first task. 10. The method of claim 9 , wherein determining whether the pointer to the last task stored in the first TMDQ points to the last task comprises performing an atomic operation. 11. The computer-implemented method of claim 1 , wherein determining that a third task included in the plurality of tasks is about to be queued in the first TMDQ comprises: determining that a first pointer does not point to a completed task; and determining that a second pointer is null. 12. A subsystem for processing a plurality of tasks being executed by a first group of threads and stored within a plurality of task metadata descriptor queues (TMDQs), comprising: a task management unit configured to perform the steps of: receiving a notification that a first task included in the plurality of tasks has completed; determining within a co-processing unit whether all tasks included in a subset of the plurality of tasks and associated with a first TMDQ have executed; if all tasks included in the subset of the plurality of tasks have not executed, then launching a second task included in the plurality of tasks; and if all tasks included in the subset of the plurality of tasks have executed, then: updating a pointer in a first data structure associated with the first TMDQ; determining that a third task included in the plurality of tasks is about to be queued in the first TMDQ; and executing the third task. 13. The subsystem of claim 12 , further comprising decrementing a counter related to a quantity of uncompleted tasks included in the plurality of tasks after receiving the notification that the first task has completed. 14. The subsystem of claim 13 , further comprising determining that all tasks being executed by the first group of threads have completed based on the counter having a value of zero. 15. The subsystem of claim 12 , wherein executing the third task further comprises waiting for a memory location within a second data structure associated with the third task to be updated to reflect where the third task is located. 16. The subsystem of claim 12 , wherein the first TMDQ has been created at the request of a task generated prior to the first task and associated with a second group of threads. 17. The subsystem of claim 12 , wherein the first task was inserted into the first TMDQ by a task generated prior to the first task and associated with a second group of threads. 18. The subsystem of claim 12 , wherein the first TMDQ includes a first set of tasks corresponding to a first thread included in the first group of threads, and a second TMDQ includes a second set of tasks corresponding to a second thread included in the first group of threads. 19. The subsystem of claim 12 , wherein the first TMDQ includes a first set of tasks corresponding to a first thread included in the first group of threads and a second set of tasks corresponding to a second thread included in the first group of threads. 20. The subsystem of claim 12 , wherein determining whether all tasks included in the subset of the plurality of tasks have executed comprises determining whether a pointer to a last task stored in the first TMDQ points to the first task. 21. A computing device, comprising: a task management unit configured to process a plurality of tasks being executed by a first group of threads and stored within a plurality of task metadata descriptor queues (TMDQs) by performing the steps of: receiving a notification that a first task included in the plurality of tasks has completed; determining within a co-processing unit whether all tasks included in a subset of the plurality of tasks and associated with a first TMDQ have executed; if all tasks included in the subset of the plurality of tasks have not executed, then launching a second task included in the plurality of tasks; and if all tasks included in the subset of the plurality of tasks have executed, then: updating a pointer in a first data structure associated with the first TMDQ; determining that a third task included in the plurality of tasks is about to be queued in the first TMDQ; and executing the third task.

Assignees

Inventors

Classifications

  • Multiproc · CPC title

  • G06F9/4881Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9436504B2 cover?
One embodiment of the present disclosure sets forth an enhanced way for GPUs to queue new computational tasks into a task metadata descriptor queue (TMDQ). Specifically, memory for context data is pre-allocated when a new TMDQ is created. A new TMDQ may be integrated with an existing TMDQ, where computational tasks within that TMDQ include task from each of the original TMDQs. A scheduling oper…
Who is the assignee on this patent?
Durant Luke, Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 06 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).