Method and system for processing nested stream events

US9928109B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9928109-B2
Application number: US-201213467804-A
Country: US
Kind code: B2
Filing date: May 9, 2012
Priority date: May 9, 2012
Publication date: Mar 27, 2018
Grant date: Mar 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present disclosure sets forth a technique for enforcing cross stream dependencies in a parallel processing subsystem such as a graphics processing unit. The technique involves queuing waiting events to create cross stream dependencies and signaling events to indicate completion to the waiting events. A scheduler kernel examines a task status data structure from a corresponding stream and updates dependency counts for tasks and events within the stream. When each task dependency for a waiting event is satisfied, an associated task may execute.
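
The mechanism summarized in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the patented implementation: the class names (`Task`, `WaitEvent`, `SignalEvent`), the function `run_streams`, and the round-robin polling loop are all hypothetical, and ordinary Python queues stand in for GPU streams and the scheduler kernel.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str


@dataclass
class WaitEvent:
    name: str
    dependency_count: int  # signals still outstanding (assumed representation)


@dataclass
class SignalEvent:
    name: str
    waiters: list  # WaitEvent objects, possibly queued in other streams


def run_streams(streams):
    """Poll each stream in turn and return task names in execution order."""
    executed = []
    progress = True
    while progress:
        progress = False
        for queue in streams:
            if not queue:
                continue
            head = queue[0]
            if isinstance(head, WaitEvent):
                if head.dependency_count > 0:
                    continue  # stream is blocked behind the wait event
            elif isinstance(head, SignalEvent):
                # Signaling decrements the count of every dependent wait event.
                for waiter in head.waiters:
                    waiter.dependency_count -= 1
            else:
                executed.append(head.name)  # stand-in for launching the task
            queue.popleft()  # the processed item leaves its queue
            progress = True
    return executed


# Stream 1 must not run task B until stream 0 has finished task A.
wait_a = WaitEvent("wait_for_A", dependency_count=1)
stream0 = deque([Task("A"), SignalEvent("A_done", waiters=[wait_a])])
stream1 = deque([wait_a, Task("B")])
order = run_streams([stream1, stream0])
print(order)
```

Stream 1 is polled first on purpose, so the wait event visibly holds task B back until the signaling event in stream 0 has brought the dependency count to zero.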

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for processing a plurality of tasks across a group of threads, the method comprising: retrieving a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determining that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and removing the first item from the first queue, wherein the first thread and the second thread execute within a graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to a central processing unit (CPU) and receives processing tasks from the CPU.

2. The method of claim 1, wherein the dependency count represents a number of different other tasks or events that the wait event in the second queue is waiting for to complete before the wait event can complete.

3. The method of claim 2, further comprising recursively traversing a plurality of pointers that point to a plurality of nodes, wherein each pointer points to a different node, and each node is associated with one of the different other tasks or events.

4. The method of claim 1, wherein the wait event in the second queue is further dependent on a given task.

5. The method of claim 4, wherein the wait event in the second queue functions to block the execution of any additional task until the given task is completed.

6. The method of claim 5, wherein additional tasks reside in the second queue behind the wait event.

7. The method of claim 4, further comprising determining that the dependency count is equal to zero.

8. The method of claim 7, further comprising retrieving a second item from the first queue.

9. The method of claim 8, further comprising determining that the second item in the first queue comprises a task, and causing the task to be executed.

10. The method of claim 1, wherein: the second queue stores processing tasks having cross dependencies with tasks stored in the first queue; and the graphics processing subsystem manages cross dependencies between the first queue and the second queue.

11. The method of claim 10, wherein: the graphics processing subsystem manages cross dependencies between the first queue and the second queue without intervention from the CPU.

12. The method of claim 1, wherein the graphics processing subsystem processes, via non-locking operations, the first queue and the second queue without intervention from the CPU.

13. The method of claim 1, wherein the graphics processing subsystem manages, via non-locking operations, cross dependencies between tasks of the first queue and the second queue without intervention from the CPU.

14. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to process a plurality of tasks across a group of threads, by performing the steps of: retrieving a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determining that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and removing the first item from the first queue, wherein the first thread and the second thread execute within a graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to a central processing unit (CPU) and receives processing tasks from the CPU.

15. The non-transitory computer-readable storage medium of claim 14, wherein the dependency count represents a number of different other tasks or events that the wait event in the second queue is waiting for to complete before the wait event can complete.

16. The non-transitory computer-readable storage medium of claim 15, further comprising recursively traversing a plurality of pointers that point to a plurality of nodes, wherein each pointer points to a different node, and each node is associated with one of the different other tasks or events.

17. The non-transitory computer-readable storage medium of claim 14, wherein the wait event in the second queue is further dependent on a given task.

18. The non-transitory computer-readable storage medium of claim 17, wherein the wait event in the second queue functions to block the execution of any additional task until the given task is completed.

19. The non-transitory computer-readable storage medium of claim 18, wherein additional tasks reside in the second queue behind the wait event.

20. The non-transitory computer-readable storage medium of claim 17, further comprising determining that the dependency count is equal to zero.

21. The non-transitory computer-readable storage medium of claim 20, further comprising retrieving a second item from the first queue.

22. The non-transitory computer-readable storage medium of claim 21, further comprising determining that the second item in the first queue comprises a task, and causing the task to be executed.

23. A computing device, comprising: a central processing unit; and a parallel processing subunit coupled to the central processing unit, comprising: a graphics processing subsystem that includes a streaming multiprocessor configured to: retrieve a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determine that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and remove the first item from the first queue, wherein the first thread and the second thread execute within the graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to the central processing unit (CPU) and receives processing tasks from the CPU.
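
Claims 2 and 3 (and their counterparts 15 and 16) describe a dependency count tied to nodes reached by recursively following pointers. The claims do not spell out the data layout, so the following is only one possible reading, sketched with hypothetical names (`Node`, `count_pending`, `dependency_count`): each node stands for one other task or event, and the count is the number of distinct nodes that have not yet completed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One task or event that a wait event depends on (assumed layout)."""
    name: str
    done: bool = False
    children: list = field(default_factory=list)  # pointers to further nodes


def count_pending(node, seen):
    """Recursively count distinct, not-yet-completed nodes reachable from node."""
    if id(node) in seen:
        return 0  # already visited through another pointer
    seen.add(id(node))
    pending = 0 if node.done else 1
    for child in node.children:
        pending += count_pending(child, seen)
    return pending


def dependency_count(pointers):
    """Dependency count for a wait event holding the given node pointers."""
    seen = set()
    return sum(count_pending(node, seen) for node in pointers)


task_a = Node("task_a", done=True)
task_b = Node("task_b")
signal_c = Node("signal_c", children=[task_b])

before = dependency_count([task_a, task_b, signal_c])
task_b.done = True
after = dependency_count([task_a, task_b, signal_c])
print(before, after)
```

The `seen` set keeps a node that is reachable through more than one pointer from being counted twice, which matches the claim language that each node corresponds to a different task or event.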

Assignees

Inventors

Classifications

  • G06F9/4881 (Primary)

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Precedence · CPC title

  • Event management; Broadcasting; Multicasting; Notifications · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9928109B2 cover?
One embodiment of the present disclosure sets forth a technique for enforcing cross stream dependencies in a parallel processing subsystem such as a graphics processing unit. The technique involves queuing waiting events to create cross stream dependencies and signaling events to indicate completion to the waiting events. A scheduler kernel examines a task status data structure from a correspo…
Who is the assignee on this patent?
Durant Luke, Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).