US9928109B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9928109-B2 |
| Application number | US-201213467804-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 9, 2012 |
| Priority date | May 9, 2012 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Official abstract text for this publication.
One embodiment of the present disclosure sets forth a technique for enforcing cross-stream dependencies in a parallel processing subsystem such as a graphics processing unit. The technique involves queuing waiting events to create cross-stream dependencies and signaling events to indicate completion to the waiting events. A scheduler kernel examines a task status data structure from a corresponding stream and updates dependency counts for tasks and events within the stream. When each task dependency for a waiting event is satisfied, an associated task may execute.
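The mechanism in the abstract can be illustrated with a small host-side simulation. This is a minimal sketch, not the patented implementation: it runs on the CPU in plain Python, and the names `Task`, `WaitEvent`, `SignalEvent`, `step`, and `run` are hypothetical, chosen only for illustration. It assumes each stream is a FIFO queue, that a wait event at the head of a queue blocks the items behind it until its dependency count reaches zero, and that a signaling event decrements the count of the wait event it targets.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str


@dataclass
class WaitEvent:
    name: str
    dependency_count: int  # signals still outstanding before the wait completes


@dataclass
class SignalEvent:
    target: WaitEvent  # wait event (possibly in another queue) to notify


def step(queue, log):
    """Process the head item of one queue; return True if progress was made."""
    if not queue:
        return False
    item = queue[0]
    if isinstance(item, SignalEvent):
        item.target.dependency_count -= 1  # signal: one dependency satisfied
        queue.popleft()
        return True
    if isinstance(item, WaitEvent):
        if item.dependency_count > 0:
            return False  # still blocked, so items behind it must wait
        queue.popleft()
        return True
    log.append(item.name)  # stand-in for executing the task
    queue.popleft()
    return True


def run(queues):
    """Round-robin over the queues until no queue can make progress."""
    log = []
    progress = True
    while progress:
        progress = False
        for q in queues:
            if step(q, log):
                progress = True
    return log


# Task B in stream 2 must not start until task A in stream 1 has finished.
wait = WaitEvent("wait_for_A", dependency_count=1)
stream1 = deque([Task("A"), SignalEvent(wait), Task("C")])
stream2 = deque([wait, Task("B")])
order = run([stream2, stream1])
print(order)
```

In this sketch stream 2 is visited first but makes no progress until stream 1 has executed task A and its signaling event, which is the cross-stream ordering the abstract describes.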
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for processing a plurality of tasks across a group of threads, the method comprising: retrieving a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determining that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and removing the first item from the first queue, wherein the first thread and the second thread execute within a graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to a central processing unit (CPU) and receives processing tasks from the CPU.
2. The method of claim 1, wherein the dependency count represents a number of different other tasks or events that the wait event in the second queue is waiting for to complete before the wait event can complete.
3. The method of claim 2, further comprising recursively traversing a plurality of pointers that point to a plurality of nodes, wherein each pointer points to a different node, and each node is associated with one of the different other tasks or events.
4. The method of claim 1, wherein the wait event in the second queue is further dependent on a given task.
5. The method of claim 4, wherein the wait event in the second queue functions to block the execution of any additional task until the given task is completed.
6. The method of claim 5, wherein additional tasks reside in the second queue behind the wait event.
7. The method of claim 4, further comprising determining that the dependency count is equal to zero.
8. The method of claim 7, further comprising retrieving a second item from the first queue.
9. The method of claim 8, further comprising determining that the second item in the first queue comprises a task, and causing the task to be executed.
10. The method of claim 1, wherein: the second queue stores processing tasks having cross dependencies with tasks stored in the first queue; and the graphics processing subsystem manages cross dependencies between the first queue and the second queue.
11. The method of claim 10, wherein: the graphics processing subsystem manages cross dependencies between the first queue and the second queue without intervention from the CPU.
12. The method of claim 1, wherein the graphics processing subsystem processes, via non-locking operations, the first queue and the second queue without intervention from the CPU.
13. The method of claim 1, wherein the graphics processing subsystem manages, via non-locking operations, cross dependencies between tasks of the first queue and the second queue without intervention from the CPU.
14. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to process a plurality of tasks across a group of threads, by performing the steps of: retrieving a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determining that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and removing the first item from the first queue, wherein the first thread and the second thread execute within a graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to a central processing unit (CPU) and receives processing tasks from the CPU.
15. The non-transitory computer-readable storage medium of claim 14, wherein the dependency count represents a number of different other tasks or events that the wait event in the second queue is waiting for to complete before the wait event can complete.
16. The non-transitory computer-readable storage medium of claim 15, further comprising recursively traversing a plurality of pointers that point to a plurality of nodes, wherein each pointer points to a different node, and each node is associated with one of the different other tasks or events.
17. The non-transitory computer-readable storage medium of claim 14, wherein the wait event in the second queue is further dependent on a given task.
18. The non-transitory computer-readable storage medium of claim 17, wherein the wait event in the second queue functions to block the execution of any additional task until the given task is completed.
19. The non-transitory computer-readable storage medium of claim 18, wherein additional tasks reside in the second queue behind the wait event.
20. The non-transitory computer-readable storage medium of claim 17, further comprising determining that the dependency count is equal to zero.
21. The non-transitory computer-readable storage medium of claim 20, further comprising retrieving a second item from the first queue.
22. The non-transitory computer-readable storage medium of claim 21, further comprising determining that the second item in the first queue comprises a task, and causing the task to be executed.
23. A computing device, comprising: a central processing unit; and a parallel processing subunit coupled to the central processing unit, comprising: a graphics processing subsystem that includes a streaming multiprocessor configured to: retrieve a first item from a first queue that stores processing tasks, wait events, and signaling events, the first queue being executed by a first thread; determine that the first item comprises a signaling event and executing the signaling event, wherein a wait event in a second queue is dependent on the signaling event, the second queue being executed by a second thread; in response to executing the signaling event, decrementing a dependency count associated with the wait event in the second queue; and remove the first item from the first queue, wherein the first thread and the second thread execute within the graphics processing subsystem and at least one thread of the graphics processing subsystem generates at least one of a wait event and a signaling event stored in the first queue, wherein the graphics processing subsystem is coupled to the central processing unit (CPU) and receives processing tasks from the CPU.
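Claims 2 and 3 (and their counterparts 15 and 16) tie the dependency count to a set of nodes reached by recursively following pointers. The claims do not state what the traversal computes, so the following is only one possible reading, sketched under assumptions: the hypothetical `Node` class holds a completion flag and a list of pointers to the nodes it depends on, and the hypothetical `count_pending` function walks those pointers recursively and counts each distinct, not-yet-complete node once.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so nodes can be stored in a set
class Node:
    name: str
    done: bool = False
    deps: list = field(default_factory=list)  # pointers to other nodes


def count_pending(node, seen=None):
    """Recursively follow dependency pointers and count distinct incomplete nodes."""
    if seen is None:
        seen = set()
    total = 0
    for dep in node.deps:
        if dep in seen:
            continue  # each node is counted at most once
        seen.add(dep)
        if not dep.done:
            total += 1
        total += count_pending(dep, seen)
    return total


# Hypothetical graph: the wait event depends on a signaling event and on
# task_b; the signaling event in turn depends on task_a and task_b.
task_a = Node("task_a", done=True)
task_b = Node("task_b")
signal = Node("signal", deps=[task_a, task_b])
wait_event = Node("wait", deps=[signal, task_b])

pending = count_pending(wait_event)  # signal and task_b are still incomplete

task_b.done = True
signal.done = True
pending_after = count_pending(wait_event)
```

Under this reading, the wait event can complete once the count returned by the traversal reaches zero, which corresponds to the zero check in claims 7 and 20.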
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
Precedence · CPC title
Event management; Broadcasting; Multicasting; Notifications · CPC title