Techniques for sharing priorities between streams of work and dynamic parallelism

US9575760B2 · US · B2

Patent metadata
Publication number: US-9575760-B2
Application number: US-201313897123-A
Country: US
Kind code: B2
Filing date: May 17, 2013
Priority date: May 17, 2013
Publication date: Feb 21, 2017
Grant date: Feb 21, 2017

Abstract

Official abstract text for this publication.

One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem that supports dynamic parallelism. First, the software application assigns a maximum nesting depth for dynamic parallelism. The software application then assigns a stream priority to a stream. These assignments cause a driver to map the stream priority to a device priority and, subsequently, associate the device priority with the stream. As part of the mapping, the driver ensures that each device priority is at least the maximum nesting depth higher than the device priorities associated with any lower priority streams. Subsequently, the driver launches any kernel included in the stream with the device priority associated with the stream. Advantageously, by strategically assigning the maximum nesting depth and prioritizing streams, an application developer may increase the overall processing efficiency of the software application.
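The mapping rule in the abstract can be illustrated with a small C++ model. This is a sketch under stated assumptions, not Nvidia's driver code. Device priorities are modeled as plain integers where a larger value is scheduled first. Stream priorities are numbered 0, 1, 2, … from lowest to highest. A kernel at nesting level k (0 for a host launch) is assumed to run k levels above its stream's base device priority. The function names `mapStreamPriority` and `kernelDevicePriority` are hypothetical.

```cpp
#include <cassert>

// Hypothetical model (not the actual driver data structures): device
// priorities are integers where a larger value is scheduled first, and the
// application numbers its stream priorities 0, 1, 2, ... with 0 the lowest.

// Map a stream priority to a base device priority so that adjacent stream
// priorities sit exactly maxNestingDepth device levels apart.
int mapStreamPriority(int streamPriority, int maxNestingDepth) {
    assert(streamPriority >= 0 && maxNestingDepth >= 1);
    return streamPriority * maxNestingDepth;
}

// Assumed rule: a kernel at nesting level `level` (0 = launched by the host,
// 1 = its child, and so on) runs `level` device levels above its stream's
// base device priority. Levels range over [0, maxNestingDepth).
int kernelDevicePriority(int streamDevicePriority, int level, int maxNestingDepth) {
    assert(level >= 0 && level < maxNestingDepth);
    return streamDevicePriority + level;
}
```

With a maximum nesting depth of 3, stream priority 0 maps to device priority 0 and stream priority 1 maps to 3. The deepest kernel the low-priority stream can launch (level 2) runs at device priority 2, still below the top-level kernels of the higher-priority stream. A level-1 child in the higher-priority stream runs at 4, above both stream base priorities, which is the relationship described in claim 7.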

First claim

Opening claim text (preview).

What is claimed is:

1. A method for mapping a plurality of stream priorities associated with a software application to a plurality of device priorities supported by a parallel processor, the method comprising: receiving a first request from the software application to associate a first stream with a first stream priority, wherein the first stream is for execution within the parallel processor; mapping the first stream priority to a first device priority; receiving a second request from the software application to associate a second stream with a second stream priority, wherein the second stream priority is higher in priority than the first stream priority, and the second stream also is for execution within the parallel processor; and mapping the second stream priority to a second device priority, wherein the second device priority is at least a maximum nesting depth higher in priority than the first device priority.

2. The method of claim 1, further comprising identifying the maximum nesting depth by: receiving a request from the software application specifying the maximum nesting depth; storing the maximum nesting depth in a memory resource; and subsequently accessing the memory resource to read the maximum nesting depth.

3. The method of claim 1, wherein the maximum nesting depth comprises a default maximum nesting depth.

4. The method of claim 1, further comprising: launching a first work component within the second stream for execution within the parallel processor at the second device priority.

5. The method of claim 4, wherein the first work component comprises a first function that is executable via a plurality of parallel threads.

6. The method of claim 4, wherein the first work component, when executing within the parallel processor, launches a second work component for execution within the parallel processor, and wherein the second work component comprises a child of the first work component.

7. The method of claim 6, wherein the second work component is associated with a third device priority that is higher in priority than the second device priority and higher in priority than the first device priority.

8. The method of claim 1, further comprising: receiving a third request from the software application to associate a third stream with a third stream priority, wherein the third stream priority is higher in priority than the second stream priority; identifying that the highest device priority supported by the parallel processor is separated from the second device priority by less than twice the maximum nesting depth; and mapping the third stream priority to the second device priority.

9. The method of claim 1, wherein the maximum nesting depth is less than a maximum number of execution levels supported by the parallel processor.

10. The method of claim 1, wherein the maximum nesting depth is equal to a maximum number of execution levels supported by the parallel processor.

11. A non-transitory computer-readable storage medium including instructions that, when executed by a processor, cause the processor to map a plurality of stream priorities associated with a software application to a plurality of device priorities supported by a parallel processor subsystem, by performing the steps of: receiving a first request from the software application to associate a first stream with a first stream priority, wherein the first stream is for execution within the parallel processor; mapping the first stream priority to a first device priority; receiving a second request from the software application to associate a second stream with a second stream priority, wherein the second stream priority is higher in priority than the first stream priority, and the second stream also is for execution within the parallel processor; and mapping the second stream priority to a second device priority, wherein the second device priority is at least a maximum nesting depth higher in priority than the first device priority.

12. The non-transitory computer-readable storage medium of claim 11, further comprising identifying the maximum nesting depth by: receiving a request from the software application specifying the maximum nesting depth; storing the maximum nesting depth in a memory resource; and subsequently accessing the memory resource to read the maximum nesting depth.

13. The non-transitory computer-readable storage medium of claim 11, further comprising launching a first work component within the second stream for execution within the parallel processor at the second device priority.

14. The non-transitory computer-readable storage medium of claim 13, wherein the first work component comprises a first function that is executable via a plurality of parallel threads.

15. The non-transitory computer-readable storage medium of claim 13, wherein the first work component, when executing within the parallel processor, launches a second work component for execution within the parallel processor, and wherein the second work component comprises a child of the first work component.

16. The non-transitory computer-readable medium of claim 15, wherein the second work component is associated with a third device priority that is higher in priority than the second device priority and higher in priority than the first device priority.

17. The non-transitory computer-readable storage medium of claim 11, further comprising: receiving a third request from the software application to associate a third stream with a third stream priority, wherein the third stream priority is higher in priority than the second stream priority; identifying that the highest device priority supported by the parallel processor is separated from the second device priority by less than twice the maximum nesting depth; and mapping the third stream priority to the second device priority.

18. The non-transitory computer-readable storage medium of claim 11, wherein the maximum nesting depth is less than a maximum number of execution levels supported by the parallel processor.

19. The non-transitory computer-readable storage medium of claim 11, wherein the maximum nesting depth is equal to a maximum number of execution levels supported by the parallel processor.

20. A system configured to map a plurality of stream priorities associated with a software application to a plurality of device priorities supported by a parallel processor, the system comprising: a memory that includes a driver program; and a processor that, when executing the driver program, is configured to: identify a maximum nesting depth that limits the number of nesting levels associated with child kernels that are launched by other kernels executing on the parallel processor; map a first stream priority associated with a first stream of work from the software application to a first device priority; and map a second stream priority associated with a second stream of work from the software application to a second device priority, wherein the second device priority is at least the maximum nesting depth higher in priority than the first device priority.

21. The system of claim 20, wherein the maximum nesting depth is based on a maximum number of execution levels supported by the parallel processing subsystem.

22. The system of claim 19, wherein the processor, when executing the driver program, is further configured to launch a first work component within the second stream of work for execution within parallel processor at the s…
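Claim 8 covers the case where the device runs out of priority levels. One reading of it is sketched below in a hypothetical C++ model (larger integer = higher priority; `PriorityMapper` and its members are illustrative names, not a real driver API). Each new, higher stream priority steps up by the maximum nesting depth while at least twice that depth remains below the highest device priority; otherwise it reuses the device priority of the previous stream. The sketch also assumes requests arrive in increasing stream-priority order, which matches the order recited in claims 1 and 8 but is not stated there as a requirement.

```cpp
#include <cassert>
#include <map>

// Hypothetical driver-side table; the class name, the integer encoding
// (larger = higher priority) and the request order are assumptions.
class PriorityMapper {
public:
    PriorityMapper(int highestDevicePriority, int maxNestingDepth)
        : highest_(highestDevicePriority), depth_(maxNestingDepth) {
        assert(maxNestingDepth >= 1 && highestDevicePriority >= maxNestingDepth - 1);
    }

    // Returns the device priority for a stream priority, mapping it on first use.
    int map(int streamPriority) {
        auto it = table_.find(streamPriority);
        if (it != table_.end()) return it->second;  // already mapped
        // Assumption: new stream priorities arrive in increasing order.
        assert(table_.empty() || streamPriority > table_.rbegin()->first);

        int device = 0;  // the first (lowest) stream gets the lowest device priority
        if (!table_.empty()) {
            // Device priority of the highest stream mapped so far.
            int previous = table_.rbegin()->second;
            // If fewer than 2 * depth levels separate `previous` from the top,
            // reuse `previous` (claim 8); otherwise step up by depth (claim 1).
            device = (highest_ - previous < 2 * depth_) ? previous : previous + depth_;
        }
        table_[streamPriority] = device;
        return device;
    }

private:
    int highest_;
    int depth_;
    std::map<int, int> table_;  // stream priority -> device priority
};
```

With device priorities 0 through 9 and a maximum nesting depth of 3, stream priorities 0, 1, and 2 map to 0, 3, and 6. A fourth stream would need 9 for its top-level kernels and up to 11 for nested kernels; only 3 levels remain above 6 (fewer than 2 × 3), so the sketch maps it to 6, the same device priority as stream priority 2.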

Assignees

Nvidia Corp

Classifications

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F9/5038 (primary)

    considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Priority · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9575760B2 cover?
One embodiment sets forth a method for assigning priorities to kernels launched by a software application and executed within a stream of work on a parallel processing subsystem that supports dynamic parallelism. First, the software application assigns a maximum nesting depth for dynamic parallelism. The software application then assigns a stream priority to a stream. These assignments cause a …
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/5038. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 21, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).