Efficient memory virtualization in multi-threaded processing units

US10169091B2 · US · B2

Patent metadata
Publication number: US-10169091-B2
Application number: US-201213660799-A
Country: US
Kind code: B2
Filing date: Oct 25, 2012
Priority date: Oct 25, 2012
Publication date: Jan 1, 2019
Grant date: Jan 1, 2019

Abstract

Official abstract text for this publication.

A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
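The translation path in the abstract can be sketched in a few lines. This is a minimal Python model, not NVIDIA's hardware design. It assumes 4 KiB pages and uses plain dictionaries in place of hardware page tables and the TLB. The class `AsidMmu`, its `bind` and `translate` methods, and `PageFault` are hypothetical names chosen for illustration. Each request carries an (ASID, virtual address) pair, the ASID selects the page table on a TLB miss, and TLB entries are tagged with the ASID so identical virtual addresses from different tasks map to different physical pages.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12                      # assumption: 4 KiB pages, for illustration only
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when the page table selected by an ASID has no mapping."""


@dataclass
class AsidMmu:
    # Hypothetical model: ASID -> page table, where each page table maps
    # virtual page number (VPN) -> physical frame number (PFN).
    page_tables: dict[int, dict[int, int]] = field(default_factory=dict)
    # TLB entries are keyed by (ASID, VPN), so the same virtual address issued
    # by tasks with different ASIDs never hits the wrong mapping.
    tlb: dict[tuple[int, int], int] = field(default_factory=dict)

    def bind(self, asid: int, page_table: dict[int, int]) -> None:
        """Associate a page table with an ASID and drop that ASID's cached entries."""
        self.page_tables[asid] = page_table
        stale = [key for key in self.tlb if key[0] == asid]
        for key in stale:
            del self.tlb[key]

    def translate(self, asid: int, vaddr: int) -> int:
        """Translate a request carrying both a virtual address and an ASID."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        key = (asid, vpn)
        if key not in self.tlb:
            # TLB miss: the ASID selects which page table to walk.
            table = self.page_tables.get(asid, {})
            if vpn not in table:
                raise PageFault(f"ASID {asid}: no mapping for VPN {vpn:#x}")
            self.tlb[key] = table[vpn]
        return (self.tlb[key] << PAGE_SHIFT) | offset


mmu = AsidMmu()
mmu.bind(1, {0x10: 0x200})   # task with ASID 1
mmu.bind(2, {0x10: 0x300})   # task with ASID 2, same virtual page
print(hex(mmu.translate(1, 0x10ABC)))  # 0x200abc
print(hex(mmu.translate(2, 0x10ABC)))  # 0x300abc
```

Keying the TLB on the (ASID, VPN) pair lets entries from several address spaces coexist in one TLB. The `bind` method loosely mirrors claims 7 and 8 by associating a page table with an ASID and invalidating stale entries. Flushing by ASID is a simplification, because claim 8 describes invalidating entries associated with the processing context.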

First claim

Opening claim text (preview).

What is claimed is:

1. A method for scheduling tasks for execution in a parallel processor comprising two or more streaming multiprocessors, the method comprising: receiving a set of tasks associated with a first processing context related to a first page table included in a plurality of page tables; selecting a first task that is associated with a first address space identifier (ASID) from the set of tasks and associated with the first processing context; determining a minimum a number of streaming multiprocessors included in the two or more streaming multiprocessors able to execute the tasks included in the set of tasks based on a number of tasks each streaming multiprocessor is able to execute concurrently, wherein the minimum number of streaming multiprocessors includes at least a first streaming multiprocessor; assigning the tasks included in the set of tasks to the minimum number of streaming multiprocessors; selecting the first streaming multiprocessor from the two or more streaming multiprocessors to execute the first task; scheduling the first task to execute on the first streaming multiprocessor; selecting a second task that is associated with a second ASID from the set of tasks and associated with the first processing context; and scheduling the second task to execute on the first streaming multiprocessor, wherein scheduling the second task occurs prior to scheduling any other task from the set of tasks to execute on a second streaming multiprocessor included in the two or more streaming multiprocessors.

2. The method of claim 1, wherein selecting the first streaming multiprocessor comprises identifying that the first streaming multiprocessor has previously been assigned a task included in the set of tasks associated with the first processing context, which establishes that the first streaming multiprocessor has an affinity to the first processing context.

3. The method of claim 2, wherein selecting the first streaming multiprocessor minimizes a maximum prevailing workload for all streaming multiprocessor executing tasks associated with the first processing context.

4. The method of claim 1, wherein the first task comprises a thread grid.

5. The method of claim 1, wherein the first page table includes virtual address to physical address mappings associated with a first virtual address space corresponding to the first processing context, and a second page table includes virtual address to physical address mappings associated a second virtual address space corresponding to the first processing context.

6. The method of claim 1, wherein the first page table and a second page table are included in the plurality of page tables, and each page table included in the plurality of page tables includes virtual address to physical address mappings associated a different virtual address space.

7. The method of claim 1, further comprising: receiving a bind command from a front end context switch; and in response, associating the first page table with the first ASID.

8. The method of claim 1, further comprising: receiving a bind command from a front end context switch; and in response, invalidating one or more entries in a first translation lookaside buffer (TLB) that are associated with the first context.

9. The method of claim 1, wherein the first task is associated with a first thread program executing on the first streaming multiprocessor and the second task is associated with a second thread program executing on the first streaming multiprocessor.

10. The method of claim 9, wherein the first streaming multiprocessor simultaneously executes the first thread program and the second thread program within the first processing context.

11. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to schedule tasks for execution on a first streaming multiprocessor unit, the method comprising: receiving a set of tasks associated with a first processing context related to a first page table included in a plurality of page tables; selecting a first task that is associated with a first address space identifier (ASID) from the set of tasks and associated with the first processing context; determining a minimum number of streaming multiprocessors included in the two or more streaming multiprocessors able to execute the tasks included in the set of tasks based on a number of tasks each streaming multiprocessor is able to execute concurrently, wherein the minimum number of streaming multiprocessors includes at least a first streaming multiprocessor; assigning the tasks included in the set of tasks to the minimum number of streaming multiprocessors; selecting the first streaming multiprocessor from the two or more streaming multiprocessors to execute the first task; scheduling the first task to execute on the first streaming multiprocessor; selecting a second task that is associated with a second ASID from the set of tasks and associated with the first processing context; and scheduling the second task to execute on the first streaming multiprocessor, wherein scheduling the second task occurs prior to scheduling any other task from the set of tasks to execute on a second streaming multiprocessor included in the two or more streaming multiprocessors.

12. The computer-readable storage medium of claim 11, wherein selecting the first streaming multiprocessor comprises identifying that the first streaming multiprocessor has previously been assigned a task associated with the first processing context, which establishes that the first streaming multiprocessor has an affinity to the first processing context.

13. The computer-readable storage medium of claim 12, wherein selecting the first streaming multiprocessor minimizes a maximum prevailing workload for all streaming multiprocessor executing tasks associated with the first processing context.

14. The computer-readable storage medium of claim 11, wherein the first task comprises a thread grid.

15. The computer-readable storage medium of claim 11, wherein selecting the first streaming multiprocessor maximizes at least one of a translation lookaside buffer (TLB) cache affinity and a data cache affinity relative to the tasks included in the set of tasks.

16. The computer-readable storage medium of claim 11, further comprising determining that the tasks included in the set of tasks are associated with a first number of different ASIDs, and in response, determining that the tasks included in the set of tasks should be assigned to the minimum number of streaming multiprocessors included in the two or more streaming multiprocessors.

17. A computing device, comprising: a central processing unit that executes a process having a first processing context; and a parallel processing subunit coupled to the central processing unit, comprising: a subsystem that includes a streaming multiprocessor that: receives a set of tasks associated with a first processing context related to a first page table included in a plurality of page tables; selects a first task that is associated with a first address space identifier (ASID) from the set of tasks and associated with the first processing context; determines a minimum number of streaming multiprocessors included in the two or more streaming multiprocessors able to execute the tasks included in the set of tasks based on a number of tasks each streaming multiprocessor is able to execute concurrently, wherein the minimum number of streaming multiprocessors includes at least a first streaming multiprocessor
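The scheduling steps of claims 1 to 3 can be illustrated with a minimal Python sketch. The sketch relies on assumptions that the claims do not state. Every streaming multiprocessor (SM) has the same concurrent-task capacity, each task occupies one slot, the selected SMs start with enough free slots, and affinity is tracked as a set of context IDs per SM. The names `Task`, `StreamingMultiprocessor`, `min_sms_needed`, and `schedule_context` are hypothetical. The minimum SM count is computed as ceil(number of tasks / per-SM capacity). SMs are then ranked by affinity and current load, and each chosen SM is filled before any task spills to the next one.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    asid: int  # tasks in one processing context may carry different ASIDs


@dataclass
class StreamingMultiprocessor:
    sm_id: int
    capacity: int  # tasks this SM can execute concurrently (assumed uniform)
    contexts_seen: set[int] = field(default_factory=set)  # affinity record
    running: list[Task] = field(default_factory=list)

    @property
    def load(self) -> int:
        return len(self.running)


def min_sms_needed(num_tasks: int, per_sm_capacity: int) -> int:
    """Smallest number of SMs whose combined concurrency covers all tasks."""
    return math.ceil(num_tasks / per_sm_capacity)


def schedule_context(tasks: list[Task], context_id: int,
                     sms: list[StreamingMultiprocessor]) -> dict[str, int]:
    """Pack one context's tasks onto as few SMs as possible."""
    needed = min_sms_needed(len(tasks), sms[0].capacity)
    # Prefer SMs with affinity to this context, then the least-loaded ones,
    # which keeps the maximum per-SM workload low.
    ranked = sorted(sms, key=lambda sm: (context_id not in sm.contexts_seen, sm.load))
    pending = deque(tasks)
    placement: dict[str, int] = {}
    for sm in ranked[:needed]:
        # Fill this SM completely before spilling to the next one.
        while pending and sm.load < sm.capacity:
            task = pending.popleft()
            sm.running.append(task)
            sm.contexts_seen.add(context_id)
            placement[task.name] = sm.sm_id
    if pending:
        raise RuntimeError("selected SMs lack free slots; sketch assumes idle SMs")
    return placement


sms = [StreamingMultiprocessor(sm_id=i, capacity=4) for i in range(4)]
tasks = [Task(f"t{i}", asid=i % 2) for i in range(6)]
print(schedule_context(tasks, context_id=7, sms=sms))
# {'t0': 0, 't1': 0, 't2': 0, 't3': 0, 't4': 1, 't5': 1}
print(schedule_context([Task("u0", asid=0)], context_id=7, sms=sms))
# {'u0': 1}
```

In the first call, `t0` (ASID 0) and `t1` (ASID 1) both land on SM 0 before SM 1 receives any task. This matches the ordering recited in claim 1. In the second call, the task for the same context goes to SM 1 because it has affinity and carries the lighter load. Sorting by (affinity, load) is only one simple way to approximate the min-max goal of claim 3. The claims do not say how the selection is computed.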

Assignees

Nvidia Corp

Inventors

Classifications

  • the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

  • Program initiating; Program switching, e.g. by interrupt · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Hypervisor-specific management and integration aspects · CPC title

  • Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines · CPC title

Frequently asked questions

What does patent US10169091B2 cover?
A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to phys…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/5033. Mapped technology areas include Physics.
When was this patent published?
Published January 1, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).