Method and system for providing shared memory access to graphics processing unit processes

US9547535B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9547535-B1
Application number: US-43379709-A
Country: US
Kind code: B1
Filing date: Apr 30, 2009
Priority date: Apr 30, 2009
Publication date: Jan 17, 2017
Grant date: Jan 17, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One or more embodiments of the invention set forth techniques to create a process in a graphical processing unit (GPU) that has access to memory buffers in the system memory of a computer system that are shared among a plurality of GPUs in the computer system. The GPU of the process is able to engage in Direct Memory Access (DMA) with any of the shared memory buffers thereby eliminating additional copying steps that have been needed to combine data output of the various GPUs without such shared access.
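The idea in the abstract can be illustrated with a toy model. This is a minimal sketch, not the patented implementation: a `bytearray` stands in for a shared buffer in system memory, a `memoryview` stands in for a GPU mapping of that buffer, and the `ToyGPU` class and the buffer name `"result"` are hypothetical names introduced only for illustration. No real GPU or DMA is involved.

```python
# Toy model only: a bytearray stands in for a pinned system-memory buffer,
# and ToyGPU is a plain Python class (hypothetical, not a real driver API).

class ToyGPU:
    def __init__(self, gpu_id):
        self.gpu_id = gpu_id
        self.mapped = {}  # buffer name -> memoryview of the shared buffer

    def map_buffer(self, name, buf):
        # Mapping records a view of the same underlying bytes; nothing is copied.
        self.mapped[name] = memoryview(buf)

    def write(self, name, offset, data):
        # Stands in for a DMA write straight into the shared buffer.
        self.mapped[name][offset:offset + len(data)] = data


shared = bytearray(8)            # one shared buffer in "system memory"
gpus = [ToyGPU(0), ToyGPU(1)]
for gpu in gpus:
    gpu.map_buffer("result", shared)

gpus[0].write("result", 0, b"AAAA")   # GPU 0 fills the first half
gpus[1].write("result", 4, b"BBBB")   # GPU 1 fills the second half

combined = bytes(shared)         # output is already combined in one buffer
```

Because both toy GPUs hold a view of the same buffer, their outputs end up side by side without a per-GPU staging buffer that would have to be copied into a combined one afterwards.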

First claim

Opening claim text (preview).

I claim:

1. A computer-implemented method for creating a first process that executes on a first graphics processing unit (GPU) within a plurality of GPUs included in a computer system, the method comprising: maintaining a list of a set of processes executing on the plurality of GPUs, wherein the set of processes includes the first process; maintaining a list of address range entries, wherein each address range entry in the list of address range entries comprises an address range corresponding to a different shared memory buffer in a set of shared memory buffers; requesting that an address space be allocated in the first GPU for the first process; requesting that the first GPU map the address range associated with each of the address range entries in the list of address range entries into the allocated address space, wherein a first address range is associated with a pinned memory buffer that is mapped into a second address space allocated in a second GPU within the plurality of GPUs; and adding the first process to the list of the set of processes, wherein the first process has access to any shared memory buffer corresponding to at least one of the address ranges mapped into the allocated address space.

2. The method of claim 1, wherein the list of the set of processes and the list of address range entries are red black tree data structures.

3. The method of claim 1, wherein the list of the set of processes and the list of address range entries are maintained in the system memory.

4. The method of claim 1, wherein each of the shared memory buffers in the set of shared memory buffers is a page-locked memory buffer.

5. The method of claim 1, wherein the first GPU accesses any of the shared memory buffers in the set of shared memory buffers via a direct memory access operation.

6. The method of claim 1, further comprising receiving a request to create the first process in the first GPU from an application executing on the computer system.

7. The method of claim 6, further comprising the step of transmitting a notification to the application that the first process has been successfully created.

8. The method of claim 1, wherein each GPU included in the plurality of GPUs maintains a separate page table that maps a different virtual address space associated with the respective GPU to the set of shared memory buffers.

9. The method of claim 8, wherein each process in the set of processes is configured to directly access the set of shared memory buffers via the page table corresponding to the GPU included in the plurality of GPUs upon which the process is executing.

10. A non-transitory computer-readable medium including instructions that, when executed by a processing unit, causes the processing unit to create a first process that executes on a first graphics processing unit (GPU) within a plurality of GPUs of a computer system, by performing the steps of: maintaining a list of a set of processes executing on the plurality of GPUs, wherein the set of processes includes the first process; maintaining a list of address range entries, wherein each address range entry in the list of address range entries comprises an address range corresponding to a different shared memory buffer in a set of shared memory buffers; requesting that an address space be allocated in the first GPU for the first process; requesting that the first GPU map the address range associated with each of the address range entries in the list of address range entries into the allocated address space, wherein a first address range is associated with a pinned memory buffer that is mapped into a second address space allocated in a second GPU within the plurality of GPUs; and adding the first process to the list of the set of processes, wherein the first process has access to any shared memory buffer corresponding to at least one of the address ranges mapped into the allocated address space.

11. The non-transitory computer-readable medium of claim 10, wherein the list of the set of processes and the list of address range entries are red black tree data structures.

12. The non-transitory computer-readable medium of claim 10, wherein the list of the set of processes and the list of address range entries are maintained in the system memory.

13. The non-transitory computer-readable medium of claim 10, each of the shared memory buffers in the set of shared memory buffers is a page-locked memory buffer.

14. The non-transitory computer-readable medium of claim 10, wherein the first GPU accesses any of the shared memory buffers in the set of shared memory buffers via a direct memory access operation.

15. The non-transitory computer-readable medium of claim 10, wherein the processing unit further performs the step of receiving a request to create the first process in the first GPU from an application executing on the computer system.

16. The non-transitory computer-readable medium of claim 15 wherein the processing unit further performs the step of transmitting a notification to the application that the first process has been successfully created.

17. A computing system configured to create a first process that executes on a first graphics processing unit (GPU), the computer system comprising: a system memory configured to store a set of shared memory buffers; the plurality of GPUs; and a processor coupled to the system memory, wherein the processor is configured perform the steps of: maintaining a list of a set of processes executing on the plurality of GPUs, maintaining a list of address range entries, wherein each address range entry in the list of address range entries comprises an address range corresponding to a different shared memory buffer in the set of shared memory buffers, requesting that an address space be allocated in the first GPU for the first process, requesting that the first GPU map the address range associated with each of the address range entries in the list of address range entries into the allocated address space, wherein a first address range is associated with a pinned memory buffer that is mapped into a second address space allocated in a second GPU within the plurality of GPUs, and adding the first process to the list of the set of processes, wherein the first process has access to any shared memory buffer corresponding to at least one of the address ranges mapped into the allocated address space.

18. The computing system of claim 17, wherein the list of the set of processes and the list of address range entries are stored in the system memory.

19. The computing system of claim 17, wherein the system memory is further configured to store a page table including a locked page table entry for each shared memory buffer in the set of shared memory buffers, wherein the locked page table entry comprises a mapping of the shared memory buffer to a virtual address space of an application requesting allocation of the shared memory buffer.

20. The computing system of claim 17, wherein the first GPU is configured to access any of the shared memory buffers in the set of shared memory buffers via a direct memory access operation.

21. The computing system of claim 20, wherein the first GPU further comprises a GPU memory including a page table for mapping virtual addresses of the first process to physical addresses.

22. The computing system of claim 21, wherein the first GPU adds a page table entry in the page table of the GPU memory to map each of shared memory buffers in the set of shared
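The bookkeeping steps recited in claim 1 can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the patented driver: `ToyDriver`, `ToyGPU`, `AddressRange`, and all method names are hypothetical, plain Python lists replace the red-black trees of claim 2, the bump-style choice of GPU virtual addresses is invented for the example, and mapping a newly added buffer into already existing processes is an assumption made so that a buffer ends up mapped on more than one GPU.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AddressRange:
    start: int   # start address of a shared buffer in system memory
    length: int  # size of the buffer in bytes


@dataclass
class ToyGPU:
    gpu_id: int
    next_space_id: int = 0
    # address space id -> {GPU virtual address: AddressRange in system memory}
    address_spaces: dict = field(default_factory=dict)

    def allocate_address_space(self):
        space_id = self.next_space_id
        self.next_space_id += 1
        self.address_spaces[space_id] = {}
        return space_id

    def map_range(self, space_id, rng):
        space = self.address_spaces[space_id]
        # Toy allocator (assumption): place each range after the previous ones.
        gpu_va = sum(r.length for r in space.values())
        space[gpu_va] = rng
        return gpu_va


class ToyDriver:
    def __init__(self, gpus):
        self.gpus = {g.gpu_id: g for g in gpus}
        self.processes = []       # list of processes: (gpu_id, space_id)
        self.address_ranges = []  # one entry per shared memory buffer

    def add_shared_buffer(self, rng):
        self.address_ranges.append(rng)
        # Assumption: also map the new buffer into every existing process.
        for gpu_id, space_id in self.processes:
            self.gpus[gpu_id].map_range(space_id, rng)

    def create_process(self, gpu_id):
        gpu = self.gpus[gpu_id]
        space_id = gpu.allocate_address_space()   # request an address space
        for rng in self.address_ranges:           # map every listed range
            gpu.map_range(space_id, rng)
        proc = (gpu_id, space_id)
        self.processes.append(proc)               # add to the process list
        return proc


driver = ToyDriver([ToyGPU(0), ToyGPU(1)])
driver.add_shared_buffer(AddressRange(0x1000, 0x100))
second = driver.create_process(1)                 # process on the second GPU
driver.add_shared_buffer(AddressRange(0x4000, 0x200))
first = driver.create_process(0)                  # process on the first GPU
```

In this model, the process created last on GPU 0 sees both shared buffers, and the earlier process on GPU 1 sees them too, each through its own per-GPU mapping (compare claims 8 and 9).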

Assignees

Inventors

Classifications

  • Graphics controllers · CPC title

  • G06F9/544 · Primary

    Buffers; Shared memory; Pipes · CPC title

  • Networking aspects · CPC title

  • Power processing, i.e. workload management for processors involved in display operations, such as CPUs or GPUs · CPC title

  • Parallel handling of streams of display data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9547535B1 cover?
One or more embodiments of the invention set forth techniques to create a process in a graphical processing unit (GPU) that has access to memory buffers in the system memory of a computer system that are shared among a plurality of GPUs in the computer system. The GPU of the process is able to engage in Direct Memory Access (DMA) with any of the shared memory buffers thereby eliminating additio…
Who is the assignee on this patent?
Wilt Nicholas Patrick, Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/544. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 17, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).