Memory-aware request placement for virtual GPU enabled systems

US12229602B2 · US · B2

Patent metadata

Publication number: US-12229602-B2
Application number: US-202217733284-A
Country: US
Kind code: B2
Filing date: Apr 29, 2022
Priority date: Jul 12, 2019
Publication date: Feb 18, 2025
Grant date: Feb 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are aspects of memory-aware placement in systems that include graphics processing units (GPUs) that are virtual GPU (vGPU) enabled. In some examples, graphics processing units (GPU) are identified in a computing environment. Graphics processing requests are received. A graphics processing request includes a GPU memory requirement. The graphics processing requests are processed using a graphics processing request placement model that minimizes a number of utilized GPUs that are utilized to accommodate the requests. Virtual GPUs (vGPUs) are created to accommodate the graphics processing requests according to the graphics processing request placement model. The utilized GPUs divide their GPU memories to provide a subset of the plurality of vGPUs.
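The abstract names two quantities: the vGPUs a GPU provides by dividing its memory, and the number of utilized GPUs that the placement model minimizes. The Python sketch below only illustrates those two quantities. It is not taken from the patent; the function names, the gigabyte units, and the dictionary layout of an assignment are all assumptions made for this example.

```python
def vgpu_capacity(total_memory_gb: int, profile_gb: int) -> int:
    """Number of vGPUs a GPU can host when a profile splits its memory evenly.

    Hypothetical helper: assumes whole-gigabyte sizes and a single profile per GPU.
    """
    if profile_gb <= 0 or profile_gb > total_memory_gb:
        raise ValueError("profile must be positive and fit in GPU memory")
    return total_memory_gb // profile_gb


def utilized_gpus(assignment: dict) -> int:
    """Count GPUs that host at least one request.

    `assignment` maps a GPU name to the list of request ids placed on it
    (an assumed layout). This count is the quantity a placement model of the
    kind described in the abstract tries to keep small.
    """
    return sum(1 for request_ids in assignment.values() if request_ids)
```

For instance, a 16 GB GPU with a 4 GB profile would host four vGPUs under these assumptions, and an assignment that leaves one of three GPUs empty utilizes two GPUs.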

First claim

Opening claim text (preview).

Therefore, the following is claimed:

1. A non-transitory computer-readable medium comprising machine readable instructions, wherein the instructions, when executed by at least one processor, cause at least one computing device to perform operations comprising: executing a scheduling service in a computing environment comprising one or more host computers, each of the one or more host computers having a virtualization layer that provides virtualized hardware for one or more virtualized computing instances (VCI); identifying, by the scheduling service, a plurality of graphics processing units (GPUs) in a computing environment, wherein each of the plurality of GPUs is configured with a virtual GPU (vGPU) profile comprising a memory reservation that represents a maximum GPU memory requirement that the respective GPU will support with that respective configured vGPU profile; sorting, by the scheduling service, a first list of the plurality of configured GPUs in increasing order of the memory requirement of the vGPU profile of each configured GPU; receiving, by the scheduling service, a plurality of graphics processing requests, each respective graphics processing request comprising a GPU memory requirement; sorting, by the scheduling service, a second list of the plurality of graphics processing requests according to a vGPU request placement model of a memory requirement of each respective graphics processing request; determining, by the scheduling service and with the vGPU request-placement model that considers the respective GPU memory requirement of each graphics processing request and the respective memory reservation of the respective vGPU profile of each configured GPU, that a first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of a first memory request in the sorted second list; and assigning, based on a determination that the first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of the first memory request in the sorted second list, the first memory request to the first configured GPU.

2. The non-transitory computer-readable medium of claim 1, wherein the vGPU request placement model is a first-fit placement model.

3. The non-transitory computer-readable medium of claim 2, wherein the vGPU request placement model uses: a vGPU increasing requests increasing (VIRI) heuristic; or a vGPU increasing requests decreasing (VIRD) heuristic.

4. The non-transitory computer-readable medium of claim 1, wherein the operations further comprise: identifying, by the scheduling service, a second plurality of GPUs in the computing environment, wherein each of the second plurality of GPU is not configured with a GPU profile; sorting, by the scheduling service, a third list of the second plurality of GPUs by total GPU memory; determining, by the scheduling service and with the vGPU request placement model that a second configured GPU does not have a memory reservation that meets a memory requirement of a second memory request; in response a determination that the second configured GPU does not have a memory reservation that meets a memory requirement of the second memory request: causing, by the scheduling service, configuration of an additional GPU from the third list with a vGPU profile comprising a memory reservation that meets the GPU memory requirement of the second graphics processing request; and assigning the second graphics processing request to the additional GPU.

5. The non-transitory computer-readable medium of claim 1, wherein the first graphics processing vGPU request comprises a request to perform general-purpose computing on GPU (GPGPU) for a Compute Unified Device Architecture (CUDA) application.

6. The non-transitory computer-readable medium of claim 1, the operations further comprising: causing, by the scheduling agent, creation of a vGPU on the configured GPU to service the one of the plurality of memory requests.

7. The non-transitory computer-readable medium of claim 1, wherein each of the vGPU profiles divides a memory of the respective configured GPU evenly into one or more vGPUs.

8. A method performed by at least one computing device executing machine-readable instructions, the method comprising: executing a scheduling service in a computing environment comprising one or more host computers, each of the one or more host computers having a virtualization layer that provides virtualized hardware for one or more virtualized computing instances (VCI); identifying, by the scheduling service, a plurality of graphics processing units (GPUs) in a computing environment, wherein each of the plurality of GPUs is configured with a virtual GPU (vGPU) profile comprising a memory reservation that represents a maximum GPU memory requirement that the respective GPU will support with that respective configured vGPU profile; sorting, by the scheduling service, a first list of the plurality of configured GPUs in increasing order of the memory requirement of the vGPU profile of each configured GPU; receiving, by the scheduling service, a plurality of graphics processing requests, each respective graphics processing request comprising a GPU memory requirement; sorting, by the scheduling service, a second list of the plurality of graphics processing requests according to a vGPU request placement model of a memory requirement of each respective graphics processing request; determining, by the scheduling service and with the vGPU-request placement model that considers the respective GPU memory requirement of each graphics processing request and the respective memory reservation of the respective vGPU profile of each configured GPU, that a first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of a first memory request in the sorted second list; and assigning, based on a determination that the first configured GPU in the sorted first list has a memory reservation that meets a memory requirement of the first memory request in the sorted second list, the first memory request to the first configured GPU.

9. The method of claim 8, wherein the vGPU request placement model is a first-fit placement model.

10. The method of claim 9, wherein the vGPU request placement model uses: a vGPU increasing requests increasing (VIRI) heuristic; or a vGPU increasing requests decreasing (VIRD) heuristic.

11. The method of claim 8, further comprising: identifying, by the scheduling service, a second plurality of GPUs in the computing environment, wherein each of the second plurality of GPU is not configured with a GPU profile; sorting, by the scheduling service, a third list of the second plurality of GPUs by total GPU memory; determining, by the scheduling service and with the vGPU request placement model that a second configured GPU does not have a memory reservation that meets a memory requirement of a second memory request; in response a determination that the second configured GPU does not have a memory reservation that meets a memory requirement of the second memory request: causing, by the scheduling service, configuration of an additional GPU from the third list with a vGPU profile comprising a memory reservation that meets the GPU memory requirement of the second graphics processing request; and assigning the second graphics processing request to the additional GPU.

12. The method of claim 8, wherein the first vGPU request comprises a request to perform general-purpose computing on GPU (GPGPU) for a Compute Unified Device Architecture (CUDA) application.

13. The method of claim 8, further compr
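The placement steps recited in claims 1 to 4 (and mirrored in claims 8 to 11) can be sketched in Python as a first-fit loop over two sorted lists. This is a minimal illustration under stated assumptions, not the patented implementation: the `Gpu` class, the `place` function, the gigabyte units, the default `profiles_gb` sizes, and the choice to sort unconfigured GPUs in increasing order of total memory are all hypothetical. The claims say only that the third list is sorted by total GPU memory, and they do not list profile sizes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Gpu:
    name: str
    total_gb: int
    profile_gb: int = 0  # 0 means no vGPU profile is configured yet
    assigned: list = field(default_factory=list)

    def has_free_vgpu(self) -> bool:
        # Assumes the profile divides GPU memory evenly into vGPUs (claim 7).
        return (self.profile_gb > 0
                and len(self.assigned) < self.total_gb // self.profile_gb)


def place(configured, unconfigured, requests, decreasing=False,
          profiles_gb=(1, 2, 4, 8, 16)):
    """First-fit placement; returns the ids of requests that could not be placed.

    `requests` is a list of (request_id, memory_gb) pairs.
    decreasing=False sorts requests in increasing order (VIRI);
    decreasing=True sorts them in decreasing order (VIRD).
    """
    # First list: configured GPUs in increasing order of profile memory.
    first = sorted(configured, key=lambda g: g.profile_gb)
    # Third list: unconfigured GPUs by total memory (direction is assumed).
    third = sorted(unconfigured, key=lambda g: g.total_gb)
    # Second list: requests ordered by memory requirement.
    second = sorted(requests, key=lambda r: r[1], reverse=decreasing)

    unplaced = []
    for request_id, need_gb in second:
        # First fit: the first configured GPU whose profile meets the need.
        target = next((g for g in first
                       if g.profile_gb >= need_gb and g.has_free_vgpu()), None)
        if target is None:
            # Fallback in the spirit of claim 4: configure an additional GPU
            # with the smallest assumed profile that meets the requirement.
            profile = min((p for p in profiles_gb if p >= need_gb), default=None)
            spare = next((g for g in third
                          if profile is not None and g.total_gb >= profile), None)
            if spare is not None:
                third.remove(spare)
                spare.profile_gb = profile
                first.append(spare)
                first.sort(key=lambda g: g.profile_gb)
                target = spare
        if target is None:
            unplaced.append(request_id)
        else:
            target.assigned.append(request_id)
    return unplaced
```

Calling `place` with `decreasing=True` switches the request ordering from VIRI to VIRD while leaving the GPU ordering unchanged, which matches how claim 3 distinguishes the two heuristics.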

Assignees

VMware LLC

Inventors

Classifications

  • Memory management, e.g. access or allocation · CPC title

  • I/O management, e.g. providing access to device drivers or storage · CPC title

  • the resource being the memory · CPC title

  • Hypervisor-specific management and integration aspects · CPC title

  • Distribution of virtual machine instances; Migration and load balancing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12229602B2 cover?
Disclosed are aspects of memory-aware placement in systems that include graphics processing units (GPUs) that are virtual GPU (vGPU) enabled. In some examples, graphics processing units (GPU) are identified in a computing environment. Graphics processing requests are received. A graphics processing request includes a GPU memory requirement. The graphics processing requests are processed using a…
Who is the assignee on this patent?
VMware LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/5044. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).