Dynamically provisioning and scaling graphic processing units for data analytic workloads in a hardware cloud

US9916636B2 · US · B2

Patent metadata
Publication number: US-9916636-B2
Application number: US-201615093965-A
Country: US
Kind code: B2
Filing date: Apr 8, 2016
Priority date: Apr 8, 2016
Publication date: Mar 13, 2018
Grant date: Mar 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Server resources in a data center are disaggregated into shared server resource pools, including a graphics processing unit (GPU) pool. Servers are constructed dynamically, on-demand and based on workload requirements, by allocating from these resource pools. According to this disclosure, GPU utilization in the data center is managed proactively by assigning GPUs to workloads in a fine granularity and agile way, and de-provisioning them when no longer needed. In this manner, the approach is especially advantageous to automatically provision GPUs for data analytic workloads. The approach thus provides for a “micro-service” enabling data analytic workloads to automatically and transparently use GPU resources without providing (e.g., to the data center customer) the underlying provisioning details. Preferably, the approach dynamically determines the number and the type of GPUs to use, and then during runtime auto-scales the GPUs based on workload.
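The pool-based model in the abstract can be illustrated with a minimal Python sketch. It assumes a simple in-memory pool in which each GPU is either free or held by one workload. The class and method names (`Gpu`, `GpuPool`, `allocate`, `release`) and the GPU type labels are hypothetical and do not come from the patent, which does not disclose an implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gpu:
    gpu_id: str
    gpu_type: str                      # hypothetical type label, e.g. "typeA"
    assigned_to: Optional[str] = None  # workload id, or None if the GPU is free


@dataclass
class GpuPool:
    """Hypothetical shared GPU pool in a disaggregated data center."""
    gpus: list[Gpu] = field(default_factory=list)

    def available(self, gpu_type: str) -> list[Gpu]:
        """Return the free GPUs of the requested type."""
        return [g for g in self.gpus
                if g.gpu_type == gpu_type and g.assigned_to is None]

    def allocate(self, workload_id: str, gpu_type: str, count: int) -> list[Gpu]:
        """Assign `count` free GPUs of `gpu_type` to a workload, all or nothing."""
        free = self.available(gpu_type)
        if len(free) < count:
            raise RuntimeError(f"only {len(free)} free {gpu_type} GPUs, need {count}")
        chosen = free[:count]
        for g in chosen:
            g.assigned_to = workload_id
        return chosen

    def release(self, workload_id: str) -> int:
        """De-provision every GPU held by a workload; return how many were freed."""
        freed = 0
        for g in self.gpus:
            if g.assigned_to == workload_id:
                g.assigned_to = None
                freed += 1
        return freed
```

For example, a pool of four "typeA" GPUs can serve a workload that needs three; after `allocate("wl-1", "typeA", 3)` only one "typeA" GPU remains free, and `release("wl-1")` returns all three to the pool. The all-or-nothing check mirrors the abstract's point that GPUs are assigned to a workload as a unit and returned once the workload no longer needs them.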

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for processing a workload in a compute environment having a pool of graphics processing units (GPUs), comprising: receiving a request to process the workload; responsive to receipt of the request, determining a GPU configuration anticipated to be required to process the workload, the GPU configuration comprising a set of GPU requirements including a number of GPUs and a type of GPU; based on the set of GPU requirements, selecting GPUs from the pool that are available and assigning the selected GPUs to process the workload; and as the workload is being processed by the GPUs assigned, dynamically adjusting the GPU configuration as determined by monitored resource consumption of the workload.

2. The method as described in claim 1 wherein the GPU configuration is determined at least in part by determining whether a profile of the workload matches a profile associated with another workload that has been processed in the compute environment.

3. The method as described in claim 1 wherein the GPU configuration is determined at least in part by executing a test GPU configuration.

4. The method as described in claim 1 wherein dynamically adjusting the GPU configuration comprises: monitoring resource consumption associated with the GPUs assigned to process the workload; and based at least in part on the monitored resource consumption, modifying the number of assigned GPUs.

5. The method as described in claim 1 wherein the set of GPU requirements also include a value representing an extent to which the workload is suitable for processing on the GPUs.

6. The method as described in claim 1 wherein the GPU requirements are adjusted in accordance with one or more tasks in the workload.

7. The method as described in claim 1 wherein the GPU configuration is dynamically adjusted by provisioning or de-provisioning GPUs based on a current workload requirement.

8. The method as described in claim 1 wherein the compute environment is a disaggregated compute system comprising the GPUs assigned.

9. Apparatus for processing a workload in a compute environment having a pool of graphics processing units (GPUs), comprising: one or more hardware processors; computer memory holding computer program instructions executed by the hardware processors and operative to: receive a request to process the workload; responsive to receipt of the request, determine a GPU configuration anticipated to be required to process the workload, the GPU configuration comprising a set of GPU requirements including a number of GPUs and a type of GPU; based on the set of GPU requirements, select GPUs from the pool that are available and assign the selected available GPUs to process the workload; and as the workload is being processed by the GPUs assigned, dynamically adjust the GPU configuration as determined by monitored resource consumption of the workload.

10. The apparatus as described in claim 9 wherein the GPU configuration is determined at least in part by determining whether a profile of the workload matches a profile associated with another workload that has been processed in the compute environment.

11. The apparatus as described in claim 9 wherein the GPU configuration is determined at least in part by executing a test GPU configuration.

12. The apparatus as described in claim 9 wherein the computer program code to dynamically adjust the GPU configuration comprises computer program code to: monitor resource consumption associated with the GPUs assigned to process the workload; and based at least in part on the monitored resource consumption, modify the number of assigned GPUs.

13. The apparatus as described in claim 9 wherein the set of GPU requirements also include a value representing an extent to which the workload is suitable for processing on the GPUs.

14. The apparatus as described in claim 9 wherein the GPU requirements are adjusted in accordance with one or more tasks in the workload.

15. The apparatus as described in claim 9 wherein the GPU configuration is dynamically adjusted by provisioning or de-provisioning GPUs based on a current workload requirement.

16. The apparatus as described in claim 9 wherein the compute environment is a disaggregated compute system comprising the GPUs assigned.

17. A computer program product in a non-transitory computer readable medium for use in a data processing system for processing a workload in a compute environment having a pool of graphics processing units (GPUs), the computer program product holding computer program instructions executed in the data processing system and operative to: receive a request to process the workload; responsive to receipt of the request, determine a GPU configuration anticipated to be required to process the workload, the GPU configuration comprising a set of GPU requirements including a number of GPUs and a type of GPU; based on the set of GPU requirements, select GPUs from the pool that are available and assign the selected available GPUs to process the workload; and as the workload is being processed by the GPUs assigned, dynamically adjust the GPU configuration as determined by monitored resource consumption of the workload.

18. The computer program product as described in claim 17 wherein the GPU configuration is determined at least in part by determining whether a profile of the workload matches a profile associated with another workload that has been processed in the compute environment.

19. The computer program product as described in claim 17 wherein the GPU configuration is determined at least in part by executing a test GPU configuration.

20. The computer program product as described in claim 17 wherein the computer program code to dynamically adjust the GPU configuration comprises computer program code to: monitor resource consumption associated with the GPUs assigned to process the workload; and based at least in part on the monitored resource consumption, modify the number of assigned GPUs.

21. The computer program product as described in claim 17 wherein the set of GPU requirements also include a value representing an extent to which the workload is suitable for processing on the GPUs.

22. The computer program product as described in claim 17 wherein the GPU requirements are adjusted in accordance with one or more tasks in the workload.

23. The computer program product as described in claim 17 wherein the GPU configuration is dynamically adjusted by provisioning or de-provisioning GPUs based on a current workload requirement.

24. The computer program product as described in claim 17 wherein the compute environment is a disaggregated compute system comprising the GPUs assigned.

25. A data center facility, comprising: a set of server resource pools, the server resource pools comprising at least a graphics processing unit (GPU) resource pool; a GPU sizing component executing in a hardware processor responsive to receipt of a request to process a workload to determine a GPU configuration that includes a number of GPUs and a type of GPU; at least one disaggregated compute system comprising GPUs selected from the GPU resource pool to satisfy the GPU configuration; and a GPU scaling component executing in a hardware processor and responsive to receipt of resource consumption information as the workload is executing to scale-up or scale-down the GPU configuration.

26. The d
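Claims 1 through 4 (and the sizing and scaling components of claim 25) describe a two-phase loop: size a GPU configuration before execution, then adjust it from monitored resource consumption while the workload runs. The Python sketch here is a minimal illustration under stated assumptions, not the patented method: profile matching is reduced to an exact dictionary lookup, a caller-supplied trial configuration stands in for the "test GPU configuration" of claim 3, and the utilization thresholds, one-GPU step, and `max_gpus` cap are hypothetical policy choices. The names `GpuConfig`, `size_workload`, and `rescale` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GpuConfig:
    count: int     # number of GPUs (claim 1)
    gpu_type: str  # type of GPU (claim 1); labels are hypothetical


def size_workload(profile: str,
                  history: Dict[str, GpuConfig],
                  trial_config: GpuConfig) -> GpuConfig:
    """Determine an anticipated GPU configuration for a new workload.

    Reuses the configuration of a previously processed workload whose
    profile matches (cf. claim 2); otherwise falls back to a trial
    configuration (cf. claim 3). Exact-key matching is an assumption.
    """
    return history.get(profile, trial_config)


def rescale(config: GpuConfig, utilization: float,
            low: float = 0.3, high: float = 0.8,
            max_gpus: int = 8) -> GpuConfig:
    """Adjust the number of assigned GPUs from monitored utilization (cf. claim 4).

    `utilization` is the mean busy fraction (0.0 to 1.0) across the
    assigned GPUs. Thresholds and the one-GPU step are hypothetical.
    """
    if utilization > high and config.count < max_gpus:
        return GpuConfig(config.count + 1, config.gpu_type)  # scale up
    if utilization < low and config.count > 1:
        return GpuConfig(config.count - 1, config.gpu_type)  # scale down
    return config  # within the target band: keep the current configuration
```

As a usage example, a workload whose profile matches a stored `GpuConfig(2, "typeA")` starts with two GPUs; feeding `rescale` the utilization samples 0.9, 0.95, 0.5, and 0.1 in turn grows it to three, then four GPUs, holds at four, and shrinks it back to three. Stepping one GPU at a time keeps each adjustment small; a real scheduler would also need to confirm that GPUs are available in the pool before scaling up, which this sketch omits.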

Assignees

IBM

Inventors

Classifications

  • Techniques for rebalancing the load in a distributed system · CPC title

  • G06T1/20 (primary) · Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • General purpose rendering architectures · CPC title

  • G06F9/5072 (primary) · Grid computing · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9916636B2 cover?
It covers disaggregating data center server resources into shared pools, including a GPU pool, and dynamically provisioning GPUs for data analytic workloads: determining the number and type of GPUs a workload needs, auto-scaling them at runtime based on the workload, and de-provisioning them when they are no longer needed.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
The B2 publication date is March 13, 2018. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).