Providing multi-tier query execution options in a serverless query environment
US-2021019320-A1 · Jan 21, 2021 · US
US11113782B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11113782-B2 |
| Application number | US-201916601831-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 15, 2019 |
| Priority date | Oct 15, 2019 |
| Publication date | Sep 7, 2021 |
| Grant date | Sep 7, 2021 |
Various examples are disclosed for dynamic kernel slicing for virtual graphics processing unit (vGPU) sharing in serverless computing systems. A computing device is configured to provide a serverless computing service, receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels based on an optimization function that considers a load on a GPU, determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with a scheduling policy, and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
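The partition-and-schedule flow in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method. The kernel is modeled only as a count of thread blocks. The GPU load is a hypothetical per-step "free capacity" list. `choose_slice_size` is a simple hypothetical heuristic standing in for the optimization function. Round-robin is used as the scheduling policy, which claims 6 and 13 list as one option. All names (`SubKernel`, `slice_kernel`, `round_robin_schedule`) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SubKernel:
    task_id: str
    first_block: int  # index of the first thread block covered by this slice
    num_blocks: int   # number of thread blocks in this slice


def choose_slice_size(candidates: Sequence[int], free_capacity: int) -> int:
    """Hypothetical load-aware rule: the largest candidate that fits the free capacity,
    falling back to the smallest candidate when the GPU is saturated."""
    fitting = [c for c in candidates if c <= free_capacity]
    return max(fitting) if fitting else min(candidates)


def slice_kernel(task_id: str, total_blocks: int, candidates: Sequence[int],
                 free_capacity_per_step: Sequence[int]) -> List[SubKernel]:
    """Partition a kernel's blocks into sub-kernels, choosing a slice size per sub-kernel.

    free_capacity_per_step[i] is an assumed load reading taken before cutting slice i;
    the last reading is reused once the list runs out.
    """
    if not candidates or min(candidates) <= 0 or not free_capacity_per_step:
        raise ValueError("need positive candidate sizes and at least one load reading")
    sub_kernels: List[SubKernel] = []
    start, step = 0, 0
    while start < total_blocks:
        free = free_capacity_per_step[min(step, len(free_capacity_per_step) - 1)]
        # The final slice is clipped to the blocks that remain.
        size = min(choose_slice_size(candidates, free), total_blocks - start)
        sub_kernels.append(SubKernel(task_id, start, size))
        start += size
        step += 1
    return sub_kernels


def round_robin_schedule(sub_kernels: Sequence[SubKernel],
                         num_vgpus: int) -> Dict[int, List[SubKernel]]:
    """Assign sub-kernels to vGPUs in turn (round-robin policy)."""
    if num_vgpus <= 0:
        raise ValueError("num_vgpus must be positive")
    schedule: Dict[int, List[SubKernel]] = {v: [] for v in range(num_vgpus)}
    for i, sk in enumerate(sub_kernels):
        schedule[i % num_vgpus].append(sk)
    return schedule


# Usage: a 10-block kernel while the GPU's free capacity drops from 8 to 4 blocks.
parts = slice_kernel("task-1", 10, candidates=[2, 4, 8], free_capacity_per_step=[8, 4])
plan = round_robin_schedule(parts, num_vgpus=2)
for vgpu, assigned in plan.items():
    print(vgpu, [(sk.first_block, sk.num_blocks) for sk in assigned])
# prints:
# 0 [(0, 8)]
# 1 [(8, 2)]
```

Because the slice size is chosen per sub-kernel, a drop in free capacity between cuts yields unequal slices. Here the first slice takes 8 blocks and the remaining 2 blocks form the second.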
Therefore, the following is claimed:
1. A system for dynamic kernel slicing in a serverless computing service, comprising: at least one computing device; program instructions stored in memory and executable in the at least one computing device that, when executed by the at least one computing device, direct the at least one computing device to: receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
2. The system of claim 1, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
3. The system of claim 2, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
4. The system of claim 3, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule and the slice size are determined using the optimization function in accordance with the scheduling policy.
5. The system of claim 4, wherein the optimization function comprises an integer non-linear program (INLP).
6. The system of claim 5, wherein the scheduling policy is further determined using a round-robin routine or a priority-based routine.
7. The system of claim 1, wherein the at least one computing device is further directed to assign the individual ones of the sub-kernels to a corresponding one of a plurality of containers of at least one virtual machine.
8. A method for dynamic kernel slicing in a serverless computing service, comprising: receiving a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determining a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determining an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and executing the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
9. The method of claim 8, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
10. The method of claim 9, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
11. The method of claim 10, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule and the slice size are determined using the optimization function in accordance with the scheduling policy.
12. The method of claim 11, wherein the optimization function comprises an integer non-linear program (INLP).
13. The method of claim 12, wherein the scheduling policy is further determined using a round-robin routine or a priority-based routine.
14. The method of claim 8, further comprising assigning the individual ones of the sub-kernels to a corresponding one of a plurality of containers of at least one virtual machine.
15. A non-transitory computer-readable medium comprising program instructions for dynamic kernel slicing in a serverless computing service that, when executed by at least one computing device, direct the at least one computing device to: receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
16. The non-transitory computer-readable medium of claim 15, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
17. The non-transitory computer-readable medium of claim 16, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
18. The non-transitory computer-readable medium of claim 17, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule
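Claims 4 and 5 describe choosing slice sizes and a schedule that maximize the number of tasks finishing by their deadlines, using an integer non-linear program. The sketch below swaps in exhaustive search over each task's candidate slice sizes. It explores the same discrete decision space but only scales to tiny instances. The following are assumptions for illustration:
- The linear cost model (`launch_overhead + blocks * block_time`).
- The single-GPU, round-based simulator.
- The two policies, loosely echoing the round-robin and priority-based routines of claims 6 and 13.
- All names.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    arrival: float                  # arrival time of the task
    deadline: float                 # execution deadline
    num_blocks: int                 # number of blocks in the task's kernel
    slice_options: Tuple[int, ...]  # set of possible slice sizes


def simulate(tasks: Sequence[Task], slice_sizes: Dict[str, int], block_time: float,
             launch_overhead: float, policy: str = "round_robin") -> Dict[str, float]:
    """Run tasks on one GPU in rounds; each ready task launches one sub-kernel per round.

    policy="round_robin" keeps arrival order within a round; policy="priority" orders
    each round by earliest deadline. Returns each task's completion time.
    """
    if any(t.num_blocks <= 0 for t in tasks):
        raise ValueError("every task needs at least one block")
    remaining = {t.name: t.num_blocks for t in tasks}
    finish: Dict[str, float] = {}
    clock = 0.0
    by_arrival = sorted(tasks, key=lambda t: t.arrival)
    while len(finish) < len(tasks):
        ready = [t for t in by_arrival if t.arrival <= clock and remaining[t.name] > 0]
        if not ready:  # GPU idle: jump to the next arrival
            clock = min(t.arrival for t in by_arrival if remaining[t.name] > 0)
            continue
        if policy == "priority":
            ready.sort(key=lambda t: t.deadline)
        for t in ready:
            size = min(slice_sizes[t.name], remaining[t.name])
            clock += launch_overhead + size * block_time  # assumed linear cost model
            remaining[t.name] -= size
            if remaining[t.name] == 0:
                finish[t.name] = clock
    return finish


def best_slicing(tasks: Sequence[Task], block_time: float = 1.0,
                 launch_overhead: float = 0.5,
                 policy: str = "round_robin") -> Tuple[Optional[Dict[str, int]], int]:
    """Exhaustive search (stand-in for the INLP): maximize tasks meeting their deadlines."""
    best_sizes: Optional[Dict[str, int]] = None
    best_met = -1
    for combo in itertools.product(*(t.slice_options for t in tasks)):
        sizes = {t.name: s for t, s in zip(tasks, combo)}
        finish = simulate(tasks, sizes, block_time, launch_overhead, policy)
        met = sum(finish[t.name] <= t.deadline for t in tasks)
        if met > best_met:
            best_sizes, best_met = sizes, met
    return best_sizes, best_met


tasks = [
    Task("A", arrival=0.0, deadline=100.0, num_blocks=8, slice_options=(2, 8)),
    Task("B", arrival=0.0, deadline=6.0, num_blocks=2, slice_options=(2,)),
]
print(best_slicing(tasks))  # → ({'A': 2, 'B': 2}, 2)
```

With round-robin rounds, cutting the long task `A` into 2-block sub-kernels lets `B` finish at 5.0, inside its deadline of 6.0. Running `A` unsliced pushes `B` to 11.0. Under `policy="priority"`, `B` goes first in every round and meets its deadline either way, which illustrates why slice size and scheduling policy are chosen together.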
Offload · CPC title
Task decomposition · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
considering the load · CPC title
Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455) · CPC title