Dynamic kernel slicing for VGPU sharing in serverless computing systems

US11113782B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11113782-B2
Application number	US-201916601831-A
Country	US
Kind code	B2
Filing date	Oct 15, 2019
Priority date	Oct 15, 2019
Publication date	Sep 7, 2021
Grant date	Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various examples are disclosed for dynamic kernel slicing for virtual graphics processing unit (vGPU) sharing in serverless computing systems. A computing device is configured to provide a serverless computing service, receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels based on an optimization function that considers a load on a GPU, determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with a scheduling policy, and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
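As an illustration of the slicing step described in the abstract, the sketch below partitions a kernel's grid of thread blocks into contiguous sub-kernels. It is a minimal sketch and not the patented method: the names `SubKernel`, `choose_slice_size`, and `slice_kernel` are hypothetical, and the load-based heuristic is a simple stand-in for the optimization function, which the claims describe as an integer non-linear program.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SubKernel:
    first_block: int  # index of the first thread block in this slice
    num_blocks: int   # number of thread blocks in this slice


def choose_slice_size(candidates: Sequence[int], gpu_load: float) -> int:
    """Hypothetical stand-in for the optimization function.

    Picks a smaller slice when the GPU is busy and a larger one when it is
    idle. gpu_load is assumed to be a fraction in [0, 1].
    """
    if not candidates:
        raise ValueError("need at least one candidate slice size")
    if not 0.0 <= gpu_load <= 1.0:
        raise ValueError("gpu_load must be in [0, 1]")
    ordered = sorted(candidates)
    # Map load 0.0 to the largest candidate and load 1.0 to the smallest.
    index = round((1.0 - gpu_load) * (len(ordered) - 1))
    return ordered[index]


def slice_kernel(total_blocks: int, slice_size: int) -> List[SubKernel]:
    """Partition total_blocks thread blocks into contiguous sub-kernels."""
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        SubKernel(start, min(slice_size, total_blocks - start))
        for start in range(0, total_blocks, slice_size)
    ]


if __name__ == "__main__":
    size = choose_slice_size([16, 32, 64], gpu_load=0.5)
    for sub in slice_kernel(100, size):
        print(sub)
```

Every block lands in exactly one sub-kernel, and the last slice may be smaller than the others when the block count is not a multiple of the slice size.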

First claim

Opening claim text (preview).

Therefore, the following is claimed:
1. A system for dynamic kernel slicing in a serverless computing service, comprising: at least one computing device; program instructions stored in memory and executable in the at least one computing device that, when executed by the at least one computing device, direct the at least one computing device to: receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
2. The system of claim 1, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
3. The system of claim 2, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
4. The system of claim 3, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule and the slice size are determined using the optimization function in accordance with the scheduling policy.
5. The system of claim 4, wherein the optimization function comprises an integer non-linear program (INLP).
6. The system of claim 5, wherein the scheduling policy is further determined using a round-robin routine or a priority-based routine.
7. The system of claim 1, wherein the at least one computing device is further directed to assign the individual ones of the sub-kernels to a corresponding one of a plurality of containers of at least one virtual machine.
8. A method for dynamic kernel slicing in a serverless computing service, comprising: receiving a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determining a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determining an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and executing the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
9. The method of claim 8, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
10. The method of claim 9, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
11. The method of claim 10, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule and the slice size are determined using the optimization function in accordance with the scheduling policy.
12. The method of claim 11, wherein the optimization function comprises an integer non-linear program (INLP).
13. The method of claim 12, wherein the scheduling policy is further determined using a round-robin routine or a priority-based routine.
14. The method of claim 8, further comprising assigning the individual ones of the sub-kernels to a corresponding one of a plurality of containers of at least one virtual machine.
15. A non-transitory computer-readable medium comprising program instructions for dynamic kernel slicing in a serverless computing service that, when executed by at least one computing device, direct the at least one computing device to: receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of the program code on an underlying physical graphics processing unit (GPU); in response to receipt of the request for execution of the program code, determine a slice size to partition a compute kernel of the program code into a plurality of sub-kernels for concurrent execution by the vGPUs, the slice size being determined for individual ones of the sub-kernels using an optimization function in accordance with a scheduling policy; determine an execution schedule for executing the individual ones of the sub-kernels on the vGPUs in accordance with the scheduling policy; and execute the sub-kernels on the vGPUs as partitioned in accordance with the execution schedule.
16. The non-transitory computer-readable medium of claim 15, wherein: the request for execution of the program code is associated with an execution deadline; and the individual ones of the sub-kernels are scheduled in accordance with the scheduling policy based at least in part on the execution deadline.
17. The non-transitory computer-readable medium of claim 16, wherein the execution schedule and the slice size are determined as a function of at least two of: the execution deadline, a number of blocks in a task of the compute kernel, an arrival time of the task, a start time of a workload switch on the GPU, an end time of the workload switch on the GPU, a load at the time of the workload switch on the GPU, a total number of load switches on the GPU, a set of possible slice sizes for the task, a total number of workloads, an expected execution time of one of the sub-kernels, an elapsed time, and a total number of tasks to be scheduled.
18. The non-transitory computer-readable medium of claim 17, wherein: the scheduling policy maximizes a number of tasks completed with the execution deadline; and the execution schedule
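
The claims name a round-robin routine and an execution deadline without giving an algorithm on this page. The sketch below shows one possible reading, under loudly stated assumptions: sub-kernels are dealt to vGPUs in turn, the vGPUs drain their queues concurrently, and every sub-kernel takes the same expected execution time. The function names `round_robin_schedule` and `meets_deadline` are hypothetical and do not come from the patent.

```python
from typing import Dict, List


def round_robin_schedule(num_sub_kernels: int, num_vgpus: int) -> Dict[int, List[int]]:
    """Assign sub-kernel indices to vGPU queues in round-robin order."""
    if num_sub_kernels < 0:
        raise ValueError("num_sub_kernels must be non-negative")
    if num_vgpus <= 0:
        raise ValueError("num_vgpus must be positive")
    schedule: Dict[int, List[int]] = {v: [] for v in range(num_vgpus)}
    for k in range(num_sub_kernels):
        schedule[k % num_vgpus].append(k)
    return schedule


def meets_deadline(
    schedule: Dict[int, List[int]],
    expected_time_per_sub_kernel: float,
    elapsed_time: float,
    deadline: float,
) -> bool:
    """Estimate whether the schedule finishes by the deadline.

    Assumption: vGPUs run their queues concurrently and each sub-kernel
    takes expected_time_per_sub_kernel, so the longest queue sets the
    finish time.
    """
    longest_queue = max((len(q) for q in schedule.values()), default=0)
    finish_time = elapsed_time + longest_queue * expected_time_per_sub_kernel
    return finish_time <= deadline


if __name__ == "__main__":
    plan = round_robin_schedule(num_sub_kernels=5, num_vgpus=2)
    print(plan)
    print(meets_deadline(plan, 2.0, elapsed_time=1.0, deadline=7.0))
```

A priority-based routine, the alternative named in the same claims, would replace the modulo assignment with an ordering over tasks; that variant is not shown here.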

Assignees

VMware Inc

Inventors

Classifications

G06F2209/509
Offload · CPC title
G06F2209/5017
Task decomposition · CPC title
G06F9/5066
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
G06F9/505
considering the load · CPC title
G06F9/5077
Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455) · CPC title

Patent family

Related publications grouped by family.

View patent family 75383194

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113782B2 cover? Various examples are disclosed for dynamic kernel slicing for virtual graphics processing unit (vGPU) sharing in serverless computing systems. A computing device is configured to provide a serverless computing service, receive a request for execution of program code in the serverless computing service in which a plurality of virtual graphics processing units (vGPUs) are used in the execution of…
Who is the assignee on this patent? VMware Inc
What technology area does this patent fall under? Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published? Publication date Sep 7, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Providing multi-tier query execution options in a serverless query environment

GPU resource usage display and dynamic GPU resource allocation in a networked virtualization system

Workload placement and resource allocation for media production data center

Reconfigurable interconnect

Device and method for performing scheduling for virtualized graphics processing units

Dynamic task scheduling method for dispatching sub-tasks to computing devices of heterogeneous computing system and related computer readable medium

Container virtual machines for hadoop
