Method, system, and computer program product for dynamically scheduling machine learning inference jobs with different quality of services on a shared infrastructure

US11836642B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11836642-B2
Application number: US-202218088193-A
Country: US
Kind code: B2
Filing date: Dec 23, 2022
Priority date: Jan 17, 2020
Publication date: Dec 5, 2023
Grant date: Dec 5, 2023

Abstract

Official abstract text for this publication.

A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a request for system resources for an inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; assign the system resource to the inference job for processing the inference job; receive result data associated with processing of the inference job with the system resource; and update based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.
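
The loop the abstract describes (look up profiles, pick a resource that meets the quality of service requirement, then fold result data back into the profile) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class names, the `fraud-model` label, the resource names, the "slowest resource that still qualifies" tie-break, and the exponential moving average update are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerformanceProfile:
    """Observed behaviour of one model on one system resource (hypothetical schema)."""
    resource: str           # e.g. "cpu-0" or "gpu-0" (made-up names)
    model: str
    latency_ms: float       # typical latency of this model on this resource
    throughput_qps: float   # typical throughput of this model on this resource
    available: bool = True  # whether the resource can take an inference job now


@dataclass
class QoSRequirement:
    max_latency_ms: Optional[float] = None
    min_throughput_qps: Optional[float] = None


def select_resource(profiles, model, qos):
    """Return the name of an available resource whose profile meets the QoS, or None.

    Tie-break (an assumption): prefer the slowest resource that still qualifies,
    which leaves faster resources free for stricter jobs.
    """
    candidates = [
        p for p in profiles
        if p.model == model and p.available
        and (qos.max_latency_ms is None or p.latency_ms <= qos.max_latency_ms)
        and (qos.min_throughput_qps is None or p.throughput_qps >= qos.min_throughput_qps)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.latency_ms).resource


def update_profile(profile, observed_latency_ms, observed_throughput_qps, alpha=0.2):
    """Blend result data into the profile with an exponential moving average."""
    profile.latency_ms = (1 - alpha) * profile.latency_ms + alpha * observed_latency_ms
    profile.throughput_qps = (
        (1 - alpha) * profile.throughput_qps + alpha * observed_throughput_qps
    )


profiles = [
    PerformanceProfile("cpu-0", "fraud-model", latency_ms=40.0, throughput_qps=50.0),
    PerformanceProfile("gpu-0", "fraud-model", latency_ms=5.0, throughput_qps=800.0),
]

relaxed = select_resource(profiles, "fraud-model", QoSRequirement(max_latency_ms=100.0))
strict = select_resource(profiles, "fraud-model", QoSRequirement(max_latency_ms=10.0))
print(relaxed, strict)

# Result data from a finished job nudges the CPU profile toward what was observed.
update_profile(profiles[0], observed_latency_ms=60.0, observed_throughput_qps=40.0)
```

In this sketch a job with a loose latency bound lands on the CPU while a tight bound forces the GPU, and each completed job shifts the stored profile, so later decisions reflect recent measurements rather than a one-time benchmark.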

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving or determining, with at least one processor, a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model, wherein each performance profile for each system resource includes a latency associated with the machine learning model for that system resource, a throughput associated with the machine learning model for that system resource, and an availability of that system resource for processing an inference job associated with the machine learning model, and wherein the plurality of system resources includes at least one central processing unit (CPU) and at least one graphics processing unit (GPU); receiving, with at least one processor, a request for system resources for the inference job associated with the machine learning model; determining, with at least one processor, a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; and assigning, with at least one processor, the system resource to the inference job for processing the inference job, wherein the system resource assigned to the inference job executes the machine learning model associated with the inference job to process the inference job.

2. The computer-implemented method of claim 1, further comprising: receiving, with at least one processor, result data associated with processing of the inference job with the system resource; and updating, with at least one processor, based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.

3. The computer-implemented method of claim 2, wherein the result data includes at least one of the following: a latency associated with processing the inference job of the machine learning model with the system resource, a throughput associated with processing the inference job of the machine learning model with the system resource, or any combination thereof.

4. The computer-implemented method of claim 1, wherein the quality of service requirement includes at least one of a latency requirement for the inference job associated with the machine learning model and a throughput requirement for the inference job associated with the machine learning model.

5. The computer-implemented method of claim 1, wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model includes assigning the inference job to one of a plurality of job queues based on the quality of service requirement associated with the inference job, wherein the plurality of job queues are associated with a plurality of different priorities, and wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model is further based on a priority of a job queue to which the inference job is assigned.

6. The computer-implemented method of claim 2, further comprising: receiving or determining, with at least one processor, a plurality of further performance profiles associated with the plurality of system resources, wherein each further performance profile is associated with a further machine learning model different than the machine learning model; receiving, with at least one processor, a further request for system resources for a further inference job associated with the further machine learning model; determining, with at least one processor, a further system resource of the plurality of system resources for processing the further inference job associated with the further machine learning model based on the plurality of further performance profiles and a further quality of service requirement associated with the further inference job; assigning, with at least one processor, the further system resource to the inference job for processing the inference job; receiving, with at least one processor, further result data associated with processing of the further inference job with the further system resource; and updating, with at least one processor, based on the further result data, a further performance profile of the plurality of the performance profiles associated with the system resource and the further machine learning model.

7. The computer-implemented method of claim 6, wherein the system resource and the further system resource include a same system resource.

8. A computing system, comprising: one or more processors programmed and/or configured to: receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model, wherein each performance profile for each system resource includes a latency associated with the machine learning model for that system resource, a throughput associated with the machine learning model for that system resource, and an availability of that system resource for processing an inference job associated with the machine learning model, and wherein the plurality of system resources includes at least one central processing unit (CPU) and at least one graphics processing unit (GPU); receive a request for system resources for the inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; and assign the system resource to the inference job for processing the inference job, wherein the system resource assigned to the inference job executes the machine learning model associated with the inference job to process the inference job.

9. The computing system of claim 8, wherein the one or more processors are further programmed and/or configured to: receive result data associated with processing of the inference job with the system resource; and update, based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.

10. The computing system of claim 9, wherein the result data includes at least one of the following: a latency associated with processing the inference job of the machine learning model with the system resource, a throughput associated with processing the inference job of the machine learning model with the system resource, or any combination thereof.

11. The computing system of claim 8, wherein the quality of service requirement includes at least one of a latency requirement for the inference job associated with the machine learning model and a throughput requirement for the inference job associated with the machine learning model.

12. The computing system of claim 8, wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model includes assigning the inference job to one of a plurality of job queues based on the quality of service requirement associated with the inference job, wherein the plurality of job queues are associated with a plurality of different priorities, and wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine
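
Claim 5 describes placing each inference job into one of several job queues with different priorities, chosen from the job's quality of service requirement. A minimal sketch of that idea is shown here, assuming three priority levels, a 20 ms threshold, strict highest-priority-first service, and made-up job names; none of these specifics come from the patent text.

```python
from collections import deque

# Hypothetical queue priorities: a lower number is served first.
HIGH, NORMAL, LOW = 0, 1, 2


def queue_for(max_latency_ms):
    """Map a job's latency requirement to a queue priority (thresholds are assumptions)."""
    if max_latency_ms is None:
        return LOW
    if max_latency_ms <= 20:
        return HIGH
    return NORMAL


class JobQueues:
    """One FIFO queue per priority level; higher-priority queues are drained first."""

    def __init__(self):
        self._queues = {HIGH: deque(), NORMAL: deque(), LOW: deque()}

    def submit(self, job_id, max_latency_ms=None):
        """Place the job in the queue matching its QoS and return that priority."""
        priority = queue_for(max_latency_ms)
        self._queues[priority].append(job_id)
        return priority

    def next_job(self):
        """Return the job the scheduler should place next, or None if all queues are empty."""
        for priority in sorted(self._queues):
            if self._queues[priority]:
                return self._queues[priority].popleft()
        return None


queues = JobQueues()
queues.submit("batch-report")           # no latency requirement -> LOW
queues.submit("checkout-score", 10)     # tight latency -> HIGH
queues.submit("recommendation", 200)    # loose latency -> NORMAL
queues.submit("login-score", 15)        # tight latency -> HIGH

order = [queues.next_job() for _ in range(4)]
print(order)
```

Strict priority service like this can starve low-priority jobs under sustained load; a real scheduler would likely add aging or per-queue quotas, which this sketch leaves out.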

Assignees

  • Visa Int Service Ass

Inventors

Classifications

  • G06N5/04 (Primary)

    Inference or reasoning models · CPC title

  • the resources being hardware resources other than CPUs, Servers and Terminals · CPC title

  • Machine learning · CPC title

  • G06F9/5027 (Primary)

    the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

  • Performance criteria · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11836642B2 cover?
It covers a method, system, and computer program product for dynamically scheduling machine learning inference jobs on shared system resources. Per the abstract, the scheduler keeps a performance profile for each system resource and machine learning model, selects a resource for each inference job based on those profiles and the job's quality of service requirement, assigns that resource to the job, and updates the profile from the result data.
Who is the assignee on this patent?
Visa Int Service Ass
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 5, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).