Dynamically controlled distributed workload execution
US-10127080-B2 · Nov 13, 2018 · US
US11836642B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11836642-B2 |
| Application number | US-202218088193-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2022 |
| Priority date | Jan 17, 2020 |
| Publication date | Dec 5, 2023 |
| Grant date | Dec 5, 2023 |
Official abstract text for this publication.
A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a request for system resources for an inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; assign the system resource to the inference job for processing the inference job; receive result data associated with processing of the inference job with the system resource; and update based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.
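The loop described in the abstract (look up per-resource performance profiles for a model, pick a resource that satisfies the job's quality of service requirement, then fold the measured result back into the profile) can be sketched as follows. This is a minimal illustration and not the patented implementation: the names `PerformanceProfile`, `QoSRequirement`, `select_resource`, and `update_profile` are hypothetical, the "lowest latency among qualifying resources" rule is an assumed tie-break, and the exponential moving average with `alpha` is an assumed update rule that the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class PerformanceProfile:
    """Profile of one (system resource, model) pair; field names are illustrative."""
    latency_ms: float       # latency of the model on this resource
    throughput_qps: float   # throughput of the model on this resource
    available: bool         # whether the resource can accept an inference job


@dataclass
class QoSRequirement:
    """Quality of service requirement of an inference job (assumed shape)."""
    max_latency_ms: float
    min_throughput_qps: float


def select_resource(profiles, model, qos):
    """Return the name of a resource meeting the QoS requirement, or None.

    `profiles` maps (resource_name, model_name) -> PerformanceProfile.
    Assumption: among qualifying resources, the lowest-latency one wins.
    """
    candidates = [
        (resource, profile)
        for (resource, profiled_model), profile in profiles.items()
        if profiled_model == model
        and profile.available
        and profile.latency_ms <= qos.max_latency_ms
        and profile.throughput_qps >= qos.min_throughput_qps
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda item: item[1].latency_ms)[0]


def update_profile(profiles, resource, model, latency_ms, throughput_qps, alpha=0.5):
    """Blend measured result data into the stored profile in place.

    Assumption: an exponential moving average with weight `alpha` on the
    new measurement; the source only says the profile is updated.
    """
    profile = profiles[(resource, model)]
    profile.latency_ms = alpha * latency_ms + (1 - alpha) * profile.latency_ms
    profile.throughput_qps = (
        alpha * throughput_qps + (1 - alpha) * profile.throughput_qps
    )
    return profile


# Example with one CPU and one GPU, as the claims require at least one of each.
profiles = {
    ("cpu0", "model_a"): PerformanceProfile(40.0, 100.0, True),
    ("gpu0", "model_a"): PerformanceProfile(5.0, 900.0, True),
}
chosen = select_resource(profiles, "model_a", QoSRequirement(10.0, 500.0))
```

Keying the profile table by the (resource, model) pair mirrors the claim language that each profile is associated with both a system resource and a machine learning model, so the same resource can carry different profiles for different models.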
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: receiving or determining, with at least one processor, a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model, wherein each performance profile for each system resource includes a latency associated with the machine learning model for that system resource, a throughput associated with the machine learning model for that system resource, and an availability of that system resource for processing an inference job associated with the machine learning model, and wherein the plurality of system resources includes at least one central processing unit (CPU) and at least one graphics processing unit (GPU); receiving, with at least one processor, a request for system resources for the inference job associated with the machine learning model; determining, with at least one processor, a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; and assigning, with at least one processor, the system resource to the inference job for processing the inference job, wherein the system resource assigned to the inference job executes the machine learning model associated with the inference job to process the inference job.
2. The computer-implemented method of claim 1, further comprising: receiving, with at least one processor, result data associated with processing of the inference job with the system resource; and updating, with at least one processor, based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.
3. The computer-implemented method of claim 2, wherein the result data includes at least one of the following: a latency associated with processing the inference job of the machine learning model with the system resource, a throughput associated with processing the inference job of the machine learning model with the system resource, or any combination thereof.
4. The computer-implemented method of claim 1, wherein the quality of service requirement includes at least one of a latency requirement for the inference job associated with the machine learning model and a throughput requirement for the inference job associated with the machine learning model.
5. The computer-implemented method of claim 1, wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model includes assigning the inference job to one of a plurality of job queues based on the quality of service requirement associated with the inference job, wherein the plurality of job queues are associated with a plurality of different priorities, and wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model is further based on a priority of a job queue to which the inference job is assigned.
6. The computer-implemented method of claim 2, further comprising: receiving or determining, with at least one processor, a plurality of further performance profiles associated with the plurality of system resources, wherein each further performance profile is associated with a further machine learning model different than the machine learning model; receiving, with at least one processor, a further request for system resources for a further inference job associated with the further machine learning model; determining, with at least one processor, a further system resource of the plurality of system resources for processing the further inference job associated with the further machine learning model based on the plurality of further performance profiles and a further quality of service requirement associated with the further inference job; assigning, with at least one processor, the further system resource to the inference job for processing the inference job; receiving, with at least one processor, further result data associated with processing of the further inference job with the further system resource; and updating, with at least one processor, based on the further result data, a further performance profile of the plurality of the performance profiles associated with the system resource and the further machine learning model.
7. The computer-implemented method of claim 6, wherein the system resource and the further system resource include a same system resource.
8. A computing system, comprising: one or more processors programmed and/or configured to: receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model, wherein each performance profile for each system resource includes a latency associated with the machine learning model for that system resource, a throughput associated with the machine learning model for that system resource, and an availability of that system resource for processing an inference job associated with the machine learning model, and wherein the plurality of system resources includes at least one central processing unit (CPU) and at least one graphics processing unit (GPU); receive a request for system resources for the inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; and assign the system resource to the inference job for processing the inference job, wherein the system resource assigned to the inference job executes the machine learning model associated with the inference job to process the inference job.
9. The computing system of claim 8, wherein the one or more processors are further programmed and/or configured to: receive result data associated with processing of the inference job with the system resource; and update, based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.
10. The computing system of claim 9, wherein the result data includes at least one of the following: a latency associated with processing the inference job of the machine learning model with the system resource, a throughput associated with processing the inference job of the machine learning model with the system resource, or any combination thereof.
11. The computing system of claim 8, wherein the quality of service requirement includes at least one of a latency requirement for the inference job associated with the machine learning model and a throughput requirement for the inference job associated with the machine learning model.
12. The computing system of claim 8, wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine learning model includes assigning the inference job to one of a plurality of job queues based on the quality of service requirement associated with the inference job, wherein the plurality of job queues are associated with a plurality of different priorities, and wherein determining the system resource of the plurality of system resources for processing the inference job associated with the machine
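Claims 5 and 12 describe placing each inference job into one of several job queues according to its quality of service requirement, with the queues carrying different priorities that then influence resource selection. A minimal sketch of that idea follows, under stated assumptions: the latency thresholds, the three priority levels, and the names `queue_priority` and `JobQueues` are all hypothetical, and the claims do not say that queues are drained strictly in priority order as done here.

```python
from collections import deque

# Hypothetical mapping from a job's latency requirement to a queue priority.
# Priority 0 is served first. The thresholds are illustrative, not from the source.
QUEUE_LATENCY_LIMITS_MS = [(0, 10.0), (1, 100.0)]
LOWEST_PRIORITY = 2


def queue_priority(max_latency_ms):
    """Map a latency requirement to the priority of the queue that should hold the job."""
    for priority, limit_ms in QUEUE_LATENCY_LIMITS_MS:
        if max_latency_ms <= limit_ms:
            return priority
    return LOWEST_PRIORITY


class JobQueues:
    """A set of FIFO queues, one per priority level."""

    def __init__(self):
        self.queues = {p: deque() for p in range(LOWEST_PRIORITY + 1)}

    def submit(self, job_id, max_latency_ms):
        """Enqueue a job based on its QoS requirement; return the chosen priority."""
        priority = queue_priority(max_latency_ms)
        self.queues[priority].append(job_id)
        return priority

    def next_job(self):
        """Pop the oldest job from the highest-priority non-empty queue.

        Returns (priority, job_id), or None when every queue is empty.
        Assumption: strict priority order, which can starve low-priority jobs.
        """
        for priority in sorted(self.queues):
            if self.queues[priority]:
                return priority, self.queues[priority].popleft()
        return None


queues = JobQueues()
queues.submit("batch_scoring", max_latency_ms=5000.0)
queues.submit("realtime_auth", max_latency_ms=8.0)
queues.submit("interactive_report", max_latency_ms=50.0)
first = queues.next_job()
```

The priority returned by `next_job` could be passed to a resource selector so that jobs from higher-priority queues get first choice of the faster resources; how the priority enters that decision is left open in the claim text.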
Inference or reasoning models · CPC title
the resources being hardware resources other than CPUs, Servers and Terminals · CPC title
Machine learning · CPC title
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title
Performance criteria · CPC title