Scheduling homogeneous and heterogeneous workloads with runtime elasticity in a parallel processing environment

US9645848B2 · US · B2

Patent metadata
Publication number: US-9645848-B2
Application number: US-201313897796-A
Country: US
Kind code: B2
Filing date: May 20, 2013
Priority date: May 20, 2013
Publication date: May 9, 2017
Grant date: May 9, 2017

Abstract

Official abstract text for this publication.

Systems and methods are provided for scheduling homogeneous workloads including batch jobs, and heterogeneous workloads including batch and dedicated jobs, with run-time elasticity wherein resource requirements for a given job can change during run-time execution of the job.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for scheduling jobs in a HPC (high-performance computing) system, comprising:
  maintaining a batch jobs queue comprising batch jobs, wherein each batch job in the batch jobs queue has a plurality of parameters associated therewith, the parameters comprising a first parameter that denotes a number of processors of the HPC system that are required to execute the batch job, a second parameter that denotes a user-estimated execution time of the batch job, and a third parameter that specifies a number of scheduling cycles that the batch job was skipped and not scheduled;
  initiating a scheduling cycle in response to a triggering event; and
  performing a scheduling process as part of the scheduling cycle to schedule one or more batch jobs in the batch jobs queue for execution by the HPC system, wherein performing the scheduling process comprises:
    determining a number m of available processors in the HPC system;
    determining the first parameter and the third parameter of a head batch job in the batch jobs queue;
    determining if the first parameter of the head batch job is less than or equal to m;
    determining if the third parameter of the head batch job is greater than or equal to a threshold value; and
    when the first parameter of the head batch job is determined to be less than or equal to m and the third parameter of the head batch job is determined to be greater than or equal to the threshold value, then:
      removing the head batch job from the batch jobs queue; and
      scheduling the head batch job for execution in the HPC system; and
    when the first parameter of the head batch job is determined to be not less than or equal to m, then:
      making a reservation time for executing the head batch job at a future time based on a fourth parameter of each active job being executed in the HPC system, wherein the fourth parameter of a given active job denotes a remaining execution time of the given active job; and
      selecting a set of one or more batch jobs in the batch jobs queue which can be scheduled for execution before the reservation time of the head batch job;
  wherein making the reservation time for executing the head batch job comprises:
    accessing a list of active jobs in which all active jobs executing in the HPC system are sorted starting from an active job with a smallest fourth parameter to an active job with a largest fourth parameter;
    utilizing the list of active jobs to determine a set of active jobs, starting from the active job with the smallest fourth parameter, which will result in a sufficient amount of available processors for the head batch job when execution of each active job in the set of active jobs is finished;
    computing a first value by adding a fourth parameter of an active job in the set of active jobs which has a greatest fourth parameter to a current time;
    computing a second value as a sum of m plus a total of each first parameter of each active job in the set of active jobs, less the first parameter for the head batch job;
    for each batch job in the batch jobs queue with a first parameter that is less than or equal to m, computing a third value which represents a number of processors of the HPC system that are required by the batch job at the computed first value; and
    making a reservation time for executing the head batch job based on the computed second value and the computed third value of each batch job.

2. The method of claim 1, wherein performing the scheduling process further comprises:
  when the first parameter of the head batch job is determined to be less than or equal to m and the third parameter of the head batch job is determined to be not greater than or equal to the threshold value, then:
    selecting, based on the first parameter of each batch job in the batch jobs queue, a set of one or more batch jobs in the batch jobs queue which can be scheduled for execution to maximize a number of processors of the HPC system which are used for processing the batch jobs; and
    increasing the third parameter of the head batch job by one, when the head batch job is not in the selected set of one or more batch jobs.

3. The method of claim 2, wherein the third parameter of the head batch job is not increased by one when the head batch job is included in the selected set of batch jobs.

4. The method of claim 1, wherein the third value of a given batch job is set equal to 0 when a current time plus the second parameter of the given batch job is less than the first value, otherwise the third value of a given batch job is set equal to the first parameter of the given batch job.

5. The method of claim 1, wherein the triggering event comprises an arrival of a new batch job in the batch jobs queue or termination of an executing batch job in the HPC system.

6. The method of claim 1, wherein the triggering event comprises an arrival of a command that triggers a change in a second parameter of a batch job that is pending in the batch jobs queue or an active batch job that is executing in the HPC system.

7. A method for scheduling jobs in a HPC (high-performance computing) system, comprising:
  maintaining a batch jobs queue comprising batch jobs, wherein each batch job in the batch jobs queue has a plurality of parameters associated therewith, the parameters comprising a first parameter that denotes a number of processors of the HPC system that are required to execute the batch job, a second parameter that denotes a user-estimated execution time of the batch job, and a third parameter that specifies a number of scheduling cycles that the batch job was skipped and not scheduled;
  maintaining a dedicated jobs queue comprising dedicated jobs, wherein each dedicated job in the dedicated jobs queue has a plurality of parameters associated therewith, the parameters comprising a first parameter that denotes a number of processors of the HPC system that are required to execute the dedicated job, a second parameter that denotes a user-estimated execution time of the dedicated job, and a third parameter that denotes a user-requested start time of the dedicated job;
  initiating a scheduling cycle in response to a triggering event; and
  performing a scheduling process as part of the scheduling cycle to schedule one or more batch jobs in the batch jobs queue and one or more dedicated jobs in the dedicated jobs queue for execution by the HPC system, wherein performing the scheduling process comprises:
    determining a number m of available processors in the HPC system;
    when the number m of available processors in the HPC system is greater than 0, and when the batch jobs queue and the dedicated jobs queue are not empty, and when the third parameter of the head batch job in the batch jobs queue is not greater than or equal to a threshold value, then:
      determining if a third parameter of a head dedicated job in the dedicated jobs queue is less than or equal to a current time; and
      moving the head dedicated job from the dedicated jobs queue to a head position in the batch jobs queue, when the third parameter of the head dedicated job in the dedicated jobs queue is determined to be less than or equal to the current time;
    when the third parameter of the head dedicated job in the dedicated jobs queue is determined to be not less than or equal to the current time, then:
      setting a first value of the head dedicated job equal to the third parameter of the head dedicated job;
      determining if the third parameter of the head dedicated job in the dedicated jobs queue is less than or equal to the current time plus a remaining execution time of an active job having a greatest remaining execution time;
      when the third parameter of the head dedicated job in the dedicated jobs queue is determined to be not less than or equal to the current time plus a remaining e
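
The reservation step recited in claim 1, together with the third-value rule of claim 4, can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the patented implementation: the class names, field names, and function names are hypothetical, and the greedy first-fit selection in select_backfill is a stand-in, since the claim text shown here does not say how the set of jobs to run ahead of the reservation is chosen.

```python
from dataclasses import dataclass


@dataclass
class BatchJob:              # hypothetical container for a queued batch job
    name: str
    procs: int               # first parameter: processors required
    est_time: float          # second parameter: user-estimated execution time
    skipped: int = 0         # third parameter: scheduling cycles skipped


@dataclass
class ActiveJob:             # hypothetical container for a running job
    procs: int               # processors currently held
    remaining: float         # fourth parameter: remaining execution time


def make_reservation(head, active, m, now):
    """Return (first_value, second_value) for a head job needing more than m processors.

    first_value: time at which enough processors are expected to be free.
    second_value: processors left over at that time once the head job starts.
    """
    freed = 0
    # Walk active jobs from smallest to largest remaining execution time.
    for job in sorted(active, key=lambda a: a.remaining):
        freed += job.procs
        if m + freed >= head.procs:
            # job.remaining is the greatest fourth parameter in the chosen set.
            return now + job.remaining, m + freed - head.procs
    raise ValueError("head job needs more processors than can be freed")


def third_value(job, first_value, now):
    """Processors the job would still hold at first_value (rule of claim 4)."""
    return 0 if now + job.est_time < first_value else job.procs


def select_backfill(queue, active, m, now):
    """Assumed greedy first-fit choice of jobs to start before the reservation."""
    head = queue[0]
    first_value, spare = make_reservation(head, active, m, now)
    chosen, free_now = [], m
    for job in queue[1:]:
        if job.procs > free_now:
            continue                      # does not fit on idle processors now
        held_later = third_value(job, first_value, now)
        if held_later <= spare:           # must not delay the head job
            chosen.append(job.name)
            free_now -= job.procs
            spare -= held_later
    return first_value, chosen


# Usage: 4 idle processors, head job needs 8.
active = [ActiveJob(4, 10.0), ActiveJob(2, 5.0), ActiveJob(8, 30.0)]
queue = [
    BatchJob("head", 8, 20.0),
    BatchJob("j1", 3, 4.0),    # finishes before the reservation
    BatchJob("j2", 1, 50.0),   # still running at the reservation, fits in spare
    BatchJob("j3", 2, 3.0),    # no idle processors left for it
]
print(select_backfill(queue, active, 4, 0.0))  # (10.0, ['j1', 'j2'])
```

In the usage example the two shortest-running active jobs free 6 processors by time 10.0, so the head job is reserved for that time with 2 processors to spare. A job is admitted ahead of the reservation only if it fits on the idle processors now and either finishes before the reservation or fits within the spare processors.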

Classifications

  • considering the load · CPC title

  • Multiproc · CPC title

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • Multiprogramming arrangements · CPC title

  • involving deadlines, e.g. rate based, periodic · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9645848B2 cover?
Systems and methods are provided for scheduling homogeneous workloads including batch jobs, and heterogeneous workloads including batch and dedicated jobs, with run-time elasticity wherein resource requirements for a given job can change during run-time execution of the job.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 9, 2017 (B2). Legal status and post-grant events are not shown on this page.