System and method for scheduling jobs in distributed datacenters

US10108458B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10108458-B2
Application numberUS-201715444797-A
CountryUS
Kind codeB2
Filing dateFeb 28, 2017
Priority dateFeb 28, 2017
Publication dateOct 23, 2018
Grant dateOct 23, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for scheduling jobs in a distributed computing environment include: obtaining a set of task identifiers, each task identifier identifying a corresponding data processing task included in one of a plurality of jobs to be scheduled for execution at one of a plurality of data processing locations; and selecting and scheduling a data processing task of the identified job having a longest optimal completion time to the data processing location corresponding to the optimal completion time of the selected data processing task.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for scheduling jobs in a distributed computing environment, the method comprising: obtaining a set of task identifiers, each task identifier identifying a corresponding data processing task included in one of a plurality of jobs to be scheduled for execution; for each unscheduled data processing task, determining input data transfer times to transfer input data for the unscheduled data processing task to each of a plurality of data processing locations; and for each of the plurality of data processing locations, determining a task completion time for the unscheduled data processing task based on the corresponding input data transfer time; from the jobs having unscheduled data processing tasks, selecting a job having a longest job completion time based on a shortest task completion time for the data processing tasks included in the selected job; selecting, from the unscheduled data processing tasks for the selected job, a data processing task having a longest task completion time based on the shortest task completion times for the unscheduled data processing tasks for the selected job; and scheduling the selected data processing task for execution at the data processing location corresponding to the selected data processing task's shortest completion time and having available processing resources. 2. The method of claim 1 , comprising: adjusting the completion times of all other unscheduled data processing tasks included in the selected job to have a completion time equal to the maximum of: the optimal completion time of the selected data processing task, and the completion time of the respective unscheduled data processing task. 3. The method of claim 1 , comprising: until each data processing task identified in the set of task identifiers has been scheduled, repeating: updating the available processing resources to identify the resources to be consumed by the previously scheduled data processing task as unavailable; selecting the job of the set of jobs having unscheduled tasks and having the longest job completion time based on the shortest task completion time for the data processing tasks included in the selected job; selecting, from the unscheduled tasks for the selected job, the data processing task having the longest task completion time based on shortest task completion times for the unscheduled data processing tasks for the selected job; and scheduling the selected data processing task at the data processing location corresponding to the selected data processing task's shortest completion time. 4. The method of claim 1 , wherein determining the input data transfer times for a single data processing task identified in the set of task identifiers comprises: identifying a size and location of each input data for the single data processing task; determining a communication bandwidth between each input data location and each of the data processing locations; and for each of the data processing locations: determining the transfer time for each of the input data to the data processing location based on the size of the input data and the communication input data location and the data processing location; and selecting, from the transfer times for each of the input data to the data processing location, a largest transfer time as the input data transfer time for the single data processing task at the data processing location. 5. The method of claim 1 , comprising: for each of the data processing tasks identified in the set of task identifiers, determining an execution time for the data processing task; and for each of the plurality of data processing locations having available processing resources, determining the completion time for the data processing task at the data processing location based on the corresponding input data transfer time and the corresponding execution time. 6. The method of claim 5 , wherein identifying the execution time for the data processing task comprises: identifying a type of the data processing task; based on a database of execution data for types of data processing tasks, determining the execution time for the data processing task based on the identified type, and a size of the input data for the data processing task. 7. The method of claim 1 , comprising: populating one or more data structures representing completion time objectives and constraints based on the completion times for each of the data processing tasks, data processing location assignments parameters, and the available resources; wherein selecting the job having the longest optimal completion time includes solving a linear programming problem defined by the matrices. 8. The method of claim 1 , wherein obtaining the set of task identifiers comprises: receiving job execution requests until a scheduling trigger is detected; and upon detection of the scheduling trigger, identifying the set of data processing tasks to be scheduled from the job execution requests. 9. The method of claim 8 , wherein the scheduling trigger is detected when a defined time period has elapsed, or when a number of received jobs execution requests meets a defined threshold. 10. The method of claim 1 , wherein the data processing tasks identified in the set of task identifiers can be executed in parallel. 11. The method of claim 1 , wherein selecting the job having the longest job completion time is based on the shortest task completion times at data processing locations having available resources; and wherein selecting the data processing task having the longest task completion time is based on the shortest task completion times for the unscheduled data processing tasks for the selected job at data processing locations having available resources. 12. A system comprising: at least one processor for scheduling jobs in a distributed computing environment, the at least one processor configured for: obtaining a set of task identifiers, each task identifier identifying a corresponding data processing task included in one of a plurality of jobs to be scheduled for execution at one of a plurality of data processing locations; from the jobs having unscheduled data processing tasks, selecting a job having a longest job completion time based on a shortest task completion time for the data processing tasks included in the selected job; selecting, from unscheduled data processing tasks for the selected job, a data processing task having a longest task completion time based on shortest task completion times for the unscheduled data processing tasks for the selected job; and scheduling the selected data processing task for execution at the data processing location of the plurality of data processing locations corresponding to the selected data processing task's shortest completion time and having available processing resources. 13. The system of claim 12 , wherein the at least one processor is configured for: adjusting the completion times of all other unscheduled data processing tasks included in the selected job to have a completion time equal to the maximum of: the optimal completion time of the selected data processing task, and the completion time of the respective unscheduled data processing task. 14. The system of claim 12 , wherein the at least one processor is configured for: until each data processing task identified in the set of task identifiers has been scheduled, repeating: updating the available processing resources to identify the resources to be consumed by the previously scheduled data processing task as unavailable; selecting the job of the set of jobs having unscheduled tasks and having the longest job comp

Assignees

Inventors

Classifications

  • Techniques for rebalancing the load in a distributed system · CPC title

  • G06F9/5038Primary

    considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title

  • involving deadlines, e.g. rate based, periodic · CPC title

  • G06F9/4881Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10108458B2 cover?
Methods and systems for scheduling jobs in a distributed computing environment include: obtaining a set of task identifiers, each task identifier identifying a corresponding data processing task included in one of a plurality of jobs to be scheduled for execution at one of a plurality of data processing locations; and selecting and scheduling a data processing task of the identified job having …
Who is the assignee on this patent?
Huawei Tech Canada Co Ltd, Governing Council Univ Toronto
What technology area does this patent fall under?
Primary CPC classification G06F9/5038. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 23 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).