Distributed resource-aware task scheduling with replicated data placement in parallel database clusters

US10042886B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10042886-B2
Application numberUS-201514816681-A
CountryUS
Kind codeB2
Filing dateAug 3, 2015
Priority dateAug 3, 2015
Publication dateAug 7, 2018
Grant dateAug 7, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system, the system including a plurality of machines each having a processor and a main memory component; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to select, in response to a state of a query execution plan comprising a plurality of executable jobs for the set of data, which one of a set of scheduling algorithms to execute; an execution engine to execute the selected scheduling algorithm to determine, for each job in the plurality of jobs, which server to schedule to execute the respective job; and providing an indication of the scheduling of the servers determined to be schedules for the execution of the jobs.

First claim

Opening claim text (preview).

What is claimed is: 1. A system comprising: a cluster of a plurality of machines logically grouped together, each of the plurality of machines having a processor and a main memory component; a distributed memory layer, the distributed memory layer comprising the main memory components of the plurality of machines in the cluster being connected to each other through a network layer and configured to transmit and receive data directly to and from each other, wherein the distributed memory layer comprising the main memory of the plurality of machines logically operates as a common memory layer for the plurality of machines and the plurality of machines use both the common memory layer and the shared distributed storage facility to execute the query execution plan; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to dynamically select, in response to a current state of the query execution plan comprising a plurality of executable jobs for the set of data and a current state and workload characteristics of the plurality of machines in the cluster, which one of a set of scheduling algorithms to execute, the set of scheduling algorithms each having a different predetermined objective; an execution engine to execute the dynamically selected dynamic task scheduling algorithm to determine, for each job in the plurality of jobs comprising the query execution plan, which machine in the cluster to schedule for execution of the job and data placement in the main memory component of the plurality of machines of the distributed memory layer and the shared distributed storage facility for the set of data to execute the respective job wherein the query execution plan is represented by directed acyclic graph defining dependencies between the plurality of jobs of the query execution plan; and providing an indication of the determined scheduling of the machines for the execution each of the plurality of jobs. 2. The system of claim 1 , wherein each of the plurality of machines can communicate directly with each of the other plurality of machines over the communication network. 3. The system of claim 1 , wherein the objective of the set of scheduling algorithms is selected from the group including: minimizing a total runtime to execute the query execution plan, minimizing a total data communication cost to execute the query execution plan, and minimizing a finish time of a last server to execute the query execution plan. 4. The system of claim 3 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan (i) considers inter-job dependencies between the plurality of jobs of the query execution plan, and (ii) arbitrarily orders new jobs or orders new jobs further based on an earliest release date and a latest critical time. 5. The system of claim 3 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan recursively schedules subsets of new jobs, jobs in each subset having an absence of dependencies therebetween. 6. The system of claim 5 , wherein each job has a determinable unique release date and the scheduling algorithm further orders new jobs to be scheduled in an ascending order based on the release date of the new jobs. 7. The system of claim 5 , wherein each job has a determinable start time and end time and the scheduling algorithm further orders a new job only when all immediate parent jobs for the new job have been scheduled. 8. A method comprising: storing a set of data in a shared distributed storage facility, the stored data being accessible by a cluster of a plurality of machines logically grouped together, over a communication network, each of the plurality of machines having a processor and a main memory component, the main memory components of the plurality of machines in the cluster being connected to each other through a network layer to comprise a distributed memory layer and are configured to transmit and receive data directly to and from each other, wherein the distributed memory layer comprising the main memory of the plurality of machines logically operates as a common memory layer for the plurality of machines and the plurality of machines use both the common memory layer and the shared distributed storage facility to execute the query execution plan; dynamically selecting by a controller, in response to a state of the current query execution plan comprising a plurality of executable jobs for the set of data and a current state and workload characteristics of the plurality of machines in the cluster, which one of a set of scheduling algorithms to execute, the set of scheduling algorithms each having a different predetermined objective; executing by an execution engine the dynamically selected dynamic task scheduling algorithm to determine, for each job in plurality of jobs comprising the query execution plan, which machine in the cluster to schedule and data placement in the main memory component of the plurality of machines of the distributed memory layer and the shared distributed storage facility for the set of data to execute the respective job, wherein the query execution plan is represented by directed acyclic graph defining dependencies between the plurality of jobs of the query execution plan; and providing an indication of the determined scheduling of the machines for the execution each of the plurality of jobs. 9. The method of claim 8 , further comprising each of the plurality of machines having the functionality to communicate directly with each of the other plurality of machines over the communication network. 10. The method of claim 8 , wherein the objective of the set of scheduling algorithms is selected from the group including: minimizing a total runtime to execute the query execution plan, minimizing a total data communication cost to execute the query execution plan, and minimizing a finish time of a last server to execute the query execution plan. 11. The method of claim 10 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan (i) considers inter-job dependencies between the plurality of jobs of the query execution plan, and (ii) arbitrarily orders new jobs or orders new jobs further based on an earliest release date and a latest critical time. 12. The method of claim 10 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan recursively schedules subsets of new jobs, jobs in each subset having an absence of dependencies therebetween. 13. The method of claim 12 , wherein each job has a determinable unique release date and the scheduling algorithm further orders new jobs to be scheduled in an ascending order based on the release date of the new jobs. 14. The method of claim 12 , wherein each job has a determinable start time and end time and the scheduling algorithm further orders a new job only when all immediate parent jobs for the new job have been scheduled.

Assignees

Inventors

Classifications

  • Plan optimisation · CPC title

  • based on a hash applied to IP addresses or costs · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10042886B2 cover?
A method and system, the system including a plurality of machines each having a processor and a main memory component; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to select, in response to a state of a query execution plan comprising a plurality of executable jobs for the set of data, which on…
Who is the assignee on this patent?
Sap Se
What technology area does this patent fall under?
Primary CPC classification G06F16/24542. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 07 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).