Methods and systems for transforming distributed database structure for reduced compute load
US-2024330289-A1 · Oct 3, 2024 · US
US10042886B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10042886-B2 |
| Application number | US-201514816681-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 3, 2015 |
| Priority date | Aug 3, 2015 |
| Publication date | Aug 7, 2018 |
| Grant date | Aug 7, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method and system, the system including a plurality of machines each having a processor and a main memory component; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to select, in response to a state of a query execution plan comprising a plurality of executable jobs for the set of data, which one of a set of scheduling algorithms to execute; an execution engine to execute the selected scheduling algorithm to determine, for each job in the plurality of jobs, which server to schedule to execute the respective job; and providing an indication of the scheduling of the servers determined to be schedules for the execution of the jobs.
Opening claim text (preview).
What is claimed is: 1. A system comprising: a cluster of a plurality of machines logically grouped together, each of the plurality of machines having a processor and a main memory component; a distributed memory layer, the distributed memory layer comprising the main memory components of the plurality of machines in the cluster being connected to each other through a network layer and configured to transmit and receive data directly to and from each other, wherein the distributed memory layer comprising the main memory of the plurality of machines logically operates as a common memory layer for the plurality of machines and the plurality of machines use both the common memory layer and the shared distributed storage facility to execute the query execution plan; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to dynamically select, in response to a current state of the query execution plan comprising a plurality of executable jobs for the set of data and a current state and workload characteristics of the plurality of machines in the cluster, which one of a set of scheduling algorithms to execute, the set of scheduling algorithms each having a different predetermined objective; an execution engine to execute the dynamically selected dynamic task scheduling algorithm to determine, for each job in the plurality of jobs comprising the query execution plan, which machine in the cluster to schedule for execution of the job and data placement in the main memory component of the plurality of machines of the distributed memory layer and the shared distributed storage facility for the set of data to execute the respective job wherein the query execution plan is represented by directed acyclic graph defining dependencies between the plurality of jobs of the query execution plan; and providing an indication of the determined scheduling of the machines for the execution each of the plurality of jobs. 2. The system of claim 1 , wherein each of the plurality of machines can communicate directly with each of the other plurality of machines over the communication network. 3. The system of claim 1 , wherein the objective of the set of scheduling algorithms is selected from the group including: minimizing a total runtime to execute the query execution plan, minimizing a total data communication cost to execute the query execution plan, and minimizing a finish time of a last server to execute the query execution plan. 4. The system of claim 3 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan (i) considers inter-job dependencies between the plurality of jobs of the query execution plan, and (ii) arbitrarily orders new jobs or orders new jobs further based on an earliest release date and a latest critical time. 5. The system of claim 3 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan recursively schedules subsets of new jobs, jobs in each subset having an absence of dependencies therebetween. 6. The system of claim 5 , wherein each job has a determinable unique release date and the scheduling algorithm further orders new jobs to be scheduled in an ascending order based on the release date of the new jobs. 7. The system of claim 5 , wherein each job has a determinable start time and end time and the scheduling algorithm further orders a new job only when all immediate parent jobs for the new job have been scheduled. 8. A method comprising: storing a set of data in a shared distributed storage facility, the stored data being accessible by a cluster of a plurality of machines logically grouped together, over a communication network, each of the plurality of machines having a processor and a main memory component, the main memory components of the plurality of machines in the cluster being connected to each other through a network layer to comprise a distributed memory layer and are configured to transmit and receive data directly to and from each other, wherein the distributed memory layer comprising the main memory of the plurality of machines logically operates as a common memory layer for the plurality of machines and the plurality of machines use both the common memory layer and the shared distributed storage facility to execute the query execution plan; dynamically selecting by a controller, in response to a state of the current query execution plan comprising a plurality of executable jobs for the set of data and a current state and workload characteristics of the plurality of machines in the cluster, which one of a set of scheduling algorithms to execute, the set of scheduling algorithms each having a different predetermined objective; executing by an execution engine the dynamically selected dynamic task scheduling algorithm to determine, for each job in plurality of jobs comprising the query execution plan, which machine in the cluster to schedule and data placement in the main memory component of the plurality of machines of the distributed memory layer and the shared distributed storage facility for the set of data to execute the respective job, wherein the query execution plan is represented by directed acyclic graph defining dependencies between the plurality of jobs of the query execution plan; and providing an indication of the determined scheduling of the machines for the execution each of the plurality of jobs. 9. The method of claim 8 , further comprising each of the plurality of machines having the functionality to communicate directly with each of the other plurality of machines over the communication network. 10. The method of claim 8 , wherein the objective of the set of scheduling algorithms is selected from the group including: minimizing a total runtime to execute the query execution plan, minimizing a total data communication cost to execute the query execution plan, and minimizing a finish time of a last server to execute the query execution plan. 11. The method of claim 10 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan (i) considers inter-job dependencies between the plurality of jobs of the query execution plan, and (ii) arbitrarily orders new jobs or orders new jobs further based on an earliest release date and a latest critical time. 12. The method of claim 10 , wherein a scheduling algorithm having the predetermined objective of minimizing a finish time of a last server to execute the query execution plan recursively schedules subsets of new jobs, jobs in each subset having an absence of dependencies therebetween. 13. The method of claim 12 , wherein each job has a determinable unique release date and the scheduling algorithm further orders new jobs to be scheduled in an ascending order based on the release date of the new jobs. 14. The method of claim 12 , wherein each job has a determinable start time and end time and the scheduling algorithm further orders a new job only when all immediate parent jobs for the new job have been scheduled.
Plan optimisation · CPC title
based on a hash applied to IP addresses or costs · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.