Communication system
US-2017170951-A1 · Jun 15, 2017 · US
US2016337442A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016337442-A1 |
| Application number | US-201514711617-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 13, 2015 |
| Priority date | May 13, 2015 |
| Publication date | Nov 17, 2016 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method, apparatus, and system for efficiently re-partitioning data using scheduled network communication are provided. Given re-partitioning data defining the data blocks to be sent amongst a plurality of server nodes, a corresponding network schedule is determined to send the data blocks in a coordinated manner. The network schedule is divided into time slots, wherein each of the plurality of server nodes can send up to one data block and receive up to one data block in each time slot. By using a greedy selection algorithm that prioritizes by largest senders and largest receivers, a near optimal schedule can be determined even in the presence of heavy skew. The greedy selection algorithm can be implemented with a O(T*N̂2) time complexity, enabling scaling to large multi-node clusters with many server nodes. The network schedule is of particular interest for database execution plans requiring re-partitioning on operators with different keys.
Opening claim text (preview).
What is claimed is: 1 . A method comprising: receiving re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determining a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populating, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; causing the plurality of server nodes to re-partition according to the network schedule; wherein the method is performed by one or more computing devices. 2 . The method of claim 1 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 3 . The method of claim 2 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 4 . The method of claim 1 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 5 . The method of claim 1 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 6 . The method of claim 1 , wherein the one or more computing devices comprises a coordinator server. 7 . The method of claim 1 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network. 8 . A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of: receiving re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determining a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populating, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; causing the plurality of server nodes to re-partition according to the network schedule. 9 . The non-transitory computer-readable medium of claim 8 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 10 . The non-transitory computer-readable medium of claim 9 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 11 . The non-transitory computer-readable medium of claim 8 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 12 . The non-transitory computer-readable medium of claim 8 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 13 . The non-transitory computer-readable medium of claim 8 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network. 14 . A system comprising one or more computing devices configured to: receive re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determine a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populate, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; cause the plurality of server nodes to re-partition according to the network schedule. 15 . The system of claim 14 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 16 . The system of claim 15 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 17 . The system of claim 14 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 18 . The system of claim 14 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 19 . The system of claim 14 , wherein the one or more computing devices comprises a coordinator server. 20 . The system of claim 14 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network.
Distributed queries · CPC title
involving priority mechanisms (hybrid switching fabrics H04L12/6402; intermediate storage or scheduling H04L49/90; time-division multiplex systems H04J3/00) · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Allocation of resources per group of connections, e.g. per group of users · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.