Scheduled network communication for efficient re-partitioning of data

US2016337442A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016337442-A1
Application numberUS-201514711617-A
CountryUS
Kind codeA1
Filing dateMay 13, 2015
Priority dateMay 13, 2015
Publication dateNov 17, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, and system for efficiently re-partitioning data using scheduled network communication are provided. Given re-partitioning data defining the data blocks to be sent amongst a plurality of server nodes, a corresponding network schedule is determined to send the data blocks in a coordinated manner. The network schedule is divided into time slots, wherein each of the plurality of server nodes can send up to one data block and receive up to one data block in each time slot. By using a greedy selection algorithm that prioritizes by largest senders and largest receivers, a near optimal schedule can be determined even in the presence of heavy skew. The greedy selection algorithm can be implemented with a O(T*N̂2) time complexity, enabling scaling to large multi-node clusters with many server nodes. The network schedule is of particular interest for database execution plans requiring re-partitioning on operators with different keys.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method comprising: receiving re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determining a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populating, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; causing the plurality of server nodes to re-partition according to the network schedule; wherein the method is performed by one or more computing devices. 2 . The method of claim 1 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 3 . The method of claim 2 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 4 . The method of claim 1 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 5 . The method of claim 1 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 6 . The method of claim 1 , wherein the one or more computing devices comprises a coordinator server. 7 . The method of claim 1 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network. 8 . A non-transitory computer-readable medium storing one or more sequences of instructions which, when executed by one or more processors, cause performing of: receiving re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determining a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populating, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; causing the plurality of server nodes to re-partition according to the network schedule. 9 . The non-transitory computer-readable medium of claim 8 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 10 . The non-transitory computer-readable medium of claim 9 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 11 . The non-transitory computer-readable medium of claim 8 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 12 . The non-transitory computer-readable medium of claim 8 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 13 . The non-transitory computer-readable medium of claim 8 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network. 14 . A system comprising one or more computing devices configured to: receive re-partitioning data describing, for each of a plurality of server nodes, a quantity of data blocks to be sent to each of the plurality of server nodes; determine a sender order for the plurality of server nodes by using the re-partitioning data to sort, in descending order, the plurality of server nodes by a total quantity of data blocks to be sent; populate, in the sender order, a network schedule comprising a plurality of time slots, wherein each of the plurality of time slots specifies, for each of the plurality of server nodes as a sender node, at most one receiver node, of the plurality of server nodes, to send a data block over a network, wherein each of the plurality of time slots specifies a particular node no more than once, and wherein the at most one receiver node is specified based at least on having a largest possible quantity of data blocks to be received by the sender node according to the re-partitioning data; cause the plurality of server nodes to re-partition according to the network schedule. 15 . The system of claim 14 , wherein the at most one receiver node is specified by traversing an ordered list of receivers for the sender node, wherein the ordered list of receivers is maintained in descending order of data blocks to be received by the sender node. 16 . The system of claim 15 , wherein the traversing uses a bitmask that is maintained for each of the plurality of time slots to ensure that each of the plurality of time slots specifies the particular node no more than once. 17 . The system of claim 14 , wherein the at most one receiver node is specified based on populating contiguous time slots with the at most one receiver node. 18 . The system of claim 14 , wherein the re-partitioning data is for transitioning from a first database operator on a first key to a second database operator on a second key. 19 . The system of claim 14 , wherein the one or more computing devices comprises a coordinator server. 20 . The system of claim 14 , wherein a size of each of the quantity of data blocks and a length of each of the plurality of time slots are based on the network.

Assignees

Inventors

Classifications

  • Distributed queries · CPC title

  • involving priority mechanisms (hybrid switching fabrics H04L12/6402; intermediate storage or scheduling H04L49/90; time-division multiplex systems H04J3/00) · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Allocation of resources per group of connections, e.g. per group of users · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016337442A1 cover?
A method, apparatus, and system for efficiently re-partitioning data using scheduled network communication are provided. Given re-partitioning data defining the data blocks to be sent amongst a plurality of server nodes, a corresponding network schedule is determined to send the data blocks in a coordinated manner. The network schedule is divided into time slots, wherein each of the plurality o…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification H04L12/40143. Mapped technology areas include Electricity.
When was this patent published?
Publication date Thu Nov 17 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).