System and Method for Massively Parallel Processing Database
US-2016171072-A1 · Jun 16, 2016 · US
US9959332B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9959332-B2 |
| Application number | US-201514601679-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2015 |
| Priority date | Jan 21, 2015 |
| Publication date | May 1, 2018 |
| Grant date | May 1, 2018 |
Official abstract text for this publication.
In one embodiment, a method includes determining a number of initial servers in a massively parallel processing (MPP) database cluster and determining an initial bucket configuration of the MPP database cluster, where the initial bucket configuration has a number of initial buckets. The method also includes adding a number of additional servers to the MPP database cluster to produce a number of updated servers, where the updated servers include the initial servers and the additional servers, and creating an updated bucket configuration in accordance with the number of initial servers, the initial bucket configuration, and the number of additional servers, where the updated bucket configuration has a number of updated buckets. Additionally, the method includes redistributing data of the MPP cluster in accordance with the updated bucket configuration.
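The expansion step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `expand_bucket_mapping`, the server names, and the tie-breaking policy are assumptions made for illustration. The sketch keeps the bucket count fixed, gives every server either the minimum bucket count or one more, and moves only surplus buckets, so that a balanced initial cluster sends data only to the additional servers.

```python
def expand_bucket_mapping(initial_mapping, initial_servers, additional_servers):
    """Return (updated_mapping, moved) for a cluster that gains servers.

    initial_mapping: dict of bucket id -> server name (hypothetical layout).
    moved: dict of bucket id -> new server, for buckets that change owner.
    """
    # Group the buckets currently held by each initial server.
    held = {server: [] for server in initial_servers}
    for bucket in sorted(initial_mapping):
        held[initial_mapping[bucket]].append(bucket)

    updated_servers = list(initial_servers) + list(additional_servers)
    minimum, extra = divmod(len(initial_mapping), len(updated_servers))

    # Assumed policy: servers that already hold the most buckets keep the
    # "minimum + 1" slots, which limits how many buckets have to move.
    by_load = sorted(updated_servers, key=lambda s: -len(held.get(s, [])))
    target = {s: minimum + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    # Buckets above a server's target form the subset to be transmitted.
    pool = []
    for server in initial_servers:
        pool.extend(held[server][target[server]:])

    # Fill the additional servers first; an initial server receives buckets
    # only if the initial mapping was unbalanced and left it below target.
    updated = dict(initial_mapping)
    for server in list(additional_servers) + list(initial_servers):
        need = max(0, target[server] - len(held.get(server, [])))
        for _ in range(need):
            updated[pool.pop()] = server

    moved = {b: s for b, s in updated.items() if initial_mapping[b] != s}
    return updated, moved


if __name__ == "__main__":
    # Eight buckets on two servers, then two servers are added.
    initial = {b: "s%d" % (b % 2) for b in range(8)}
    updated, moved = expand_bucket_mapping(initial, ["s0", "s1"], ["s2", "s3"])
    print("updated mapping:", updated)
    print("buckets to transmit:", moved)
```

Only the rows belonging to the buckets listed in `moved` would need to be copied; every other bucket stays on its original server.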
Opening claim text (preview).
What is claimed is:
1. A method comprising: determining a quantity of initial servers in a massively parallel processing (MPP) database cluster; determining a configuration of initial buckets of the MPP database cluster, wherein the configuration of initial buckets comprises a quantity of initial buckets; adding at least one additional server to the MPP database cluster to produce updated servers, wherein the updated servers comprise the initial servers and the at least one additional server; creating a configuration of updated buckets comprising the initial buckets in accordance with the quantity of initial servers, the configuration of initial buckets, and a quantity of additional servers, wherein the configuration of updated buckets identifies a subset of buckets of the initial buckets, with the subset of buckets being transmitted to the at least one additional server from the initial servers; and redistributing, based on the configuration of updated buckets, data from the initial servers to the at least one additional server, with the data being associated with the subset of buckets.
2. The method of claim 1, wherein the configuration of updated buckets comprises a mapping of buckets to the updated servers, wherein each of the updated servers has either a minimum number of buckets or a maximum number of buckets, wherein the maximum number of buckets is one bucket more than the minimum number of buckets.
3. The method of claim 1, wherein creating the configuration of updated buckets comprises determining whether the quantity of updated buckets is greater than the quantity of initial buckets.
4. The method of claim 3, wherein determining whether the quantity of updated buckets is greater than the quantity of initial buckets comprises: determining a redistributed bucket configuration having the quantity of updated buckets and the quantity of initial buckets; determining a percentage load variation of the redistributed bucket configuration; determining whether the percentage load variation is acceptable; setting the quantity of updated buckets to be greater than the quantity of initial buckets when the percentage load variation is not acceptable; and setting the quantity of updated buckets to the quantity of initial buckets when the percentage load variation is acceptable.
5. The method of claim 1, wherein the quantity of updated buckets is a power of two of the quantity of initial buckets.
6. The method of claim 1, wherein the quantity of updated buckets is two times the quantity of initial buckets.
7. The method of claim 1, wherein the quantity of initial buckets is a power of two.
8. The method of claim 1, wherein redistributing data in a subset of buckets from the initial servers to the at least one additional server comprises: determining an initial bucket-server mapping; determining the subset of buckets to be moved from the initial servers to the at least one additional server; determining data associated with the subset of buckets to produce a subset of data; and redistributing the data associated with the subset of buckets from the initial servers to the at least one additional server.
9. The method of claim 1, further comprising placing data on the initial servers in accordance with the configuration of initial buckets before adding the at least one additional server to the MPP database cluster.
10. The method of claim 9, wherein placing the data on the initial servers comprises: determining a hash value for a row of a table; and determining a bucket associated with the row in accordance with the hash value.
11. A method comprising: determining an updated bucket-server mapping for a massively parallel processor (MPP) database cluster in accordance with a quantity of initial servers and a quantity of additional servers, wherein the updated bucket-server mapping identifies a subset of buckets of the initial buckets, with the subset of buckets being transmitted to the at least one additional server from the initial servers; determining whether a first table is to be redistributed in accordance with the updated bucket-server mapping and an initial bucket-server mapping; starting a first transaction when the first table is to be redistributed; performing the first transaction comprising redistributing data from an initial server of the initial servers to the additional servers, with the data being associated with the subset of buckets; and committing the first transaction after performing the first transaction.
12. The method of claim 11, wherein performing the transaction further comprises: creating a temporary table in accordance with the updated bucket-server mapping; redistributing the data in accordance with the updated bucket-server mapping; and merging the temporary table and the first table after redistributing the data.
13. The method of claim 12, wherein redistributing the data comprises: building delete statements in accordance with a difference between the updated bucket-server mapping and the initial bucket-server mapping; and issuing the delete statements for deleting records from the initial bucket-server mapping.
14. The method of claim 12, wherein redistributing the data comprises: building insert statements in accordance with a difference between the updated bucket-server mapping and the initial bucket-server mapping; and issuing the insert statements for insert records which is deleted from the initial bucket-server mapping to the updated bucket-server mapping.
15. The method of claim 11, further comprising: determining whether a second table is to be redistributed after committing the first transaction; starting a second transaction when the second table is to be redistributed; and removing the initial bucket-server mapping when the second table is not to be redistributed.
16. The method of claim 11, further comprising creating a list of tables to be redistributed, wherein determining whether the first table is to be redistributed comprises determining whether the first table is to be redistributed in accordance with the list of tables.
17. The method of claim 11, further comprising installing the additional servers before determining whether the first table is to be redistributed.
18. A computer comprising: a processor; and a non-transitory computer readable storage medium storing programming for execution by the processor, the programming including instructions to: determine a quantity of initial servers in a massively parallel processing (MPP) database cluster, determine a configuration of initial buckets of the MPP database cluster, wherein the configuration of initial buckets comprises a quantity of initial buckets, add at least one additional server to the MPP database cluster to produce updated servers, wherein the updated servers comprise the initial servers and the at least one additional server, create a configuration of updated buckets comprising the initial buckets in accordance with the quantity of initial servers, the configuration of initial buckets, and a quantity of additional servers, wherein the configuration of updated buckets identifies a subset of buckets of the initial buckets, with the subset of buckets being transmitted to the at least one additional server from the initial servers; and redistribute, based on the configuration of updated buckets, data from the initial servers to the at least one additional server, with the data being associated with the subset of buckets.
19. A computer comprising: a processor; and
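Two ideas from the claims lend themselves to a short sketch: placing a row in a bucket from its hash value (claim 10) and deciding whether to keep or double the bucket count from a percentage load variation (claims 3 to 6). Everything named here is an assumption for illustration: `zlib.crc32` stands in for whatever hash function the database uses, the variation is defined as `(max - min) / max` over a balanced assignment, and the 10% threshold is arbitrary. The claims do not fix any of these choices.

```python
import zlib


def bucket_for_row(distribution_key, num_buckets):
    """Map a row's distribution key to a bucket via a hash value.

    crc32 is a stand-in hash (assumption), reduced modulo the bucket count.
    """
    return zlib.crc32(distribution_key.encode("utf-8")) % num_buckets


def balanced_counts(num_buckets, num_servers):
    """Buckets per server when each server holds the minimum or minimum + 1."""
    minimum, extra = divmod(num_buckets, num_servers)
    return [minimum + 1] * extra + [minimum] * (num_servers - extra)


def percentage_load_variation(num_buckets, num_servers):
    """Assumed definition: spread between fullest and emptiest server, in %."""
    if num_buckets <= 0 or num_servers <= 0:
        raise ValueError("bucket and server counts must be positive")
    counts = balanced_counts(num_buckets, num_servers)
    return 100.0 * (max(counts) - min(counts)) / max(counts)


def choose_updated_bucket_count(initial_buckets, updated_servers,
                                max_variation_pct=10.0):
    """Keep the bucket count if the variation is acceptable, else double it."""
    variation = percentage_load_variation(initial_buckets, updated_servers)
    if variation <= max_variation_pct:
        return initial_buckets
    return 2 * initial_buckets


if __name__ == "__main__":
    print(choose_updated_bucket_count(8, 4))   # variation is 0%, count is kept
    print(choose_updated_bucket_count(8, 3))   # 3/3/2 split, count is doubled
    print(bucket_for_row("order-42", 8), bucket_for_row("order-42", 16))
```

With a modulo-based placement like this one, doubling the bucket count sends each row of old bucket `b` to either `b` or `b + initial_buckets`, so an existing bucket splits in two rather than scattering across all buckets. That property is a consequence of the modulo arithmetic assumed in the sketch, not something the claims state.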
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Parallel file systems, i.e. file systems supporting multiple processors · CPC title
Physics · mapped topic
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title