Incremental clustering of database tables
US-10853345-B2 · Dec 1, 2020 · US
US10963443B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10963443-B2 |
| Application number | US-202016918591-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 1, 2020 |
| Priority date | Jul 17, 2018 |
| Publication date | Mar 30, 2021 |
| Grant date | Mar 30, 2021 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Automatic clustering of a database table is disclosed. A method for automatic clustering of a database table includes receiving an indication that a data modification task has been executed on a table and determining whether the table is sufficiently clustered. The method includes, in response to determining the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered. The method includes assigning each of the one or more micro-partitions to an execution node to be reclustered.
Opening claim text (preview).
What is claimed is: 1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: dividing a database table into one or more levels, the database table comprising a plurality of database partitions, each level corresponding to a number of times data in that level has been reclustered; calculating a maximum number of levels for the database table based on (i) a batch size selected for a clustering execution and (ii) a number of database partitions in the plurality of database partitions, the maximum number of levels representing an upper bound on a number of times any piece of data in the plurality of database partitions is reclustered; determining defined boundaries of a subset of the plurality of database partitions from at least one level of the one or more levels; selecting one or more micro-batches of database partitions from the subset of the plurality of database partitions; and reclustering the selected one or more micro-batches of database partitions according to a clustering key. 2. The method of claim 1 , wherein: the selecting of the one or more micro-batches of database partitions from the subset of the plurality of database partitions is based on a sorting order of the subset of the plurality of database partitions; and the reclustering of the selected one or more micro-batches of database partitions according to the clustering key comprises providing at least one of the selected one or more micro-batches of database partitions to one or more clustering workers to be reclustered according to the clustering key. 3. The method of claim 2 , wherein the sorting order is based on clustering key metadata. 4. The method of claim 2 , wherein the subset of the plurality of database partitions comprises one or more worst clustered partitions with respect to a bounded range according to the clustering key. 5. The method of claim 2 , further comprising receiving an indication that at least one of the provided one or more micro-batches has been reclustered according to the clustering key by at least one of the one or more clustering workers. 6. The method of claim 2 , further comprising maintaining a table state of the database table, the table state comprising one or more types of information selected from the group consisting of: clustering information for each level of the one or more levels, level information for each database partition of the plurality of database partitions, and a clustering state of the database table. 7. The method of claim 1 , wherein the dividing of the database table into the one or more levels is based on the maximum number of levels. 8. The method of claim 1 , wherein the database partitions in the plurality of database partitions comprise a plurality of micro-partitions, the micro-partitions comprising immutable storage devices that cannot be updated in-place. 9. The method of claim 1 , the database partitions in the plurality of database partitions comprising a plurality of micro-partitions, the method further comprising: receiving an indication that a data modification task has been executed on the database table, wherein the executing of the data modification task comprises ingesting new micro-partitions into the database table; identifying a proportion of micro-partitions in lower levels of the one or more levels of the database table; determining whether a high proportion of micro-partitions is in the lower levels; and based on determining that a high proportion of micro-partitions is in the lower levels, entering a catch-up mode in which reclustering operations are performed according to the clustering key. 10. The method of claim 9 , further comprising, based on determining that a high proportion of micro-partitions is not in the lower levels, entering a stable mode in which no reclustering operations are performed. 11. A system comprising: at least one hardware processor; and a memory device storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: dividing a database table into one or more levels, the database table comprising a plurality of database partitions, each level corresponding to a number of times data in that level has been reclustered; calculating a maximum number of levels for the database table based on (i) a batch size selected for a clustering execution and (ii) a number of database partitions in the plurality of database partitions, the maximum number of levels representing an upper bound on a number of times any piece of data in the plurality of database partitions is reclustered; determining defined boundaries of a subset of the plurality of database partitions from at least one level of the one or more levels; selecting one or more micro-batches of database partitions from the subset of the plurality of database partitions; and reclustering the selected one or more micro-batches of database partitions according to a clustering key. 12. The system of claim 11 , wherein: the selecting of the one or more micro-batches of database partitions from the subset of the plurality of database partitions is based on a sorting order of the subset of the plurality of database partitions; and the reclustering of the selected one or more micro-batches of database partitions according to the clustering key comprises providing at least one of the selected one or more micro-batches of database partitions to one or more clustering workers to be reclustered according to the clustering key. 13. The system of claim 12 , wherein the sorting order is based on clustering key metadata. 14. The system of claim 12 , wherein the subset of the plurality of database partitions comprises one or more worst clustered partitions with respect to a bounded range according to the clustering key. 15. The system of claim 12 , the operations further comprising receiving an indication that at least one of the provided one or more micro-batches has been reclustered according to the clustering key by at least one of the one or more clustering workers. 16. The system of claim 12 , the operations further comprising maintaining a table state of the database table, the table state comprising one or more types of information selected from the group consisting of: clustering information for each level of the one or more levels, level information for each database partition of the plurality of database partitions, and a clustering state of the database table. 17. The system of claim 11 , wherein the dividing of the database table into the one or more levels is based on the maximum number of levels. 18. The system of claim 11 , wherein the database partitions in the plurality of database partitions comprise a plurality of micro-partitions, the micro-partitions comprising immutable storage devices that cannot be updated in-place. 19. The system of claim 11 , the database partitions in the plurality of database partitions comprising a plurality of micro-partitions, the operations further comprising: receiving an indication that a data modification task has been executed on the database table, wherein the executing of the data modification task comprises ingesting new micro-partitions into the database table; identifying a proportion of micro-partitions in lower levels of the one or more levels of the database table; determining whether a high proportion of micro-partitions Preps in the lower levels; and based on determining that a high proportion
Clustering or classification · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Tablespace storage structures; Management thereof · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.