Reclustering of database tables using level information
US-10963443-B2 · Mar 30, 2021 · US
US11403275B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11403275-B2 |
| Application number | US-202117511064-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 26, 2021 |
| Priority date | Jul 17, 2018 |
| Publication date | Aug 2, 2022 |
| Grant date | Aug 2, 2022 |
Official abstract text for this publication.
The subject technology determines whether a table is sufficiently clustered. The subject technology in response to determining the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the data structure. The subject technology sorts each of one or more peaks in the data structure based on height. The subject technology sorts overlapping micro-partitions based on width. The subject technology selects based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths.
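The selection step in the abstract can be illustrated with a small sketch. It is not the patented implementation, and several details are assumptions made only for illustration: each micro-partition is reduced to a closed `[lo, hi]` range of integer clustering-key values, a "peak" is taken to be a key position where several ranges overlap, its "height" is the number of overlapping ranges, and a micro-partition's "width" is `hi - lo`. The names `MicroPartition`, `select_for_reclustering`, and `max_selected` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    name: str
    lo: int  # minimum clustering-key value stored in the partition (assumed integer)
    hi: int  # maximum clustering-key value stored in the partition

    @property
    def width(self) -> int:
        return self.hi - self.lo


def select_for_reclustering(parts, max_selected):
    """Pick up to max_selected partitions: tallest peaks first, widest partitions first."""
    # For closed ranges, the maximum overlap depth is always reached at some
    # minimum endpoint, so only those points are examined as candidate peaks.
    peaks = []
    for point in sorted({p.lo for p in parts}):
        covering = [p for p in parts if p.lo <= point <= p.hi]
        peaks.append((len(covering), point, covering))

    # Sort peaks by height, tallest first (ties: smaller key position first).
    peaks.sort(key=lambda peak: (-peak[0], peak[1]))

    selected = []
    for height, _point, covering in peaks:
        if height < 2:
            break  # remaining positions have no overlap at all
        # Within a peak, prefer the widest overlapping partitions.
        for p in sorted(covering, key=lambda p: (-p.width, p.name)):
            if len(selected) >= max_selected:
                return selected
            if p not in selected:
                selected.append(p)
    return selected


parts = [
    MicroPartition("A", 0, 100),
    MicroPartition("B", 10, 20),
    MicroPartition("C", 15, 30),
    MicroPartition("D", 18, 25),
    MicroPartition("E", 200, 210),
]
print([p.name for p in select_for_reclustering(parts, max_selected=3)])
# ['A', 'C', 'B']
```

In this example the tallest peak sits at key value 18, where A, B, C, and D overlap. A is the widest of the four, so it is chosen first, while E overlaps nothing and is never selected. The quadratic scan is kept for readability; a sweep over sorted endpoints would give the same depths with less work.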
Opening claim text (preview).
What is claimed is:
1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: based on determining that a proportion of a plurality of partitions of a database table that are in one or more lower clustering levels of the database table exceeds a clustering-mode threshold, entering a clustering mode in which reclustering operations are performed on the database table; and while in the clustering mode: selecting one or more partitions from among the plurality of partitions of the database table for reclustering, each selected partition being in a first lower clustering level among the one or more lower clustering levels of the database table, the database table further comprising a maximum clustering level, a given clustering level of a given partition indicating a number of times the given partition has been reclustered; and reclustering the selected one or more partitions, the reclustering transitioning each reclustered partition to a next-higher clustering level among the maximum clustering level and the one or more lower clustering levels of the database table.
2. The method of claim 1, further comprising entering, based on determining that the proportion of the plurality of partitions of the database table that are in the one or more lower clustering levels of the database table does not exceed the clustering-mode threshold, a stable mode in which reclustering operations are not performed on the database table.
3. The method of claim 1, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is based on one or more clustering metrics of the database table.
4. The method of claim 1, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is performed responsive to making a determination that the database table is not sufficiently clustered.
5. The method of claim 4, wherein the making of the determination that the database table is not sufficiently clustered comprises determining one or more of: that at least a threshold number of rows was added to the database table; that at least a threshold number of rows was deleted from the database table; and that at least a threshold number of rows was modified in the database table.
6. The method of claim 4, wherein the determination that the database table is not sufficiently clustered is based at least in part on a budget of resources allocated to performing reclustering operations.
7. The method of claim 1, wherein the maximum clustering level is calculated based on a set of one or more factors, the set of one or factors comprising a size of the database table.
8. The method of claim 1, wherein the reclustering of the selected one or more partitions comprises reclustering the selected one or more partitions according to a clustering key.
9. The method of claim 8, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table comprises including, in the selected one or more partitions, one or more worst-clustered partitions in the first lower clustering level according to the clustering key.
10. The method of claim 1, wherein the reclustering of the selected one or more partitions comprises: segmenting the selected one or more partitions into smaller groups of partitions; and reclustering the smaller groups of partitions.
11. A database platform comprising: at least one hardware processor; and one or more non-transitory computer readable storage media containing instructions that, when executed by the at least one hardware processor, cause the database platform to perform operations comprising: based on determining that a proportion of a plurality of partitions of a database table that are in one or more lower clustering levels of the database table exceeds a clustering-mode threshold, entering a clustering mode in which reclustering operations are performed on the database table; and while in the clustering mode: selecting one or more partitions from among the plurality of partitions of the database table for reclustering, each selected partition being in a first lower clustering level among the one or more lower clustering levels of the database table, the database table further comprising a maximum clustering level, a given clustering level of a given partition indicating a number of times the given partition has been reclustered; and reclustering the selected one or more partitions, the reclustering transitioning each reclustered partition to a next-higher clustering level among the maximum clustering level and the one or more lower clustering levels of the database table.
12. The database platform of claim 11, the operations further comprising entering, based on determining that the proportion of the plurality of partitions of the database table that are in the one or more lower clustering levels of the database table does not exceed the clustering-mode threshold, a stable mode in which reclustering operations are not performed on the database table.
13. The database platform of claim 11, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is based on one or more clustering metrics of the database table.
14. The database platform of claim 11, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is performed responsive to making a determination that the database table is not sufficiently clustered.
15. The database platform of claim 14, wherein the making of the determination that the database table is not sufficiently clustered comprises determining one or more of: that at least a threshold number of rows was added to the database table; that at least a threshold number of rows was deleted from the database table; and that at least a threshold number of rows was modified in the database table.
16. The database platform of claim 14, wherein the determination that the database table is not sufficiently clustered is based at least in part on a budget of resources allocated to performing reclustering operations.
17. The database platform of claim 11, wherein the maximum clustering level is calculated based on a set of one or more factors, the set of one or factors comprising a size of the database table.
18. The database platform of claim 11, wherein the reclustering of the selected one or more partitions comprises reclustering the selected one or more partitions according to a clustering key.
19. The database platform of claim 18, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table comprises including, in the selected one or more partitions, one or more worst-clustered partitions in the first lower clustering level according to the clustering key.
20. The database platform of claim 11, wherein the reclustering of the selected one or more partitions comprises: segmenting the selected one or more partitions into smaller groups of partitions; and reclustering the smaller groups of partitions.
21. One or more non-transitory computer readable storage media containing instructions that, when executed by at least one hardware processor of a database platform, cause the database platform to perform op
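
The mode switch and level transition recited in claims 1 and 2 can be illustrated with a minimal sketch. It only tracks level counters and does not rewrite any data, so it omits the actual sorting and merging that a reclustering operation would perform. The names `Partition`, `mode_for`, `recluster_step`, and `batch_size` are hypothetical, and choosing the lowest populated level as the "first lower clustering level" is an assumption; the claims do not say which lower level is picked.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    level: int  # number of times this partition has been reclustered


def mode_for(partitions, max_level, threshold):
    """Return "clustering" if the share of partitions below max_level exceeds threshold."""
    if not partitions:
        return "stable"
    lower = sum(1 for p in partitions if p.level < max_level)
    return "clustering" if lower / len(partitions) > threshold else "stable"


def recluster_step(partitions, max_level, threshold, batch_size):
    """Run one selection round; return the partitions that moved up a level."""
    if mode_for(partitions, max_level, threshold) != "clustering":
        return []  # stable mode: no reclustering is performed
    # Assumption: work on the lowest populated level first.
    lowest = min(p.level for p in partitions if p.level < max_level)
    batch = [p for p in partitions if p.level == lowest][:batch_size]
    for p in batch:
        # Reclustering moves each partition to the next-higher level,
        # never beyond the maximum clustering level.
        p.level = min(p.level + 1, max_level)
    return batch


table = [Partition(f"p{i}", lvl) for i, lvl in enumerate([0, 0, 1, 2, 2])]
print(mode_for(table, max_level=2, threshold=0.5))  # clustering (3 of 5 are below level 2)
recluster_step(table, max_level=2, threshold=0.5, batch_size=2)
print([p.level for p in table])  # [1, 1, 1, 2, 2]
```

A second call with the same arguments would lift two of the level-1 partitions to level 2, leaving one of five partitions below the maximum level. That proportion (0.2) no longer exceeds the 0.5 threshold, so the table would then be in stable mode and further calls would return an empty list.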
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Clustering or classification · CPC title
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title
Tablespace storage structures; Management thereof · CPC title