Incremental reclustering of database tables using reclustering-count levels

US11403275B2 · US · B2

Patent metadata
Publication number: US-11403275-B2
Application number: US-202117511064-A
Country: US
Kind code: B2
Filing date: Oct 26, 2021
Priority date: Jul 17, 2018
Publication date: Aug 2, 2022
Grant date: Aug 2, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject technology determines whether a table is sufficiently clustered. The subject technology in response to determining the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the data structure. The subject technology sorts each of one or more peaks in the data structure based on height. The subject technology sorts overlapping micro-partitions based on width. The subject technology selects based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths.
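The selection steps in the abstract can be illustrated with a minimal sketch. It assumes each micro-partition is summarized by the minimum and maximum values of a single integer clustering key, treats a "peak" as a point where the number of overlapping micro-partitions reaches a local maximum, and caps the selection with a `budget` parameter. The names `MicroPartition`, `overlap_peaks`, `select_for_reclustering`, and `budget` are hypothetical and do not come from the patent; the sweep over sorted endpoints is one plausible reading of "data structure", not the patented implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    name: str
    lo: int  # minimum clustering-key value stored in the partition
    hi: int  # maximum clustering-key value stored in the partition

    @property
    def width(self) -> int:
        return self.hi - self.lo


def overlap_peaks(parts):
    """Sweep the sorted endpoints; return (height, point) for each local maximum of overlap depth."""
    events = []
    for p in parts:
        events.append((p.lo, 0))  # 0 = interval opens (sorts before a close at the same point)
        events.append((p.hi, 1))  # 1 = interval closes
    events.sort()
    peaks, depth, rising, last_open = [], 0, False, None
    for point, kind in events:
        if kind == 0:
            depth += 1
            rising, last_open = True, point
        else:
            if rising:  # depth was still climbing, so this is a local maximum
                peaks.append((depth, last_open))
                rising = False
            depth -= 1
    return peaks


def select_for_reclustering(parts, budget):
    """Visit peaks tallest first; within a peak, take the widest partitions first."""
    peaks = sorted(overlap_peaks(parts), key=lambda pk: pk[0], reverse=True)
    selected = []
    for _height, point in peaks:
        overlapping = [p for p in parts if p.lo <= point <= p.hi and p not in selected]
        overlapping.sort(key=lambda p: p.width, reverse=True)
        for p in overlapping:
            if len(selected) == budget:
                return selected
            selected.append(p)
    return selected


parts = [
    MicroPartition("A", 0, 100),
    MicroPartition("B", 10, 20),
    MicroPartition("C", 15, 30),
    MicroPartition("D", 18, 25),
    MicroPartition("E", 200, 210),
    MicroPartition("F", 205, 220),
]
chosen = [p.name for p in select_for_reclustering(parts, budget=3)]
```

In this toy table, four partitions overlap around key value 18, so that peak is visited first, and the wide partition `A` is chosen ahead of the narrower ones because rewriting it removes the most overlap.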

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: based on determining that a proportion of a plurality of partitions of a database table that are in one or more lower clustering levels of the database table exceeds a clustering-mode threshold, entering a clustering mode in which reclustering operations are performed on the database table; and while in the clustering mode: selecting one or more partitions from among the plurality of partitions of the database table for reclustering, each selected partition being in a first lower clustering level among the one or more lower clustering levels of the database table, the database table further comprising a maximum clustering level, a given clustering level of a given partition indicating a number of times the given partition has been reclustered; and reclustering the selected one or more partitions, the reclustering transitioning each reclustered partition to a next-higher clustering level among the maximum clustering level and the one or more lower clustering levels of the database table.

2. The method of claim 1, further comprising entering, based on determining that the proportion of the plurality of partitions of the database table that are in the one or more lower clustering levels of the database table does not exceed the clustering-mode threshold, a stable mode in which reclustering operations are not performed on the database table.

3. The method of claim 1, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is based on one or more clustering metrics of the database table.

4. The method of claim 1, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is performed responsive to making a determination that the database table is not sufficiently clustered.

5. The method of claim 4, wherein the making of the determination that the database table is not sufficiently clustered comprises determining one or more of: that at least a threshold number of rows was added to the database table; that at least a threshold number of rows was deleted from the database table; and that at least a threshold number of rows was modified in the database table.

6. The method of claim 4, wherein the determination that the database table is not sufficiently clustered is based at least in part on a budget of resources allocated to performing reclustering operations.

7. The method of claim 1, wherein the maximum clustering level is calculated based on a set of one or more factors, the set of one or factors comprising a size of the database table.

8. The method of claim 1, wherein the reclustering of the selected one or more partitions comprises reclustering the selected one or more partitions according to a clustering key.

9. The method of claim 8, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table comprises including, in the selected one or more partitions, one or more worst-clustered partitions in the first lower clustering level according to the clustering key.

10. The method of claim 1, wherein the reclustering of the selected one or more partitions comprises: segmenting the selected one or more partitions into smaller groups of partitions; and reclustering the smaller groups of partitions.

11. A database platform comprising: at least one hardware processor; and one or more non-transitory computer readable storage media containing instructions that, when executed by the at least one hardware processor, cause the database platform to perform operations comprising: based on determining that a proportion of a plurality of partitions of a database table that are in one or more lower clustering levels of the database table exceeds a clustering-mode threshold, entering a clustering mode in which reclustering operations are performed on the database table; and while in the clustering mode: selecting one or more partitions from among the plurality of partitions of the database table for reclustering, each selected partition being in a first lower clustering level among the one or more lower clustering levels of the database table, the database table further comprising a maximum clustering level, a given clustering level of a given partition indicating a number of times the given partition has been reclustered; and reclustering the selected one or more partitions, the reclustering transitioning each reclustered partition to a next-higher clustering level among the maximum clustering level and the one or more lower clustering levels of the database table.

12. The database platform of claim 11, the operations further comprising entering, based on determining that the proportion of the plurality of partitions of the database table that are in the one or more lower clustering levels of the database table does not exceed the clustering-mode threshold, a stable mode in which reclustering operations are not performed on the database table.

13. The database platform of claim 11, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is based on one or more clustering metrics of the database table.

14. The database platform of claim 11, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table is performed responsive to making a determination that the database table is not sufficiently clustered.

15. The database platform of claim 14, wherein the making of the determination that the database table is not sufficiently clustered comprises determining one or more of: that at least a threshold number of rows was added to the database table; that at least a threshold number of rows was deleted from the database table; and that at least a threshold number of rows was modified in the database table.

16. The database platform of claim 14, wherein the determination that the database table is not sufficiently clustered is based at least in part on a budget of resources allocated to performing reclustering operations.

17. The database platform of claim 11, wherein the maximum clustering level is calculated based on a set of one or more factors, the set of one or factors comprising a size of the database table.

18. The database platform of claim 11, wherein the reclustering of the selected one or more partitions comprises reclustering the selected one or more partitions according to a clustering key.

19. The database platform of claim 18, wherein the selecting, for reclustering, of the one or more partitions from among the plurality of partitions of the database table comprises including, in the selected one or more partitions, one or more worst-clustered partitions in the first lower clustering level according to the clustering key.

20. The database platform of claim 11, wherein the reclustering of the selected one or more partitions comprises: segmenting the selected one or more partitions into smaller groups of partitions; and reclustering the smaller groups of partitions.

21. One or more non-transitory computer readable storage media containing instructions that, when executed by at least one hardware processor of a database platform, cause the database platform to perform op
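The mode switch and level transitions recited in claims 1 and 2 can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed implementation: a partition's level is modeled as a plain counter of how many times it has been reclustered, the "first lower clustering level" is taken to be the lowest populated level below the maximum, and the actual rewrite of data by clustering key is replaced by incrementing the counter. The names `Partition`, `in_clustering_mode`, `recluster_step`, `threshold`, and `batch_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    level: int = 0  # number of times this partition has been reclustered


def in_clustering_mode(parts, max_level, threshold):
    """True when the share of partitions below max_level exceeds the threshold."""
    if not parts:
        return False
    lower = sum(1 for p in parts if p.level < max_level)
    return lower / len(parts) > threshold


def recluster_step(parts, max_level, threshold, batch_size):
    """Run one pass; return the partitions that were promoted (empty in stable mode)."""
    if not in_clustering_mode(parts, max_level, threshold):
        return []  # stable mode: no reclustering is performed
    lower_levels = [p.level for p in parts if p.level < max_level]
    if not lower_levels:
        return []
    first = min(lower_levels)  # assumption: start from the lowest populated level
    selected = [p for p in parts if p.level == first][:batch_size]
    for p in selected:
        p.level += 1  # stand-in for rewriting the partition sorted by the clustering key
    return selected


table = [Partition("a"), Partition("b"), Partition("c", 1), Partition("d", 2)]
first_pass = [p.name for p in recluster_step(table, max_level=2, threshold=0.5, batch_size=2)]
second_pass = [p.name for p in recluster_step(table, max_level=2, threshold=0.5, batch_size=2)]
third_pass = recluster_step(table, max_level=2, threshold=0.5, batch_size=2)
```

With a threshold of 0.5, the toy table starts with three of four partitions below the maximum level, so two passes promote `a` and `b` up to level 2. After that only `c` remains in a lower level (one quarter of the table), the proportion no longer exceeds the threshold, and the third pass does nothing.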

Assignees

Snowflake Inc

Inventors

Classifications

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Clustering or classification · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

  • G06F7/08 · Primary

    Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

  • Tablespace storage structures; Management thereof · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11403275B2 cover?
The subject technology determines whether a table is sufficiently clustered. The subject technology in response to determining the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the …
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 2, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).