Reclustering of database tables based on peaks and widths

US10956394B2 · US · B2

Patent metadata
Publication number: US-10956394-B2
Application number: US-202016941215-A
Country: US
Kind code: B2
Filing date: Jul 28, 2020
Priority date: Jul 17, 2018
Publication date: Mar 23, 2021
Grant date: Mar 23, 2021

Abstract

The subject technology determines whether a table is sufficiently clustered. The subject technology, in response to determining that the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the data structure. The subject technology sorts each of one or more peaks in the data structure based on height. The subject technology sorts overlapping micro-partitions based on width. The subject technology selects based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths.
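
The selection steps in the abstract can be illustrated with a short Python sketch. It is not Snowflake's implementation and assumes a single numeric cluster key. `MicroPartition`, `stabbing_profile`, `find_peaks`, `select_for_reclustering`, and the `max_parts` cap are hypothetical names. Because the abstract does not define a peak, the sketch assumes one: a run of equal overlap depth that is taller than both of its neighbours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Hypothetical stand-in: a micro-partition's min/max cluster-key values."""
    name: str
    lo: float  # minimum endpoint
    hi: float  # maximum endpoint

    @property
    def width(self) -> float:
        return self.hi - self.lo


def stabbing_profile(parts):
    """Naive stabbing count array: (key, number of partitions containing that key).

    Samples each distinct endpoint plus the midpoint between neighbours, so a gap
    between partitions appears as a valley. O(n^2), chosen for clarity.
    """
    ends = sorted({p.lo for p in parts} | {p.hi for p in parts})
    samples = []
    for a, b in zip(ends, ends[1:]):
        samples += [a, (a + b) / 2]
    samples += ends[-1:]
    return [(x, sum(p.lo <= x <= p.hi for p in parts)) for x in samples]


def find_peaks(profile):
    """Return (start, end, height) for each plateau taller than both neighbours."""
    peaks, i, n = [], 0, len(profile)
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1][1] == profile[i][1]:
            j += 1  # extend the run of equal height
        left = profile[i - 1][1] if i > 0 else 0
        right = profile[j + 1][1] if j + 1 < n else 0
        if profile[i][1] > max(left, right):
            peaks.append((profile[i][0], profile[j][0], profile[i][1]))
        i = j + 1
    return peaks


def select_for_reclustering(parts, max_parts):
    """Tallest peaks first; within each peak, widest overlapping partitions first."""
    peaks = sorted(find_peaks(stabbing_profile(parts)),
                   key=lambda pk: pk[2], reverse=True)
    selected, seen = [], set()
    for start, end, _height in peaks:
        overlapping = [p for p in parts
                       if p.lo <= end and p.hi >= start and p.name not in seen]
        overlapping.sort(key=lambda p: p.width, reverse=True)
        for p in overlapping:
            if len(selected) >= max_parts:
                return selected
            selected.append(p)
            seen.add(p.name)
    return selected


parts = [MicroPartition("A", 0, 10), MicroPartition("B", 2, 4),
         MicroPartition("C", 3, 5), MicroPartition("D", 20, 22),
         MicroPartition("E", 21, 30)]
print([p.name for p in select_for_reclustering(parts, 4)])  # ['A', 'B', 'C', 'E']
```

Here the tallest peak (depth 3 over keys 3 to 4) contributes A, B, and C, widest first. The second peak (depth 2 over keys 21 to 22) contributes E before D because E is wider. The naive counting loop would presumably be replaced by a sweep or an interval tree at real table sizes; claim 4 names both a stabbing count array and an interval tree as options.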

Claims (preview)

What is claimed is:

1. A method performed by a database platform comprising at least one hardware processor, the method comprising: determining whether a table is sufficiently clustered; in response to determining that the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered, the selecting comprising: constructing a data structure for the table; extracting minimum and maximum endpoints for each micro-partition in the data structure; sorting each of one or more peaks in the data structure based on height; sorting overlapping micro-partitions based on width; and selecting based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths; and reclustering the selected one or more micro-partitions of the table.

2. The method of claim 1, further comprising: defining a constant micro-partition having equivalent minimum and maximum values for a cluster key column; and removing the constant micro-partition from the selected one or more micro-partitions of the table prior to the reclustering.

3. The method of claim 1, wherein the selecting further comprises: identifying the one or more peaks in the data structure as those that are taller than a predefined threshold; and identifying the overlapping micro-partitions within each of the one or more peaks.

4. The method of claim 1, wherein the data structure comprises a stabbing count array or an interval tree.

5. The method of claim 1, further comprising determining a budget for allocating resources to perform reclustering operations.

6. The method of claim 5, wherein the determining whether the table is sufficiently clustered is based at least in part on the budget.

7. The method of claim 1, further comprising partitioning the selected one or more micro-partitions of the table into one or more batches each comprising a grouping of micro-partitions.

8. The method of claim 1, further comprising receiving an indication that a data modification task has been executed on the table, wherein the executing of the data modification task on the table comprises ingesting new micro-partitions into the table.

9. The method of claim 1, further comprising entering a catch-up mode in which, based at least in part on a high proportion of database partitions being within lower levels of the table, reclustering operations are performed.

10. The method of claim 1, further comprising entering a stable mode in which, based at least in part on a low proportion of database partitions being within lower levels of the table, no reclustering operations are performed.

11. A system comprising: at least one hardware processor; and a memory device including instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: determining whether a table is sufficiently clustered; in response to determining that the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered, the selecting comprising: constructing a data structure for the table; extracting minimum and maximum endpoints for each micro-partition in the data structure; sorting each of one or more peaks in the data structure based on height; sorting overlapping micro-partitions based on width; and selecting based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths; and reclustering the selected one or more micro-partitions of the table.

12. The system of claim 11, the operations further comprising: defining a constant micro-partition having equivalent minimum and maximum values for a cluster key column; and removing the constant micro-partition from the selected one or more micro-partitions of the table prior to the reclustering.

13. The system of claim 11, wherein the selecting further comprises: identifying the one or more peaks in the data structure as those that are taller than a predefined threshold; and identifying the overlapping micro-partitions within each of the one or more peaks.

14. The system of claim 11, wherein the data structure comprises a stabbing count array or an interval tree.

15. The system of claim 11, the operations further comprising determining a budget for allocating resources to perform reclustering operations.

16. The system of claim 15, wherein the determining whether the table is sufficiently clustered is based at least in part on the budget.

17. The system of claim 11, the operations further comprising partitioning the selected one or more micro-partitions of the table into one or more batches each comprising a grouping of micro-partitions.

18. The system of claim 11, the operations further comprising receiving an indication that a data modification task has been executed on the table, wherein the executing of the data modification task on the table comprises ingesting new micro-partitions into the table.

19. The system of claim 11, the operations further comprising entering a catch-up mode in which, based at least in part on a high proportion of database partitions being within lower levels of the table, reclustering operations are performed.

20. The system of claim 11, the operations further comprising entering a stable mode in which, based at least in part on a low proportion of database partitions being within lower levels of the table, no reclustering operations are performed.

21. A non-transitory computer-readable medium comprising instructions that, when executed by at least one hardware processor, cause the at least one hardware processor to perform operations comprising: determining whether a table is sufficiently clustered; in response to determining that the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered, the selecting comprising: constructing a data structure for the table; extracting minimum and maximum endpoints for each micro-partition in the data structure; sorting each of one or more peaks in the data structure based on height; sorting overlapping micro-partitions based on width; and selecting based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths; and reclustering the selected one or more micro-partitions of the table.

22. The non-transitory computer-readable medium of claim 21, the operations further comprising: defining a constant micro-partition having equivalent minimum and maximum values for a cluster key column; and removing the constant micro-partition from the selected one or more micro-partitions of the table prior to the reclustering.

23. The non-transitory computer-readable medium of claim 21, wherein the selecting further comprises: identifying the one or more peaks in the data structure as those that are taller than a predefined threshold; and identifying the overlapping micro-partitions within each of the one or more peaks.

24. The non-transitory computer-readable medium of claim 21, wherein the data structure comprises a stabbing count array or an interval tree.

25. The non-transitory computer-readable medium of claim 21, the operatio
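
Dependent claims 2, 3, and 7 refine the selection in three ways. Constant micro-partitions (equal minimum and maximum on the cluster key column) are removed, peaks are only those taller than a predefined threshold, and the selected micro-partitions are grouped into batches. The Python sketch below illustrates those three steps under stated assumptions. Peaks are given as `(start, end, height)` tuples, the threshold value is arbitrary, and batches have a fixed size. All function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Hypothetical stand-in: a micro-partition's min/max cluster-key values."""
    name: str
    lo: int
    hi: int


def peaks_above_threshold(peaks, threshold):
    """Claim 3 sketch: keep (start, end, height) peaks taller than the threshold."""
    return [pk for pk in peaks if pk[2] > threshold]


def overlapping_in_peak(parts, peak):
    """Claim 3 sketch: partitions whose [lo, hi] range intersects the peak's range."""
    start, end, _height = peak
    return [p for p in parts if p.lo <= end and p.hi >= start]


def drop_constant(selected):
    """Claim 2 sketch: remove partitions whose min equals max on the cluster key."""
    return [p for p in selected if p.lo != p.hi]


def make_batches(selected, batch_size):
    """Claim 7 sketch: split into batches of at most batch_size partitions.

    Sorting by key range first is an assumed grouping rule, not one the claims state.
    """
    ordered = sorted(selected, key=lambda p: (p.lo, p.hi))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


parts = [MicroPartition("A", 0, 10), MicroPartition("B", 2, 4),
         MicroPartition("K", 3, 3), MicroPartition("C", 3, 5)]
peaks = [(3, 4, 3), (21, 22, 2)]  # hypothetical output of a peak finder

tall = peaks_above_threshold(peaks, threshold=2)  # only (3, 4, 3) remains
chosen = drop_constant(overlapping_in_peak(parts, tall[0]))  # K is constant
batches = make_batches(chosen, batch_size=2)
print([[p.name for p in b] for b in batches])  # [['A', 'B'], ['C']]
```

One plausible reason for claim 2, which the claims do not state, is that a constant micro-partition already covers a single key value. Reclustering it therefore cannot narrow its range.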

Assignees

Snowflake Inc

Classifications (CPC titles)

  • Clustering or classification

  • Data partitioning, e.g. horizontal or vertical partitioning

  • Indexing; Data structures therefor; Storage structures

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409)

  • Tablespace storage structures; Management thereof

Frequently asked questions

What does patent US10956394B2 cover?
It covers determining whether a database table is sufficiently clustered and, if it is not, selecting micro-partitions to recluster. The selection builds a data structure from each micro-partition's minimum and maximum endpoints. It prefers micro-partitions within the tallest peaks and, among overlapping micro-partitions, the widest ones. The claims cover a method, a system, and a non-transitory computer-readable medium.
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Published Mar 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.