Selecting partitions for reclustering based on distribution of overlapping partitions

US11544244B2 · US · B2

Patent metadata
Publication number: US-11544244-B2
Application number: US-202217654296-A
Country: US
Kind code: B2
Filing date: Mar 10, 2022
Priority date: Jul 17, 2018
Publication date: Jan 3, 2023
Grant date: Jan 3, 2023

Abstract

Official abstract text for this publication.

Disclosed herein are embodiments of systems and methods for selecting partitions for reclustering based on distribution of overlapping partitions. In an example, a database platform makes a determination to at least partially recluster a database table that includes data stored across a plurality of partitions. The database platform responsively selects a subset of the partitions. The selecting of the subset includes identifying a point on a domain of a clustering key that corresponds to a local maximum of overlapping partitions, and also includes selecting the subset from among a group of overlapping partitions. The group includes at least one partition that overlaps the identified point on the domain of the clustering key. Each partition in the selected subset is above a reduction goal of overlapping partitions. The database platform at least partially reclusters the selected subset based on the clustering key.
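The selection step in the abstract can be illustrated with a small sketch. This is not the patented implementation; it is a minimal illustration under stated assumptions. Each partition is modeled as a closed interval `[lo, hi]` on an integer clustering-key domain, the peak is found with a sweep over interval endpoints (this finds the global maximum, which is also a local maximum), and "above the reduction goal" is interpreted as the partitions sitting above position `reduction_goal` in a stack ordered from narrowest to widest. The names `Partition`, `peak_overlap`, and `select_for_reclustering`, and the stacking order itself, are hypothetical.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    name: str
    lo: int  # minimum clustering-key value stored in the partition (inclusive)
    hi: int  # maximum clustering-key value stored in the partition (inclusive)


def peak_overlap(partitions: list[Partition]) -> tuple[int, int]:
    """Return (point, depth) for a key value where the most partitions overlap."""
    if not partitions:
        raise ValueError("at least one partition is required")
    events = []
    for p in partitions:
        events.append((p.lo, 0, +1))  # tag 0: opens sort before closes at the same key
        events.append((p.hi, 1, -1))  # tag 1: closed intervals still overlap at hi
    events.sort()
    depth = 0
    best_depth = 0
    best_point = events[0][0]
    for key, _, delta in events:
        depth += delta
        if depth > best_depth:  # keep the first point that reaches the peak depth
            best_depth, best_point = depth, key
    return best_point, best_depth


def select_for_reclustering(
    partitions: list[Partition], reduction_goal: int
) -> tuple[int, list[Partition]]:
    """Pick the partitions at the peak that lie above the reduction goal."""
    point, _ = peak_overlap(partitions)
    group = [p for p in partitions if p.lo <= point <= p.hi]
    # ASSUMPTION: stack the group narrowest-first, so wider partitions sit higher.
    stack = sorted(group, key=lambda p: (p.hi - p.lo, p.name))
    return point, stack[reduction_goal:]


if __name__ == "__main__":
    table = [
        Partition("A", 0, 10),
        Partition("B", 5, 15),
        Partition("C", 8, 12),
        Partition("D", 20, 30),
        Partition("E", 9, 25),
    ]
    point, subset = select_for_reclustering(table, reduction_goal=2)
    print(point, [p.name for p in subset])
```

With a reduction goal of 2, only the partitions stacked above the second position at the peak are returned, so a reclustering pass would touch a bounded subset of the table rather than every overlapping partition.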

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: making an incremental-reclustering determination to at least partially recluster a database table, the database table comprising table data stored across a plurality of partitions, the database table further comprising a clustering key; selecting, in response to making the incremental-reclustering determination, a subset of the plurality of partitions, the selecting of the subset comprising: identifying a point on a domain of the clustering key that corresponds to a local maximum number of overlapping partitions; and selecting the subset from among a group of overlapping partitions, the group of overlapping partitions including at least one partition that overlaps the identified point on the domain of the clustering key, each partition in the selected subset being above a reduction goal that is measured in number of overlapping partitions; and at least partially reclustering the selected subset based on the clustering key.

2. The method of claim 1, wherein making the incremental-reclustering determination comprises making the incremental-reclustering determination based on a budget of one or more available computing resources.

3. The method of claim 1, wherein the local maximum number of overlapping partitions is also a global maximum number of overlapping partitions on the domain of the clustering key.

4. The method of claim 1, wherein the local maximum number of overlapping partitions is not a global maximum number of overlapping partitions on the domain of the clustering key.

5. The method of claim 1, wherein selecting the subset comprises selecting partitions from among an uppermost subgroup of the group of overlapping partitions that are above the reduction goal.

6. The method of claim 1, wherein selecting the subset comprises selecting partitions from among the group of overlapping partitions that are both above the reduction goal and that have the greatest widths among the partitions in the group of overlapping partitions that are above the reduction goal.

7. The method of claim 1, wherein at least partially reclustering the selected subset based on the clustering key comprises reclustering the entire selected subset based on the clustering key.

8. The method of claim 7, wherein reclustering the entire selected subset comprises distributing the entire selected subset among a plurality of workers to be reclustered.

9. The method of claim 1, wherein the making of the incremental-reclustering determination to at least partially recluster the database table is based on determining that at least a threshold number of modifications have been made to the database table since a previous reclustering operation.

10. The method of claim 1, wherein the making of the incremental-reclustering determination to at least partially recluster the database table is based on one or more clustering metrics of the database table.

11. A database platform comprising: at least one hardware processor; and one or more non-transitory computer readable storage media containing instructions that, when executed by the at least one hardware processor, cause the database platform to perform operations comprising: making an incremental-reclustering determination to at least partially recluster a database table, the database table comprising table data stored across a plurality of partitions, the database table further comprising a clustering key; selecting, in response to making the incremental-reclustering determination, a subset of the plurality of partitions, the selecting of the subset comprising: identifying a point on a domain of the clustering key that corresponds to a local maximum number of overlapping partitions; and selecting the subset from among a group of overlapping partitions, the group of overlapping partitions including at least one partition that overlaps the identified point on the domain of the clustering key, each partition in the selected subset being above a reduction goal that is measured in number of overlapping partitions; and at least partially reclustering the selected subset based on the clustering key.

12. The database platform of claim 11, wherein making the incremental-reclustering determination comprises making the incremental-reclustering determination based on a budget of one or more available computing resources.

13. The database platform of claim 11, wherein the local maximum number of overlapping partitions is also a global maximum number of overlapping partitions on the domain of the clustering key.

14. The database platform of claim 11, wherein the local maximum number of overlapping partitions is not a global maximum number of overlapping partitions on the domain of the clustering key.

15. The database platform of claim 11, wherein selecting the subset comprises selecting partitions from among an uppermost subgroup of the group of overlapping partitions that are above the reduction goal.

16. The database platform of claim 11, wherein selecting the subset comprises selecting partitions from among the group of overlapping partitions that are both above the reduction goal and that have the greatest widths among the partitions in the group of overlapping partitions that are above the reduction goal.

17. The database platform of claim 11, wherein at least partially reclustering the selected subset based on the clustering key comprises reclustering the entire selected subset based on the clustering key.

18. The database platform of claim 17, wherein reclustering the entire selected subset comprises distributing the entire selected subset among a plurality of workers to be reclustered.

19. The database platform of claim 11, wherein the making of the incremental-reclustering determination to at least partially recluster the database table is based on determining that at least a threshold number of modifications have been made to the database table since a previous reclustering operation.

20. The database platform of claim 11, wherein the making of the incremental-reclustering determination to at least partially recluster the database table is based on one or more clustering metrics of the database table.

21. One or more non-transitory computer readable storage media containing instructions that, when executed by at least one hardware processor of a database platform, cause the database platform to perform operations comprising: making an incremental-reclustering determination to at least partially recluster a database table, the database table comprising table data stored across a plurality of partitions, the database table further comprising a clustering key; selecting, in response to making the incremental-reclustering determination, a subset of the plurality of partitions, the selecting of the subset comprising: identifying a point on a domain of the clustering key that corresponds to a local maximum number of overlapping partitions; and selecting the subset from among a group of overlapping partitions, the group of overlapping partitions including at least one partition that overlaps the identified point on the domain of the clustering key, each partition in the selected subset being above a reduction goal that is measured in number of overlapping partitions; and at least partially reclustering the selected subset based on the clustering key.

22. The one or more non-transitory computer readabl
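The dependent claims name three possible bases for the incremental-reclustering determination: a resource budget (claim 2), a threshold number of modifications since the previous reclustering (claim 9), and one or more clustering metrics (claim 10). The sketch below shows one way such a decision could be combined in code. The claims state these as separate options, so gating on the budget and OR-ing the other two signals is an assumption made for illustration. `TableState`, `should_recluster`, the average-overlap-depth metric, and all default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    modifications_since_recluster: int  # changes since the previous reclustering
    average_overlap_depth: float        # ASSUMED example of a clustering metric


def should_recluster(
    state: TableState,
    *,
    modification_threshold: int = 1000,  # hypothetical default
    depth_threshold: float = 4.0,        # hypothetical default
    budget_units: int = 1,               # hypothetical unit of available compute
) -> bool:
    """Decide whether to start an incremental reclustering pass."""
    # Budget gate (claim 2): with no available resources, never recluster.
    if budget_units <= 0:
        return False
    # Modification count (claim 9) or clustering metric (claim 10) can trigger.
    too_many_changes = state.modifications_since_recluster >= modification_threshold
    poorly_clustered = state.average_overlap_depth > depth_threshold
    return too_many_changes or poorly_clustered


if __name__ == "__main__":
    print(should_recluster(TableState(1500, 1.0)))
    print(should_recluster(TableState(10, 2.0)))
```

Keeping the thresholds as keyword arguments makes it easy to experiment with different trigger policies without changing the decision logic.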

Assignees

Snowflake Inc

Inventors

Classifications

  • Tablespace storage structures; Management thereof · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Clustering or classification · CPC title

  • G06F7/08 · Primary

    Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11544244B2 cover?
Disclosed herein are embodiments of systems and methods for selecting partitions for reclustering based on distribution of overlapping partitions. In an example, a database platform makes a determination to at least partially recluster a database table that includes data stored across a plurality of partitions. The database platform responsively selects a subset of the partitions. The selecting…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 3, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).