Incremental clustering maintenance of a table

US10817540B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10817540-B2
Application numberUS-201715694436-A
CountryUS
Kind codeB2
Filing dateSep 1, 2017
Priority dateSep 2, 2016
Publication dateOct 27, 2020
Grant dateOct 27, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes storing table data for a table in a plurality of partitions and for maintaining approximate or good enough clustering. The method includes creating one or more new partitions based on changes to the table, wherein at least one of the one or more new partitions overlap with each other or previous partitions resulting in a decrease in a degree of clustering of the table. The method includes determining that a degree of clustering of the table data is below a clustering threshold. The method further includes reclustering one or more partitions of the table to improve the degree of clustering of the table in response to one or more of: determining that the degree of clustering has fallen below the clustering threshold, an explicit user command from a user, and/or as part of a DML command. Reclustering may be performed in incremental steps to iteratively improve clustering.

First claim

Opening claim text (preview).

The invention claimed is: 1. A computer database implemented method, the method comprising: storing table data for a table in a plurality of partitions, each partition comprising a portion of the table data for the table, the partitions being at least partially clustered based on one or more attributes in the table; creating one or more new partitions based on changes to the table, at least one of the one or more new partitions overlapping with respect to values of the one or more attributes with at least one of another new partition and a previous partition, the creating of the one or more new partitions resulting in a decrease in a degree of clustering of the table; determining, after creating the one or more new partitions, that the degree of clustering of the table is below a clustering threshold, the degree of clustering of the table reflecting a degree of overlap across the plurality of partitions of values of the one or more attributes, the degree of clustering and the degree of overlap being inversely related, the clustering threshold corresponding to a clustering ratio, the clustering ratio determined by at least a proportion of rows in a layout of the table that satisfy an ordering criteria based at least in part a particular attribute of the one or more attributes; and responsive to determining that the degree of clustering of the table is below the clustering threshold, increasing the degree of clustering of the table by reclustering one or more partitions of the table. 2. The method of claim 1 , further comprising determining the degree of clustering of the table based on one or more of: how many partitions overlap other partitions of the table; a degree of overlap of one or more partitions with other partitions of the table; how many partitions overlap for one or more attribute values; each individual depth of the partitions; a distribution of depth of the partitions; and an average depth of the partitions, wherein the depth comprises a number of partitions that overlap for a specific attribute value for the one or more attributes. 3. The method of claim 1 , wherein determining that the degree of clustering of the table is below the clustering threshold comprises determining one or both of: one or more of an amount, frequency, and type of data manipulation language (DML) statements performed on the table; and an amount of new data added to the table. 4. The method of claim 1 , wherein determining that the degree of clustering of the table is below the clustering threshold comprises determining that an execution time of an example query exceeds a threshold query execution time. 5. The method of claim 1 , wherein determining that the degree of clustering of the table is below the clustering threshold is based on pruning effectiveness during query compilation, and filter selectivity during query execution. 6. The method of claim 1 , wherein reclustering comprises selecting two or more partitions as merge candidates. 7. The method of claim 6 , wherein selecting the two or more partitions as the merge candidates comprises selecting based on one or more of: the two or more partitions containing overlapping values for the one or more attributes; a degree to which the two or more partitions overlap; a depth of selected partitions; a distribution of selected partitions; a number of times a partition has been reclustered; a resource budget; a width of values corresponding to the one or more attributes covered by the two or more partitions; and whether a partition is ideally clustered based on the one or more attributes. 8. The method of claim 6 , wherein selecting the two or more partitions as the merge candidates comprises not selecting partitions that: do not overlap any other partitions in the table; or do not overlap any other partitions in the table by more than an overlap threshold. 9. The method of claim 6 , wherein selecting the two or more partitions as the merge candidates comprises not selecting partitions comprising row values having an identical value for the one or more attributes. 10. The method of claim 1 , wherein reclustering incrementally increases the degree of clustering of the table. 11. The method of claim 1 , wherein reclustering comprises reclustering based on one or more of a reclustering resource budget, a number of partitions, data size, and available computing resources. 12. The method of claim 1 , wherein reclustering comprises merging two or more partitions. 13. The method of claim 1 , wherein, both before and after the changes to the table, the table is not ideally clustered, wherein the table being ideally clustered comprises one or both of the following being true of each of the partitions of the table: the partition comprises no overlaps with one or more other partitions in ranges of values corresponding to the one or more attributes; and all rows of the partition comprise the same value for an attribute of the one or more attributes. 14. A system for incremental clustering maintenance of database data, the system comprising: one or more processors; and computer readable storage media storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: storing table data for a table in a plurality of partitions, each partition comprising a portion of the table data for the table, the partitions being at least partially clustered based on one or more attributes in the table; creating one or more new partitions based on changes to the table, at least one of the one or more new partitions overlapping with respect to values of the one or more attributes with at least one of another new partition and a previous partition, the creating of the one or more new partitions resulting in a decrease in a degree of clustering of the table; determining, after creating the one or more new partitions, that the degree of clustering of the table is below a clustering threshold, the degree of clustering of the table reflecting a degree of overlap across the plurality of partitions of values of the one or more attributes, the degree of clustering and the degree of overlap being inversely related, the clustering threshold corresponding to a clustering ratio, the clustering ratio determined by at least a proportion of rows in a layout of the table that satisfy an ordering criteria based at least in part a particular attribute of the one or more attributes; and responsive to determining that the degree of clustering of the table is below the clustering threshold, increasing the degree of clustering of the table by reclustering one or more partitions of the table. 15. The system of claim 14 , the operations further comprising determining the degree of clustering of the table based on one or more of: how many partitions overlap other partitions of the table; a degree of overlap of one or more partitions with other partitions of the table; how many partitions overlap for one or more attribute values; each individual depth of the partitions; a distribution of depth of the partitions; and an average depth of the partitions, wherein the depth comprises a number of partitions that overlap for a specific attribute value for the one or more attributes. 16. The system of claim 14 , wherein determining that the degree of clustering of the table is below the clustering threshold comprises determining that an execution time of an example query exceeds a threshold query execution time. 17. The system of claim 14 , wherein reclustering comprises selecting two or

Assignees

Inventors

Classifications

  • Query processing · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Schema design and management · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • G06F16/285Primary

    Clustering or classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10817540B2 cover?
A method includes storing table data for a table in a plurality of partitions and for maintaining approximate or good enough clustering. The method includes creating one or more new partitions based on changes to the table, wherein at least one of the one or more new partitions overlap with each other or previous partitions resulting in a decrease in a degree of clustering of the table. The met…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 27 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).