Reclustering of database tables using level information

US10963443B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10963443-B2
Application numberUS-202016918591-A
CountryUS
Kind codeB2
Filing dateJul 1, 2020
Priority dateJul 17, 2018
Publication dateMar 30, 2021
Grant dateMar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automatic clustering of a database table is disclosed. A method for automatic clustering of a database table includes receiving an indication that a data modification task has been executed on a table and determining whether the table is sufficiently clustered. The method includes, in response to determining the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered. The method includes assigning each of the one or more micro-partitions to an execution node to be reclustered.

First claim

Opening claim text (preview).

What is claimed is: 1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: dividing a database table into one or more levels, the database table comprising a plurality of database partitions, each level corresponding to a number of times data in that level has been reclustered; calculating a maximum number of levels for the database table based on (i) a batch size selected for a clustering execution and (ii) a number of database partitions in the plurality of database partitions, the maximum number of levels representing an upper bound on a number of times any piece of data in the plurality of database partitions is reclustered; determining defined boundaries of a subset of the plurality of database partitions from at least one level of the one or more levels; selecting one or more micro-batches of database partitions from the subset of the plurality of database partitions; and reclustering the selected one or more micro-batches of database partitions according to a clustering key. 2. The method of claim 1 , wherein: the selecting of the one or more micro-batches of database partitions from the subset of the plurality of database partitions is based on a sorting order of the subset of the plurality of database partitions; and the reclustering of the selected one or more micro-batches of database partitions according to the clustering key comprises providing at least one of the selected one or more micro-batches of database partitions to one or more clustering workers to be reclustered according to the clustering key. 3. The method of claim 2 , wherein the sorting order is based on clustering key metadata. 4. The method of claim 2 , wherein the subset of the plurality of database partitions comprises one or more worst clustered partitions with respect to a bounded range according to the clustering key. 5. The method of claim 2 , further comprising receiving an indication that at least one of the provided one or more micro-batches has been reclustered according to the clustering key by at least one of the one or more clustering workers. 6. The method of claim 2 , further comprising maintaining a table state of the database table, the table state comprising one or more types of information selected from the group consisting of: clustering information for each level of the one or more levels, level information for each database partition of the plurality of database partitions, and a clustering state of the database table. 7. The method of claim 1 , wherein the dividing of the database table into the one or more levels is based on the maximum number of levels. 8. The method of claim 1 , wherein the database partitions in the plurality of database partitions comprise a plurality of micro-partitions, the micro-partitions comprising immutable storage devices that cannot be updated in-place. 9. The method of claim 1 , the database partitions in the plurality of database partitions comprising a plurality of micro-partitions, the method further comprising: receiving an indication that a data modification task has been executed on the database table, wherein the executing of the data modification task comprises ingesting new micro-partitions into the database table; identifying a proportion of micro-partitions in lower levels of the one or more levels of the database table; determining whether a high proportion of micro-partitions is in the lower levels; and based on determining that a high proportion of micro-partitions is in the lower levels, entering a catch-up mode in which reclustering operations are performed according to the clustering key. 10. The method of claim 9 , further comprising, based on determining that a high proportion of micro-partitions is not in the lower levels, entering a stable mode in which no reclustering operations are performed. 11. A system comprising: at least one hardware processor; and a memory device storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: dividing a database table into one or more levels, the database table comprising a plurality of database partitions, each level corresponding to a number of times data in that level has been reclustered; calculating a maximum number of levels for the database table based on (i) a batch size selected for a clustering execution and (ii) a number of database partitions in the plurality of database partitions, the maximum number of levels representing an upper bound on a number of times any piece of data in the plurality of database partitions is reclustered; determining defined boundaries of a subset of the plurality of database partitions from at least one level of the one or more levels; selecting one or more micro-batches of database partitions from the subset of the plurality of database partitions; and reclustering the selected one or more micro-batches of database partitions according to a clustering key. 12. The system of claim 11 , wherein: the selecting of the one or more micro-batches of database partitions from the subset of the plurality of database partitions is based on a sorting order of the subset of the plurality of database partitions; and the reclustering of the selected one or more micro-batches of database partitions according to the clustering key comprises providing at least one of the selected one or more micro-batches of database partitions to one or more clustering workers to be reclustered according to the clustering key. 13. The system of claim 12 , wherein the sorting order is based on clustering key metadata. 14. The system of claim 12 , wherein the subset of the plurality of database partitions comprises one or more worst clustered partitions with respect to a bounded range according to the clustering key. 15. The system of claim 12 , the operations further comprising receiving an indication that at least one of the provided one or more micro-batches has been reclustered according to the clustering key by at least one of the one or more clustering workers. 16. The system of claim 12 , the operations further comprising maintaining a table state of the database table, the table state comprising one or more types of information selected from the group consisting of: clustering information for each level of the one or more levels, level information for each database partition of the plurality of database partitions, and a clustering state of the database table. 17. The system of claim 11 , wherein the dividing of the database table into the one or more levels is based on the maximum number of levels. 18. The system of claim 11 , wherein the database partitions in the plurality of database partitions comprise a plurality of micro-partitions, the micro-partitions comprising immutable storage devices that cannot be updated in-place. 19. The system of claim 11 , the database partitions in the plurality of database partitions comprising a plurality of micro-partitions, the operations further comprising: receiving an indication that a data modification task has been executed on the database table, wherein the executing of the data modification task comprises ingesting new micro-partitions into the database table; identifying a proportion of micro-partitions in lower levels of the one or more levels of the database table; determining whether a high proportion of micro-partitions Preps in the lower levels; and based on determining that a high proportion

Assignees

Inventors

Classifications

  • Clustering or classification · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

  • Tablespace storage structures; Management thereof · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10963443B2 cover?
Automatic clustering of a database table is disclosed. A method for automatic clustering of a database table includes receiving an indication that a data modification task has been executed on a table and determining whether the table is sufficiently clustered. The method includes, in response to determining the table is not sufficiently clustered, selecting one or more micro-partitions of the …
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 30 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).