Incremental clustering of database tables

US10853345B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10853345-B2
Application number	US-201916716989-A
Country	US
Kind code	B2
Filing date	Dec 17, 2019
Priority date	Jul 17, 2018
Publication date	Dec 1, 2020
Grant date	Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Automatic clustering of a database table is disclosed. A method for automatic clustering of a database table includes receiving an indication that a data modification task has been executed on a table and determining whether the table is sufficiently clustered. The method includes, in response to determining the table is not sufficiently clustered, selecting one or more micro-partitions of the table to be reclustered. The method includes assigning each of the one or more micro-partitions to an execution node to be reclustered.
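
The flow in the abstract (receive an indication, test whether the table is sufficiently clustered, select micro-partitions, assign them to execution nodes) can be illustrated with a minimal Python sketch. It is not the patented implementation. The sufficiency test borrows the level-proportion idea from claim 1 on this page; the `MicroPartition` type, the `level` field, the meaning of "lower level" (level 0 here), the 0.5 threshold, and the round-robin assignment policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class MicroPartition:
    name: str
    level: int  # hypothetical clustering level; 0 = freshly ingested


def lower_level_proportion(partitions, lower_level_cutoff=0):
    """Fraction of micro-partitions at or below the (assumed) cutoff level."""
    if not partitions:
        return 0.0
    low = sum(1 for p in partitions if p.level <= lower_level_cutoff)
    return low / len(partitions)


def is_sufficiently_clustered(partitions, threshold=0.5):
    """The table counts as clustered unless the proportion exceeds the threshold."""
    return lower_level_proportion(partitions) <= threshold


def select_for_reclustering(partitions, lower_level_cutoff=0):
    """Assumed selection rule: pick the micro-partitions in the lower levels."""
    return [p for p in partitions if p.level <= lower_level_cutoff]


def assign_to_nodes(selected, nodes):
    """Round-robin assignment; the abstract does not specify a policy."""
    assignment = {node: [] for node in nodes}
    for partition, node in zip(selected, cycle(nodes)):
        assignment[node].append(partition.name)
    return assignment


def on_data_modification(partitions, nodes, threshold=0.5):
    """Called after a data modification task; returns node -> partition names."""
    if is_sufficiently_clustered(partitions, threshold):
        return {}
    return assign_to_nodes(select_for_reclustering(partitions), nodes)
```

Calling `on_data_modification` with three level-0 micro-partitions, one level-1 micro-partition, and two nodes spreads the three level-0 micro-partitions across the nodes; a table with no level-0 micro-partitions yields an empty assignment.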

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving an indication that a data modification task has been executed on a table, the data modification task comprising ingesting new micro-partitions into the table; determining, subsequent to the execution of the data modification task on the table, that the table is not sufficiently clustered, the determining comprising: retrieving level information for the table; identifying, based on the retrieved level information, a proportion of micro-partitions in lower levels of the table; and determining that the identified proportion of micro-partitions in lower levels of the table exceeds a threshold, and responsively entering a catch-up mode in which reclustering operations are performed; selecting one or more micro-partitions of the table to be reclustered; and reclustering the one or more selected micro-partitions.

2. The method of claim 1, wherein determining that the table is not sufficiently clustered comprises determining one or more of: at least a threshold number of rows was added to the table by the execution of the data modification task on the table; at least a threshold number of rows was deleted from the table by the execution of the data modification task on the table; and at least a threshold number of rows was modified in the table by the execution of the data modification task on the table.

3. The method of claim 1, wherein selecting one or more micro-partitions of the table to be reclustered comprises: identifying a constant micro-partition, the constant micro-partition having equivalent minimum and maximum values for a cluster key column; and selecting, as the one or more micro-partitions of the table to be reclustered, one or more micro-partitions of the table other than the identified constant micro-partition.

4. The method of claim 1, further comprising: constructing a stabbing count array for the table; identifying one or more peaks in the stabbing count array that are taller than a predefined threshold; identifying micro-partitions within the one or more identified peaks; and sorting the identified micro-partitions based on width, wherein selecting the one or more micro-partitions of the table to be reclustered comprises selecting a subset of the identified micro-partitions having the greatest widths.

5. The method of claim 4, further comprising: sorting each of the one or more identified peaks in the stabbing count array based on height, wherein identifying micro-partitions within the one or more identified peaks comprises identifying micro-partitions within a subset of the one or more identified peaks having the greatest heights.

6. The method of claim 1, further comprising defining a budget for allocating processing resources to performing reclustering operations, wherein determining that the table is not sufficiently clustered is based at least in part on the defined budget.

7. The method of claim 1, further comprising partitioning the one or more selected micro-partitions of the table into multiple batches of micro-partitions to be reclustered.

8. The method of claim 1, wherein selecting the one or more micro-partitions of the table to be reclustered comprises: determining a maximum number of levels for the table based at least on a size of the table; dividing the table into levels; selecting a micro-batch of micro-partitions within each level; and selecting micro-partitions from the micro-batch.

9. The method of claim 1, wherein reclustering the one or more selected micro-partitions comprises assigning each of the one or more selected micro-partitions to an execution node to be reclustered.

10. A system comprising: at least one processor; and one or more non-transitory computer readable storage media containing instructions executable by the at least one processor for causing the at least one processor to perform operations comprising: receiving an indication that a data modification task has been executed on a table, the data modification task comprises ingesting new micro-partitions into the table; determining, subsequent to the execution of the data modification task on the table, that the table is not sufficiently clustered, the determining comprising: retrieving level information for the table; identifying, based on the retrieved level information, a proportion of micro-partitions in lower levels of the table; and determining that the identified proportion of micro-partitions in lower levels of the table exceeds a threshold, and responsively entering a catch-up mode in which reclustering operations are performed; selecting one or more micro-partitions of the table to be reclustered; and reclustering the one or more selected micro-partitions.

11. The system of claim 10, wherein determining that the table is not sufficiently clustered comprises determining one or more of: at least a threshold number of rows was added to the table by the execution of the data modification task on the table; at least a threshold number of rows was deleted from the table by the execution of the data modification task on the table; and at least a threshold number of rows was modified in the table by the execution of the data modification task on the table.

12. The system of claim 10, wherein selecting one or more micro-partitions of the table to be reclustered comprises: identifying a constant micro-partition, the constant micro-partition having equivalent minimum and maximum values for a cluster key column; and selecting, as the one or more micro-partitions of the table to be reclustered, one or more micro-partitions of the table other than the identified constant micro-partition.

13. The system of claim 10, the operations further comprising: constructing a stabbing count array for the table; identifying one or more peaks in the stabbing count array that are taller than a predefined threshold; identifying micro-partitions within the one or more identified peaks; and sorting the identified micro-partitions based on width, wherein selecting the one or more micro-partitions of the table to be reclustered comprises selecting a subset of the identified micro-partitions having the greatest widths.

14. The system of claim 13, the operations further comprising: sorting each of the one or more identified peaks in the stabbing count array based on height, wherein identifying micro-partitions within the one or more identified peaks comprises identifying micro-partitions within a subset of the one or more identified peaks having the greatest heights.

15. The system of claim 10, the operations further comprising defining a budget for allocating processing resources to performing reclustering operations, wherein determining that the table is not sufficiently clustered is based at least in part on the defined budget.

16. The system of claim 10, the operations further comprising partitioning the one or more selected micro-partitions of the table into multiple batches of micro-partitions to be reclustered.

17. The system of claim 10, wherein selecting the one or more micro-partitions of the table to be reclustered comprises: determining a maximum number of levels for the table based at least on a size of the table; dividing the table into levels; selecting a micro-batch of micro-partitions within each level; and selecting micro-partitions from the micro-batch.

18. The system of claim 10, wherein reclustering the one or more selected micro-partitions comprises assigning each of the one or more selected micro-partitions to an e
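
The selection steps in claims 3 and 4 (skip constant micro-partitions, build a stabbing count array, find peaks above a threshold, prefer the widest micro-partitions inside those peaks) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed method: each micro-partition is modeled as a hypothetical `(lo, hi)` integer range on the cluster key, the array is sampled at the distinct range endpoints, a "peak" is taken to be any sample point whose count exceeds the threshold, and combining the claim 3 exclusion with the claim 4 ranking in one function is a choice made here for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    name: str
    lo: int  # minimum cluster key value (hypothetical integer key)
    hi: int  # maximum cluster key value

    @property
    def width(self):
        return self.hi - self.lo

    @property
    def is_constant(self):
        # Claim 3: equal minimum and maximum values for the cluster key column.
        return self.lo == self.hi


def stabbing_counts(partitions):
    """For each distinct endpoint x, count the ranges [lo, hi] that contain x."""
    points = sorted({v for p in partitions for v in (p.lo, p.hi)})
    return [(x, sum(1 for p in partitions if p.lo <= x <= p.hi)) for x in points]


def select_candidates(partitions, peak_threshold, limit):
    """Return up to `limit` non-constant micro-partitions, widest first."""
    # Sample points whose stabbing count is taller than the threshold.
    peak_points = [x for x, n in stabbing_counts(partitions) if n > peak_threshold]
    # Non-constant micro-partitions whose range covers at least one peak point.
    in_peaks = [
        p for p in partitions
        if not p.is_constant and any(p.lo <= x <= p.hi for x in peak_points)
    ]
    in_peaks.sort(key=lambda p: p.width, reverse=True)
    return in_peaks[:limit]
```

With ranges A = (0, 10), B = (5, 14), C = (8, 12), D = (20, 30) and a constant E = (9, 9), a threshold of 2 marks the points 8, 9, and 10 as peaks; D lies outside them, E is skipped as constant, and a limit of 2 returns A and B, the two widest overlapping micro-partitions.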

Assignees

Snowflake Inc

Inventors

Classifications

G06F16/285
Clustering or classification · CPC title
G06F16/278
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
G06F16/22
Indexing; Data structures therefor; Storage structures · CPC title
G06F16/217
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
G06F7/08 · Primary
Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry (by merging two or more sets of carriers in ordered sequence G06F7/16) · CPC title

Patent family

Related publications grouped by family.

View patent family 69161290

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10853345B2 cover?: Automatic clustering of a database table is disclosed. A method for automatic clustering of a database table includes receiving an indication that a data modification task has been executed on a table and determining whether the table is sufficiently clustered. The method includes, in response to determining the table is not sufficiently clustered, selecting one or more micro-partitions of the …
Who is the assignee on this patent?: Snowflake Inc
What technology area does this patent fall under?: Primary CPC classification G06F7/08. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 1, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Incremental Clustering Of Database Tables

Materialization strategies in journal-based databases

Member clustering with equi-sized partitions

Modularized data distribution plan generation

Systems and methods for dynamic sharding of hierarchical data

System and method for data replication using a single master failover protocol