Partition-based scanning of external tables for query processing

US11675780B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11675780-B2
Application numberUS-202217650462-A
CountryUS
Kind codeB2
Filing dateFeb 9, 2022
Priority dateApr 16, 2019
Publication dateJun 13, 2023
Grant dateJun 13, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are embodiments of systems and methods for partition-based scanning of external tables for query processing. In an example embodiment, a database platform receives a query that includes one or more predicates, where the query is directed at least to data in an external table that is stored in an external storage platform that is external to the database platform. The database platform identifies, based on metadata that summarizes the data in the external table, one or more partitions of the external table that potentially include data that satisfies the one or more predicates. The database platform also identifies, from the one or more identified partitions, data that satisfies the one or more predicates. The database platform sends a response to the query to the client, the response comprising the data satisfying the one or more predicates.

First claim

Opening claim text (preview).

What is claimed is: 1. A method performed by a database platform executing instructions on at least one hardware processor, the method comprising: receiving a query comprising one or more predicates, the query directed at least to data in an external table that is stored in an external storage platform that is external to the database platform, the external table being based on a source directory of the external storage platform, the source directory comprising a hierarchical structure of storage locations in which data of the external table is stored in the external storage platform, the storage locations defining a plurality of partitions of the external table; identifying, based on metadata that summarizes the data in the external table, one or more partitions in the plurality of partitions of the external table to be scanned for data satisfying the one or more predicates; identifying, from the one or more identified partitions, data satisfying the one or more predicates; generating a materialized view over the external table; storing the materialized view in an internal storage platform that is internal to the database platform; and sending a response to the query to the client, the response comprising the data satisfying the one or more predicates, the response further comprising data from the stored materialized view. 2. The method of claim 1 , further comprising: generating a query plan based on the query, the query plan comprising a plurality of discrete subtasks, the plurality of discrete subtasks comprising instructions to scan the identified one or more partitions of the external table for data satisfying the one or more predicates; and assigning, based on the metadata, the plurality of discrete subtasks to one or more nodes in an execution platform. 3. The method of claim 2 , wherein assigning, based on the metadata, the plurality of discrete subtasks further comprises: reading the metadata to determine whether a first partition is stored in a cache of any of a plurality of nodes of the execution platform; and assigning, in response to the first partition being stored in the cache of a first node of the execution platform, a task to the first node, the task instructing the first node to scan the first partition. 4. The method of claim 1 , wherein: the metadata further summarizes data in an internal table that is stored in an internal storage platform that is internal to the database platform; and the method further comprises: identifying, based on the metadata, one or more partitions of the internal table to be scanned for data satisfying the one or more predicates; identifying, from the one or more identified partitions of the internal table, second data satisfying the one or more predicates; and including the second data in the response to the query. 5. The method of claim 1 , further comprising: receiving a notification that a modification has been made to the external table; and refreshing the metadata in response to the modification being made to the external table. 6. The method of claim 1 , further comprising refreshing, in response to a threshold number of modifications being made to the external table, the metadata to reflect the modifications. 7. The method of claim 1 , further comprising storing the metadata in a partition storage object on a shared storage platform associated with the database platform. 8. A database platform comprising: at least one hardware processor; and one or more non-transitory computer-readable storage media containing instructions that, when executed by the at least one hardware processor, cause the database platform to perform operations comprising: receiving a query comprising one or more predicates, the query directed at least to data in an external table that is stored in an external storage platform that is external to the database platform, the external table being based on a source directory of the external storage platform, the source directory comprising a hierarchical structure of storage locations in which data of the external table is stored in the external storage platform, the storage locations defining a plurality of partitions of the external table; identifying, based on metadata that summarizes the data in the external table, one or more partitions in the plurality of partitions of the external table to be scanned for data satisfying the one or more predicates; identifying, from the one or more identified partitions, data satisfying the one or more predicates; generating a materialized view over the external table; storing the materialized view in an internal storage platform that is internal to the database platform; and sending a response to the query to the client, the response comprising the data satisfying the one or more predicates, the response further comprising data from the stored materialized view. 9. The database platform of claim 8 , the operations further comprising: generating a query plan based on the query, the query plan comprising a plurality of discrete subtasks, the plurality of discrete subtasks comprising instructions to scan the identified one or more partitions of the external table for data satisfying the one or more predicates; and assigning, based on the metadata, the plurality of discrete subtasks to one or more nodes in an execution platform. 10. The database platform of claim 9 , wherein assigning, based on the metadata, the plurality of discrete subtasks further comprises: reading the metadata to determine whether a first partition is stored in a cache of any of a plurality of nodes of the execution platform; and assigning, in response to the first partition being stored in the cache of a first node of the execution platform, a task to the first node, the task instructing the first node to scan the first partition. 11. The database platform of claim 8 , wherein: the metadata further summarizes data in an internal table that is stored in an internal storage platform that is internal to the database platform; and the operations further comprise: identifying, based on the metadata, one or more partitions of the internal table to be scanned for data satisfying the one or more predicates; identifying, from the one or more identified partitions of the internal table, second data satisfying the one or more predicates; and including the second data in the response to the query. 12. The database platform of claim 8 , the operations further comprising: receiving a notification that a modification has been made to the external table; and refreshing the metadata in response to the modification being made to the external table. 13. The database platform of claim 8 , the operations further comprising refreshing, in response to a threshold number of modifications being made to the external table, the metadata to reflect the modifications. 14. The database platform of claim 8 , the operations further comprising storing the metadata in a partition storage object on a shared storage platform associated with the database platform. 15. One or more non-transitory computer-readable storage media containing instructions that, when executed by at least one hardware processor of a database platform, cause the database platform to perform operations comprising: receiving a query comprising one or more predicates, the query directed at least to data in an external table that is stored in an external storage platform that is external to the database platform, the external table being based on a source directory of the external storage platform, the source directory comprising a hierarchical stru

Assignees

Inventors

Classifications

  • Plan optimisation · CPC title

  • of sub-queries or views · CPC title

  • Interactive query statement specification based on a database schema · CPC title

  • by facilitating the interaction with a user or administrator · CPC title

  • Updating materialised views · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11675780B2 cover?
Disclosed herein are embodiments of systems and methods for partition-based scanning of external tables for query processing. In an example embodiment, a database platform receives a query that includes one or more predicates, where the query is directed at least to data in an external table that is stored in an external storage platform that is external to the database platform. The database p…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2423. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 13 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).