Systems and methods for efficiently querying external tables

US2024111762A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024111762-A1
Application number: US-202318526666-A
Country: US
Kind code: A1
Filing date: Dec 1, 2023
Priority date: Apr 16, 2019
Publication date: Apr 4, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems and methods for efficiently querying external tables. In an embodiment, a database platform receives a query that is directed at least in part to external data in an external table stored on a data storage platform that is external to the database platform. The external table includes a plurality of partitions. The database platform identifies, from external-table metadata, a subset of the plurality of partitions of the external table as including data that potentially satisfies the query. The external-table metadata is stored by the database platform. The database platform identifies data that satisfies the query by scanning the identified subset of the partitions, and responds to the query at least in part with the identified data that satisfies the query.
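
The pruning step in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the external-table metadata holds a minimum and maximum value per partition; the abstract does not specify what the metadata contains. The names `PartitionMeta`, `prune`, and `scan`, and the toy data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """Hypothetical per-partition metadata kept by the database platform."""
    path: str
    min_value: int
    max_value: int


def prune(partitions, lo, hi):
    """Keep partitions whose [min, max] range overlaps the query range [lo, hi].

    A kept partition only *potentially* holds matching rows.
    """
    return [p for p in partitions if p.max_value >= lo and p.min_value <= hi]


def scan(rows_by_path, candidates, lo, hi):
    """Scan only the candidate partitions and return rows with lo <= value <= hi."""
    result = []
    for p in candidates:
        result.extend(v for v in rows_by_path[p.path] if lo <= v <= hi)
    return result


# Toy stand-in for files on the external data storage platform (assumed data).
ROWS = {
    "part-0": [1, 5, 9],
    "part-1": [10, 14, 19],
    "part-2": [20, 25, 29],
}

# Metadata is computed once and stored on the database side.
METADATA = [PartitionMeta(path, min(vals), max(vals)) for path, vals in ROWS.items()]

# Query: values between 8 and 12 inclusive.
candidates = prune(METADATA, 8, 12)
matches = scan(ROWS, candidates, 8, 12)
```

Here `part-2` is never read, because its stored range cannot overlap the query range; only the remaining partitions are scanned.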

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: generating a source directory by identifying a plurality of partitions, each partition including external data in an external table stored on a data storage platform, the generating of the source directory including: identifying the plurality of partitions in the data storage platform; identifying folders and folder locations for individual partitions; and generating the source directory using the identified folders and folder locations; receiving a query for execution on the external data in the external table stored on the data storage platform external to a database platform, the external data distributed among the plurality of partitions, the plurality of partitions being organized in the external table based on information located in the source directory, the source directory defining the folders and the folder locations, the folders storing files corresponding to particular partitions; identifying at least a subset of the plurality of partitions for execution of the query; identifying data that satisfies the query by assessing data stored within the identified subset of the plurality of partitions at least partially based on application of the query to the source directory; and responding to the query at least in part with the identified data that satisfies the query.

2. The method of claim 1, wherein identifying the subset comprises identifying, from external table metadata, the subset, the external table metadata being stored by the database platform.

3. The method of claim 2, wherein the identifying of the subset comprises using the query to identify multiple instances of partition-grouping external-table metadata among the external-table metadata.

4. The method of claim 3, wherein each instance of partition-grouping external-table metadata comprising collective metadata regarding a distinct group of partitions in the plurality of partitions.

5. The method of claim 4, wherein the external-table metadata maps the plurality of partitions of the external table to storage locations in a source directory of the external data storage platform.

6. The method of claim 4, further comprising generating the external-table metadata based on a hierarchical structure of the storage locations in the source directory of the external data storage platform.

7. The method of claim 4, further comprising: receiving, from the external data storage platform, a notification of a modification having been made in a particular storage location in the source directory of the external data storage platform; and updating the external-table metadata to reflect the modification having been made in the particular storage location.

8. The method of claim 7, wherein the modification comprises one or more of a file having been added to the particular storage location, a file having been modified in the particular storage location, and a file having been deleted from the particular storage location.

9. The method of claim 2, further comprising refreshing the external-table metadata at threshold time periods.

10. The method of claim 2, further comprising refreshing the external-table metadata in response to a threshold number of modifications being made to the external data.

11. The method of claim 2, further comprising refreshing the external-table metadata in response to receiving a request to refresh the external-table metadata.

12. The method of claim 1, wherein: the database platform stores data in a first format; the external data includes data that is stored in the external data storage platform in at least one second format that is different from the first format; and the method further comprises converting, into the first format for storage at the database platform, the data that is stored in the at least one second format.

13. The method of claim 1, further comprising: generating a materialized view over the external table; and storing the generated materialized view.

14. A database platform comprising: at least one processor; and one or more non-transitory computer readable storage media containing instructions that, when executed by the at least one processor, cause the database platform to perform operations comprising: generating a source directory by identifying a plurality of partitions, each partition including external data in an external table stored on a data storage platform, the generating of the source directory including: identifying the plurality of partitions in the data storage platform; identifying folders and folder locations for individual partitions; and generating the source directory using the identified folders and folder locations; receiving a query for execution on the external data in the external table stored on the data storage platform external to a database platform, the external data distributed among the plurality of partitions, the plurality of partitions being organized in the external table based on information located in the source directory, the source directory defining the folders and the folder locations, the folders storing files corresponding to particular partitions; identifying at least a subset of the plurality of partitions for execution of the query; identifying data that satisfies the query by assessing data stored within the identified subset of the plurality of partitions at least partially based on application of the query to the source directory; and responding to the query at least in part with the identified data that satisfies the query.

15. The database platform of claim 14, wherein identifying the subset comprises identifying, from external table metadata, the subset, the external table metadata being stored by the database platform.

16. The database platform of claim 15, wherein the identifying of the subset comprises using the query to identify multiple instances of partition-grouping external-table metadata among the external-table metadata.

17. The database platform of claim 16, wherein each instance of partition-grouping external-table metadata comprising collective metadata regarding a distinct group of partitions in the plurality of partitions.

18. The database platform of claim 17, wherein the external-table metadata maps the plurality of partitions of the external table to storage locations in a source directory of the external data storage platform.

19. The database platform of claim 17, further comprising generating the external-table metadata based on a hierarchical structure of the storage locations in the source directory of the external data storage platform.

20. One or more non-transitory computer readable storage media containing instructions that, when executed by at least one hardware processor of a database platform, cause the database platform to perform operations comprising: generating a source directory by identifying a plurality of partitions, each partition including external data in an external table stored on a data storage platform, the generating of the source directory including: identifying the plurality of partitions in the data storage platform; identifying folders and folder locations for individual partitions; and generating the source directory using the identified folders and folder locations; receiving a query for execution on the external data in the external table stored on the data storage platform external to a database platform, the external data distributed among the plurality of p
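
As a rough illustration of the source directory recited in claim 1 and the notification-driven updates of claims 7 and 8, the sketch below treats each folder as one partition and keeps a mapping from folder location to file names. This is one possible reading under stated assumptions, not the patented implementation: the `year=/month=` folder layout, the event names, and the functions `build_source_directory`, `apply_notification`, and `folders_for_query` are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_source_directory(file_paths):
    """Map each folder location (treated as one partition) to the files it stores."""
    directory = defaultdict(set)
    for path in file_paths:
        p = PurePosixPath(path)
        directory[str(p.parent)].add(p.name)
    return directory


def apply_notification(directory, event, path):
    """Update the directory for a hypothetical 'added', 'modified', or 'deleted' event."""
    p = PurePosixPath(path)
    folder = str(p.parent)
    if event in ("added", "modified"):
        directory[folder].add(p.name)
    elif event == "deleted":
        directory[folder].discard(p.name)
        if not directory[folder]:
            # Drop folders that no longer hold any files.
            del directory[folder]
    else:
        raise ValueError(f"unknown event: {event}")


def folders_for_query(directory, prefix):
    """Return partition folders located at or under the queried folder prefix."""
    return sorted(
        f for f in directory if f == prefix or f.startswith(prefix + "/")
    )


# Assumed file listing from the external data storage platform.
FILES = [
    "sales/year=2023/month=11/a.parquet",
    "sales/year=2023/month=12/b.parquet",
    "sales/year=2024/month=01/c.parquet",
]

directory = build_source_directory(FILES)
before = folders_for_query(directory, "sales/year=2023")

# Simulated notifications from the storage platform.
apply_notification(directory, "added", "sales/year=2024/month=02/d.parquet")
apply_notification(directory, "deleted", "sales/year=2023/month=11/a.parquet")
after = folders_for_query(directory, "sales/year=2023")
```

A query restricted to 2023 is applied to the directory first, so only the matching folders are opened; after the delete notification the emptied `month=11` folder drops out of the candidate set.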

Assignees

Snowflake Inc

Inventors

Classifications

  • Interactive query statement specification based on a database schema · CPC title

  • by facilitating the interaction with a user or administrator · CPC title

  • Management of space entities, e.g. partitions, extents, pools · CPC title

  • Monitoring storage devices or systems · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2423. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 4, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).