Caching systems and methods

US11809451B2 · US · B2

Patent metadata
Publication number: US-11809451-B2
Application number: US-201414518971-A
Country: US
Kind code: B2
Filing date: Oct 20, 2014
Priority date: Feb 19, 2014
Publication date: Nov 7, 2023
Grant date: Nov 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Example caching systems and methods are described. In one implementation, a method identifies multiple files used to process a query and distributes each of the multiple files to a particular execution node to execute the query. Each execution node determines whether the distributed file is stored in the execution node's cache. If the execution node determines that the file is stored in the cache, it processes the query using the cached file. If the file is not stored in the cache, the execution node retrieves the file from a remote storage device, stores the file in the execution node's cache, and processes the query using the file.
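
The cache check described in the abstract can be sketched as a read-through cache. This is a minimal illustration, not the patented implementation: the `ExecutionNode` class, the `fetch_remote` callable, and the in-memory `remote_storage` dictionary are all hypothetical stand-ins introduced for this example.

```python
from typing import Callable, Dict


class ExecutionNode:
    """Hypothetical execution node with a local file cache (read-through)."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # stand-in for a remote storage device
        self.cache: Dict[str, bytes] = {}  # file name -> file contents
        self.remote_reads = 0              # counts cache misses, for illustration

    def get_file(self, name: str) -> bytes:
        # On a miss, retrieve the file from remote storage and keep a local copy.
        if name not in self.cache:
            self.remote_reads += 1
            self.cache[name] = self._fetch_remote(name)
        # Hit or miss, the query is always served from the cached copy.
        return self.cache[name]

    def process(self, name: str, query: Callable[[bytes], int]) -> int:
        return query(self.get_file(name))


# Assumed toy data: two files held by a dictionary that plays the remote store.
remote_storage = {
    "part-0": b"a,b,c\n1,2,3\n",
    "part-1": b"a,b,c\n4,5,6\n7,8,9\n",
}
node = ExecutionNode(remote_storage.__getitem__)


def count_lines(data: bytes) -> int:
    return data.count(b"\n")


first = node.process("part-1", count_lines)   # miss: fetched, cached, processed
second = node.process("part-1", count_lines)  # hit: processed from the cache
```

The second call reuses the cached copy, so `remote_reads` stays at one even though the file is processed twice.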

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising:
    receiving a query directed to database data stored across a plurality of shared storage devices;
    referencing a metadata store to locate a set of files that comprises data that needs to be processed to respond to the query;
    referencing the metadata store to determine whether the set of files is cached among execution nodes of an execution platform comprising a plurality of execution nodes, wherein the execution platform is separate from the metadata store and the plurality of shared storage devices;
    in response to determining that at least a portion of the set of files is cached among the plurality of execution nodes, assigning by one or more processors, processing of one or more of the set of files to each of one or more execution nodes that have cached at least a portion of the set of files;
    for each of the one or more assigned execution nodes: determining, by the assigned execution node, whether the assigned one or more files is stored at least in part in a cache of the assigned execution node; and
    in response to the assigned execution node determining the assigned one or more files is not entirely stored in the cache of the assigned execution node: retrieving a missing portion of the assigned one or more files from one or more remote storage devices of the plurality of remote storage devices including the missing portion of the assigned one or more files, wherein the plurality of execution nodes are organized into one or more virtual warehouses having one or more logical mappings between them, and a virtual warehouse including the assigned execution node dynamically establishes a communication link with each of the one or more of the plurality of remote storage devices based at least in part on the query so that the assigned execution node may retrieve the missing portion;
    storing, by the assigned execution node, the missing portion of the assigned one or more files in the cache of the assigned execution node so that the entire one or more files is stored in the cache of the assigned execution node, wherein a size and composition of the cache is adjusted to accommodate the missing portion of the assigned one or more files;
    processing the query using the assigned one or more files stored in the cache of the assigned execution node; and
    updating the metadata store to indicate the entire assigned one or more files is now cached in the cache of the assigned execution node;
    wherein any of the set of files stored in the plurality of shared storage devices may be accessed by any of a plurality of execution nodes of the execution platform;
    wherein any of the set of files stored in the plurality of shared storage devices may be stored in a cache of any of the plurality of execution nodes of the execution platform; and
    wherein any of the set of files stored in the plurality of shared storage devices may be stored in a cache of multiple execution nodes of the plurality of execution nodes of the execution platform at one point in time; and
    in response to a determination of a change in the number of execution nodes of the execution platform, wherein the change is creating a new execution node, the new execution node comprising a plurality of processors, wherein the cache varies among the plurality of processors, wherein a first subset of the plurality of processors comprises a minimal cache and a second subset of the plurality of processors comprises a cache providing faster input-output operations, reassign processing, among the changed number of execution nodes of the execution platform, of the set of files comprising data that needs to be processed to respond to the query.

2. The method of claim 1, further comprising: in response to the assigned execution node determining the assigned one or more files is entirely stored in the cache of the assigned execution node, processing, using one or more processors of the assigned execution node, the query using the assigned one or more files stored in the cache of the assigned execution node.

3. The method of claim 1, wherein updating the metadata store to indicate the entire assigned one or more files is now cached in the assigned execution node comprises updating the metadata store to identify all files that are duplicated in the cache of the assigned execution node.

4. The method of claim 1, further comprising determining, by the assigned execution node, whether to store the assigned one or more files in faster or slower memory by implementing a least recently used (LRU) algorithm.

5. The method of claim 4, wherein implementing the LRU algorithm further comprises identifying one or more copies of the assigned one or more files to be removed from the cache.

6. The method of claim 1, wherein the metadata store is separate and independently scalable from each of the resource manager, the plurality of shared storage devices, and the execution platform, and wherein the metadata store comprises a complete metadata listing of the database data stored across the plurality of shared storage devices and a complete listing of files cached in the plurality execution nodes of the execution platform.

7. The method of claim 1, wherein each execution node of the execution platform comprises a cache, wherein the cache includes a first storage portion and a second storage portion, wherein the first storage portion is significantly faster than the second storage portion.

8. The method of claim 1, wherein the query directed to the database data comprises a single instruction that is applied by the execution platform to each of the set of files substantially simultaneously.

9. The method of claim 1, wherein each execution node of the plurality of execution nodes comprises at least one processor and at least one local cache caching a copy of at least a portion of the database data.

10. The method of claim 1, further comprising: in response to the assigned execution node determining the assigned one or more files is not stored in the cache of the assigned execution node, modifying, by the assigned execution node a database data structure of the retrieved copy of the assigned one or more files prior to storing the retrieved copy in the cache.

11. The method of claim 10, wherein modifying the database data structure of the retrieved copy includes decrypting the retrieved copy.

12. The method of claim 10, wherein modifying the database data structure of the retrieved copy includes decompressing the retrieved copy.

13. A system comprising:
    a plurality of shared storage devices collectively storing database data;
    a metadata store separate from the plurality of shared storage devices, the metadata store comprising metadata for the database data stored across the plurality of shared storage devices; and
    one or more processors operatively coupled to the metadata store, the one or more processors to:
    receive a query directed to the database data stored across the plurality of shared storage devices reference the metadata store to locate a set of files that comprises data that needs to be processed to respond to the query;
    reference the metadata store to determine whether the set of files is cached among execution nodes of an execution platform comprising a plurality of execution nodes, wherein the execution platform is separate from the metadata store and the plurality of shared storage devices; and
    in response to determining that at least a portion of the set of files is cached among the plurality of execution nodes, assigning processing of one or more of the set of files to each of one or more execution nodes that have cached at least a portion of the set of files;
    for
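
The cache-aware assignment in claim 1 can be sketched roughly as follows. This is an illustrative simplification under stated assumptions, not the claimed method itself: `MetadataStore`, `assign`, `run_on_node`, the `FILE_PORTIONS` layout, and the round-robin fallback for files that no node has cached are all hypothetical names and choices made for this example.

```python
from typing import Dict, List, Set

# Assumption: each file is split into numbered portions held in shared storage.
FILE_PORTIONS: Dict[str, Set[int]] = {"f1": {0, 1, 2}, "f2": {0, 1}}


class MetadataStore:
    """Hypothetical store tracking table files and per-node cached portions."""

    def __init__(self, table_files: Dict[str, List[str]]) -> None:
        self.table_files = table_files
        self.cached: Dict[str, Dict[str, Set[int]]] = {}  # node -> file -> portions

    def files_for(self, table: str) -> List[str]:
        return self.table_files[table]

    def nodes_caching(self, file: str) -> List[str]:
        # Nodes that hold at least one portion of the file.
        return sorted(n for n, files in self.cached.items() if files.get(file))

    def record(self, node: str, file: str, portions: Set[int]) -> None:
        self.cached.setdefault(node, {})[file] = set(portions)


def assign(meta: MetadataStore, table: str, nodes: List[str]) -> Dict[str, str]:
    """Map each file to a node, preferring a node that already caches part of it."""
    plan: Dict[str, str] = {}
    for i, file in enumerate(meta.files_for(table)):
        holders = [n for n in meta.nodes_caching(file) if n in nodes]
        # Fallback (an assumption, not from the claim): round-robin by file index.
        plan[file] = holders[0] if holders else nodes[i % len(nodes)]
    return plan


def run_on_node(meta: MetadataStore, node: str, file: str) -> Set[int]:
    """Fetch only the missing portions, then record the file as fully cached."""
    have = meta.cached.get(node, {}).get(file, set())
    missing = FILE_PORTIONS[file] - have
    meta.record(node, file, have | missing)
    return missing  # the portions that had to come from remote storage


meta = MetadataStore({"orders": ["f1", "f2"]})
meta.record("node-a", "f1", {0})  # node-a already caches one portion of f1
plan = assign(meta, "orders", ["node-a", "node-b"])
fetched = {file: run_on_node(meta, node, file) for file, node in plan.items()}
```

Calling `assign` again with a longer or shorter node list is one simple way to model the reassignment step that the claim ties to a change in the number of execution nodes.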

Assignees

Snowflake Inc

Inventors

Classifications

  • G06F16/273 (Primary)

    Asynchronous replication or reconciliation · CPC title

  • Intra-oral devices · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • G06F9/5016 (Primary)

    the resource being the memory · CPC title

  • considering hardware capabilities · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11809451B2 cover?
Example caching systems and methods are described. In one implementation, a method identifies multiple files used to process a query and distributes each of the multiple files to a particular execution node to execute the query. Each execution node determines whether the distributed file is stored in the execution node's cache. If the execution node determines that the file is stored in the cac…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/273. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).