Checkpoints in batch file processing

US11461325B2 · US · B2

Patent metadata
Publication number: US-11461325-B2
Application number: US-202117333358-A
Country: US
Kind code: B2
Filing date: May 28, 2021
Priority date: Jun 1, 2020
Publication date: Oct 4, 2022
Grant date: Oct 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure may provide a dynamic query execution model with fault tolerance and failure recovery techniques. Embodiments of the present disclosure may utilize checkpoints to map processed output files to their corresponding input files. Therefore, if an error occurs in processing one or more files, the system may only need to reschedule processing of selected file(s).
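The recovery idea in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the `CheckpointLog` class, the `files_to_reschedule` helper, and the file names are all hypothetical. A checkpoint is recorded only once an output file is complete. After a failure, only the input files not covered by a recorded output are rescheduled.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointLog:
    """Hypothetical checkpoint log mapping each committed output file to the
    input files it was produced from (illustration only)."""
    output_to_inputs: dict[str, frozenset[str]] = field(default_factory=dict)

    def record(self, output_file: str, input_files: list[str]) -> None:
        # Record a checkpoint only after the output file is fully materialized.
        self.output_to_inputs[output_file] = frozenset(input_files)

    def processed_inputs(self) -> set[str]:
        # Union of all input files covered by some committed output file.
        done: set[str] = set()
        for inputs in self.output_to_inputs.values():
            done |= inputs
        return done


def files_to_reschedule(all_inputs: list[str], log: CheckpointLog) -> list[str]:
    """Return only the input files with no checkpointed output, in order."""
    done = log.processed_inputs()
    return [f for f in all_inputs if f not in done]


if __name__ == "__main__":
    log = CheckpointLog()
    inputs = ["a.csv", "b.csv", "c.csv", "d.csv"]
    # The batch holding a.csv and b.csv finished and produced out_1.parquet.
    log.record("out_1.parquet", ["a.csv", "b.csv"])
    # The batch holding c.csv and d.csv failed before any checkpoint was recorded.
    print(files_to_reschedule(inputs, log))  # ['c.csv', 'd.csv']
```

Keying the log by output file rather than by input file keeps the commit point at the moment an output exists, so a half-written output never marks its inputs as done.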

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by one or more processors, an assignment of a batch from a query coordinator, the batch including a group of a set of files from a shared file queue; storing a unique batch ID and information related to the batch in a staging area; performing a first operator on the batch; based on performing the first operator, generating a checkpoint signal; determining a state condition of a second operator as being stateless; based on the stateless condition of the second operator, performing the second operator and passing the checkpoint signal to a third operator without implementing barriers between batches; determining a state condition of the third operator as being non-stateless; based on the non-stateless condition of the third operator, clearing data associated with the third operator, implementing barriers between batches, and performing the third operator; finalizing the stored information in the staging area; generating a materialized result file as an output file associated with the batch; uploading the materialized result file to a storage area from where the materialized result file is accessible to be scanned by the query coordinator; and uploading a listing of the materialized result to an output shared file queue.

2. The method of claim 1, further comprising: generating a file registration request; transmitting the file registration request to the query coordinator including the unique batch ID indicating that the batch has been processed; and deleting the unique batch ID and information related to the batch stored in the staging area.

3. The method of claim 1, wherein the first operator includes a table scan operator.

4. The method of claim 1, wherein the third operator includes an insert operator.

5. The method of claim 1, wherein the output file includes data corresponding only to the batch and no other batch.

6. A system comprising: one or more processors of a machine; and a memory storing instructions that, when executed by the one or more processors, cause the machine to perform operations comprising: receiving an assignment of a batch from a query coordinator, the batch including a group of a set of files from a shared file queue; storing a unique batch ID and information related to the batch in a staging area; performing a first operator on the batch; based on performing the first operator, generating a checkpoint signal; determining a state condition of a second operator as being stateless; based on the stateless condition of the second operator, performing the second operator and passing the checkpoint signal to a third operator without implementing barriers between batches; determining a state condition of the third operator as being non-stateless; based on the non-stateless condition of the third operator, clearing data associated with the third operator, implementing barriers between batches, and performing the third operator; finalizing the stored information in the staging area; generating a materialized result file as an output file associated with the batch; uploading the materialized result file to a storage area from where the materialized result file is accessible to be scanned by the query coordinator; and uploading a listing of the materialized result to an output shared file queue.

7. The system of claim 6, the operations further comprising: transmitting the output file associated with the batch; generating a file registration request; and deleting the unique batch ID and information related to the batch.

8. The system of claim 6, wherein the first operator includes a table scan operator.

9. The system of claim 6, wherein the third operator includes an insert operator.

10. The system of claim 6, wherein the output file includes data corresponding only to the batch and no other batch.

11. A machine-storage medium embodying instructions that, when executed by a machine, cause the machine to perform operations comprising: receiving, by one or more processors, an assignment of a batch from a query coordinator, the batch including a group of a set of files from a shared file queue; storing a unique batch ID and information related to the batch in a staging area; performing a first operator on the batch; based on performing the first operator, generating a checkpoint signal; determining a state condition of a second operator as being stateless; based on the stateless condition of the second operator, performing the second operator and passing the checkpoint signal to a third operator without implementing barriers between batches; determining a state condition of the third operator as being non-stateless; based on the non-stateless condition of the third operator, clearing data associated with the third operator, implementing barriers between batches, and performing the third operator; generating a materialized result file as an output file associated with the batch; uploading the materialized result file to a storage area from where the materialized result file is accessible to be scanned by the query coordinator; and uploading a listing of the materialized result to an output shared file queue.

12. The machine-storage medium of claim 11, further comprising: transmitting the output file associated with the batch; generating a file registration request; and deleting the unique batch ID and information related to the batch.

13. The machine-storage medium of claim 11, wherein the first operator includes a table scan operator.

14. The machine-storage medium of claim 11, wherein the third operator includes an insert operator.

15. The machine-storage medium of claim 11, wherein the output file includes data corresponding only to the batch and no other batch.
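The worker flow in claims 1 and 2 can be sketched in a single process. This is a minimal Python illustration, not Snowflake's implementation. Everything in it is hypothetical: `Checkpoint`, `Operator`, `Worker`, and `run_batch` are illustrative names, and the staging area, storage area, and queues are in-memory dicts and lists. The checkpoint signal is modeled as a sentinel appended after the batch's scanned rows. Stateless operators forward that sentinel without waiting. A non-stateless operator, here a stand-in for an insert operator, first clears its state and then buffers rows until the sentinel arrives; that buffering acts as the barrier between batches.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint signal marking the end of one batch's rows."""
    batch_id: str


Item = Union[dict, Checkpoint]


@dataclass
class Operator:
    name: str
    fn: Callable  # stateless: row -> row; non-stateless: list of rows -> list of rows
    stateless: bool
    state: list = field(default_factory=list)

    def run(self, stream: list[Item]) -> list[Item]:
        if self.stateless:
            # Stateless: transform each row and forward the checkpoint, no barrier.
            return [i if isinstance(i, Checkpoint) else self.fn(i) for i in stream]
        # Non-stateless: clear leftover data, then buffer rows until the
        # checkpoint arrives (the barrier) before producing any output.
        self.state.clear()
        out: list[Item] = []
        for item in stream:
            if isinstance(item, Checkpoint):
                out.extend(self.fn(list(self.state)))
                self.state.clear()
                out.append(item)
            else:
                self.state.append(item)
        return out


@dataclass
class Worker:
    # In-memory stand-ins for the staging area, storage area, and queues.
    staging: dict = field(default_factory=dict)
    storage: dict = field(default_factory=dict)
    output_queue: list = field(default_factory=list)
    registrations: list = field(default_factory=list)

    def run_batch(self, files: dict, ops: list) -> str:
        batch_id = uuid.uuid4().hex
        self.staging[batch_id] = {"inputs": sorted(files), "finalized": False}

        # First operator: scan this batch's files, then emit the checkpoint.
        stream: list[Item] = [r for f in sorted(files) for r in files[f]]
        stream.append(Checkpoint(batch_id))

        for op in ops:
            stream = op.run(stream)

        # Finalize staging; the result file holds rows from this batch only.
        self.staging[batch_id]["finalized"] = True
        rows = [i for i in stream if not isinstance(i, Checkpoint)]
        result_file = f"result_{batch_id}"
        self.storage[result_file] = rows
        self.output_queue.append(result_file)

        # Register the batch with the coordinator, then delete its staging entry.
        self.registrations.append(batch_id)
        del self.staging[batch_id]
        return result_file


if __name__ == "__main__":
    project = Operator("project", lambda r: {"v": r["v"] * 2}, stateless=True)
    insert = Operator("insert", lambda rows: rows, stateless=False)
    w = Worker()
    out = w.run_batch({"f1.csv": [{"v": 1}, {"v": 2}], "f2.csv": [{"v": 3}]},
                      [project, insert])
    print(w.storage[out])  # [{'v': 2}, {'v': 4}, {'v': 6}]
```

Because the non-stateless operator flushes only on its own batch's checkpoint, each result file holds rows from a single batch, mirroring claim 5. A real engine would pipeline many batches concurrently, which this sketch does not attempt.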

Assignees

Snowflake Inc

Inventors

Classifications (CPC titles)

  • using cached or materialised query results

  • using management policies (point-in-time backing up or restoration of persistent data G06F11/1446; file migration policies for HSM systems G06F16/185)

  • Logical partitioning of resources; Management or configuration of virtualized resources (specific details on emulation or internal functioning of virtual machines G06F9/455)

  • Plan optimisation

  • Task life-cycle, e.g. stopping, restarting, resuming execution (G06F9/4881 takes precedence)


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11461325B2 cover?
A fault-tolerant query execution method, system, and storage medium in which workers process batches of files, emit checkpoint signals, and map each materialized output file to the input files it came from, so that after a failure only the affected files need to be reprocessed.
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
The primary CPC classification is G06F16/24542, which falls under CPC section G (Physics).
When was this patent published?
It was published on October 4, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page: publications in our corpus that this patent cites, or other publications that share its primary CPC classification.