Selecting between hydration-based scanning and stateless scale-out scanning to improve query performance

US11593367B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11593367-B1
Application number: US-202117489532-A
Country: US
Kind code: B1
Filing date: Sep 29, 2021
Priority date: Sep 29, 2021
Publication date: Feb 28, 2023
Grant date: Feb 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

When a query is received by a stateful data processing service, the service determines, for each table scan (and associated operations) of a query, whether to select the table scan for execution by a stateless data processing service. The selected table scans are sent to the stateless data processing service for execution, and results are received by the stateful data processing service. The stateful data processing service may also execute other table scans of the query locally, against a local data cache. If the data is not present in the local data cache, then the stateful data processing service will copy the table data into the local data cache before executing the table scan. A query result based on the remote and/or local table scans may then be returned to the client.
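The flow described in the abstract can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than the patent's implementation: the class names, the in-memory dictionaries used as the database and the cache, the `should_offload` callback, and the choice to combine per-scan results by simple concatenation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = dict  # a table row, modeled as a plain dict for this sketch


@dataclass
class TableScan:
    table: str
    # Stands in for the "associated operations" of a scan (here, just a filter).
    predicate: Callable[[Row], bool]


@dataclass
class StatelessService:
    """Hypothetical stateless service: reads the database directly, keeps no cache."""
    database: Dict[str, List[Row]]

    def run(self, scan: TableScan) -> List[Row]:
        return [row for row in self.database[scan.table] if scan.predicate(row)]


@dataclass
class StatefulService:
    """Hypothetical stateful service: owns a local data cache and routes each scan."""
    database: Dict[str, List[Row]]
    stateless: StatelessService
    should_offload: Callable[[TableScan], bool]  # selection criteria (assumed callback)
    cache: Dict[str, List[Row]] = field(default_factory=dict)

    def run_query(self, scans: List[TableScan]) -> List[Row]:
        results: List[Row] = []
        for scan in scans:
            if self.should_offload(scan):
                # Selected scan: executed remotely, only results come back.
                results.extend(self.stateless.run(scan))
            else:
                # Local scan: copy the table into the cache first if it is absent.
                if scan.table not in self.cache:
                    self.cache[scan.table] = list(self.database[scan.table])
                results.extend(
                    row for row in self.cache[scan.table] if scan.predicate(row)
                )
        # Assumption: the query result is the concatenation of per-scan results.
        return results
```

In this sketch a scan routed to the stateless service never populates the local cache, while a local scan hydrates the cache on first use and reuses it afterwards.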

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more processors; and one or more memories, wherein the one or more memories have stored thereon instructions, which when executed by the one or more processors of a provider network, cause the one or more processors to implement a stateful data processing service for a plurality of clients, wherein the stateful data processing service is configured to, for a given client of the plurality of clients: receive, from the client, a query for a database, wherein the query indicates at least a plurality of table scans and associated operations to be performed on a plurality of tables of the database; select, based on one or more criteria, one or more table scans and associated operations from among the plurality of table scans and associated operations; send an indication of the one or more table scans and associated operations to a stateless data processing service; and one or more other processors; and one or more other memories, wherein the one or more other memories have stored thereon instructions, which when executed by the one or more other processors of a provider network, cause the one or more other processors to implement a stateless data processing service for the plurality of clients, wherein the stateless data processing service is configured to, for the given client: receive, from the stateful data processing service, the indication of one or more table scans and associated operations; perform the one or more table scans and associated operations on respective tables of the database to generate one or more results; and send the one or more results to the stateful data processing service; and wherein the stateful data processing service is further configured to, for the given client: receive, from the stateless data processing service, the one or more results; copy, from the database to a data cache, at least a portion of one or more other tables of the database to be used for one or more other table scans and associated operations of the plurality of table scans and associated operations; perform the one or more other table scans and associated operations on the data cache to generate one or more other results; generate a query result based at least on the one or more results and the one or more other results; and send the query result to the client.

2. The system as recited in claim 1, wherein to select, based on one or more criteria, the one or more table scans and associated operations from among the plurality of table scans and associated operations, the stateful data processing service is further configured to, for individual ones of the one or more table scans and associated operations: determine that an amount of time to perform the table scan and associated operations by the stateless data processing service will be less than an amount of time to perform the table scan and associated operations by the stateful data processing service.

3. The system as recited in claim 1, wherein to select, based on one or more criteria, the one or more table scans and associated operations from among the plurality of table scans and associated operations, the stateful data processing service is further configured to, for individual ones of the one or more table scans and associated operations, determine one or more of: a size of a table to be scanned by the table scan is above a threshold size, or a number of requests to be made by the stateless data processing service to perform the table scan is above a threshold number.

4. The system as recited in claim 1, wherein to select, based on one or more criteria, the one or more table scans and associated operations from among the plurality of table scans and associated operations, the stateful data processing service is further configured to, for individual ones of the one or more table scans and associated operations, determine that: no data of a table to be scanned by the table scan is stored by the stateful data processing service, or an amount of data of the table to be scanned by the table scan that is stored by the stateful data processing service is less than a threshold amount.

5. The system as recited in claim 1, wherein to select, based on one or more criteria, the one or more table scans and associated operations from among the plurality of table scans and associated operations, the stateful data processing service is further configured to, for individual ones of the one or more table scans and associated operations, determine that: an amount of data returned by the table scan and associated operations is less than a threshold amount.

6. A method, comprising: performing, by a plurality of computing devices of a provider network: performing, by a stateful data processing service for a given client of the stateful data processing service: receiving, from the client, a query for a database, wherein the query indicates at least one or more table scans and associated operations to be performed on one or more tables of the database, and wherein at least a portion of the one or more other tables is stored by a data cache of the stateful data processing service; selecting, based on one or more criteria, at least one of the one or more table scans and associated operations; sending an indication of the at least one table scan and associated operations to a stateless data processing service; performing, by the stateless data processing service: receiving, from the stateful data processing service, the indication of the at least one table scan and associated operations; performing the at least one table scan and associated operations on respective tables of the database to generate one or more results; and sending the one or more results to the stateful data processing service; and performing, by the stateful data processing service: receiving, from the stateless data processing service, the one or more results; generating a query result based at least on the one or more results; sending the query result to the client.

7. The method as recited in claim 6, further comprising performing at least one other data scan on the data cache to generate one or more other results, and wherein generating the query result comprises: generating the query result based at least on the one or more results and the one or more other results.

8. The method as recited in claim 6, wherein selecting, based on one or more criteria, at least one of the one or more table scans and associated operations comprises: determining a size of a table to be scanned by the at least one table scan is above a threshold size; and determining a number of requests to be made by the stateless data processing service to perform the at least one table scan is above a threshold number.

9. The method as recited in claim 8, wherein performing the at least one table scan and associated operations on respective tables of the database to generate one or more results comprises: performing the number of requests in parallel to perform the at least one table scan.

10. The method as recited in claim 6, wherein selecting, based on one or more criteria, at least one of the one or more table scans and associated operations comprises: determining that an amount of data of the at least one table to be scanned by the table scan that is stored by the stateful data processing service is less than a threshold amount.

11. The method as recited in claim 6, wherein selecting, based on one or more criteria, at least one of the one or more table scans and associated operations comprises: determining that a table used by the at least one table scan is not used by other table scans of the query.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11593367B1 cover?
When a query is received by a stateful data processing service, the service determines, for each table scan (and associated operations) of a query, whether to select the table scan for execution by a stateless data processing service. The selected table scans are sent to the stateless data processing service for execution, and results are received by the stateful data processing service. The st…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24537. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 28, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).