Scalable architecture for a distributed time-series database

US11989186B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11989186-B2
Application number  US-201816199078-A
Country             US
Kind code           B2
Filing date         Nov 23, 2018
Priority date       Nov 23, 2018
Publication date    May 21, 2024
Grant date          May 21, 2024

Abstract

Official abstract text for this publication.

Methods, systems, and computer-readable media for a scalable architecture for a distributed time-series database are disclosed. Using a fleet of ingestion routers, time-series data generated by a plurality of client devices is stored into a plurality of durable partitions. The time-series data comprises a plurality of time series, and an amount of the ingestion routers is determined based at least in part on an ingestion rate of the time-series data. Using a fleet of stream processors, the time-series data from the durable partitions is stored into a plurality of storage tiers including a first storage tier and a second storage tier. A retention period for the first storage tier differs from a retention period for the second storage tier. An amount of the stream processors is determined based at least in part on the time-series data in the durable partitions.
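The two sizing and routing ideas in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the hash-based partitioning scheme, the per-router capacity parameter, and the function names `partition_for` and `routers_needed` are all assumptions chosen for illustration. The abstract only states that data is partitioned across durable partitions and that the number of ingestion routers depends at least in part on the ingestion rate.

```python
import hashlib
import math


def partition_for(series_key: str, num_partitions: int) -> int:
    """Map a time-series key to a durable partition index.

    Assumption: a stable hash of the series key picks the partition.
    The patent does not prescribe this particular scheme.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def routers_needed(ingestion_rate: float,
                   per_router_capacity: float,
                   minimum: int = 1) -> int:
    """Size the ingestion-router fleet from the observed ingestion rate.

    Assumption: each router handles a fixed number of points per second
    (per_router_capacity), which is a hypothetical parameter.
    """
    if per_router_capacity <= 0:
        raise ValueError("per_router_capacity must be positive")
    return max(minimum, math.ceil(ingestion_rate / per_router_capacity))


if __name__ == "__main__":
    # The same series key always lands in the same partition.
    print(partition_for("sensor-17/temperature", 16))
    # 2,500 points/s with routers that each handle 1,000 points/s.
    print(routers_needed(2500, 1000))
```

A stable hash keeps all points of one time series in the same partition, which is one simple way to satisfy "partition the time-series data based at least in part on the plurality of time series"; a control plane would re-evaluate `routers_needed` as the ingestion rate changes.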

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more computing devices comprising respective processors and memory configured to implement a control plane; a plurality of computing devices comprising respective processors and memory configured to implement a fleet of ingestion routers, wherein the fleet of ingestion routers is configured to: receive time-series data generated by a plurality of client devices, wherein the time-series data is associated with a plurality of time series, and wherein an amount of the ingestion routers is determined by the control plane based at least in part on an ingestion rate of the time-series data; and partition the time-series data based at least in part on the plurality of time series to generate partitioned time-series data; one or more persistent storage resources comprising a plurality of durable partitions, wherein the one or more persistent storage resources are configured to store individual partitions of the partitioned time-series data sent from the fleet of ingestion routers in respective ones of the plurality of durable partitions; a plurality of computing devices comprising respective processors and memory configured to implement a fleet of stream processors, wherein an amount of the stream processors in the fleet is determined by the control plane based at least in part on the partitioned time-series data in the durable partitions, wherein the fleet of stream processors is configured to: retrieve the time-series data, stored by the fleet of ingestion routers, from the durable partitions maintained at one or more persistent storage resources of the streaming service; send a first one or more elements of the retrieved time-series data to a first storage tier; and send a different second one or more elements of the retrieved time-series data to a second storage tier; and a plurality of storage tiers, including the first storage tier and the second storage tier, respectively different from the one or more persistent storage resources, wherein individual ones of the plurality of storage tiers are different from and communicatively coupled over a network to respective ones of the fleet of stream processors, wherein a retention period for the first storage tier differs from a retention period for the second storage tier, wherein a performance characteristic for the first storage tier differs from a performance characteristic for the second storage tier, and wherein the individual ones of the plurality of storage tiers are configured to store the retrieved time-series data sent from the fleet of stream processors; and a plurality of computing devices comprising respective processors and memory configured to implement a fleet of query processors configured to access time-series data stored in the first storage tier and the second storage tier, wherein individual ones of the fleet of query processors are each different from individual ones of the fleet of stream processors.

2. The system as recited in claim 1, wherein the fleet of query processors is configured to: perform queries of the time-series data stored in the plurality of storage tiers, wherein an amount of the query processors is determined by the control plane based at least in part on the queries.

3. The system as recited in claim 1, wherein an amount of the durable partitions is determined by the control plane based at least in part on the time-series data.

4. The system as recited in claim 1, wherein an amount of storage resources in the first tier is determined by the control plane based at least in part on an amount of the time-series data within the retention period for the first storage tier, and wherein an amount of storage resources in the second tier is determined by the control plane based at least in part on an amount of the time-series data within the retention period for the second storage tier.

5. A method, comprising: storing, by a fleet of ingestion routers into a plurality of durable partitions maintained at one or more persistent storage resources, time-series data generated by a plurality of client devices, wherein the time-series data is associated with a plurality of time series, and wherein an amount of the ingestion routers is determined based at least in part on an ingestion rate of the time-series data; retrieving, by a fleet of stream processors, the time-series data from the durable partitions maintained at one or more persistent storage resources; storing, by the fleet of stream processors, the time-series data retrieved from the durable partitions into a plurality of storage tiers including a first storage tier and a second storage tier, wherein a first one or more elements of the retrieved time-series data is stored by the fleet of stream processors into the first storage tier and a different second one or more elements of the retrieved time-series data is stored by the fleet of stream processors into the second storage tier, wherein individual ones of the plurality of storage tiers are different from the one or more persistent storage resources from which the time-series data is retrieved, wherein the individual ones of the plurality of storage tiers are different from and communicatively coupled over a network to respective ones of the fleet of stream processors, wherein a retention period for the first storage tier differs from a retention period for the second storage tier, and wherein an amount of the stream processors is determined based at least in part on the time-series data in the durable partitions; and accessing, by a fleet of query processors, time-series data stored in first storage tier and the second storage tier, wherein individual ones of the fleet of query processors are each different from individual ones of the fleet of stream processors.

6. The method as recited in claim 5, further comprising: performing, by a fleet of query processors, queries of the time-series data stored in the plurality of storage tiers, wherein an amount of the query processors is determined based at least in part on the queries.

7. The method as recited in claim 5, wherein an amount of the durable partitions is determined based at least in part on the time-series data.

8. The method as recited in claim 5, wherein an amount of storage resources in the first tier is determined based at least in part on an amount of the time-series data within the retention period for the first storage tier, and wherein an amount of storage resources in the second tier is determined based at least in part on an amount of the time-series data within the retention period for the second storage tier.

9. The method as recited in claim 5, wherein a latency characteristic for the first storage tier differs from a latency characteristic for the second storage tier.

10. The method as recited in claim 5, wherein the time-series data is partitioned into the durable partitions based at least in part on a hierarchy of the time series.

11. The method as recited in claim 5, wherein the time-series data is stored in the first storage tier using a plurality of tiles, wherein the tiles are partitioned based at least in part on spatial boundaries and temporal boundaries.

12. The method as recited in claim 5, further comprising: organizing, by the fleet of stream processors, the time-series data from the durable partitions into a plurality of tables, wherein the tables are stored in the plurality of storage tiers; and transforming, by the fleet of stream processors, the time-series data from the tables into a plurality of additional tables, wherein the additional tables are stored in the plurality of storage tiers.

13. The method as recited in
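As a rough illustration of the tiering language in the claims (stream processors sending different elements to storage tiers whose retention periods differ), the sketch below picks a tier for a data point. It is a hypothetical example, not the claimed method: the claims do not say how elements are assigned to tiers, so the age-based rule, the `StorageTier` type, and the tier names "hot" and "cold" are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageTier:
    """A storage tier with its own retention period (hypothetical type)."""
    name: str
    retention_seconds: int


def select_tier(point_ts: float,
                now: float,
                tiers: List[StorageTier]) -> Optional[StorageTier]:
    """Pick the shortest-retention tier that still retains the point.

    Assumption: assignment is by data age. Returns None when the point
    is older than every tier's retention period.
    """
    age = now - point_ts
    for tier in sorted(tiers, key=lambda t: t.retention_seconds):
        if age <= tier.retention_seconds:
            return tier
    return None


if __name__ == "__main__":
    tiers = [
        StorageTier("cold", 365 * 24 * 3600),  # long retention
        StorageTier("hot", 3600),              # short retention
    ]
    now = 1_700_000_000.0
    print(select_tier(now - 60, now, tiers))    # recent point
    print(select_tier(now - 7200, now, tiers))  # two hours old
```

Under this assumption a recent point goes to the short-retention tier and an older point falls through to the long-retention tier, which matches the claims only in that the two tiers hold different elements and have different retention periods.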

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Data stream processing; Continuous queries · CPC title

  • between a Database Management System and a front-end application · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title

  • Lifecycle management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11989186B2 cover?
Methods, systems, and computer-readable media for a scalable architecture for a distributed time-series database are disclosed. Using a fleet of ingestion routers, time-series data generated by a plurality of client devices is stored into a plurality of durable partitions. The time-series data comprises a plurality of time series, and an amount of the ingestion routers is determined based at le…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24568. Mapped technology areas include Physics.
When was this patent published?
Publication date May 21, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).