Dynamic partitioning techniques for data streams

US9720989B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9720989-B2
Application numberUS-201314077171-A
CountryUS
Kind codeB2
Filing dateNov 11, 2013
Priority dateNov 11, 2013
Publication dateAug 1, 2017
Grant dateAug 1, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A partitioning policy, comprising an indication of an initial mapping of data records of a stream to a plurality of partitions, is selected to distribute data records of a data stream among a plurality of nodes of a stream management service. Data ingestion nodes and storage nodes are configured according to the initial mapping. In response to a determination that a triggering criterion for dynamically repartitioning the data stream has been met, a modified mapping is generated, and a different set of ingestion and storage nodes are configured. For at least some time during which arriving data records are stored in accordance with the modified mapping, data records stored at the first set of storage nodes in accordance with the initial mapping are retained.

First claim

Opening claim text (preview).

What is claimed is: 1. A system, comprising: one or more computing devices comprising one or more respective hardware processors and memory and configured to: determine a partitioning policy to be applied to distribute data records of a data stream among a plurality of nodes of a multi-tenant stream management service, wherein the partitioning policy comprises an initial mapping of data records to a plurality of partitions based at least in part on one or more attribute values associated with the data records; identify, using the initial mapping, a first partition of which a particular data record of the data stream is to be designated a member, based at least in part on a particular attribute value; generate, corresponding to the particular data record, a sequence number indicative of a position of the particular data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected based at least in part on the initial mapping; store a plurality of data records of the first partition at a data storage location of the stream management service in an order based at least in part on respective sequence numbers of the plurality of data records, wherein the data storage location is selected based at least in part on the initial mapping; and in response to a determination that a triggering criterion for repartitioning the data stream has been met, generate a modified mapping of data records to partitions, initiate usage of the modified mapping without interrupting a flow of data record acquisitions of the data stream; and select, for another data record with the particular attribute value, wherein the other data record is received subsequent to an initiation of usage of the modified mapping, at least one of: (a) a different ingestion node of the stream management service or (b) a different data storage location of the stream management service. 2. The system as recited in claim 1 , wherein the sequence number comprises an indication of (a) a timestamp associated with ingestion of the particular data record, and (b) an additional subsequence value. 3. The system as recited in claim 2 , wherein the one or more computing devices are further configured to: select an initial timestamp value to be used for sequence numbers of data records to be mapped using the modified mapping; in response to a data record retrieval request indicating a particular sequence number, in response to a determination that a value of a particular timestamp indicated by the particular sequence number is lower than the initial timestamp value, utilize the initial mapping to retrieve one or more data records; and in response to a determination that the value of the particular timestamp is not lower than the initial timestamp value, utilize the modified mapping to retrieve one or more data records. 4. The system as recited in claim 1 , wherein the triggering criterion comprises one or more of: (a) a detection of an overload condition, (b) a detection of a workload imbalance, (c) a client request for repartitioning, (d) a determination of a change to a data durability requirement of the data stream, (e) a determination of a schedule of a software version change, (f) a detection of a change to a usage pattern of the data stream, (g) a determination of a pricing impact of repartitioning the data stream, or (h) a determination of a performance target associated with the data stream. 5. The system as recited in claim 1 , wherein the one or more computing devices are further configured to: receive a client request indicating one or more partitioning criteria to be used for the data stream; and generate the initial mapping based at least in part on the client request. 6. A method, comprising: performing, by one or more computing devices of a stream management service: determining an initial mapping of data records of a data stream to a plurality of partitions based at least in part on one or more attribute values of the data records; identifying, using the initial mapping, a first partition of which a particular data record of the data stream is to be designated a member, based at least in part on a particular attribute value; storing the particular data record at a storage location selected based at least in part on the initial mapping; and in response to determining that a triggering criterion has been met, generating a modified mapping of data records to partitions, and selecting, for another data record with the particular attribute value and without interrupting a flow of the data records of the data stream, wherein the other data record is received subsequent to an initiation of usage of the modified mapping, a different storage location. 7. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: executing, on the particular data record prior to said determining that the triggering criterion has been met, a processing operation at a worker node selected based at least in part on the initial mapping; and executing, on a different data record with the particular attribute value, subsequent to said determining that the triggering criterion has been met, the processing operation at a different worker node selected based at least in part on the modified mapping. 8. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: generating, corresponding to the particular data record, a sequence number indicative of a position of the particular data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected based at least in part on the initial mapping; and storing data records of the first partition in an order corresponding to their sequence numbers. 9. The method as recited in claim 8 , wherein the sequence number comprises an indication of (a) a timestamp associated with ingestion of the particular data record, and (b) an additional subsequence value. 10. The method as recited in claim 9 , wherein the timestamp is indicative of a clock time at which the particular data record was ingested, wherein the method further comprises performing, by the one or more computing devices: in response to a retrieval request requesting one or more data records to be retrieved based at least in part on a specified record ingestion time range, using sequence numbers associated with the one or more data records as index keys to retrieve the one or more data records. 11. The method as recited in claim 9 , further comprising performing, by the one or more computing devices: selecting an initial timestamp value to be used for sequence numbers of data records to be mapped using the modified mapping; in response to receiving a data record retrieval request indicating a particular sequence number, in response to determining that a value of a particular timestamp indicated by the particular sequence number is lower than the initial timestamp value, utilizing the initial mapping to retrieve one or more data records; and in response to determining that the value of the particular timestamp is not lower than the initial timestamp value, utilizing the modified mapping to retrieve one or more data records. 12. The method as recited in claim 6 , wherein said modified mapping uses at least one additional attribute value to determine the partition of a data record. 13. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: receiving a client request indicating one or more partitioning criteria to be use

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • Physics · mapped topic

  • Information retrieval; Database structures therefor; File system structures therefor · CPC title

  • G06F16/258Primary

    Data format conversion from or to a database · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9720989B2 cover?
A partitioning policy, comprising an indication of an initial mapping of data records of a stream to a plurality of partitions, is selected to distribute data records of a data stream among a plurality of nodes of a stream management service. Data ingestion nodes and storage nodes are configured according to the initial mapping. In response to a determination that a triggering criterion for dyn…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F17/30569. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 01 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).