Flexible and secure transformation of data using stream pipes
US-2015372807-A1 · Dec 24, 2015 · US
US9720989B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9720989-B2 |
| Application number | US-201314077171-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 11, 2013 |
| Priority date | Nov 11, 2013 |
| Publication date | Aug 1, 2017 |
| Grant date | Aug 1, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A partitioning policy, comprising an indication of an initial mapping of data records of a stream to a plurality of partitions, is selected to distribute data records of a data stream among a plurality of nodes of a stream management service. Data ingestion nodes and storage nodes are configured according to the initial mapping. In response to a determination that a triggering criterion for dynamically repartitioning the data stream has been met, a modified mapping is generated, and a different set of ingestion and storage nodes are configured. For at least some time during which arriving data records are stored in accordance with the modified mapping, data records stored at the first set of storage nodes in accordance with the initial mapping are retained.
Opening claim text (preview).
What is claimed is: 1. A system, comprising: one or more computing devices comprising one or more respective hardware processors and memory and configured to: determine a partitioning policy to be applied to distribute data records of a data stream among a plurality of nodes of a multi-tenant stream management service, wherein the partitioning policy comprises an initial mapping of data records to a plurality of partitions based at least in part on one or more attribute values associated with the data records; identify, using the initial mapping, a first partition of which a particular data record of the data stream is to be designated a member, based at least in part on a particular attribute value; generate, corresponding to the particular data record, a sequence number indicative of a position of the particular data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected based at least in part on the initial mapping; store a plurality of data records of the first partition at a data storage location of the stream management service in an order based at least in part on respective sequence numbers of the plurality of data records, wherein the data storage location is selected based at least in part on the initial mapping; and in response to a determination that a triggering criterion for repartitioning the data stream has been met, generate a modified mapping of data records to partitions, initiate usage of the modified mapping without interrupting a flow of data record acquisitions of the data stream; and select, for another data record with the particular attribute value, wherein the other data record is received subsequent to an initiation of usage of the modified mapping, at least one of: (a) a different ingestion node of the stream management service or (b) a different data storage location of the stream management service. 2. The system as recited in claim 1 , wherein the sequence number comprises an indication of (a) a timestamp associated with ingestion of the particular data record, and (b) an additional subsequence value. 3. The system as recited in claim 2 , wherein the one or more computing devices are further configured to: select an initial timestamp value to be used for sequence numbers of data records to be mapped using the modified mapping; in response to a data record retrieval request indicating a particular sequence number, in response to a determination that a value of a particular timestamp indicated by the particular sequence number is lower than the initial timestamp value, utilize the initial mapping to retrieve one or more data records; and in response to a determination that the value of the particular timestamp is not lower than the initial timestamp value, utilize the modified mapping to retrieve one or more data records. 4. The system as recited in claim 1 , wherein the triggering criterion comprises one or more of: (a) a detection of an overload condition, (b) a detection of a workload imbalance, (c) a client request for repartitioning, (d) a determination of a change to a data durability requirement of the data stream, (e) a determination of a schedule of a software version change, (f) a detection of a change to a usage pattern of the data stream, (g) a determination of a pricing impact of repartitioning the data stream, or (h) a determination of a performance target associated with the data stream. 5. The system as recited in claim 1 , wherein the one or more computing devices are further configured to: receive a client request indicating one or more partitioning criteria to be used for the data stream; and generate the initial mapping based at least in part on the client request. 6. A method, comprising: performing, by one or more computing devices of a stream management service: determining an initial mapping of data records of a data stream to a plurality of partitions based at least in part on one or more attribute values of the data records; identifying, using the initial mapping, a first partition of which a particular data record of the data stream is to be designated a member, based at least in part on a particular attribute value; storing the particular data record at a storage location selected based at least in part on the initial mapping; and in response to determining that a triggering criterion has been met, generating a modified mapping of data records to partitions, and selecting, for another data record with the particular attribute value and without interrupting a flow of the data records of the data stream, wherein the other data record is received subsequent to an initiation of usage of the modified mapping, a different storage location. 7. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: executing, on the particular data record prior to said determining that the triggering criterion has been met, a processing operation at a worker node selected based at least in part on the initial mapping; and executing, on a different data record with the particular attribute value, subsequent to said determining that the triggering criterion has been met, the processing operation at a different worker node selected based at least in part on the modified mapping. 8. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: generating, corresponding to the particular data record, a sequence number indicative of a position of the particular data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected based at least in part on the initial mapping; and storing data records of the first partition in an order corresponding to their sequence numbers. 9. The method as recited in claim 8 , wherein the sequence number comprises an indication of (a) a timestamp associated with ingestion of the particular data record, and (b) an additional subsequence value. 10. The method as recited in claim 9 , wherein the timestamp is indicative of a clock time at which the particular data record was ingested, wherein the method further comprises performing, by the one or more computing devices: in response to a retrieval request requesting one or more data records to be retrieved based at least in part on a specified record ingestion time range, using sequence numbers associated with the one or more data records as index keys to retrieve the one or more data records. 11. The method as recited in claim 9 , further comprising performing, by the one or more computing devices: selecting an initial timestamp value to be used for sequence numbers of data records to be mapped using the modified mapping; in response to receiving a data record retrieval request indicating a particular sequence number, in response to determining that a value of a particular timestamp indicated by the particular sequence number is lower than the initial timestamp value, utilizing the initial mapping to retrieve one or more data records; and in response to determining that the value of the particular timestamp is not lower than the initial timestamp value, utilizing the modified mapping to retrieve one or more data records. 12. The method as recited in claim 6 , wherein said modified mapping uses at least one additional attribute value to determine the partition of a data record. 13. The method as recited in claim 6 , further comprising performing, by the one or more computing devices: receiving a client request indicating one or more partitioning criteria to be use
Physics · mapped topic
Physics · mapped topic
Information retrieval; Database structures therefor; File system structures therefor · CPC title
Data format conversion from or to a database · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.