Architecture for a transparently-scalable, ultra-high-throughput storage network
US-2016210061-A1 · Jul 21, 2016 · US
US11599293B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599293-B2 |
| Application number | US-202017070029-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2020 |
| Priority date | Oct 14, 2020 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Abstract
The described technology is generally directed towards consistently replicating and reconstructing a data stream made up of a dynamic set of (ordered) segments into a different location (e.g., cluster) from the one in which the stream was created. The technology facilitates consistently and generally continuously replicating a stream of events ingested in a source cluster to a target cluster for consumption (reading). As stream data segments are replicated to a target cluster by a replicator that is not guaranteed to keep the replicated data consistent, a target controller reconstructs the replicated data stream up to a stream cut point at which the replicated data has been sufficiently replicated to be consistent. Reading of the replicated data stream is limited to a view up to the stream cut point; as more data is replicated, additional data up to a later stream cut point becomes available for reading.
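The reconstruction step in the abstract can be sketched as a scan over stream cuts. This is a minimal Python sketch, not the patentee's implementation. It makes two assumptions. First, each stream cut is modelled as a hypothetical `StreamCut` that maps segment identifiers to segment lengths (offsets). Second, the target knows how many bytes of each segment have been replicated so far. Under this sketch, a cut counts as consistent only when every segment it names exists at the target and has been replicated at least up to the cut's offset. The newest such cut bounds what readers may see.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StreamCut:
    # Hypothetical model: segment identifier -> segment length (offset) recorded at the cut.
    positions: Dict[str, int]


def is_fully_replicated(cut: StreamCut, replicated_lengths: Dict[str, int]) -> bool:
    # Every segment named in the cut must exist at the target and be at least as long as its offset.
    return all(
        segment in replicated_lengths and replicated_lengths[segment] >= offset
        for segment, offset in cut.positions.items()
    )


def most_recent_consistent_cut(
    cuts: List[StreamCut], replicated_lengths: Dict[str, int]
) -> Optional[StreamCut]:
    # Cuts are assumed to be ordered oldest -> newest, as read from the stream cut segment.
    for cut in reversed(cuts):
        if is_fully_replicated(cut, replicated_lengths):
            return cut
    return None  # No cut is fully replicated yet, so nothing is readable.


cuts = [
    StreamCut({"s0": 100, "s1": 80}),
    StreamCut({"s0": 250, "s1": 190}),
]
# s1 has only 150 bytes at the target, so the newer cut is not yet consistent.
view = most_recent_consistent_cut(cuts, {"s0": 260, "s1": 150})
print(view.positions)  # {'s0': 100, 's1': 80}
```

Once segment `s1` reaches 190 bytes, the same call returns the newer cut. This mirrors the abstract's point that the readable view advances to later stream cut points as more data is replicated.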
Claims (preview)
What is claimed is:
1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, the operations comprising: obtaining part of a data stream at a target cluster of a streaming data storage system via replication from a source cluster of the streaming data storage system, the data stream comprising events maintained in segments of the data stream, wherein the segments are parallel segments separating the data stream according to routing key values; monitoring stream cut data, the stream cut data comprising stream cuts made to the data stream, wherein the stream cuts comprise respective groups of segment identifiers that identify respective segments of respective groups of segments of the stream cuts and respective segment lengths of the respective segments of the respective groups of segments of the stream cuts, and wherein the stream cut data is stored via a stream cut segment of the data stream that is parallel to a segment of the segments; determining, from the stream cut data and current lengths of the segments obtained via the replication, a most recent stream cut point to which the segments have been fully replicated; and limiting reading of the segments to events before the most recent stream cut point to which the segments have been fully replicated.
2. The system of claim 1, wherein the determining the most recent stream cut point to which the segments have been fully replicated comprises comparing the current lengths of the segments obtained via the replication with the respective segment lengths of the respective segments of the respective groups of segments of the stream cuts.
3. The system of claim 1, wherein the determining the most recent stream cut point to which the segments have been fully replicated comprises determining whether the segments obtained via the replication exist relative to the respective segments, in the respective groups of segments of the stream cuts, identified by the respective groups of segment identifiers.
4. The system of claim 1, wherein the operations further comprise, at the target cluster, registering the data stream as a replicated data stream.
5. The system of claim 1, wherein the operations further comprise determining whether a stream cut of the stream cuts is associated with a scale event that changes segment relationship data, and in response to determining that the stream cut is associated with the scale event, updating relationship metadata maintained at the target cluster.
6. The system of claim 1, wherein the operations further comprise, for identified segments that are identified in a most recent stream cut corresponding to the most recent stream cut point, updating corresponding segment length metadata in a target segment data store based on segment offsets corresponding to the identified segments in the most recent stream cut.
7. The system of claim 6, wherein the limiting the reading of the segments to events before the most recent stream cut point comprises limiting a read request based on the corresponding segment length metadata in the target segment data store.
8. The system of claim 1, wherein the monitoring of the stream cut data comprises monitoring for a replication stream cut generated at the source cluster and replicated to the target cluster.
9. The system of claim 1, wherein the monitoring of the stream cut data comprises monitoring for a stream cut of the stream cuts that was generated at the source cluster in response to a scale event.
10. The system of claim 9, wherein the stream cut is stored as an event comprised in the stream cut segment.
11. A method, comprising: obtaining, via a processor of a target cluster, replicated segments comprising streamed data of a data stream from a source cluster; obtaining a stream cut comprising identifiers for respective segments of the data stream and offset values representing lengths of the respective segments relative to the stream cut, wherein the segments are parallel segments separating the data stream according to routing key values, and wherein the stream cut is comprised in an event of a stream cut segment that is a parallel segment to the parallel segments of the data stream; determining whether current lengths of the replicated segments are greater than or equal to corresponding lengths represented by the offset values in the stream cut; and in response to determining that the current lengths of the replicated segments are greater than or equal to the offset values in the stream cut, updating target offset data of the replicated segments to match the offset values in the stream cut, resulting in updated target offset data, and allowing reading of the streamed data from the replicated segments up to locations in the replicated segments represented by the updated target offset data.
12. The method of claim 11, wherein the locations are first locations in the replicated segments, and further comprising, in response to determining that the current lengths of the replicated segments are less than the offset values in the stream cut, allowing the reading of the streamed data from the replicated segments up to second locations in the replicated segments represented by earlier target offset data that is based on offset values in an earlier stream cut prior to the stream cut in the data stream.
13. The method of claim 11, wherein the obtaining the stream cut comprises monitoring for modifications to a stream cut data structure maintained via the stream cut segment of the data stream as replicated to the target cluster.
14. The method of claim 13, wherein the monitoring for the modifications to the stream cut data structure comprises detecting a replication stream cut generated at the source cluster and replicated to the target cluster.
15. The method of claim 13, wherein the monitoring for the modifications to the stream cut data structure comprises detecting a stream cut generated at the source cluster and replicated to the target cluster in response to a scale event.
16. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor of a streaming data storage system, facilitate performance of operations, the operations comprising: receiving, at a target cluster, segment data of segments of a data stream being replicated from a source cluster, wherein the segment data of the segments is not guaranteed to be consistent during replication, and wherein the segments of the data stream are parallel segments separating the data stream according to routing key values; determining a selected stream cut point at which the segment data replicated from the source cluster is consistent among the segments, wherein the stream cut point is comprised in a stream cut event of a stream cut segment that is a parallel segment to the parallel segments of the data stream; and presenting a view of the data stream to a reader of the target cluster, in which the view is limited to the segment data before the selected stream cut point.
17. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise monitoring for a change to replicated stream cut data.
18. The non-transitory machine-readable medium of claim 16, wherein the selected stream cut point is an existing prior selected stream cut point, and wherein the operations further comprise updating the selected stream cut point to a new selected stream cut point upon determining that the segment data replicated from the sou
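Claims 3, 6, 7, 11 and 12 describe target-side bookkeeping. A stream cut is applied only when every replicated segment has caught up to its offset. Read requests are then clamped to the resulting per-segment lengths, not to however many bytes the replicator has copied so far. The sketch below illustrates that behaviour under stated assumptions. `TargetSegmentStore`, `append_replicated`, `apply_stream_cut` and `read` are hypothetical names. Segments are modelled as in-memory byte arrays rather than a real segment store.

```python
from typing import Dict


class TargetSegmentStore:
    """Hypothetical target-side store: replicated bytes plus a readable limit per segment."""

    def __init__(self) -> None:
        self.replicated: Dict[str, bytearray] = {}  # bytes copied in by the replicator, possibly inconsistent
        self.readable_length: Dict[str, int] = {}  # the "target offset data" exposed to readers

    def append_replicated(self, segment: str, data: bytes) -> None:
        self.replicated.setdefault(segment, bytearray()).extend(data)

    def apply_stream_cut(self, cut: Dict[str, int]) -> bool:
        # Advance only if every segment in the cut exists and has reached the cut's offset (claims 3, 11).
        for segment, offset in cut.items():
            if segment not in self.replicated or len(self.replicated[segment]) < offset:
                return False  # Keep the limits from the earlier cut (claim 12).
        for segment, offset in cut.items():
            self.readable_length[segment] = offset
        return True

    def read(self, segment: str, offset: int, max_bytes: int) -> bytes:
        # Clamp the request to the readable limit, not the replicated length (claim 7).
        limit = self.readable_length.get(segment, 0)
        end = min(offset + max_bytes, limit)
        if offset >= end:
            return b""
        return bytes(self.replicated[segment][offset:end])


store = TargetSegmentStore()
store.append_replicated("s0", b"abcdef")
store.append_replicated("s1", b"xy")
print(store.apply_stream_cut({"s0": 4, "s1": 2}))  # True
print(store.read("s0", 0, 100))                    # b'abcd'
print(store.apply_stream_cut({"s0": 6, "s1": 5}))  # False: s1 has only 2 bytes
print(store.read("s0", 0, 100))                    # still b'abcd'
```

In this sketch, the readable limit is kept separate from the replicated length. The replicator can keep copying bytes into each segment without coordinating across segments. Readers still only see a prefix that some source-side stream cut recorded as consistent.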
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Replication mechanisms · CPC title
Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title
Plurality of storage devices · CPC title
Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title