Consistent data stream replication and reconstruction in a streaming data storage platform

US11599293B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11599293-B2
Application number  US-202017070029-A
Country             US
Kind code           B2
Filing date         Oct 14, 2020
Priority date       Oct 14, 2020
Publication date    Mar 7, 2023
Grant date          Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The described technology is generally directed towards consistently replicating and reconstructing a data stream made up of a dynamic set of (ordered) segments into a different location (e.g., cluster) from the one in which the stream was created. The technology facilitates consistently and generally continuously replicating a stream of events ingested in a source cluster to a target cluster for consumption (reading). As stream data segments are replicated to a target cluster by a replicator which is not guaranteed to keep the replicated data consistent, a target controller reconstructs the replicated data stream up to a stream cut point at which the replicated data has been sufficiently replicated so as to be consistent. Reading of the replicated data stream is limited to a view up to the stream cut point; as more data is replicated, additional data up to a later stream cut point becomes available for reading.
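
The selection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout for a stream cut, the function names `is_fully_replicated` and `latest_consistent_cut`, and the segment identifiers are all hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Assumption: a stream cut is modeled as {segment id: segment length at the cut}.
StreamCut = Mapping[str, int]


def is_fully_replicated(cut: StreamCut, current_lengths: Mapping[str, int]) -> bool:
    """True when every segment named in the cut exists at the target and
    holds at least as many bytes as the cut records for it."""
    return all(
        seg in current_lengths and current_lengths[seg] >= length
        for seg, length in cut.items()
    )


def latest_consistent_cut(
    cuts: Sequence[StreamCut], current_lengths: Mapping[str, int]
) -> Optional[StreamCut]:
    """Scan cuts from newest to oldest and return the first one that the
    replicated segments fully cover, or None if no cut is covered yet."""
    for cut in reversed(cuts):
        if is_fully_replicated(cut, current_lengths):
            return cut
    return None


# Stream cuts in the order they were made at the source (oldest first).
cuts = [
    {"seg-0": 100, "seg-1": 80},
    {"seg-0": 250, "seg-1": 200},
    {"seg-0": 400, "seg-1": 390},
]
# Lengths currently present at the target; seg-1 lags behind the newest cut.
replicated = {"seg-0": 420, "seg-1": 310}

print(latest_consistent_cut(cuts, replicated))  # {'seg-0': 250, 'seg-1': 200}
```

In this example the newest cut is not yet covered because `seg-1` holds only 310 of the 390 bytes it requires, so readers would see the stream only up to the middle cut until replication catches up.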

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, the operations comprising: obtaining part of a data stream at a target cluster of a streaming data storage system via replication from a source cluster of the streaming data storage system, the data stream comprising events maintained in segments of the data stream, wherein the segments are parallel segments separating the data stream according to routing key values; monitoring stream cut data, the stream cut data comprising stream cuts made to the data stream, wherein the stream cuts comprise respective groups of segment identifiers that identify respective segments of respective groups of segments of the stream cuts and respective segment lengths of the respective segments of the respective groups of segments of the stream cuts, and wherein the stream cut data is stored via a stream cut segment of the data stream that is parallel to a segment of the segments; determining, from the stream cut data and current lengths of the segments obtained via the replication, a most recent stream cut point to which the segments have been fully replicated; and limiting reading of the segments to events before the most recent stream cut point to which the segments have been fully replicated.

2. The system of claim 1, wherein the determining the most recent stream cut point to which the segments have been fully replicated comprises comparing the current lengths of the segments obtained via the replication with the respective segment lengths of the respective segments of the respective groups of segments of the stream cuts.

3. The system of claim 1, wherein the determining the most recent stream cut point to which the segments have been fully replicated comprises determining whether the segments obtained via the replication exist relative to the respective segments, in the respective groups of segments of the stream cuts, identified by the respective groups of segment identifiers.

4. The system of claim 1, wherein the operations further comprise, at the target cluster, registering the data stream as a replicated data stream.

5. The system of claim 1, wherein operations further comprise, determining whether a stream cut of the stream cuts is associated with a scale event that changes segment relationship data, and in response to determining that the stream cut is associated with the scale event, updating relationship metadata maintained at the target cluster.

6. The system of claim 1, wherein operations further comprise, for identified segments that are identified in a most recent stream cut corresponding to the most recent stream cut point, updating corresponding segment length metadata in a target segment data store based on segment offsets corresponding to the identified segments in the most recent stream cut.

7. The system of claim 6, wherein the limiting the reading of the segments to events before the most recent stream cut point comprises limiting a read request based on the corresponding segment length metadata in the target segment data store.

8. The system of claim 1, wherein the monitoring of the stream cut data comprises monitoring for a replication stream cut generated at the source cluster and replicated to the target cluster.

9. The system of claim 1, wherein the monitoring of the stream cut data of comprises monitoring for a stream cut of the stream cuts that was generated at the source cluster in response to a scale event.

10. The system of claim 9, wherein the stream cut is stored as an event comprised in the stream cut segment.

11. A method, comprising: obtaining, via a processor of a target cluster, replicated segments comprising streamed data of a data stream from a source cluster; obtaining a stream cut comprising identifiers for respective segments of the data stream and offset values representing lengths of the respective segments relative to the stream cut wherein the segments are parallel segments separating the data stream according to routing key values, and wherein the stream cut is comprised in an event of a stream cut segment that is a parallel segment to the parallel segments of the data stream; determining whether current lengths of the replicated segments are greater than or equal to corresponding lengths represented by the offset values in the stream cut; and in response to determining that the current lengths of the replicated segments are greater than or equal to the offset values in the stream cut, updating target offset data of the replicated segments to match the offset values in the stream cut, resulting in updated target offset data, and allowing reading of the streamed data from the replicated segments up to locations in the replicated segments represented by the updated target offset data.

12. The method of claim 11, wherein the locations are first locations in the replicated segments, and further comprising, in response to determining that the current lengths of the replicated segments are less than the offset values in the stream cut, allowing the reading of the streamed data from the replicated segments up to second locations in the replicated segments represented by earlier target offset data that is based on offset values in an earlier stream cut prior to the stream cut in the data stream.

13. The method of claim 11, wherein the obtaining the stream cut comprises monitoring for modifications to a stream cut data structure maintained via the stream cut segment of the data stream as replicated to the target cluster.

14. The method of claim 13, wherein the monitoring for the modifications to the stream cut data structure comprises detecting a replication stream cut generated at the source cluster and replicated to the target cluster.

15. The method of claim 13, wherein the monitoring for the modifications to the stream cut data structure comprises detecting a stream cut generated at the source cluster and replicated to the target cluster in response to a scale event.

16. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor of a streaming data storage system, facilitate performance of operations, the operations comprising: receiving, at a target cluster, segment data of segments of a data stream being replicated from a source cluster, wherein the segment data of the segments is not guaranteed to be consistent during replication, and wherein the segments of the data stream are parallel segments separating the data stream according to routing key values; determining a selected stream cut point at which the segment data replicated from the source cluster is consistent among the segments, wherein the stream cut point is comprised in a stream cut event of a stream cut segment that is a parallel segment to the parallel segments of the data stream; and presenting a view of the data stream to a reader of the target cluster, in which the view is limited to the segment data before the selected stream cut point.

17. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise monitoring for a change to replicated stream cut data.

18. The non-transitory machine-readable medium of claim 16, wherein the selected stream cut point is an existing prior selected stream cut point, and wherein the operations further comprise updating the selected stream cut point to a new selected stream cut point upon determining that the segment data replicated from the sou
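
The advance-or-hold behavior recited in claims 11 and 12, together with the read limiting of claim 7, can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than the claimed system: the `TargetController` class, its `on_stream_cut` and `read` methods, and the use of byte strings to stand in for replicated segments are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TargetController:
    # Bytes actually present per replicated segment (appended by a replicator).
    data: Dict[str, bytes] = field(default_factory=dict)
    # Readable length per segment; only a satisfied stream cut advances it.
    target_offsets: Dict[str, int] = field(default_factory=dict)

    def on_stream_cut(self, cut: Dict[str, int]) -> bool:
        """Advance target offsets to the cut if every named segment exists
        and is at least as long as the cut's offset; otherwise keep the
        earlier offsets unchanged."""
        satisfied = all(
            seg in self.data and len(self.data[seg]) >= offset
            for seg, offset in cut.items()
        )
        if satisfied:
            self.target_offsets.update(cut)
        return satisfied

    def read(self, segment: str, offset: int, size: int) -> bytes:
        """Return up to `size` bytes starting at `offset`, never reading
        past the segment's current target offset."""
        if offset < 0 or size < 0:
            raise ValueError("offset and size must be non-negative")
        limit = self.target_offsets.get(segment, 0)
        end = min(offset + size, limit)
        return self.data.get(segment, b"")[offset:end]


ctl = TargetController()
ctl.data["seg-0"] = b"abcdef"
ctl.data["seg-1"] = b"xy"
first = ctl.on_stream_cut({"seg-0": 4, "seg-1": 2})    # covered, offsets advance

ctl.data["seg-0"] += b"ghij"                           # seg-0 replicates ahead
second = ctl.on_stream_cut({"seg-0": 10, "seg-1": 5})  # seg-1 lags, offsets hold

print(first, second, ctl.read("seg-0", 0, 100))
```

Although `seg-0` already holds ten bytes after the second step, the read is clamped to the four bytes covered by the earlier cut, because `seg-1` has not yet reached the offset the later cut requires.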

Assignees

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

  • G06F3/065 (Primary): Replication mechanisms

  • G06F3/0685 (Primary): Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

  • Plurality of storage devices

  • Command handling arrangements, e.g. command buffers, queues, command scheduling

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599293B2 cover?
The described technology is generally directed towards consistently replicating and reconstructing a data stream made up of a dynamic set of (ordered) segments into a different location (e.g., cluster) from the one in which the stream was created. The technology facilitates consistently and generally continuously replicating a stream of events ingested in a source cluster to a target cluste…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/065. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).