Geographically diverse data storage system employing a replication tree

US11681677B2 · US · B2

Patent metadata
Publication number: US-11681677-B2
Application number: US-202016803923-A
Country: US
Kind code: B2
Filing date: Feb 27, 2020
Priority date: Feb 27, 2020
Publication date: Jun 20, 2023
Grant date: Jun 20, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A geographically diverse data storage system that can protect data via replication of data among relevant zones according to a determined replication topology is disclosed. The replication topology can be determined based on replication times between the relevant zones. In an aspect, a tree topology can provide advantages over a star topography. In an embodiment, a tree topology can be generated, or an existing topology can be modified, via selection of a next replication task(s) based on the replication times. In an aspect, the replication times can be determined from measurable characteristics of the geographically diverse data storage system. In some embodiments, the replications times can be based on historical measurements, time limited historical measurements, inferences from machine learning, etc. A determined topology can be ranked relative to other viable topologies based on criteria such as speed, monetary cost, computing resource usage, etc. Accordingly, a selected topology, or selected modification to a topology, can provide for improved replication that can provide protection for data stored in the geographically diverse data storage system.
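The selection of a next replication task based on replication times can be illustrated with a greedy, Prim-style sketch. This is only an assumption about how such a tree might be grown, not the patent's own procedure: the function names, the `times` dictionary layout, and the sample values are all hypothetical, and the sketch assumes a replication time is known for every pair of zones.

```python
def replication_time(times, a, b):
    """Look up the (symmetric) replication time between two zones."""
    return times[(a, b)] if (a, b) in times else times[(b, a)]


def build_replication_tree(times, source):
    """Grow a replication tree from `source`.

    At each step, pick the fastest replication from a zone that already
    holds the data (the tree set) to a zone that does not yet hold it.
    Hypothetical illustration; assumes every pair of zones has a time.
    """
    zones = {zone for pair in times for zone in pair}
    tree_set = {source}
    operations = []
    while tree_set != zones:
        # Candidate tasks: (time, sending zone, receiving zone).
        _, src, dst = min(
            (replication_time(times, s, d), s, d)
            for s in sorted(tree_set)
            for d in sorted(zones - tree_set)
        )
        operations.append((src, dst))
        tree_set.add(dst)
    return operations


# Hypothetical replication times (arbitrary units) between three zones.
times = {("A", "B"): 5, ("A", "C"): 20, ("B", "C"): 4}
ops = build_replication_tree(times, "A")
print(ops)
```

With these sample values, zone C receives the data from zone B rather than directly from zone A, because the B to C replication time (4) is lower than the A to C time (20). A star topology rooted at A would have used the slower A to C link.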

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, comprising: receiving an indication of replication times between pairs of zones comprised in a geographically diverse data storage system comprising a first zone, a second zone, and a third zone; determining a first replication operation between the first zone and the second zone based on a first value of the replication times and adding the first zone and the second zone to a tree set; determining a second replication operation between a zone of the tree set and the third zone based on a second value of the replication times and adding the third zone to the tree set; selecting a preferred replication topology based on a ranking of the first replication operation and the second replication operation among other replication operations determined for the pairs of zones from the replication times, wherein the ranking is based on a monetary cost of replication, a speed of replication, a reliability of replication, and satisfaction of a customer requirement for replication of a data chunk between the pairs of zones, wherein the data chunk comprises data stored in an append-only format according to an order in which the data was received by the geographically diverse data storage system, and wherein the data chunk is sealed prior to replication, causing the data chunk to be immutable; and replicating the data chunk among the pairs of zones according to the preferred replication topology, resulting in a replicated data chunk.

2. The system of claim 1, wherein the first zone is located remotely from the second zone, and wherein the first zone is located remotely from the third zone.

3. The system of claim 1, wherein the second zone is located remotely from the third zone.

4. The system of claim 1, wherein the ranking is based, at least in part, on determining the first value of the replication times is lower than another value of the replication times.

5. The system of claim 1, wherein the ranking is based, at least in part, on determining the first value of the replication times is the same as another value of the replication times, and is further based, at least in part, on determining that employing a zone corresponding to the first value results in shorter tree topology than employing another zone corresponding to the other value of the replication times.

6. The system of claim 1, wherein the ranking is in response to a determining that a characteristic of the geographically diverse data storage system has transitioned a threshold value.

7. The system of claim 6, wherein the threshold value is a replication time value of the replication times.

8. The system of claim 6, wherein the threshold value is an amount of change in a replication time value of the replication times.

9. The system of claim 1, wherein the operations further comprise iteratively determining another replication operation of the other replication operations, and wherein the other replication operation is between a zone of the tree set and another zone of the geographically diverse data storage system based on another value of the replication times and adding the other zone to the tree set.

10. The system of claim 1, wherein the ranking of the replication operations excludes unavailable topology schemes.

11. The system of claim 1, wherein the replicating the data chunk according to the preferred replication topology results in generating a protection set via replication of data chunks comprising the data chunk among zones comprised in the tree set.

12. The system of claim 9, wherein the iteratively determining another replication operation between a zone of the tree set and another zone results in a third replication operation that occurs in parallel with the second replication operation.

13. A method, comprising: performing, by a system comprising a processor, a first iteration of operations comprising: in response to receiving, by the system, an indication of replication times between a pair of zones comprised in a geographically diverse data storage system comprising a first zone, a second zone, and a third zone, determining a first replication operation between the first zone and the second zone based on a first value of the replication times and adding the first zone and the second zone to a tree set; determining, by the system, a second replication operation between a zone of the tree set and the third zone based on a second value of the replication times and adding the third zone to the tree set; selecting, by the system, a preferred replication topology based on ranking viable replication topologies, wherein the ranking the viable replication topologies is based, at least in part, on the replication times, a monetary cost of replication, a reliability of replication, and satisfaction of a customer requirement for replicating a data chunk between pairs of zones, and wherein the preferred replication topology comprises the first replication operation and the second replication operation; and initiating, by the system, a replication of the data chunk in accord with the preferred replication topology, wherein the data chunk comprises data stored in an append-only format according to an order in which the data was received by the geographically diverse data storage system, and wherein the data chunk becomes immutable by sealing the data chunk prior to performing the replication.

14. The method of claim 13, wherein the operations further comprise: in response to determining, by the system, that there is a relevant zone of the geographically diverse data storage system to be added to the tree set, iteratively determining at least another replication operation between a zone of the tree set and at least another zone of the geographically diverse data storage system based on at least another value of the replication times and adding at least the other zone to the tree set, resulting in in topology scheme of the viable topology schemes.

15. The method of claim 14, wherein the iteratively determining at least the other replication operation results in a third replication operation that occurs in parallel with the second replication operation.

16. The method of claim 13, wherein the determining the first replication operation results in the first replication operation being between remotely located zones.

17. A non-transitory machine-readable storage medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining that an indication of replication times between a pair of zones comprised in a geographically diverse data storage system comprising a first zone, a second zone, and a third zone, satisfies a rule related to a threshold value; determining a first replication operation between the first zone and the second zone based on a first value of the replication times and adding the first zone and the second zone to a tree set; determining a second replication operation between a zone of the tree set and the third zone based on a second value of the replication times and adding the third zone to the tree set; ranking viable replication topologies, wherein the ranking the viable replication topologies is based, at least in part, on the replication times, a monetary cost of replications, a reliability of replications, and satisfaction of a customer requirement for replicating a data chunk between pairs of zones, and wherein a selected replication
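The ranking step named in the claims (monetary cost, speed, reliability, satisfaction of a customer requirement, and exclusion of unavailable schemes) might be sketched as below. The claims do not say how the criteria are combined, so the weighted sum, the treatment of the customer requirement as a hard filter, the `Candidate` fields, and all sample numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One viable replication topology with hypothetical metrics."""
    name: str
    total_time: float        # replication time, lower is better
    monetary_cost: float     # lower is better
    reliability: float       # 0..1, higher is better
    meets_requirement: bool  # customer requirement satisfied?
    available: bool = True   # unavailable schemes are excluded


def rank_topologies(candidates, w_time=1.0, w_cost=1.0, w_rel=1.0):
    """Return viable candidates ordered best-first.

    Assumption: the score is a simple weighted sum in which time and
    cost are penalties and reliability is a bonus.
    """
    viable = [c for c in candidates if c.available and c.meets_requirement]

    def score(c):
        return w_time * c.total_time + w_cost * c.monetary_cost - w_rel * c.reliability

    return sorted(viable, key=score)


candidates = [
    Candidate("star", total_time=25.0, monetary_cost=10.0,
              reliability=0.99, meets_requirement=True),
    Candidate("tree", total_time=9.0, monetary_cost=12.0,
              reliability=0.98, meets_requirement=True),
    Candidate("offline-chain", total_time=8.0, monetary_cost=5.0,
              reliability=0.99, meets_requirement=True, available=False),
]
ranked = rank_topologies(candidates)
preferred = ranked[0]
print(preferred.name)
```

In this sample the unavailable candidate is dropped before scoring, and the tree candidate ranks ahead of the star candidate because its lower replication time outweighs its slightly higher cost. Different weights could reverse that order.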

Assignees

Inventors

Classifications

  • G06F16/27 (Primary)

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

  • Distributed file systems

  • Trees, e.g. B+trees

  • Parallel file systems, i.e. file systems supporting multiple processors

  • Applying rules; Deductive queries

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11681677B2 cover?
A geographically diverse data storage system that can protect data via replication of data among relevant zones according to a determined replication topology is disclosed. The replication topology can be determined based on replication times between the relevant zones. In an aspect, a tree topology can provide advantages over a star topography. In an embodiment, a tree topology can be generate…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 20, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).