Systems and methods of simulating the state of a distributed storage system
US-9659031-B2 · May 23, 2017 · US
US9971823B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9971823-B2 |
| Application number | US-201615090547-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 4, 2016 |
| Priority date | Jun 13, 2013 |
| Publication date | May 15, 2018 |
| Grant date | May 15, 2018 |
Official abstract text for this publication.
Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status data, a number of available replicas for at least one replica group may be determined to be not compliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
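As a rough illustration of the flow the abstract describes, the sketch below compares each replica group's available replica count with its healthy state definition, then orders the resulting healing operations under a single resource constraint. It is not taken from the patent: the names `ReplicaGroup`, `HealingOp`, `find_healing_ops`, and `schedule`, the per-replica bandwidth cost, and the greedy most-degraded-first policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReplicaGroup:
    name: str
    available: int  # replicas currently reported available in the status metadata
    required: int   # healthy state definition: required number of replicas


@dataclass
class HealingOp:
    group: str
    missing: int         # replicas that must be recreated
    bandwidth_cost: int  # hypothetical resource requirement of this operation


def find_healing_ops(groups: List[ReplicaGroup],
                     cost_per_replica: int = 10) -> List[HealingOp]:
    """Return one healing operation per group that is below its healthy state."""
    ops = []
    for g in groups:
        missing = g.required - g.available
        if missing > 0:
            ops.append(HealingOp(g.name, missing, missing * cost_per_replica))
    return ops


def schedule(ops: List[HealingOp],
             bandwidth_budget: int) -> Tuple[List[HealingOp], List[HealingOp]]:
    """Greedy ordering (an assumed policy): most-degraded groups first.

    Operations that do not fit in the remaining budget are deferred
    to a later scheduling pass.
    """
    scheduled, deferred = [], []
    remaining = bandwidth_budget
    for op in sorted(ops, key=lambda o: (-o.missing, o.group)):
        if op.bandwidth_cost <= remaining:
            scheduled.append(op)
            remaining -= op.bandwidth_cost
        else:
            deferred.append(op)
    return scheduled, deferred
```

A caller would build the `ReplicaGroup` list from the status metadata, call `find_healing_ops`, and pass the result to `schedule` with whatever budget the current resource constraints allow. Deferred operations would simply be reconsidered on the next pass.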
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a plurality of compute nodes, each comprising at least one processor and memory, wherein the plurality of compute nodes implement a data store; wherein the data store is configured to: maintain a plurality of replicas of data on behalf of a client of the data store at different ones of the compute nodes as a replica group for the data; obtain individual metadata for different replicas of the replica group to update status metadata stored for the replica group at one or more of the compute nodes remote from the different ones of the compute nodes that maintain the plurality of replicas; access, by a replica group status sweeper remote from the different ones of the compute nodes and remote from the one or more compute nodes that store the status metadata, the updated status metadata for the replica group at the one or more compute nodes to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group based, at least in part, on the updated status metadata, wherein the evaluation determines that a number of available replicas for the replica group is not compliant with the healthy state definition; and automatically restore the replica group such that the number of available replicas for the replica group is compliant with the healthy state definition for the replica group.
2. The system of claim 1, wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to send, by the different replicas of the data, the individual metadata to the one or more compute nodes to store the individual metadata as part of the status metadata.
3. The system of claim 1, wherein one of the replicas in the replica group is a master replica for the replica group; wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to: request, by the master replica, the individual metadata for the different replicas of the replica group; and update, by the master replica, the status metadata according to the individual metadata received from the different replicas.
4. The system of claim 1, wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to: request, by the replica group status sweeper, the individual metadata for the different replicas of the replica group; and update, by the replica group status sweeper, the status metadata according to the individual metadata received from the different replicas.
5. The system of claim 1, wherein to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group, the data store is configured to: determine that the individual metadata for one or more replicas of the replica group has not been updated in the status metadata within a time threshold; send a request for the individual metadata to the one or more replicas; and identify the one or more replicas as unavailable upon a failure of the one or more replicas to respond to the request.
6. The system of claim 1, wherein the data store is further configured to: prior to the performance of the automatic restoration: attempt to obtain current metadata for one or more replicas of the replica group determined to be unavailable according to the evaluation of the status metadata; and based, at least in part, on the attempt to obtain the current metadata, confirm performance of the automatic restoration, wherein unconfirmed automatic restorations at the data store are not performed.
7. The system of claim 1, wherein the data store is a network-based storage service, wherein the replica group is one of a plurality of different replica groups for different respective data maintained at the network-based storage service among the compute nodes; wherein the network-based storage service is further configured to perform the obtainment of the individual metadata, the access of the status metadata, and the automatic restoration for one or more other replica groups in addition to the replica group; and wherein the performance of the automatic restoration for the replica group and the one or more other replica groups is ordered according to a dynamically determined schedule for performing the automatic restorations based, at least in part, on one or more resource constraints in the network-based storage service.
8. A method, comprising: performing, by a plurality of computing devices: maintaining a plurality of replicas of data on behalf of a client of a data store at different ones of a plurality of compute nodes as a replica group for the data; obtaining individual metadata for different replicas of the replica group to update status metadata stored for the replica group at one or more of the compute nodes remote from the different ones of the compute nodes; accessing, by a replica group status sweeper that is remote from the different ones of the compute nodes and remote from the one or more compute nodes that store the status metadata, the updated status metadata for the replica group at the one or more compute nodes to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group based, at least in part, on the updated status metadata, wherein the evaluation determines that a number of available replicas for the replica group is not compliant with the healthy state definition; and automatically restoring the replica group such that the number of available replicas for the replica group is compliant with the healthy state definition for the replica group.
9. The method of claim 8, wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises sending, by the different replicas of the data, the individual metadata to the one or more compute nodes to store the individual metadata as part of the status metadata.
10. The method of claim 8, wherein one of the replicas in the replica group is a master replica for the replica group; wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises: requesting, by the master replica, the individual metadata for the different replicas of the replica group; and updating, by the master replica, the status metadata according to the individual metadata received from the different replicas.
11. The method of claim 8, wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises: requesting, by the replica group status sweeper, the individual metadata for the different replicas of the replica group; and updating, by the replica group status sweeper, the status metadata according to the individual metadata received from the different replicas.
12. The method of claim 8, wherein evaluating the replicas for the replica group, comprises: determining that the individual metadata for one or more replicas of the replica group has not been updated in the status metadata within a time threshold; sending a request for the individual metadata to the one or more replicas; and identifying the one or more replicas as unavailable upon a failure of the one or more replicas to respond to the request.
13. The method of claim 8, further comprising: prior to automatically restoring the replica group: attempting to obtain current metadata for one or more
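Claims 5 and 12 describe a staleness check: a replica whose metadata has not been refreshed within a time threshold is sent a request, and it is treated as unavailable only if it fails to respond. The following is a minimal sketch of that logic under stated assumptions. The function names, the dictionary of last-update timestamps, and the `probe` callback (which stands in for the request for individual metadata) are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List


def find_unavailable(last_update: Dict[str, float],
                     now: float,
                     threshold: float,
                     probe: Callable[[str], bool]) -> List[str]:
    """Return replicas that are stale in the status metadata AND fail a direct probe.

    last_update maps a replica id to the time its metadata was last updated.
    probe(replica_id) is an assumed callback that returns True if the replica
    responds to a request for its individual metadata.
    """
    unavailable = []
    for replica, updated_at in sorted(last_update.items()):
        if now - updated_at <= threshold:
            continue  # metadata is fresh enough; no request is sent
        if not probe(replica):
            unavailable.append(replica)  # stale and unresponsive
    return unavailable


def is_compliant(total_replicas: int,
                 unavailable: List[str],
                 required: int) -> bool:
    """Check the available replica count against the healthy state definition."""
    return total_replicas - len(unavailable) >= required
```

Probing before marking a replica unavailable keeps a merely delayed metadata update from triggering a restoration, which is in the same spirit as the confirmation step in claim 6.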
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
Event-based monitoring · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title