What technology area does this patent fall under?

Primary CPC classification G06F16/27. Mapped technology areas include Physics.

When was this patent published?

Publication date: May 15, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Dynamic replica failure detection and healing

US9971823B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9971823-B2
Application number	US-201615090547-A
Country	US
Kind code	B2
Filing date	Apr 4, 2016
Priority date	Jun 13, 2013
Publication date	May 15, 2018
Grant date	May 15, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status data, a number of available replicas for at least one replica group may be determined to be incompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
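The flow in the abstract (check each replica group against its healthy state definition, then order healing operations under resource constraints) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class names, the `required` and `cost` fields, the single numeric `budget`, and the greedy ordering rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicaGroup:
    group_id: str
    available: int  # replicas currently reported as available
    required: int   # assumed healthy state definition: minimum replica count


@dataclass
class HealingOp:
    group_id: str
    missing: int  # number of replicas to restore
    cost: int     # assumed resource requirement, in arbitrary units


def find_unhealthy(groups):
    """Return groups whose available replica count is below the required count."""
    return [g for g in groups if g.available < g.required]


def schedule(ops, budget):
    """Greedy ordering (an assumption): most-degraded groups first, cheaper ops
    breaking ties. Ops that exceed the remaining budget are deferred."""
    ordered = sorted(ops, key=lambda op: (-op.missing, op.cost))
    scheduled, deferred = [], []
    remaining = budget
    for op in ordered:
        if op.cost <= remaining:
            scheduled.append(op)
            remaining -= op.cost
        else:
            deferred.append(op)
    return scheduled, deferred


groups = [
    ReplicaGroup("a", available=3, required=3),
    ReplicaGroup("b", available=1, required=3),
    ReplicaGroup("c", available=2, required=3),
]
unhealthy = find_unhealthy(groups)
# Hypothetical per-group costs for the healing operations.
costs = {"b": 5, "c": 3}
ops = [HealingOp(g.group_id, g.required - g.available, costs[g.group_id])
       for g in unhealthy]
scheduled, deferred = schedule(ops, budget=6)
```

With a budget of 6 units, the operation for group "b" (two replicas missing) is scheduled first, and the operation for group "c" is deferred until resources free up. A real scheduler would likely track several resource types rather than one number.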

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a plurality of compute nodes, each comprising at least one processor and memory, wherein the plurality of compute nodes implement a data store; wherein the data store is configured to: maintain a plurality of replicas of data on behalf of a client of the data store at different ones of the compute nodes as a replica group for the data; obtain individual metadata for different replicas of the replica group to update status metadata stored for the replica group at one or more of the compute nodes remote from the different ones of the compute nodes that maintain the plurality of replicas; access, by a replica group status sweeper remote from the different ones of the compute nodes and remote from the one or more compute nodes that store the status metadata, the updated status metadata for the replica group at the one or more compute nodes to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group based, at least in part, on the updated status metadata, wherein the evaluation determines that a number of available replicas for the replica group is not compliant with the healthy state definition; and automatically restore the replica group such that the number of available replicas for the replica group is compliant with the healthy state definition for the replica group.

2. The system of claim 1, wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to send, by the different replicas of the data, the individual metadata to the one or more compute nodes to store the individual metadata as part of the status metadata.

3. The system of claim 1, wherein one of the replicas in the replica group is a master replica for the replica group; wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to: request, by the master replica, the individual metadata for the different replicas of the replica group; and update, by the master replica, the status metadata according to the individual metadata received from the different replicas.

4. The system of claim 1, wherein to obtain the individual metadata for different replicas of the replica group to update the status metadata, the data store is configured to: request, by the replica group status sweeper, the individual metadata for the different replicas of the replica group; and update, by the replica group status sweeper, the status metadata according to the individual metadata received from the different replicas.

5. The system of claim 1, wherein to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group, the data store is configured to: determine that the individual metadata for one or more replicas of the replica group has not been updated in the status metadata within a time threshold; send a request for the individual metadata to the one or more replicas; and identify the one or more replicas as unavailable upon a failure of the one or more replicas to respond to the request.

6. The system of claim 1, wherein the data store is further configured to: prior to the performance of the automatic restoration: attempt to obtain current metadata for one or more replicas of the replica group determined to be unavailable according to the evaluation of the status metadata; and based, at least in part, on the attempt to obtain the current metadata, confirm performance of the automatic restoration, wherein unconfirmed automatic restorations at the data store are not performed.

7. The system of claim 1, wherein the data store is a network-based storage service, wherein the replica group is one of a plurality of different replica groups for different respective data maintained at the network-based storage service among the compute nodes; wherein the network-based storage service is further configured to perform the obtainment of the individual metadata, the access of the status metadata, and the automatic restoration for one or more other replica groups in addition to the replica group; and wherein the performance of the automatic restoration for the replica group and the one or more other replica groups is ordered according to a dynamically determined schedule for performing the automatic restorations based, at least in part, on one or more resource constraints in the network-based storage service.

8. A method, comprising: performing, by a plurality of computing devices: maintaining a plurality of replicas of data on behalf of a client of a data store at different ones of a plurality of compute nodes as a replica group for the data; obtaining individual metadata for different replicas of the replica group to update status metadata stored for the replica group at one or more of the compute nodes remote from the different ones of the compute nodes; accessing, by a replica group status sweeper that is remote from the different ones of the compute nodes and remote from the one or more compute nodes that store the status metadata, the updated status metadata for the replica group at the one or more compute nodes to evaluate the replica group for compliance with a healthy state definition of a number of replicas for the replica group based, at least in part, on the updated status metadata, wherein the evaluation determines that a number of available replicas for the replica group is not compliant with the healthy state definition; and automatically restoring the replica group such that the number of available replicas for the replica group is compliant with the healthy state definition for the replica group.

9. The method of claim 8, wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises sending, by the different replicas of the data, the individual metadata to the one or more compute nodes to store the individual metadata as part of the status metadata.

10. The method of claim 8, wherein one of the replicas in the replica group is a master replica for the replica group; wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises: requesting, by the master replica, the individual metadata for the different replicas of the replica group; and updating, by the master replica, the status metadata according to the individual metadata received from the different replicas.

11. The method of claim 8, wherein the obtaining the individual metadata for different replicas of the replica group to update the status metadata, comprises: requesting, by the replica group status sweeper, the individual metadata for the different replicas of the replica group; and updating, by the replica group status sweeper, the status metadata according to the individual metadata received from the different replicas.

12. The method of claim 8, wherein evaluating the replicas for the replica group, comprises: determining that the individual metadata for one or more replicas of the replica group has not been updated in the status metadata within a time threshold; sending a request for the individual metadata to the one or more replicas; and identifying the one or more replicas as unavailable upon a failure of the one or more replicas to respond to the request.

13. The method of claim 8, further comprising: prior to automatically restoring the replica group: attempting to obtain current metadata for one or more
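Claims 5 and 12 describe a staleness check followed by a direct request to the replica, and claim 6 describes confirming unavailability before any restoration runs. A minimal Python sketch of that two-stage logic follows. The function names, the timestamp map, and the `ping` and `restore` callables are hypothetical stand-ins, not anything named in the patent.

```python
def find_unavailable(last_update, now, threshold_s, ping):
    """Staleness check in the spirit of claims 5 and 12.

    last_update: dict mapping replica id -> timestamp of its last metadata update
    ping: hypothetical callable(replica_id) -> bool, True if the replica responds
    """
    # Step 1: replicas whose metadata has not been updated within the threshold.
    stale = [r for r, ts in last_update.items() if now - ts > threshold_s]
    # Step 2: request metadata directly; a replica that fails to respond
    # is identified as unavailable.
    return [r for r in stale if not ping(r)]


def confirm_and_restore(unavailable, ping, restore):
    """Confirmation step in the spirit of claim 6: re-check each replica
    immediately before healing and skip restorations that are not confirmed."""
    confirmed = [r for r in unavailable if not ping(r)]
    if confirmed:
        restore(confirmed)  # hypothetical healing callback
    return confirmed


# Example with made-up timestamps (seconds) and a fake ping.
last_update = {"r1": 100, "r2": 40, "r3": 30}
responding = {"r1", "r2"}
ping = lambda r: r in responding

unavailable = find_unavailable(last_update, now=105, threshold_s=60, ping=ping)

restored = []
confirmed = confirm_and_restore(unavailable, ping, restored.extend)

# If the replica recovers before the confirmation step, nothing is restored.
skipped = confirm_and_restore(unavailable, lambda r: True, restored.extend)
```

In this example "r2" is stale but still responds, so only "r3" is treated as unavailable and restored. The second call shows the confirmation step suppressing a restoration once the replica answers again.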

Assignees

Amazon Tech Inc

Classifications

G06F16/27 (Primary)
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
G06F11/0709
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
G06F11/0793
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
G06F2201/86
Event-based monitoring · CPC title
G06F9/4881
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

View patent family 55588977

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9971823B2 cover?: Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status data, a number of available replicas for at least one replica group may be determined to be incompliant with a healthy state definition for the replica group. One or more healing opera…
Who is the assignee on this patent?: Amazon Tech Inc