What technology area does this patent fall under?

Primary CPC classification G06F9/4881. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 5, 2016 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Dynamic replica failure detection and healing

US9304815B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9304815-B1
Application number	US-201313917317-A
Country	US
Kind code	B1
Filing date	Jun 13, 2013
Priority date	Jun 13, 2013
Publication date	Apr 5, 2016
Grant date	Apr 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, the status data a number of available replicas for at least one replica group may be determined to incompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
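
To illustrate the detection step the abstract describes, here is a minimal Python sketch. It is not the patented implementation. The names `ReplicaGroup`, `min_available`, and `find_unhealthy_groups` are hypothetical, and the sketch assumes that a healthy state definition is simply a minimum count of available replicas, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """Hypothetical status metadata for one replica group."""
    group_id: str
    # Assumed healthy state definition: minimum number of available replicas.
    min_available: int
    # Maps replica id -> True if that replica is currently available.
    replica_status: dict = field(default_factory=dict)

    def available_count(self) -> int:
        return sum(1 for up in self.replica_status.values() if up)


def find_unhealthy_groups(groups):
    """Return (group_id, missing_replicas) for each group below its definition."""
    unhealthy = []
    for g in groups:
        missing = g.min_available - g.available_count()
        if missing > 0:
            unhealthy.append((g.group_id, missing))
    return unhealthy


groups = [
    ReplicaGroup("g1", 3, {"a": True, "b": True, "c": True}),
    ReplicaGroup("g2", 3, {"a": True, "b": False, "c": False}),
]
print(find_unhealthy_groups(groups))  # [('g2', 2)]
```

Each entry returned would correspond to one or more healing operations that still need to be scheduled.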

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a plurality of computing nodes, each comprising at least one processor and memory, wherein the plurality of computing nodes are configured to implement a data storage service, wherein the data storage service comprises: one or more replica groups stored among the plurality of computing nodes, wherein each of the one or more replica groups maintains one or more replicas of data on behalf of one or more storage service clients, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the replica group; a replica group status sweeper, configured to identify replica groups with a number of available replicas not compliant with the respective healthy state definition for the respective replica group, wherein said identification is based, at least in part, on status metadata for the respective replica group; and a dynamic heal scheduler, configured to schedule one or more replica healing operations to restore the number of available replicas for the identified replica groups to the respective healthy state definition for the identified replica groups based, at least in part, on one or more resource constraints for performing healing operations, wherein to schedule the one or more replica healing operations, the dynamic heal scheduler is further configured to determine an order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on one or more resource requirements for each of the one or more replica healing operations.

2. The system of claim 1, wherein the replica group status sweeper is further configured to: update the status metadata within a table storing availability information for replicas of the one or more replica groups, wherein the table is stored on one or more of the plurality of computing nodes within the data storage service.

3. A method, comprising: performing, by a plurality of computing devices: accessing status metadata for one or more replica groups, wherein each of the one or more replica groups maintains one or more replicas of data, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the replica group; determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is not compliant with the respective healthy state definition for the at least one replica group; and dynamically scheduling one or more replica healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition for the at least one replica group based, at least in part, on one or more resource constraints for performing healing operations, wherein the scheduling determines an order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on one or more resource requirements for each of the one or more replica healing operations.

4. The method of claim 3, wherein each replica of a replica group of the one or more replica groups is stored on different ones of a plurality of compute nodes, and wherein the method further comprises: receiving from one or more of the different ones of the plurality of compute nodes status information for a compute node; and in response to receiving the status information for the compute node, updating the status metadata to reflect the received status information.

5. The method of claim 4, wherein the one or more of the different ones of the plurality of compute nodes include a master node of the different ones of the plurality of compute nodes, and wherein the status information is received periodically or aperiodically.

6. The method of claim 3, wherein each replica of a replica group of the one or more replica groups is stored on different ones of a plurality of compute nodes, and wherein said determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is below a specified number of replicas for the at least one replica group, comprises: analyzing the status metadata to identify one or more compute nodes for status confirmation; and requesting status information from the identified one or more compute nodes storing a given replica to confirm the status of the identified one or more compute nodes.

7. The method of claim 3, wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations according to replica access frequency of the one or more replica groups.

8. The method of claim 3, further comprising: determining the one or more resource requirements of each of the one or more replica healing operations based on identifying a heal source and heal destination for the one or more replica healing operations; wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations based, at least in part, on the heal source and the heal destination for the one or more replica healing operations such that the ordering of the one or more replica healing operations does not result in a conflict between a plurality of queued healing operations.

9. The method of claim 3, wherein the one or more resource constraints for performing healing operations comprise expected network traffic directed toward the one or more replica groups; wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations according to the expected network traffic directed toward the one or more replica groups.

10. A non-transitory, computer-readable storage medium, storing program instructions that when executed by a plurality of computing devices implement a data storage service that implements: accessing status metadata for one or more replica groups, wherein each of the one or more replica groups maintains one or more replicas of data stored among a plurality of compute nodes implemented by the plurality of computing devices on behalf of one or more storage service clients, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the respective replica group; determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is not compliant with the respective healthy state definition for the at least one replica group; and dynamically scheduling one or more replica healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition for the at least one replica group based, at least in part, on one or more resource constraints for performing healing operations, wherein the scheduling determines an order in which
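
The ordering idea in the claims (scheduling heals without exceeding resource constraints, prioritizing by access frequency, and avoiding conflicts between heal sources and destinations) can be sketched in Python as follows. This is an illustrative greedy approach, not the method the patent discloses. `HealOp`, `schedule_heals`, the single `bandwidth_limit` constraint, and the rule that a node takes part in at most one operation per round are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealOp:
    """Hypothetical healing operation: copy a replica from source to destination."""
    group_id: str
    source: str               # node holding a healthy replica
    destination: str          # node that will receive the new replica
    bandwidth: int            # assumed resource requirement, arbitrary units
    access_frequency: float   # assumed priority signal for the replica group


def schedule_heals(ops, bandwidth_limit):
    """Greedily pack operations into rounds, most frequently accessed groups first.

    Within one round, total bandwidth stays within bandwidth_limit and no
    node is used by more than one operation.
    """
    pending = sorted(ops, key=lambda op: op.access_frequency, reverse=True)
    rounds = []
    while pending:
        used_bandwidth = 0
        busy_nodes = set()
        current, deferred = [], []
        for op in pending:
            fits = used_bandwidth + op.bandwidth <= bandwidth_limit
            conflict = op.source in busy_nodes or op.destination in busy_nodes
            if fits and not conflict:
                current.append(op)
                used_bandwidth += op.bandwidth
                busy_nodes.update((op.source, op.destination))
            else:
                deferred.append(op)
        if not current:
            raise ValueError("an operation exceeds the bandwidth limit on its own")
        rounds.append(current)
        pending = deferred
    return rounds


ops = [
    HealOp("g1", "n1", "n4", 40, 5.0),
    HealOp("g2", "n1", "n5", 30, 9.0),
    HealOp("g3", "n2", "n6", 50, 1.0),
]
plan = schedule_heals(ops, bandwidth_limit=100)
print([[op.group_id for op in r] for r in plan])  # [['g2', 'g3'], ['g1']]
```

In this example g1 is deferred to the second round because it shares source node n1 with the higher-priority g2, while g3 still fits in the first round's remaining bandwidth.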

Assignees

Amazon Tech Inc

Inventors

Classifications

G06F11/006
Identification (G06F11/2289 takes precedence) · CPC title
G06F3/0617
in relation to availability · CPC title
G06F11/0709
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
G06F11/0793
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
G06F9/4881 · Primary
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

View patent family 55588977

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9304815B1 cover?: Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, the status data a number of available replicas for at least one replica group may be determined to incompliant with a healthy state definition for the replica group. One or more healing opera…
Who is the assignee on this patent?: Amazon Tech Inc