Job dispatching with scheduler record updates containing characteristics combinations of job characteristics
US-9015724-B2 · Apr 21, 2015 · US
US9304815B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9304815-B1 |
| Application number | US-201313917317-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 13, 2013 |
| Priority date | Jun 13, 2013 |
| Publication date | Apr 5, 2016 |
| Grant date | Apr 5, 2016 |
Official abstract text for this publication.
Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status data, a number of available replicas for at least one replica group may be determined to be noncompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
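The detection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the healthy state definition is a minimum replica count and that status metadata is a list of per-group records; the names `ReplicaGroupStatus` and `find_unhealthy_groups` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class ReplicaGroupStatus:
    """One row of status metadata (hypothetical layout)."""
    group_id: str
    available_replicas: int  # replicas currently reported as available
    required_replicas: int   # assumed stand-in for the healthy state definition


def find_unhealthy_groups(status_table):
    """Return (group_id, missing_count) for each group below its required count."""
    unhealthy = []
    for status in status_table:
        missing = status.required_replicas - status.available_replicas
        if missing > 0:
            unhealthy.append((status.group_id, missing))
    return unhealthy


status_table = [
    ReplicaGroupStatus("g1", available_replicas=3, required_replicas=3),
    ReplicaGroupStatus("g2", available_replicas=1, required_replicas=3),
    ReplicaGroupStatus("g3", available_replicas=2, required_replicas=3),
]

print(find_unhealthy_groups(status_table))  # [('g2', 2), ('g3', 1)]
```

A real healthy state definition could be richer than a single count (for example, placement rules), so this sketch covers only the simplest reading of the abstract.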
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a plurality of computing nodes, each comprising at least one processor and memory, wherein the plurality of computing nodes are configured to implement a data storage service, wherein the data storage service comprises: one or more replica groups stored among the plurality of computing nodes, wherein each of the one or more replica groups maintains one or more replicas of data on behalf of one or more storage service clients, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the replica group; a replica group status sweeper, configured to identify replica groups with a number of available replicas not compliant with the respective healthy state definition for the respective replica group, wherein said identification is based, at least in part, on status metadata for the respective replica group; and a dynamic heal scheduler, configured to schedule one or more replica healing operations to restore the number of available replicas for the identified replica groups to the respective healthy state definition for the identified replica groups based, at least in part, on one or more resource constraints for performing healing operations, wherein to schedule the one or more replica healing operations, the dynamic heal scheduler is further configured to determine an order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on one or more resource requirements for each of the one or more replica healing operations.

2. The system of claim 1, wherein the replica group status sweeper is further configured to: update the status metadata within a table storing availability information for replicas of the one or more replica groups, wherein the table is stored on one or more of the plurality of computing nodes within the data storage service.

3. A method, comprising: performing, by a plurality of computing devices: accessing status metadata for one or more replica groups, wherein each of the one or more replica groups maintains one or more replicas of data, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the replica group; determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is not compliant with the respective healthy state definition for the at least one replica group; and dynamically scheduling one or more replica healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition for the at least one replica group based, at least in part, on one or more resource constraints for performing healing operations, wherein the scheduling determines an order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on one or more resource requirements for each of the one or more replica healing operations.

4. The method of claim 3, wherein each replica of a replica group of the one or more replica groups is stored on different ones of a plurality of compute nodes, and wherein the method further comprises: receiving from one or more of the different ones of the plurality of compute nodes status information for a compute node; and in response to receiving the status information for the compute node, updating the status metadata to reflect the received status information.

5. The method of claim 4, wherein the one or more of the different ones of the plurality of compute nodes include a master node of the different ones of the plurality of compute nodes, and wherein the status information is received periodically or aperiodically.

6. The method of claim 3, wherein each replica of a replica group of the one or more replica groups is stored on different ones of a plurality of compute nodes, and wherein said determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is below a specified number of replicas for the at least one replica group, comprises: analyzing the status metadata to identify one or more compute nodes for status confirmation; and requesting status information from the identified one or more compute nodes storing a given replica to confirm the status of the identified one or more compute nodes.

7. The method of claim 3, wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations according to replica access frequency of the one or more replica groups.

8. The method of claim 3, further comprising: determining the one or more resource requirements of each of the one or more replica healing operations based on identifying a heal source and heal destination for the one or more replica healing operations; wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations based, at least in part, on the heal source and the heal destination for the one or more replica healing operations such that the ordering of the one or more replica healing operations does not result in a conflict between a plurality of queued healing operations.

9. The method of claim 3, wherein the one or more resource constraints for performing healing operations comprise expected network traffic directed toward the one or more replica groups; wherein the order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on the one or more resource requirements for each of the one or more replica healing operations, comprises ordering the one or more replica healing operations according to the expected network traffic directed toward the one or more replica groups.

10. A non-transitory, computer-readable storage medium, storing program instructions that when executed by a plurality of computing devices implement a data storage service that implements: accessing status metadata for one or more replica groups, wherein each of the one or more replica groups maintains one or more replicas of data stored among a plurality of compute nodes implemented by the plurality of computing devices on behalf of one or more storage service clients, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the respective replica group; determining, based at least in part on the status metadata, that a number of available replicas for at least one replica group of the one or more replica groups is not compliant with the respective healthy state definition for the at least one replica group; and dynamically scheduling one or more replica healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition for the at least one replica group based, at least in part, on one or more resource constraints for performing healing operations, wherein the scheduling determines an order in which
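As a rough illustration of the ordering idea in claims 3, 7, and 8, the sketch below packs healing operations into rounds. Everything specific here is an assumption made for the example, not the patented method: the resource constraint is modeled as a single bandwidth limit per round, the resource requirement as a per-operation bandwidth figure, a "conflict" as two operations sharing a source or destination node in the same round, and priority as access frequency. The names `HealOp` and `schedule_heals` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealOp:
    group_id: str
    source: str            # node the replica is copied from
    destination: str       # node the new replica is written to
    bandwidth: int         # assumed resource requirement of this operation
    access_frequency: int  # assumed priority signal (higher heals first)


def schedule_heals(ops, bandwidth_limit):
    """Greedily pack operations into rounds.

    Within a round, total bandwidth stays within bandwidth_limit and no
    node serves as source or destination for more than one operation.
    """
    pending = sorted(ops, key=lambda op: op.access_frequency, reverse=True)
    rounds = []
    while pending:
        used_bandwidth = 0
        busy_nodes = set()
        current, deferred = [], []
        for op in pending:
            fits = used_bandwidth + op.bandwidth <= bandwidth_limit
            conflict = op.source in busy_nodes or op.destination in busy_nodes
            if fits and not conflict:
                current.append(op)
                used_bandwidth += op.bandwidth
                busy_nodes.update((op.source, op.destination))
            else:
                deferred.append(op)
        if not current:
            raise ValueError("no pending operation fits within the bandwidth limit")
        rounds.append(current)
        pending = deferred
    return rounds


ops = [
    HealOp("g1", "n1", "n4", bandwidth=40, access_frequency=10),
    HealOp("g2", "n1", "n5", bandwidth=30, access_frequency=50),
    HealOp("g3", "n2", "n6", bandwidth=50, access_frequency=20),
    HealOp("g4", "n3", "n7", bandwidth=40, access_frequency=5),
]

rounds = schedule_heals(ops, bandwidth_limit=100)
print([[op.group_id for op in r] for r in rounds])  # [['g2', 'g3'], ['g1', 'g4']]
```

In this example `g1` is deferred to the second round both because it would push the first round past the limit and because it shares source node `n1` with `g2`. A greedy pass like this is simple but not optimal; it is shown only to make the constraint and conflict checks concrete.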
Identification (G06F11/2289 takes precedence) · CPC title
in relation to availability · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title