Dynamic replica failure detection and healing
US-9304815-B1 · Apr 5, 2016 · US
US2016335166A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016335166-A1 |
| Application number | US-201514712762-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 14, 2015 |
| Priority date | May 14, 2015 |
| Publication date | Nov 17, 2016 |
| Grant date | — |
Abstract
Embodiments include obtaining at least one system metric of a distributed storage system, generating one or more recovery parameters based on the at least one system metric, identifying at least one policy associated with data stored in a storage node of a plurality of storage nodes in the distributed storage system, and generating a recovery plan for the data based on the one or more recovery parameters and the at least one policy. In more specific embodiments, the recovery plan includes a recovery order for recovering the data. Further embodiments include initiating a recovery process to copy replicas of the data from a second storage node to a new storage node, wherein the replicas of the data are copied according to the recovery order indicated in the recovery plan.
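The abstract describes a pipeline with three steps. First, recovery parameters are derived from system metrics. Second, those parameters are combined with data policies into a recovery plan. Third, replicas are copied in the plan's order. The Python sketch below is one possible reading under stated assumptions. The class and function names (`SystemMetrics`, `derive_recovery_parameters`, `build_recovery_plan`, `run_recovery`) are hypothetical. The numeric rules (80% CPU, 1000 client operations, 500 IOPS per parallel copy, half of the spare bandwidth) are illustrative values and do not come from the publication.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SystemMetrics:
    # Hypothetical fields drawn from the metric types listed in claim 2.
    cpu_utilization: float        # fraction of CPU in use, 0.0 to 1.0
    available_network_mbps: float
    remaining_disk_iops: int
    ongoing_client_ops: int


@dataclass
class RecoveryParameters:
    max_recovery_mbps: float      # bandwidth budget for recovery traffic
    max_parallel_copies: int      # replica copies allowed in flight at once


@dataclass
class StoredData:
    data_id: str
    policy_priority: int          # higher value = higher-priority policy (assumption)


def derive_recovery_parameters(m: SystemMetrics) -> RecoveryParameters:
    # Illustrative rule only: give recovery half of the spare bandwidth and
    # throttle parallelism to 1 while the node is busy serving clients.
    busy = m.cpu_utilization > 0.8 or m.ongoing_client_ops > 1000
    parallel = 1 if busy else max(1, min(8, m.remaining_disk_iops // 500))
    return RecoveryParameters(m.available_network_mbps * 0.5, parallel)


def build_recovery_plan(data: List[StoredData], params: RecoveryParameters) -> Dict:
    # The plan pairs the parameters with a recovery order derived from policy:
    # higher-priority data is listed first.
    order = [d.data_id for d in sorted(data, key=lambda d: -d.policy_priority)]
    return {"parameters": params, "order": order}


def run_recovery(plan: Dict, copy_replica: Callable[[str], None]) -> None:
    # Copy replicas to the new storage node strictly in plan order. This sketch
    # does not enforce the bandwidth or parallelism limits in plan["parameters"].
    for data_id in plan["order"]:
        copy_replica(data_id)
```

Under moderate load (40% CPU, 1000 Mb/s of spare bandwidth, 2000 IOPS remaining, 10 client operations), this rule yields a 500 Mb/s recovery budget and 4 parallel copies. Raising CPU use above 80% drops parallelism to 1. A real system would enforce these limits in its copy scheduler, which the sketch leaves out.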
Claims
What is claimed is:

1. A method comprising: obtaining at least one system metric of a distributed storage system; generating one or more recovery parameters based on the at least one system metric; identifying at least one policy associated with data stored in a storage node of a plurality of storage nodes in the distributed storage system; and generating a recovery plan for the data based on the one or more recovery parameters and the at least one policy.

2. The method of claim 1, wherein the at least one system metric includes information related to at least one of on-going client operations, current central processing unit (CPU) utilization, disk usage, available network bandwidth, remaining disk input/output operations per second (IOPS), and remaining disk bandwidth.

3. The method of claim 1, wherein the at least one system metric is pushed, in real-time, to a recovery system from at least one storage node of the plurality of storage nodes in the distributed storage system.

4. The method of claim 1, further comprising: monitoring the plurality of storage nodes in the distributed storage system for an indication of failure, wherein the recovery plan is generated for the data after a failure of the storage node is detected.

5. The method of claim 1, further comprising: monitoring the plurality of storage nodes in the distributed storage system for an indication of impending failure, wherein the recovery plan is generated for the data before a failure of the storage node is detected.

6. The method of claim 1, wherein the recovery plan includes a recovery order for recovering the data.

7. The method of claim 6, further comprising: initiating a recovery process to copy replicas of the data from one or more other storage nodes to a new storage node, wherein the replicas are copied according to the recovery order indicated in the recovery plan.

8. The method of claim 1, wherein a first subset of the data associated with a first policy are recovered before a second subset of the data associated with a second policy, wherein the first policy indicates a higher priority than the second policy.

9. The method of claim 1, wherein the recovery plan is to recover a first subset of the data before a second subset of the data if the first subset of the data is associated with a smaller replication factor than the second subset of the data.

10. At least one machine readable storage medium comprising instructions stored therein, and when executed by at least one processor the instructions cause the at least one processor to: obtain at least one system metric of a distributed storage system; generate one or more recovery parameters based on the at least one system metric; identify at least one policy associated with data stored in a storage node of a plurality of storage nodes in the distributed storage system; and generate a recovery plan for the data based on the one or more recovery parameters and the at least one policy.

11. The at least one machine readable storage medium of claim 10, wherein the instructions when executed by the at least one processor cause the at least one processor to: monitor the plurality of storage nodes in the distributed storage system for an indication of failure, wherein the recovery plan is generated for the data after a failure of the storage node is detected.

12. The at least one machine readable storage medium of claim 10, wherein the instructions when executed by the at least one processor cause the at least one processor to: monitor the plurality of storage nodes in the distributed storage system for an indication of impending failure, wherein the recovery plan is generated for the data after an impending failure of the storage node is detected and before a failure of the storage node is detected.

13. The at least one machine readable storage medium of claim 10, wherein the recovery plan includes a recovery order for recovering the data.

14. The at least one machine readable storage medium of claim 13, wherein the instructions when executed by the at least one processor cause the at least one processor to: initiate a recovery process to copy replicas of the data from one or more other storage nodes to a new storage node, wherein the replicas are to be copied according to the recovery order indicated in the recovery plan.

15. The at least one machine readable storage medium of claim 10, wherein a first subset of the data associated with a first policy of a first tenant are recovered before a second subset of the data associated with a second policy of a second tenant, wherein the first policy indicates a higher priority than the second policy.

16. The at least one machine readable storage medium of claim 10, wherein the recovery plan prioritizes recovery of a first subset of the data if the first subset is not replicated in at least a threshold number of other storage nodes that are active.

17. An apparatus comprising: at least one processor; and at least one memory element comprising instructions that when executed by the at least one processor cause the apparatus to: obtain at least one system metric of a distributed storage system; generate one or more recovery parameters based on the at least one system metric; identify at least one policy associated with data stored in a storage node of a plurality of storage nodes in the distributed storage system; and generate a recovery plan for the data based on the one or more recovery parameters and the at least one policy.

18. The apparatus of claim 17, wherein the at least one system metric is pushed, in real-time, to a recovery system from at least one storage node of the plurality of storage nodes in the distributed storage system.

19. The apparatus of claim 17, wherein the recovery plan includes a recovery order for recovering the data.

20. The apparatus of claim 19, wherein the instructions when executed by the at least one processor cause the apparatus to: initiate a recovery process to copy replicas of the data from a second storage node to a new storage node, wherein the replicas of the data are to be copied according to the recovery order indicated in the recovery plan.
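Claims 8, 9, 15 and 16 each name one criterion for ordering recovery:
- data under a higher-priority policy first;
- data with a smaller replication factor first;
- data not replicated on at least a threshold number of active nodes first.

The claims do not state how these criteria combine when they conflict. The Python sketch below assumes a lexicographic order: under-replicated data first, then policy priority, then replication factor. `DataItem`, `recovery_order`, and the `threshold` parameter are hypothetical names, not taken from the publication.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataItem:
    item_id: str
    policy_priority: int      # higher value = higher-priority policy (assumption)
    replication_factor: int   # number of replicas the data's policy calls for
    active_replicas: int      # replicas remaining on active nodes after the failure


def recovery_order(items: List[DataItem], threshold: int) -> List[str]:
    """Return item ids in the order their replicas should be re-created.

    Assumed lexicographic sort key (the claims do not fix a combination rule):
      1. items with fewer than `threshold` active replicas first (claim 16),
      2. then higher policy priority (claims 8 and 15),
      3. then smaller replication factor (claim 9).
    """
    def sort_key(item: DataItem):
        under_replicated = item.active_replicas < threshold
        # False sorts before True, so under-replicated items come first.
        return (not under_replicated, -item.policy_priority, item.replication_factor)

    return [item.item_id for item in sorted(items, key=sort_key)]
```

Under this key with `threshold=2`, an item left with one active replica is recovered before a higher-priority item that still has two. Among items with equal priority, the one with the smaller replication factor goes first. A different deployment might weight the criteria differently, for example by giving tenant policy priority precedence over replica count.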
Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title
Migration mechanisms · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title
in relation to data integrity, e.g. data losses, bit errors · CPC title