Methods to identify, handle and recover from suspect SSDs in a clustered flash array
US-2017269980-A1 · Sep 21, 2017 · US
US9830107B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9830107-B2 |
| Application number | US-201615142923-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 29, 2016 |
| Priority date | Apr 29, 2016 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
Official abstract text for this publication.
A system and method for optimizing the estimation and management of wear and replacement for an array of storage devices in a storage system is disclosed. An input/output workload is monitored over part of a service period for the array. An expected wear rate is determined, based on the workload and an endurance of the storage devices. A target wear rate is calculated for the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period. In response to determining that the expected wear rate exceeds the target wear rate calculated for at least one of the service period and the contingency period(s), an adjusted wear rate is calculated for the array of storage devices to match the target wear rate. A replacement schedule is generated for the array based on the adjusted wear rate.
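The steps in the abstract can be read as a small calculation pipeline. The following Python sketch is illustrative only: the abstract gives no formulas, so every name here (`Period`, `risk_weight`, `target_wear_rates`, the batch-based schedule) and every formula is an assumption chosen to make the flow concrete, not the patented method. It treats wear rate as the fraction of rated endurance consumed per year, and treats risk tolerance as a weight on how much of each period counts toward the planning horizon.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Period:
    name: str
    years: float
    # Hypothetical risk weight: 1.0 counts the whole period toward the
    # planning horizon, 0.5 counts half of it.
    risk_weight: float


def expected_wear_rate(host_tb_per_year, gc_tb_per_year, endurance_tb):
    """Fraction of rated endurance consumed per year at the monitored workload."""
    return (host_tb_per_year + gc_tb_per_year) / endurance_tb


def target_wear_rates(expected, elapsed_years, periods):
    """One target per cumulative horizon: service period, then each contingency.

    Assumption: the target is the rate that would use up the remaining
    endurance exactly at the end of the risk-weighted horizon.
    """
    remaining_life = 1.0 - expected * elapsed_years
    horizons = accumulate(p.years * p.risk_weight for p in periods)
    return [remaining_life / h for h in horizons]


def adjusted_wear_rate(expected, targets):
    """If the expected rate exceeds any target, fall back to the strictest one."""
    return min([expected, *targets])


def replacement_schedule(device_count, remaining_life, wear_rate, batches):
    """Spread replacements over evenly spaced batches ending at projected wear-out.

    Returns a list of (years_from_now, devices_to_replace) tuples.
    The batching policy is a hypothetical choice for illustration.
    """
    wearout_years = remaining_life / wear_rate
    per_batch, extra = divmod(device_count, batches)
    schedule = []
    for i in range(1, batches + 1):
        count = per_batch + (1 if i <= extra else 0)
        schedule.append((wearout_years * i / batches, count))
    return schedule


# Example inputs (all invented): one year monitored, four years of service
# left, plus a two-year contingency period counted at half weight.
periods = [Period("remaining service", 4.0, 1.0), Period("contingency", 2.0, 0.5)]
expected = expected_wear_rate(200.0, 50.0, 1000.0)
targets = target_wear_rates(expected, 1.0, periods)
adjusted = adjusted_wear_rate(expected, targets)
schedule = replacement_schedule(24, 1.0 - expected * 1.0, adjusted, batches=4)
```

With these invented inputs the expected rate is above both targets, so the adjusted rate falls to the stricter target and the schedule stretches replacements out to the end of the weighted horizon.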
Opening claim text (preview).
What is claimed is:

1. A method, comprising: monitoring a workload of input/output (I/O) operations over an initial portion of a service period for an array of storage devices within a storage system; determining an expected wear rate for the array of storage devices for a remaining portion of the service period, based on the monitored workload and an endurance of storage devices in the array; calculating a target wear rate for the array of storage devices for the remaining portion of the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period, the one or more contingency periods extending the expected wear rate of the array of storage devices past the service period to a terminal state at an end of the array's remaining service life; in response to determining that the expected wear rate exceeds the target wear rate calculated for at least one of the remaining portion of the service period and the one or more contingency periods, calculating an adjusted wear rate for the array of storage devices to match the target wear rate in accordance with the determination; and generating a replacement schedule for storage devices in the array over the remaining service life, based on the adjusted wear rate.

2. The method of claim 1, wherein the expected wear rate corresponds to an expected replacement rate for the array of storage devices within the array, and the adjusted wear rate corresponds to a target replacement rate for the storage devices.

3. The method of claim 1, wherein the I/O operations include at least one of: host-initiated I/O operations, garbage collection I/O operations, or reconstruction IO operations.

4. The method of claim 3, wherein the monitoring comprises: measuring a total data transfer rate for the array of storage devices based on the host-initiated I/O operations and the garbage collection I/O operations.

5. The method of claim 1, wherein a total duration of the remaining portion of the service period and the one or more contingency periods represents a total remaining service life of the array of storage devices.

6. The method of claim 5, wherein each of the one or more contingency periods represents a level of risk for a different type of contingency that affects the total remaining service life of the array of storage devices.

7. The method of claim 5, wherein a duration of one of the contingency periods is based on a remaining portion of the service period and a specified weighting for risk tolerance.

8. The method of claim 5, wherein a duration of one of the contingency periods is a fixed period of time representing a normal distribution of wear that extends beyond a point following the service period.

9. The method of claim 5, wherein one of the contingency periods accounts for unexpected changes to the workload of the array of storage devices, and a duration of the one of the contingency periods is determined based on the workload monitored over the initial portion of the service period and a maximum workload capacity associated with the remaining portion of the service period.

10. The method of claim 9, wherein the maximum workload capacity is based on at least one of a maximum performance limit of the storage devices in the array or a service-level agreement between a storage system operator and one or more hosts for which data services are provided by the storage system operator via the network storage system.

11. A non-transitory machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one machine, causes the machine to: monitor a workload of input/output (I/O) operations over an initial portion of a service period for an array of storage devices within a storage system; determine an expected wear rate for the array of storage devices for a remaining portion of the service period, based on the monitored workload and an endurance of storage devices in the array; calculate a target wear rate for the array of storage devices for the remaining portion of the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period, the one or more contingency periods extending the expected wear rate of the array of storage devices past the service period to a terminal state at an end of the array's remaining service life; determine that the expected wear rate exceeds the target wear rate calculated for at least one of the remaining portion of the service period and the one or more contingency periods; calculate an adjusted wear rate for the array of storage devices to match the target wear rate in accordance with the determination; and generate a replacement schedule for storage devices in the array over the remaining service life, based on the adjusted wear rate.

12. The non-transitory machine readable medium of claim 11, wherein the expected wear rate corresponds to an expected replacement rate for the array of storage devices within the array, and the adjusted wear rate corresponds to a target replacement rate for the storage devices.

13. The non-transitory machine readable medium of claim 11, wherein the I/O operations include at least one of host-initiated I/O operations, garbage collection I/O operations, or reconstruction I/O operations.

14. The non-transitory machine readable medium of claim 13, further comprising machine executable code that causes the machine to: measure a total data transfer rate for the array of storage devices based on the host-initiated I/O operations and the garbage collection I/O operations.

15. The non-transitory machine readable medium of claim 11, wherein a total duration of the remaining portion of the service period and the one or more contingency periods represents a total remaining service life of the array of storage devices.

16. The non-transitory machine readable medium of claim 15, wherein each of the one or more contingency periods represents a level of risk for a different type of contingency that affects the total remaining service life of the array of storage devices.

17. The non-transitory machine readable medium of claim 15, wherein a duration of one of the contingency periods is based on a remaining portion of the service period and a specified weighting for risk tolerance.

18. The non-transitory machine readable medium of claim 15, wherein the duration of one of the contingency periods is a fixed period of time representing a normal distribution of wear that extends beyond a point following the service period.

19. The non-transitory machine readable medium of claim 15, wherein one of the contingency periods accounts for unexpected changes to the workload of the array of storage devices, and a duration of the one of the contingency periods is determined based on the workload monitored over the initial portion of the service period and a maximum workload capacity associated with the remaining portion of the service period.

20. A computing device comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method of estimating and managing wear and replacement for storage devices in a storage system; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: monitor a workload of input/output (I/O) operations over an initial portion of a ser
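Claims 7 to 9 name three different bases for the length of a contingency period without giving formulas. The Python sketch below shows one way each basis could be turned into a number. The function names, the multiplication by a weight, the default fixed length, and the headroom ratio are all assumptions made for illustration and are not taken from the patent.

```python
def weighted_contingency(remaining_service_years, risk_weight):
    """Claim 7 basis: remaining service period and a risk-tolerance weighting.

    Assumption: the two are simply multiplied.
    """
    return remaining_service_years * risk_weight


def fixed_contingency(years=1.0):
    """Claim 8 basis: a fixed period of time.

    Assumption: the default of one year is arbitrary.
    """
    return years


def workload_contingency(remaining_service_years, monitored_tb_per_year,
                         max_tb_per_year):
    """Claim 9 basis: monitored workload and maximum workload capacity.

    Assumption: the period scales with the unused headroom, so an array
    already running at its maximum gets no extra allowance for growth.
    """
    headroom = max(0.0, 1.0 - monitored_tb_per_year / max_tb_per_year)
    return remaining_service_years * headroom


# Example inputs (invented): four years of service left, workload at a
# quarter of the maximum capacity.
durations = {
    "weighted": weighted_contingency(4.0, 0.5),
    "fixed": fixed_contingency(),
    "workload": workload_contingency(4.0, 250.0, 1000.0),
}
```

Under claim 5, whichever durations are used would be added to the remaining service period to give the total remaining service life.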
in relation to life time, e.g. increasing Mean Time Between Failures [MTBF] · CPC title
by initialisation or re-initialisation of storage systems · CPC title
by allocating resources to storage systems · CPC title
Disk arrays, e.g. RAID, JBOD · CPC title
Non-volatile semiconductor memory arrays · CPC title