Drive wear-out prediction based on workload and risk tolerance

US9830107B2 · US · B2

Patent metadata
Publication number: US-9830107-B2
Application number: US-201615142923-A
Country: US
Kind code: B2
Filing date: Apr 29, 2016
Priority date: Apr 29, 2016
Publication date: Nov 28, 2017
Grant date: Nov 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for optimizing the estimation and management of wear and replacement for an array of storage devices in a storage system is disclosed. An input/output workload is monitored over part of a service period for the array. An expected wear rate is determined, based on the workload and an endurance of the storage devices. A target wear rate is calculated for the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period. In response to determining that the expected wear rate exceeds the target wear rate calculated for at least one of the service period and the contingency period(s), an adjusted wear rate is calculated for the array of storage devices to match the target wear rate. A replacement schedule is generated for the array based on the adjusted wear rate.
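The abstract names five steps (monitor workload, expected wear rate, target wear rate per period, adjusted wear rate, replacement schedule) but this page gives no formulas for them. The following is a minimal sketch of how those steps could fit together, not the patented method. Every name and formula here is an assumption made for illustration: wear is modeled as the fraction of the array's total rated endurance consumed per year, and each period's risk tolerance is modeled as the maximum fraction of endurance that may be consumed by the end of that period.

```python
import math
from dataclasses import dataclass


@dataclass
class Period:
    name: str
    years: float           # duration of this period
    risk_tolerance: float  # ASSUMED meaning: max fraction of endurance consumed by period end


def expected_wear_rate(tb_per_day: float, endurance_tb: float, n_drives: int) -> float:
    """Fraction of the array's total endurance consumed per year (assumed model)."""
    return tb_per_day * 365.0 / (endurance_tb * n_drives)


def target_wear_rates(periods: list[Period], wear_so_far: float) -> dict[str, float]:
    """One target rate per period: remaining wear budget spread over the time
    from now until that period ends (hypothetical formula)."""
    targets = {}
    elapsed = 0.0
    for p in periods:
        elapsed += p.years
        budget = max(p.risk_tolerance - wear_so_far, 0.0)
        targets[p.name] = budget / elapsed
    return targets


def adjusted_wear_rate(expected: float, targets: dict[str, float]) -> float:
    """If the expected rate exceeds any target, fall back to the tightest target."""
    tightest = min(targets.values())
    return tightest if expected > tightest else expected


def replacement_schedule(rate: float, n_drives: int, total_years: float) -> list[tuple[int, int]]:
    """(year, drives to replace) pairs, treating rate * n_drives as
    drive-equivalents worn out per year (assumed model)."""
    schedule = []
    replaced = 0
    for year in range(1, math.ceil(total_years) + 1):
        t = min(year, total_years)
        # small epsilon guards against floating-point results just below an integer
        cumulative = math.floor(rate * n_drives * t + 1e-9)
        schedule.append((year, cumulative - replaced))
        replaced = cumulative
    return schedule


if __name__ == "__main__":
    periods = [Period("service", 3.0, 0.5), Period("contingency", 1.0, 0.8)]
    expected = expected_wear_rate(tb_per_day=20.0, endurance_tb=3650.0, n_drives=10)
    targets = target_wear_rates(periods, wear_so_far=0.1)
    adjusted = adjusted_wear_rate(expected, targets)
    print(expected, targets, adjusted)
    print(replacement_schedule(adjusted, n_drives=10, total_years=4.0))
```

In this sketch the service-period target is the tightest constraint, so the adjusted rate is pinned to it. A real system would need a defined way to bring actual wear down to the adjusted rate, which this page does not describe.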

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: monitoring a workload of input/output (I/O) operations over an initial portion of a service period for an array of storage devices within a storage system; determining an expected wear rate for the array of storage devices for a remaining portion of the service period, based on the monitored workload and an endurance of storage devices in the array; calculating a target wear rate for the array of storage devices for the remaining portion of the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period, the one or more contingency periods extending the expected wear rate of the array of storage devices past the service period to a terminal state at an end of the array's remaining service life; in response to determining that the expected wear rate exceeds the target wear rate calculated for at least one of the remaining portion of the service period and the one or more contingency periods, calculating an adjusted wear rate for the array of storage devices to match the target wear rate in accordance with the determination; and generating a replacement schedule for storage devices in the array over the remaining service life, based on the adjusted wear rate.

2. The method of claim 1, wherein the expected wear rate corresponds to an expected replacement rate for the array of storage devices within the array, and the adjusted wear rate corresponds to a target replacement rate for the storage devices.

3. The method of claim 1, wherein the I/O operations include at least one of: host-initiated I/O operations, garbage collection I/O operations, or reconstruction IO operations.

4. The method of claim 3, wherein the monitoring comprises: measuring a total data transfer rate for the array of storage devices based on the host-initiated I/O operations and the garbage collection I/O operations.

5. The method of claim 1, wherein a total duration of the remaining portion of the service period and the one or more contingency periods represents a total remaining service life of the array of storage devices.

6. The method of claim 5, wherein each of the one or more contingency periods represents a level of risk for a different type of contingency that affects the total remaining service life of the array of storage devices.

7. The method of claim 5, wherein a duration of one of the contingency periods is based on a remaining portion of the service period and a specified weighting for risk tolerance.

8. The method of claim 5, wherein a duration of one of the contingency periods is a fixed period of time representing a normal distribution of wear that extends beyond a point following the service period.

9. The method of claim 5, wherein one of the contingency periods accounts for unexpected changes to the workload of the array of storage devices, and a duration of the one of the contingency periods is determined based on the workload monitored over the initial portion of the service period and a maximum workload capacity associated with the remaining portion of the service period.

10. The method of claim 9, wherein the maximum workload capacity is based on at least one of a maximum performance limit of the storage devices in the array or a service-level agreement between a storage system operator and one or more hosts for which data services are provided by the storage system operator via the network storage system.

11. A non-transitory machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one machine, causes the machine to: monitor a workload of input/output (I/O) operations over an initial portion of a service period for an array of storage devices within a storage system; determine an expected wear rate for the array of storage devices for a remaining portion of the service period, based on the monitored workload and an endurance of storage devices in the array; calculate a target wear rate for the array of storage devices for the remaining portion of the service period and each of one or more contingency periods, based on the expected wear rate and a specified risk tolerance for each period, the one or more contingency periods extending the expected wear rate of the array of storage devices past the service period to a terminal state at an end of the array's remaining service life; determine that the expected wear rate exceeds the target wear rate calculated for at least one of the remaining portion of the service period and the one or more contingency periods; calculate an adjusted wear rate for the array of storage devices to match the target wear rate in accordance with the determination; and generate a replacement schedule for storage devices in the array over the remaining service life, based on the adjusted wear rate.

12. The non-transitory machine readable medium of claim 11, wherein the expected wear rate corresponds to an expected replacement rate for the array of storage devices within the array, and the adjusted wear rate corresponds to a target replacement rate for the storage devices.

13. The non-transitory machine readable medium of claim 11, wherein the I/O operations include at least one of host-initiated I/O operations, garbage collection I/O operations, or reconstruction I/O operations.

14. The non-transitory machine readable medium of claim 13, further comprising machine executable code that causes the machine to: measure a total data transfer rate for the array of storage devices based on the host-initiated I/O operations and the garbage collection I/O operations.

15. The non-transitory machine readable medium of claim 11, wherein a total duration of the remaining portion of the service period and the one or more contingency periods represents a total remaining service life of the array of storage devices.

16. The non-transitory machine readable medium of claim 15, wherein each of the one or more contingency periods represents a level of risk for a different type of contingency that affects the total remaining service life of the array of storage devices.

17. The non-transitory machine readable medium of claim 15, wherein a duration of one of the contingency periods is based on a remaining portion of the service period and a specified weighting for risk tolerance.

18. The non-transitory machine readable medium of claim 15, wherein the duration of one of the contingency periods is a fixed period of time representing a normal distribution of wear that extends beyond a point following the service period.

19. The non-transitory machine readable medium of claim 15, wherein one of the contingency periods accounts for unexpected changes to the workload of the array of storage devices, and a duration of the one of the contingency periods is determined based on the workload monitored over the initial portion of the service period and a maximum workload capacity associated with the remaining portion of the service period.

20. A computing device comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method of estimating and managing wear and replacement for storage devices in a storage system; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: monitor a workload of input/output (I/O) operations over an initial portion of a ser
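Dependent claims 4, 5, 7, and 9 say which inputs feed the total transfer rate and the contingency-period durations, but the claim text gives no formulas. The sketch below shows one way those inputs might be combined. All function names and formulas are hypothetical illustrations, in particular the "headroom" formula used for the workload contingency, and none of them should be read as the claimed computation.

```python
def total_transfer_rate(host_mb_s: float, gc_mb_s: float) -> float:
    """Claim 4 inputs: host-initiated plus garbage-collection I/O (simple sum assumed)."""
    return host_mb_s + gc_mb_s


def weighted_contingency_years(remaining_service_years: float, risk_weight: float) -> float:
    """Claim 7 inputs: remaining service period and a risk-tolerance weighting.
    ASSUMPTION: the duration is their product."""
    return remaining_service_years * risk_weight


def workload_contingency_years(observed_mb_s: float, max_mb_s: float,
                               remaining_service_years: float) -> float:
    """Claim 9 inputs: monitored workload and maximum workload capacity.
    ASSUMPTION: if the workload rose to its maximum, the wear planned for the
    remaining service period would be used up early; the shortfall is taken
    as the contingency duration."""
    if max_mb_s <= 0 or not 0 <= observed_mb_s <= max_mb_s:
        raise ValueError("need 0 <= observed_mb_s <= max_mb_s and max_mb_s > 0")
    headroom = 1.0 - observed_mb_s / max_mb_s
    return remaining_service_years * headroom


def total_remaining_service_life(remaining_service_years: float,
                                 contingency_years: list[float]) -> float:
    """Claim 5: remaining service period plus all contingency periods."""
    return remaining_service_years + sum(contingency_years)


if __name__ == "__main__":
    observed = total_transfer_rate(host_mb_s=150.0, gc_mb_s=50.0)
    weighted = weighted_contingency_years(remaining_service_years=3.0, risk_weight=0.25)
    workload = workload_contingency_years(observed, max_mb_s=500.0,
                                          remaining_service_years=3.0)
    print(total_remaining_service_life(3.0, [weighted, workload]))
```

Per claim 10, the maximum capacity passed as `max_mb_s` could come from a device performance limit or from a service-level agreement; the sketch treats it as a plain input either way.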

Assignees

Inventors

Classifications

  • in relation to life time, e.g. increasing Mean Time Between Failures [MTBF] · CPC title

  • by initialisation or re-initialisation of storage systems · CPC title

  • by allocating resources to storage systems · CPC title

  • G06F3/0689 · Primary

    Disk arrays, e.g. RAID, JBOD · CPC title

  • Non-volatile semiconductor memory arrays · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830107B2 cover?
A system and method for optimizing the estimation and management of wear and replacement for an array of storage devices in a storage system is disclosed. An input/output workload is monitored over part of a service period for the array. An expected wear rate is determined, based on the workload and an endurance of the storage devices. A target wear rate is calculated for the service period and…
Who is the assignee on this patent?
NetApp Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0689. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 28, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).