RAID failure prevention

US9766980B1 · US · B1

Patent metadata
Publication number: US-9766980-B1
Application number: US-201313886763-A
Country: US
Kind code: B1
Filing date: May 3, 2013
Priority date: May 3, 2013
Publication date: Sep 19, 2017
Grant date: Sep 19, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Individual storage devices of a RAID group are monitored for faults. A health indicator for each storage device is calculated based on fault growth rate. Non-failed storage devices are swapped out based on the health indicator. Techniques for monitoring the storage devices include background media scans and growth list polling.
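To make the idea of a fault growth rate concrete, here is a minimal sketch, not the patented implementation. It assumes each drive is polled periodically for its growth list size (the count of grown defects) and treats the average change between polls as the growth rate. The class `DriveHistory`, the function `drives_to_swap`, and the `max_rate` cutoff are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriveHistory:
    """Growth list sizes observed for one drive, oldest poll first."""
    name: str
    growth_list_sizes: List[int] = field(default_factory=list)

    def record_poll(self, size: int) -> None:
        # Store the growth list size returned by the latest poll.
        self.growth_list_sizes.append(size)

    def deltas(self) -> List[int]:
        # Change in growth list size between consecutive polls.
        sizes = self.growth_list_sizes
        return [cur - prev for prev, cur in zip(sizes, sizes[1:])]

    def growth_rate(self) -> float:
        # Average delta per poll; 0.0 until at least two polls exist.
        d = self.deltas()
        return sum(d) / len(d) if d else 0.0


def drives_to_swap(histories: List[DriveHistory], max_rate: float) -> List[str]:
    """Names of still-working drives whose growth rate exceeds max_rate.

    max_rate is an assumed tuning value, not one given in the source.
    """
    return [h.name for h in histories if h.growth_rate() > max_rate]


disk0 = DriveHistory("disk0", [10, 10, 11, 11])
disk1 = DriveHistory("disk1", [5, 9, 15, 24])
print(drives_to_swap([disk0, disk1], max_rate=2.0))
```

In this sketch neither drive has failed, but `disk1` is flagged because its growth list is expanding quickly, which mirrors the abstract's point about swapping out non-failed devices.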

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: monitoring faults in each individual storage device of a plurality of storage devices of a redundant array of inexpensive disks (RAID) group by polling each of the storage devices for a growth list size; calculating a growth list size delta for each of the storage devices by comparing a current growth list size with a previous growth list size; categorizing each growth list size delta as a growth type of a plurality of different growth types; calculating a ratio of occurrence of the different growth types for each of the storage devices; calculating an average growth list size delta for each of the storage devices; calculating a health indicator for each of the storage devices based on the ratio and the average growth list size delta for that storage device, including calculating a health indicator for a non-failed storage device based on the ratio and the average growth list size delta for that non-failed storage device; and prompting replacement of the non-failed storage device based on the health indicator for that non-failed storage device.

2. The method of claim 1 wherein monitoring faults includes performing background media scans.

3. The method of claim 2 wherein calculating a health indicator for a given storage device includes calculating an error rate indicative of bad block count per total days producing bad blocks.

4. The method of claim 3 including comparing the error rate with a threshold.

5. The method of claim 4 including calculating the threshold by adjusting a base value as a function of at least one of usage history, host scan rate, reassign rate, drive scan rate, number of write operations performed, error code generation history and physical location of errors on the storage device.

6. The method of claim 1 including calculating the health indicator of each storage device in terms of life expectancy based on the ratio, the average growth list size delta, and a predetermined maximum growth list size.

7. The method of claim 6 including prompting replacement of the non-failed storage device when the predetermined maximum growth list size has been reached or when a least acceptable threshold level for life expectancy has been reached.

8. The method of claim 1 including determining a replacement sequence of multiple non-failed storage devices based on the health indicators of the multiple non-failed storage devices.

9. The method of claim 1 including determining replacement timing of multiple non-failed storage devices based on the health indicators of the multiple non-failed storage devices.

10. A non-transitory computer-readable medium comprising: computer program code comprising: logic which monitors faults in each individual storage device of a plurality of storage devices of a redundant array of inexpensive disks (RAID) group by polling each of the storage devices for a growth list size; logic which calculates a growth list size delta for each of the storage devices by comparing a current growth list size with a previous growth list size; logic which categorizes each growth list size delta as a growth type of a plurality of different growth types; logic which calculates a ratio of occurrence of the different growth types for each of the storage devices; logic which calculates an average growth list size delta for each of the storage devices; logic which calculates a health indicator for each of the storage devices based on the ratio and the average growth list size delta for that storage device, including calculating a health indicator for a non-failed storage device; and logic which prompts replacement of the non-failed storage device based on the health indicator for that non-failed storage device.

11. The non-transitory computer-readable medium of claim 10 wherein the monitor logic performs background media scans.

12. The non-transitory computer-readable medium of claim 11 wherein the calculating logic calculates an error rate indicative of bad block count per total days producing bad blocks.

13. The non-transitory computer-readable medium of claim 12 including logic which compares the error rate with a threshold.

14. The non-transitory computer-readable medium of claim 13 including logic which calculates the threshold by adjusting a base value as a function of at least one of usage history, host scan rate, reassign rate, drive scan rate, number of write operations performed, error code generation history and physical location of errors on the storage device.

15. The non-transitory computer-readable medium of claim 10 including logic which calculates the health indicator of each storage device in terms of life expectancy based on the ratio, the average growth list size delta, and a predetermined maximum growth list size.

16. The non-transitory computer-readable medium of claim 15 including logic which prompts replacement of the non-failed storage device when the predetermined maximum growth list size has been reached or when a least acceptable threshold level for life expectancy has been reached.

17. The non-transitory computer-readable medium of claim 10 including logic which determines a replacement sequence of multiple non-failed storage devices based on the health indicators of the multiple non-failed storage devices.

18. The non-transitory computer-readable medium of claim 10 including logic which determines replacement timing of multiple non-failed storage devices based on the health indicators of the multiple non-failed storage devices.

19. Apparatus comprising: a storage subsystem including a host device and a storage array with a redundant array of inexpensive disks (RAID) group, the storage subsystem including logic which monitors faults in each individual storage device of a plurality of storage devices of the RAID group by polling each of the storage devices for a growth list size, logic which calculates a growth list size delta for each of the storage devices by comparing a current growth list size with a previous growth list size, logic which categorizes each growth list size delta as a growth type of a plurality of different growth types, logic which calculates a ratio of occurrence of the different growth types for each of the storage devices, logic which calculates an average growth list size delta for each of the storage devices, logic which calculates a health indicator for each of the storage devices based on the ratio and the average growth list size delta for that storage device, including calculating a health indicator for a non-failed storage device, and logic which prompts replacement of the non-failed storage device based on the health indicator for that non-failed storage device.

20. The apparatus of claim 19 wherein the monitor logic includes background media scans implemented by individual storage devices of the RAID group.

21. The apparatus of claim 20 wherein the host calculates an error rate indicative of bad block count per total days producing bad blocks.

22. The apparatus of claim 21 wherein the host compares the error rate with a threshold.

23. The apparatus of claim 22 wherein the host calculates the threshold by adjusting a base value as a function of at least one of usage history, host scan rate, reassign rate, drive scan rate, number of write operations performed, error code generation history and physical location of errors on the storage device.

24. The apparatus of claim 19 wherein the host calculates
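The steps recited in claims 1, 3, 6, and 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the formula used in the patent: the claim preview does not define the growth types or how the ratio and average delta are combined. The growth type labels (`none`, `slow`, `burst`), their cutoffs, the burst penalty in `life_expectancy_polls`, and every function name here are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical labels; the claims only require "different growth types".
GROWTH_TYPES = ("none", "slow", "burst")


def categorize(delta: int) -> str:
    """Assign a growth type to one delta (cutoffs are assumptions)."""
    if delta <= 0:
        return "none"
    return "slow" if delta <= 2 else "burst"


def growth_deltas(sizes: List[int]) -> List[int]:
    """Current growth list size minus the previous one, per poll."""
    return [cur - prev for prev, cur in zip(sizes, sizes[1:])]


def type_ratios(deltas: List[int]) -> Dict[str, float]:
    """Share of polls that fall into each growth type."""
    counts = Counter(categorize(d) for d in deltas)
    total = len(deltas) or 1  # avoid dividing by zero with no deltas
    return {t: counts[t] / total for t in GROWTH_TYPES}


def error_rate(daily_bad_blocks: List[int]) -> float:
    """Bad block count per day that produced bad blocks (claim 3)."""
    days = [n for n in daily_bad_blocks if n > 0]
    return sum(days) / len(days) if days else 0.0


def life_expectancy_polls(sizes: List[int], max_size: int) -> float:
    """Estimated polls left before the growth list reaches max_size.

    Assumed model: remaining headroom divided by the average delta,
    shortened in proportion to the share of "burst" polls.
    """
    remaining = max_size - sizes[-1]
    if remaining <= 0:
        return 0.0
    deltas = growth_deltas(sizes)
    avg = sum(deltas) / len(deltas) if deltas else 0.0
    if avg <= 0:
        return float("inf")
    return remaining / avg / (1.0 + type_ratios(deltas)["burst"])


def replacement_sequence(drives: Dict[str, List[int]],
                         max_size: int, min_life: float) -> List[str]:
    """Non-failed drives at or below min_life, shortest life first."""
    life = {n: life_expectancy_polls(s, max_size) for n, s in drives.items()}
    flagged = [n for n in life if life[n] <= min_life]
    return sorted(flagged, key=lambda n: life[n])


drives = {"a": [0, 0, 1, 1], "b": [10, 14, 20, 30], "c": [50, 51, 53, 54]}
print(replacement_sequence(drives, max_size=100, min_life=40.0))
```

With these made-up inputs, drive `b` is listed ahead of `c` because its growth is both faster and burstier, while drive `a` is not flagged at all. Sorting by shortest estimated life is one plausible way to derive the replacement sequence of claim 8; the source does not say how the ordering is actually determined.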

Assignees

EMC Corp, EMC IP Holding Co LLC

Inventors

Classifications

  • Rebuilding, e.g. when physically replacing a failing disk · CPC title

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9766980B1 cover?
Individual storage devices of a RAID group are monitored for faults. A health indicator for each storage device is calculated based on fault growth rate. Non-failed storage devices are swapped out based on the health indicator. Techniques for monitoring the storage devices include background media scans and growth list polling.
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/1092. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 19, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).