System and method for predicting multiple-disk failures
US-9141457-B1 · Sep 22, 2015 · US
US9535779B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9535779-B1 |
| Application number | US-201414341669-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jul 25, 2014 |
| Priority date | Jul 25, 2014 |
| Publication date | Jan 3, 2017 |
| Grant date | Jan 3, 2017 |
Techniques for determining vulnerability of disks are described herein. According to one embodiment, for each of a plurality of disks representing a redundant array of independent disks (RAID), a reallocated sector count associated with the disk is obtained, the reallocated sector count representing a number of sectors that have been reallocated due to an error of a storage transaction to the disk. A failure probability of the disk given the obtained reallocated sector count is determined using a predictive model, wherein the predictive model was generated based on history operating data of a set of known disks. Thereafter, a failure probability of at least two of the disks in the RAID is determined based on the failure probability of each of the disks to determine vulnerability of the RAID.
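The abstract's final step turns per-disk failure probabilities into the probability that at least two disks in the RAID fail. For a single-parity array such as RAID 5, two concurrent failures are what cause data loss. The text shown here does not say how the per-disk values are combined. The sketch below is a minimal illustration that assumes disk failures are independent. It computes the full distribution of the number of failed disks with a simple dynamic program. The function names `failure_count_distribution` and `prob_at_least` are hypothetical.

```python
import math
from typing import List, Sequence


def failure_count_distribution(p: Sequence[float]) -> List[float]:
    """Return dist where dist[k] = P(exactly k disks fail).

    Assumption (not from the patent): disks fail independently.
    p[i] is the per-disk failure probability for disk i, e.g. the
    output of a per-disk predictive model.
    """
    dist = [1.0]  # before any disk is considered, P(0 failures) = 1
    for pi in p:
        if not 0.0 <= pi <= 1.0:
            raise ValueError(f"probability out of range: {pi}")
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - pi)  # this disk survives
            nxt[k + 1] += mass * pi      # this disk fails
        dist = nxt
    return dist


def prob_at_least(p: Sequence[float], k: int) -> float:
    """P(k or more disks fail); k=2 is the single-parity data-loss case."""
    if k <= 0:
        return 1.0
    # Sum the tail of the distribution; an empty slice sums to 0.0.
    return math.fsum(failure_count_distribution(p)[k:])


if __name__ == "__main__":
    # Three disks, each with a 10% failure probability:
    # 3 * 0.1**2 * 0.9 + 0.1**3 = 0.028
    print(prob_at_least([0.1, 0.1, 0.1], 2))  # approximately 0.028
```

The dynamic program costs O(n²) for n disks and never enumerates disk subsets. The same function also covers a dual-parity array such as RAID 6, where data loss requires three failures (`k=3`).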
What is claimed is:
1. A computer-implemented method for determining vulnerability of disks, comprising: for each of a plurality of disks representing a redundant array of independent disks (RAID), obtaining a reallocated sector count associated with the disk, the reallocated sector count representing a number of sectors that have been reallocated due to an error of a storage transaction to the disk, and determining a failure probability of the disk given the obtained reallocated sector count using a predictive model, including calculating a second probability of a failed disk that has a reallocated sector count greater than the obtained reallocated sector count, wherein the predictive model was generated based on history operating data of a set of known disks; and determining a failure probability of at least two of the disks in the RAID based on the failure probability of each of the disks to determine vulnerability of the RAID.
2. The method of claim 1, wherein determining a failure probability of the disk given the reallocated sector count comprises calculating a first probability of a disk that is a failed disk based on the history operating data of the known disks.
3. The method of claim 2, further comprising calculating a third probability of a disk that has a reallocated sector count greater than the obtained reallocated sector count.
4. The method of claim 3, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability, the second probability, and the third probability.
5. The method of claim 4, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability multiplied by the second probability and divided by the third probability.
6. The method of claim 1, wherein the failure probability of at least two of the disks in the RAID is determined at a management server coupled to the disks over a network, and wherein the reallocated sector count of each of the disks in the RAID is periodically collected by the management server from each disk.
7. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for determining vulnerability of disks, the operations comprising: for each of a plurality of disks representing a redundant array of independent disks (RAID), obtaining a reallocated sector count associated with the disk, the reallocated sector count representing a number of sectors that have been reallocated due to an error of a storage transaction to the disk, and determining a failure probability of the disk given the obtained reallocated sector count using a predictive model, including calculating a second probability of a failed disk that has a reallocated sector count greater than the obtained reallocated sector count, wherein the predictive model was generated based on history operating data of a set of known disks; and determining a failure probability of at least two of the disks in the RAID based on the failure probability of each of the disks to determine vulnerability of the RAID.
8. The non-transitory machine-readable medium of claim 7, wherein determining a failure probability of the disk given the reallocated sector count comprises calculating a first probability of a disk that is a failed disk based on the history operating data of the known disks.
9. The non-transitory machine-readable medium of claim 8, wherein the operations further comprise calculating a third probability of a disk that has a reallocated sector count greater than the obtained reallocated sector count.
10. The non-transitory machine-readable medium of claim 9, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability, the second probability, and the third probability.
11. The non-transitory machine-readable medium of claim 10, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability multiplied by the second probability and divided by the third probability.
12. The non-transitory machine-readable medium of claim 7, wherein the failure probability of at least two of the disks in the RAID is determined at a management server coupled to the disks over a network, and wherein the reallocated sector count of each of the disks in the RAID is periodically collected by the management server from each disk.
13. A system for determining vulnerability of disks, comprising: a processor; a data collector executed by the processor to for each of a plurality of disks representing a redundant array of independent disks (RAID), obtain a reallocated sector count associated with the disk, the reallocated sector count representing a number of sectors that have been reallocated due to an error of a storage transaction to the disk; an analysis module executed by the processor to determine a failure probability of the disk given the obtained reallocated sector count using a predictive model, including calculating a second probability of a failed disk that has a reallocated sector count greater than the obtained reallocated sector count, wherein the predictive model was generated based on history operating data of a set of known disks; and a disk failure predictor executed by the processor to determine a failure probability of at least two of the disks in the RAID based on the failure probability of each of the disks to determine vulnerability of the RAID.
14. The system of claim 13, wherein determining a failure probability of the disk given the reallocated sector count comprises calculating a first probability of a disk that is a failed disk based on the history operating data of the known disks.
15. The system of claim 14, wherein the analysis module is to calculate a third probability of a disk that has a reallocated sector count greater than the obtained reallocated sector count.
16. The system of claim 15, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability, the second probability, and the third probability.
17. The system of claim 16, wherein the failure probability of the disk given the reallocated sector count is determined based on the first probability multiplied by the second probability and divided by the third probability.
18. The system of claim 13, wherein the failure probability of at least two of the disks in the RAID is determined at a management server coupled to the disks over a network, and wherein the reallocated sector count of each of the disks in the RAID is periodically collected by the management server from each disk.
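Claims 2 to 5 (mirrored in claims 8 to 11 and 14 to 17) compute a disk's failure probability as the first probability multiplied by the second and divided by the third. The sketch below reads the three terms as P(F), P(R > r | F), and P(R > r). Here F means "the disk failed", R is the reallocated sector count, and r is the observed count. Under that reading, P(F) · P(R > r | F) / P(R > r) is Bayes' rule for P(F | R > r). The claims do not spell out the estimator, so several choices are assumptions: the conditional reading of the second probability, plain empirical frequencies over historical records, and the hypothetical names `DiskRecord` and `failure_probability`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiskRecord:
    """Hypothetical historical record: a known disk's count and outcome."""
    reallocated_sectors: int
    failed: bool


def failure_probability(history: Sequence[DiskRecord], observed_rsc: int) -> float:
    """Estimate P(F | R > r) as first * second / third (Bayes' rule).

    first  = P(F)          fraction of known disks that failed
    second = P(R > r | F)  fraction of failed disks with count > r (assumed reading)
    third  = P(R > r)      fraction of all known disks with count > r
    """
    total = len(history)
    failed = [d for d in history if d.failed]
    if total == 0 or not failed:
        raise ValueError("history must contain at least one failed disk")

    exceeding = [d for d in history if d.reallocated_sectors > observed_rsc]
    if not exceeding:
        # third would be 0: no historical evidence above this count
        raise ValueError("no historical disk has a count above the observed value")

    first = len(failed) / total
    second = sum(1 for d in failed if d.reallocated_sectors > observed_rsc) / len(failed)
    third = len(exceeding) / total
    return first * second / third


if __name__ == "__main__":
    # Illustrative, made-up history: 4 failed disks and 6 healthy ones.
    history = [DiskRecord(c, True) for c in (0, 10, 50, 200)]
    history += [DiskRecord(c, False) for c in (0, 0, 0, 0, 5, 60)]
    # first = 0.4, second = 3/4, third = 4/10
    print(failure_probability(history, 8))  # approximately 0.75
```

All three terms are frequencies over the same population, so the product reduces to the fraction of historical disks above r that failed. Computing the terms separately simply mirrors the structure of the claims. A production model might bin or smooth the counts instead.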
Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title
by exceeding a count or rate limit, e.g. word- or bit count limit · CPC title
Reliability or availability analysis · CPC title
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
where the computing system component is a storage system, e.g. DASD based or network based (digital input from or digital output to record carriers G06F3/06; digital recording or reproducing G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title