Method and system for reliably forecasting storage disk failure
US-2021034450-A1 · Feb 4, 2021 · US
US11599402B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599402-B2 |
| Application number | US-201916529499-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2019 |
| Priority date | Aug 1, 2019 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A method and system for reliably forecasting storage disk failure. Specifically, the method and system disclosed herein entail predicting whether one or more storage disks may fail within a future time period. Further, the storage disk failure forecasts may rely on machine learning classification coupled with prediction reliability scoring.
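The "prediction reliability scoring" mentioned in the abstract is named in the claims as an inductive conformal prediction (ICP) framework that yields confidence-credibility scores used to rank forecasts. The following is a minimal sketch of that idea, not the patented implementation. It assumes a common ICP convention: nonconformity is one minus the model's probability for a candidate label, credibility is the largest p-value, and confidence is one minus the second-largest p-value. The function names, the `failed`/`healthy` labels, and all numbers are hypothetical, and at least two classes are assumed.

```python
from typing import Dict, List, Tuple


def p_value(calibration_scores: List[float], score: float) -> float:
    """Share of calibration examples at least as nonconforming as `score`."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def confidence_credibility(
    calibration_scores: List[float], class_probs: Dict[str, float]
) -> Tuple[str, float, float]:
    """Return (predicted label, confidence, credibility) for one disk."""
    # Assumed nonconformity measure: 1 - model probability of the candidate label.
    p_values = {
        label: p_value(calibration_scores, 1.0 - prob)
        for label, prob in class_probs.items()
    }
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    label, credibility = ranked[0]   # largest p-value
    confidence = 1.0 - ranked[1][1]  # one minus the second-largest p-value
    return label, confidence, credibility


def rank_forecasts(
    calibration_scores: List[float], forecasts: Dict[str, Dict[str, float]]
) -> List[Tuple[str, str, float, float]]:
    """Score every disk, then sort by (confidence, credibility), highest first."""
    rows = []
    for disk_id, probs in forecasts.items():
        label, conf, cred = confidence_credibility(calibration_scores, probs)
        rows.append((disk_id, label, conf, cred))
    rows.sort(key=lambda r: (r[2], r[3]), reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical nonconformity scores from a held-out calibration set.
    calibration = [0.05, 0.15, 0.25, 0.45]
    # Hypothetical per-disk class probabilities from some classifier.
    forecasts = {
        "disk-a": {"failed": 0.9, "healthy": 0.1},
        "disk-b": {"failed": 0.6, "healthy": 0.4},
    }
    for row in rank_forecasts(calibration, forecasts):
        print(row)
```

Sorting on the score pair puts the forecasts that the calibration data supports most strongly first, which is one plausible way to decide which disks to act on before the others.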
Opening claim text (preview).
What is claimed is:

1. A method for forecasting storage disk failure, comprising: obtaining, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reducing the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; inputting a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initializing a classification learning model; applying incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and performing a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
2. The method of claim 1, further comprising: prior to reducing the raw dataset to the select dataset: identifying the feature subset of the feature set using a set of feature selection algorithms, wherein the feature subset comprises features commonly selected by the set of feature selection algorithms, wherein the raw dataset is reduced based on the feature subset.
3. The method of claim 2, wherein the set of feature selection algorithms comprises an extreme gradient boosting (XGB) algorithm, a light gradient boosting model (LGBM) algorithm, an extra tree algorithm, a decision tree algorithm, a gradient boost algorithm, an adaptive boosting (AdaBoost) algorithm, and a random forest algorithm.
4. The method of claim 1, wherein the set of missing data values is imputed using median substitution.
5. The method of claim 1, wherein the classification learning model is a stochastic gradient descent classifier.
6. The method of claim 1, wherein the proactive response further comprises alerting a storage system administrator.
7. The method of claim 1, further comprising: prior to performing the proactive response: applying a prediction reliability algorithm to the set of disk failure forecasts to obtain a set of confidence-credibility scores; and ranking the set of disk failure forecasts based on the set of confidence-credibility scores to obtain a ranked set of disk failure forecasts, wherein the proactive response is performed further based on the ranked set of disk failure forecasts.
8. The method of claim 7, wherein the prediction reliability algorithm is an inductive conformal prediction (ICP) framework.
9. A system, comprising: an auto-support database operatively connected to a disk failure forecasting service, the disk failure forecasting service comprising a computer processor configured to: obtain, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reduce the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; input a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initialize a classification learning model; apply incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and perform a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
10. The system of claim 9, further comprising: a storage system operatively connected to the auto-support database, and comprising a plurality of storage disks, wherein the raw dataset comprises historical configuration and performance information for the plurality of storage disks.
11. The system of claim 9, further comprising: the sales client, wherein the sales client is operatively connected to the disk failure forecasting service.
12. The system of claim 9, further comprising: an admin client operatively connected to the disk failure forecasting service, wherein the proactive response comprises issuing an alert to the admin client.
13. A non-transitory computer readable medium (CRM) comprising computer readable program code, which when executed by a computer processor, enables the computer processor to: obtain, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reduce the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; input a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initialize a classification learning model; apply incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and perform a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
14. The non-transitory CRM of claim 13, further comprising computer readable program code, which when executed by the computer processor, enables the computer processor to reduce the raw dataset to the select dataset, by: identifying the feature subset of the feature set using a set of feature selection algorithms; and reducing the raw dataset based on the feature subset, wherein the feature subset comprises features commonly selected by the set of feature selection algorithms.
15. The non-transitory CRM of claim 13, wherein the classification learning model is a stochastic gradient descent classifier.
16. The non-transitory CRM of claim 13, further comprising computer readable program code, which when executed by the computer processor, enables the computer processor, prior to performing the proactive response, to: apply a prediction reliability algorithm to the set of disk failure forecasts to obtain a set of confidence-credibility scores; and rank the set of disk failure forecasts based on the set of confidence-credibility scores to obtain a ranked set of disk failure forecasts, wherein the proactive response is performed further based on the ranked set of disk failure forecasts.
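The data-preparation and training steps recited in the claims (keeping only features that several selection algorithms agree on, filling gaps by median substitution, and incrementally training a stochastic gradient descent classifier) can be illustrated with a short standard-library sketch. This is an illustration under stated assumptions, not the patented implementation: the selector outputs are given as plain sets rather than produced by XGB, LGBM, or the other named algorithms; the classifier is a small hand-written logistic model standing in for whatever SGD classifier is actually used; feature values are assumed to be pre-scaled to roughly [0, 1]; medians are computed per batch; and every name and number below is hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Optional, Sequence, Set

Row = Dict[str, Optional[float]]


def consensus_features(selections: Sequence[Set[str]]) -> List[str]:
    """Features picked by every selector (set intersection), in sorted order."""
    return sorted(set.intersection(*selections))


def reduce_rows(rows: List[Row], features: List[str]) -> List[Row]:
    """Keep only the chosen features; absent values become None."""
    return [{f: row.get(f) for f in features} for row in rows]


def impute_median(rows: List[Row], features: List[str]) -> List[Dict[str, float]]:
    """Replace each None with the median of that feature's observed values."""
    medians = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        medians[f] = median(observed) if observed else 0.0  # assumed fallback
    return [
        {f: (r[f] if r[f] is not None else medians[f]) for f in features}
        for r in rows
    ]


class SGDLogistic:
    """Tiny logistic classifier updated one example at a time (plain SGD)."""

    def __init__(self, features: List[str], lr: float = 0.1) -> None:
        self.features = list(features)
        self.w = {f: 0.0 for f in self.features}
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, row: Dict[str, float]) -> float:
        """Estimated probability that the disk belongs to the failing class."""
        z = self.b + sum(self.w[f] * row[f] for f in self.features)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, rows: List[Dict[str, float]], labels: List[int]) -> None:
        """Incremental update: call again whenever a new batch arrives."""
        for row, y in zip(rows, labels):
            err = self.predict_proba(row) - y  # gradient of log loss w.r.t. z
            for f in self.features:
                self.w[f] -= self.lr * err * row[f]
            self.b -= self.lr * err


if __name__ == "__main__":
    # Hypothetical outputs of three feature selectors.
    picks = [{"realloc", "pending", "temp"}, {"realloc", "pending"},
             {"pending", "realloc", "crc"}]
    feats = consensus_features(picks)
    model = SGDLogistic(feats)
    # Hypothetical batches of (rows, labels); label 1 means the disk failed.
    batches = [
        ([{"realloc": 0.9, "pending": 0.8}, {"realloc": 0.1, "pending": None}], [1, 0]),
        ([{"realloc": None, "pending": 0.1}, {"realloc": 0.8, "pending": 0.9}], [0, 1]),
    ]
    for rows, labels in batches:
        model.partial_fit(impute_median(reduce_rows(rows, feats), feats), labels)
    print(model.predict_proba({"realloc": 0.9, "pending": 0.9}))
```

Calling `partial_fit` once per batch is what makes the learning incremental in this sketch: the weights carry over between batches, so the model never needs the full history in memory at once.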
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
Machine learning · CPC title
Reliability or availability analysis · CPC title
in relation to life time, e.g. increasing Mean Time Between Failures [MTBF] · CPC title
Disk arrays, e.g. RAID, JBOD · CPC title