What technology area does this patent fall under?

Primary CPC classification G06F11/0727. Mapped technology areas include Physics.

When was this patent published?

Publication date: March 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for reliably forecasting storage disk failure

US11599402B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11599402-B2
Application number	US-201916529499-A
Country	US
Kind code	B2
Filing date	Aug 1, 2019
Priority date	Aug 1, 2019
Publication date	Mar 7, 2023
Grant date	Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for reliably forecasting storage disk failure. Specifically, the method and system disclosed herein entail predicting whether one or more storage disks may fail within a future time period. Further, the storage disk failure forecasts may rely on machine learning classification coupled with prediction reliability scoring.
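As an illustration of what "prediction reliability scoring" can look like, the sketch below scores each forecast with an inductive conformal prediction (ICP) step, the framework named in claim 8, and ranks the predicted failures as in claim 7. It is a minimal sketch, not the patented implementation: the nonconformity measure (one minus the probability the model gave the label), the calibration values, the disk identifiers, and every function name are assumptions made for this example.

```python
from typing import Dict, List, Tuple

LABELS = ("healthy", "failing")


def nonconformity(p_fail: float, label: str) -> float:
    """How poorly `label` fits a model output p_fail (higher = stranger).

    Assumed measure: 1 minus the probability the model gave to `label`.
    """
    return 1.0 - p_fail if label == "failing" else p_fail


def p_value(cal_scores: List[float], score: float) -> float:
    """Share of calibration examples at least as strange as this one."""
    at_least = sum(1 for a in cal_scores if a >= score)
    return (at_least + 1) / (len(cal_scores) + 1)


def score_forecast(cal_scores: List[float], p_fail: float) -> Tuple[str, float, float]:
    """Return (predicted label, confidence, credibility) for one disk."""
    pvals = {y: p_value(cal_scores, nonconformity(p_fail, y)) for y in LABELS}
    best, runner_up = sorted(pvals, key=pvals.get, reverse=True)
    # Confidence: 1 - second-largest p-value. Credibility: largest p-value.
    return best, 1.0 - pvals[runner_up], pvals[best]


def rank_failures(cal_scores: List[float],
                  forecasts: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Keep disks predicted to fail, most trustworthy forecasts first."""
    rows = []
    for disk_id, p_fail in forecasts.items():
        label, confidence, credibility = score_forecast(cal_scores, p_fail)
        if label == "failing":
            rows.append((disk_id, confidence, credibility))
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return rows


# Hypothetical held-out calibration set: model outputs and true classes.
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.3, 0.7, 0.6, 0.05, 0.4]
cal_labels = ["failing", "failing", "healthy", "healthy", "healthy",
              "failing", "healthy", "healthy", "failing"]
CAL_SCORES = [nonconformity(p, y) for p, y in zip(cal_probs, cal_labels)]

# Hypothetical model outputs (probability of failure) for three disks.
ranked = rank_failures(CAL_SCORES, {"disk-a": 0.97, "disk-b": 0.75, "disk-c": 0.02})
for disk_id, confidence, credibility in ranked:
    print(disk_id, round(confidence, 3), round(credibility, 3))
```

A forecast with high confidence but low credibility means the alternative label is unlikely, yet the disk also looks unlike the calibration data, so ranking on both values helps decide which disks to act on first.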

First claim

Opening claim text (preview).

What is claimed is:
1. A method for forecasting storage disk failure, comprising: obtaining, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reducing the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; inputting a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initializing a classification learning model; applying incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and performing a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
2. The method of claim 1, further comprising: prior to reducing the raw dataset to the select dataset: identifying the feature subset of the feature set using a set of feature selection algorithms, wherein the feature subset comprises features commonly selected by the set of feature selection algorithms, wherein the raw dataset is reduced based on the feature subset.
3. The method of claim 2, wherein the set of feature selection algorithms comprises an extreme gradient boosting (XGB) algorithm, a light gradient boosting model (LGBM) algorithm, an extra tree algorithm, a decision tree algorithm, a gradient boost algorithm, an adaptive boosting (AdaBoost) algorithm, and a random forest algorithm.
4. The method of claim 1, wherein the set of missing data values is imputed using median substitution.
5. The method of claim 1, wherein the classification learning model is a stochastic gradient descent classifier.
6. The method of claim 1, wherein the proactive response further comprises alerting a storage system administrator.
7. The method of claim 1, further comprising: prior to performing the proactive response: applying a prediction reliability algorithm to the set of disk failure forecasts to obtain a set of confidence-credibility scores; and ranking the set of disk failure forecasts based on the set of confidence-credibility scores to obtain a ranked set of disk failure forecasts, wherein the proactive response is performed further based on the ranked set of disk failure forecasts.
8. The method of claim 7, wherein the prediction reliability algorithm is an inductive conformal prediction (ICP) framework.
9. A system, comprising: an auto-support database operatively connected to a disk failure forecasting service, the disk failure forecasting service comprising a computer processor configured to: obtain, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reduce the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; input a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initialize a classification learning model; apply incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and perform a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
10. The system of claim 9, further comprising: a storage system operatively connected to the auto-support database, and comprising a plurality of storage disks, wherein the raw dataset comprises historical configuration and performance information for the plurality of storage disks.
11. The system of claim 9, further comprising: the sales client, wherein the sales client is operatively connected to the disk failure forecasting service.
12. The system of claim 9, further comprising: an admin client operatively connected to the disk failure forecasting service, wherein the proactive response comprises issuing an alert to the admin client.
13. A non-transitory computer readable medium (CRM) comprising computer readable program code, which when executed by a computer processor, enables the computer processor to: obtain, from an auto-support database, a raw dataset comprising a first set of data tuples, each comprising a feature set and a disk health class, the data tuples include SMART data and SCSI error codes for a plurality of different physical storage disks that have been collected over a preset amount of time; reduce the raw dataset to a select dataset comprising a second set of data tuples, each comprising a feature subset of the feature set and the disk health class; input a set of missing data values in the select dataset to obtain the select-gapless dataset comprising a gapless version of the second set of data tuples; initialize a classification learning model; apply incremental learning to the classification learning model using the select-gapless dataset to obtain a set of disk failure forecasts for a set of storage disks; and perform a proactive response based on the set of disk failure forecasts, wherein the proactive response comprises replacing at least one disk from the set of storage disks.
14. The non-transitory CRM of claim 13, further comprising computer readable program code, which when executed by the computer processor, enables the computer processor to reduce the raw dataset to the select dataset, by: identifying the feature subset of the feature set using a set of feature selection algorithms; and reducing the raw dataset based on the feature subset, wherein the feature subset comprises features commonly selected by the set of feature selection algorithms.
15. The non-transitory CRM of claim 13, wherein the classification learning model is a stochastic gradient descent classifier.
16. The non-transitory CRM of claim 13, further comprising computer readable program code, which when executed by the computer processor, enables the computer processor, prior to performing the proactive response, to: apply a prediction reliability algorithm to the set of disk failure forecasts to obtain a set of confidence-credibility scores; and rank the set of disk failure forecasts based on the set of confidence-credibility scores to obtain a ranked set of disk failure forecasts, wherein the proactive response is performed further based on the ranked set of disk failure forecasts.
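The data-preparation steps in claims 1, 2, and 4 (reduce the raw dataset to a feature subset, then fill missing values by median substitution) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: "features commonly selected" is read here as the intersection of every selector's picks (a majority vote would be another reasonable reading), and the column names, the `health` label key, and all function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Set

Row = Dict[str, object]  # feature name -> value (None marks a missing value)


def consensus_features(selections: Sequence[Set[str]]) -> Set[str]:
    """Features picked by every selection algorithm (assumed: intersection)."""
    if not selections:
        return set()
    return set.intersection(*(set(s) for s in selections))


def reduce_rows(rows: List[Row], keep: Set[str], label_key: str = "health") -> List[Row]:
    """Drop every column except the selected features and the health class."""
    columns = sorted(keep) + [label_key]
    return [{c: row.get(c) for c in columns} for row in rows]


def impute_median(rows: List[Row], features: Set[str]) -> List[Row]:
    """Replace missing values with the per-feature median of observed values."""
    filled = [dict(row) for row in rows]  # copy so the input stays untouched
    for feature in features:
        observed = [r[feature] for r in rows if r.get(feature) is not None]
        if not observed:
            continue  # nothing to compute a median from; leave the gaps
        fill = median(observed)
        for r in filled:
            if r.get(feature) is None:
                r[feature] = fill
    return filled


# Hypothetical picks from three feature selection algorithms.
picks = [
    {"smart_5", "smart_187", "scsi_medium_errors"},
    {"smart_5", "smart_9", "scsi_medium_errors"},
    {"scsi_medium_errors", "smart_5"},
]
# Hypothetical raw data tuples: feature set plus disk health class.
raw = [
    {"smart_5": 0.0, "smart_9": 1200.0, "scsi_medium_errors": None, "health": "healthy"},
    {"smart_5": None, "smart_9": 8000.0, "scsi_medium_errors": 4.0, "health": "failing"},
    {"smart_5": 12.0, "smart_9": 300.0, "scsi_medium_errors": 8.0, "health": "healthy"},
]

selected = consensus_features(picks)
select_gapless = impute_median(reduce_rows(raw, selected), selected)
for row in select_gapless:
    print(row)
```

The resulting gapless rows are what an incrementally trained classifier, such as the stochastic gradient descent classifier of claim 5, would then consume batch by batch.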

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

G06F11/0727 (Primary)
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
G06N20/00
Machine learning · CPC title
G06F11/008
Reliability or availability analysis · CPC title
G06F3/0616 (Primary)
in relation to life time, e.g. increasing Mean Time Between Failures [MTBF] · CPC title
G06F3/0689
Disk arrays, e.g. RAID, JBOD · CPC title

Patent family

Related publications grouped by family.

View patent family 74260150

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599402B2 cover?: A method and system for reliably forecasting storage disk failure. Specifically, the method and system disclosed herein entail predicting whether one or more storage disks may fail within a future time period. Further, the storage disk failure forecasts may rely on machine learning classification coupled with prediction reliability scoring.
Who is the assignee on this patent?: EMC IP Holding Co LLC