Providing resiliency to a raid group of storage devices

US10013323B1 · US · B1

Patent metadata
Publication number: US-10013323-B1
Application number: US-201514868577-A
Country: US
Kind code: B1
Filing date: Sep 29, 2015
Priority date: Sep 29, 2015
Publication date: Jul 3, 2018
Grant date: Jul 3, 2018

Abstract

Official abstract text for this publication.

A technique is directed to providing resiliency to a redundant array of independent disk (RAID) group which includes multiple storage devices. The technique involves operating the RAID group in a normal state in which each storage device is (i) initially online to perform write and read operations and (ii) configured to go offline in response to a respective media error count for that storage device reaching an initial take-offline threshold. The technique further involves receiving a notification that a storage device of the RAID group has encountered a particular error situation. The technique further involves transitioning, in response to the notification, the RAID group to a high resiliency state in which each storage device that is operable is (i) still online to perform write and read operations and (ii) configured to stay online even when the respective media error count for that storage device reaches the initial take-offline threshold.
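The state change described in the abstract can be illustrated with a small model. This is a minimal sketch and not the patented implementation: the class names, the `stay_online` flag, and the choice of "a device went offline" as the triggering error situation are assumptions made for illustration (the trigger matches the one recited in claim 1).

```python
from dataclasses import dataclass
from enum import Enum


class GroupState(Enum):
    NORMAL = "normal"
    HIGH_RESILIENCY = "high_resiliency"


@dataclass
class StorageDevice:
    name: str
    take_offline_threshold: int   # the "initial take-offline threshold"
    media_error_count: int = 0
    online: bool = True
    stay_online: bool = False     # hypothetical flag set in the high resiliency state

    def record_media_error(self) -> bool:
        """Count one media error; return True if this error took the device offline."""
        if not self.online:
            return False
        self.media_error_count += 1
        if self.media_error_count >= self.take_offline_threshold and not self.stay_online:
            self.online = False
            return True
        return False


@dataclass
class RaidGroup:
    devices: list
    state: GroupState = GroupState.NORMAL

    def record_media_error(self, device: StorageDevice) -> None:
        # A device going offline is treated here as the notification
        # of a particular error situation.
        if device.record_media_error():
            self.notify_error_situation()

    def notify_error_situation(self) -> None:
        """Move to the high resiliency state: operable devices stay online."""
        self.state = GroupState.HIGH_RESILIENCY
        for d in self.devices:
            if d.online:
                d.stay_online = True


group = RaidGroup([StorageDevice("d0", 3), StorageDevice("d1", 3)])
d0, d1 = group.devices
for _ in range(3):
    group.record_media_error(d0)   # d0 reaches the threshold and goes offline
for _ in range(5):
    group.record_media_error(d1)   # d1 passes the threshold but stays online
```

In this sketch the first device fails in the normal way, and the remaining device keeps serving reads and writes even after its own error count passes the threshold, which is the behavior the abstract attributes to the high resiliency state.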

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of providing resiliency to a redundant array of independent disk (RAID) group which includes a plurality of storage devices, the method comprising: operating the RAID group in a normal state in which each storage device is (i) initially online to perform write and read operations and (ii) configured to go offline in response to a respective media error count for that storage device reaching an initial take-offline threshold; receiving a notification that a storage device of the RAID group has encountered a particular error situation; and in response to the notification, transitioning the RAID group from the normal state to a high resiliency degraded state in which each storage device that is operable is (i) still online to perform write and read operations and (ii) configured to stay online even when the respective media error count for that storage device reaches the initial take-offline threshold; wherein receiving the notification includes electronically detecting that a first storage device of the RAID group has gone offline in response to the media error count for the first storage device reaching the initial take-offline threshold; and wherein transitioning includes electronically preventing a second storage device of the RAID group from going offline in response to the media error count for the second storage device reaching the initial take-offline threshold, the RAID group thereby made to operate in the high resiliency degraded state in which the second storage device remains online performing write and read operations even though the media error count of the second storage device has reached the initial take-offline threshold.

2. A computer-implemented method as in claim 1 wherein preventing includes: configuring the second storage device to remain online regardless of a respective media error count for the second storage device.

3. A computer-implemented method as in claim 2 wherein electronically detecting includes: receiving, as the notification, an alert indicating that the first storage device has gone offline in response to a respective media error count for the first storage device reaching the initial take-offline threshold.

4. A computer-implemented method as in claim 2, further comprising: receiving an alert indicating that a proactive copy operation has begun in response to an accounting for the first storage device reaching an end-of-life threshold, the proactive copy operation involving proactively copying data from the first storage device of the RAID group to a spare storage device.

5. A computer-implemented method as in claim 4, further comprising: proactively copying data from the first storage device of the RAID group to the spare storage device while all of the storage devices of the RAID group remain online to perform write and read operations.

6. A computer-implemented method as in claim 4, further comprising: reconstructing data that was not proactively copied from the first storage device of the RAID group to the spare storage device, and storing that reconstructed data on the spare storage device.

7. A computer-implemented method as in claim 2 wherein processing circuitry maintains a hierarchy of objects representing the RAID group; and wherein electronically detecting includes: obtaining, by a RAID group object of the hierarchy, an alert indicating existence of the particular error situation, the RAID group object representing the RAID group.

8. A computer-implemented method as in claim 7 wherein electronically preventing includes: providing a respective don't-take-offline command from the RAID group object of the hierarchy to a storage device object of the hierarchy, the storage device object of the hierarchy representing the second storage device of the RAID group.

9. A computer-implemented method as in claim 1 wherein preventing the second storage device of the RAID group from going offline includes: sending, from processing circuitry and to the second storage device, a command that disables the second storage device from going offline in response to the media error count for the second storage device reaching the initial take-offline threshold.

10. A computer program product having a non-transitory computer readable medium which stores a set of instructions to provide resiliency to a redundant array of independent disk (RAID) group which includes a plurality of storage devices, the set of instructions, when carried out by computerized circuitry, causing the computerized circuitry to perform a method of: operating the RAID group in a normal state in which each storage device is (i) initially online to perform write and read operations and (ii) configured to go offline in response to a respective media error count for that storage device reaching an initial take-offline threshold; receiving a notification that a storage device of the RAID group has encountered a particular error situation; and in response to the notification, transitioning the RAID group from the normal state to a high resiliency degraded state in which each storage device that is operable is (i) still online to perform write and read operations and (ii) configured to stay online even when the respective media error count for that storage device reaches the initial take-offline threshold; wherein receiving the notification includes electronically detecting that a first storage device of the RAID group has gone offline in response to the media error count for the first storage device reaching the initial take-offline threshold; and wherein transitioning includes electronically preventing a second storage device of the RAID group from going offline in response to the media error count for the second storage device reaching the initial take-offline threshold, the RAID group thereby made to operate in the high resiliency degraded state in which the second storage device remains online performing write and read operations even though the media error count of the second storage device has reached the initial take-offline threshold.

11. A computer program product as in claim 10 wherein electronically preventing includes: configuring the second storage device to remain online regardless of a respective media error count for the second storage device.

12. A computer program product as in claim 11 wherein electronically detecting includes: receiving, as the notification, an alert indicating that the first storage device has gone offline in response to a respective media error count for the first storage device reaching the initial take-offline threshold.

13. A computer program product as in claim 11, further comprising: receiving an alert indicating that a proactive copy operation has begun in response to an accounting for the first storage device reaching an end-of-life threshold, the proactive copy operation involving proactively copying data from the first storage device of the RAID group to a spare storage device.

14. A computer program product as in claim 11 wherein processing circuitry maintains a hierarchy of objects representing the RAID group; and wherein electronically detecting includes: obtaining, by a RAID group object of the hierarchy, an alert indicating existence of the particular error situation, the RAID group object representing the RAID group.

15. A computer program product as in claim 14 wherein electronically preventing includes: providing a respective don't-take-offline command from the RAID group object of the hierarchy to a storage device object of the hierarchy, the storage device object of the hierarc
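The proactive copy and reconstruction steps recited in claims 4 to 6 can be sketched as follows. The claims do not say how the missing data is reconstructed; this example assumes a single-parity layout in which a lost block equals the XOR of the corresponding blocks on the other group members. The function names and the `read_ok` predicate are hypothetical and exist only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def proactive_copy(failing, peers, read_ok):
    """Build the spare's contents for one failing device.

    failing: list of blocks (bytes) on the failing device, one per stripe
    peers:   per-device block lists for the other group members
             (data and parity), indexed the same way
    read_ok: hypothetical predicate; True if stripe i can still be read
             from the failing device
    """
    spare = []
    for i, block in enumerate(failing):
        if read_ok(i):
            # Proactively copied straight from the failing device.
            spare.append(block)
        else:
            # Not copied: rebuild from the remaining members of the stripe.
            spare.append(xor_blocks([peer[i] for peer in peers]))
    return spare


# Two data devices plus parity; stripe 1 on d0 is unreadable.
d0 = [b"\x01\x02", b"\x0a\x0b"]
d1 = [b"\x10\x20", b"\x30\x40"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]
spare = proactive_copy(d0, [d1, parity], lambda i: i != 1)
```

Under these assumptions the spare ends up with the same contents as the failing device, with readable stripes copied directly and the unreadable stripe rebuilt from its peers.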

Assignees

Inventors

Classifications

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097)

  • Parity data used in redundant arrays of independent storages, e.g. in RAID systems

  • by changing the state or mode of one or more devices

  • in relation to data integrity, e.g. data losses, bit errors

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10013323B1 cover?
A technique is directed to providing resiliency to a redundant array of independent disk (RAID) group which includes multiple storage devices. The technique involves operating the RAID group in a normal state in which each storage device is (i) initially online to perform write and read operations and (ii) configured to go offline in response to a respective media error count for that storage d…
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/0727. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 3, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).