Accelerated testing using simulated failures in a multi-device storage system

US9727432B1 · US · B1

Patent metadata
Publication number: US-9727432-B1
Application number: US-201414510860-A
Country: US
Kind code: B1
Filing date: Oct 9, 2014
Priority date: Oct 9, 2014
Publication date: Aug 8, 2017
Grant date: Aug 8, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and method for accelerated testing of a multi-device storage system. In some embodiments, the storage system includes a server adapted to communicate with a user device, and a plurality of data storage devices adapted to store and retrieve data objects from the user device. The server maintains a map structure that describes the data objects stored on the data storage devices. A fault injection module is adapted to induce simulated failures of selected data storage devices in relation to a time-varying failure rate distribution associated with the data storage devices that indicates an observed failure rate over a first time interval. The simulated failures are induced by the fault injection module over a second time interval shorter than the first time interval. The server operates to modify the map structure responsive to the simulated failures.
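The time compression described in the abstract can be illustrated with a minimal sketch. It assumes a piecewise-constant failure rate curve and a simple linear rescaling of the time axis; neither choice comes from the patent text, and `RateSegment`, `build_schedule`, and the example numbers are hypothetical names and values used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateSegment:
    """One piece of a piecewise-constant failure rate curve (hypothetical type)."""
    start_h: float                   # segment start, hours into the service life
    end_h: float                     # segment end, hours into the service life
    failures_per_device_hour: float  # observed rate within the segment


def build_schedule(segments, num_devices, test_hours):
    """Return sorted injection times, in hours from the start of the test."""
    service_life_h = max(s.end_h for s in segments)
    if not 0 < test_hours < service_life_h:
        raise ValueError("test interval must be shorter than the service life")
    scale = test_hours / service_life_h  # assumed linear time compression
    times = []
    for s in segments:
        span = s.end_h - s.start_h
        count = round(span * s.failures_per_device_hour * num_devices)
        for i in range(count):
            # Spread failures evenly inside the segment, then compress the time axis.
            times.append((s.start_h + (i + 0.5) * span / count) * scale)
    return sorted(times)


# Bathtub-shaped example: early failures, a quiet middle, wear-out at the end.
curve = [
    RateSegment(0, 1_000, 1e-4),
    RateSegment(1_000, 9_000, 1e-5),
    RateSegment(9_000, 10_000, 1e-4),
]
schedule = build_schedule(curve, num_devices=100, test_hours=500)
```

With these example values the 10,000-hour curve is replayed in 500 hours (a 20:1 ratio), and the shape of the distribution is preserved: failures cluster at the start and end of the test. A real fault injection module could just as well draw failure times at random from the distribution.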

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a storage system comprising a server adapted to communicate with a user device and a plurality of data storage devices adapted to store and retrieve data objects from the user device, the server maintaining a map structure that describes the data objects stored on the data storage devices; and a fault injection module adapted to induce simulated hardware and software based failures of selected data storage devices in relation to a time-varying failure rate distribution associated with the data storage devices that indicates an observed failure rate over a first time interval, the simulated hardware and software based failures induced by the fault injection module over a second time interval shorter than the first time interval, first time interval corresponding to a specified operational service life of the storage system, a ratio between the first time interval and the second time interval being at least 20:1, the server modifying the map structure responsive to the simulated hardware and software based failures.

2. The apparatus of claim 1, the fault injection module inducing a failure in a selected data storage device by operably disconnecting the selected storage device from the server, instructing the selected data storage devices to erase the data objects stored thereon, and subsequently operably reconnecting the erased selected data storage devices to the server while the selected storage device remain housed within an associated storage enclosure.

3. The apparatus of claim 1, the fault injection module inducing a failure in a selected data storage device by outputting to a user interface an identification of the selected data storage device to enable a user to physically remove the selected data storage device from the storage system.

4. The apparatus of claim 1, the time-varying failure rate distribution characterized as a predicted mean time between failure (MTBF) distribution for the data storage devices, the fault injection module determining a total number of predicted failures of the data storage devices in relation to the MTBF distribution, the duration of the first time interval and the total number of data storage devices, the fault injection module inducing the total number of predicted failures during the second time interval.

5. The apparatus of claim 1, the fault injection module further adapted to simulate failures in the server during the first time interval.

6. The apparatus of claim 1, the server performing a rebalancing operation upon the map structure responsive to each simulated hardware and software based failure in the data storage devices.

7. The apparatus of claim 1, the plurality of storage devices arranged among a plurality of storage enclosures each comprising a control board, a power supply and a cooling assembly, the fault injection module further adapted to induce simulated software and hardware based failures in the respective control boards, power supplies and cooling assemblies during the first time interval.

8. The apparatus of claim 7, the storage system further comprising a plurality of storage servers associated with the plurality of storage enclosures, the fault injection module further adapted to induce simulated hardware and software based failures in the respective storage servers.

9. The apparatus of claim 1, the fault injection module comprising a failure model block which identifies a total number of failures predicted for the data storage devices during the first interval, a simulation data module which provides simulated data objects during operation of the fault injection module, and a data logging and reporting module which accumulates performance statistics responsive to operation of the storage system to recover from each induced simulated hardware and software based failure.

10. A system comprising: a server adapted to communicate with users over a network, the server comprising a processor and associated memory to maintain a map structure; a plurality of storage enclosures coupled to the server, each storage enclosure housing a plurality of data storage devices which store and retrieve data objects of the users, a control board, a power supply and a cooling assembly, the map structure of the server describing the data objects stored by the data storage devices; a plurality of storage controllers, each storage controller comprising a processor and associated memory to control an associated storage enclosure; and a failure simulation module comprising a processor and associated programming in memory adapted to induce a total number of simulated failures of at least selected data storage devices of the plurality of storage enclosures over an accelerated time interval, the total number of simulated failures induced by altering operation of hardware and software of the at least selected data storage devices during the accelerated time interval equal to or greater than a total number of predicted failures of the data storage devices expected during a longer, service life interval of the storage enclosures, the accelerated time interval is equal to or less than 5% of the service life interval, the server modifying the map structure response to each of the total number of simulated failures.

11. The system of claim 10, wherein the total number of simulated failures is selected in relation to a predicted mean time between failure (MTBF) distribution for the data storage devices, the duration of the service life interval and the total number of data storage devices.

12. A computer-implemented method comprising: using a storage system to store user data objects in a plurality of data storage devices in accordance with a map structure maintained by a server; predicting a total number of actual failures of the data storage devices over a service life interval thereof responsive to a failure rate distribution associated with the data storage devices; inducing a total number of simulated hardware and software based failures corresponding to the total number of actual failures during an accelerated testing interval shorter than the service life interval, the accelerated time interval is equal to or less than 5% of the service life interval, the server updating the map structure responsive to each of the total number of simulated hardware and software based failures.

13. The method of claim 12, wherein the total number of actual failures equal to or less than the total number of simulated hardware and software based failures.

14. The method of claim 12, wherein at least one of the total number of simulated hardware and software based failures is induced by electronically disconnecting a selected storage device from the server, instructing the selected data storage devices to erase the data objects stored thereon, and subsequently operably reconnecting the erased selected data storage devices to the server while the selected storage device remain housed within an associated storage enclosure.

15. The method of claim 12, wherein at least one of the total number of simulated hardware and software based failures is induced by physically disconnecting and removing an operable data storage device from an associated storage enclosure and replacing the physically disconnected and removed operable data storage device with a replacement data storage device.

16. The method of claim 12, wherein the time-varying failure rate distribution is characterized as a predicted mean time between failure (MTBF) distribution for the data storage devices, the predicted total number of actual failures of the data storage devices d
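Two ideas recur in the claims: the number of failures to inject is derived from an MTBF figure, the service life and the device count, and the server updates its map structure after each simulated failure. The sketch below illustrates both under loud assumptions: a constant failure rate (one failure per device per MTBF on average), a toy replica map, and a least-loaded placement rule. None of these details, nor the names `predicted_failures`, `is_accelerated_enough` and `ObjectMap`, come from the patent.

```python
import math


def predicted_failures(num_devices, service_life_hours, mtbf_hours):
    """Assumed constant-rate estimate: devices * service life / MTBF, rounded up."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return math.ceil(num_devices * service_life_hours / mtbf_hours)


def is_accelerated_enough(service_life_hours, test_hours, min_ratio=20.0):
    """Check the ratio of service life to test interval (20:1 equals 5%)."""
    return test_hours > 0 and service_life_hours / test_hours >= min_ratio


class ObjectMap:
    """Toy map structure: object id -> list of device ids holding a replica."""

    def __init__(self, devices, replicas=2):
        self.devices = set(devices)
        self.replicas = replicas
        self.placement = {}

    def _load(self, device):
        # Number of replicas currently placed on the device.
        return sum(device in devs for devs in self.placement.values())

    def put(self, obj_id):
        # Place replicas on the least-loaded devices (ties broken by name).
        ranked = sorted(self.devices, key=lambda d: (self._load(d), d))
        self.placement[obj_id] = ranked[: self.replicas]

    def handle_failure(self, failed):
        """Drop the failed device and re-place its replicas on survivors."""
        self.devices.discard(failed)
        for devs in self.placement.values():
            if failed in devs:
                devs.remove(failed)
                spare = sorted(
                    (d for d in self.devices if d not in devs),
                    key=lambda d: (self._load(d), d),
                )
                if spare:
                    devs.append(spare[0])


# Example: 1,000 drives, 5-year service life, 1,000,000-hour MTBF.
to_inject = predicted_failures(1_000, 5 * 8_760, 1_000_000)

object_map = ObjectMap(["d0", "d1", "d2"])
object_map.put("a")
object_map.put("b")
object_map.handle_failure("d0")  # simulated failure of device d0
```

In this example `to_inject` is 44, and after the simulated failure both objects again have two replicas, all on surviving devices. A test harness would repeat `handle_failure` once per scheduled failure and log how long each recovery takes.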

Assignees

Seagate Technology LLC

Inventors

Classifications

  • G06F11/263 (Primary)

    Generation of test inputs, e.g. test vectors, patterns or sequences {; with adaptation of the tested hardware for testability with external testers} · CPC title

  • Performance evaluation by statistical analysis · CPC title

  • in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9727432B1 cover?
It covers an apparatus and method for accelerated testing of a multi-device storage system. A fault injection module induces simulated failures of selected data storage devices in relation to a time-varying failure rate distribution observed over a first time interval, but does so over a shorter second time interval. The server modifies its map structure, which describes the stored data objects, in response to the simulated failures.
Who is the assignee on this patent?
Seagate Technology LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/263. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 8, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).