Managing software errors in storage systems

US9367405B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9367405-B1
Application number  US-201313804506-A
Country             US
Kind code           B1
Filing date         Mar 14, 2013
Priority date       Mar 14, 2013
Publication date    Jun 14, 2016
Grant date          Jun 14, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is used in managing software errors in storage systems. It is detected that a first processor of a storage system has a problem performing an I/O on a logical object. The first processor has a first path to the logical object. The problem includes a software error. Whether responsibility of performing the I/O on the logical object is transferred to a second processor of the storage system is evaluated. The second processor has a second path to the logical object.
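
The two steps named in the abstract (detecting the problem, then evaluating a transfer) can be illustrated with a minimal sketch. Everything here is a hypothetical model for illustration, not the patent's implementation: the class names, the `available` flag, the processor names "SPA" and "SPB", and the `flaky_io` function are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


class SoftwareError(Exception):
    """Hypothetical stand-in for a software error raised during an I/O."""


@dataclass
class Processor:
    name: str
    available: bool = True


@dataclass
class LogicalObject:
    name: str
    owner: Processor  # first processor: holds the first path and performs the I/O
    peer: Processor   # second processor: holds the second path


def detect_problem(io: Callable[[Processor], None], obj: LogicalObject) -> bool:
    """Run the I/O on the owning processor; report whether a software error occurred."""
    try:
        io(obj.owner)
        return False
    except SoftwareError:
        return True


def evaluate_transfer(obj: LogicalObject) -> bool:
    """Evaluate whether responsibility could move to the second processor."""
    return obj.peer.available


def flaky_io(proc: Processor) -> None:
    # Hypothetical I/O that hits a software error only on processor "SPA".
    if proc.name == "SPA":
        raise SoftwareError("unexpected state in I/O path")


spa, spb = Processor("SPA"), Processor("SPB")
lun = LogicalObject("lun0", owner=spa, peer=spb)
if detect_problem(flaky_io, lun) and evaluate_transfer(lun):
    # Transfer responsibility: the second processor becomes the owner.
    lun.owner, lun.peer = lun.peer, lun.owner
print(lun.owner.name)  # prints "SPB"
```

The sketch treats the transfer as a simple ownership swap; the claims add conditions (a retry and an indication from the second processor) that this model leaves out.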

First claim

Opening claim text (preview).

What is claimed is:

1. A method for use in managing software errors in storage systems, the method comprising:

    detecting that a first processor of a storage system has a problem performing an I/O on a logical object, wherein the first processor has a first path to the logical object, wherein the problem includes a software error, wherein the storage system includes first and second processors, wherein the first processor has a primary responsibility for performing the I/O on the logical object;

    evaluating whether to transfer responsibility of performing the I/O on the logical object to the second processor of the storage system, wherein the second processor has a second path to the logical object, wherein evaluating whether to transfer responsibility of performing the I/O includes determining whether subsequently performing the I/O on the logical object by the first processor resolves the software error and based on the determination, determining whether the second processor is available for performing the I/O operation on the logical object;

    based on the determination that the software error has not been resolved and the second processor is available for performing the I/O on the logical object, transferring responsibility of performing the I/O on the logical object to the second processor of the storage system upon receiving an indication from the second processor that the first processor can fail, wherein responsibility of performing the I/O on the logical object is transferred to the second processor upon failure of the first processor, wherein the failure of the first processor includes reboot of the first processor; and

    based on the determination that the software error has not been resolved and the second processor is not available for performing the I/O on the logical object, making the logical object inaccessible to a user of the logical object without having to reboot the storage system.

2. The method of claim 1, further comprising: based on the evaluation, determining whether to fail the logical object by making the logical object inaccessible.

3. The method of claim 1, further comprising: retrying the I/O on the logical object upon detecting the problem.

4. The method of claim 3, further comprising: based on the result of retrying the I/O, determining whether to fail the first processor of the storage system.

5. The method of claim 1, wherein the logical object comprises a set of LUNs on disk drives in a RAID group.

6. The method of claim 1, wherein the logical object is represented by a first object on the first processor of the storage system and by a second object on the second processor of the storage system.

7. The method of claim 6, wherein the first object communicates with the second object to determine whether the first processor can panic.

8. The method of claim 1, further comprising: providing a status of the problem to the logical object.

9. The method of claim 2, further comprising: making the logical object accessible upon successful reboot of the first and second processors of the storage system.

10. The method of claim 1, wherein the logical object is in communication with a RAID object and the RAID object is in communication with a physical object.

11. A system for use in managing software errors in storage systems, the system comprising:

    first logic detecting that a first processor of a storage system has a problem performing an I/O on a logical object, wherein the first processor has a first path to the logical object, wherein the problem includes a software error, wherein the storage system includes first and second processors, wherein the first processor has a primary responsibility for performing the I/O on the logical object;

    second logic evaluating whether to transfer responsibility of performing the I/O on the logical object to the second processor of the storage system, wherein the second processor has a second path to the logical object, wherein evaluating whether to transfer responsibility of performing the I/O includes determining whether subsequently performing the I/O on the logical object by the first processor resolves the software error and based on the determination, determining whether the second processor is available for performing the I/O operation on the logical object;

    third logic transferring responsibility, based on the determination that the software error has not been resolved and the second processor is available for performing the I/O on the logical object, of performing the I/O on the logical object to the second processor of the storage system upon receiving an indication from the second processor that the first processor can fail, wherein responsibility of performing the I/O on the logical object is transferred to the second processor upon failure of the first processor, wherein the failure of the first processor includes reboot of the first processor; and

    fourth logic making, based on the determination that the software error has not been resolved and the second processor is not available for performing the I/O on the logical object, making the logical object inaccessible to a user of the logical object without having to reboot the storage system.

12. The system of claim 11, further comprising: fifth logic determining, based on the evaluation, whether to fail the logical object by making the logical object inaccessible.

13. The system of claim 11, further comprising: fifth logic retrying the I/O on the logical object upon detecting the problem.

14. The system of claim 13, further comprising: sixth logic determining, based on the result of retrying the I/O, whether to fail the first processor of the storage system.

15. The system of claim 11, wherein the logical object comprises a set of LUNs on disk drives in a RAID group.

16. The system of claim 11, wherein the logical object is represented by a first object on the first processor of the storage system and by a second object on the second processor of the storage system.

17. The system of claim 16, wherein the first object communicates with the second object to determine whether the first processor can panic.

18. The system of claim 11, further comprising: fifth logic providing a status of the problem to the logical object.

19. The system of claim 12, further comprising: sixth logic making the logical object accessible upon successful reboot of the first and second processors of the storage system.

20. The system of claim 11, wherein the logical object is in communication with a RAID object and the RAID object is in communication with a physical object.
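
The decision flow recited in claim 1 (retry on the first processor, check the second processor, then either fail over or fail only the logical object) might be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the `Outcome` names, the function signature, and the callables are hypothetical, and the `UNDECIDED` branch covers a case the claims do not address (the second processor is available but does not indicate that the first processor can fail).

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    RESOLVED = "retry on the first processor succeeded"
    FAILED_OVER = "first processor rebooted; second processor owns the I/O"
    OBJECT_FAILED = "logical object made inaccessible; storage system not rebooted"
    UNDECIDED = "peer available but gave no indication (not covered by the claims)"


def handle_software_error(
    retry_io: Callable[[], bool],
    peer_available: bool,
    peer_permits_failure: Callable[[], bool],
) -> Outcome:
    """Hypothetical decision procedure modeled on claim 1."""
    # Step 1: perform the I/O again on the first processor and see
    # whether that resolves the software error.
    if retry_io():
        return Outcome.RESOLVED
    # Step 2: the error persists, so check whether the second processor
    # is available to perform the I/O.
    if not peer_available:
        # No peer to take over: fail only the logical object,
        # without rebooting the whole storage system.
        return Outcome.OBJECT_FAILED
    # Step 3: transfer responsibility only after the second processor
    # indicates that the first processor can fail (for example, reboot).
    if peer_permits_failure():
        return Outcome.FAILED_OVER
    # Assumption: the claims do not say what happens here.
    return Outcome.UNDECIDED


print(handle_software_error(lambda: False, True, lambda: True).name)
print(handle_software_error(lambda: False, False, lambda: True).name)
```

Passing the retry and the peer's indication as callables keeps the sketch independent of how a real system would issue the I/O or exchange messages between the per-processor objects described in claims 6 and 7.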

Assignees

Inventors

Classifications

  • in transactions (updating of structured data in databases G06F16/23) · CPC title

  • G06F11/142 · Primary

    Reconfiguring to eliminate the error (group management mechanisms in a peer-to-peer network H04L67/1044) · CPC title

  • G06F11/07 · Primary

    Responding to the occurrence of a fault, e.g. fault tolerance · CPC title

  • at system level · CPC title

  • Techniques of failing over between control units · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9367405B1 cover?
A method is used in managing software errors in storage systems. It is detected that a first processor of a storage system has a problem performing an I/O on a logical object. The first processor has a first path to the logical object. The problem includes a software error. Whether responsibility of performing the I/O on the logical object is transferred to a second processor of the storage sys…
Who is the assignee on this patent?
EMC Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/1474. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 14, 2016 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).