Survival site load balancing
US-2015269042-A1 · Sep 24, 2015 · US
US9836368B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9836368-B2 |
| Application number | US-201514920334-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 22, 2015 |
| Priority date | Oct 22, 2015 |
| Publication date | Dec 5, 2017 |
| Grant date | Dec 5, 2017 |
Official abstract text for this publication.
One or more techniques and/or computing devices are provided for automatic switchover implementation. For example, a first storage controller, of a first storage cluster, may have a disaster recovery relationship with a second storage controller of a second storage cluster. In the event the first storage controller fails, the second storage controller may automatically switchover operation from the first storage controller to the second storage controller for providing clients with failover access to data previously accessible to the clients through the first storage controller. The second storage controller may detect, cross-cluster, a failure of the first storage controller utilizing remote direct memory access (RDMA) read operations to access heartbeat information, heartbeat information stored within a disk mailbox, and/or service processor traps. In this way, the second storage controller may efficiently detect failure of the first storage controller to trigger automatic switchover for non-disruptive client access to data.
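The detection step in the abstract can be illustrated with a small sketch. It assumes the heartbeat is a monotonically increasing sequence number (as claim 4 suggests) and that a partner is treated as failed once the number stops advancing for several consecutive reads. The class name `HeartbeatMonitor`, the `read_heartbeat` callable (a stand-in for an RDMA read or a disk mailbox read), and the `max_stalled_polls` threshold are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class HeartbeatMonitor:
    """Tracks a partner's heartbeat sequence number across polls (hypothetical helper)."""

    # Stand-in for an RDMA read or disk mailbox read; returns None if the read fails.
    read_heartbeat: Callable[[], Optional[int]]
    # Assumed threshold; the patent text shown here does not give a value.
    max_stalled_polls: int = 3
    _last_seq: Optional[int] = field(default=None, init=False)
    _stalled: int = field(default=0, init=False)

    def poll(self) -> bool:
        """Read the heartbeat once; return True when the partner looks failed."""
        seq = self.read_heartbeat()
        if seq is not None and (self._last_seq is None or seq > self._last_seq):
            # Sequence number advanced: the partner is making progress.
            self._last_seq = seq
            self._stalled = 0
        else:
            # No progress, or the read itself failed: count a stalled poll.
            self._stalled += 1
        return self._stalled >= self.max_stalled_polls


# Usage: the sequence number advances twice, then stalls.
readings = iter([1, 2, 2, 2, 2])
monitor = HeartbeatMonitor(read_heartbeat=lambda: next(readings))
results = [monitor.poll() for _ in range(5)]
```

Requiring several stalled polls rather than one is a design choice made here to tolerate a single delayed heartbeat write; a real system would tune this against its heartbeat interval.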
Opening claim text (preview).
What is claimed is:
1. A method comprising: determining that a memory section is designated for heartbeat information exchange from a first storage controller within a first storage cluster to a second storage controller within a second storage cluster, the second storage controller configured as a disaster recovery partner for the first storage controller; performing a remote direct memory access read operation to access the memory section for obtaining a current heartbeat status of the first storage controller; determining that the current heartbeat status indicates a failure of the first storage controller; sending a communication signal from the second storage controller to the first storage cluster; initiating an automatic switchover operation from the first storage controller to the second storage controller for providing clients with failover access to data previously accessible to the clients through the first storage controller before switchover based upon responsiveness to the communication signal indicating that the failure is not a false trigger; and refraining from initiating the automatic switchover operation based upon the responsiveness to the communication signal indicating that the failure is the false trigger.
2. The method of claim 1, wherein the current heartbeat status specifies a storage controller reboot as the failure.
3. The method of claim 1, wherein the current heartbeat status specifies a state transition of the first storage controller.
4. The method of claim 1, wherein the heartbeat information exchange corresponds to a series of sequence numbers used to indicate progress of the first storage controller.
5. The method of claim 1, wherein the current heartbeat status specifies a software panic.
6. The method of claim 1, comprising: determining that the failure is not the false trigger; initiating a manual switchover operation and not the automatic switchover operation based upon a determination that storage and a main controller of the first storage system are not available; and initiating the automatic switchover operation based upon a determination that the storage and the main controller are available.
7. The method of claim 1, comprising: initiating the automatic switchover operation based upon a write caching synchronization state between the first storage controller and the second storage controller indicating a synchronous state; and refraining from initiating the automatic switchover operation based upon the write caching synchronization state indicating a non-synchronous state.
8. The method of claim 7, comprising: reading the write caching synchronization state from a first disk mailbox of the first storage controller.
9. The method of claim 1, wherein the first storage cluster is configured according to a single controller cluster configuration and the second storage cluster is configured according to the single controller cluster configuration.
10. The method of claim 1, comprising: specifying that a first disk mailbox is to be used for heartbeat information exchange from the first storage controller to the second storage controller; reading a second current heartbeat status from the first disk mailbox; and initiating the automatic switchover operation based upon both the current heartbeat status and the second current heartbeat status indicating the failure.
11. The method of claim 10, comprising: determining the failure as a power loss failure based upon both the current heartbeat status and the second current heartbeat status indicating the failure.
12. The method of claim 10, comprising: initiating the automatic switchover operation after a threshold timeout based upon both the current heartbeat status and the second current heartbeat status indicating the failure.
13. A non-transitory machine readable medium having stored thereon instructions for performing a method comprising machine executable code which when executed by at least one machine, causes the machine to: determine that a first disk mailbox and a memory section are designated to be used for heartbeat information exchange from a first storage controller within a first storage cluster to a second storage controller within a second storage cluster, the second storage controller configured as a disaster recovery partner for the first storage controller; read a current heartbeat status from the first disk mailbox; perform a remote direct memory access read operation to access the memory section for obtaining a second current heartbeat status of the first storage controller; and initiate an automatic switchover operation from the first storage controller to the second storage controller for providing clients with failover access to data previously accessible to the clients through the first storage controller before switchover based upon both the current heartbeat status and the second current heartbeat status indicating a failure.
14. The non-transitory machine readable medium of claim 13, wherein the current heartbeat status specifies a storage controller halt.
15. The non-transitory machine readable medium of claim 13, wherein the machine executable code causes the machine to: initiate the automatic switchover operation after a timeout.
16. The non-transitory machine readable medium of claim 13, wherein the machine executable code causes the machine to: send a communication signal from the second storage controller to the first storage cluster; evaluate responsiveness to the communication signal to determine whether the failure is a false trigger; initiate the automatic switchover operation based upon a determination that the failure is not the false trigger; and refrain from initiating the automatic switchover operation based upon a determination that the failure is the false trigger.
17. The non-transitory machine readable medium of claim 16, wherein the machine executable code causes the machine to: determine whether storage and a main controller of the first storage cluster are available based upon the determination that the failure is not the false trigger; initiate a manual switchover operation and not the automatic switchover operation based upon the storage and the main controller not being available; and initiate the automatic switchover operation based upon the storage and the main controller being available.
18. The non-transitory machine readable medium of claim 13, wherein the machine executable code causes the machine to: evaluate a write caching synchronization state between the first storage controller and the second storage controller; initiate the automatic switchover operation based upon the write caching synchronization state indicating a synchronous state; and refrain from initiating the automatic switchover operation based upon the write caching synchronization state indicating a non-synchronous state.
19. A computing device comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: determine that a memory section has been designated for heartbeat information exchange from a first storage controller within a first storage cluster to a second storage controller within a second storage cluster, the second storage controller configured as a disaster recovery partner for the first storage controller; perform a remote direct memor
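The conditions spread across the claims can be read together as one decision flow, sketched below. This is an illustration under stated assumptions, not the claimed method: the claims recite these conditions in separate dependent claims, and combining them into a single function, the order of the checks, and the interpretation that a response to the communication signal means a false trigger are all assumptions made here. The names `Action` and `decide_switchover` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "refrain from switchover"
    AUTOMATIC = "automatic switchover"
    MANUAL = "manual switchover"


def decide_switchover(
    rdma_heartbeat_failed: bool,
    mailbox_heartbeat_failed: bool,
    partner_responded: bool,
    storage_and_main_controller_available: bool,
    write_cache_in_sync: bool,
) -> Action:
    """Combine the claimed conditions into one decision (assumed ordering)."""
    # Claims 10 and 13: act only when both heartbeat sources indicate failure.
    if not (rdma_heartbeat_failed and mailbox_heartbeat_failed):
        return Action.NONE
    # Claims 1 and 16: a false trigger means no switchover.
    # Assumption: a response to the communication signal implies a false trigger.
    if partner_responded:
        return Action.NONE
    # Claims 6 and 17: fall back to manual switchover when the storage and
    # main controller are not available.
    if not storage_and_main_controller_available:
        return Action.MANUAL
    # Claims 7 and 18: refrain when write caching is not synchronized.
    if not write_cache_in_sync:
        return Action.NONE
    return Action.AUTOMATIC


# Usage: both heartbeats failed, no response, resources available, caches in sync.
decision = decide_switchover(True, True, False, True, True)
```

Checking the false trigger before the resource and cache checks keeps the cheapest disqualifying test first; a different ordering would give the same result for the automatic case but could change which non-automatic outcome is reported.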
Techniques of failing over between control units · CPC title
by exceeding a time limit, i.e. time-out, e.g. watchdogs · CPC title
Management of state, configuration or failover · CPC title
in a storage system, e.g. in a DASD or network based storage system (drivers for digital recording or reproducing units G06F3/06; circuits for error detection or correction within digital recording or reproducing units G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
Real-time · CPC title