Handling node failure in multi-node data storage systems

US10452502B2 · US · B2

Patent metadata
Publication number: US-10452502-B2
Application number: US-201815877405-A
Country: US
Kind code: B2
Filing date: Jan 23, 2018
Priority date: Jan 23, 2018
Publication date: Oct 22, 2019
Grant date: Oct 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multi-node data storage system comprises a first data storage system having an owner node and a backup node in a first location coupled to a second data storage system having an owner node and a backup node in a second location. Each storage system includes a copy of the same data volume. A failure of a node of the multi-node storage system is detected. An outstanding write request to the first storage system is identified. If the owner node in the first storage system fails, it is determined whether the outstanding write corresponds to a host write to the backup node of the first storage system. If so, a retransmission message is sent to the second storage system. Otherwise, the data region associated with the outstanding write request is read from the first storage system, and a resynchronization message is sent to the second storage system.
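
The owner-failure branch described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the patented implementation: the names `OutstandingWrite`, `ReplicationMessage`, `handle_owner_failure`, and the `read_region` callback are hypothetical, and regions are modelled as plain integers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class MessageType(Enum):
    RETRANSMISSION = "retransmission"
    RESYNCHRONIZATION = "resynchronization"


@dataclass(frozen=True)
class OutstandingWrite:
    """A write to the first storage system that has not completed (hypothetical model)."""
    region: int               # assumed: integer identifier of the region in the volume
    data: Optional[bytes]     # host data, available only if this node received the host write
    received_by: str          # "owner" or "backup": which first-system node took the host write


@dataclass(frozen=True)
class ReplicationMessage:
    """Message sent to the second storage system: type indicator, region, data."""
    kind: MessageType
    region: int
    data: bytes


def handle_owner_failure(
    outstanding: List[OutstandingWrite],
    read_region: Callable[[int], bytes],
) -> List[ReplicationMessage]:
    """Build the messages sent after the owner node of the first system fails."""
    messages = []
    for write in outstanding:
        if write.received_by == "backup":
            # Host write went to the surviving backup node: resend the same data.
            messages.append(
                ReplicationMessage(MessageType.RETRANSMISSION, write.region, write.data)
            )
        else:
            # Otherwise read the region from the first system and resynchronize.
            messages.append(
                ReplicationMessage(
                    MessageType.RESYNCHRONIZATION, write.region, read_region(write.region)
                )
            )
    return messages
```

In this sketch, a caller would pass the list of outstanding writes together with a function that reads a region from the local copy of the volume, then send each returned message to a node of the second storage system.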

First claim

Opening claim text (preview).

What is claimed is:

1. In a multi-node storage system comprising a first data storage system having an owner node and a backup node in a first location, and a second data storage system having an owner node and a backup node in a second location, wherein the first data storage system is communicatively coupled to the second data storage system and each of the first and second data storage systems includes a copy of the same data volume, a method comprising:

    detecting failure of a node of the multi-node data storage system;

    identifying an outstanding write request to the first data storage system;

    wherein, if the failed node is the owner node of the first data storage system, the method further comprises: determining whether the outstanding write request corresponds to a host write to the backup node of the first data storage system, if the outstanding write request corresponds to a host write to the backup node of the first data storage system, sending a retransmission message of the outstanding write request to a node of the second data storage system, the retransmission message comprising an indicator that the write request is a retransmission, the data to be written and the region in the data volume where the data is to be written; or if the outstanding write request does not correspond to a host write to the backup node of the first data storage system, reading the data region associated with the outstanding write request from the first data storage system, and sending a resynchronization message to a node of the second data storage system, the resynchronization message comprising an indicator that the write request is a resynchronization, the data read from the first data storage system to be written and the region in the data volume where the data is to be written;

    receiving a write request message, wherein the type of write request message comprises one of a resynchronization message and a retransmission message sent following failure of a node of the multi-node data storage system, wherein the write request message is received by the second data storage system from the first data storage system and comprises an indicator of the message type, the data to be written and the region in the data volume where the data is to be written;

    determining whether there is an outstanding write request to the second data storage system that corresponds to the region of the data volume in the received message;

    wherein, if there is an outstanding write request to the first data storage system that corresponds to the region of the data volume in the received message, the method further comprises: determining whether the write of the received write request message is the same type as the outstanding write request, and if the write of the received write request message is the same type as the outstanding write request, writing the data associated with a predetermined one of the first and second data storage systems to the region in the data volume of the second data storage system last, or if the write of the received write request message is not the same type as the outstanding write request, writing the data associated with the retransmission message to the region in the data volume of the second data storage system after writing the data associated with the resynchronization message.

2. The method of claim 1, wherein: if the failed node is the owner node of the second data storage system, the method comprises: sending a retransmission message of the outstanding write request to the backup node of the second data storage system, the retransmission message comprising an indicator that the write request is a retransmission, the data to be written and the region in the data volume where the data is to be written.

3. The method of claim 1, further comprising: in response to a retransmission message, receiving a replication write completion message from the second data storage system, and sending a host write completion message to the host of the first data storage system.

4. The method of claim 1, further comprising: in response to a resynchronization message, receiving a replication write completion message from the second data storage system.

5. The method of claim 1, wherein:

    if the failed node is the backup node of the first data storage system, the method comprises: determining whether the outstanding write request corresponds to a host write to the owner node of the first data storage system, and if the outstanding write request corresponds to a host write to the owner node of the first data storage system, sending a retransmission message of the outstanding write request to a node of the second data storage system, the retransmission message comprising an indicator that the write request is a retransmission, the data to be written and the region in the data volume where the data is to be written;

    if the failed node is the backup node of the second data storage system, the method comprises: sending a retransmission message of the outstanding write request to the owner node of the second data storage system, the retransmission message comprising an indicator that the write request is a retransmission, the data to be written and the region in the data volume where the data is to be written.

6. The method of claim 1, wherein: if the failed node is the owner node of the first data storage system, the method is performed by the backup node of the first data storage system, and if the failed node is the owner node of the second data storage system or the backup node of the first or second data storage system, the method is performed by the owner node of the first data storage system.

7. The method of claim 1, wherein the predetermined one of the first and second data storage systems is designated as the leader.

8. The method of claim 1, wherein: if there is no outstanding write request to the second data storage system that corresponds to the region of the data volume in the received message, the method further comprises: writing the data associated with the received write request message to the region of the data volume in the second data storage system.

9. The method of claim 1, further comprising: sending a replication write completion message to the first data storage system.

10. A multi-node data storage system, comprising: a first data storage system having an owner node and a backup node in a first location, and a second data storage system having an owner node and a backup node in a second location, wherein the first data storage system is communicatively coupled to the second data storage system and each of the first and second data storage systems includes a copy of the same data volume;

    wherein a node of the multi-node data storage system is configured to:

    detect failure of a node of the multi-node data storage system;

    identify an outstanding write request to the first data storage system;

    determine whether the outstanding write request corresponds to a host write to the backup node of the first data storage system;

    wherein if the failed node is the owner node of the first data storage system, the node is configured to: send a retransmission message of the outstanding write request to a node of the second data storage system, when the outstanding write request corresponds to a host write to the backup node of the first data storage system, the retransmission message comprising an indicator that the write request is a retransmission, the data to be written and the region in the data volume where the data is to be written; read the data region associated with the outstanding write request from the first data storage system and send a resynchronization message to a nod
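
The receiving-side ordering rule in claims 1, 7, and 8 can be illustrated with a short Python sketch. It is one possible reading of the claim language under stated assumptions, not the patented implementation: the `Write` record, its `origin` field ("first" or "second" storage system), and the `write_order` function are hypothetical names, and the case where both writes share the same type and the same origin is not addressed by the claims (the sketch simply keeps the received message first).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    RETRANSMISSION = "retransmission"
    RESYNCHRONIZATION = "resynchronization"


@dataclass(frozen=True)
class Write:
    """A write targeting the second storage system's copy of the volume (hypothetical model)."""
    kind: Kind
    origin: str    # assumed: "first" or "second", the storage system the data is associated with
    region: int    # assumed: integer identifier of the region in the volume
    data: bytes


def write_order(received: Write, outstanding: Optional[Write], leader: str = "first") -> List[Write]:
    """Return the writes in the order they would be applied; the last entry wins."""
    if outstanding is None or outstanding.region != received.region:
        # Claim 8: no conflicting outstanding write, so just apply the received data.
        return [received]
    if received.kind is outstanding.kind:
        # Same type: data associated with the predetermined system (the leader) goes last.
        return sorted([received, outstanding], key=lambda w: w.origin == leader)
    # Different types: resynchronization data first, retransmission data last.
    return sorted([received, outstanding], key=lambda w: w.kind is Kind.RETRANSMISSION)
```

Because `sorted` is stable and `False` sorts before `True`, each key pushes the write that must win to the end of the list, so the final contents of the region are those of the last element.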

Assignees

  • IBM

Inventors

Classifications

  • Solving problems relating to consistency · CPC title

  • Management of the data involved in backup or backup restore · CPC title

  • Management of the backup or restore process · CPC title

  • Accessing, addressing or allocating within memory systems or architectures (digital input from, or digital output to record carriers, e.g. to disk storage units, G06F3/06) · CPC title

  • G06F9/5072 (Primary)

    Grid computing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10452502B2 cover?
A multi-node data storage system comprises a first data storage system having an owner node and a backup node in a first location coupled to a second data storage system having an owner node and a backup node in a second location. Each storage system includes a copy of the same data volume. A failure of a node of the multi-node storage system is detected. An outstanding write request to the fir…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F9/5072. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 22, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).