Third vote consensus in a cluster using shared storage devices

US10664366B2 · US · B2

Patent metadata
Publication number: US-10664366-B2
Application number: US-201715813941-A
Country: US
Kind code: B2
Filing date: Nov 15, 2017
Priority date: Oct 27, 2015
Publication date: May 26, 2020
Grant date: May 26, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A third vote consensus technique enables a first node, i.e., a surviving node, of a two-node cluster to establish a quorum and continue to operate in response to failure of a second node of the cluster. Each node maintains configuration information organized as a cluster database (CDB) which may be changed according to a consensus-based protocol. Changes to the CDB are logged on a third copy file system (TCFS) stored on a local copy of TCFS (L-TCFS). A shared copy of the TCFS (i.e., S-TCFS) may be stored on shared storage devices of one or more storage arrays coupled to the nodes. The local copy of the TCFS (i.e., L-TCFS) represents a quorum vote for each node of the cluster, while the S-TCFS represents an additional “tie-breaker” vote of a consensus-based protocol. The additional vote may be obtained from the shared storage devices by the surviving node as a third vote to establish the quorum and enable the surviving node to cast two of three votes (i.e., a majority of votes) needed to continue operation of the cluster. That is, the majority of votes allows the surviving node to update the CDB with the configuration information changes so as to continue proper operation of the cluster.
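
The vote arithmetic described in the abstract can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the patented implementation: the class `Cluster`, its methods, and the node names `node_a` and `node_b` are hypothetical, and fencing is reduced to checking whether the current owner of the shared vote is still alive.

```python
from dataclasses import dataclass, field

TOTAL_VOTES = 3  # two L-TCFS votes (one per node) plus one S-TCFS tie-breaker


@dataclass
class Cluster:
    """Toy model of a two-node cluster with a shared tie-breaker vote."""
    alive: dict = field(default_factory=lambda: {"node_a": True, "node_b": True})
    shared_vote_owner: str = "node_a"  # hypothetical initial owner of the S-TCFS vote

    def fail(self, node: str) -> None:
        """Mark a node as failed."""
        self.alive[node] = False

    def claim_shared_vote(self, node: str) -> bool:
        """A live node claims the shared vote only if the current owner has failed."""
        if not self.alive[node]:
            return False
        if self.shared_vote_owner != node and self.alive[self.shared_vote_owner]:
            return False  # owner is still alive, so the vote is not available
        self.shared_vote_owner = node
        return True

    def votes_for(self, node: str) -> int:
        """Votes a node can cast alone: its local vote plus the shared vote if owned."""
        if not self.alive[node]:
            return 0
        return 1 + (1 if self.shared_vote_owner == node else 0)

    def has_quorum(self, node: str) -> bool:
        """True if the node alone holds a strict majority of the three votes."""
        return self.votes_for(node) > TOTAL_VOTES // 2


cluster = Cluster()
cluster.fail("node_a")               # the owner node fails
cluster.claim_shared_vote("node_b")  # the survivor claims the tie-breaker
print(cluster.votes_for("node_b"), cluster.has_quorum("node_b"))  # prints "2 True"
```

In this model the surviving node holds two of three votes after claiming the shared vote, which is the majority the abstract says is needed to keep updating the CDB.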

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a cluster having a plurality of nodes, each node having a processor; a storage array having one or more shared storage devices coupled to each node of the cluster; a local storage device coupled to each node of the cluster; and a storage I/O stack executing on the processor of each node of the cluster, the storage I/O stack configured to: maintain a local copy of configuration information of the cluster on the local storage device, the local copy of the configuration information representing a quorum vote for each node of the cluster; obtain ownership of a shared copy of the configuration information stored on the storage array by an owner node of the cluster, the shared copy of the configuration information representing an additional vote used to establish a quorum for the cluster; and in response to the owner node failing, claim the additional vote represented by the shared copy of the configuration information at a surviving node of the cluster by fencing the failed node from the shared storage devices of the storage array.

2. The system of claim 1 wherein the additional vote is a tie-breaker vote that enables the surviving node to cast a majority of votes needed to continue operation of the cluster.

3. The system of claim 2 wherein the surviving node casts the majority of the votes to update a configuration database of the cluster with changes to the configuration information.

4. The system of claim 3 wherein the configuration information is embodied as a sequence of configuration updates resulting in one or more update events using a consensus-based protocol.

5. The system of claim 4 wherein the update events are organized as a cluster-wide consensus log representing an order of the update events as they occur and commit.

6. The system of claim 2 wherein the additional vote is a tie-breaker vote that is claimed by the surviving node through a predetermined operation that prevents the failed node from attempting to claim the shared copy of the configuration information stored on the storage array.

7. The system of claim 6 wherein the predetermined operation is a get-out-the-vote operation driven by a consensus-based protocol.

8. The system of claim 1 wherein fencing of the failed node is implemented using disk reservations asserted on the shared storage devices by the surviving node.

9. The system of claim 8 wherein the disk reservations are asserted in a predetermined order to prevent a situation where each node fences a subset of the shared storage devices, resulting in deadlock in the cluster.

10. The system of claim 1 wherein the plurality of nodes is organized as a two-node cluster.

11. The system of claim 1 wherein the plurality of nodes is organized as a four-node cluster configured as a four-way high availability group.

12. The system of claim 1 wherein the plurality of nodes is organized as a four-node cluster configured as two high availability pairs of nodes.

13. A method comprising: organizing a plurality of nodes as a cluster, each node of the cluster coupled to a storage array having one or more shared storage devices, each node further coupled to a local storage device; maintaining a local copy of configuration information of the cluster on the local storage device, the local copy of the configuration information representing a quorum vote for each node of the cluster; obtaining ownership of a shared copy of the configuration information stored on the storage array by an owner node of the cluster, the shared copy of the configuration information representing an additional vote used to establish a quorum for the cluster; and in response to the owner node failing, claiming the additional vote represented by the shared copy of the configuration information at a surviving node of the cluster by fencing the failed node from the shared storage devices of the storage array.

14. The method of claim 13 wherein the additional vote is a tie-breaker vote that enables the surviving node to cast a majority of votes needed to continue operation of the cluster.

15. The method of claim 14 wherein the surviving node casts the majority of the votes to update a configuration database of the cluster with changes to the configuration information.

16. The method of claim 15 further comprising embodying the configuration information as a sequence of configuration updates resulting in one or more update events using a consensus-based protocol.

17. The method of claim 16 further comprising organizing the update events as a cluster-wide consensus log representing an order of the update events as they occur and commit.

18. The method of claim 13 further comprising using disk reservations asserted on the shared storage devices by the surviving node to implement the fencing of the failed node.

19. The method of claim 18 wherein using the disk reservations comprises asserting the disk reservations in a predetermined order to prevent a situation where each node fences a subset of the shared storage devices, resulting in deadlock in the cluster.

20. A non-transitory computer readable medium including program instructions for execution on a processor of a storage system, the program instructions configured to: organize a plurality of nodes as a cluster, each node of the cluster coupled to a storage array having one or more shared storage devices, each node further coupled to a local storage device; maintain a local copy of configuration information of the cluster on the local storage device, the local copy of the configuration information representing a quorum vote for each node of the cluster; obtain ownership of a shared copy of the configuration information stored on the storage array by an owner node of the cluster, the shared copy of the configuration information representing an additional vote used to establish a quorum for the cluster; and in response to the owner node failing, claim the additional vote represented by the shared copy of the configuration information at a surviving node of the cluster by fencing the failed node from the shared storage devices of the storage array.
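
Claims 9 and 19 recite asserting disk reservations in a predetermined order so that two nodes cannot each end up fencing only a subset of the shared devices. The sketch below illustrates that idea under stated assumptions: the function `fence_in_order`, the device names, and the in-memory `reservations` dictionary are hypothetical stand-ins, not real storage commands and not the patent's implementation.

```python
def fence_in_order(node, devices, reservations):
    """Try to reserve every shared device for `node`, always in sorted order.

    `reservations` maps device id -> holder and stands in for real disk
    reservations (an assumption for illustration). Because every node
    starts with the same first device, a node that loses the race for
    that device backs off before holding anything, so no two nodes can
    each hold a partial subset.
    """
    for dev in sorted(devices):
        holder = reservations.get(dev)
        if holder is not None and holder != node:
            return False  # another node reserved this device first; back off
        reservations[dev] = node
    return True


shared_devices = ["sd2", "sd0", "sd1"]  # hypothetical device ids
reservations = {}

# The surviving node fences first and obtains every device.
print(fence_in_order("node_b", shared_devices, reservations))  # prints "True"
# A late attempt by the other node fails at the first device and holds nothing.
print(fence_in_order("node_a", shared_devices, reservations))  # prints "False"
```

The design choice here is the fixed ordering: without it, one node could reserve `sd0` while the other reserves `sd2`, leaving neither with the full set.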

Assignees

NetApp Inc

Inventors

Classifications

  • switching over of hardware resources

  • by reconfiguration of node membership

  • where the redundant components share persistent storage (G06F11/2043 takes precedence)

  • Real-time

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10664366B2 cover?
A third vote consensus technique enables a first node, i.e., a surviving node, of a two-node cluster to establish a quorum and continue to operate in response to failure of a second node of the cluster. Each node maintains configuration information organized as a cluster database (CDB) which may be changed according to a consensus-based protocol. Changes to the CDB are logged on a third copy fi…
Who is the assignee on this patent?
NetApp Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/2033. Mapped technology areas include Physics.
When was this patent published?
Publication date May 26, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).