Techniques for maintaining device coordination in a storage cluster system
US-2017123945-A1 · May 4, 2017 · US
US10657013B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10657013-B2 |
| Application number | US-201715832608-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 5, 2017 |
| Priority date | Nov 10, 2015 |
| Publication date | May 19, 2020 |
| Grant date | May 19, 2020 |
Official abstract text for this publication.
Provided are a computer program product, computer system, and method for smart selection of a storage module to be excluded when a connection between two storage modules is broken. An indication is received from a first storage module that a connection between the first storage module and a second storage module is broken. In response to determining that the second storage module is accessible, values of exclusion criteria for the first storage module are determined and summed to identify a first exclusion total. Then, values of exclusion criteria for the second storage module are determined and summed to identify a second exclusion total. In response to determining that the first exclusion total exceeds the second exclusion total, the second storage node is excluded from the cluster. In response to determining that the second exclusion total exceeds the first exclusion total, the first storage node is excluded from the cluster.
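The decision flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `StorageModule`, `exclusion_total`, `choose_module_to_exclude`, the criterion keys, and the optional per-criterion `weights` mapping are all hypothetical. The sketch assumes that a higher exclusion total marks the module worth keeping, as the abstract's comparison implies. The tie-break and the non-operational case follow claim 1 as quoted on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageModule:
    """Hypothetical stand-in for a storage module in the cluster."""
    name: str
    operational: bool = True
    # Hypothetical exclusion criteria: criterion name -> value.
    criteria: Dict[str, float] = field(default_factory=dict)


def exclusion_total(module: StorageModule,
                    weights: Optional[Dict[str, float]] = None) -> float:
    """Sum the module's criterion values; a missing weight defaults to 1.0."""
    weights = weights or {}
    return sum(value * weights.get(name, 1.0)
               for name, value in module.criteria.items())


def choose_module_to_exclude(first: StorageModule,
                             second: StorageModule,
                             weights: Optional[Dict[str, float]] = None
                             ) -> StorageModule:
    """Pick which module to exclude after `first` reports a broken link to `second`."""
    # If the second module is not operational, it is the one excluded.
    if not second.operational:
        return second
    first_total = exclusion_total(first, weights)
    second_total = exclusion_total(second, weights)
    # The module with the lower total is excluded.
    if first_total > second_total:
        return second
    # Covers second_total > first_total and the tie case (claim 1: exclude first).
    return first


if __name__ == "__main__":
    m1 = StorageModule("m1", criteria={"x": 1, "y": 4})
    m2 = StorageModule("m2", criteria={"x": 3, "y": 1})
    print(choose_module_to_exclude(m1, m2).name)
```

In this sketch the caller would still be responsible for replicating the excluded module's data to a third module, which the claims describe but the code does not model.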
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer program product, the non-transitory computer program product comprising a computer readable storage medium having computer readable program code embodied therein that executes to perform operations, the operations comprising: receiving an indication from a first storage module in a cluster of an intra-module connection failure, wherein the intra-module connection failure indicates that a connection between the first storage module and a second storage module in the cluster is broken; in response to receiving the indication of the intra-module connection failure, determining whether to exclude one of the first storage module and the second storage module by: determining whether the second storage module is operational; in response to determining that the second storage module is operational, determining values of first exclusion criteria for the first storage module, wherein the first exclusion criteria are based on storage data plane related exclusion criteria and an impact of the first storage module on reliability and availability of the cluster; summing up the values of the first exclusion criteria to identify a first exclusion total; determining values of second exclusion criteria for the second storage module, wherein the second exclusion criteria are based on the storage data plane related exclusion criteria and an impact of the second storage module on the reliability and the availability of the cluster; summing up the values of the second exclusion criteria to identify a second exclusion total; in response to determining that the first exclusion total exceeds the second exclusion total, excluding the second storage module from the cluster, wherein data of the second storage module is replicated to a third storage module; in response to determining that the second exclusion total exceeds the first exclusion total, excluding the first storage module from the cluster, wherein data of the first storage module is replicated to the third storage module; and in response to determining that the first exclusion total and the second exclusion total are equal, excluding the first storage module from the cluster, wherein the data of the first storage module is replicated to the third storage module; and in response to determining that the second storage module is not operational, excluding the second storage module from the cluster, wherein the data of the second storage module is replicated to the third storage module.
2. The non-transitory computer program product of claim 1, wherein the first storage module and the second storage module are adjacent in a mesh cluster of storage modules.
3. The non-transitory computer program product of claim 1, wherein weights are associated with each of the first exclusion criteria and the second exclusion criteria.
4. The non-transitory computer program product of claim 1, wherein weights are associated with the first storage module and the second storage module.
5. A computer system, comprising: a cluster of storage modules, wherein each of the storage modules includes a processor and a computer readable storage medium having program code; and wherein the program code, when executed on at least one of the storage modules in the cluster, performs operations, the operations comprising: receiving an indication from a first storage module in a cluster of an intra-module connection failure, wherein the intra-module connection failure indicates that a connection between the first storage module and a second storage module in the cluster is broken; in response to receiving the indication of the intra-module connection failure, determining whether to exclude one of the first storage module and the second storage module by: determining whether the second storage module is operational; in response to determining that the second storage module is operational, determining values of first exclusion criteria for the first storage module, wherein the first exclusion criteria are based on storage data plane related exclusion criteria and an impact of the first storage module on reliability and availability of the cluster; summing up the values of the first exclusion criteria to identify a first exclusion total; determining values of second exclusion criteria for the second storage module, wherein the second exclusion criteria are based on the storage data plane related exclusion criteria and an impact of the second storage module on the reliability and the availability of the cluster; summing up the values of the second exclusion criteria to identify a second exclusion total; in response to determining that the first exclusion total exceeds the second exclusion total, excluding the second storage module from the cluster, wherein data of the second storage module is replicated to a third storage module; in response to determining that the second exclusion total exceeds the first exclusion total, excluding the first storage module from the cluster, wherein data of the first storage module is replicated to the third storage module; and in response to determining that the first exclusion total and the second exclusion total are equal, excluding the first storage module from the cluster, wherein the data of the first storage module is replicated to the third storage module; and in response to determining that the second storage module is not operational, excluding the second storage module from the cluster, wherein the data of the second storage module is replicated to the third storage module.
6. The computer system of claim 5, wherein the first storage module and the second storage module are adjacent in a mesh cluster of storage modules.
7. The computer system of claim 5, wherein weights are associated with each of the first exclusion criteria and the second exclusion criteria.
8. The computer system of claim 5, wherein weights are associated with the first storage module and the second storage module.
9. A method, comprising: receiving an indication from a first storage module in a cluster of an intra-module connection failure, wherein the intra-module connection failure indicates that a connection between the first storage module and a second storage module in the cluster is broken; in response to receiving the indication of the intra-module connection failure, determining whether to exclude one of the first storage module and the second storage module by: determining whether the second storage module is operational; in response to determining that the second storage module is operational, determining values of first exclusion criteria for the first storage module, wherein the first exclusion criteria are based on storage data plane related exclusion criteria and an impact of the first storage module on reliability and availability of the cluster; summing up the values of the first exclusion criteria to identify a first exclusion total; determining values of second exclusion criteria for the second storage module, wherein the second exclusion criteria are based on the storage data plane related exclusion criteria and an impact of the second storage module on the reliability and the availability of the cluster; summing up the values of the second exclusion criteria to identify a second exclusion total; in response to determining that the first exclusion total exceeds the second exclusion total, excluding the second storage module from the cluster, wherein data of the second storage module is replicated to a third storage module; in response to determining that the second exclusion total exceeds the first exclusion total, excluding the first storage module from the cluster, wherein data of the first storage module is replicate
Active fault masking without idle spares · CPC title
by reconfiguration of node membership · CPC title
Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title
Real-time · CPC title