Cluster management in large-scale storage systems

US12099719B2 · US · B2

Patent metadata
Publication number: US-12099719-B2
Application number: US-202218090792-A
Country: US
Kind code: B2
Filing date: Dec 29, 2022
Priority date: Dec 29, 2022
Publication date: Sep 24, 2024
Grant date: Sep 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for implementing a distributed hierarchical cluster management system. A system comprises a data storage system and a cluster management system. The data storage system comprises a cluster of storage nodes that is partitioned into a plurality of subclusters of storage nodes. The cluster management system is deployed on at least some of the storage nodes of the data storage system, and comprises a global management system and a plurality of local management subsystems. Each local management subsystem is configured to manage a respective subcluster of the plurality of subclusters of storage nodes, and communicate with the global management system to provide subcluster status information to the global management system regarding a current state and configuration of the respective subcluster of storage nodes. The global management system is configured to manage the cluster of storage nodes using the subcluster status information provided by the local management subsystems.
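The two-level arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`LocalManager`, `GlobalManager`, `SubclusterStatus`), the status fields, and the node identifiers are all hypothetical, chosen only to show local subsystems managing their own subclusters and a global system working from the status they report.

```python
from dataclasses import dataclass, field


@dataclass
class SubclusterStatus:
    """Status a local manager reports upward (fields are illustrative assumptions)."""
    subcluster_id: str
    node_count: int
    healthy: bool


@dataclass
class LocalManager:
    """Manages one subcluster without consulting other local managers."""
    subcluster_id: str
    nodes: set[str] = field(default_factory=set)
    failed_nodes: set[str] = field(default_factory=set)

    def add_node(self, node: str) -> None:
        self.nodes.add(node)

    def mark_failed(self, node: str) -> None:
        # Only nodes that belong to this subcluster can be marked as failed.
        if node in self.nodes:
            self.failed_nodes.add(node)

    def report(self) -> SubclusterStatus:
        # Summarize current state and configuration for the global manager.
        return SubclusterStatus(
            subcluster_id=self.subcluster_id,
            node_count=len(self.nodes),
            healthy=not self.failed_nodes,
        )


@dataclass
class GlobalManager:
    """Builds its cluster-wide view only from subcluster status reports."""
    statuses: dict[str, SubclusterStatus] = field(default_factory=dict)

    def receive(self, status: SubclusterStatus) -> None:
        # The latest report from a subcluster replaces the previous one.
        self.statuses[status.subcluster_id] = status

    def total_nodes(self) -> int:
        return sum(s.node_count for s in self.statuses.values())

    def unhealthy_subclusters(self) -> list[str]:
        return sorted(s.subcluster_id for s in self.statuses.values() if not s.healthy)
```

In this sketch the global manager never inspects individual nodes; it sees only what each local manager chooses to report, which is one plausible reading of how the hierarchy keeps cluster-wide management lightweight.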

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a data storage system comprising a cluster of storage nodes that is partitioned into a plurality of subclusters of storage nodes; and a cluster management system deployed on at least some of the storage nodes of the data storage system, wherein the cluster management system comprises a global management system and a plurality of local management subsystems; wherein each of the local management subsystems is configured to manage a respective subcluster of storage nodes of the plurality of subclusters of storage nodes independent of other ones of the local management subsystems, and is further configured to communicate with the global management system to provide subcluster status information to the global management system regarding a current state and configuration of the respective subcluster of storage nodes; and wherein the global management system is configured to manage the cluster of storage nodes using the subcluster status information provided by the local management subsystems.

2. The system of claim 1, wherein each subcluster of storage nodes of the plurality of subclusters of storage nodes comprises a logical protection domain.

3. The system of claim 1, wherein the global management system is configured to manage and communicate with client drivers of an application layer to provide connectivity information to the client drivers which enables the client drivers to connect to storage nodes of the cluster storage nodes and to access storage volumes of a virtual storage layer comprising aggregated storage capacity of the cluster of storage nodes.

4. The system of claim 1, wherein configuring each local management subsystem to manage a respective subcluster of storage nodes of the plurality of subclusters of storage nodes comprises configuring each local management subsystem to manage operations associated with the respective subcluster of storage nodes, wherein the operations include at least one of: managing a rebuild operation within the respective subcluster of storage nodes; managing a rebalance operation within the respective subcluster of storage nodes; adding a new storage node to the respective subcluster of storage nodes; removing an existing storage node from the respective subcluster of storage nodes; and performing a migration operation within the respective subcluster of storage nodes.

5. The system of claim 1, wherein: the data storage system comprises a scale-out software-defined storage system comprising software components that are deployed on each storage node of the cluster of storage nodes to implement a storage control system on each storage node; and each local management subsystem is configured to manage and communicate with the storage control systems of the storage nodes within the respective subclusters of storage nodes.

6. The system of claim 1, wherein: the global management system comprises a cluster of global metadata manager (MDM) nodes deployed on different storage nodes of the data storage system, wherein the cluster of global MDM nodes comprises a global primary MDM node and at least one global secondary MDM node; and the local management subsystem for a given subcluster of storage nodes comprises a cluster of local MDM nodes deployed on different storage nodes within the given subcluster of storage nodes, wherein the cluster of local MDM nodes comprises a local primary MDM node and at least one local secondary MDM node; wherein the local primary MDM node of the local management subsystem communicates with the global primary MDM node of the global management system.

7. The system of claim 6, wherein the cluster management system is configured to cause a local secondary MDM node of a given cluster of local MDM nodes to assume a role of the global primary MDM node of the cluster of global MDM nodes.

8. The system of claim 6, wherein the cluster management system is configured to automatically reallocate an MDM role of one or more MDM nodes of one or more of the cluster of global MDM nodes and a given cluster of local MDM nodes, in response to detection, by the cluster management system, of an event that warrants the MDM role reallocation.

9. The system of claim 8, wherein the event comprises at least one of: detection of a loss of one or more network paths to one or more storage nodes; detection of congestion on one or more network paths; and reduced availability of storage or compute resources on one or more storage nodes on which the MDM nodes are deployed.

10. A method, comprising: partitioning a cluster of storage nodes of a data storage system into a plurality of subclusters of storage nodes; deploying a cluster management system on at least some of the storage nodes of the data storage system, wherein the cluster management system comprises a global management system and a plurality of local management subsystems; configuring each of the local management subsystems to manage a respective subcluster of storage nodes of the plurality of subclusters of storage nodes independent of other ones of the local management subsystems, and to communicate with the global management system to provide subcluster status information to the global management system regarding a current state and configuration of the respective subcluster of storage nodes; and configuring the global management system to manage the cluster of storage nodes using the subcluster status information provided by the local management subsystems.

11. The method of claim 10, wherein partitioning the cluster of storage nodes of the data storage system into the plurality of subclusters of storage nodes comprises logically partitioning the cluster of storage nodes into a plurality of protection domains.

12. The method of claim 10, further comprising configuring the global management system to manage and communicate with client drivers of an application layer to provide connectivity information to the client drivers which enables the client drivers to connect to storage nodes of the cluster of storage nodes and to access storage volumes of a virtual storage layer comprising aggregated storage capacity of the cluster of storage nodes.

13. The method of claim 10, wherein configuring each local management subsystem to manage a respective subcluster of storage nodes of the plurality of subclusters of storage nodes comprises configuring each local management subsystem to manage operations associated with the respective subcluster of storage nodes, wherein the operations include at least one of: managing a rebuild operation within the respective subcluster of storage nodes; managing a rebalance operation within the respective subcluster of storage nodes; adding a new storage node to the respective subcluster of storage nodes; removing an existing storage node from the respective subcluster of storage nodes; and performing a migration operation within the respective subcluster of storage nodes.

14. The method of claim 10, wherein deploying the cluster management system on at least some of the storage nodes of the data storage system comprises: deploying a cluster of global metadata manager (MDM) nodes on different storage nodes of the data storage system to implement the global management system, wherein the cluster of global MDM nodes comprises a global primary MDM node and at least one global secondary MDM node; and deploying a cluster of local MDM nodes on different storage nodes within a given subcluster of storage nodes to implement a given local management subsystem for the given subcluster of storage nodes, wherein the cluster of local MDM nodes comprises a local primary MDM node and
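The primary/secondary MDM arrangement and the event-driven role reallocation recited in the claims can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `MdmCluster`, `MdmNode`, `Role`, and `Event` are hypothetical, the rule of exactly one primary per cluster is an assumption, and promoting the first unimpaired secondary is a placeholder policy, since the claim text shown here does not specify how a replacement is chosen.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class Event(Enum):
    # The three event types listed in the claims.
    PATH_LOSS = "loss of one or more network paths"
    CONGESTION = "congestion on one or more network paths"
    LOW_RESOURCES = "reduced availability of storage or compute resources"


@dataclass
class MdmNode:
    name: str
    role: Role
    impaired: bool = False


class MdmCluster:
    """One MDM cluster (global or local) with a single primary (an assumption)."""

    def __init__(self, nodes: list[MdmNode]) -> None:
        if sum(n.role is Role.PRIMARY for n in nodes) != 1:
            raise ValueError("this sketch expects exactly one primary MDM node")
        self.nodes = nodes
        self.history: list[tuple[str, Event]] = []

    @property
    def primary(self) -> MdmNode:
        return next(n for n in self.nodes if n.role is Role.PRIMARY)

    def handle_event(self, node_name: str, event: Event) -> MdmNode:
        """Record the event; if it hits the primary, promote an unimpaired secondary."""
        node = next(n for n in self.nodes if n.name == node_name)
        node.impaired = True
        self.history.append((node_name, event))
        if node.role is not Role.PRIMARY:
            return self.primary  # the primary is unaffected, so roles stay as they are
        candidates = [
            n for n in self.nodes if n.role is Role.SECONDARY and not n.impaired
        ]
        if not candidates:
            return node  # nothing to promote; keep the current primary
        # Placeholder policy (assumption): promote the first unimpaired secondary.
        node.role = Role.SECONDARY
        candidates[0].role = Role.PRIMARY
        return candidates[0]
```

The sketch covers a single MDM cluster. It does not model promotion across clusters, such as a local secondary MDM node assuming the global primary role.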

Assignees

Dell Products L.P.

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0631 (Primary): by allocating resources to storage systems · CPC title

  • G06F3/0604 (Primary): Improving or facilitating administration, e.g. storage management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12099719B2 cover?
Techniques are provided for implementing a distributed hierarchical cluster management system. A system comprises a data storage system and a cluster management system. The data storage system comprises a cluster of storage nodes that is partitioned into a plurality of subclusters of storage nodes. The cluster management system is deployed on at least some of the storage nodes of the data stora…
Who is the assignee on this patent?
Dell Products L.P.
What technology area does this patent fall under?
Primary CPC classification G06F3/0631. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).