Scale out capacity load-balancing for backup appliances

US10754696B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10754696-B1
Application number  US-201715655792-A
Country             US
Kind code           B1
Filing date         Jul 20, 2017
Priority date       Jul 20, 2017
Publication date    Aug 25, 2020
Grant date          Aug 25, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed to a load balancer process for use in a deduplication backup process implemented in a cluster system that provides ideal placement of the Mtrees on the expanded capacity by monitoring the available capacity and providing recommendations on the best node to place newly created Mtrees. Continuous monitoring of the capacity and activity level of the nodes helps identify the appropriate node to place a new Mtree. The monitoring of existing node in the cluster and balancing capacity by recommending migration of files from heavily-utilized nodes to under-utilized nodes produces an overall increase in cluster performance.
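
The placement recommendation described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented method: the `NodeStats` fields, the `recommend_node` function, and the equal weighting of free capacity and idle CPU are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical snapshot of one node's monitored state."""
    name: str
    capacity_total_gb: float
    capacity_used_gb: float
    cpu_utilization: float  # fraction between 0.0 and 1.0


def recommend_node(nodes, cpu_weight=0.5):
    """Return the node with the best blend of free capacity and low activity.

    The scoring formula is an assumption for illustration; the abstract only
    says that capacity and activity level are monitored.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    def score(node):
        free_fraction = 1.0 - node.capacity_used_gb / node.capacity_total_gb
        idle_fraction = 1.0 - node.cpu_utilization
        return (1.0 - cpu_weight) * free_fraction + cpu_weight * idle_fraction

    return max(nodes, key=score)


cluster = [
    NodeStats("node-a", 100.0, 90.0, 0.2),  # nearly full, mostly idle
    NodeStats("node-b", 100.0, 40.0, 0.5),  # roomy, moderately busy
    NodeStats("node-c", 100.0, 50.0, 0.9),  # half full, very busy
]
print(recommend_node(cluster).name)
```

With equal weights, the sketch prefers the node that is neither nearly full nor heavily loaded. A real system would presumably also account for stream counts and deduplication locality, which the claims address separately.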

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of balancing nodes comprising virtual machines (VMs) in a cluster system executing a deduplication backup process, comprising: presenting protocol-specific namespaces to clients for accessing a logical file system layer for the nodes; spreading an Mtree namespace among the nodes, wherein an Mtree stores files and directories for each protocol-specific namespace; balancing processor (CPU) cycles among the nodes by migrating data of the files and directories from a first node to a second node when a defined processor threshold of the first node is met or exceeded; balancing storage capacity of the nodes by migrating the data from the first node to the second node when a defined storage threshold of the first node is met or exceeded; and balancing streams processed in the system by migrating one or more streams processed by the first node to the second node when the number of streams is at a defined stream limit, wherein the stream number comprises a number of concurrently open files at a same time.

2. The method of claim 1 further comprising: monitoring each node to determine CPU, capacity, and stream usage statistics on a periodic basis; and compiling the usage statistics for storage in a single database on the node.

3. The method of claim 2 further comprising: aggregating the single databases for each node into an aggregated cluster database; querying the cluster database to determine if any node of the cluster has met or exceeded at least one of: the defined processor threshold, the defined storage threshold, and the defined stream limit.

4. The method of claim 3 further comprising: sending a first workflow command from the load balancer through a system manager to the second node to initiate migrating the data or migrating the one or the one or more streams based on the querying; and sending a second workflow command to a cluster inventory manager through the system manager to increase a capacity of the second node through a scale-up process, or spawn a new node as the second node through a scale-out process.

5. The method of claim 1 further comprising: selecting data to evict from the first node in the event of exceeding a defined threshold or number of streams; selecting a set of candidate nodes including the second node by identifying nodes that have sufficient capacity to store the evicted data; and selecting the second node from the set of candidate nodes through an intersection process that compares the evicted data to an existing dataset in the second node and identifying which candidate node contains an existing dataset that most closely matches the evicted data to maintain deduplication of the evicted data.

6. The method of claim 5 further comprising selecting the second node at least in part in consideration of user actions comprising resource consumption, capacity, and performance parameters, and policies comprising resource consumption policies including new node allocations and node expansion, performance policies including capacity, CPU usage and deduplication, and provisioning policies.

7. The method of claim 5 wherein the thresholds and number of streams are set upon system configuration and dynamic during runtime based on usage.

8. The method of claim 1 further comprising balancing a network interface associated with the first node when the interface exceeds a specified line rate, by moving one or more network addresses associated with the interface to another interface in the first node or to an interface in the second node.

9. The method of claim 8 further comprising preserving a data locality of the data by aligning addresses with the data location.

10. The method of claim 1 wherein the deduplication backup process executed on a deduplication backup server running a Data Domain file system (DDFS).

11. A computer-implemented method of balancing nodes comprising virtual machines (VMs) in a cluster system executing a deduplication backup process, comprising: presenting protocol-specific namespaces to clients for accessing a logical file system layer for the nodes; spreading an Mtree namespace among the nodes, wherein an Mtree stores files and directories for each protocol-specific namespace; sampling, on a periodic basis and on each node, usage data comprising a respective CPU cycle use, storage capacity, and stream number; storing the usage data in a local database on each node; collecting the data in the local database on each node for aggregation into a single database maintained on a cluster manager; and querying, by a load balancer the single database to determine whether or not to initiate a file migration of the files and directories from a node that exhibits overuse based on defined storage and CPU thresholds, wherein the stream number comprises a number of concurrently open files at a same time.

12. The method of claim 11 wherein the load balancer is configured to: balance the CPU cycles among the node by migrating data from a first node to a second node when a defined processor threshold of the first node is met or exceeded; balance the storage capacity of the node by migrating the data from the first node to the second node when a defined storage threshold of the first node is met or exceeded; and balance streams processed in the system by migrating one or more streams processed by the first node to the second node when the number of streams is at a defined stream limit, wherein the stream number comprises a number of concurrently open files.

13. The method of claim 12 further comprising: selecting data to evict from the first node in the event of exceeding a defined threshold or number of streams; selecting a set of candidate nodes including the second node by identifying nodes that have sufficient capacity to store the evicted data; and selecting the second node from the set of candidate nodes through an intersection process that compares the evicted data to an existing dataset in the second node and identifying which candidate node contains an existing dataset that most closely matches the evicted data to maintain deduplication of the evicted data.

14. The method of claim 13 further comprising selecting the second node at least in part in consideration of user actions comprising resource consumption, capacity, and performance parameters, and policies comprising resource consumption policies including new node allocations and node expansion, performance policies including capacity, CPU usage and deduplication, and provisioning policies, wherein the thresholds and number of streams are set upon system configuration and dynamic during runtime based on usage.

15. The method of claim 11 further comprising: balancing a network interface associated with the first node when the interface exceeds a specified line rate, by moving one or more network addresses associated with the interface to another interface in the first node or to an interface in the second node; and preserving a data locality of the data by aligning addresses with the data location.

16. The method of claim 12 wherein the load balancer is further configured to: select data to evict from the first node in the event of exceeding a defined threshold or number of streams; select a set of candidate nodes including the second node by identifying nodes that have sufficient capacity to store the evicted data; and select the second node from the set of candidate nodes through an intersection process that compares the evicted data to an existing dataset in the second node and identifying which candidate node co
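
To make the claimed steps easier to follow, here is a minimal sketch of two of them: checking a node against the processor, storage, and stream limits (claims 1 and 3), and choosing a migration target through the intersection process (claims 5 and 13). Everything concrete in it is an assumption for illustration: the `Node` class, the threshold values, and the modeling of a dataset as a set of segment fingerprints do not come from the patent text.

```python
from dataclasses import dataclass, field

# Hypothetical limits; claim 7 says thresholds are set at configuration time
# and adjusted at runtime, without giving values.
CPU_THRESHOLD_PCT = 80.0
STORAGE_THRESHOLD_PCT = 85.0
STREAM_LIMIT = 100


@dataclass
class Node:
    """Hypothetical per-node usage record, as might be read from the cluster database."""
    name: str
    cpu_pct: float
    used_gb: float
    total_gb: float
    streams: int  # number of concurrently open files
    fingerprints: set = field(default_factory=set)  # assumed dataset model


def exceeded_limits(node):
    """List which limits the node has met or exceeded."""
    reasons = []
    if node.cpu_pct >= CPU_THRESHOLD_PCT:
        reasons.append("cpu")
    if 100.0 * node.used_gb / node.total_gb >= STORAGE_THRESHOLD_PCT:
        reasons.append("storage")
    if node.streams >= STREAM_LIMIT:
        reasons.append("streams")
    return reasons


def pick_target(evicted_fingerprints, evicted_gb, source, nodes):
    """Pick the candidate whose existing data overlaps most with the evicted data.

    Candidates are nodes other than the source with enough free capacity.
    Returns None when no candidate qualifies, the case where the claims
    describe scaling up a node or spawning a new one.
    """
    candidates = [
        n for n in nodes
        if n is not source and n.total_gb - n.used_gb >= evicted_gb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(evicted_fingerprints & n.fingerprints))


busy = Node("n1", 90.0, 50.0, 100.0, 10, {"a", "b", "c"})
others = [
    Node("n2", 20.0, 30.0, 100.0, 5, {"a"}),
    Node("n3", 30.0, 40.0, 100.0, 5, {"a", "b"}),
    Node("n4", 10.0, 99.0, 100.0, 1, {"a", "b", "c"}),  # too full to qualify
]
if exceeded_limits(busy):
    target = pick_target({"a", "b", "c"}, 10.0, busy, [busy] + others)
    print(target.name if target else "scale up or scale out")
```

Choosing the candidate with the largest overlap means fewer segments need to be stored again on the target, which is how the sketch reflects the claim language about maintaining deduplication of the evicted data.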

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • involving task migration · CPC title

  • Partitioning or combining of resources · CPC title

  • Virtual · CPC title

  • for networked environments · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10754696B1 cover?
Embodiments are directed to a load balancer process for use in a deduplication backup process implemented in a cluster system that provides ideal placement of the Mtrees on the expanded capacity by monitoring the available capacity and providing recommendations on the best node to place newly created Mtrees. Continuous monitoring of the capacity and activity level of the nodes helps identify th…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/1464. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 25, 2020 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).