Load balancing and fault tolerant service in a distributed data system

US9785480B2 · US · B2

Patent metadata
Publication number: US-9785480-B2
Application number: US-201514620591-A
Country: US
Kind code: B2
Filing date: Feb 12, 2015
Priority date: Feb 12, 2015
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

Abstract

Official abstract text for this publication.

Techniques for load balancing and fault tolerant service are described. An apparatus may comprise a load balancing and fault tolerant component operative to execute a load balancing and fault tolerant service in a distributed data system. The load balancing and fault tolerant service distributes a load of a task to a first node in a cluster of nodes using a routing table. The load balancing and fault tolerant service stores information to indicate the first node from the cluster of nodes is assigned to perform the task. The load balancing and fault tolerant service detects a failure condition for the first node. The load balancing and fault tolerant service moves the task to a second node from the cluster of nodes to perform the task for the first node upon occurrence of the failure condition.
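
The sequence in the abstract (distribute a task using a routing table, record which node is assigned, detect a failure, move the task to a second node) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `RoutingTable`, the method names, the node names, and the least-loaded selection rule are all assumptions made for illustration, and the failure report stands in for whatever detection mechanism a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical in-memory routing table mapping each task to its assigned node."""
    nodes: list
    healthy: dict = field(default_factory=dict)
    assignments: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assume every node starts out healthy.
        for node in self.nodes:
            self.healthy.setdefault(node, True)

    def _load(self, node):
        # Load is approximated as the number of tasks a node currently owns.
        return sum(1 for owner in self.assignments.values() if owner == node)

    def distribute(self, task):
        """Assign the task to the least-loaded healthy node and record the assignment."""
        candidates = [n for n in self.nodes if self.healthy[n]]
        if not candidates:
            raise RuntimeError("no healthy node available")
        node = min(candidates, key=self._load)
        self.assignments[task] = node
        return node

    def report_failure(self, failed):
        """Mark a node as failed and move each of its tasks to another healthy node."""
        self.healthy[failed] = False
        moved = {}
        for task, owner in list(self.assignments.items()):
            if owner == failed:
                moved[task] = self.distribute(task)
        return moved


table = RoutingTable(nodes=["node-a", "node-b", "node-c"])
first = table.distribute("replicate-volume")   # assigned to the first node
table.distribute("scrub-volume")               # goes to a less loaded node
moved = table.report_failure(first)            # the task moves to a second node
```

The sketch keeps the assignment record and the routing decision in one structure so that reassignment after a failure is a single lookup and update. A replicated table, as the claims describe, would additionally copy this state to the other nodes.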

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: distributing a load of a task to a first node in a cluster of nodes using a routing table; replicating the routing table to the first node and a second node in the cluster of nodes using a replicated database (RDB) service; storing information to indicate that the first node is assigned to perform the task; detecting a failure condition for the first node; and reassigning the task from the first node to the second node to perform the task based upon occurrence of the failure condition, wherein the first node and the second node are computers capable of executing the task.

2. The method of claim 1, comprising: creating a relationship between the first node and the second node; delegating the relationship from the first node to the second node for load balancing the second node based upon detection of the failure condition of the first node; detecting a return of the first node to an active status; and returning the relationship from the second node to the first node based upon the first node returning to the active status following the failure condition.

3. The method of claim 1, comprising: saving routing information, associated with the routing table, in a structured format in the first node; maintaining a quorum for the cluster of nodes; and providing a notification upon a node joining or leaving the quorum.

4. The method of claim 1, comprising: restarting the task by the second node upon reassignment of the task to the second node; and restoring the load of the task to the first node based upon the first node becoming active after the failure condition.

5. The method of claim 1, comprising joining a new node, capable of receiving the load, to the cluster of nodes, wherein the new node is the second node.

6. The method of claim 1, comprising maintaining in the routing table an ownership list indicating which node in the cluster of nodes is a most recent owner responsible for the task.

7. The method of claim 1, comprising determining a performance state for the first node.

8. The method of claim 1, comprising maintaining user space processes on a plurality of nodes of the cluster of nodes for distributing the load of the task.

9. A computing device, comprising: a memory containing computer-readable storage medium having stored thereon instructions for performing a method; and a processor coupled with the memory, the processor configured to execute the instructions to cause the processor to: distribute a load of a task to a first node in a cluster of nodes using a routing table; replicate the routing table to the first node and a second node using a replicated database (RDB) service; store information to indicate that the first node is assigned to perform the task; detect a failure condition for the first node; and reassign the task from the first node to the second node in the cluster of nodes to perform the task based upon occurrence of the failure condition.

10. The computing device of claim 9, the instructions to cause the processor to: create a relationship between the first node and the second node; delegate the relationship from the first node to the second node for load balancing the second node based upon detection of the failure condition of the first node; detect a return of the first node to an active status; and return the relationship from the second node to the first node based upon the first node returning to the active status following the failure condition.

11. The computing device of claim 9, the instructions to cause the processor to: save routing information, associated with the routing table, in a structured format in the first node; maintain a quorum of the cluster of nodes; and provide a notification upon a node joining or leaving the quorum.

12. The computing device of claim 9, the instructions to cause the processor to: restart the task by the second node upon occurrence of the failure condition; and restore the load of the task to the first node based upon the first node becoming active after the failure condition.

13. The computing device of claim 9, the instructions to cause the processor to join a new node, capable of receiving the load, to the cluster of nodes, wherein the new node is the second node.

14. The computing device of claim 9, the instructions to cause the processor to: maintain in the routing table an ownership list indicating which node in the cluster of nodes is a most recent owner responsible for the task; determine a performance state for the first node; and maintain user space processes on a plurality of nodes of the cluster of nodes for distributing the load of the task.

15. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to: distribute a load of a task to a first node in a cluster of nodes using a routing table; replicate the routing table to the first node and a second node in the cluster of nodes using a replicated database (RDB) service; store information to indicate that the first node is assigned to perform the task; detect a failure condition for the first node; and reassign the task from the first node to the second node to perform the task based upon occurrence of the failure condition, wherein the first node and the second node comprise storage controllers capable of executing the task to perform a storage operation.

16. The computer-readable storage medium of claim 15, comprising further instructions that, when executed by the processor, cause the processor to: create a relationship between the first node and the second node; delegate the relationship from the first node to the second node for load balancing the second node based upon detection of the failure condition of the first node; detect a return of the first node to an active status; and return the relationship from the second node to the first node based upon the first node returning to the active status following the failure condition.

17. The computer-readable storage medium of claim 15, comprising further instructions that, when executed by the processor, cause the processor to: save routing information, associated with the routing table, in a structured format in the first node; maintain a quorum for the cluster of nodes; and provide a notification upon a node joining or leaving the quorum.

18. The computer-readable storage medium of claim 15, comprising further instructions that, when executed by the processor, cause the processor to: restart the task by the second node upon occurrence of the failure condition; and restore the load of the task to the first node based upon the first node becoming active after the failure condition.

19. The computer-readable storage medium of claim 15, comprising further instructions that, when executed by the processor, cause the processor to join a new node, capable of receiving the load, to the cluster of nodes, wherein the new node is the second node.

20. The computer-readable storage medium of claim 15, comprising further instructions that, when executed by the processor, cause the processor to: maintain in the routing table an ownership list indicating which node in the cluster of nodes is a most recent owner responsible for the task; determine a performance state for the first node; and maintain user space processes on a plurality of nodes of the cluster of nodes for distributing the load of the task.
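
The dependent claims add three ideas on top of basic failover: an ownership list that records the most recent owner of a task, notifications when a node joins or leaves the quorum, and returning the task to the first node once it is active again. The Python sketch below shows one way these could fit together. It is an illustration only: the `Cluster` class, its method names, the callback signature, and the simple-majority quorum rule are assumptions, since the claims do not specify how the quorum is computed or how notifications are delivered.

```python
class Cluster:
    """Hypothetical cluster state: membership, quorum, and per-task ownership lists."""

    def __init__(self, nodes, on_membership_change=None):
        self.all_nodes = list(nodes)
        self.active = set(nodes)
        # task -> list of owners in order; the last entry is the most recent owner.
        self.owners = {}
        # Callback invoked with ("joined" | "left", node); a no-op by default.
        self.notify = on_membership_change or (lambda event, node: None)

    def has_quorum(self):
        # Assumption: quorum means a strict majority of configured nodes is active.
        return len(self.active) > len(self.all_nodes) // 2

    def assign(self, task, node):
        if node not in self.active:
            raise ValueError(f"{node} is not active")
        self.owners.setdefault(task, []).append(node)

    def current_owner(self, task):
        return self.owners[task][-1]

    def node_failed(self, node, standby):
        """Remove a node from the quorum and delegate its tasks to a standby node."""
        if standby not in self.active or standby == node:
            raise ValueError(f"{standby} cannot take over for {node}")
        self.active.discard(node)
        self.notify("left", node)
        for history in self.owners.values():
            if history[-1] == node:
                history.append(standby)

    def node_returned(self, node):
        """Re-admit a node and give back tasks it owned just before the failover."""
        self.active.add(node)
        self.notify("joined", node)
        for history in self.owners.values():
            if len(history) >= 2 and history[-2] == node:
                history.append(node)


events = []
cluster = Cluster(["n1", "n2", "n3"],
                  on_membership_change=lambda e, n: events.append((e, n)))
cluster.assign("backup", "n1")
cluster.node_failed("n1", standby="n2")
owner_during_failure = cluster.current_owner("backup")
quorum_during_failure = cluster.has_quorum()
cluster.node_returned("n1")
```

Keeping the full ownership history, rather than only the current owner, is what lets the sketch decide where a task should be restored when the failed node comes back.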

Assignees

Netapp Inc

Inventors

Classifications

  • using migration

  • involving virtual machines

  • by reconfiguration of node membership

  • for load management (allocation of a server based on load conditions G06F9/505; load rebalancing G06F9/5083; redistributing the load in a network by a load balancer H04L67/1029)

  • without idle spare hardware

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/5088. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).