Load balancing and fault tolerant service in a distributed data system

US11080100B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11080100-B2
Application number  US-201916690860-A
Country             US
Kind code           B2
Filing date         Nov 21, 2019
Priority date       Feb 12, 2015
Publication date    Aug 3, 2021
Grant date          Aug 3, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for load balancing and fault tolerant service are described. An apparatus may comprise load balancing and fault tolerant component operative to execute a load balancing and fault tolerant service in a distributed data system. The load balancing and fault tolerant service distributes a load of a task to a first node in a cluster of nodes using a routing table. The load balancing and fault tolerant service stores information to indicate the first node from the cluster of nodes is assigned to perform the task. The load balancing and fault tolerant service detects a failure condition for the first node. The load balancing and fault tolerant service moves the task to a second node from the cluster of nodes to perform the task for the first node upon occurrence of the failure condition.
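
The four steps in the abstract (distribute, record the assignment, detect a failure, move the task) can be illustrated with a minimal sketch. It is not the patented implementation: the `RoutingTable` class, its method names, the node names, and the least-loaded placement policy are all assumptions made for illustration, and failure detection is reduced to an explicit `mark_failed` call.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical routing table: maps each task to the node assigned to it."""
    nodes: list[str]
    healthy: set[str] = field(default_factory=set)
    owner: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: every node in the cluster starts out healthy.
        self.healthy = set(self.nodes)

    def _least_loaded(self) -> str:
        # Assumed policy: pick the healthy node with the fewest tasks;
        # ties go to the node listed first in the cluster.
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy node available")
        load = {n: 0 for n in candidates}
        for node in self.owner.values():
            if node in load:
                load[node] += 1
        return min(candidates, key=lambda n: load[n])

    def distribute(self, task: str) -> str:
        """Assign a task to a node and store the assignment."""
        node = self._least_loaded()
        self.owner[task] = node
        return node

    def mark_failed(self, node: str) -> list[str]:
        """Record a failure condition and move the node's tasks elsewhere."""
        self.healthy.discard(node)
        moved = [t for t, n in self.owner.items() if n == node]
        for task in moved:
            self.owner[task] = self._least_loaded()
        return moved


table = RoutingTable(["node-a", "node-b", "node-c"])
table.distribute("snap-1")
table.distribute("snap-2")
moved = table.mark_failed("node-a")
print(moved, table.owner)
```

In a real cluster the failure signal would come from a health monitor or heartbeat rather than a direct call, and the table itself would need to be replicated so that it survives the failure of the node that holds it.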

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, comprising: distributing a snapshot task to a first node of a cluster; updating a last owning node list within a routing table to indicate that the snapshot task has been distributed to the first node; detecting a failure condition for the first node; and reassigning the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition.

2. The method of claim 1, comprising: restarting a failed user space process as a restarted user space process executing at the second node based upon detecting the failure condition.

3. The method of claim 2, comprising: recreating a state transition table for the restarted user space process based upon information within the routing table.

4. The method of claim 1, comprising: identifying a list of healthy nodes within the cluster.

5. The method of claim 4, comprising: identifying a list of unhealthy nodes within the cluster.

6. The method of claim 5, comprising: redistributing relationships between nodes of the cluster based upon the list of healthy nodes and the list of unhealthy nodes.

7. The method of claim 6, comprising: determining if a new node of a quorum of nodes within the cluster is a most recent node responsible for a relationship.

8. The method of claim 1, comprising: utilizing the last owning node list within the routing table during restoration of the first node to assign the snapshot task back to the first node.

9. The method of claim 1, comprising: utilizing a library within a replicated database to select a new master node when a current master node of the cluster fails.

10. The method of claim 1, comprising: updating the routing table based upon the snapshot task being reassigned.

11. The method of claim 1, comprising: performing a cleanup of a transition state of a user space process that failed based upon the failure condition of the first node.

12. A computing device, comprising: a memory comprising instructions; and a processor coupled with the memory, the processor configured to execute the instructions to cause the processor to: distribute a snapshot task to a first node of a cluster; detect a failure condition for the first node; and reassign the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition, comprising utilizing a last owning node list within a routing table during restoration of the first node to assign the snapshot task back to the first node.

13. The computing device of claim 12, the instructions to cause the processor to: restart a failed user space process as a restarted user space process executing at the second node based upon detecting the failure condition.

14. The computing device of claim 13, the instructions to cause the processor to: recreate a state transition table for the restarted user space process based upon information within the routing table.

15. The computing device of claim 12, the instructions to cause the processor to: identify a list of healthy nodes within the cluster.

16. The computing device of claim 15, the instructions to cause the processor to: identify a list of unhealthy nodes within the cluster.

17. The computing device of claim 16, the instructions to cause the processor to: redistribute relationships between nodes of the cluster based upon the list of healthy nodes and the list of unhealthy nodes.

18. The computing device of claim 17, the instructions to cause the processor to: determine if a new node of a quorum of nodes within the cluster is a most recent node responsible for a relationship.

19. The computing device of claim 12, the instructions to cause the processor to: recreate a state transition table based upon information within the routing table.

20. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to: distribute a snapshot task to a first node of a cluster; detect a failure condition for the first node; and reassign the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition, comprising utilizing a last owning node list within a routing table during restoration of the first node to assign the snapshot task back to the first node.
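
Claims 4 to 6 and claim 8 describe two supporting ideas: splitting the cluster into healthy and unhealthy node lists, and using the last owning node list to hand a snapshot task back once the failed node is restored. The sketch below shows one way these could fit together. `SnapshotRouter`, `partition_nodes`, the boolean status map, and the round-robin choice of failover target are hypothetical and are not taken from the patent text; in particular, keeping the last-owner entry unchanged during failover is an assumption made so that the original owner can be recovered later.

```python
def partition_nodes(status: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Split a hypothetical node -> is_up map into (healthy, unhealthy) lists."""
    healthy = [n for n, up in status.items() if up]
    unhealthy = [n for n, up in status.items() if not up]
    return healthy, unhealthy


class SnapshotRouter:
    """Illustrative routing state for snapshot tasks."""

    def __init__(self) -> None:
        self.current: dict[str, str] = {}     # task -> node running it now
        self.last_owner: dict[str, str] = {}  # task -> node it was distributed to

    def distribute(self, task: str, node: str) -> None:
        # Record both the live assignment and the last owning node.
        self.current[task] = node
        self.last_owner[task] = node

    def fail_over(self, failed: str, status: dict[str, bool]) -> list[str]:
        """Reassign the failed node's tasks across the healthy nodes."""
        healthy, _ = partition_nodes(status)
        targets = [n for n in healthy if n != failed]
        if not targets:
            raise RuntimeError("no healthy node available")
        moved = sorted(t for t, n in self.current.items() if n == failed)
        for i, task in enumerate(moved):
            # Assumed policy: round-robin over healthy nodes. last_owner is
            # deliberately left untouched so the task can be handed back.
            self.current[task] = targets[i % len(targets)]
        return moved

    def restore(self, node: str) -> list[str]:
        """On restoration, assign tasks back to their last owning node."""
        returned = sorted(
            t for t, n in self.last_owner.items()
            if n == node and self.current.get(t) != node
        )
        for task in returned:
            self.current[task] = node
        return returned


router = SnapshotRouter()
router.distribute("snap-1", "node-a")
router.distribute("snap-2", "node-b")
status = {"node-a": False, "node-b": True, "node-c": True}
router.fail_over("node-a", status)
router.restore("node-a")
print(router.current)
```

Keeping the live assignment and the last owner in separate maps means a restored node can reclaim exactly the tasks it originally held without disturbing tasks that were distributed to other nodes.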

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080100B2 cover?
It covers techniques for a load balancing and fault tolerant service in a distributed data system. The service distributes a task to a first node in a cluster of nodes using a routing table, stores information indicating that the first node is assigned to the task, detects a failure condition for the first node, and moves the task to a second node in the cluster when the failure occurs.
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/5088. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 3, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).