Fault tolerant distributed tasks using distributed file systems
US-9672122-B1 · Jun 6, 2017 · US
US11080100B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11080100-B2 |
| Application number | US-201916690860-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 21, 2019 |
| Priority date | Feb 12, 2015 |
| Publication date | Aug 3, 2021 |
| Grant date | Aug 3, 2021 |
Abstract
Techniques for load balancing and fault tolerant service are described. An apparatus may comprise a load balancing and fault tolerant component operative to execute a load balancing and fault tolerant service in a distributed data system. The load balancing and fault tolerant service distributes a load of a task to a first node in a cluster of nodes using a routing table. The load balancing and fault tolerant service stores information to indicate the first node from the cluster of nodes is assigned to perform the task. The load balancing and fault tolerant service detects a failure condition for the first node. The load balancing and fault tolerant service moves the task to a second node from the cluster of nodes to perform the task for the first node upon occurrence of the failure condition.
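The abstract's flow (distribute a task through a routing table, record which node owns it, detect a failure condition, move the task to a second node) can be sketched in Python. This is a minimal, hypothetical sketch and not the patent's implementation: the class `RoutingTable`, the least-loaded selection rule, treating load as a task count, and the externally triggered `on_failure` call are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical routing table mapping each task to the node assigned to perform it."""

    nodes: list[str]
    assignments: dict[str, str] = field(default_factory=dict)
    failed: set[str] = field(default_factory=set)

    def healthy_nodes(self) -> list[str]:
        # Nodes not currently in a failure condition, in cluster order.
        return [n for n in self.nodes if n not in self.failed]

    def load(self, node: str) -> int:
        # Assumption: a node's load is the number of tasks assigned to it.
        return sum(1 for owner in self.assignments.values() if owner == node)

    def distribute(self, task: str) -> str:
        # Pick the least-loaded healthy node (min() breaks ties by cluster order)
        # and record the assignment in the routing table.
        candidates = self.healthy_nodes()
        if not candidates:
            raise RuntimeError("no healthy node available")
        target = min(candidates, key=self.load)
        self.assignments[task] = target
        return target

    def on_failure(self, node: str) -> dict[str, str]:
        # Called once a failure condition is detected for `node` (the detection
        # itself, e.g. a missed heartbeat, is out of scope here). Each of the
        # node's tasks moves to another healthy node; returns {task: new_node}.
        self.failed.add(node)
        orphaned = [t for t, owner in self.assignments.items() if owner == node]
        return {task: self.distribute(task) for task in orphaned}


if __name__ == "__main__":
    table = RoutingTable(nodes=["node-a", "node-b", "node-c"])
    table.distribute("task-1")          # assigned to node-a
    table.distribute("task-2")          # assigned to node-b
    print(table.on_failure("node-a"))   # {'task-1': 'node-c'}
```

Because the failed node is excluded before any task is redistributed, a moved task can only land on a surviving node, and the least-loaded rule spreads the orphaned work instead of piling it onto a single neighbor.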
Claims
The invention claimed is:
1. A method, comprising: distributing a snapshot task to a first node of a cluster; updating a last owning node list within a routing table to indicate that the snapshot task has been distributed to the first node; detecting a failure condition for the first node; and reassigning the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition.
2. The method of claim 1, comprising: restarting a failed user space process as a restarted user space process executing at the second node based upon detecting the failure condition.
3. The method of claim 2, comprising: recreating a state transition table for the restarted user space process based upon information within the routing table.
4. The method of claim 1, comprising: identifying a list of healthy nodes within the cluster.
5. The method of claim 4, comprising: identifying a list of unhealthy nodes within the cluster.
6. The method of claim 5, comprising: redistributing relationships between nodes of the cluster based upon the list of healthy nodes and the list of unhealthy nodes.
7. The method of claim 6, comprising: determining if a new node of a quorum of nodes within the cluster is a most recent node responsible for a relationship.
8. The method of claim 1, comprising: utilizing the last owning node list within the routing table during restoration of the first node to assign the snapshot task back to the first node.
9. The method of claim 1, comprising: utilizing a library within a replicated database to select a new master node when a current master node of the cluster fails.
10. The method of claim 1, comprising: updating the routing table based upon the snapshot task being reassigned.
11. The method of claim 1, comprising: performing a cleanup of a transition state of a user space process that failed based upon the failure condition of the first node.
12. A computing device, comprising: a memory comprising instructions; and a processor coupled with the memory, the processor configured to execute the instructions to cause the processor to: distribute a snapshot task to a first node of a cluster; detect a failure condition for the first node; and reassign the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition, comprising utilizing a last owning node list within a routing table during restoration of the first node to assign the snapshot task back to the first node.
13. The computing device of claim 12, the instructions to cause the processor to: restart a failed user space process as a restarted user space process executing at the second node based upon detecting the failure condition.
14. The computing device of claim 13, the instructions to cause the processor to: recreate a state transition table for the restarted user space process based upon information within the routing table.
15. The computing device of claim 12, the instructions to cause the processor to: identify a list of healthy nodes within the cluster.
16. The computing device of claim 15, the instructions to cause the processor to: identify a list of unhealthy nodes within the cluster.
17. The computing device of claim 16, the instructions to cause the processor to: redistribute relationships between nodes of the cluster based upon the list of healthy nodes and the list of unhealthy nodes.
18. The computing device of claim 17, the instructions to cause the processor to: determine if a new node of a quorum of nodes within the cluster is a most recent node responsible for a relationship.
19. The computing device of claim 12, the instructions to cause the processor to: recreate a state transition table based upon information within the routing table.
20. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to: distribute a snapshot task to a first node of a cluster; detect a failure condition for the first node; and reassign the snapshot task from the first node to a second node to perform the snapshot task based upon detecting the failure condition, comprising utilizing a last owning node list within a routing table during restoration of the first node to assign the snapshot task back to the first node.
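Claims 1, 8 and 12 center on a "last owning node list" kept in the routing table so that a snapshot task can be failed over to a second node and later assigned back to the first node when it is restored. The Python sketch here is hypothetical and only illustrates that idea under stated assumptions: the list is modeled as a mapping from each snapshot task to the node it was originally distributed to, failover uses round-robin reassignment across healthy nodes, and the names `SnapshotRouting`, `fail`, and `restore` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotRouting:
    """Hypothetical routing table with a last owning node list for snapshot tasks."""

    nodes: list[str]
    owner: dict[str, str] = field(default_factory=dict)        # current assignment
    last_owning: dict[str, str] = field(default_factory=dict)  # assumed shape of the list
    unhealthy: set[str] = field(default_factory=set)

    def healthy_nodes(self) -> list[str]:
        # Claim 4: list of healthy nodes, in cluster order.
        return [n for n in self.nodes if n not in self.unhealthy]

    def unhealthy_nodes(self) -> list[str]:
        # Claim 5: list of unhealthy nodes, in cluster order.
        return [n for n in self.nodes if n in self.unhealthy]

    def distribute(self, task: str, node: str) -> None:
        # Claim 1: distribute the task and record the node in the last owning node list.
        if node in self.unhealthy:
            raise ValueError(f"{node} is unhealthy")
        self.owner[task] = node
        self.last_owning[task] = node

    def fail(self, node: str) -> dict[str, str]:
        # Claims 1 and 10: on a failure condition, reassign the node's tasks
        # round-robin across healthy nodes and update the current assignment.
        # The last owning node list is deliberately left unchanged.
        self.unhealthy.add(node)
        survivors = self.healthy_nodes()
        if not survivors:
            raise RuntimeError("no healthy node can take over")
        orphaned = sorted(t for t, n in self.owner.items() if n == node)
        moved = {}
        for i, task in enumerate(orphaned):
            target = survivors[i % len(survivors)]
            self.owner[task] = target
            moved[task] = target
        return moved

    def restore(self, node: str) -> list[str]:
        # Claim 8: when the node is restored, consult the last owning node list
        # and assign its snapshot tasks back to it.
        self.unhealthy.discard(node)
        returned = [t for t, last in self.last_owning.items()
                    if last == node and self.owner[t] != node]
        for task in returned:
            self.owner[task] = node
        return returned


if __name__ == "__main__":
    rt = SnapshotRouting(nodes=["n1", "n2", "n3"])
    rt.distribute("snap-a", "n1")
    rt.distribute("snap-b", "n1")
    rt.distribute("snap-c", "n2")
    print(rt.fail("n1"))     # {'snap-a': 'n2', 'snap-b': 'n3'}
    print(rt.restore("n1"))  # ['snap-a', 'snap-b']
```

Keeping the last owning node list separate from the current assignment is what makes failback possible in this sketch: the failover path rewrites only `owner`, so the original owner is still known when the node comes back.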
involving virtual machines · CPC title
by reconfiguration of node membership · CPC title
involving task migration · CPC title
without idle spare hardware · CPC title
Real-time · CPC title