Related publication: Dynamically modifying a cluster of computing nodes used for distributed execution of a program (US-9329909-B1 · May 3, 2016 · US)
US10270646B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10270646-B2 |
| Application number | US-201615332558-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2016 |
| Priority date | Oct 24, 2016 |
| Publication date | Apr 23, 2019 |
| Grant date | Apr 23, 2019 |
Abstract
Fault tolerance techniques for a plurality of nodes executing application thread groups include executing at least a portion of a first application thread group based on a delegation by a first node, wherein the first node delegates an execution of the first application thread group amongst the plurality of nodes and has a highest priority indicated by an ordered priority of the plurality of nodes. A failure of the first node can be identified based on the first node failing to respond to a message sent to it. A second node can then be identified as having a next highest priority indicated by the ordered priority such that the second node can delegate an execution of a second application thread group amongst the plurality of nodes.
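The failure-detection and hand-over steps in the abstract can be illustrated with a minimal Python sketch. It assumes that a "message" is any callable that returns `True` when the peer answers, that nodes are identified by plain strings, and that the ordered priority is a list with the highest-priority node first; the threshold window mirrors the "threshold period starting at a time the message is sent" recited in claims 6 and 13. The names `has_failed` and `next_delegator` are hypothetical and are not taken from the patent.

```python
import concurrent.futures
import time
from typing import Callable, Sequence


def has_failed(send_message: Callable[[], bool], threshold_s: float) -> bool:
    """Send one message and report failure if no positive reply arrives
    within threshold_s seconds, measured from the moment it is sent."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(send_message)  # the threshold window starts here
        try:
            return not future.result(timeout=threshold_s)
        except concurrent.futures.TimeoutError:
            return True  # no reply within the threshold period
        except Exception:
            return True  # the message could not be delivered at all
    finally:
        pool.shutdown(wait=False)  # do not block on a hung peer


def next_delegator(priority: Sequence[str], failed: str) -> str:
    """Return the node that directly follows `failed` in the ordered priority."""
    idx = list(priority).index(failed)
    if idx + 1 >= len(priority):
        raise RuntimeError("no lower-priority node left to take over")
    return priority[idx + 1]


if __name__ == "__main__":
    priority = ["node-a", "node-b", "node-c"]  # hypothetical IDs, highest priority first

    def hung_master() -> bool:
        time.sleep(0.5)  # simulates a master that does not answer in time
        return True

    delegator = priority[0]
    if has_failed(hung_master, threshold_s=0.1):
        delegator = next_delegator(priority, delegator)
    print(delegator)  # node-b
```

Running the message on a worker thread keeps the timeout independent of how the transport behaves; a real deployment would more likely rely on a socket or RPC timeout, which this sketch does not attempt to model.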
Claims (opening text, preview)
What is claimed is:

1. A fault tolerance system, comprising: a plurality of nodes, wherein each of the plurality of nodes comprises a virtual machine instance configured to execute at least a portion of an application thread group; and a centralized database that is accessible to each of the plurality of nodes and configured to store status information for each of the plurality of nodes, including an indication of an ordered priority of the plurality of nodes, wherein the plurality of nodes comprises: a first node having a highest priority and designated as a master node in the centralized database, wherein the master node is configured to delegate execution of the application thread group among the plurality of nodes; and a second node configured to: execute at least a portion of the application thread group based on a delegation by the first node; send a message to the first node; identify a failure of the first node based on the first node failing to respond to the message; update the status information for the first node in the centralized database to indicate that the first node is no longer the master node based on the identified failure of the first node; identify a second highest priority node of the plurality of nodes from the ordered priority in the centralized database; and update the status information of the second highest priority node in the centralized database to designate the second highest priority node as the master node.

2. The fault tolerance system of claim 1, wherein the message sent to the first node includes a request for a response by the first node indicating that the first node is accessible, wherein the failure of the first node is identified responsive to a determination that the first node is not accessible based on the response not being received from the first node.

3. The fault tolerance system of claim 1, wherein the ordered priority of the plurality of nodes represents an order in which each of the plurality of nodes registered to the centralized database.

4. The fault tolerance system of claim 3, wherein the centralized database is configured to register each node of the plurality of nodes to the centralized database, wherein, after registration, the centralized database is configured to store an identifier of each node within a respective entry of the centralized database.

5. The fault tolerance system of claim 4, wherein to identify the second highest priority node, the second node is configured to: identify an identifier in a second entry of the centralized database following a first entry of the centralized database, the first entry including an identifier of the first node, wherein the identifier in the second entry is of the second node.

6. The fault tolerance system of claim 1, wherein the second node is configured to identify the failure of the first node responsive to failing to receive a response to the message within a threshold period starting at a time the message is sent to the first node.

7. A fault tolerance method for a plurality of nodes, wherein each of the plurality of nodes comprises a virtual machine instance executing at least a portion of an application thread group, and wherein each of the plurality of nodes is in communication with a centralized database that stores status information for each of the plurality of nodes, including an indication of an ordered priority of the plurality of nodes, wherein the plurality of nodes comprises: a first node having a highest priority and designated as a master node in the centralized database, wherein the master node delegates execution of the application thread group among the plurality of nodes; and a second node that performs the method, comprising: executing at least a portion of the application thread group based on a delegation by the first node; sending a message to the first node; identifying a failure of the first node based on the first node failing to respond to the message; updating status information for the first node in the centralized database to indicate that the first node is no longer the master node based on the identified failure of the first node; identifying a second highest priority node of the plurality of nodes from the ordered priority in the centralized database; and updating the status information of the second highest priority node in the centralized database to designate the second highest priority node as the master node.

8. The fault tolerance method of claim 7, wherein the message sent to the first node includes a request for a response by the first node indicating that the first node is accessible, wherein the failure of the first node is identified responsive to determining that the first node is not accessible based on the response not being received from the first node.

9. The fault tolerance method of claim 7, wherein the ordered priority of the plurality of nodes represents an order in which each of the plurality of nodes registered to the centralized database.
10. The fault tolerance method of claim 9, wherein each node of the plurality of nodes registers to the centralized database, and wherein, after registration, the centralized database stores an identifier of each node within a respective entry of the centralized database.

11. The fault tolerance method of claim 10, wherein identifying the second highest priority node comprises: identifying an identifier in a second entry of the centralized database following a first entry of the centralized database, the first entry including an identifier of the first node, wherein the identifier in the second entry is of the second node.

12. The fault tolerance method of claim 7, wherein identifying the second highest priority node comprises: identifying the second node as the second highest priority node from the ordered priority in the centralized database.

13. The fault tolerance method of claim 7, wherein identifying the failure of the first node comprises: determining that a response has not been received from the first node within a threshold period starting at a time the message is sent to the first node.

14. A non-transitory computer-readable storage medium, comprising executable instructions that, when executed by a processor, facilitate the performance of operations for fault tolerance for a plurality of nodes, wherein each of the plurality of nodes is comprises virtual machine instance configured to execute at least a portion of an application thread group, and wherein each of the plurality of nodes is communicatively coupled to a centralized database that stores status information for each of the nodes, including an indication of an ordered priority of the plurality of nodes, wherein the plurality of nodes comprises a first node having a highest priority and designated as a master node in the centralized database, wherein the master node delegates execution of the application thread group among the plurality of nodes; the instructions comprising: instructions that cause a second node of the plurality of nodes to execute at least a portion of the application thread group based on a delegation by the first node; instructions that cause the second node to send a message to the first node; instructions that cause the second node to identify a failure of the first node based on the first node failing to respond to the message; instructions that cause the second node to update the status information for the first node in the centralized database to indicate that the first node is no longer the master node based on the identified failure of the first node; instructions that cause the second node to identify a
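The centralized-database bookkeeping recited in claims 1, 3 to 5, and 9 to 12 (registration order as the ordered priority, a master designation per entry, and promotion of the entry that follows the failed master) can be modeled with a small in-memory sketch. It is only an illustration under stated assumptions: the `CentralizedDatabase` and `NodeStatus` classes are hypothetical stand-ins for a real shared store, the first node to register is assumed to be designated master, and skipping entries already marked as failed is a small generalization beyond the claims.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeStatus:
    node_id: str
    is_master: bool = False
    failed: bool = False


@dataclass
class CentralizedDatabase:
    """Hypothetical in-memory stand-in: entries stay in registration order,
    so list position doubles as the ordered priority."""
    entries: List[NodeStatus] = field(default_factory=list)

    def register(self, node_id: str) -> None:
        # Assumption: the first node to register is designated master.
        self.entries.append(NodeStatus(node_id, is_master=not self.entries))

    def master(self) -> Optional[str]:
        return next((e.node_id for e in self.entries if e.is_master), None)

    def fail_over(self, failed_id: str) -> str:
        """Run by a surviving node after it has detected that the master failed."""
        idx = [e.node_id for e in self.entries].index(failed_id)
        # Record that the failed node is no longer the master.
        self.entries[idx].is_master = False
        self.entries[idx].failed = True
        # Promote the first non-failed entry that follows the failed one.
        for candidate in self.entries[idx + 1:]:
            if not candidate.failed:
                candidate.is_master = True
                return candidate.node_id
        raise RuntimeError("no eligible node left to promote")
```

With nodes registered as `node-a`, `node-b`, `node-c`, calling `fail_over("node-a")` clears `node-a`'s master flag and designates `node-b`. Against a real shared database, the two status updates would presumably need to run as one atomic transaction or compare-and-set so that two surviving nodes cannot both promote a new master; this sketch does not handle concurrency.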
CPC classifications (titles):
- … comprising hierarchical management structures
- Performing the actions predefined by failover planning, e.g. switching to standby network elements
- Active monitoring, e.g. heartbeat, ping or trace-route