Dynamic recovery from a split-brain failure in edge nodes

US10237123B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10237123-B2
Application number  US-201615387549-A
Country             US
Kind code           B2
Filing date         Dec 21, 2016
Priority date       Dec 21, 2016
Publication date    Mar 19, 2019
Grant date          Mar 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments provide a method for employing the management and control system of a network to dynamically recover from a split-brain condition in the edge nodes of the network. The method of some embodiments takes a corrective action to automatically recover from a split-brain failure occurred at a pair of high availability (HA) edge nodes of the network. The HA edge nodes include an active machine and a standby machine. The active edge node actively passes through the network traffic (e.g., north-south traffic for a logical network), while the standby edge node is synchronized and ready to transition to the active state, should a failure occur. Both HA nodes share the same configuration settings and only one is active until a path, link, or system failure occurs. The active edge node also provides stateful services (e.g., stateful firewall, load balancing, etc.) to the data compute nodes of the network.
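To make the failure mode concrete, here is a minimal sketch of how an active/standby pair can end up with two active nodes. It is not the patented implementation: the names `HaState`, `EdgeNode`, `on_peer_unreachable`, and `is_split_brain` are hypothetical, and the sketch assumes each node promotes itself as soon as it stops hearing from its peer.

```python
from dataclasses import dataclass
from enum import Enum


class HaState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class EdgeNode:
    # Hypothetical model of one node of the HA pair.
    name: str
    state: HaState


def on_peer_unreachable(node: EdgeNode) -> None:
    """Assumed local failover rule: a standby that loses its peer takes over."""
    if node.state is HaState.STANDBY:
        node.state = HaState.ACTIVE


def is_split_brain(first: EdgeNode, second: EdgeNode) -> bool:
    """Both nodes of the pair believe they are the active one."""
    return first.state is HaState.ACTIVE and second.state is HaState.ACTIVE


# Normal operation: exactly one active node.
active = EdgeNode("edge-1", HaState.ACTIVE)
standby = EdgeNode("edge-2", HaState.STANDBY)
print(is_split_brain(active, standby))

# The link between the nodes fails while edge-1 is still healthy.
# edge-2 cannot tell a dead peer from a dead link, so it promotes itself.
on_peer_unreachable(standby)
print(is_split_brain(active, standby))
```

The second check reports a split brain because the standby's local rule cannot distinguish a failed peer from a failed link, which is the situation the management and control system is meant to correct.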

First claim

Opening claim text (preview).

We claim:

1. A method for specifying an operational state of first and second logical gateways, the method comprising: receiving a message from the first gateway that indicates the first gateway has transitioned from a standby state to an active state, the first and second gateways (i) serving as a pair of high availability (HA) logical gateways of a logical network implemented on a physical network and (ii) connecting the logical network to a network external to the logical network; determining that the second gateway is in the active state when the first gateway is in the active state; determining that the first and second gateways do not communicate with each other; and based on the determination that the first and second gateways do not communicate with each other, directing the first gateway to transition back to the standby state.

2. The method of claim 1, wherein the pair of HA gateways forwards north-south traffic for the logical network.

3. The method of claim 2, wherein an active gateway forwards the north-south traffic for the logical network and a standby gateway takes over the forwarding of the north-south traffic when the active gateway becomes unavailable.

4. The method of claim 2, wherein the forwarding of network traffic comprises performing layer three routing of network traffic to connect the logical network to one or more external networks.

5. The method of claim 1, wherein the first and second gateways comprise virtual machines executing on different host machines.

6. The method of claim 1, wherein the first and second gateways communicate with each other through a set of private links in order to monitor states of each other.

7. The method of claim 1, wherein the message is a first message, wherein determining that the first and second gateways do not communicate with each other comprises receiving a second message from the first gateway that indicates the first and second gateways are disconnected.

8. The method of claim 7, wherein the first gateway sends the second message when the first gateway sends a third message to the second gateway and does not receive an acknowledgement back from the second gateway within a specified period.

9. The method of claim 8, wherein the third message comprises a bidirectional forwarding detection (BFD) message.

10. The method of claim 1, wherein the message is a first message, wherein directing the first gateway to transition back to the first state comprises sending a second message to the first gateway instructing the first gateway to move back into a standby state.

11. A non-transitory machine readable medium, storing a program, which when implemented by at least one processing unit specifies an operational state of first and second logical gateways, the program comprising sets of instructions for: receiving a message from the first gateway that indicates the first gateway has transitioned from a standby state to an active state, the first and second gateways (i) serving as a pair of high availability (HA) logical gateways of a logical network implemented on a physical network and (ii) connecting the logical network to a network external to the logical network; determining that the second gateway is in the active state when the first gateway is in the active state; determining that the first and second gateways do not communicate with each other; and based on the determination that the first and second gateways do not communicate with each other, directing the first gateway to transition back to the standby state.

12. The non-transitory machine readable medium of claim 11, wherein the pair of HA gateways forwards north-south traffic for the logical network.

13. The non-transitory machine readable medium of claim 12, wherein an active gateway forwards the north-south traffic for the logical network and a standby gateway takes over the forwarding of the north-south traffic when the active gateway becomes unavailable.

14. The non-transitory machine readable medium of claim 12, wherein the forwarding of network traffic comprises performing layer three routing of network traffic to connect the logical network to one or more external networks.

15. The non-transitory machine readable medium of claim 11, wherein the first and second gateways comprise virtual machines executing on different host machines.

16. The non-transitory machine readable medium of claim 11, wherein the first and second gateways communicate with each other through a set of private links in order to monitor states of each other.

17. The non-transitory machine readable medium of claim 11, wherein the message is a first message, wherein the set of instructions for determining that the first and second gateways do not communicate with each other comprises a set of instructions for receiving a second message from the first gateway that indicates the first and second gateways are disconnected.

18. The non-transitory machine readable medium of claim 17, wherein the first gateway sends the second message when the first gateway sends a third message to the second gateway and does not receive an acknowledgement back from the second gateway within a specified period.

19. The non-transitory machine readable medium of claim 18, wherein the third message comprises a bidirectional forwarding detection (BFD) message.

20. The non-transitory machine readable medium of claim 11, wherein the message is a first message, wherein the set of instructions for directing the first gateway to transition back to the first state comprises a set of instructions for sending a second message to the first gateway instructing the first gateway to move back into a standby state.
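The steps of claims 1, 7, and 8 can be sketched as a small controller. This is an illustrative reading of the claim language, not the patented implementation: the class `Controller`, its methods, the `missed_ack` helper, and the tuple-shaped command are all hypothetical, and the sketch assumes the controller already tracks each gateway's last reported state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class State(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


# Hypothetical command format: (gateway name, state it is directed to enter).
Command = Tuple[str, State]


def missed_ack(sent_at: float, ack_at: Optional[float],
               timeout: float, now: float) -> bool:
    """Claim 8 condition as run on a gateway: no acknowledgement of the
    probe (e.g. a BFD message) arrived within the specified period."""
    if ack_at is not None and ack_at - sent_at <= timeout:
        return False
    return now - sent_at > timeout


@dataclass
class Controller:
    states: Dict[str, State]   # last state reported by each gateway
    peers: Dict[str, str]      # HA pairing, gateway name -> peer name
    promoted: Set[str] = field(default_factory=set)
    disconnected: Set[str] = field(default_factory=set)

    def on_became_active(self, gateway: str) -> Optional[Command]:
        """First message: the gateway moved from standby to active."""
        self.states[gateway] = State.ACTIVE
        self.promoted.add(gateway)
        return self._resolve(gateway)

    def on_disconnected(self, gateway: str) -> Optional[Command]:
        """Second message (claim 7): the gateway cannot reach its peer."""
        self.disconnected.add(gateway)
        return self._resolve(gateway)

    def _resolve(self, gateway: str) -> Optional[Command]:
        peer = self.peers[gateway]
        split_brain = (
            gateway in self.promoted
            and self.states[peer] is State.ACTIVE
            and gateway in self.disconnected
        )
        if not split_brain:
            return None
        # Direct the newly promoted gateway back to standby.
        self.states[gateway] = State.STANDBY
        self.promoted.discard(gateway)
        self.disconnected.discard(gateway)
        return (gateway, State.STANDBY)


controller = Controller(
    states={"gw1": State.STANDBY, "gw2": State.ACTIVE},
    peers={"gw1": "gw2", "gw2": "gw1"},
)
controller.on_became_active("gw1")          # both gateways now report active
print(controller.on_disconnected("gw1"))    # command sending gw1 to standby
```

Both handlers call the same check because the claims do not fix the order in which the two messages arrive. The sketch trusts the controller's recorded state for the peer; how that record is kept current is outside the claim text quoted here.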

Assignees

Nicira Inc

Inventors

Classifications

  • H04L69/40 · Primary

    for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection (management of faults, events, alarms or notifications in data switching networks H04L41/06) · CPC title

  • Active monitoring, e.g. heartbeat, ping or trace-route · CPC title

  • by checking functioning · CPC title

  • Physical resource allocation for ACK/NACK (for physical mapping arrangements in ARQ protocols H04L1/1861) · CPC title

  • in the network layer [OSI layer 3], e.g. X.25 (H04L69/16 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10237123B2 cover?
Some embodiments provide a method for employing the management and control system of a network to dynamically recover from a split-brain condition in the edge nodes of the network. The method of some embodiments takes a corrective action to automatically recover from a split-brain failure occurred at a pair of high availability (HA) edge nodes of the network. The HA edge nodes include an active…
Who is the assignee on this patent?
Nicira Inc
What technology area does this patent fall under?
Primary CPC classification H04L69/40. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 19, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).