Data center network fault detection and localization

US2018270102A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2018270102-A1
Application number  US-201715459879-A
Country             US
Kind code           A1
Filing date         Mar 15, 2017
Priority date       Mar 15, 2017
Publication date    Sep 20, 2018
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One or more processors of a device execute instructions to identify a set of servers that includes a first server and a second server in a plurality of data centers; send a first list of servers to the first server; send a second list of servers to the second server; receive a first set of response data from the first server, the first set of response data indicating responsiveness of the servers in the first list of servers; receive a second set of response data from the second server, the second set of response data indicating responsiveness of the servers in the second list of servers; analyze the first set of response data and the second set of response data; and based on the analysis, generate an alert that indicates a network error in a data center.
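The flow in the abstract (hand each server a list of peers to probe, collect responsiveness data, analyze it, raise an alert) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Server` type, the list-building policy (every rack-mate, plus one server per other rack and one per other data center), the report tuple layout, and the 0.5 alert threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical record; the field names are assumptions.
    name: str
    rack: str
    data_center: str


def build_probe_list(source, servers):
    """Assumed policy: all rack-mates, one server from each other rack in the
    same data center, and one server from each other data center."""
    targets = []
    seen_racks = {(source.data_center, source.rack)}
    seen_dcs = {source.data_center}
    for s in servers:
        if s == source:
            continue
        if s.data_center == source.data_center and s.rack == source.rack:
            targets.append(s)  # same rack: always included
        elif s.data_center == source.data_center:
            if (s.data_center, s.rack) not in seen_racks:
                seen_racks.add((s.data_center, s.rack))
                targets.append(s)  # first server seen in another rack
        elif s.data_center not in seen_dcs:
            seen_dcs.add(s.data_center)
            targets.append(s)  # first server seen in another data center
    return targets


def drop_rates(reports):
    """reports: iterable of (prober, target, packets_sent, packets_received)."""
    sent, lost = {}, {}
    for _prober, target, n_sent, n_recv in reports:
        sent[target] = sent.get(target, 0) + n_sent
        lost[target] = lost.get(target, 0) + (n_sent - n_recv)
    return {t: lost[t] / sent[t] for t in sent if sent[t] > 0}


def alerts(reports, threshold=0.5):
    """Names of targets whose aggregate drop rate meets the assumed threshold."""
    return sorted(t for t, r in drop_rates(reports).items() if r >= threshold)


servers = [
    Server("a1", "r1", "dc1"), Server("a2", "r1", "dc1"),
    Server("b1", "r2", "dc1"), Server("b2", "r2", "dc1"),
    Server("c1", "r3", "dc2"),
]
probe_names = [s.name for s in build_probe_list(servers[0], servers)]
reports = [
    ("a1", "a2", 10, 10), ("a1", "b1", 10, 2),
    ("a2", "b1", 10, 0), ("a2", "c1", 10, 9),
]
print(probe_names)      # ['a2', 'b1', 'c1']
print(alerts(reports))  # ['b1']
```

Aggregating reports from several probers per target is what lets a central analyzer tell one unresponsive server apart from a prober that cannot reach anything.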

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a memory storage comprising instructions; a network interface connected to a network; and one or more processors in communication with the memory storage, wherein the one or more processors execute the instructions to perform: identifying a set of servers in a plurality of data centers, the set of servers including a first server and a second server; sending, via the network interface, a first list of servers in the set of servers to the first server; sending, via the network interface, a second list of servers in the set of servers to the second server; receiving, via the network interface, a first set of response data from the first server, the first set of response data indicating responsiveness of the servers in the first list of servers; receiving, via the network interface, a second set of response data from the second server, the second set of response data indicating responsiveness of the servers in the second list of servers; analyzing the first set of response data and the second set of response data; and based on the analysis, generating an alert that indicates a network error in a data center of the plurality of data centers.

2. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: determining a drop rate for a third server in the first list of servers.

3. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: determining a failure state for a third server in the first list of servers.

4. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers; and determining that all servers in the set of servers corresponding to sibling nodes of a node corresponding to a third server in the set of servers report dropped packets to the third server.

5. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node in the tree data structure and all children of the node are in a failure state.

6. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node in the tree is in a failure state and that no children of the node are in the failure state.

7. The device of claim 1, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node is not in a failure state and that at least one child of the node is in the failure state.

8. The device of claim 1, wherein the one or more processors further perform: creating the first list of servers by including each server in a same rack as the first server.

9. The device of claim 1, wherein the one or more processors further perform: creating the first list of servers by including a third server, based on the third server being in a different rack than the first server.

10. The device of claim 1, wherein the one or more processors further perform: creating the first list of servers by including a third server, based on the third server being in a different data center than the first server.

11. A computer-implemented method for automated fault detection in data center networks comprising: identifying, by one or more processors, a set of servers in a plurality of data centers, the set of servers including a first server and a second server; sending, via a network interface, a first list of servers in the set of servers to the first server; sending, via the network interface, a second list of servers in the set of servers to the second server; receiving, via the network interface, a first set of response data from the first server, the first set of response data indicating responsiveness of the servers in the first list of servers; receiving, via the network interface, a second set of response data from the second server, the second set of response data indicating responsiveness of the servers in the second list of servers; analyzing, by the one or more processors, the first set of response data and the second set of response data; and based on the analysis, generating an alert that indicates a network error in a data center of the plurality of data centers.

12. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: determining a drop rate for a third server in the first list of servers.

13. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: determining a failure state for a third server in the first list of servers.

14. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers; and determining that all servers in the set of servers corresponding to sibling nodes of a node corresponding to a third server in the set of servers report dropped packets to the third server.

15. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node in the tree data structure and all children of the node are in a failure state.

16. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node in the tree is in a failure state and that no children of the node are in the failure state.

17. The computer-implemented method of claim 11, wherein the analyzing of the first set of response data and the second set of response data comprises: using a tree data structure in which each leaf node of the tree corresponds to a server of the set of servers and each other node of the tree corresponds to a distinct subset of the set of servers; and determining that a node is not in a failure state and that at least one child of the node is in the failure state.

18. A non-transitory computer-readable medium storing computer instructions for a
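Claims 5 to 7 (and 15 to 17) each test a different parent/child failure pattern in a tree whose leaves are servers and whose other nodes are subsets of servers. The sketch below only labels which of those three patterns a node matches. It is an illustration under assumptions: the `Node` type, the pattern labels, and the traversal are hypothetical, each node's failure flag is taken as an input, and the claim text shown here does not say how that flag is computed or what each pattern implies about the fault's location.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: a leaf is a server, an inner node is a
    # subset of servers (for example a rack or a data center).
    name: str
    failed: bool = False
    children: list = field(default_factory=list)


def classify(node):
    """Return the label of the claimed pattern that `node` matches, or None."""
    if not node.children:
        return None  # leaves have no parent/child pattern to test
    failed_children = [c for c in node.children if c.failed]
    if node.failed and len(failed_children) == len(node.children):
        return "node_and_all_children_failed"    # pattern of claims 5 and 15
    if node.failed and not failed_children:
        return "node_failed_no_children_failed"  # pattern of claims 6 and 16
    if not node.failed and failed_children:
        return "node_ok_some_children_failed"    # pattern of claims 7 and 17
    return None


def walk(root):
    """Yield (name, pattern) for every non-leaf node that matches a pattern."""
    stack = [root]
    while stack:
        node = stack.pop()
        pattern = classify(node)
        if pattern:
            yield node.name, pattern
        stack.extend(node.children)


r1 = Node("r1", failed=True,
          children=[Node("a1", failed=True), Node("a2", failed=True)])
r2 = Node("r2", failed=False,
          children=[Node("b1", failed=True), Node("b2", failed=False)])
root = Node("dc1", failed=False, children=[r1, r2])

findings = dict(walk(root))
print(findings["r1"])   # node_and_all_children_failed
print(findings["r2"])   # node_ok_some_children_failed
print(findings["dc1"])  # node_ok_some_children_failed
```

A failed node with only some failed children matches none of the three patterns and is reported as `None` here; how such a case should be handled is left open in this sketch.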

Assignees

Futurewei Technologies Inc

Inventors

Classifications

  • in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title

  • using logs of notifications; Post-processing of notifications · CPC title

  • by checking connectivity · CPC title

  • Errors, e.g. transmission errors · CPC title

  • H04L41/06 (Primary): Management of faults, events, alarms or notifications · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018270102A1 cover?
It covers a device whose processors identify a set of servers across a plurality of data centers, send a first list of servers to a first server and a second list to a second server, receive from each a set of response data indicating the responsiveness of the servers in its list, analyze both sets of response data, and generate an alert that indicates a network error in a data center.
Who is the assignee on this patent?
Futurewei Technologies Inc
What technology area does this patent fall under?
Primary CPC classification H04L43/0823. Mapped technology areas include Electricity.
When was this patent published?
Publication date Sep 20, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).