Intra-cluster node troubleshooting method and device

US11115263B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11115263-B2
Application number: US-202016732749-A
Country: US
Kind code: B2
Filing date: Jan 2, 2020
Priority date: Jul 12, 2017
Publication date: Sep 7, 2021
Grant date: Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application relate to an intra-cluster node troubleshooting method and device. The method includes: obtaining fault detection topology information of a cluster, where the fault detection topology information includes a fault detection relationship between all nodes in the cluster; obtaining a fault indication message, where the fault indication message is used to indicate unreachability from a detection node to a detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, where nodes that belong to different sub-clusters are unreachable to each other; and determining a working cluster based on the sub-cluster of the cluster. According to the embodiments of this application, available nodes in the cluster can be retained to a maximum extent at relatively low costs. In this way, a quantity of available nodes in the cluster is increased, and high availability is ensured.
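
The sub-cluster step in the abstract can be illustrated with a small graph sketch. This is only one possible reading, not the patentee's implementation: the function name `find_sub_clusters`, the representation of the topology as `(detector, detected)` pairs, and the choice to treat surviving detection relationships as undirected links are all assumptions made for illustration.

```python
from collections import defaultdict


def find_sub_clusters(nodes, detection_edges, fault_indications):
    """Group nodes into sub-clusters that cannot reach each other.

    nodes: iterable of node ids.
    detection_edges: (detector, detected) pairs, i.e. the fault
        detection topology (hypothetical representation).
    fault_indications: (detector, detected) pairs reported unreachable.
    """
    # Delete the detection relationships that a fault indication marks as broken.
    healthy = set(detection_edges) - set(fault_indications)

    # Assumption: a surviving relationship links both nodes in both directions.
    neighbours = defaultdict(set)
    for a, b in healthy:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Each connected component of the remaining graph is one sub-cluster.
    seen, sub_clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        sub_clusters.append(component)
    return sub_clusters


# Five nodes monitoring each other in a ring; two detections fail.
ring = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")}
faults = {("B", "C"), ("D", "E")}
sub_clusters = find_sub_clusters("ABCDE", ring, faults)
```

In this example the ring splits into the sub-clusters {A, B, E} and {C, D}; choosing which of them becomes the working cluster is a separate decision.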

First claim

Opening claim text (preview).

What is claimed is:

1. A troubleshooting method for nodes in a cluster, the method comprising: obtaining fault detection topology information of the cluster, fault detection being performed on one node in the cluster by at least one other node in the cluster, the fault detection topology information comprising a fault detection relationship between a detection node and a detected node in the cluster; receiving a fault indication message from the detection node, the fault indication message indicating unreachability from the detection node to the detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, with nodes belonging to different sub-clusters of the cluster being unreachable to each other; and determining a working cluster based on the determined sub-cluster.

2. The method according to claim 1, wherein the determining the working cluster based on the determined sub-cluster comprises any one of: determining, as the working cluster, a sub-cluster having a largest quantity of nodes; determining, as the working cluster, a sub-cluster comprising a seed node and having a largest quantity of nodes, wherein the seed node is a preconfigured node and a non-seed node joins the cluster using the seed node; determining, as the working cluster, a sub-cluster comprising a largest quantity of seed nodes; or determining, as the working cluster, a sub-cluster having a largest quantity of nodes running a main service.

3. The method according to claim 1, wherein the determining the working cluster based on the determined sub-cluster comprises: determining the working cluster based on a health status or a resource availability status of a node in the determined sub-cluster, the health status of the node being determined based on a period of time in which the node makes a response to a detection packet.

4. The method according to claim 1, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a fault detection relationship topology between nodes based on the fault detection topology information; deleting, from the fault detection relationship topology, an edge corresponding to the fault indication message to obtain an updated fault detection relationship topology; determining a connected subgraph of the updated fault detection relationship topology; and determining the sub-cluster based on the determined connected subgraph of the updated fault detection relationship topology.

5. The method according to claim 1, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a faulty node and a faulty link in the cluster based on the fault detection topology information and the fault indication message; deleting the faulty node, the faulty link, or the faulty node and the faulty link from a network topology of the cluster to obtain an updated network topology; determining a connected subgraph of the updated network topology, the updated network topology comprising information about network connections between all nodes in the cluster; and determining the sub-cluster based on the determined connected subgraph.

6. The method according to claim 1, wherein the working cluster includes a set of unreachable nodes each being a detected node that has one or more fault indication messages pointing to it, each of the one or more fault indication messages being from a detection node and indicating unreachability from the detection node to the detected node, and the method further comprises: determining, among the set of unreachable nodes in the working cluster, a first unreachable node that has a largest quantity of fault indication messages pointing to it as a to-be-deleted node; and sending a first indication message to another node in the working cluster, the first indication message indicating the to-be-deleted node.

7. The method according to claim 6, wherein the determining the first unreachable node as the to-be-deleted node comprises: determining, among the set of unreachable nodes in the working cluster, an unreachable node that has the largest quantity of the fault indication messages pointing to it and whose health status is worst, as the to-be-deleted node.

8. The method according to claim 1, wherein the obtaining the fault detection topology information of the cluster comprises: receiving the fault detection relationship sent by another node in the cluster, and determining the fault detection topology information based on the received fault detection relationship; or deducing the fault detection topology information according to a preset rule.

9. A troubleshooting device, comprising: a transceiver configured to communicate with a detection node; a memory storing instructions; and a processor in communication with the transceiver and the memory, the processor executing the instructions to perform: obtaining fault detection topology information of a cluster, fault detection being performed on one node in the cluster by at least one other node in the cluster, the fault detection topology information comprising a fault detection relationship between the detection node and a detected node in the cluster; receiving a fault indication message from the detection node, the fault indication message indicating unreachability from the detection node to the detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, with nodes belonging to different sub-clusters of the cluster being unreachable to each other; and determining a working cluster based on the determined sub-cluster.

10. The troubleshooting device according to claim 9, wherein the determining the working cluster based on the determined sub-cluster comprises any one of: determining, as the working cluster, a sub-cluster having a largest quantity of nodes; determining, as the working cluster, a sub-cluster comprising a seed node and having a largest quantity of nodes, wherein the seed node is a preconfigured node and a non-seed node joins the cluster using the seed node; determining, as the working cluster, a sub-cluster comprising a largest quantity of seed nodes; or determining, as the working cluster, a sub-cluster having a largest quantity of nodes running a main service.

11. The troubleshooting device according to claim 9, wherein the determining the working cluster based on the determined sub-cluster comprises: determining the working cluster based on a health status or a resource availability status of a node in the sub-cluster, the health status of the node being determined based on a period of time in which the node makes a response to a detection packet.

12. The troubleshooting device according to claim 9, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a fault detection relationship topology between nodes based on the fault detection topology information; deleting, from the fault detection relationship topology, an edge corresponding to the fault indication message to obtain an updated fault detection relationship topology; determining a connected subgraph of the updated fault detection relationship topology; and determining the sub-cluster based on the determined connected subgraph of the updated fault detection relationship topology.

13. The troubleshooting device according to claim 9, wherein the determining the sub-clu
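
Two of the selection rules in the claims (the seed-node option of claim 2 and the to-be-deleted node of claims 6 and 7) can be sketched as follows. This is a hedged illustration, not the claimed implementation. The function names, the `health` score map (lower meaning worse), the fallback to all sub-clusters when none contains a seed node, and the choice to count only fault indications whose detection node is also in the working cluster are all assumptions introduced here.

```python
from collections import Counter


def choose_working_cluster(sub_clusters, seed_nodes=frozenset()):
    """Pick the largest sub-cluster that contains a seed node.

    Assumption: if no sub-cluster contains a seed node, fall back to
    the largest sub-cluster overall.
    """
    seeds = set(seed_nodes)
    with_seed = [set(s) for s in sub_clusters if set(s) & seeds]
    candidates = with_seed or [set(s) for s in sub_clusters]
    return max(candidates, key=len)


def choose_node_to_delete(working_cluster, fault_indications, health=None):
    """Pick the unreachable node with the most fault indications.

    fault_indications: (detector, detected) pairs.
    health: optional {node: score}, lower meaning worse (hypothetical);
        used only to break ties, with missing scores treated as 0.
    """
    members = set(working_cluster)
    # Assumption: only indications exchanged inside the working cluster count.
    counts = Counter(
        detected
        for detector, detected in fault_indications
        if detector in members and detected in members
    )
    if not counts:
        return None
    most = max(counts.values())
    tied = [node for node, count in counts.items() if count == most]
    health = health or {}
    # Worst health first; the node id makes the result deterministic.
    return min(tied, key=lambda node: (health.get(node, 0), str(node)))


working = choose_working_cluster([{"A", "B", "E"}, {"C", "D"}], seed_nodes={"C"})
victim = choose_node_to_delete(
    {"A", "B", "C", "D"},
    {("A", "C"), ("B", "C"), ("A", "D"), ("E", "A")},
)
```

Here `working` is {C, D} because it is the only sub-cluster holding the seed node, and `victim` is C because two nodes inside the working cluster report it as unreachable. The remaining options in claim 2 (most seed nodes, most nodes running a main service) would only change the ranking key.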

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • Discovery or management of network topologies · CPC title

  • the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV · CPC title

  • using virtualisation of network functions or resources, e.g. SDN or NFV entities · CPC title

  • Additional information in the notification, e.g. enhancement of specific meta-data · CPC title

  • H04L41/065 · Primary

    involving logical or physical relationship, e.g. grouping and hierarchies · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11115263B2 cover?
Embodiments of this application relate to an intra-cluster node troubleshooting method and device. The method includes: obtaining fault detection topology information of a cluster, where the fault detection topology information includes a fault detection relationship between all nodes in the cluster; obtaining a fault indication message, where the fault indication message is used to indicate un…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification H04L41/0686. Mapped technology areas include Electricity.
When was this patent published?
Publication date Sep 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).