What technology area does this patent fall under?

Primary CPC classification H04L41/0686. Mapped technology areas include Electricity.

When was this patent published?

Published Sep 7, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Intra-cluster node troubleshooting method and device

US11115263B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11115263-B2
Application number	US-202016732749-A
Country	US
Kind code	B2
Filing date	Jan 2, 2020
Priority date	Jul 12, 2017
Publication date	Sep 7, 2021
Grant date	Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application relate to an intra-cluster node troubleshooting method and device. The method includes: obtaining fault detection topology information of a cluster, where the fault detection topology information includes a fault detection relationship between all nodes in the cluster; obtaining a fault indication message, where the fault indication message is used to indicate unreachability from a detection node to a detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, where nodes that belong to different sub-clusters are unreachable to each other; and determining a working cluster based on the sub-cluster of the cluster. According to the embodiments of this application, available nodes in the cluster can be retained to a maximum extent at relatively low costs. In this way, a quantity of available nodes in the cluster is increased, high availability is ensured.
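
The sub-cluster step in the abstract can be pictured with a small sketch. It is not the patent's implementation: it assumes the fault detection topology is treated as an undirected graph of detector/detected pairs, that each fault indication message removes the matching pair, and that the connected components of what remains are the sub-clusters (the edge-deletion reading in claim 4). The name `find_sub_clusters`, the node ids, and the data shapes are hypothetical.

```python
from collections import defaultdict


def find_sub_clusters(nodes, detection_edges, fault_indications):
    """Return sub-clusters as a list of sorted node lists.

    nodes: iterable of node ids
    detection_edges: iterable of (detector, detected) pairs
    fault_indications: iterable of (detector, detected) pairs reported unreachable
    """
    failed = set(fault_indications)
    adjacency = defaultdict(set)
    for detector, detected in detection_edges:
        if (detector, detected) in failed:
            continue  # drop the edge named by a fault indication message
        # Assumption: reachability is treated as symmetric in this sketch.
        adjacency[detector].add(detected)
        adjacency[detected].add(detector)

    seen = set()
    sub_clusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sub_clusters.append(sorted(component))
    return sub_clusters


# Hypothetical ring of four nodes, each detecting its successor.
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
reports = [("b", "c"), ("d", "a")]
print(find_sub_clusters("abcd", ring, reports))
```

In this example the two reported pairs cut the ring in two, leaving `a` with `b` and `c` with `d`. If two nodes detect each other, a single report removes only one of the two pairs here, so the nodes stay connected; whether that matches a given deployment is a design choice outside this sketch.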

First claim

Opening claim text (preview).

What is claimed is:

1. A troubleshooting method for nodes in a cluster, the method comprising: obtaining fault detection topology information of the cluster, fault detection being performed on one node in the cluster by at least one other node in the cluster, the fault detection topology information comprising a fault detection relationship between a detection node and a detected node in the cluster; receiving a fault indication message from the detection node, the fault indication message indicating unreachability from the detection node to the detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, with nodes belonging to different sub-clusters of the cluster being unreachable to each other; and determining a working cluster based on the determined sub-cluster.

2. The method according to claim 1, wherein the determining the working cluster based on the determined sub-cluster comprises any one of: determining, as the working cluster, a sub-cluster having a largest quantity of nodes; determining, as the working cluster, a sub-cluster comprising a seed node and having a largest quantity of nodes, wherein the seed node is a preconfigured node and a non-seed node joins the cluster using the seed node; determining, as the working cluster, a sub-cluster comprising a largest quantity of seed nodes; or determining, as the working cluster, a sub-cluster having a largest quantity of nodes running a main service.

3. The method according to claim 1, wherein the determining the working cluster based on the determined sub-cluster comprises: determining the working cluster based on a health status or a resource availability status of a node in the determined sub-cluster, the health status of the node being determined based on a period of time in which the node makes a response to a detection packet.

4. The method according to claim 1, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a fault detection relationship topology between nodes based on the fault detection topology information; deleting, from the fault detection relationship topology, an edge corresponding to the fault indication message to obtain an updated fault detection relationship topology; determining a connected subgraph of the updated fault detection relationship topology; and determining the sub-cluster based on the determined connected subgraph of the updated fault detection relationship topology.

5. The method according to claim 1, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a faulty node and a faulty link in the cluster based on the fault detection topology information and the fault indication message; deleting the faulty node, the faulty link, or the faulty node and the faulty link from a network topology of the cluster to obtain an updated network topology; determining a connected subgraph of the updated network topology, the updated network topology comprising information about network connections between all nodes in the cluster; and determining the sub-cluster based on the determined connected subgraph.

6. The method according to claim 1, wherein the working cluster includes a set of unreachable nodes each being a detected node that has one or more fault indication messages pointing to it, each of the one or more fault indication messages being from a detection node and indicating unreachability from the detection node to the detected node, and the method further comprises: determining, among the set of unreachable nodes in the working cluster, a first unreachable node that has a largest quantity of fault indication messages pointing to it as a to-be-deleted node; and sending a first indication message to another node in the working cluster, the first indication message indicating the to-be-deleted node.

7. The method according to claim 6, wherein the determining the first unreachable node as the to-be-deleted node comprises: determining, among the set of unreachable nodes in the working cluster, an unreachable node that has the largest quantity of the fault indication messages pointing to it and whose health status is worst, as the to-be-deleted node.

8. The method according to claim 1, wherein the obtaining the fault detection topology information of the cluster comprises: receiving the fault detection relationship sent by another node in the cluster, and determining the fault detection topology information based on the received fault detection relationship; or deducing the fault detection topology information according to a preset rule.

9. A troubleshooting device, comprising: a transceiver configured to communicate with a detection node; a memory storing instructions; and a processor in communication with the transceiver and the memory, the processor executing the instructions to perform: obtaining fault detection topology information of a cluster, fault detection being performed on one node in the cluster by at least one other node in the cluster, the fault detection topology information comprising a fault detection relationship between the detection node and a detected node in the cluster; receiving a fault indication message from the detection node, the fault indication message indicating unreachability from the detection node to the detected node; determining a sub-cluster of the cluster based on the fault detection topology information and the fault indication message, with nodes belonging to different sub-clusters of the cluster being unreachable to each other; and determining a working cluster based on the determined sub-cluster.

10. The troubleshooting device according to claim 9, wherein the determining the working cluster based on the determined sub-cluster comprises any one of: determining, as the working cluster, a sub-cluster having a largest quantity of nodes; determining, as the working cluster, a sub-cluster comprising a seed node and having a largest quantity of nodes, wherein the seed node is a preconfigured node and a non-seed node joins the cluster using the seed node; determining, as the working cluster, a sub-cluster comprising a largest quantity of seed nodes; or determining, as the working cluster, a sub-cluster having a largest quantity of nodes running a main service.

11. The troubleshooting device according to claim 9, wherein the determining the working cluster based on the determined sub-cluster comprises: determining the working cluster based on a health status or a resource availability status of a node in the sub-cluster, the health status of the node being determined based on a period of time in which the node makes a response to a detection packet.

12. The troubleshooting device according to claim 9, wherein the determining the sub-cluster based on the fault detection topology information and the fault indication message comprises: determining a fault detection relationship topology between nodes based on the fault detection topology information; deleting, from the fault detection relationship topology, an edge corresponding to the fault indication message to obtain an updated fault detection relationship topology; determining a connected subgraph of the updated fault detection relationship topology; and determining the sub-cluster based on the determined connected subgraph of the updated fault detection relationship topology.

13. The troubleshooting device according to claim 9, wherein the determining the sub-clu
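
Two of the selection rules in the claims lend themselves to a short illustration. The sketch below is an assumption-laden reading, not the claimed implementation: `choose_working_cluster` covers the "largest quantity of nodes" option of claim 2, optionally restricted to sub-clusters that contain a seed node, and `pick_node_to_delete` counts fault indication messages per detected node as in claim 6. Both function names, the tie-breaking behavior, and the input shapes are hypothetical.

```python
from collections import Counter


def choose_working_cluster(sub_clusters, seed_nodes=None):
    """Return the largest sub-cluster, or None if no candidate qualifies.

    If seed_nodes is given, only sub-clusters containing at least one
    seed node are considered. Ties go to the earliest candidate, which
    is an assumption of this sketch.
    """
    candidates = list(sub_clusters)
    if seed_nodes is not None:
        seeds = set(seed_nodes)
        candidates = [c for c in candidates if seeds.intersection(c)]
    if not candidates:
        return None
    return max(candidates, key=len)


def pick_node_to_delete(working_cluster, fault_indications):
    """Return the member with the most fault indications pointing to it.

    fault_indications: iterable of (detector, detected) pairs.
    Ties are broken by node id, again an assumption of this sketch.
    """
    members = set(working_cluster)
    counts = Counter(
        detected
        for _detector, detected in fault_indications
        if detected in members
    )
    if not counts:
        return None
    return min(counts, key=lambda node: (-counts[node], node))


# Hypothetical sub-clusters and fault reports.
clusters = [["a", "b", "c"], ["d", "e"]]
working = choose_working_cluster(clusters)
reports = [("a", "c"), ("b", "c"), ("a", "b")]
print(working, pick_node_to_delete(working, reports))
```

The other options in claim 2 (most seed nodes, most nodes running a main service) and the health-status refinement in claim 7 would change only the ranking key, so they are left out here.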

Assignees

Huawei Tech Co Ltd

Classifications

Code · CPC title
H04L41/12 · Discovery or management of network topologies
H04L43/20 · the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV
H04L41/40 · using virtualisation of network functions or resources, e.g. SDN or NFV entities
H04L41/0686 (Primary) · Additional information in the notification, e.g. enhancement of specific meta-data
H04L41/065 (Primary) · involving logical or physical relationship, e.g. grouping and hierarchies

Patent family

Related publications grouped by family.

Patent family 65001087

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11115263B2 cover?: Embodiments of this application relate to an intra-cluster node troubleshooting method and device. The method includes: obtaining fault detection topology information of a cluster, where the fault detection topology information includes a fault detection relationship between all nodes in the cluster; obtaining a fault indication message, where the fault indication message is used to indicate un…
Who is the assignee on this patent?: Huawei Tech Co Ltd

Related patents

Input/output fencing optimization

Systems and methods for preventing split-brain scenarios in high-availability clusters

Systems and methods for preventing failures of nodes in clusters

Systems and methods for changing fencing modes in clusters
