Rapid fault detection method and device

US2016239369A1 · US · A1

Patent metadata
Publication number: US-2016239369-A1
Application number: US-201615136690-A
Country: US
Kind code: A1
Filing date: Apr 22, 2016
Priority date: Oct 24, 2013
Publication date: Aug 18, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for quickly detecting a fault includes: detecting, by a Kernel Black Box (KBox) set, a fault that occurred in an operating system; generating, by the KBox set, fault information based on the detected fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem. Thus, a fault or unhealthy state of an operating system (OS) is detected rapidly and the service application layer is notified in time to process the fault, reducing service loss.
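The flow described in the abstract (detect, generate fault information, relay it through an infrastructure-layer management unit, trigger the HA subsystem) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Huawei's implementation: the class and method names (`FaultInfo`, `KBox.report`, `ManagementUnit.forward`, `AppHASubsystem.on_system_fault`) are hypothetical, the management unit is modeled as a simple in-process relay, and fault detection itself (the probes) is left to whatever code calls `report`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class FaultInfo:
    # Fault reason, description and occurrence time, as listed in claims 3 and 4;
    # vm_id is filled in only when the KBox runs inside a virtual machine.
    reason: str
    description: str
    occurred_at: datetime
    vm_id: Optional[str] = None


@dataclass
class AppHASubsystem:
    # Hypothetical application HA subsystem: it records each notification
    # at the point where a real system would start its service fault processing.
    handled: List[dict] = field(default_factory=list)

    def on_system_fault(self, notification: dict) -> None:
        self.handled.append(notification)


@dataclass
class ManagementUnit:
    # Hypothetical infrastructure-layer relay (an IMU or a BMC in the claims)
    # that wraps fault information into a system fault notification.
    subscribers: List[AppHASubsystem]

    def forward(self, info: FaultInfo) -> None:
        notification = {"type": "system_fault", "fault": info}
        for ha in self.subscribers:
            ha.on_system_fault(notification)


class KBox:
    # Hypothetical KBox: a probe that has detected a fault calls report(),
    # which generates the fault information and hands it to the management unit.
    def __init__(self, mgmt: ManagementUnit, vm_id: Optional[str] = None) -> None:
        self.mgmt = mgmt
        self.vm_id = vm_id

    def report(self, reason: str, description: str) -> FaultInfo:
        info = FaultInfo(reason, description, datetime.now(timezone.utc), self.vm_id)
        self.mgmt.forward(info)
        return info


# Usage: one KBox in VM "vm-1" reports a kernel crash to a single HA subsystem.
ha = AppHASubsystem()
kbox = KBox(ManagementUnit([ha]), vm_id="vm-1")
kbox.report("kernel_crash", "kernel crash handler entered")
```

The KBox never talks to the HA subsystem directly; every notification passes through the management unit, which mirrors the "via a management unit of an infrastructure layer" wording of claim 1.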

First claim

Opening claim text (preview).

1. A method for detecting a fault, comprising: detecting, by a Kernel Black Box (KBox) set, that a fault occurred in an operating system; generating, by the KBox set, fault information based on detecting the fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem.

2. The method according to claim 1, wherein the detecting, by the KBox set, that the fault occurred comprises: determining, by the KBox set, an unexpected resetting of the operating system based on an execution flow detected by a probe used for a resetting process of the operating system, in a process of detecting the unexpected resetting of the operating system; determining, by the KBox set, a memory exhaustion of the operating system in a case that a probe used for a memory resource distributing process of the operating system detects that a memory of the operating system is less than or equal to a preset threshold or the probe detects that the memory of the operating system is less than or equal to a preset threshold in a preset period, in a process of detecting the memory exhaustion of the operating system; determining, by the KBox set, that the kernel of the operating system is locked up in a case that a probe, arranged at a central processing unit (CPU) and used for detecting an operation state, detects that the CPU is in a deadlock state, in a process of detecting the lockup of the kernel of the operating system; determining, by the KBox set, a kernel crash of the operating system based on an execution flow detected by a probe used for a kernel crash processing process of the operating system, in a process of detecting the kernel crash of the operating system; determining, by the KBox set, a fault of the CPU based on an interruption and fault reason transmitted by a probe which is arranged in the operating system and used for detecting a hardware fault interruption of the CPU, in a process of detecting a hardware fault of the CPU; and determining, by the KBox set, that a virtual machine is to be reset in a case that the KBox set detects a reset interruption of the virtual machine transmitted on the infrastructure layer, in a process of detecting a resetting of the virtual machine.

3. The method according to claim 1, wherein the detecting, by the KBox set, that the fault occurred and generating, by the KBox set, the fault information based on the detected fault comprises: detecting, by a first KBox of the KBox set, a fault occurred in a guest operating system and generating, by the first KBox, first fault information based on the detected fault, wherein the first KBox is provided in a virtual machine, and the first fault information includes a fault reason, a fault description, a fault occurrence time and a virtual machine identifier corresponding to the fault; and wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system, and the transmitting, by the KBox set, system fault notification information including the fault information to an application high availability HA subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem comprises: transmitting, by the first KBox, the first fault information to the IMU, wherein the IMU transmits first system fault notification information to an application HA subsystem corresponding to the virtual machine, to trigger the service fault processing of the application HA subsystem, wherein the first system fault notification information includes the first fault information and the virtual machine identifier corresponding to the first KBox.

4. The method according to claim 1, wherein the detecting, by the KBox set, that the fault occurred and generating, by the KBox set, the fault information comprises: detecting, by a second KBox of the KBox set, a fault occurred in a host operating system and generating, by the second KBox, second fault information based on the detected fault, wherein the second KBox is provided in the host operating system and the second fault information includes a fault reason, a fault description and a fault occurrence time; and wherein the management unit is a board management controller (BMC) implemented in hardware, and the transmitting, by the KBox set, system fault notification information including the fault information to an application high availability HA subsystem via a management unit of an infrastructure layer, to trigger the service fault processing of the application HA subsystem comprises: transmitting, by the second KBox, the second fault information to the BMC, wherein the BMC transmits second system fault notification information including the second fault information to an application HA subsystem corresponding to at least one virtual machine, to trigger the service fault processing of the application HA subsystem, and the at least one virtual machine is established on the host operating system.

5. The method according to claim 1, further comprising: transmitting, by the KBox set, the fault information to an infrastructure HA subsystem via the management unit of the infrastructure layer, wherein the infrastructure HA subsystem transmits the system fault notification information including the fault information to the application HA subsystem.

6. A method for detecting a fault, comprising: receiving, by an application high availability (HA) subsystem, system fault notification information including fault information from a Kernel Black Box (KBox) set via a management unit of an infrastructure layer; and triggering, by the application HA subsystem, a service fault processing of the application HA subsystem based on the system fault notification information.

7. The method according to claim 6, wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system; and wherein the receiving, by an application HA subsystem, system fault notification information including fault information from a KBox set via a management unit of an infrastructure layer comprises: receiving, by the application HA subsystem, first system fault notification information from the IMU, wherein the first system fault notification information includes first fault information and a virtual machine identifier corresponding to a first KBox of the KBox set, wherein the first KBox is provided in a virtual machine, and the first fault information includes a fault reason, a fault description, a fault occurrence time and a virtual machine identifier corresponding to the fault; and the triggering, by the application HA subsystem, a service fault processing of the application HA subsystem based on the system fault notification information comprises: triggering, by the application HA subsystem, the service fault processing of the application HA subsystem based on the first system fault notification information.

8. The method according to claim 6, wherein the management unit is a board management controller (BMC) implemented in hardware; and wherein the receiving, by an application HA subsystem, system fault notification information including fault information from a KBox set via a management unit of an infrastructure layer comprises: receiving, by the application HA subsystem, second system fault notification information from a second KBox of the KBox set via the BMC, wherein the second system fault notification information includes second fault information, the second KBox is provided in the host operating system, and the second fault information…
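Claims 2 to 4 add two details: which fault types the KBox set recognizes, and which management unit carries the notification depending on where the KBox runs. The following Python sketch models both under stated assumptions: the names `FaultKind`, `Fault`, `Router`, `imu_forward` and `bmc_forward` are hypothetical, plain lists stand in for the real notification channels, and fanning a host fault out to every VM on the host is an assumption (claim 4 only requires notifying the HA subsystem of at least one VM established on the host).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class FaultKind(Enum):
    # The six detection cases enumerated in claim 2.
    UNEXPECTED_OS_RESET = "unexpected_os_reset"
    MEMORY_EXHAUSTION = "memory_exhaustion"
    KERNEL_LOCKUP = "kernel_lockup"
    KERNEL_CRASH = "kernel_crash"
    CPU_HARDWARE_FAULT = "cpu_hardware_fault"
    VM_RESET = "vm_reset"


@dataclass
class Fault:
    kind: FaultKind
    description: str
    occurred_at: float  # e.g. a POSIX timestamp
    vm_id: Optional[str] = None  # set only for guest-OS faults (claim 3)


@dataclass
class Router:
    # Hypothetical map from VM identifier to the inbox of the application HA
    # subsystem for that VM; lists stand in for real notification channels.
    ha_inbox: Dict[str, List[dict]] = field(default_factory=dict)

    def imu_forward(self, vm_id: str, fault: Fault) -> None:
        # Claim 3: KBox in a guest OS -> IMU in the host OS -> HA subsystem of
        # that VM only, with the sending KBox's VM identifier attached.
        self.ha_inbox[vm_id].append({"via": "IMU", "vm_id": vm_id, "fault": fault})

    def bmc_forward(self, fault: Fault) -> None:
        # Claim 4: KBox in the host OS -> BMC -> HA subsystems of VMs on that host.
        # Notifying every VM is an assumption of this sketch.
        for vm_id, inbox in self.ha_inbox.items():
            inbox.append({"via": "BMC", "vm_id": vm_id, "fault": fault})


# Usage: a guest memory exhaustion in vm-1, then a host CPU hardware fault.
router = Router({"vm-1": [], "vm-2": []})
router.imu_forward("vm-1", Fault(FaultKind.MEMORY_EXHAUSTION, "free memory at or below threshold", 0.0, "vm-1"))
router.bmc_forward(Fault(FaultKind.CPU_HARDWARE_FAULT, "hardware fault interrupt", 1.0))
```

After these two calls, vm-1's HA subsystem holds both notifications while vm-2's holds only the host-level one. Routing host faults through the BMC is presumably chosen because a hardware controller can still report when the host OS itself has failed; the claims state the arrangement but this page does not give the rationale.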

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers · CPC title

  • in a virtual computing platform, e.g. logically partitioned systems · CPC title

  • Monitoring or debugging support · CPC title

  • within a central processing unit [CPU] · CPC title

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016239369A1 cover?
A method for quickly detecting a fault includes: detecting, by a Kernel Black Box (KBox) set, a fault that occurred in an operating system; generating, by the KBox set, fault information based on the detected fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an in…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F11/0772. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 18, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).