Rapid fault detection method and device

US10223183B2 · US · B2

Patent metadata
Publication number: US-10223183-B2
Application number: US-201615136690-A
Country: US
Kind code: B2
Filing date: Apr 22, 2016
Priority date: Oct 24, 2013
Publication date: Mar 5, 2019
Grant date: Mar 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for quickly detecting a fault includes: detecting, by a Kernel Black Box (KBox) set, a fault that has occurred in an operating system; generating, by the KBox set, fault information based on the detected fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger service fault processing in the application HA subsystem. A fault or unhealthy state of an operating system (OS) is thus detected rapidly, and the service application layer is notified in time to process the fault, reducing service loss.
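The abstract and claim 1 describe a three-hop notification path: a detector in the guest OS builds fault information, hands it to an infrastructure management unit (IMU) in the host OS, and the IMU forwards a system fault notification to the application HA subsystem of the affected virtual machine. The following is a minimal Python sketch of that data flow, assuming plain in-process method calls in place of whatever transport a real host would use. Every class and method name here (`FaultInfo`, `KBox`, `InfrastructureManagementUnit`, `AppHASubsystem`, `on_system_fault`) is hypothetical; the patent does not define an API, and the "service fault processing" step is reduced to recording the notification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class FaultInfo:
    """The four fields claim 1 lists for the 'first fault information'."""
    reason: str
    description: str
    occurred_at: datetime
    vm_id: str


@dataclass(frozen=True)
class SystemFaultNotification:
    """What the IMU forwards: the fault information plus the VM identifier."""
    fault: FaultInfo
    vm_id: str


class AppHASubsystem:
    """Hypothetical application HA subsystem for one virtual machine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handled: List[SystemFaultNotification] = []

    def on_system_fault(self, note: SystemFaultNotification) -> None:
        # The claims do not say what "service fault processing" does; this
        # sketch only records the notification where a real system might fail over.
        self.handled.append(note)


class InfrastructureManagementUnit:
    """Hypothetical IMU in the host OS that routes notifications by VM identifier."""

    def __init__(self) -> None:
        self._ha_by_vm: Dict[str, AppHASubsystem] = {}

    def register(self, vm_id: str, ha: AppHASubsystem) -> None:
        self._ha_by_vm[vm_id] = ha

    def receive(self, fault: FaultInfo) -> bool:
        ha = self._ha_by_vm.get(fault.vm_id)
        if ha is None:
            return False  # no HA subsystem registered for this VM
        ha.on_system_fault(SystemFaultNotification(fault=fault, vm_id=fault.vm_id))
        return True


class KBox:
    """Hypothetical in-guest detector that reports each fault to the IMU."""

    def __init__(self, vm_id: str, imu: InfrastructureManagementUnit) -> None:
        self.vm_id = vm_id
        self.imu = imu

    def report(self, reason: str, description: str) -> bool:
        fault = FaultInfo(reason, description, datetime.now(timezone.utc), self.vm_id)
        return self.imu.receive(fault)


# Usage: one VM, one HA subsystem, one reported kernel crash.
imu = InfrastructureManagementUnit()
ha_vm1 = AppHASubsystem("ha-vm-1")
imu.register("vm-1", ha_vm1)
delivered = KBox("vm-1", imu).report("kernel_crash", "crash-path probe fired")
print(delivered, len(ha_vm1.handled))  # True 1
```

Routing on the virtual machine identifier is what lets a single IMU serve several guests: each VM's HA subsystem receives only the faults reported for that VM.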

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting a fault, comprising: detecting, by a processor, that a fault occurred in an operating system of a virtual machine; generating, by the processor, fault information based on detecting the fault; and transmitting, by the processor, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem, wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system; wherein the detecting, by the processor, that the fault occurred comprises: detecting, by the processor, that the fault occurred in a guest operating system; wherein the generating, by the processor, the fault information based on detecting the fault comprises: generating, by the processor, first fault information based on detecting the fault, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault, and wherein the transmitting, by the processor, the system fault notification information comprises: transmitting, by the processor, the first fault information to the IMU, wherein the IMU transmits first system fault notification information to an application HA subsystem corresponding to the virtual machine, to trigger the service fault processing of the application HA subsystem, wherein the first system fault notification information includes the first fault information and the virtual machine identifier corresponding to the virtual machine.

2. The method according to claim 1, wherein the detecting, by the processor, that the fault occurred comprises: determining, by the processor, an unexpected resetting of the operating system based on an execution flow detected by a probe used for a resetting process of the operating system, in a process of detecting the unexpected resetting of the operating system; determining, by the processor, a memory exhaustion of the operating system in a case that a probe used for a memory resource distributing process of the operating system detects that a memory of the operating system is less than or equal to a preset threshold or the probe detects that the memory of the operating system is less than or equal to a preset threshold in a preset period, in a process of detecting the memory exhaustion of the operating system; determining, by the processor, that the kernel of the operating system is locked up in a case that a probe, arranged at a central processing unit (CPU) and used for detecting an operation state, detects that the CPU is in a deadlock state, in a process of detecting the lockup of the kernel of the operating system; determining, by the processor, a kernel crash of the operating system based on an execution flow detected by a probe used for a kernel crash processing process of the operating system, in a process of detecting the kernel crash of the operating system; determining, by the processor, a fault of the CPU based on an interruption and fault reason transmitted by a probe which is arranged in the operating system and used for detecting a hardware fault interruption of the CPU, in a process of detecting a hardware fault of the CPU; and determining, by the processor, that a virtual machine is to be reset in a case that the processor detects a reset interruption of the virtual machine transmitted on the infrastructure layer, in a process of detecting a resetting of the virtual machine.

3. A method for detecting a fault of a virtual machine, comprising: receiving, by an application high availability (HA) subsystem, system fault notification information including fault information from a processor via a management unit of an infrastructure layer, wherein the management unit comprises an infrastructure management unit (IMU) provided in a host operating system; and triggering, by the application HA subsystem, a service fault processing of the application HA subsystem based on the system fault notification information, wherein the receiving, by the application HA subsystem, the system fault notification comprises: receiving, by the application HA subsystem, first system fault notification information from the IMU, wherein the first system fault notification information includes first fault information and a virtual machine identifier corresponding to the virtual machine, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault; and the triggering, by the application HA subsystem, the service fault processing of the application HA subsystem based on the system fault notification information comprises: triggering, by the application HA subsystem, the service fault processing of the application HA subsystem based on the first system fault notification information.

4. A non-transitory computer readable storage medium storing program codes that, when executed, cause a device to detect a fault of a virtual machine, by performing the steps of: detect, by a processor, that a fault occurred in an operating system of the virtual machine and generate fault information based on the fault detected by the processor, wherein detecting the fault occurred comprises: detecting, by the processor, that a fault occurred in a guest operating system; generate, by the processor, first fault information based on detecting the fault, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault; and transmit, by the processor, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem, wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system, wherein transmitting the system fault notification information comprises: transmitting, by the processor, the first fault information to the IMU, wherein the IMU transmits first system fault notification information to an application HA subsystem corresponding to the virtual machine, to trigger the service fault processing of the application HA subsystem, wherein the first system fault notification information includes the first fault information and a virtual machine identifier corresponding to the virtual machine.

5. The computer readable storage medium according to claim 4, wherein the processor performs the program codes to: determine, by the processor, an unexpected resetting of the operating system based on an execution flow detected by a probe used for a resetting process of the operating system, in a process of detecting the unexpected resetting of the operating system; determine, by the processor, a memory exhaustion of the operating system in a case that a probe used for a memory resource distributing process of the operating system detects that a memory of the operating system is less than or equal to a preset threshold or the probe detects that a memory of the operating system is less than or equal to a preset threshold in a preset period, in a process of detecting the memory exhaustion of the operating system; determine, by the processor, that a kernel of the operating system is locked up in a case that a probe, arranged at a central processing unit (CPU) and used for detecting an operating state, detects that the CPU is in a deadlock state, in a process of detecting the lockup of the kernel of the operating system; determine, by the processor, a kernel cra…
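Claims 2 and 5 list the conditions a probe-based detector checks: execution reaching the OS reset path or the kernel crash path, free memory at or below a preset threshold (seen by a probe on the memory allocation path), a CPU deadlock, a CPU hardware fault interrupt, and a VM reset interrupt from the infrastructure layer. The Python sketch below maps such probe events to fault types. The probe source names, the `ProbeEvent` fields, and the reading of "less than or equal to a preset threshold in a preset period" as "any sample taken within the last `period_s` seconds" are assumptions for illustration, not definitions taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class FaultType(Enum):
    UNEXPECTED_OS_RESET = "unexpected_os_reset"
    MEMORY_EXHAUSTION = "memory_exhaustion"  # reported when memory_exhausted() is True
    KERNEL_LOCKUP = "kernel_lockup"
    KERNEL_CRASH = "kernel_crash"
    CPU_HARDWARE_FAULT = "cpu_hardware_fault"
    VM_RESET = "vm_reset"


@dataclass(frozen=True)
class ProbeEvent:
    """Hypothetical probe output; the source names used below are assumptions."""
    source: str
    detail: str = ""              # e.g. fault reason carried by a hardware interrupt
    cpu_deadlocked: bool = False  # set by the CPU operating-state probe


def memory_exhausted(
    samples: Sequence[Tuple[float, int]],  # (timestamp in s, free MiB), oldest first
    threshold: int,
    period_s: float,
) -> bool:
    """True if the latest sample, or any sample in the last period_s seconds, is <= threshold."""
    if not samples:
        return False
    now, latest = samples[-1]
    if latest <= threshold:
        return True
    return any(free <= threshold for ts, free in samples if now - ts <= period_s)


def classify(event: ProbeEvent) -> Optional[Tuple[FaultType, str]]:
    """Map a non-memory probe event to (fault type, fault reason); None means no fault."""
    if event.source == "reset_path":
        return FaultType.UNEXPECTED_OS_RESET, "execution reached the OS reset path"
    if event.source == "crash_path":
        return FaultType.KERNEL_CRASH, "execution reached the kernel crash handler"
    if event.source == "cpu_state" and event.cpu_deadlocked:
        return FaultType.KERNEL_LOCKUP, "CPU reported in a deadlock state"
    if event.source == "cpu_hw_interrupt":
        return FaultType.CPU_HARDWARE_FAULT, event.detail or "unspecified hardware fault"
    if event.source == "vm_reset_interrupt":
        return FaultType.VM_RESET, "reset interrupt from the infrastructure layer"
    return None


# Example: free memory dipped to 120 MiB at t=5 s, then recovered to 400 MiB.
samples = [(0.0, 900), (5.0, 120), (10.0, 400)]
print(memory_exhausted(samples, threshold=128, period_s=10.0))  # True
print(memory_exhausted(samples, threshold=128, period_s=4.0))   # False
print(classify(ProbeEvent("cpu_state", cpu_deadlocked=True)))
```

The fault type and reason string returned here would become the "fault reason" and "fault description" fields of the fault information that is sent on to the IMU.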

Assignees

Huawei Tech Co Ltd

Classifications

  • Monitoring or debugging support

  • Hypervisor-specific management and integration aspects

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers

  • in a virtual computing platform, e.g. logically partitioned systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10223183B2 cover?
A method for quickly detecting a fault includes: detecting, by a Kernel Black Box KBox set, a fault occurred in an operation system; and generating, by the KBox set, fault information based on the detected fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability HA subsystem via a management unit of an in…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F11/0772. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).