Method, apparatus, and system for handling virtual machine internal fault
US-9483368-B2 · Nov 1, 2016 · US
US10223183B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10223183-B2 |
| Application number | US-201615136690-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 22, 2016 |
| Priority date | Oct 24, 2013 |
| Publication date | Mar 5, 2019 |
| Grant date | Mar 5, 2019 |
Official abstract text for this publication.
A method for quickly detecting a fault includes: detecting, by a Kernel Black Box (KBox) set, a fault that has occurred in an operating system; generating, by the KBox set, fault information based on the detected fault; and transmitting, by the KBox set, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger service fault processing by the application HA subsystem. A fault or unhealthy state of the OS is thereby detected rapidly and the service application layer is notified in time to process the fault, reducing service loss.
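The flow in the abstract (detect, generate fault information, forward it through the management unit, trigger HA processing) can be pictured with a small sketch. It is illustrative only and is not the patented implementation: the class and function names (`FaultInfo`, `InfrastructureManagementUnit`, `ApplicationHASubsystem`, `detect_and_report`) are hypothetical, and the in-process callback stands in for whatever transport a real infrastructure layer would use. The four fields of the fault record follow the ones listed in claim 1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FaultInfo:
    """Fault record with the four fields named in claim 1."""
    reason: str
    description: str
    occurred_at: datetime
    vm_id: str


@dataclass(frozen=True)
class SystemFaultNotification:
    """Fault information plus the identifier of the affected virtual machine."""
    vm_id: str
    fault: FaultInfo


class InfrastructureManagementUnit:
    """Hypothetical stand-in for the management unit (IMU) in the host OS."""

    def __init__(self) -> None:
        # Maps a VM identifier to the handler of its application HA subsystem.
        self._handlers: Dict[str, Callable[[SystemFaultNotification], None]] = {}

    def register_ha_subsystem(
        self, vm_id: str, handler: Callable[[SystemFaultNotification], None]
    ) -> None:
        self._handlers[vm_id] = handler

    def report_fault(self, fault: FaultInfo) -> bool:
        """Forward a fault to the HA subsystem registered for that VM.

        Returns False when no HA subsystem is registered for the VM.
        """
        handler = self._handlers.get(fault.vm_id)
        if handler is None:
            return False
        handler(SystemFaultNotification(vm_id=fault.vm_id, fault=fault))
        return True


class ApplicationHASubsystem:
    """Hypothetical receiver; real service fault processing is out of scope."""

    def __init__(self) -> None:
        self.handled: List[SystemFaultNotification] = []

    def on_system_fault(self, notification: SystemFaultNotification) -> None:
        # Placeholder for service fault processing (for example, a failover).
        self.handled.append(notification)


def detect_and_report(
    imu: InfrastructureManagementUnit, vm_id: str, reason: str, description: str
) -> bool:
    """Build the fault record at detection time and hand it to the IMU."""
    fault = FaultInfo(reason, description, datetime.now(timezone.utc), vm_id)
    return imu.report_fault(fault)
```

In this sketch the detector never talks to the HA subsystem directly; it only knows the management unit, which mirrors the "via a management unit of an infrastructure layer" wording of the abstract.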
Opening claim text (preview).
What is claimed is:
1. A method for detecting a fault, comprising: detecting, by a processor, that a fault occurred in an operating system of a virtual machine; generating, by the processor, fault information based on detecting the fault; and transmitting, by the processor, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem, wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system; wherein the detecting, by the processor, that the fault occurred comprises: detecting, by the processor, that the fault occurred in a guest operating system; wherein the generating, by the processor, the fault information based on detecting the fault comprises: generating, by the processor, first fault information based on detecting the fault, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault, and wherein the transmitting, by the processor, the system fault notification information comprises: transmitting, by the processor, the first fault information to the IMU, wherein the IMU transmits first system fault notification information to an application HA subsystem corresponding to the virtual machine, to trigger the service fault processing of the application HA subsystem, wherein the first system fault notification information includes the first fault information and the virtual machine identifier corresponding to the virtual machine.
2. The method according to claim 1, wherein the detecting, by the processor, that the fault occurred comprises: determining, by the processor, an unexpected resetting of the operating system based on an execution flow detected by a probe used for a resetting process of the operating system, in a process of detecting the unexpected resetting of the operating system; determining, by the processor, a memory exhaustion of the operating system in a case that a probe used for a memory resource distributing process of the operating system detects that a memory of the operating system is less than or equal to a preset threshold or the probe detects that the memory of the operating system is less than or equal to a preset threshold in a preset period, in a process of detecting the memory exhaustion of the operating system; determining, by the processor, that the kernel of the operating system is locked up in a case that a probe, arranged at a central processing unit (CPU) and used for detecting an operation state, detects that the CPU is in a deadlock state, in a process of detecting the lockup of the kernel of the operating system; determining, by the processor, a kernel crash of the operating system based on an execution flow detected by a probe used for a kernel crash processing process of the operating system, in a process of detecting the kernel crash of the operating system; determining, by the processor, a fault of the CPU based on an interruption and fault reason transmitted by a probe which is arranged in the operating system and used for detecting a hardware fault interruption of the CPU, in a process of detecting a hardware fault of the CPU; and determining, by the processor, that a virtual machine is to be reset in a case that the processor detects a reset interruption of the virtual machine transmitted on the infrastructure layer, in a process of detecting a resetting of the virtual machine.
3. A method for detecting a fault of a virtual machine, comprising: receiving, by an application high availability (HA) subsystem, system fault notification information including fault information from a processor via a management unit of an infrastructure layer, wherein the management unit comprises an infrastructure management unit (IMU) provided in a host operating system; and triggering, by the application HA subsystem, a service fault processing of the application HA subsystem based on the system fault notification information, wherein the receiving, by the application HA subsystem, the system fault notification comprises: receiving, by the application HA subsystem, first system fault notification information from the IMU, wherein the first system fault notification information includes first fault information and a virtual machine identifier corresponding to the virtual machine, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault; and the triggering, by the application HA subsystem, the service fault processing of the application HA subsystem based on the system fault notification information comprises: triggering, by the application HA subsystem, the service fault processing of the application HA subsystem based on the first system fault notification information.
4. A non-transitory computer readable storage medium storing program codes that, when executed, cause a device to detect a fault of a virtual machine, by performing the steps of: detect, by a processor, that a fault occurred in an operating system of the virtual machine and generate fault information based on the fault detected by the processor, wherein detecting the fault occurred comprises: detecting, by the processor, that a fault occurred in a guest operating system; generate, by the processor, first fault information based on detecting the fault, wherein the first fault information includes a fault reason, a fault description, a fault occurrence time, and a virtual machine identifier corresponding to the fault; and transmit, by the processor, system fault notification information including the fault information to an application high availability (HA) subsystem via a management unit of an infrastructure layer, to trigger a service fault processing of the application HA subsystem, wherein the management unit is an infrastructure management unit (IMU) provided in a host operating system, wherein transmitting the system fault notification information comprises: transmitting, by the processor, the first fault information to the IMU, wherein the IMU transmits first system fault notification information to an application HA subsystem corresponding to the virtual machine, to trigger the service fault processing of the application HA subsystem, wherein the first system fault notification information includes the first fault information and a virtual machine identifier corresponding to the virtual machine.
5. The computer readable storage medium according to claim 4, wherein the processor performs the program codes to: determine, by the processor, an unexpected resetting of the operating system based on an execution flow detected by a probe used for a resetting process of the operating system, in a process of detecting the unexpected resetting of the operating system; determine, by the processor, a memory exhaustion of the operating system in a case that a probe used for a memory resource distributing process of the operating system detects that a memory of the operating system is less than or equal to a preset threshold or the probe detects that a memory of the operating system is less than or equal to a preset threshold in a preset period, in a process of detecting the memory exhaustion of the operating system; determine, by the processor, that a kernel of the operating system is locked up in a case that a probe, arranged at a central processing unit (CPU) and used for detecting an operating state, detects that the CPU is in a deadlock state, in a process of detecting the lockup of the kernel of the operating system; determine, by the processor, a kernel cra
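The memory-exhaustion condition in claims 2 and 5 (available memory at or below a preset threshold, either at a single observation or "in a preset period") can be sketched as a small check over probe samples. This is one possible reading, not the claimed implementation: the function name `memory_exhausted`, the `(timestamp, free_bytes)` sample format, and the interpretation of "in a preset period" as "continuously at or below the threshold for at least that long" are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def memory_exhausted(
    samples: Iterable[Tuple[float, int]],
    threshold_bytes: int,
    period_s: Optional[float] = None,
) -> bool:
    """Decide whether probe samples indicate memory exhaustion.

    samples: (timestamp_seconds, free_bytes) pairs in time order (assumed format).
    period_s=None: a single sample at or below the threshold is enough.
    period_s=x: free memory must stay at or below the threshold for at
    least x seconds without recovering in between (assumed interpretation).
    """
    low_since: Optional[float] = None  # start of the current low-memory run
    for timestamp, free_bytes in samples:
        if free_bytes <= threshold_bytes:
            if period_s is None:
                return True
            if low_since is None:
                low_since = timestamp
            if timestamp - low_since >= period_s:
                return True
        else:
            # Memory recovered above the threshold, so the run is reset.
            low_since = None
    return False
```

The period-based variant trades detection latency for fewer false alarms from short dips; the single-sample variant reacts immediately.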
Monitoring or debugging support · CPC title
Hypervisor-specific management and integration aspects · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers · CPC title
in a virtual computing platform, e.g. logically partitioned systems · CPC title