Who is the assignee on this patent?

Beijing Baidu Netcom Science And Tech Ltd, Beijing Baidu Netcom Sci & Tec

What technology area does this patent fall under?

Primary CPC classification G06F11/1438. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 11, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for monitoring virtual machine cluster

US10152382B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10152382-B2
Application number	US-201615239612-A
Country	US
Kind code	B2
Filing date	Aug 17, 2016
Priority date	Oct 26, 2015
Publication date	Dec 11, 2018
Grant date	Dec 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for monitoring a virtual machine cluster comprising sending, by physical machine, state parameter query instruction to a virtual machine in the virtual machine cluster at a first preset time interval; sending response information to the physical machine in response to receiving the query; the physical machine determining that the virtual machine is faulty, in response to the response information beyond a second preset time, judging whether the faulty machine satisfies a restart condition, and sending a restart instruction to a second machine on which the faulty machine runs, if the faulty machine satisfies the restart condition, by the virtual machine; and restarting, the second physical machine, the faulty virtual machine according to the restart instruction. The disclosure can be used to monitor virtual machines and recover a faulty virtual machine, thereby improving the availability of the virtual machine cluster and shortening service intervals.
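The detection step summarized in the abstract can be sketched as a small piece of state kept by the first physical machine. This is a minimal illustration under stated assumptions, not the patentee's implementation: the class `HeartbeatMonitor`, its method names, and the choice to pass timestamps in explicitly (rather than sending real network queries) are hypothetical. `query_interval` stands in for the first preset time interval and `response_timeout` for the second preset time; treating a reply that arrives late as a fault is one reading of "response information beyond a second preset time".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of the first physical machine's monitoring state.

    query_interval: the "first preset time interval" between state queries.
    response_timeout: the "second preset time"; a reply that is missing, or that
    arrived later than this after the query, marks the VM as faulty.
    """
    query_interval: float
    response_timeout: float
    last_query: Dict[str, float] = field(default_factory=dict)     # vm_id -> query time
    last_response: Dict[str, float] = field(default_factory=dict)  # vm_id -> reply time

    def due_for_query(self, vm_id: str, now: float) -> bool:
        """True when the first preset interval has elapsed since the last query."""
        sent = self.last_query.get(vm_id)
        return sent is None or now - sent >= self.query_interval

    def record_query(self, vm_id: str, now: float) -> None:
        self.last_query[vm_id] = now

    def record_response(self, vm_id: str, now: float) -> None:
        self.last_response[vm_id] = now

    def is_faulty(self, vm_id: str, now: float) -> bool:
        """Apply the timeout rule to the most recent query sent to vm_id."""
        sent = self.last_query.get(vm_id)
        if sent is None:
            return False  # never queried, so there is nothing to judge yet
        answered = self.last_response.get(vm_id)
        if answered is not None and answered >= sent:
            # A reply exists for the latest query; it only counts if it was in time.
            return answered - sent > self.response_timeout
        # No reply yet: faulty once the waiting time exceeds the timeout.
        return now - sent > self.response_timeout


monitor = HeartbeatMonitor(query_interval=10.0, response_timeout=3.0)
monitor.record_query("vm-1", 0.0)
monitor.record_query("vm-2", 0.0)
monitor.record_response("vm-1", 1.2)
print(monitor.is_faulty("vm-1", 5.0))  # False: replied after 1.2 s
print(monitor.is_faulty("vm-2", 5.0))  # True: no reply within 3 s
```

Passing `now` explicitly keeps the rule deterministic and easy to test; a real monitor would drive it from a timer that fires every `query_interval` seconds.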

First claim

Opening claim text (preview).

What is claimed is:

1. A method for monitoring a virtual machine cluster, comprising: sending, by a first physical machine, a virtual machine state parameter query instruction to a virtual machine in the virtual machine cluster at a first preset time interval; sending, by the virtual machine, response information to the first physical machine in response to receiving the query instruction; determining, by the first physical machine, that the virtual machine is faulty, in response to the response information beyond a second preset time, judging, by the first physical machine, whether the faulty virtual machine satisfies a preset restart condition, and sending, by the first physical machine, a virtual machine restart instruction to a second physical machine on which the faulty virtual machine runs, if the faulty virtual machine satisfies the preset restart condition; restarting, by the second physical machine, the faulty virtual machine according to the virtual machine restart instruction; sending, by the second physical machine, a restart response signal to the first physical machine, when restarting the faulty virtual machine; obtaining, by the first physical machine, an address of the faulty virtual machine from pre-recorded meta-information of virtual machines in response to receiving the restart response signal, connecting, by the first physical machine, to the restarted virtual machine according to the address, and sending, by the first physical machine, a first service process restart signal to the restarted virtual machine; and starting, by the restarted virtual machine, a service process of the restarted virtual machine according to the first service process restart signal.

2. The method according to claim 1, wherein the sending of a virtual machine restart instruction to a second physical machine on which the faulty virtual machine runs, if the faulty virtual machine satisfies the preset restart condition, comprises: sending the virtual machine restart instruction to the second physical machine, if a ratio of the faulty virtual machines is smaller than a preset ratio; or sending the virtual machine restart instruction to the second physical machine, if an interval from a preceding virtual machine restart or reconstruction of the faulty virtual machine exceeds a third preset time.

3. The method according to claim 1, further comprising: determining, by the first physical machine, a restart failure of the faulty virtual machine in response to the restart response signal being not received within a preset time after sending the virtual machine restart instruction, and sending, by the first physical machine, a virtual machine reconstruction instruction to a third physical machine in response to times of the restart failure reaching preset times, wherein the third physical machine is a physical machine, except for the second physical machine, in a host physical machine cluster of the virtual machine cluster; and reconstructing, by the third physical machine, the faulty virtual machine according to the virtual machine reconstruction instruction.

4. The method according to claim 3, further comprising: sending, by the third physical machine, a reconstruction response signal to the first physical machine; obtaining, by the first physical machine, meta-information of the faulty virtual machine from the meta-information of the virtual machines in response to receiving the reconstruction response signal, and sending, by the first physical machine, a node recovery instruction to the reconstructed virtual machine according to the obtained meta-information; and downloading, by the reconstructed virtual machine, previously backed-up incremental data associated with a previous management node from a remote storage according to the node recovery instruction, if it is determined that the reconstructed virtual machine is a management node according to the node recovery instruction; recovering, by the reconstructed virtual machine, metadata of the reconstructed management node based on the incremental data; accepting, by the reconstructed virtual machine, a registration of a computing node in the virtual machine cluster; and registering, by the reconstructed virtual machine, to the management node in the virtual machine cluster according to the node recovery instruction, if it is determined that the reconstructed virtual machine is a computing node according to the node recovery instruction.

5. The method according to claim 4, further comprising: determining reconstruction success and sending a reconstruction success indication signal, by the reconstructed management node, to the first physical machine in response to a ratio of computing nodes in the virtual machine cluster registered within a preset time being larger than or equal to a preset ratio, and sending, by the reconstructed management node, a reconstruction failure indication alarm signal to the first physical machine in response to the ratio of computing nodes in the virtual machine cluster registered within the preset time being smaller than the preset ratio; and submitting, by the first physical machine, a received user job to the reconstructed management node according to the reconstruction success indication signal, and displaying, by the first physical machine, an alarm prompt according to the reconstruction failure indication alarm signal.

6. The method according to claim 5, further comprising: executing the following operations by using the first physical machine: determining whether the faulty virtual machines comprise a management node and whether a ratio of faulty computing nodes exceeds a threshold according to the meta-information of the virtual machines; determining that the virtual machine cluster is faulty, in response to determining that the faulty virtual machines comprises a management node or the ratio the faulty computing nodes exceeds the threshold; continuing to receive user jobs, and stopping submitting user jobs to the management node in the virtual machine cluster, in response to the virtual machine cluster being faulty; judging whether the restarted or reconstructed virtual machines comprise a management node and whether the ratio of the faulty computing nodes exceeds the threshold, in response to the response information being from the restarted or reconstructed virtual machines; determining that the virtual machine cluster is recovered from a fault, in response to determining the restarted or reconstructed virtual machines comprising a management node and the ratio of the faulty computing nodes not exceeding the threshold; continuing to submit jobs to the management node in the virtual machine cluster, in response to the virtual machine cluster being recovered from the fault, determining whether a job running before the fault of the virtual machine cluster is incomplete according to job state information queried from the management node, if yes, submitting a next job, and if not, submitting the incomplete job, wherein the job state information is obtained by the management node according to a job log of the computing node; and continuing to receive user jobs and submitting the user jobs to the management node in the virtual machine cluster, in response to determining that the faulty virtual machines comprise no management node and the ratio of the faulty computing nodes does not exceed the threshold.

7. The method according to claim 6, further comprising: periodically backing up, by the management node in the virtual machine cluster, incremental operation logs into the remote storage; and periodically merging, by the remote storage, the backed-up operation logs and deleting, by the remote storage, the operation logs prior to a merging time.

8. The method
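The dependent claims layer concrete decision rules on top of the basic detect-and-restart loop: the restart condition of claim 2, the escalation from restart to reconstruction on a different host in claim 3, the registration-ratio success test in claim 5, and the cluster-level fault test in claim 6. A minimal Python sketch of those rules follows, assuming in-memory records instead of real hypervisor calls. `VmRecord`, every function name, and every parameter name are hypothetical. The claim text does not fix the ratio denominators, so this sketch assumes all VMs for claim 2 and computing nodes only for claim 6; it also assumes that a VM which has never been restarted or reconstructed is eligible under the third-preset-time rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VmRecord:
    """Hypothetical per-VM meta-information held by the first physical machine."""
    vm_id: str
    host: str                    # physical machine the VM currently runs on
    is_management: bool          # management node (True) or computing node (False)
    faulty: bool = False
    last_recovery: Optional[float] = None  # time of the preceding restart/reconstruction
    restart_failures: int = 0


def should_restart(vm: VmRecord, cluster: List[VmRecord], now: float,
                   max_faulty_ratio: float, min_recovery_gap: float) -> bool:
    """Claim 2: restart if the share of faulty VMs is below a preset ratio, or if
    the previous restart/reconstruction is older than the third preset time."""
    faulty_ratio = sum(v.faulty for v in cluster) / len(cluster)
    if faulty_ratio < max_faulty_ratio:
        return True
    # Assumption: a VM that was never restarted or reconstructed is eligible.
    return vm.last_recovery is None or now - vm.last_recovery > min_recovery_gap


def next_action(vm: VmRecord, max_restart_failures: int) -> str:
    """Claim 3: once restart failures reach the preset count, reconstruct instead."""
    return "reconstruct" if vm.restart_failures >= max_restart_failures else "restart"


def pick_reconstruction_host(vm: VmRecord, hosts: List[str]) -> Optional[str]:
    """Claim 3: the third physical machine is any host other than the VM's current one."""
    return next((h for h in hosts if h != vm.host), None)


def reconstruction_succeeded(registered: int, total_computing: int,
                             min_registered_ratio: float) -> bool:
    """Claim 5: success if enough computing nodes registered within the preset time.
    Assumes total_computing > 0."""
    return registered / total_computing >= min_registered_ratio


def cluster_faulty(cluster: List[VmRecord], max_faulty_compute_ratio: float) -> bool:
    """Claim 6: the cluster is faulty if a management node is faulty or the
    share of faulty computing nodes exceeds the threshold."""
    if any(v.faulty and v.is_management for v in cluster):
        return True
    computing = [v for v in cluster if not v.is_management]
    if not computing:
        return False
    return sum(v.faulty for v in computing) / len(computing) > max_faulty_compute_ratio


cluster = [
    VmRecord("mgr", "host-a", is_management=True),
    VmRecord("w1", "host-a", is_management=False, faulty=True),
    VmRecord("w2", "host-b", is_management=False),
    VmRecord("w3", "host-b", is_management=False),
]
w1 = cluster[1]
print(should_restart(w1, cluster, 100.0, 0.5, 600.0))      # True: 1 of 4 VMs faulty
w1.restart_failures = 3
print(next_action(w1, 3))                                  # reconstruct
print(pick_reconstruction_host(w1, ["host-a", "host-b"]))  # host-b
print(cluster_faulty(cluster, 0.5))                        # False: manager up, 1 of 3 workers down
```

Keeping the rules as pure functions separates the policy from the message exchange between physical machines, which the claims describe but this sketch does not model.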

Assignees

Beijing Baidu Netcom Science And Tech Ltd, Beijing Baidu Netcom Sci & Tec

Inventors

Liu Hu

Classifications

G06F11/0757 · CPC title: by exceeding a time limit, i.e. time-out, e.g. watchdogs
G06F11/1438 (primary) · CPC title: Restarting or rejuvenating
G06F11/2035 · CPC title: without idle spare hardware
G06F11/1484 (primary) · CPC title: involving virtual machines
G06F2201/815 · CPC title: Virtual

Patent family

Related publications grouped by family.

Patent family ID: 55332914

Related patents

Hypervisor remedial action for a virtual machine in response to an error message from the virtual machine

Methods and systems to hot-swap a virtual machine

Operation verification device for virtual apparatus, and operation verification system and program for virtual apparatus
