Fault tolerant method and system for multiple servers

US2016277271A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016277271-A1
Application number  US-201615073744-A
Country             US
Kind code           A1
Filing date         Mar 18, 2016
Priority date       Mar 19, 2015
Publication date    Sep 22, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A fault tolerant method for multiple servers includes the following steps: sensing, by each server, a voltage of hardware of the server; receiving, by a cabinet manager, data of an operating state of a blade server and data of a voltage of hardware of each server; reading, by a monitoring server, the data of the operating state of the blade server and the data of the voltage of the hardware of a monitored server, where the data is transmitted by the monitored server in a cabinet manager; determining, by the monitoring server, whether the operating state of the blade server of the monitored server is faulty or whether the voltage of the hardware has no power supply; if the operating state of the blade server of the monitored server is faulty or the voltage of the hardware has no power supply, starting, by the monitoring server, a backup virtual machine; and restarting, by the cabinet manager, a faulty server.
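The steps in the abstract can be sketched as a small simulation. This is an illustrative sketch only, not the patent's implementation: the names `ServerStatus`, `CabinetManager`, `is_faulty`, and `monitor_once` are hypothetical, and treating a reading at or below 0 V as "no power supply" is an assumption made here for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServerStatus:
    """Data one server reports: blade operating state and hardware voltage."""
    operating_state_ok: bool
    hardware_voltage: float  # volts, as sensed on the server itself


@dataclass
class CabinetManager:
    """Hypothetical stand-in for the cabinet manager: collects reports,
    hands them to the monitoring server, and restarts faulty servers."""
    reports: Dict[str, ServerStatus] = field(default_factory=dict)
    restarted: List[str] = field(default_factory=list)

    def receive(self, server_id: str, status: ServerStatus) -> None:
        self.reports[server_id] = status

    def read(self, server_id: str) -> ServerStatus:
        return self.reports[server_id]

    def restart(self, server_id: str) -> None:
        # A real cabinet manager would power-cycle the blade; here we only log it.
        self.restarted.append(server_id)


def is_faulty(status: ServerStatus, no_power_threshold: float = 0.0) -> bool:
    """Faulty if the operating state is bad OR the hardware has no power.
    The threshold is an assumption; the source does not give a value."""
    return (not status.operating_state_ok) or (
        status.hardware_voltage <= no_power_threshold
    )


def monitor_once(
    cabinet: CabinetManager,
    monitored_id: str,
    start_backup_vm: Callable[[str], None],
) -> bool:
    """One pass of the monitoring server over the server it monitors.
    Returns True if a failover was triggered."""
    status = cabinet.read(monitored_id)
    if is_faulty(status):
        start_backup_vm(monitored_id)   # monitoring server starts the backup VM
        cabinet.restart(monitored_id)   # cabinet manager restarts the faulty server
        return True
    return False


if __name__ == "__main__":
    cabinet = CabinetManager()
    cabinet.receive("server-2", ServerStatus(operating_state_ok=True, hardware_voltage=0.0))
    backups: List[str] = []
    monitor_once(cabinet, "server-2", backups.append)
    print("backup VMs started for:", backups)
    print("servers restarted:", cabinet.restarted)
```

In the two-server arrangement of claim 1, each server would run `monitor_once` against the other, so either one can take over when its peer fails.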

First claim

Opening claim text (preview).

What is claimed is:

1. A fault tolerant system for multiple servers, wherein the system comprises a first server, a second server, and a cabinet manager, and the first server and the second server monitor each other, wherein the first server comprises: a first voltage sensor, used to sense a voltage of hardware of the first server; a first virtual machine manager, used to manage an operation of a virtual machine in the first server; and a first monitor, used to read data of an operating state of a blade server of the second server and data of a voltage of hardware of the second server, wherein the data is transmitted by the second server monitored by the first server; determine whether the operating state of the blade server of the monitored second server is faulty or whether the voltage of the hardware has no power supply, and send a backup command to the first virtual machine manager so that the first virtual machine manager starts a backup virtual machine; the second server comprises: a second voltage sensor, used to sense a voltage of hardware of the second server; a second virtual machine manager, used to manage an operation of a virtual machine in the second server; and a second monitor, used to read data of an operating state of a blade server of the first server and data of a voltage of hardware of the first server, wherein the data is transmitted by the first server monitored by the second server; determine whether the operating state of the blade server of the monitored first server is faulty or whether the voltage of the hardware has no power supply, and send a backup command to the second virtual machine manager so that the second virtual machine manager starts the backup virtual machine; and the cabinet manager is used to receive the data of the operating states of the blade servers of the first server and the second server and the data of voltages of the hardware of the two servers, transmit the data to the first server or the second server, and restart the faulty first server or the faulty second server.

2. The system according to claim 1, wherein the first server comprises: a first Intelligent Platform Management Controller (IPMC), used to receive the data of the operating state of the blade server and data of the voltage sensed by the first voltage sensor, transmit the data to the cabinet manager, and receive the data, transmitted by the cabinet manager, of the voltage of the hardware of the second server monitored by the first server; a first Intelligent Platform Management Interface (IPMI) module, used to receive the data, transmitted by the first IPMC, of the voltage of the hardware of the second server monitored by the first server; a first fault detection library, used to store the data, transmitted by the first IPMI module, of the voltage of the hardware of the second server monitored by the first server; and the first monitor, used to read the data, in the first fault detection library, of the voltage of the hardware of the second server monitored by the first server; the second server comprises: a second IPMC, used to receive the data of the operating state of the blade server and data of the voltage sensed by the second voltage sensor, transmit the data to the cabinet manager, and receive the data, transmitted by the cabinet manager, of the voltage of the hardware of the first server monitored by the second server; a second IPMI module, used to receive the data, transmitted by the second IPMC, of the voltage of the hardware of the first server monitored by the second server; a second fault detection library, used to store the data, transmitted by the second IPMI module, of the voltage of the hardware of the first server monitored by the second server; and the second monitor, used to read the data, in the second fault detection library, of the voltage of the hardware of the first server monitored by the second server.

3. The system according to claim 1, further comprising: a virtual machine image file database, used to store execution data of the virtual machines of the first server and the second server, so that the first server or the second server reads virtual machine execution data corresponding to the backup virtual machine.

4. A fault tolerant system for multiple servers, wherein the system comprises a first server, a second server, and a cabinet manager, and the first server and the second server monitor each other, wherein the first server comprises: a first watchdog timer, used to begin countdown from a timing value and send out a timing completion signal when the countdown ends; a first virtual machine manager, used to manage an operation of a virtual machine in the first server; a first watchdog updater, used to send a reset signal to the first watchdog timer after a reset time elapses, to update the first watchdog timer so that the first watchdog timer begins countdown from the timing value; and a first monitor, used to receive the timing completion signal that is transmitted by the second server monitored by the first server, and send a backup command according to the timing completion signal to the first virtual machine manager, so that the first virtual machine manager starts a backup virtual machine; the second server comprises: a second watchdog timer, used to begin countdown from the timing value and send out the timing completion signal when the countdown ends; a second virtual machine manager, used to manage an operation of a virtual machine in the second server; a second watchdog updater, used to send the reset signal to the second watchdog timer after the reset time elapses, to update the second watchdog timer so that the second watchdog timer begins countdown from the timing value; and a second monitor, used to receive the timing completion signal that is transmitted by the first server monitored by the second server, send the backup command according to the timing completion signal to the second virtual machine manager, so that the second virtual machine manager starts the backup virtual machine; and the cabinet manager is used to receive the timing completion signal of the first server and the second server and transmit the timing completion signal to the first server or the second server, and restart the faulty first server or the faulty second server.

5. The system according to claim 4, wherein the first server comprises: a first IPMC, used to receive the timing completion signal sent by the first watchdog timer, transmit the timing completion signal to the cabinet manager, and receive the timing completion signal, transmitted by the cabinet manager, of the second server monitored by the first server; a first IPMI module, used to receive the timing completion signal, transmitted by the first IPMC, of the second server monitored by the first server; and the first monitor, used to receive the timing completion signal, transmitted by the first IPMI module, of the second server monitored by the first server; the second server comprises: a second IPMC, used to receive the timing completion signal sent by the second watchdog timer, transmit the timing completion signal to the cabinet manager, and receive the timing completion signal, transmitted by the cabinet manager, of the first server monitored by the second server; a second IPMI module, used to receive the timing completion signal, transmitted by the second IPMC, of the first server monitored by the second server; and the second monitor, used to receive the timing completion signal, transmitted by the second IPMI module, of the first server monitored by the second server.

6. The system according to claim 4, further comprising: a virtual machine image file database, used to store execution data of the virtual machines of the fir
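The watchdog mechanism of claim 4 can be illustrated with a tick-based simulation. This is a sketch under stated assumptions rather than the claimed implementation: `WatchdogTimer` and `simulate` are hypothetical names, time is modelled as integer ticks, and a "hang" is modelled simply as the watchdog updater no longer sending reset signals.

```python
from typing import Callable, List, Optional


class WatchdogTimer:
    """Counts down from timing_value; calls on_timeout once when it reaches zero."""

    def __init__(self, timing_value: int, on_timeout: Callable[[], None]) -> None:
        self.timing_value = timing_value
        self.remaining = timing_value
        self.on_timeout = on_timeout
        self.expired = False

    def reset(self) -> None:
        # Reset signal from the watchdog updater: restart the countdown.
        self.remaining = self.timing_value

    def tick(self) -> None:
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            self.on_timeout()  # the "timing completion signal"


def simulate(
    timing_value: int,
    reset_time: int,
    hang_at: Optional[int],
    total_ticks: int,
) -> Optional[int]:
    """Run one server's watchdog for total_ticks ticks.

    While the server is healthy, the updater resets the timer every
    reset_time ticks. From tick hang_at onward (if given) the updater is
    silent. Returns the tick at which the timing completion signal fires,
    or None if it never fires.
    """
    fired: List[bool] = []
    timer = WatchdogTimer(timing_value, lambda: fired.append(True))
    since_reset = 0
    for t in range(1, total_ticks + 1):
        healthy = hang_at is None or t < hang_at
        if healthy:
            since_reset += 1
            if since_reset >= reset_time:
                timer.reset()
                since_reset = 0
        timer.tick()
        if fired:
            # In the claimed system this signal would travel via the cabinet
            # manager to the peer server's monitor, which starts the backup VM.
            return t
    return None


if __name__ == "__main__":
    print("healthy server:", simulate(5, 2, None, 20))
    print("server hangs at tick 7:", simulate(5, 2, 7, 20))
```

The sketch only works as intended when the reset time is shorter than the timing value; otherwise a healthy server would also let the countdown expire.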

Assignees

Inventors

Classifications

  • by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure · CPC title

  • by checking functioning · CPC title

  • using network fault recovery (ring fault isolation or reconfiguration in loop networks without recovery actions by a network management system H04L12/437) · CPC title

  • Errors, e.g. transmission errors · CPC title

  • the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016277271A1 cover?
A fault tolerant method for multiple servers includes the following steps: sensing, by each server, a voltage of hardware of the server; receiving, by a cabinet manager, data of an operating state of a blade server and data of a voltage of hardware of each server; reading, by a monitoring server, the data of the operating state of the blade server and the data of the voltage of the hardware of …
Who is the assignee on this patent?
Univ Nat Central
What technology area does this patent fall under?
Primary CPC classification H04L43/0817. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Sep 22, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).