Coordinated reliability management of virtual machines in a virtualized system

US9069730B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9069730-B2
Application number: US-49360209-A
Country: US
Kind code: B2
Filing date: Jun 29, 2009
Priority date: Jun 29, 2009
Publication date: Jun 30, 2015
Grant date: Jun 30, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and methods for reliability management of virtual machines in a host system. The reliability of the host system is monitored and compared with a reliability threshold level for a virtual machine. If the reliability of the host system drops below the reliability threshold level, the virtual machine is migrated to another host system having an appropriate level of reliability.
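
The decision flow in the abstract can be illustrated with a minimal sketch. Everything named here (`Host`, `VirtualMachine`, `pick_target`, `manage`, and the 0-to-1 reliability scale) is hypothetical and not taken from the patent. Choosing the most reliable eligible host is also an assumption: the abstract only requires "an appropriate level of reliability".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    reliability: float  # assumed scale: 0.0 (unreliable) to 1.0 (fully reliable)


@dataclass
class VirtualMachine:
    name: str
    reliability_threshold: float  # minimum host reliability this VM tolerates
    host: Host


def pick_target(vm: VirtualMachine, candidates: List[Host]) -> Optional[Host]:
    """Return another host that meets the VM's threshold, or None if none does."""
    eligible = [
        h for h in candidates
        if h is not vm.host and h.reliability >= vm.reliability_threshold
    ]
    # Assumption: prefer the most reliable eligible host.
    return max(eligible, key=lambda h: h.reliability, default=None)


def manage(vm: VirtualMachine, candidates: List[Host]) -> bool:
    """Migrate the VM if its current host has dropped below the threshold.

    Returns True when a migration took place.
    """
    if vm.host.reliability >= vm.reliability_threshold:
        return False  # current host is still reliable enough
    target = pick_target(vm, candidates)
    if target is None:
        return False  # no suitable host; the VM stays where it is
    vm.host = target  # stand-in for an actual live migration
    return True
```

In a real system the assignment `vm.host = target` would be replaced by a call into the hypervisor's migration facility; the sketch only models the comparison and the selection.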

First claim

Opening claim text (preview).

What is claimed is:

1. A method for reliability management of virtual machines in a node of a host system, the method comprising: determining a reliability requirement of a virtual machine hosted by a first node of a computing system; continuously monitoring system parameters of a plurality of other nodes of the computing system as system parameters change over time; determining a reliability state for each of the plurality of other nodes based on the changing system parameters; monitoring, with virtualization sensors coupled to a registry and proxy service, for at least one event affecting host reliability of the first node; modeling the host reliability using a reliability model that is stored in the host; and when the host reliability is less than a predetermined reliability threshold for the virtual machine, identifying a second node of the plurality of other nodes whose reliability state satisfies the reliability requirement of the virtual machine and migrating the virtual machine from memory of the first node to memory of the second node.

2. The method of claim 1 wherein migrating the virtual machine comprises migrating the virtual machine to a second host having a host reliability greater than the predetermined reliability threshold.

3. The method of claim 1 wherein the at least one event affecting host reliability comprise at least one of disk errors, processor machine check events, or bus errors.

4. The method of claim 1 wherein monitoring the at least one event affecting host reliability comprises scanning system logs that track reliability altering events.

5. The method of claim 4 wherein the system logs comprise at least one of bus errors, disk errors, or temperature errors.

6. The method of claim 1 and further including modeling the host reliability using a reliability model that is learned dynamically from a learning algorithm.

7. The method of claim 6 wherein the learning algorithm is comprised of Bayesian machine learning algorithm.

8. A method for reliability management of virtual machines in a node of a host system, the method comprising: determining a reliability threshold of a virtual machine hosted by a host node of a computing system; continuously monitoring system parameters of a plurality of other nodes of the computing system as system parameters change over time; scanning, with virtualization sensors coupled to a registry and proxy service, the computing system for events that affect reliability of the host node; estimating a reliability state for each of the plurality of other nodes in response to the system parameters; determining an estimated reliability of the host node, wherein the estimated reliability of the host node is found in a range of reliability levels determined by monitored host node parameters and a reliability model; and matching the virtual machine to a selected node of the plurality of other nodes whose estimated reliability state satisfies a reliability requirement of the virtual machine in response to the estimated reliability of the host node is less than the reliability threshold of the virtual machine.

9. The method of claim 8 and further including migrating a resident virtual machine in the host node to a second node if the resident virtual machine has a higher reliability threshold than the estimated reliability of the host node.

10. The method of claim 8 wherein scanning the host node for events that affect reliability comprise monitoring sensor data of the host node.

11. The method of claim 10 wherein monitoring the sensor data comprises monitoring at least one of Intelligent Platform Management Interface sensor data, processor machine check event sensor data, disk error data, virtual machine utility sensor data, and service level agreement data.

12. A host system comprising: a processor; a reliability manager configured to continuously monitor system parameters of a plurality of nodes of a computing system as system parameters change over time; a plurality of sensor inputs configured to indicate a status of each of the system parameters and at least one event affecting reliability of a host node of the plurality of nodes hosting a virtual machine; determining a hardware reliability state for each of the plurality of nodes, wherein the hardware reliability state is found in a range of reliability levels determined by the monitored system parameters and a reliability model; and at least one coordinator coupled to a registry and proxy service, the at least one coordinator configured to map the hardware reliability states of the plurality of nodes to a reliability requirement of the virtual machine and, in response to the hardware reliability state of the host node is less than a reliability threshold of the virtual machine, to initiate virtual machine migration to a selected node of the plurality of nodes, wherein the selected node is a node whose hardware reliability state satisfies the reliability requirement of the virtual machine.

13. The host system of claim 12 and further including the registry and proxy service coupled to the at least one coordinator for providing discovery, meta-data registration and proxying of calls to sensors and actuators.

14. The host system of claim 13 wherein a system node further comprises virtualization sensors and actuators coupled to the registry and proxy service for monitoring.

15. The host system of claim 12 and further at least one management node configured to manage virtual machines in a plurality of system nodes and a stabilizer coupled between the management node and the plurality of system nodes to monitor system stability and prevent the at least one coordinator from performing redundant actions.

16. The host system of claim 12 wherein the host system is a storage node configured to storage data.

17. The host system of claim 12 wherein the at least one coordinator is further configured to perform proactive backup of virtual machine data on a storage node.
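
Claims 3 to 7 mention scanning system logs for reliability-altering events and a reliability model learned by a Bayesian algorithm, without specifying either. The sketch below is one possible illustration, not the patented method: the log patterns in `EVENT_PATTERNS` are hypothetical, and the Beta-Bernoulli update in `BetaReliabilityModel` is a simple stand-in chosen here for the unspecified Bayesian learner.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns; real system logs differ by platform and vendor.
EVENT_PATTERNS = {
    "disk_error": re.compile(r"disk error", re.IGNORECASE),
    "machine_check": re.compile(r"machine check", re.IGNORECASE),
    "bus_error": re.compile(r"bus error", re.IGNORECASE),
}


def scan_log(lines: Iterable[str]) -> Counter:
    """Count reliability-altering events, by kind, in a batch of log lines."""
    counts: Counter = Counter()
    for line in lines:
        for kind, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                counts[kind] += 1
    return counts


class BetaReliabilityModel:
    """Beta-Bernoulli estimate of how likely a monitoring interval is event-free.

    Assumption: each interval is treated as one Bernoulli trial, "clean" or
    "faulty". The patent does not prescribe this particular model.
    """

    def __init__(self, prior_clean: float = 1.0, prior_faulty: float = 1.0):
        self.clean = prior_clean    # pseudo-count of event-free intervals
        self.faulty = prior_faulty  # pseudo-count of intervals with events

    def observe_interval(self, event_counts: Counter) -> None:
        """Update the posterior with the events seen in one interval."""
        if sum(event_counts.values()) > 0:
            self.faulty += 1
        else:
            self.clean += 1

    def reliability(self) -> float:
        """Posterior mean probability that an interval is event-free."""
        return self.clean / (self.clean + self.faulty)
```

A caller would feed each interval's log lines through `scan_log`, pass the counts to `observe_interval`, and compare `reliability()` with the virtual machine's threshold to decide whether to look for a migration target.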

Assignees

Inventors

Classifications

  • Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents (software debugging using additional hardware using a specific debug interface G06F11/3656; performance evaluation by tracing or monitoring G06F11/3466) · CPC title

  • Error avoidance (G06F11/07 and subgroups take precedence) · CPC title

  • G06F11/203 · Primary

    using migration · CPC title

  • Virtual · CPC title

  • by exceeding limits · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9069730B2 cover?
A system and methods for reliability management of virtual machines in a host system. The reliability of the host system is monitored and compared with a reliability threshold level for a virtual machine. If the reliability of the host system drops below the reliability threshold level, the virtual machine is migrated to another host system having an appropriate level of reliability.
Who is the assignee on this patent?
Talwar Vanish, Kumar Sanjay, Ranganathan Parthasarathy, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F11/203. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 30, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).