Detecting high availability readiness of a distributed computing system

US9454416B2 · US · B2

Patent metadata
Publication number: US-9454416-B2
Application number: US-201414513881-A
Country: US
Kind code: B2
Filing date: Oct 14, 2014
Priority date: Oct 14, 2014
Publication date: Sep 27, 2016
Grant date: Sep 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technology is disclosed for determining high availability readiness of a distributed computing system (“system”). A confidence measure (CM) can be computed for a particular controller in the system to determine whether a takeover by the particular controller from a first controller would be successful. The CM can be a percentage value. A CM of 0% indicates that a takeover would be a failure, which results in loss of access to data managed by the first controller. A CM of 100% indicates a successful takeover with no performance impact on the system. A CM between 0% and 100% indicates a successful takeover but with a performance impact. The CM can be computed based on events occurring in the system, e.g., veto and non-veto events. The CM is computed as a function of various weights and/or indices associated with the veto events and/or non-veto events.
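The three CM ranges described in the abstract can be illustrated with a minimal Python sketch. The function name `interpret_confidence_measure` and the returned strings are hypothetical and are not taken from the patent; only the 0%, 100%, and in-between cases come from the abstract.

```python
def interpret_confidence_measure(cm_percent: float) -> str:
    """Map a CM percentage to the outcome the abstract associates with it.

    Hypothetical helper for illustration; the patent does not define it.
    """
    if not 0.0 <= cm_percent <= 100.0:
        raise ValueError("CM is expected to be a percentage between 0 and 100")
    if cm_percent == 0.0:
        # 0%: takeover fails and access to the first controller's data is lost.
        return "takeover fails; access to data managed by the first controller is lost"
    if cm_percent == 100.0:
        # 100%: takeover succeeds with no performance impact.
        return "takeover succeeds with no performance impact"
    # Strictly between 0% and 100%: takeover succeeds, with a performance impact.
    return "takeover succeeds with a performance impact"


if __name__ == "__main__":
    for cm in (0.0, 62.5, 100.0):
        print(f"CM = {cm}%: {interpret_confidence_measure(cm)}")
```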

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method, comprising: receiving a list of multiple events that have occurred in a distributed computing system over a specified period, the events related to a first computer node and a second computer node of the distributed computing system, the first computer node configured to manage a data access request received from a client computer node for data stored at a storage system associated with the first computer node, the second computer node configured to take over from the first computer node in case the first computer node becomes unavailable; determining, based on an event classification policy, a set of non-veto events and a set of veto events related to the second computer node from the events; retrieving, based on the event classification policy, a severity index and a compliance factor for each event of the set of non-veto events; and computing a confidence measure of the second computer node as a function of the set of veto events and the severity index and the compliance factor of the set of non-veto events.

2. The computer-implemented method of claim 1, wherein the confidence measure indicates at least one of whether the takeover by the second computer node fails, which results in a loss of access to the storage system or a magnitude of an impact on a performance of the distributed computing system if the second computer node takes over from the first computer node.

3. The computer-implemented method of claim 1, wherein the severity index of an event of the set of non-veto events indicates a magnitude of performance impact on the distributed computing system due to the occurrence of the event.

4. The computer-implemented method of claim 3, wherein the severity index of the event is recorded in real-time as the event occurs.

5. The computer-implemented method of claim 1, wherein the compliance factor an event of the set of non-veto events indicates a deviation of the event from an expected behavior of the event.

6. The computer-implemented method of claim 1, wherein the set of non-veto events are events that have an adverse impact on computing resources of the distributed computing system if the second computer node takes over from the first computer node, the set of non-veto events excluding events that cause the takeover to fail.

7. The computer-implemented method of claim 1, wherein the confidence measure having the value equal to a first threshold indicates that the takeover by the second computer node fails.

8. The computer-implemented method of claim 1, wherein the confidence measure having the value equal to a second threshold indicates that there is no adverse impact on computing resources of the distributed computing system if the second computer node takes over from the first computer node.

9. The computer-implemented method of claim 7 further comprising: determining that the value of the confidence measure is below a specified threshold, the specified threshold being between the first threshold and a second threshold; and generating a notification indicating the value of the confidence measure.

10. The computer-implemented method of claim 9, wherein generating the notification includes generating a list of tasks to be performed to increase the confidence measure above the specified threshold.

11. The computer-implemented method of claim 7, wherein computing the confidence measure of the second computer node includes: determining that the set of veto events is not a null set, which indicates that a veto event related to the second computer node occurred in the distributed computing system, the veto event causing the takeover by the second computer node to fail and result in loss of access to data stored at the storage system, and computing the value of the confidence measure as equal to the first threshold if the set of veto events is not a null set.

12. A computer-readable storage medium storing computer-executable instructions comprising: instructions for identifying, in a distributed computing system having a first computer node and a second computer node, among multiple events that have occurred over a specified period, a set of non-veto events and a set of veto events related to the second computer node, the first computer node configured to manage a data access request received from a client computer node for accessing data stored at a storage system associated with the first computer node, the second computer node configured to take over from the first computer node to respond to the data access request in case the first computer node becomes unavailable; instructions for computing, based on an event classification policy, a weight of each of the set of non-veto events as a function of a severity index of the corresponding event and the set of non-veto events; instructions for computing, based on the event classification policy, a primitive value of each of the set of non-veto events as a function of a compliance factor of the corresponding event and the weight of the corresponding event; and instructions for computing a confidence measure of the second computer node as a function of the primitive values of the set of non-veto events and the set of veto events.

13. The computer-readable storage medium of claim 12, wherein the confidence measure having the value equal to a first threshold indicates that the takeover by the second computer node fails resulting in a loss of access to the storage system if the first computer node fails.

14. The computer-readable storage medium of claim 12, wherein the confidence measure having the value equal to a second threshold indicates that there is no adverse impact on computing resources of the distributed computing system if the second computer node takes over from the first computer node.

15. The computer-readable storage medium of claim 13, wherein the instructions for computing the confidence measure of the second computer node includes: instructions for determining that the set of veto events is not a null set, which indicates that a veto event related to the second computer node occurred in the distributed computing system, the veto event causing the takeover by the second compute node to fail resulting in a loss of access to data stored at the storage system, and computing the value of the confidence measure as equal to the first threshold.

16. The computer-readable storage medium of claim 12 further comprising: instructions for determining that the value of the confidence measure is below a specified threshold, the specified threshold being between a first threshold and a second threshold; and instructions for generating a notification indicating the value of the confidence measure, the notification including a list of tasks to be performed to increase the confidence measure above the specified threshold.

17. The computer-readable storage medium of claim 12, wherein the instructions for computing the weight of each of the set of non-veto events includes instructions for computing the weight of an event i as $W_i = SI_i / \left(\sum_{i=1}^{n} SI_i\right)$ where $W_i$ is the weight of the event i, $SI_i$ is a severity index of the event i and n is the number of events.

18. The computer-readable storage medium of claim 12, wherein the instructions for computing the primitive value of each of the set of non-veto events includes instructions for computing the primitive value of an event i as $P_i = W_i \cdot CF_i$ where $P_i$ is the primitive value of the event i and $CF_i$ is the compliance factor of the event i.

19. The computer-readable stor
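The weight and primitive-value formulas of claims 17 and 18, the veto rule of claims 11 and 15, and the notification of claim 16 can be sketched in Python as follows. This is only an illustration under stated assumptions: the claim text shown here is cut off before it gives the final aggregation, so the step that turns primitive values into a confidence measure (subtracting their sum, scaled, from the second threshold) is a placeholder and not the patent's formula. The class and function names, the 0 and 100 threshold defaults, and the choice of compliance factors in [0, 1] are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NonVetoEvent:
    name: str                 # hypothetical label for the event
    severity_index: float     # SI_i from claim 17
    compliance_factor: float  # CF_i from claim 18; assumed here to lie in [0, 1]


def weights(events: Sequence[NonVetoEvent]) -> list:
    """Claim 17: W_i = SI_i / sum of all SI over the non-veto events."""
    total = sum(e.severity_index for e in events)
    if total <= 0:
        raise ValueError("severity indices must sum to a positive value")
    return [e.severity_index / total for e in events]


def primitive_values(events: Sequence[NonVetoEvent]) -> list:
    """Claim 18: P_i = W_i * CF_i."""
    return [w * e.compliance_factor for w, e in zip(weights(events), events)]


def confidence_measure(veto_events: Sequence[str],
                       non_veto_events: Sequence[NonVetoEvent],
                       first_threshold: float = 0.0,
                       second_threshold: float = 100.0) -> float:
    # Claims 11 and 15: a non-empty veto set yields the first threshold.
    if veto_events:
        return first_threshold
    if not non_veto_events:
        return second_threshold
    # ASSUMPTION: the aggregation below is a placeholder, not the patent's formula.
    penalty = sum(primitive_values(non_veto_events))
    return second_threshold - (second_threshold - first_threshold) * penalty


def readiness_notification(cm: float, specified_threshold: float,
                           tasks: Sequence[str]) -> Optional[dict]:
    """Claim 16: notify, with a task list, when CM is below the specified threshold."""
    if cm < specified_threshold:
        return {"confidence_measure": cm, "tasks": list(tasks)}
    return None


if __name__ == "__main__":
    events = [NonVetoEvent("hypothetical-event-a", 3.0, 0.5),
              NonVetoEvent("hypothetical-event-b", 1.0, 0.0)]
    print(weights(events))
    print(primitive_values(events))
    print(confidence_measure([], events))
    print(confidence_measure(["hypothetical-veto-event"], events))
```

Because the weights are normalized to sum to 1, the assumed penalty stays within [0, 1] whenever every compliance factor does, which keeps the placeholder result between the two thresholds.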

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9454416B2 cover?
Technology is disclosed for determining high availability readiness of a distributed computing system (“system”). A confidence measure (CM) can be computed for a particular controller in the system to determine whether a takeover by the particular controller from a first controller would be successful. The CM can be a percentage value. A CM of 0% indicates that a takeover would be a failure, wh…
Who is the assignee on this patent?
NetApp Inc.
What technology area does this patent fall under?
Primary CPC classification G06F11/008. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).