Graph database with links to underlying data
US-2015281011-A1 · Oct 1, 2015 · US
US9497072B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9497072-B2 |
| Application number | US-201414242861-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 1, 2014 |
| Priority date | Apr 1, 2014 |
| Publication date | Nov 15, 2016 |
| Grant date | Nov 15, 2016 |
Official abstract text for this publication.
Methods for monitoring a networked computing environment and for consolidating multiple alarms under a single root cause are described. In some embodiments, in response to detecting an alert corresponding with a performance issue in a networked computing environment, a root cause identification tool may aggregate a plurality of alarms from a plurality of performance management tools monitoring the networked computing environment. The root cause identification tool may then generate a failure graph associated with the performance issue based on the plurality of alarms, determine a first set of leaf nodes of the failure graph, determine a first chain of failures based on the first set of leaf nodes, suppress (or hide) alarms that are not associated with the first chain of failures, and output a consolidated alarm associated with the first chain of failures.
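The flow in the abstract (build a failure graph, find its leaf nodes, trace a chain of failures, suppress unrelated alarms, emit one consolidated alarm) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names, the dictionary adjacency-list representation, the convention that an edge points from a failure to the failure that caused it, and the example alarm names. How a leaf is chosen as the root cause is not specified in this excerpt, so the caller picks one.

```python
from collections import deque


def build_failure_graph(edges):
    """Build an adjacency list from (effect, cause) pairs.

    Assumption: each directed edge points from a failure to a failure
    that caused it, so the alert is the root and root causes are leaves.
    """
    graph = {}
    for effect, cause in edges:
        graph.setdefault(effect, []).append(cause)
        graph.setdefault(cause, [])
    return graph


def leaf_nodes(graph, root):
    """Return the nodes reachable from root that have no outgoing edges."""
    seen, stack, leaves = set(), [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        causes = graph.get(node, [])
        if not causes:
            leaves.add(node)
        stack.extend(causes)
    return sorted(leaves)


def chain_of_failures(graph, root, leaf):
    """Return one shortest root-to-leaf path (breadth-first), or [] if none."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == leaf:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for cause in graph.get(node, []):
            if cause not in parents:
                parents[cause] = node
                queue.append(cause)
    return []


def consolidate(alarms, chain):
    """Keep alarms on the chain, list the rest as suppressed."""
    on_chain = set(chain)
    return {
        "root_cause": chain[-1],
        "chain": chain,
        "suppressed": sorted(a for a in alarms if a not in on_chain),
    }


# Hypothetical alarms; "cpu_high_other_host" is unrelated to the alert.
EDGES = [("app_unavailable", "server_unreachable"),
         ("server_unreachable", "power_failure")]
ALARMS = ["app_unavailable", "server_unreachable",
          "power_failure", "cpu_high_other_host"]

graph = build_failure_graph(EDGES)
leaves = leaf_nodes(graph, "app_unavailable")
chain = chain_of_failures(graph, "app_unavailable", leaves[0])
print(consolidate(ALARMS, chain))
```

In this sketch suppression is only a filter over alarm names; a real tool would presumably also need to decide which monitoring tool's alarm maps to which node, which the excerpt does not describe.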
Opening claim text (preview).
What is claimed is:

1. A method for monitoring a networked computing environment, comprising: detecting an alert corresponding with a performance issue within the networked computing environment; aggregating data from a plurality of monitoring applications monitoring the networked computing environment, the aggregated data comprises a plurality of alarms; generating a failure graph based on the aggregated data, the failure graph comprises a plurality of nodes and a set of directed edges, each directed edge of the set of directed edges corresponds with a causal relationship between a pair of the plurality of nodes, the alert corresponds with a root node of the failure graph; detecting a second alert corresponding with a second performance issue within the networked computing environment, the alert corresponds with a first application-level failure of a first application within the networked computing environment and the second alert corresponds with a second application-level failure of a second application within the networked computing environment different from the first application; generating a second failure graph based on the aggregated data, the second alert corresponds with a root node of the second failure graph; identifying a first set of leaf nodes associated with the failure graph; identifying a second set of leaf nodes associated with the second failure graph; and identifying a common leaf node that is common to both the failure graph and the second failure graph, the first set of leaf nodes comprises the common leaf node and the second set of leaf nodes comprises the common leaf node; identifying a first leaf node of the plurality of nodes, the first leaf node corresponds with a root cause of the performance issue; determining a first chain of failures corresponding with the first leaf node and the root node of the failure graph; and outputting a consolidated alarm corresponding with the first chain of failures.

2. The method of claim 1, wherein: the plurality of monitoring applications comprises an application-level monitor, a network-level monitor, and a system-level monitor.

3. The method of claim 1, wherein: the aggregated data comprises log file data generated by devices within the networked computing environment.

4. The method of claim 1, wherein: the aggregated data comprises help desk ticket information associated with help desk tickets covering performance issues affecting the networked computing environment.

5. The method of claim 1, further comprising: suppressing each alarm of the plurality of alarms that is not associated with a node in the first chain of failures.

6. The method of claim 1, wherein: the outputting a consolidated alarm corresponding with the first chain of failures comprises transmitting a message specifying the root cause of the performance issue to a target recipient.

7. The method of claim 1, wherein: the outputting a consolidated alarm corresponding with the first chain of failures comprises transmitting a message providing information only associated with the first chain of failures to a target recipient.

8. The method of claim 1, wherein: the plurality of monitoring applications comprises an application-level monitor, the application-level monitor generates a first alarm of the plurality of alarms in response to a performance metric for an application being outside of an acceptable range, the acceptable range is determined based on a time of day.

9. The method of claim 1, wherein: the performance issue comprises an unavailability of the first application, the networked computing environment comprises a plurality of servers, the root cause of the performance issue comprises a power failure to a first server of the plurality of servers.

10. A system for monitoring a networked computing environment, comprising: a network interface configured to receive data from a plurality of monitoring applications monitoring the networked computing environment; and a processor configured to detect an alert corresponding with a performance issue within the networked computing environment and aggregate the data from the plurality of monitoring applications, the processor configured to generate a failure graph based on the aggregated data, the failure graph comprises a plurality of nodes and a set of directed edges, each directed edge of the set of directed edges corresponds with a causal relationship between a pair of the plurality of nodes, the alert corresponds with a root node of the failure graph, the processor configured to detect a second alert corresponding with a second performance issue within the networked computing environment, the alert corresponds with a first application-level failure of a first application within the networked computing environment and the second alert corresponds with a second application-level failure of a second application within the networked computing environment different from the first application, the processor configured to generate a second failure graph based on the aggregated data, the second alert corresponds with a root node of the second failure graph, the processor configured to identify a common leaf node that is common to both the failure graph and the second failure graph, the processor configured to identify a first leaf node of the plurality of nodes, the first leaf node corresponds with a root cause of the performance issue, the processor configured to determine a first chain of failures corresponding with the first leaf node and the root node of the failure graph and output a consolidated alarm corresponding with the first chain of failures.

11. The system of claim 10, wherein: the plurality of monitoring applications comprises an application-level monitor, a network-level monitor, and a system-level monitor.

12. The system of claim 10, wherein: the aggregated data comprises a plurality of alarms generated by the plurality of monitoring applications monitoring the networked computing environment.

13. The system of claim 10, wherein: the aggregated data comprises log file data generated by devices within the networked computing environment.

14. The system of claim 10, wherein: the aggregated data comprises help desk ticket information associated with help desk tickets covering performance issues affecting the networked computing environment.

15. The system of claim 10, wherein: the processor configured to output the consolidated alarm by transmitting a message providing information only associated with the first chain of failures to a target recipient.

16. A computer program product, comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to detect an alert corresponding with a performance issue within a networked computing environment; computer readable program code configured to aggregate data from a plurality of monitoring applications monitoring the networked computing environment, the aggregated data comprises a plurality of alarms; computer readable program code configured to generate a failure graph based on the aggregated data, the failure graph comprises a plurality of nodes and a set of directed edges, each directed edge of the set of directed edges corresponds with a causal relationship between a pair of the plurality of nodes, the alert corresponds with a root node of the failure graph; computer readable program code configured to detect a second alert corresponding with a second performance iss
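The independent claims recite two failure graphs, each rooted at a different application-level alert, and a leaf node common to both. A minimal Python sketch of that step follows, assuming each graph is a dictionary mapping a failure to the failures that caused it. The function names and the example node names (`app_a_down`, `switch_failure`, and so on) are hypothetical and are not drawn from the patent.

```python
def leaf_nodes(graph, root):
    """Return the set of nodes reachable from root with no outgoing edges."""
    seen, stack, leaves = set(), [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        causes = graph.get(node, [])
        if not causes:
            leaves.add(node)
        stack.extend(causes)
    return leaves


def common_leaf_nodes(graph_a, root_a, graph_b, root_b):
    """Return leaves shared by both failure graphs, in sorted order."""
    shared = leaf_nodes(graph_a, root_a) & leaf_nodes(graph_b, root_b)
    return sorted(shared)


# Hypothetical failure graphs for two different applications.
# Assumption: edges point from a failure to a failure that caused it.
GRAPH_A = {
    "app_a_down": ["db_timeout"],
    "db_timeout": ["switch_failure"],
    "switch_failure": [],
}
GRAPH_B = {
    "app_b_down": ["switch_failure", "cache_overload"],
    "switch_failure": [],
    "cache_overload": [],
}

print(common_leaf_nodes(GRAPH_A, "app_a_down", GRAPH_B, "app_b_down"))
```

A shared leaf suggests that both application-level failures may trace back to one underlying fault, which is what makes it a natural candidate for a single consolidated alarm.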
involving logical or physical relationship, e.g. grouping and hierarchies · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
for graphical visualisation of monitoring data · CPC title
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
comprising specially adapted graphical user interfaces [GUI] · CPC title