Using cluster processing to identify sets of similarly failing hosts
US-10592328-B1 · Mar 17, 2020 · US
US11080126B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11080126-B2 |
| Application number | US-201716095015-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 7, 2017 |
| Priority date | Feb 7, 2017 |
| Publication date | Aug 3, 2021 |
| Grant date | Aug 3, 2021 |
Official abstract text for this publication.
An apparatus for monitoring a computer system includes a data collecting unit that collects performance data on a plurality of performance items related to the performance of the computer system; a performance degradation cause model 30 in which a cause event and degradation performance items which are one or more performance items degraded by the cause event are associated with one another; and an outlier score calculator 50 that specifies a degree of deviation, when target performance data which is performance data of degradation performance items collected by the data collecting unit deviates from a normal range under a condition that the normal range of the performance data of the one or more degradation performance items for the cause event is predetermined, and outputs information on the cause event based on a temporal change in the degree of deviation.
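The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the normal range is approximated here as mean ± k standard deviations learned from normal-period samples, and the cause events, performance item names, and the helper names `fit_normal_ranges`, `outlier_score`, `score_series`, and `rising_causes` are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical cause model: cause event -> performance items it degrades.
CAUSE_MODEL = {
    "disk_contention": ["io_wait", "disk_latency_ms"],
    "cpu_saturation": ["cpu_util"],
}


def fit_normal_ranges(normal_samples):
    """Learn (mean, std) per performance item from normal-period samples."""
    ranges = {}
    for item in normal_samples[0]:
        values = [sample[item] for sample in normal_samples]
        # Fall back to 1.0 when the item never varied, to avoid dividing by zero.
        ranges[item] = (mean(values), pstdev(values) or 1.0)
    return ranges


def outlier_score(sample, items, ranges, k=3.0):
    """Degree of deviation: std units by which the worst item lies outside
    mean +/- k*std. Returns 0.0 when every item is inside its normal range."""
    score = 0.0
    for item in items:
        mu, sigma = ranges[item]
        excess = abs(sample[item] - mu) / sigma - k
        score = max(score, excess)
    return score


def score_series(samples, ranges, cause_model=CAUSE_MODEL):
    """Outlier score per cause event for each sample, in time order."""
    return {
        cause: [outlier_score(s, items, ranges) for s in samples]
        for cause, items in cause_model.items()
    }


def rising_causes(series):
    """Cause events whose score rose between the first and last sample,
    largest increase first (a crude stand-in for 'temporal change')."""
    rising = [c for c, v in series.items() if v[-1] > v[0]]
    return sorted(rising, key=lambda c: series[c][-1] - series[c][0], reverse=True)


normal = [
    {"io_wait": 1.0, "disk_latency_ms": 10.0, "cpu_util": 40.0},
    {"io_wait": 3.0, "disk_latency_ms": 12.0, "cpu_util": 60.0},
]
target = [
    {"io_wait": 2.0, "disk_latency_ms": 11.0, "cpu_util": 55.0},
    {"io_wait": 7.0, "disk_latency_ms": 11.0, "cpu_util": 55.0},
]
series = score_series(target, fit_normal_ranges(normal))
print(rising_causes(series))
```

In this toy data only `io_wait` leaves its learned range in the second sample, so only the hypothetical `disk_contention` event shows a rising score.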
Opening claim text (preview).
The invention claimed is:
1. A monitoring apparatus for monitoring a computer system, comprising: at least one hardware processor; and a software program that is configured to, when executed by the at least one hardware processor, collect performance data on a plurality of performance items related to performance of the computer system, access at least one performance degradation cause model that associates each of a plurality of cause events with degradation performance items that are degraded by the cause event, access a job execution history that comprises, for each of a plurality of jobs executed by the computer system, a type of the job and an execution period of the job, during a machine-learning period, use a portion of the performance data that represents normal behavior of the computer system to train a machine-learning model to output an outlier score that represents a degree of deviation of performance data of the degradation performance items from a normal range, after the machine-learning period, apply the machine-learning model to target performance data of the degradation performance items to produce an outlier score for each of a plurality of cause events for each type of job and each execution period over a time period, and output information on each of the plurality of cause events based on a temporal change in the produced outlier scores over the time period.
2. The monitoring apparatus for monitoring a computer system according to claim 1, wherein at least one performance degradation cause model comprises a plurality of performance degradation cause models for a plurality of cause events, wherein the machine-learning model is applied to target performance data of the degradation performance items associated with each of the plurality of cause events, for each of the plurality of jobs, to produce an outlier score for each of the plurality of cause events and each of the plurality of jobs over the time period.
3. The monitoring apparatus for monitoring a computer system according to claim 1, wherein the software program is further configured to, for each of the plurality of jobs, select the cause event based on the temporal change that most closely matches a change in the execution period of the job.
4. A monitoring apparatus for monitoring a computer system, the monitoring apparatus comprising: at least one hardware processor; and a software program that is configured to, when executed by the at least one hardware processor, collect performance data on a plurality of performance items related to performance of the computer system, during a machine-learning period, use a portion of the performance data that represents degraded performance of the computer system to train a machine-learning classifier to output a pattern score that represents a similarity to at least one reference pattern of performance items associated with degradation of an evaluation index; after the machine-learning period, apply the machine-learning classifier to target performance data to produce one or more pattern scores for the target performance data, and determine a reference pattern that is most similar to the target performance data based on the produced one or more pattern scores, and output a label associated with the reference pattern that is determined to be most similar to the target performance data.
5. The monitoring apparatus for monitoring a computer system according to claim 4, wherein the software program is further configured to output information indicating a degree of contribution to the degradation of the evaluation index of each associated performance item, based on the target performance data.
6. The monitoring apparatus for monitoring a computer system according to claim 5, wherein the information indicating a degree of contribution comprises a one-dimensional graph in which each degree of contribution of each associated performance item is indicated by a length.
7. The monitoring apparatus for monitoring a computer system according to claim 6, wherein the one-dimensional graph only comprises a degree of contribution of each associated performance item whose contribution is larger than a predetermined value.
8. The monitoring apparatus for monitoring a computer system according to claim 4, wherein the software program is configured to, when no reference pattern is similar to the target performance data, output information indicating a degree of contribution to degradation of the evaluation index of each associated performance item, based on the target performance data, and output a screen for accepting input of a label to be assigned to a new reference pattern representing the target performance data.
9. The monitoring apparatus for monitoring a computer system according to claim 4, wherein the software program is further configured to: generate a plurality of reference patterns by grouping values of performance items in the portion of the performance data that represents degraded performance of the computer system based on features; and assign a label to each of the plurality of reference patterns based on the features.
10. A method for monitoring a computer system, the method comprising: collecting performance data on a plurality of performance items related to performance of the computer system; accessing at least one performance degradation cause model that associates each of a plurality of cause events with degradation performance items that are degraded by the cause event; accessing a job execution history that comprises, for each of a plurality of jobs executed by the computer system, a type of the job and an execution period of the job, during a machine-learning period, using a portion of the performance data that represents normal behavior of the computer system to train a machine-learning model to output an outlier score that represents a degree of deviation of performance data of the degradation performance items from a normal range; after the machine-learning period, applying the machine-learning model to target performance data of the degradation performance items to produce an outlier score for each of a plurality of cause events for each type of job and each execution period over a time period; and outputting information on each of the plurality of cause events based on a temporal change in the produced outlier scores over the time period.
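The pattern-matching flow recited in claims 4 through 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed classifier: cosine similarity stands in for the trained machine-learning classifier, a contribution is taken as each item's share of the total magnitude, and the reference patterns, labels, thresholds, and function names (`pattern_score`, `classify`, `contributions`, `render_bar`) are hypothetical.

```python
import math

# Hypothetical reference patterns: label -> per-item degradation profile.
REFERENCE_PATTERNS = {
    "storage bottleneck": {"io_wait": 0.9, "cpu_util": 0.1, "net_retrans": 0.0},
    "network congestion": {"io_wait": 0.1, "cpu_util": 0.1, "net_retrans": 0.9},
}


def pattern_score(target, pattern):
    """Cosine similarity between target data and one reference pattern."""
    dot = sum(target[k] * pattern[k] for k in pattern)
    norm_t = math.sqrt(sum(target[k] ** 2 for k in pattern))
    norm_p = math.sqrt(sum(v ** 2 for v in pattern.values()))
    return dot / (norm_t * norm_p) if norm_t and norm_p else 0.0


def classify(target, patterns=REFERENCE_PATTERNS, min_score=0.8):
    """Return (label, scores). The label is None when no reference pattern
    is similar enough, which is where claim 8 would ask for a new label."""
    scores = {label: pattern_score(target, p) for label, p in patterns.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_score else None), scores


def contributions(target, min_share=0.1):
    """Share of total magnitude per item, keeping only shares above min_share
    (compare the predetermined value in claim 7)."""
    total = sum(abs(v) for v in target.values())
    if total == 0:
        return {}
    shares = {k: abs(v) / total for k, v in target.items()}
    return {k: s for k, s in shares.items() if s > min_share}


def render_bar(shares, width=40):
    """Text rendering in which each contribution is shown as a length."""
    return [f"{name:<12} {'#' * round(share * width)}" for name, share in shares.items()]


target = {"io_wait": 0.8, "cpu_util": 0.2, "net_retrans": 0.0}
label, scores = classify(target)
shares = contributions(target)
print(label)
for line in render_bar(shares):
    print(line)
```

With this toy target the hypothetical "storage bottleneck" pattern scores highest, and only `io_wait` and `cpu_util` pass the contribution threshold. A target that resembles no pattern yields `None` for the label, at which point an operator would be asked to name a new reference pattern.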
Performance evaluation by modeling · CPC title
Performance evaluation by statistical analysis · CPC title
for systems · CPC title
where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title
Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title