Method for checking the integrity of a compute node
US-2024303346-A1 · Sep 12, 2024 · US
US10558544B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10558544-B2 |
| Application number | US-201113026351-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 14, 2011 |
| Priority date | Feb 14, 2011 |
| Publication date | Feb 11, 2020 |
| Grant date | Feb 11, 2020 |
Abstract
Techniques are described for monitoring a performance metric. A multiple modeling approach is used to improve predictive analysis by avoiding the issuance of warnings during spikes which occur as a part of normal system processing. This approach increases the accuracy of predictive analytics on a monitored computing system, does not require creating rules defining periodic processing cycles, reduces the amount of data required to perform predictive modeling, and reduces the amount of CPU required to perform predictive modeling.
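The abstract's multiple modeling approach can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The patent text here does not say how a threshold is derived from a model, so `derive_threshold` assumes a mean-plus-`k`-population-standard-deviations rule. `TrainedThresholds`, `train_models`, `k`, and the sample data are also hypothetical. As in claim 1, the second model is trained only on the longer window's samples that exceed the first threshold.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class TrainedThresholds:
    first: float             # baseline threshold from the first model
    second: float            # spike threshold from the second model
    spike_samples_used: int  # samples the second model actually needed


def derive_threshold(samples: Sequence[float], k: float = 3.0) -> float:
    """Assumed rule (not from the patent): mean plus k population std devs."""
    return mean(samples) + k * pstdev(samples)


def train_models(short_window: Sequence[float],
                 long_window: Sequence[float],
                 k: float = 3.0) -> TrainedThresholds:
    """Train the first model on the short window, then train the second
    model only on long-window samples that exceed the first threshold."""
    first = derive_threshold(short_window, k)
    spike_samples = [v for v in long_window if v > first]
    if not spike_samples:
        raise ValueError("no samples exceeded the first threshold")
    second = derive_threshold(spike_samples, k)
    return TrainedThresholds(first, second, len(spike_samples))


# Hypothetical CPU-utilization percentages (toy stand-ins for the
# two-week and one-year training periods mentioned in claim 12).
short_window = [10, 12, 11, 13, 9, 10, 12, 11]
long_window = [11, 40, 42, 12, 38, 10, 44, 41, 13]
result = train_models(short_window, long_window)
print(round(result.first, 2), result.second, result.spike_samples_used)
# 14.67 47.0 5
```

In this toy run only 5 of the 9 long-window samples reach the second model, a small-scale version of the data reduction the abstract describes. Because every such sample already exceeds the first threshold, the second threshold always ends up above the first for non-negative `k`, matching claim 1.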
Claims (preview)
What is claimed is:
1. A non-transitory computer-readable medium storing an application executable to perform an operation of separate modeling to reduce processing overhead in identifying performance spikes as false positives, the operation comprising: training a first model by sampling a performance metric of a computing system over a first training period, in order to derive a first threshold, wherein the performance metric is sampled by a monitoring system via a network, wherein the computing and monitoring systems are distinct systems; training, based on the first threshold, a second model by sampling the performance metric only when the first threshold is exceeded, wherein the performance metric is sampled over a second training period longer in duration than the first training period, in order to derive a second threshold greater than the first threshold, wherein the first and second models comprise separate models; evaluating the performance metric of the computing system using the trained first model at a first sampling frequency, including determining that a value of the performance metric exceeds the first threshold; upon determining that the first threshold is exceeded, beginning evaluation of the computing system using the trained second model at an increased sampling frequency relative to the first sampling frequency; upon determining, during the evaluation of the performance metric of the computing system using the trained second model, that a second value of the performance metric does not exceed the second threshold, identifying the second value as a false positive by a processors when executing the application, wherein training the separate models for use in identifying the second value as a false positive reduces a required number of sampled values relative to training a single model; and upon determining, during the evaluation of the performance metric of the computing system using the trained second model, that a third value of the performance metric exceeds the second threshold, identifying the third value as being indicative of a performance error on the computing system, and causing a remedial action to be taken on the computing system responsive to the performance error.
2. The non-transitory computer-readable medium of claim 1, wherein the operation further comprises, upon determining the first value does not exceed the first threshold, updating the first model based on the first value.
3. The non-transitory computer-readable medium of claim 1, wherein the operation further comprises, upon determining the first value exceeds the first threshold, updating the second model based on the first value.
4. The non-transitory computer-readable medium of claim 1, wherein the performance metric corresponds to a usage of a shared resource.
5. The non-transitory computer-readable medium of claim 1, wherein the performance metric corresponds to one of processor utilization, storage resource consumption, memory consumption, and message traffic.
6. The non-transitory computer-readable medium of claim 1, wherein upon identifying the first value as a false positive, an indication that the first value is identified as a false positive is stored, wherein the third value is sampled only as a result of the increased sampling frequency, thereby avoiding a performance complication associated with a delayed remedial action absent measurement of the third value.
7. The non-transitory computer-readable medium of claim 6, wherein the application executes on the monitoring system, wherein the first and second thresholds are derived automatically from the first and second models, respectively, wherein the operation further comprises: dynamically updating the first threshold over time using the first model and without using the second model; and dynamically updating the second threshold over time using the second model and without using the first model.
8. The non-transitory computer-readable medium of claim 7, wherein the second value is measured subsequent to measuring the first value, wherein the third value is measured subsequent to measuring the second value.
9. The non-transitory computer-readable medium of claim 8, wherein the operation further comprises: upon determining the first value does not exceed the first threshold, updating the first model based on the first value.
10. The non-transitory computer-readable medium of claim 9, wherein the operation further comprises: upon determining the first value exceeds the first threshold, updating the second model based on the first value.
11. The non-transitory computer-readable medium of claim 10, wherein the performance metric corresponds to, in respective instances of executing the operation, processor utilization, memory consumption, shared or dedicated storage consumption, virtual storage consumption, error message traffic, system message traffic, latches held, latches released, transaction response times, disk input/output (I/O) response times, and disk I/O activity.
12. The non-transitory computer-readable medium of claim 11, wherein the operation further comprises training a third model in addition to the first and second models, wherein the first training period is two weeks in duration, wherein the second training period is one year in duration, wherein the first sampling frequency comprises sampling every thirty minutes.
13. The non-transitory computer-readable medium of claim 1, wherein the second model is for transient spikes.
14. The non-transitory computer-readable medium of claim 1, wherein the second model is for known-spike periods.
15. The non-transitory computer-readable medium of claim 1, wherein the first model models expected behavior of the computing system during non-spike and spike periods.
16. The non-transitory computer-readable medium of claim 1, wherein the first model models expected behavior of the computing system during non-spike periods.
17. A system of separate modeling to reduce processing overhead in identifying performance spikes as false positives, the system comprising: one or more computer processors; and a memory storing an application which, when executed on the one or more computer processors, performs an operation comprising: training a first model by sampling a performance metric of a computing system over a first training period, in order to derive a first threshold, wherein the performance metric is sampled by a monitoring system via a network, wherein the computing and monitoring systems are distinct systems; training, based on the first threshold, a second model by sampling the performance metric only when the first threshold is exceeded, wherein the performance metric is sampled over a second training period longer in duration than the first training period, in order to derive a second threshold greater than the first threshold, wherein the first and second models comprise separate models; evaluating the performance metric of the computing system using the trained first model at a first sampling frequency, including determining that a first value of the performance metric exceeds the first threshold; upon determining that the first threshold is exceeded, beginning evaluation of the computing system using the trained second model at an increased sampling frequency relative to the first sampling frequency; upon determining, during the evaluation of the performance metric of the computing system using the trained second model, that a second value of the performance metric does not exceed the second threshold, identifying the second value as a false positive,
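The evaluation steps of claims 1 to 3 can be sketched as a small state machine. This is a hedged illustration, not the claimed implementation. `TwoTierMonitor`, its field names, the 60-second fast interval, and the rule that a sample back under the first threshold ends the escalation are all assumptions. The 1800-second base interval mirrors the thirty-minute sampling in claim 12. Samples at or below the first threshold are appended to the first model's history (claim 2), samples above it go to the second model's history (claim 3), and only a value above the second threshold triggers the remediation callback. Re-deriving the thresholds from these histories (claim 7) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TwoTierMonitor:
    """Hypothetical two-model monitor; thresholds come from prior training."""
    first_threshold: float
    second_threshold: float
    base_interval_s: float = 1800.0  # thirty minutes, as in claim 12
    fast_interval_s: float = 60.0    # assumed "increased" sampling frequency
    remediate: Callable[[float], None] = lambda value: None
    baseline_history: List[float] = field(default_factory=list)  # first-model data
    spike_history: List[float] = field(default_factory=list)     # second-model data
    fast_mode: bool = False

    def next_interval(self) -> float:
        """Seconds to wait before taking the next sample."""
        return self.fast_interval_s if self.fast_mode else self.base_interval_s

    def observe(self, value: float) -> str:
        """Classify one sample and update the model histories."""
        if not self.fast_mode:
            # Evaluate with the first model at the base sampling frequency.
            if value <= self.first_threshold:
                self.baseline_history.append(value)  # claim 2
                return "normal"
            self.spike_history.append(value)  # claim 3
            self.fast_mode = True  # switch to the second model, sample faster
            return "escalate"
        # Evaluate with the second model at the increased sampling frequency.
        if value > self.second_threshold:
            self.remediate(value)  # performance error: trigger remedial action
            return "error"
        if value <= self.first_threshold:
            # Assumption: the spike has ended, so fall back to the first model.
            self.fast_mode = False
            self.baseline_history.append(value)
            return "normal"
        self.spike_history.append(value)
        return "false_positive"  # spike stays within the learned spike range


# Usage with hypothetical thresholds and samples.
alerts: List[float] = []
monitor = TwoTierMonitor(first_threshold=14.7, second_threshold=47.0,
                         remediate=alerts.append)
labels, intervals = [], []
for sample in [11, 12, 40, 42, 55, 13]:
    labels.append(monitor.observe(sample))
    intervals.append(monitor.next_interval())
print(labels)
# ['normal', 'normal', 'escalate', 'false_positive', 'error', 'normal']
```

Keeping the two histories separate means each threshold could later be recomputed from its own model's data alone, which is the independence claim 7 describes.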
CPC classification titles:
- Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents (software debugging using additional hardware using a specific debug interface G06F11/3656; performance evaluation by tracing or monitoring G06F11/3466)
- Reliability or availability analysis
- Performance evaluation by statistical analysis
- Performance evaluation by modeling
- where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting