Root cause analysis and automation using machine learning
US-2020382361-A1 · Dec 3, 2020 · US
US12007865B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12007865-B2 |
| Application number | US-202217810167-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2022 |
| Priority date | Apr 16, 2022 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
A performance monitoring system includes a metric collector configured to receive, via metric exporters, telemetry data comprising metrics related to a network of computing devices. A metric time series database stores related metrics. An alert rule evaluator service is configured to evaluate rules using stored metrics. The performance monitoring system may include a machine learning module and is configured to determine optimized metric collection sampling intervals and rule evaluation intervals, and to automatically determine recommended alert rules.
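The data path in the abstract runs from metric exporters to the collector, then to the metric time series database, and finally to the alert rule evaluator. It can be illustrated with a minimal in-memory Python sketch. Everything in it is an assumption for illustration, not the patent's implementation: the class and function names (`MetricStore`, `AlertRule`, `evaluate`), the in-memory dictionary standing in for the time series database, and the rule semantics "fire when the latest value exceeds the threshold".

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetricStore:
    """Hypothetical in-memory stand-in for the metric time series database."""
    series: dict = field(default_factory=lambda: defaultdict(list))

    def ingest(self, name: str, timestamp: float, value: float) -> None:
        # The collector appends each exported sample under its metric name.
        self.series[name].append((timestamp, value))

    def latest(self, name: str):
        # Most recent (timestamp, value) sample, or None if never collected.
        samples = self.series.get(name)
        return samples[-1] if samples else None


@dataclass
class AlertRule:
    """Hypothetical threshold rule: fires when the metric exceeds the threshold."""
    name: str
    metric_name: str
    threshold: float


def evaluate(rule: AlertRule, store: MetricStore) -> bool:
    # Compare the newest stored value for the rule's metric to its threshold.
    sample = store.latest(rule.metric_name)
    return sample is not None and sample[1] > rule.threshold
```

For example, after `store.ingest("cpu_util", 15.0, 0.91)`, evaluating `AlertRule("HighCPU", "cpu_util", 0.8)` returns `True`. In a real deployment, the store would be a time series database, and evaluation would run on a schedule set by each rule's evaluation interval.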
What is claimed is:
1. A method comprising: collecting, by a performance monitoring system, telemetry data comprising metrics related to a network of computing devices, wherein, for each metric, metric values associated with a corresponding metric name are collected at each of a plurality of times; evaluating, by the performance monitoring system, alert rules using the collected telemetry data, wherein evaluating a first rule includes comparing metric values associated with a corresponding metric name of the first rule to a corresponding threshold value of the first rule at each of a plurality of rule evaluation times based on a first evaluation interval to generate a rule evaluation attribute; determining, by the performance monitoring system, a predicted rule weight for the first rule based on the rule evaluation attribute; and determining, by the performance monitoring system, a second evaluation interval for the first rule based on the predicted rule weight and the first evaluation interval.
2. The method of claim 1, wherein the second evaluation interval is different than the first evaluation interval, the method further comprising subsequently evaluating, by the performance monitoring system, the first rule using the second evaluation interval.
3. The method of claim 2, further comprising determining, by the performance monitoring system, whether the first rule is a critical rule, and if so, using a predetermined minimum evaluation interval for the second evaluation interval.
4. The method of claim 3, wherein the rule evaluation attribute comprises a first rule evaluation attribute and a second rule evaluation attribute, wherein the first rule evaluation attribute is a rule hit rate and the second rule evaluation attribute is a rule close missed rate.
5. The method of claim 4, further comprising determining a related rule that is related to the first rule and determining a third rule evaluation attribute that is a related rule hit rate, wherein the related rule hit rate is defined as a ratio of a number of rule hits for the related rule to a total number of rule evaluations performed for the related rule, and using the first, the second, and the third rule evaluation attributes in determining the predicted rule weight.
6. The method of claim 1, wherein determining the predicted rule weight comprises performing a regression analysis.
7. The method of claim 1, further comprising updating a current evaluation interval for the first rule on an on-going basis.
8. The method of claim 1, further comprising coordinating a collection rate of the telemetry data with the second evaluation interval.
9. A performance monitoring system, comprising: a memory; and one or more processors in communication with the memory, the one or more processors configured to execute a collector and an alert rule evaluator service, wherein the collector is configured to receive telemetry data via metric exporters, the telemetry data comprising metrics related to a network of computing devices, and wherein, for each metric, metric values associated with a corresponding metric name are collected at each of a plurality of collection times, and wherein the alert rule evaluator service is configured to evaluate rules using the received telemetry data, wherein, to evaluate a first rule, the alert rule evaluator service uses metric values associated with a corresponding metric name of the first rule, compares a corresponding metric value to a corresponding threshold value of the first rule at each of a plurality of rule evaluation times based on a first evaluation interval to generate a rule evaluation attribute, determines a predicted rule weight for the first rule based on the rule evaluation attribute, and determines a second evaluation interval for the first rule based on the predicted rule weight and the first evaluation interval.
10. The performance monitoring system of claim 9, wherein the alert rule evaluator service subsequently evaluates the first rule using the second evaluation interval.
11. The performance monitoring system of claim 9, wherein the alert rule evaluator service comprises a machine learning model that is trained using historical rule evaluation results.
12. The performance monitoring system of claim 9, wherein the alert rule evaluator service determines whether the first rule is a critical rule, and if the alert rule evaluator service determines that the first rule is a critical rule, then the alert rule evaluator service sets the second evaluation interval to a predetermined minimum evaluation interval.
13. The performance monitoring system of claim 9, wherein the rule evaluation attribute is a rule hit rate, defined as a ratio of a number of rule hits for the first rule to a total number of rule evaluations performed for the first rule.
14. The performance monitoring system of claim 9, wherein the rule evaluation attribute includes a rule hit rate as a first rule evaluation attribute and a rule close missed rate as a second rule evaluation attribute.
15. The performance monitoring system of claim 14, wherein the alert rule evaluator service determines a related rule that is related to the first rule, wherein the rule evaluation attribute include a third rule evaluation attribute that is a related rule hit rate, the related rule hit rate defined as a ratio of a number of rule hits for the related rule to a total number of rule evaluations performed for the related rule.
16. The performance monitoring system of claim 15, wherein the alert rule evaluator service determines the predicted rule weight using the rule evaluation attributes and a regression analysis.
17. The performance monitoring system of claim 9, wherein the alert rule evaluator service determines a collection rate for the telemetry data that is coordinated with the second evaluation interval.
18. The performance monitoring system of claim 9, wherein the memory comprises a metric time series database for storing and aggregating the received telemetry data.
19. The performance monitoring system of claim 9, wherein the first evaluation interval is a predetermined default evaluation interval.
20. The performance monitoring system of claim 9, wherein an evaluation interval for the first rule is evaluated and updated on an on-going basis.
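The adaptive-interval loop in the claims can be sketched in Python. It computes rule evaluation attributes, predicts a rule weight by regression, and derives a second evaluation interval from that weight and the first interval, with critical rules pinned to a minimum. This is a sketch under stated assumptions, not the patented method. The claims define only the hit rate, so the "close missed" definition (a non-firing evaluation within a margin below the threshold) is assumed. The claims say only "regression analysis", so ordinary least-squares linear regression is swapped in. The weight-to-interval mapping, the interval bounds, and all function names (`rule_attributes`, `fit_linear`, `predict_weight`, `next_interval`) are hypothetical.

```python
def rule_attributes(values, threshold, margin=0.1):
    """Return (hit_rate, close_missed_rate) for a rule that fires on value > threshold.

    hit_rate = rule hits / rule evaluations, as defined in claim 13.
    close_missed_rate uses an ASSUMED definition: the fraction of evaluations
    that did not fire but landed within margin * |threshold| below the threshold.
    """
    n = len(values)
    if n == 0:
        return 0.0, 0.0
    lower = threshold - margin * abs(threshold)
    hits = sum(1 for v in values if v > threshold)
    close = sum(1 for v in values if lower <= v <= threshold)
    return hits / n, close / n


def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        if abs(p) < 1e-12:
            raise ValueError("training attributes are linearly dependent")
        A[col] = [a / p for a in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, coef_1, ..., coef_m]


def predict_weight(coefs, attributes):
    """Linear prediction of the rule weight, clamped to [0, 1]."""
    w = coefs[0] + sum(c * a for c, a in zip(coefs[1:], attributes))
    return min(1.0, max(0.0, w))


def next_interval(first_interval, weight, critical=False,
                  min_interval=15.0, max_interval=300.0):
    """Derive the second evaluation interval (seconds) from weight and first interval.

    Critical rules always get the predetermined minimum (claims 3 and 12).
    Otherwise, ASSUMED mapping: weight 0.5 keeps the interval, 1.0 halves it,
    0.0 doubles it, and the result is clamped to [min_interval, max_interval].
    """
    if critical:
        return min_interval
    scaled = first_interval * 2 ** (1 - 2 * weight)
    return max(min_interval, min(max_interval, scaled))
```

To use it, fit the coefficients on historical rows of `(hit_rate, close_missed_rate, related_hit_rate)`. The related rule hit rate is the first value returned by `rule_attributes` on the related rule's values. The weight labels are hypothetical, for example operator-assigned importance. Then call `next_interval(60, predict_weight(coefs, attrs))` to get the second interval. Rerunning this periodically would correspond to the on-going updates of claims 7 and 20.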
Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters · CPC title
by adaptive sampling · CPC title
based on severity or priority · CPC title
using filtering, e.g. reduction of information by using priority, element types, position or time · CPC title
Processing captured monitoring data, e.g. for logfile generation · CPC title