Automated root cause detection using data flow analysis
US-10452515-B2 · Oct 22, 2019 · US
US11645141B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11645141-B2 |
| Application number | US-202117462955-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 31, 2021 |
| Priority date | Mar 4, 2019 |
| Publication date | May 9, 2023 |
| Grant date | May 9, 2023 |
Official abstract text for this publication.
A system for identifying root cause of anomalies in execution of an application comprising a plurality of operations is provided. The system comprising a preprocessing module configured to receive tracing data comprising a plurality of tracing spans each documenting, for a corresponding operation of the application, a plurality of properties and corresponding values, a signal splitting module configured to group the plurality of tracing spans in a plurality of groups such that each of the plurality of groups comprises operations with identical properties and corresponding values, an anomaly detection module configured to determine anomalous operations for each of the plurality of tracing data spans, a scoring module configured to calculate a plurality of anomaly scores each indicating a level of anomaly within each of the plurality of groups and a root cause identification module configured to analyze the anomaly scores and identify root cause of the detected anomalies according to the analysis.
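The grouping and scoring steps in the abstract can be illustrated with a minimal sketch. The span layout (a `properties` dict plus a `duration_ms` field), the fixed-threshold detector `is_anomalous`, and the score defined as the fraction of anomalous spans per group are all assumptions made for illustration. The publication does not prescribe a particular detector or scoring formula.

```python
from collections import defaultdict


def group_spans(spans):
    """Group spans whose property/value pairs are identical."""
    groups = defaultdict(list)
    for span in spans:
        # A frozenset of (property, value) pairs is hashable and ignores key order.
        key = frozenset(span["properties"].items())
        groups[key].append(span)
    return dict(groups)


def is_anomalous(span, max_duration_ms=500.0):
    # Assumption: a placeholder detector that flags slow operations.
    return span["duration_ms"] > max_duration_ms


def group_scores(groups):
    """Score each group as the fraction of its spans flagged anomalous (an assumption)."""
    return {
        key: sum(is_anomalous(s) for s in members) / len(members)
        for key, members in groups.items()
    }


# Hypothetical tracing data: one span per operation.
spans = [
    {"properties": {"host": "a", "region": "eu"}, "duration_ms": 120.0},
    {"properties": {"host": "a", "region": "eu"}, "duration_ms": 900.0},
    {"properties": {"host": "b", "region": "eu"}, "duration_ms": 110.0},
    {"properties": {"host": "b", "region": "us"}, "duration_ms": 130.0},
]

groups = group_spans(spans)
scores = group_scores(groups)
```

Keying groups on the full set of property/value pairs means two spans land in the same group only when every property matches, which is the "identical properties and corresponding values" condition in the abstract.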
Opening claim text (preview).
What is claimed is:
1. A system for identifying a root cause of anomalies in executing an application, the application comprising a plurality of operations, the system comprising: a processing circuitry and a non-transitory storage, wherein the non-transitory storage is configured to store code, and the processing circuitry is configured to execute the code to: receive tracing data, comprising a plurality of tracing spans, each tracing span documenting for a corresponding operation of the application a plurality of properties and corresponding values; group the plurality of tracing spans in a plurality of groups such that each of the plurality of groups comprises operations with identical properties and corresponding values; determine anomalous operations for each of the plurality of tracing data spans; calculate a plurality of anomaly scores each anomaly score indicating a level of anomaly within each one of the plurality of groups; and analyze the plurality of anomaly scores and to identify a root cause of the detected anomalies according to the analysis of the plurality of anomaly scores.
2. The system of claim 1, wherein the processing circuitry is further configured to execute the code to generate the tracing spans, wherein the system further comprises a tracing library, and a plurality of tracing servers, the tracing library being configured to: generate one tracing span for each operation processed by the application; and transmit the tracing span to at least one of the plurality of tracing servers, wherein the tracing servers are configured to collect the received tracing spans and to transmit them to the processing circuitry.
3. The system of claim 1, wherein the processing circuitry is configured to execute the code to: for each of the plurality of properties: compute a plurality of aggregated anomaly scores, for each property value, over the anomalies of all groups with the same property value; compute a standard deviation in the plurality of aggregated anomaly scores; and select at least one property with the standard deviation exceeding a threshold; wherein the root cause is identified according to the selected at least one property and the property value having the maximal aggregated anomaly score.
4. The system of claim 1, wherein the processing circuitry is configured to execute the code to: compute a distance metric indicating distances between all pairs of property values; apply a clustering algorithm on the plurality of property values using the distance metric to obtain a plurality of clusters of property values; compute a plurality of anomaly scores each for one of the plurality of clusters; and select at least one cluster according to the plurality of anomaly scores and a threshold; wherein the root cause is identified according to the at least one selected cluster.
5. The system of claim 4, wherein the processing circuitry is configured to execute the code to compute the distance metric by: constructing a graph G=(V, E), with vertices V and edges E, wherein V comprises vertices representing the plurality of property values and the plurality of groups of tracing spans, and E comprising edges between vertices representing the plurality of property values and the plurality of groups of tracing spans and edge capacities based on the group anomaly scores; computing a plurality of maximum flow values between pairs of vertices representing the plurality of property values; computing a plurality of distances between all vertices representing the plurality of property values from the plurality of maximum flow values; and obtaining the distance metric from the plurality of distances.
6. The system of claim 4, wherein each distance of the plurality of distances of pairs of property values is one of: inverse proportional to the anomaly score of a corresponding group of tracing spans, zero when both property values are the same, or inverse proportional to the number of groups of tracing spans with non-zero anomaly score.
7. A method for identifying a root cause of a fault in executing an application, comprising: receiving tracing data, comprising a plurality of tracing spans, each tracing span documenting for a corresponding operation of the application a plurality of properties and corresponding values; grouping the plurality of tracing spans in a plurality of groups such that each of the plurality of groups comprises operations with identical properties and corresponding values; determining anomalous operations for each of the plurality of tracing data spans; calculating a plurality of anomaly scores, each anomaly score indicating a level of anomaly within each one of the plurality of groups; and analyzing the plurality of anomaly scores and to identify a root cause of the detected anomalies according to the analysis of the plurality of anomaly scores.
8. The method of claim 7, wherein further comprising: generating one tracing span for each operation processed by the application.
9. The method of claim 7, wherein for each property, comprising: computing a plurality of aggregated anomaly scores, for each property value, over the anomalies of all groups with the same property value; computing a standard deviation in the plurality of aggregated anomaly scores; and selecting at least one property with the standard deviation exceeding a threshold; wherein the root cause is identified according to the selected at least one property and the property value having the maximal aggregated anomaly score.
10. The method of claim 7, wherein comprising: computing a distance metric indicating distances between all pairs of property values; applying a clustering algorithm on the plurality of property values using the distance metric to obtain a plurality of clusters of property values; computing a plurality of anomaly scores each for one of the plurality of clusters; and selecting at least one cluster according to the plurality of anomaly scores and a threshold; wherein the root cause is identified according to the at least one selected cluster.
11. The method of claim 10, wherein the computing the distance metric comprising: constructing a graph G=(V, E), with vertices V and edges E, wherein V comprises vertices representing the plurality of property values and the plurality of groups of tracing spans, and E comprising edges between vertices representing the plurality of property values and the plurality of groups of tracing spans and edge capacities based on the group anomaly scores; computing a plurality of maximum flow values between pairs of vertices representing the plurality of property values; computing a plurality of distances between all vertices representing the plurality of property values from the plurality of maximum flow values; and obtaining the distance metric from the plurality of distances.
12. A non-transitory computer readable storage medium comprising computer program code instructions, being executable by a computer, for performing the following steps: receive tracing data, comprising a plurality of tracing spans, each tracing span documenting for a corresponding operation of the application a plurality of properties and corresponding values; group the plurality of tracing spans in a plurality of groups such that each of the plurality of groups comprises operations with identical properties and corresponding values; determine anomalous operations for each of the plurality of tracing data spans; calculate a plurality of anomaly scores, each anomaly score indicating a level of anomaly within each one of the plurality of groups; and analyze the plurality of anomaly score
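The property-selection step recited in claims 3 and 9 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: "aggregated" is taken to mean a plain sum, the standard deviation is the population form from Python's `statistics.pstdev`, and the function name `root_cause_candidates`, the input layout, and the threshold value are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def root_cause_candidates(group_scores, threshold):
    """Return (property, value, aggregated score) for properties whose
    aggregated scores vary by more than `threshold` across their values.

    group_scores maps a frozenset of (property, value) pairs to that
    group's anomaly score (a hypothetical input layout).
    """
    # Aggregate (assumption: sum) the group scores per property value.
    per_property = defaultdict(lambda: defaultdict(float))
    for key, score in group_scores.items():
        for prop, value in key:
            per_property[prop][value] += score

    candidates = []
    for prop, by_value in per_property.items():
        # A large spread means the anomalies concentrate on some values of this property.
        spread = pstdev(list(by_value.values()))
        if spread > threshold:
            worst = max(by_value, key=by_value.get)
            candidates.append((prop, worst, by_value[worst]))
    return candidates


# Hypothetical group scores: anomalies concentrate on host "a".
example_scores = {
    frozenset({("host", "a"), ("region", "eu")}): 0.9,
    frozenset({("host", "a"), ("region", "us")}): 0.8,
    frozenset({("host", "b"), ("region", "eu")}): 0.1,
    frozenset({("host", "b"), ("region", "us")}): 0.0,
}

candidates = root_cause_candidates(example_scores, threshold=0.5)
```

In this example the aggregated scores differ sharply between the two `host` values but only slightly between the two `region` values, so only `host` passes the threshold, and its highest-scoring value is reported as the candidate root cause.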
where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Performance evaluation by tracing or monitoring · CPC title
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
Monitoring of software · CPC title