Trace backtracking in distributed systems
US-9450849-B1 · Sep 20, 2016 · US
US10171335B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10171335-B2 |
| Application number | US-201514956131-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 1, 2015 |
| Priority date | Dec 1, 2015 |
| Publication date | Jan 1, 2019 |
| Grant date | Jan 1, 2019 |
Official abstract text for this publication.
The disclosed embodiments provide a system for processing data. During operation, the system obtains a component of a time-series performance metric associated with a server-side root cause of an anomaly in the time-series performance metric. Next, the system obtains a call graph representation of the component, wherein the call graph representation includes a parent node having a parent value of the component and a set of child nodes of the parent node, each child node having a corresponding child value of the component. The system then analyzes the call graph representation to identify one or more of the child nodes as sources of the anomaly. Finally, the system outputs an alert that identifies the sources of the anomaly.
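The analysis step in the abstract can be illustrated with a minimal Python sketch. It follows the correlation-and-difference variant described in dependent claims 2 and 3, not the full claimed method. The `CallGraph` class, the `find_sources` function, the API names, the threshold values, and the sample numbers are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CallGraph:
    """Parent and child values of one metric component, one sample per event."""
    parent: list[float]               # overall component value per event
    children: dict[str, list[float]]  # per-child (e.g. per-API) value per event


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find_sources(graph, baseline_means, corr_threshold=0.8, min_increase=0.0):
    """Return {child: (correlation, difference)} for suspected anomaly sources.

    A child is kept when its values correlate with the parent above the
    threshold AND its mean exceeds its baseline mean by more than
    min_increase. Both cutoffs are assumed values chosen for illustration.
    """
    sources = {}
    for name, values in graph.children.items():
        corr = pearson(graph.parent, values)
        if corr <= corr_threshold:
            continue  # child does not track the parent closely enough
        diff = sum(values) / len(values) - baseline_means[name]
        if diff > min_increase:
            sources[name] = (corr, diff)
    return sources


# Hypothetical latencies (ms) for five events and two downstream APIs.
graph = CallGraph(
    parent=[100, 120, 180, 240, 300],
    children={
        "profile_api": [40, 41, 39, 40, 42],
        "feed_api": [50, 70, 130, 190, 250],
    },
)
alert = find_sources(graph, {"profile_api": 40.0, "feed_api": 50.0})
print(alert)
```

In this made-up data only `feed_api` rises together with the parent, so it is the only child reported. The returned correlation and difference are the two values that claim 4 says may be included in the alert.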
Opening claim text (preview).
What is claimed is:
1. A method, comprising: analyzing a plurality of time-series performance metrics related to operation of an application to identify deviations from corresponding baseline values for the time-series performance metrics, wherein each deviation represents a corresponding anomaly in performance of the application; in response to identifying a given deviation in a time-series performance metric, obtaining a component of the time-series performance metric associated with a server-side root cause of the corresponding anomaly; obtaining a call graph representation of the component, wherein the call graph representation comprises a parent node having a parent value of the component and a set of child nodes of the parent node, each child node having a corresponding child value of the component; analyzing, by a computer system, the call graph representation to identify one or more of the child nodes as sources of the anomaly by: using the call graph representation to generate a first regression model that estimates the parent value based on the set of child values; using a historic call graph representation of the component to generate a second regression model that estimates a baseline parent value of the component for the parent node based on a set of baseline child values of the component for the set of child nodes; and comparing a first set of coefficients from the first regression model to a second set of coefficients from the second regression model to identify the one or more of the child nodes as the sources of the anomaly; and outputting an alert that identifies the sources of the anomaly.
2. The method of claim 1, wherein analyzing the call graph representation of the component to identify one or more of the child nodes as sources of anomaly further comprises: for each child node in the set of child nodes: determining a correlation between the parent value and the corresponding child value; and when the correlation exceeds a threshold, identifying the child node as a source of the anomaly.
3. The method of claim 2, wherein analyzing the call graph representation of the component to identify one or more of the child nodes as sources of the anomaly further comprises: for each child node identified as a source of the anomaly: calculating a difference between the corresponding child value and a baseline value of the component for the child node; and updating the sources of the anomaly based on the difference.
4. The method of claim 3, wherein outputting the alert that identifies the sources of the anomaly comprises: including the correlation and the difference in the outputted alert.
5. The method of claim 1, wherein the one or more of the child nodes comprise a child node with a first coefficient in the first regression model that is higher than a second coefficient for the child node in the second regression model.
6. The method of claim 1, further comprising: performing a statistical hypothesis test on the component to assess a deviation from a baseline value of the component; and when the statistical hypothesis test identifies a statistically significant deviation of the component from the baseline value, associating the server-side root cause of the anomaly with the time-series performance metric.
7. The method of claim 6, wherein the statistical hypothesis test comprises a sign test.
8. The method of claim 1, wherein: the one or more child nodes comprise one or more application programming interfaces (APIs) called during measurement of the time-series performance metric from a monitored system, and the parent value represents an overall value of the component during an event associated with the one or more APIs called in the one or more child nodes.
9. An apparatus, comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to: analyze a plurality of time-series performance metrics related to operation of an application to identify deviations from corresponding baseline values for the time-series performance metrics, wherein each deviation represents a corresponding anomaly in performance of the application; in response to identifying a given deviation in a time-series performance metric, obtain a component of the time-series performance metric associated with a server-side root cause of the corresponding anomaly; obtain a call graph representation of the component, wherein the call graph representation comprises a parent node having a parent value of the component and a set of child nodes of the parent node, each child node having a corresponding child value of the component; analyze the call graph representation to identify one or more of the child nodes as sources of the anomaly by: using the call graph representation to generate a first regression model that estimates the parent value based on the set of child values; using a historic call graph representation of the component to generate a second regression model that estimates a baseline parent value of the component for the parent node based on a set of baseline child values of the component for the set of child nodes; and comparing a first set of coefficients from the first regression model to a second set of coefficients from the second regression model to identify the one or more of the child nodes as the sources of the anomaly; and output an alert that identifies the sources of the anomaly.
10. The apparatus of claim 9, wherein analyzing the call graph representation of the component to identify one or more of the child nodes as sources of the anomaly further comprises: for each child node in the set of child nodes: determining a correlation between the parent value and the corresponding child value; and when the correlation exceeds a threshold, identifying the child node as a source of the anomaly.
11. The apparatus of claim 10, wherein analyzing the call graph representation of the component to identify one or more of the child nodes as sources of the anomaly further comprises: for each child node identified as a source of the anomaly: calculating a difference between the corresponding child value and a baseline value of the component for the child node; and updating the sources of the anomaly based on the difference.
12. The apparatus of claim 9, wherein the one or more of the child nodes comprise a child node with a first coefficient in the first regression model that is higher than a second coefficient for the child node in the second regression model.
13. A system, comprising: an analysis module comprising a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the system to: analyze a plurality of time-series performance metrics related to operation of an application to identify deviations from corresponding baseline values for the time-series performance metrics, wherein each deviation represents a corresponding anomaly in performance of the application; in response to identifying a given deviation in a time-series performance metric, obtain a component of the time-series performance metric associated with a server-side root cause of the corresponding anomaly; obtain a call graph representation of the component, wherein the call graph representation comprises a parent node having a parent value of the component and a set of child nodes of the parent node, each child node having a corresponding child value of the component; analyze the call graph representation to identify one or more of the child nodes as sources of the anomaly by: using the call graph represen
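The regression comparison in claims 1 and 5 and the sign test in claims 6 and 7 can be sketched in plain Python as follows. This is one possible reading under stated assumptions, not the patented implementation. Ordinary least squares with an intercept is assumed as the regression model, the sign test is taken as one-sided against a single baseline value, and every function name, cutoff, and data point is hypothetical.

```python
from math import comb


def sign_test_p_value(values, baseline_value):
    """One-sided sign test: p-value for 'values tend to exceed the baseline'."""
    diffs = [v - baseline_value for v in values if v != baseline_value]  # drop ties
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    # Probability of at least `positives` successes in n fair coin flips.
    return sum(comb(n, k) for k in range(positives, n + 1)) / 2 ** n


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular, for example when
    two child series are perfectly collinear.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(child_rows, parent_values):
    """Least-squares fit of parent = intercept + sum(coef_i * child_i).

    child_rows holds one list of child values per event. Returns the child
    coefficients only (the intercept is dropped).
    """
    rows = [[1.0] + list(r) for r in child_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, parent_values)) for i in range(k)]
    return solve(xtx, xty)[1:]


def changed_children(names, current, historic, min_increase=0.5):
    """Names of children whose coefficient rose versus the historic model."""
    cur = fit_coefficients(*current)
    base = fit_coefficients(*historic)
    return [n for n, c, b in zip(names, cur, base) if c - b > min_increase]


# Hypothetical data: two child APIs ("a", "b"), four events per window.
historic = ([[10, 20], [12, 25], [15, 22], [11, 30]], [35, 42, 42, 46])
current = ([[10, 21], [13, 24], [14, 23], [11, 29]], [57, 66, 65, 74])

# Gate on the sign test first, then compare regression coefficients.
if sign_test_p_value(current[1], baseline_value=46) < 0.1:
    print(changed_children(["a", "b"], current, historic))
```

In the made-up data the parent grows by roughly twice the value of child `b` in the current window but only once in the historic window, so `b` is the child whose coefficient is higher in the first model than in the second. The 0.1 significance level and the 0.5 coefficient margin are arbitrary illustration values.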
involving logical or physical relationship, e.g. grouping and hierarchies · CPC title
using time frame reporting · CPC title
involving time analysis · CPC title
for graphical visualisation of monitoring data · CPC title
Testing arrangements · CPC title