Automated performance debugging of production applications

US10915425B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10915425-B2
Application number  US-201616326217-A
Country             US
Kind code           B2
Filing date         Sep 9, 2016
Priority date       Sep 9, 2016
Publication date    Feb 9, 2021
Grant date          Feb 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Performance anomalies in production applications can be analyzed to determine the dynamic behavior over time of hosting processes on the same or different computers. Problematic call sites (call sites that are performance bottlenecks or that are causing hangs) can be identified. Instead of relying on static code analysis and development phase load testing to identify a performance bottleneck or application hang, a lightweight sampling strategy collects predicates representing key performance data in production scenarios. Performance predicates provide information about the subject (e.g., what the performance issue is, what caused the performance issue, etc.). The data can be fed into a model based on a decision tree to identify critical threads running the problematic call sites. The results along with the key performance data can be used to build a call graph prefix binary tree for analyzing call stack patterns. Data collection, analysis and visualizations of result can be performed.
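
The abstract's step of building a call graph tree from sampled call stacks can be illustrated with a small sketch. This is not the patent's implementation: the page does not describe how the "prefix binary tree" is encoded, so the example below substitutes a plain prefix tree (trie) in which each node is a call site and carries a hit count. The names `CallNode`, `build_prefix_tree`, `hottest_path`, and the sample stacks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One call site in the prefix tree, with the number of samples passing through it."""
    name: str
    hits: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(stacks):
    """Merge call stacks (outermost frame first) so that shared prefixes share nodes."""
    root = CallNode("<root>")
    for stack in stacks:
        root.hits += 1
        node = root
        for frame in stack:
            node = node.children.setdefault(frame, CallNode(frame))
            node.hits += 1
    return root


def hottest_path(root):
    """Follow the most-sampled child at each level; returns (call site, hits) pairs."""
    path = []
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.hits)
        path.append((node.name, node.hits))
    return path


# Hypothetical call stacks sampled from one thread (illustrative names only).
samples = [
    ["Main", "HandleRequest", "LoadOrders", "Query"],
    ["Main", "HandleRequest", "LoadOrders", "Query"],
    ["Main", "HandleRequest", "LoadOrders", "Serialize"],
    ["Main", "HandleRequest", "Render"],
]
tree = build_prefix_tree(samples)
print(hottest_path(tree))
```

In this sketch the call site that appears deepest on the most-sampled path is a candidate hot spot; a real analysis would likely weigh sample counts against time spent rather than rely on counts alone.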

First claim

Opening claim text (preview).

What is claimed:
1. A computing device comprising: a processor; and a memory connected to the processor; the processor configured to: provide a program module that collects key performance data for an application executing in a production environment using a lightweight sampling strategy; using a trained machine learning model, analyze the key performance data to identify a particular critical thread of the application, wherein the trained machine learning model has been trained using training cases to designate threads as critical or normal based at least on a corresponding likelihood that the threads have call stack information relating to causes of performance anomalies; analyze a call stack pattern of the particular critical thread; and provide analysis results comprising debug information for the application executing in the production environment.
2. The computing device of claim 1, wherein the key performance data comprises data from execution of the application.
3. The computing device of claim 1, wherein the lightweight sampling strategy comprises injecting a data collection agent into a process executing the application.
4. The computing device of claim 1, wherein the trained machine learning model comprises a decision tree model, and the processor is further configured to: feed the key performance data into the decision tree model to identify a plurality of critical threads of the application, individual critical threads running call sites causing a performance slowdown of the application.
5. The computing device of claim 1, wherein the processor is further configured to: identify busy threads between any two points in time within a sampling duration; and input the key performance data for the busy threads to the trained machine learning model.
6. The computing device of claim 1, wherein the processor is further configured to: perform a ranking of a plurality of busy threads of the application, wherein the ranking of the plurality of busy threads is used by the trained machine learning model to distinguish the particular critical thread from a normal thread of the application.
7. The computing device of claim 1, wherein the processor is further configured to: determine respective call stack lengths of multiple threads of the application, wherein respective call stack lengths are used by the trained machine learning model to distinguish the particular critical thread from a normal thread of the application.
8. The computing device of claim 1, wherein the processor is further configured to: determine whether multiple threads of the application include non-framework call sites, wherein the presence or absence of non-framework call sites is used by the trained machine learning model to distinguish the particular critical thread from a normal thread of the application.
9. The computing device of claim 8, wherein the particular critical thread starts and ends with framework call sites and has a non-framework call site in between the framework call sites.
10. The computing device of claim 1, wherein the processor is further configured to: determine whether multiple threads of the application include non-framework call sites in consecutive samples, wherein the presence or absence of non-framework call sites in the consecutive samples is used by the trained machine learning model to distinguish the particular critical thread from a normal thread of the application.
11. A method comprising configuring a processor of a computing device to: receive key performance data for automated debugging of a production application; analyze the key performance data to identify a critical thread, the critical thread a plurality of call sites, at least one of the plurality of call sites causing a performance slowdown of the production application; perform an analysis of the critical thread using a call graph prefix binary tree reflecting call stack patterns of the critical thread with respect to the plurality of call sites; and provide analysis results, the analysis results reflecting the analysis of the critical thread.
12. The method of claim 11, further comprising configuring the processor to: identify busy threads between any two points in time within a sampling duration; and select the critical thread based at least on a ranking of the busy threads.
13. The method of claim 11, further comprising configuring the processor to: identify the critical thread using a decision tree model trained to distinguish critical from non-critical threads.
14. The method of claim 13, further comprising configuring the processor to: train the decision tree model using an existing set of training cases.
15. The method of claim 11, further comprising configuring the processor to: collect the key performance data using a lightweight sampling technique comprising: create a handle to a process in which the production application executes, inject a thread into the process, inject a data collection agent into the process to collect the key performance data.
16. A computing device comprising: a processor; and a memory; the memory connected to the processor; the processor configured to: receive key performance data for automated debugging of a production application; analyze the key performance data to identify a critical thread having a plurality of call sites; perform an analysis of the critical thread using a call graph tree reflecting call stack patterns of the critical thread with respect to the plurality of call sites, the call graph tree having nodes representing individual call sites of the critical thread; and based at least on the analysis, provide analysis results identifying a particular call site of the production application associated with a performance slowdown.
17. The computing device of claim 16, the processor further configured to: identify the critical thread using a decision tree based model.
18. The computing device of claim 16, the processor further configured to: provide a CPU utilization view displaying a busy threads ranking.
19. The computing device of claim 16, the processor further configured to: provide a call stack pattern view that identifies a hot spot in the production application where the performance slowdown occurs.
20. The computing device of claim 16, the processor further configured to: output critical thread data for the critical thread.
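
Claims 5 through 10 name the signals used to separate critical from normal threads: a busy-thread ranking between two sample points, call stack length, and the presence of non-framework call sites, including in consecutive samples. The sketch below shows one way those signals could be computed and fed to a classifier. It is an illustration under stated assumptions, not the claimed model: the claims describe a trained decision tree, whereas `classify` is a hand-written stand-in with made-up thresholds, and `ThreadSample`, `FRAMEWORK_PREFIXES`, and the frame names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: frames with these prefixes count as "framework" call sites.
FRAMEWORK_PREFIXES = ("System.", "Microsoft.")


@dataclass
class ThreadSample:
    cpu_ms: float  # cumulative CPU time of the thread at sample time
    stack: list    # call sites, outermost frame first


def is_framework(frame):
    return frame.startswith(FRAMEWORK_PREFIXES)


def busy_ranking(samples_by_thread):
    """Rank threads (1 = busiest) by CPU time consumed between first and last sample."""
    delta = {tid: s[-1].cpu_ms - s[0].cpu_ms for tid, s in samples_by_thread.items()}
    ordered = sorted(delta, key=delta.get, reverse=True)
    return {tid: rank for rank, tid in enumerate(ordered, start=1)}


def features(tid, samples, ranks):
    """Compute the per-thread signals named in claims 6, 7, 8 and 10."""
    has_user = [any(not is_framework(f) for f in s.stack) for s in samples]
    return {
        "busy_rank": ranks[tid],
        "stack_len": max(len(s.stack) for s in samples),
        "has_user_code": any(has_user),
        "user_code_in_consecutive": any(a and b for a, b in zip(has_user, has_user[1:])),
    }


def classify(f):
    """Hand-written stand-in for a trained decision tree; thresholds are hypothetical."""
    if not f["has_user_code"]:
        return "normal"
    if f["busy_rank"] <= 2 and f["user_code_in_consecutive"]:
        return "critical"
    if f["stack_len"] >= 20:
        return "critical"
    return "normal"


# Hypothetical samples: two consecutive samples per thread.
busy_stack = ["System.Threading.ThreadStart", "Shop.Orders.Load", "System.Data.Query"]
samples_by_thread = {
    1: [ThreadSample(100.0, busy_stack), ThreadSample(900.0, busy_stack)],
    2: [ThreadSample(50.0, ["System.Threading.WaitHandle.WaitOne"]),
        ThreadSample(55.0, ["System.Threading.WaitHandle.WaitOne"])],
    3: [ThreadSample(10.0, ["System.Threading.ThreadStart", "Shop.Cache.Refresh"]),
        ThreadSample(12.0, ["System.Threading.Thread.Sleep"])],
}
ranks = busy_ranking(samples_by_thread)
labels = {tid: classify(features(tid, s, ranks)) for tid, s in samples_by_thread.items()}
print(labels)
```

Thread 1 in this example matches the shape described in claim 9: its stack starts and ends with framework call sites and has a non-framework call site in between. In practice the branch conditions would come from training on labeled cases, as claims 1 and 14 describe, rather than being written by hand.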

Assignees

Inventors

Classifications

  • H04L43/022 · Primary

    by sampling · CPC title

  • Dumping, i.e. gathering error/state information after a fault for later diagnosis · CPC title

  • Arrangements for monitoring or testing data switching networks · CPC title

  • by checking functioning · CPC title

  • Processing captured monitoring data, e.g. for logfile generation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10915425B2 cover?
Performance anomalies in production applications can be analyzed to determine the dynamic behavior over time of hosting processes on the same or different computers. Problematic call sites (call sites that are performance bottlenecks or that are causing hangs) can be identified. Instead of relying on static code analysis and development phase load testing to identify a performance bottleneck or…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification H04L43/022. Mapped technology areas include Electricity.
When was this patent published?
Publication date Feb 9, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).