Methods and systems for troubleshooting applications using streaming anomaly detection

US11640465B2 · US · B2

Patent metadata
Publication number: US-11640465-B2
Application number: US-201916682549-A
Country: US
Kind code: B2
Filing date: Nov 13, 2019
Priority date: Nov 13, 2019
Publication date: May 2, 2023
Grant date: May 2, 2023

Abstract

Official abstract text for this publication.

Computational methods and systems for detecting and troubleshooting anomalous behavior in distributed applications executing in a distributed computing system are described herein. Methods and systems discover nodes comprising the application. Anomaly detection monitors the metrics associated with the nodes for anomalous behavior in order to identify an approximate point in time when anomalous behavior begins to adversely impact performance of the application. Anomaly detection also monitors log messages associated with the nodes to detect anomalous behavior recorded in the log messages. When anomalous behavior is detected in the metrics and/or the log messages, an alert identifying the anomalous behavior is generated. Troubleshooting guides an administrator and/or application owner to investigate the root cause of the anomalous behavior. Appropriate remedial measures may be determined based on the root cause and automatically or manually executed to correct the problem.
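
The abstract says that metric monitoring identifies an approximate point in time when anomalous behavior begins. The following is a minimal sketch of one way that could look, assuming a standard-score (z-score) model computed over a sliding window of recent values. The function name `first_anomaly_index`, the window size, the threshold of 3, and the sample CPU values are all illustrative assumptions, not details taken from the patent.

```python
from statistics import mean, stdev


def first_anomaly_index(values, window=5, z_threshold=3.0):
    """Return the index of the first value whose standard score, measured
    against the preceding `window` values, exceeds `z_threshold`.

    Returns None when no value is flagged. Window size and threshold are
    hypothetical defaults chosen for illustration.
    """
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mu = mean(recent)
        sigma = stdev(recent)  # sample standard deviation (n - 1 divisor)
        if sigma == 0:
            # Flat history: treat any departure from the mean as a change.
            if values[i] != mu:
                return i
            continue
        z = (values[i] - mu) / sigma
        if abs(z) > z_threshold:
            return i
    return None


# Hypothetical CPU-usage stream (percent); the jump starts at index 8.
cpu = [41.0, 40.0, 42.0, 41.0, 40.0, 41.0, 42.0, 40.0, 95.0, 96.0]
onset = first_anomaly_index(cpu)
```

Here `onset` is the position in the stream where the metric first departs sharply from its recent history; mapping that index to a timestamp would give the approximate point in time shown with the alert.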

First claim

Opening claim text (preview).

The invention claimed is:

1. A process stored in one or more data-storage devices and executed using one or more processors of a computer system that detects and corrects anomalous behavior of an application executing in a distributed computing system, the process comprising: discovering nodes that comprise the application and execute software of the application based on communication connections that transmit information between nodes; constructing performance models that identify different types of anomalies in multiple streams of metric data associated with the nodes; applying the performance models to the multiple streams of metric data associated with the nodes in a time frame to detect anomalous behavior of the application and an approximate point in time when the anomalous behavior began, the time frame containing most recently generated metric values of the streams of metric data; performing log message analysis on log messages associated with the nodes to detect anomalous behavior in relative frequencies of event types of the log messages generated in the time frame; in response to detecting anomalous behavior in at least one of the one or more streams of metric data and/or the relative frequencies of event types, displaying an alert, the approximate point in time, the streams of metric data and log messages associated with the anomalous behavior of the application in the time frame and one or more recommended remedial measures for correcting the anomalous behavior in a graphical user interface; and executing one or more of the remedial measures in the distributed computing system to correct the anomalous behavior of the application, wherein the remedial measures include increasing amount of usable capacity of a resource to an application node, assigning additional resources to an application node, migrating one or more virtual objects, and creating one or more additional virtual objects from a template of a virtual object.

2. The process of claim 1 wherein discovering the nodes that comprise the application comprises: partitioning nodes executing in the distributed computing system into types based on information streamed from agents within each node; determining which nodes have communications connections; and identifying the nodes that comprise the application and execute software of the application based on the node types and nodes with communication connections.

3. The process of claim 1 wherein performing anomaly detection on the multiple streams of metric data comprises: for each time frame, receiving multiple streams of metric data generated by metric sources of objects executing the nodes, updating a performance model based on most recently received metric values of the streams of metric data, and detecting changes in one or more of the streams of metric data based on the updated performance model.

4. The process of claim 3 wherein updating the performance model comprises: for new metric values of the streams of metric data, computing a mean of the recently received metric values; computing a sample standard deviation of the recently received metric values; and for each new metric value of the streams of metric data, computing a standard-score model based on the recently received metric value, the mean, and the sample standard deviation.

5. The process of claim 3 wherein updating the performance model comprises: computing a mean-usage tuple from new metric values of the streams of metric data, each element of the mean-usage tuple corresponding to the mean usage of a resource of the distributed computing system used by the objects; forming a usage tuple from the new metric values of the resources; computing a covariance matrix of the new metric values of the resources; and computing a distance model from the usage tuple to the mean-usage tuple based on the usage tuple, the mean-usage tuple, and the covariance matrix.

6. The process of claim 3 wherein updating the performance model comprises: for each stream of the multiple streams of metric data, computing forecast metric values in a forecast interval; and computing a forecast confidence interval for each of the forecast metric values.

7. The process of claim 3 wherein updating the performance model comprises: for each stream of the streams of metric data, determining if the stream of the metric data is a seasonal stream of metric data; if the stream of metric data is a seasonal stream of metric data, computing a principal frequency of the stream of metric data based on new metric values in a current time window; and computing an absolute difference between the principal frequency in the current time window and a principal frequency in a previous time window.

8. The process of claim 1 wherein performing anomaly detection to detect changes in one or more of the streams of metric data based on the updated performance model comprises: determining a threshold based on the performance model; and when one or more streams of the metric data violates the threshold, identifying the resource in the graphical user interface as exhibiting anomalous behavior.

9. The process of claim 1 wherein performing log message analysis on the log messages comprises: determining an event type for each log message; computing a relative frequency of each event type generated in the time frame; and generating the alert when the relative frequency of one of the event types is greater than an associated relative frequency threshold.

10. A computer system that detects and corrects anomalous behavior of an application executing in a distributed computing system, the system comprising: one or more hardware processors; one or more physical data-storage devices; and machine-readable instructions stored in the one or more physical data-storage devices that when executed using the one or more hardware processors controls the system to perform operations comprising: discovering nodes that comprise the application and execute software of the application based on communication connections that transmit information between nodes; constructing performance models that identify different types of anomalies in multiple streams of metric data associated with the nodes; applying the performance models to the multiple streams of metric data associated with the nodes in a time frame to detect anomalous behavior of the application and an approximate point in time when the anomalous behavior began, the time frame containing most recently generated metric values of the streams of metric data; performing log message analysis on log messages associated with the nodes to detect anomalous behavior in relative frequencies of event types of the log messages generated in the time frame; and in response to detecting anomalous behavior in at least one of the one or more streams of metric data and/or the relative frequencies of event types, displaying an alert, the approximate point in time, the streams of metric data and log messages associated with the anomalous behavior of the application in the time frame and one or more recommended remedial measures for correcting the anomalous behavior in a graphical user interface; and executing one or more of the remedial measures in the distributed computing system to correct the anomalous behavior of the application, wherein the remedial measures include increasing amount of usable capacity of a resource to an application node, assigning additional resources to an application node, migrating one or more virtual objects, and creating one or more additional virtual objects from a template of a virtual object.

11. The system of claim 10 wherein discovering the nodes that compris
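
The log message analysis recited in claim 9 (determine an event type for each message, compute the relative frequency of each event type in the time frame, alert when a frequency exceeds its threshold) can be sketched as follows. The claim text does not say how event types are determined, so the masking rule in `event_type` is purely an assumption made for illustration, as are the function names, the sample messages, and the threshold value.

```python
import re
from collections import Counter


def event_type(message):
    """Reduce a log message to a coarse event type by masking variable tokens
    (hex values, numbers, dotted numbers such as IP addresses).

    This masking rule is a hypothetical stand-in, not the patent's method.
    """
    masked = re.sub(r"0x[0-9a-fA-F]+|\d+(?:\.\d+)*", "<*>", message)
    return masked.strip()


def relative_frequencies(messages):
    """Map each event type to its share of the messages in the time frame."""
    counts = Counter(event_type(m) for m in messages)
    total = sum(counts.values())
    return {etype: n / total for etype, n in counts.items()}


def alerts(messages, thresholds):
    """Return event types whose relative frequency exceeds their threshold."""
    freqs = relative_frequencies(messages)
    return sorted(
        etype for etype, freq in freqs.items()
        if etype in thresholds and freq > thresholds[etype]
    )


# Hypothetical log messages generated in one time frame.
frame = [
    "connection to 10.0.0.7 timed out after 30 s",
    "request 8841 served in 12 ms",
    "connection to 10.0.0.9 timed out after 30 s",
    "request 8842 served in 15 ms",
    "connection to 10.0.0.7 timed out after 31 s",
]
thresholds = {"connection to <*> timed out after <*> s": 0.5}
flagged = alerts(frame, thresholds)
```

In this sample, three of the five messages share the timeout event type, so its relative frequency of 0.6 exceeds the assumed threshold of 0.5 and it is the only type flagged.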

Assignees

Inventors

Classifications

  • G06F21/566 (Primary)

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title

  • Test or assess software · CPC title

  • G06F21/57 (Primary)

    Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11640465B2 cover?
Computational methods and systems for detecting and troubleshooting anomalous behavior in distributed applications executing in a distributed computing system are described herein. Methods and systems discover nodes comprising the application. Anomaly detection monitors the metrics associated with the nodes for anomalous behavior in order to identify an approximate point in time when anomalous …
Who is the assignee on this patent?
VMware Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/566. Mapped technology areas include Physics.
When was this patent published?
Publication date May 2, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).