Error resolution for interactions with user pages
US-2024320079-A1 · Sep 26, 2024 · US
US10275301B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10275301-B2 |
| Application number | US-201514869129-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 29, 2015 |
| Priority date | Sep 29, 2015 |
| Publication date | Apr 30, 2019 |
| Grant date | Apr 30, 2019 |
Official abstract text for this publication.
An approach is provided for detecting and analyzing an anomaly in application performance in a client-server connection via a network. A request time and an Internet Protocol (IP) address of the client are determined. Based on the request time and the IP address, log entries relevant to the request are selected. A status code of the response, a round trip latency time (RTT) of the response, and an indication of whether the connection timed out are determined. Based on the status code, the RTT, and the indication of whether the connection timed out, the anomaly is detected. Based on temporal and textual analyses of log entries associated with the anomaly and an environment analysis that determines activity of the client, server, and network, candidate root causes of a failure that resulted in the anomaly are determined.
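The selection and detection steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how relevance is computed or how the three signals are combined, so the names `LogEntry`, `select_relevant_entries`, and `is_anomaly`, the 30-second window, and the exact-IP match are all assumptions made for illustration. The conjunction of the three signals follows the wording of claim 1 (5xx status, RTT above a threshold, and a timeout).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    """Hypothetical log record; the field names are assumptions."""
    timestamp: datetime
    client_ip: str
    message: str


def select_relevant_entries(entries, request_time, client_ip,
                            window=timedelta(seconds=30)):
    """Keep entries from the same client IP within +/- window of the request.

    The window size is an illustrative assumption, not a value from the patent.
    """
    return [
        e for e in entries
        if e.client_ip == client_ip
        and abs(e.timestamp - request_time) <= window
    ]


def is_anomaly(status_code, rtt_ms, timed_out, rtt_threshold_ms):
    """Flag an anomaly when all three signals from claim 1 hold together."""
    server_error = 500 <= status_code <= 599
    slow = rtt_ms > rtt_threshold_ms
    return server_error and slow and timed_out


if __name__ == "__main__":
    t0 = datetime(2015, 9, 29, 12, 0, 0)
    logs = [
        LogEntry(t0 + timedelta(seconds=5), "10.0.0.1", "upstream timeout"),
        LogEntry(t0 + timedelta(seconds=90), "10.0.0.1", "too late"),
        LogEntry(t0, "10.0.0.2", "other client"),
    ]
    for entry in select_relevant_entries(logs, t0, "10.0.0.1"):
        print(entry.message)
    print(is_anomaly(503, 900.0, True, 400.0))
```

A real system would likely key relevance on session or request identifiers as well; the time-and-IP filter here mirrors only what the abstract names.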
Opening claim text (preview).
What is claimed is:

1. A method of detecting and analyzing an anomaly in a performance of an application in a connection between client and server computers, the method comprising the steps of:
a first computer determining a time of a request from the client computer executing the application and an Internet Protocol (IP) address of the client computer, the request being sent by the client computer to the server computer via a communications network;
based on the time of the request from the client computer and the IP address of the client computer, the first computer selecting one or more log entries from a plurality of log entries so that the selected one or more log entries are relevant to the request;
the first computer determining a status code of a response from the server computer and determining that the status code is a Hypertext Transfer Protocol (HTTP) status code of 500 through 599, which indicates the server computer did not properly perform a function in response to the request from the client computer, the response being sent by the server computer to the client computer via the network and responsive to the request;
the first computer determining that the connection timed out in response to the server computer not responding to the request within a predetermined time period;
the first computer calculating values of a round trip latency time (RTT) for multiple client computers having application sessions with the server computer, the values of the RTT including a value of a RTT of the response;
the first computer dividing a space of the values of the RTT into buckets of RTT values, the buckets having a fixed size;
the first computer computing running counts and means for the values of the RTT in each bucket;
the first computer maintaining a boundary value that determines which buckets are in a lower value cluster C₁ employed by a k-means clustering algorithm and which other buckets are in a higher value cluster C₂ employed by the k-means clustering algorithm, wherein k=2;
the first computer determining the buckets whose RTT values include respective values of the RTT, assigning the values of the RTT to the respective buckets, re-computing the counts and means for each bucket, and balancing C₁ and C₂ to ensure that (i) values in C₁ are closer to a mean μ₁ of C₁ and (ii) values in C₂ are closer to a mean μ₂ of C₂;
the first computer computing μ₁ of C₁, a standard deviation σ₁ of C₁, μ₂ of C₂, and a standard deviation σ₂ of C₂, the first computer computing a threshold value as μ₂+2σ₂ if μ₁+σ₁≥μ₂ or as μ₁+2σ₁ if μ₁+σ₁<μ₂;
the first computer determining that the value of the RTT of the response exceeds the threshold value;
based on the status code of the response being the HTTP status code of 500 through 599, the value of the RTT exceeding the threshold value, and the connection having timed out in response to the server computer not responding to the request within the predetermined time period, the first computer detecting the anomaly in the performance of the application; and
based on a temporal analysis and textual analysis of log entries associated with the anomaly, and based on an environment analysis that determines activity of the client computer, the server computer, and the network, the first computer determining candidate root causes of a failure that resulted in the anomaly, the failure being in the client computer, the server computer, the network, or a combination of the client computer, the server computer, and the network.

2. The method of claim 1, further comprising the steps of:
the first computer determining a period of time relevant to the anomaly;
based on the period of time, the first computer selecting relevant entities from among the client computer, the server computer, and components of the communications network;
based on the selected relevant entities and the period of time, the first computer selecting log entries from logs provided by the relevant entities;
subsequent to the step of selecting the log entries, the first computer filtering the selected log entries based on keywords that specify anomalies;
the first computer determining a usage of a central processing unit (CPU) of the server computer, a usage of a memory by the server computer, and an input/output (I/O) activity of the server computer; and
based on the filtered log entries, the usage of the CPU, the usage of the memory, and the I/O activity, the first computer determining whether each of the client computer, the server computer, and the components of the communications network was active or inactive at a time of an occurrence of the anomaly, wherein the step of determining the candidate root causes is based in part on whether each of the client computer, the server computer and the components of the communications network is determined to have been active or inactive at the time of the occurrence of the anomaly.

3. The method of claim 2, further comprising the steps of:
the first computer determining one or more components of the server computer were active at the time of the occurrence of the anomaly; and
based on the filtered log entries, the usage of the CPU, the usage of the memory, and the I/O activity, the first computer determining whether the one or more components of the server computer were performing tasks relevant to the application or extraneous to the application, wherein the step of determining the candidate root causes is based in part on whether the one or more components of the server computer were performing tasks relevant to the application or extraneous to the application.

4. The method of claim 1, further comprising the steps of:
the first computer determining confidences of the respective candidate root causes, each confidence indicating how likely the respective root cause is an actual root cause of the anomaly; and
the first computer presenting the candidate root causes in an order which is based on the confidences.

5. The method of claim 1, further comprising the steps of:
the first computer determining the anomaly specifies a type of an alert;
the first computer determining a role of a user;
the first computer determining an association between the type of the alert and the role of the user; and
based on the association between the type of the alert and the role of the user, the first computer presenting the alert to the user, the alert notifying the user about the anomaly.

6. The method of claim 5, further comprising the steps of:
the first computer collecting attributes of the anomaly and sending the attributes to a machine learning process, the attributes including the RTT, the indication of whether the connection timed out; a delay value of the connection, details of the server computer and the application, details about a function specified by the request, and a uniform resource locator of the server computer;
the first computer receiving feedback from the user about whether the anomaly was correctly detected or incorrectly detected;
the first computer utilizing the feedback as a label of the machine learning process;
based on the collected attributes, the first computer generating a machine learning model for the machine learning process, the machine learning model including rules specifying subsequent anomalies;
the first computer updating the machine learning model continuously or at specified time intervals; and
based on the machine learning model or the updated machine learning model, the first computer detecting a subsequent anomaly in the performance of the application, wherein the subsequent anomaly is more likely to be accurately detected than the anomaly detected by the prior step of detecting the anomaly.

7. The method of claim 1,
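The RTT threshold steps recited in claim 1 (fixed-size buckets, running counts and means, a boundary splitting buckets into two clusters, and the μ/σ threshold rule) can be sketched as follows. This is an illustrative reading, not the patented implementation: the class name `BucketedRttThreshold`, the 10 ms bucket size, the midpoint rule used to balance the boundary, and the choice to estimate each σ from count-weighted bucket means are all assumptions. The claim also does not say what to do before both clusters hold data, so the sketch returns `None` in that case.

```python
import math


class BucketedRttThreshold:
    """Hypothetical helper: fixed-size RTT buckets split by one boundary (k=2)."""

    def __init__(self, bucket_size_ms=10.0):
        self.bucket_size = bucket_size_ms
        self.counts = {}      # bucket index -> running count
        self.means = {}       # bucket index -> running mean of RTTs
        self.boundary = None  # RTT value separating C1 (low) from C2 (high)

    def add(self, rtt_ms):
        """Assign one RTT to its bucket, update count and mean, rebalance."""
        b = int(rtt_ms // self.bucket_size)
        self.counts[b] = self.counts.get(b, 0) + 1
        prev = self.means.get(b, 0.0)
        self.means[b] = prev + (rtt_ms - prev) / self.counts[b]
        self._balance()

    def _split(self):
        low = [b for b in self.counts if self.means[b] <= self.boundary]
        high = [b for b in self.counts if self.means[b] > self.boundary]
        return low, high

    def _stats(self, buckets):
        """Count-weighted mean and std dev over bucket means (an approximation)."""
        n = sum(self.counts[b] for b in buckets)
        mu = sum(self.counts[b] * self.means[b] for b in buckets) / n
        var = sum(self.counts[b] * (self.means[b] - mu) ** 2 for b in buckets) / n
        return mu, math.sqrt(var)

    def _balance(self):
        """Move the boundary to the midpoint of the cluster means until stable."""
        values = list(self.means.values())
        if self.boundary is None or not all(self._split()):
            self.boundary = (min(values) + max(values)) / 2
        for _ in range(100):  # iteration cap is a safety assumption
            low, high = self._split()
            if not low or not high:
                return
            new_boundary = (self._stats(low)[0] + self._stats(high)[0]) / 2
            if new_boundary == self.boundary:
                return
            self.boundary = new_boundary

    def threshold(self):
        """Apply the rule from claim 1; None until both clusters have data."""
        if self.boundary is None:
            return None
        low, high = self._split()
        if not low or not high:
            return None
        mu1, s1 = self._stats(low)
        mu2, s2 = self._stats(high)
        return mu2 + 2 * s2 if mu1 + s1 >= mu2 else mu1 + 2 * s1


if __name__ == "__main__":
    tracker = BucketedRttThreshold()
    for rtt in [100, 102, 98, 101, 99, 500, 510, 490]:
        tracker.add(rtt)
    print(tracker.threshold())
```

When the two clusters are well separated (μ₁+σ₁<μ₂), the rule takes the threshold from the lower cluster, so RTTs in the higher cluster exceed it. When the clusters overlap, the threshold moves above the higher cluster instead.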
in a data processing system embedded in a mobile device, e.g. mobile phones, handheld devices · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
Root cause analysis, i.e. error or fault diagnosis (in a hardware test environment G06F11/22; in a software test environment G06F11/36) · CPC title
Inference or reasoning models · CPC title
Physics · mapped topic