Probability-distribution-based log-file analysis
US-2016277268-A1 · Sep 22, 2016 · US
US11048608B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11048608-B2 |
| Application number | US-201514660461-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 17, 2015 |
| Priority date | Mar 17, 2015 |
| Publication date | Jun 29, 2021 |
| Grant date | Jun 29, 2021 |
Official abstract text for this publication.
The current document is directed to systems, and methods incorporated within the systems, that carry out probability-distribution-based analysis of log-file entries. A monitoring subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to detect changes in the state of the distributed computer system. A log-file-analysis subsystem within a distributed computer system uses probability-distribution-based analysis of log-file entries to identify subsets of log-file entries that predict anomalies and impending problems in the distributed computer system. In many implementations, a numerical comparison of probability distributions of log-file-entry types is used to detect state changes in the distributed computer system.
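As an illustration of the numerical comparison the abstract describes, the following is a minimal sketch, not the patented implementation. It assumes each log entry has already been reduced to an event-type label, that the Jensen-Shannon divergence (base-2 logarithm) is the comparison used, and that comparison starts as soon as two distributions are stored. The function names, the `vocabulary` parameter, and the threshold of 0.1 are hypothetical choices made for this example.

```python
import math
from collections import Counter


def event_type_distribution(event_types, vocabulary):
    """Relative frequency of each event type in `vocabulary` for one time interval."""
    counts = Counter(event_types)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        raise ValueError("no entries of the selected event types in this interval")
    return [counts[t] / total for t in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def detect_state_changes(intervals, vocabulary, threshold=0.1):
    """Return (interval index, divergence) for each interval whose distribution
    diverges from the previously stored one by more than `threshold`.

    `threshold` is an illustrative value, not one taken from the patent.
    """
    stored = []   # distributions kept in time order
    alarms = []
    for index, events in enumerate(intervals):
        stored.append(event_type_distribution(events, vocabulary))
        if len(stored) >= 2:
            divergence = js_divergence(stored[-2], stored[-1])
            if divergence > threshold:
                alarms.append((index, divergence))
    return alarms


if __name__ == "__main__":
    vocabulary = ["login", "error"]
    intervals = [
        ["login"] * 8 + ["error"] * 2,
        ["login"] * 8 + ["error"] * 2,
        ["login"] * 2 + ["error"] * 8,   # error rate jumps in the third interval
    ]
    for index, divergence in detect_state_changes(intervals, vocabulary):
        print(f"state change at interval {index}: divergence {divergence:.3f}")
```

Two identical distributions give a divergence of 0 and two distributions with no event types in common give 1, so a threshold in this sketch is a value between those bounds.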
Opening claim text (preview).
The invention claimed is:
1. A log-file analysis subsystem within a computer system having one or more processors, one or more memories, and computer instructions, stored in one or more of the one or more memories that, when executed by one or more of the one or more processors, control the log-file analysis system to monitor a state of the computer system by repeatedly: generating, for one or more log files, each having multiple entries that are each associated with an event type, a probability distribution of all or a subset of the event types in the one or more log files for a time interval to represent the state of a monitored computer system for the time interval; storing the generated probability distribution in association with an indication of the time interval; and after generating and storing each probability distribution following generation and storing of an initial set of probability distributions, computing a divergence metric from the two most recently generated and stored probability distributions, and when the divergence metric is greater than a threshold value, raising an alarm to indicate, or displaying an indication of, a significant system-state change.
2. The log-file analysis subsystem of claim 1 wherein monitoring the state of the computer system by the log-file analysis system further includes: using the stored probability distributions collected over a first time interval spanning multiple shorter, secondary time intervals to generate a typical probability distribution for each of a set of time intervals selected from the multiple shorter, secondary time intervals; and at subsequent secondary time intervals, generating a probability distribution for the event types of log entries selected from the most recently completed secondary time interval, computing a Jensen-Shannon divergence metric for the probability distribution generated from the most recently completed secondary time interval and the typical probability distribution for the most recently completed secondary time interval, and when the Jensen-Shannon divergence metric is greater than a threshold value, raising an alarm to indicate, or displaying an indication of, a system-state change.
3. The log-file analysis subsystem of claim 1 wherein monitoring the state of the computer system by the log-file analysis system further includes: for each of a number of different subsets of the event types for which the log-file analysis subsystem has generated and stored probability distributions for different time intervals, computing a Jensen-Shannon divergence metric for the probability distributions for different pairs of time intervals, and computing a measure of the variance of the Jensen-Shannon divergence metrics computed for the probability distributions for different pairs of the time intervals; and selecting, as a basis for a monitoring fingerprint, a subset of the event types having the greatest computed variance.
4. A method that monitors a state of a distributed computer system that includes multiple, network interconnected discrete computer systems, each having one or more processors, one or more memories, and one or more data-storage devices, one or more of the discrete computer systems including computer instructions, stored in one or more of the one or more memories of the discrete computer system, that, when executed by one or more of the one or more processors, control the discrete computer system to carry out the method comprising: repeatedly generating, for one or more log files, each having multiple entries that are each associated with an event type, a probability distribution of all or a subset of the event types in the one or more log files for a time interval to represent the state of a monitored computer system for the time interval, storing the generated probability distribution in association with an indication of the time interval in one or more of one or more memories and/or data-storage devices, and after generating and storing each probability distribution following generation and storing of an initial set of probability distributions, computing a divergence metric from the two most recently generated and stored probability distributions, and when the divergence metric is greater than a threshold value, raising an alarm to indicate, or displaying an indication of, a system-state change.
5. The method of claim 4 wherein the divergence metric is the Jensen-Shannon divergence metric.
6. The method of claim 4 further including: using the stored probability distributions collected over a first time interval spanning multiple shorter, secondary time intervals to generate a typical probability distribution for each of a set of time intervals selected from the multiple shorter, secondary time intervals; and at subsequent secondary time intervals, generating a probability distribution for the event types of log entries selected from the most recently completed secondary time interval, computing a divergence metric for the probability distribution generated from the most recently completed secondary time interval and the typical probability distribution for the most recently completed secondary time interval, and when the divergence metric is greater than a threshold value, raising an alarm to indicate, or displaying an indication of, a system-state change.
7. The method of claim 6 wherein the divergence metric is the Jensen-Shannon divergence metric.
8. The method of claim 4 further including: for each of a number of different subsets of the event types for which the log-file analysis subsystem has generated and stored probability distributions for different time intervals, computing a divergence metric for the probability distributions for different pairs of time intervals, and computing a measure of the variance of the divergence metrics computed for the probability distributions for different pairs of the time intervals; and selecting, as a basis for a monitoring fingerprint, a subset of the event types having the greatest computed variance.
9. The method of claim 8 wherein the divergence metric is the Jensen-Shannon divergence metric.
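The dependent claims add two ideas: comparing a new interval against a "typical" distribution for that interval, and choosing a monitoring fingerprint as the event-type subset whose pairwise divergences vary the most. The sketch below is one possible reading, not the patented implementation. It assumes that a typical distribution is an element-wise mean of stored distributions (the claims do not say how it is formed), that intervals in which a subset has no entries are skipped, and that population variance is the variance measure. All function names and the threshold of 0.1 are hypothetical.

```python
import math
from itertools import combinations
from statistics import pvariance


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two aligned distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def typical_distribution(distributions):
    """Element-wise mean of distributions stored for the same secondary interval
    (for example, the same hour on different days). Averaging is an assumption."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def deviates_from_typical(current, typical, threshold=0.1):
    """True when the current interval diverges from its typical distribution."""
    return js_divergence(current, typical) > threshold


def subset_distribution(counts, subset):
    """Renormalize per-interval event-type counts over one subset of event types."""
    total = sum(counts.get(t, 0) for t in subset)
    if total == 0:
        return None  # subset absent from this interval; the caller skips it
    return [counts.get(t, 0) / total for t in subset]


def select_fingerprint(interval_counts, candidate_subsets):
    """Return the candidate subset whose pairwise divergences have the greatest
    variance, together with that variance."""
    best_subset, best_variance = None, -1.0
    for subset in candidate_subsets:
        dists = [d for d in (subset_distribution(c, subset) for c in interval_counts)
                 if d is not None]
        divergences = [js_divergence(p, q) for p, q in combinations(dists, 2)]
        if len(divergences) < 2:
            continue  # too few interval pairs to measure variance
        variance = pvariance(divergences)
        if variance > best_variance:
            best_subset, best_variance = subset, variance
    return best_subset, best_variance


if __name__ == "__main__":
    interval_counts = [
        {"login": 50, "logout": 50, "disk_error": 1, "timeout": 9},
        {"login": 50, "logout": 50, "disk_error": 1, "timeout": 9},
        {"login": 50, "logout": 50, "disk_error": 9, "timeout": 1},
    ]
    candidates = [("login", "logout"), ("disk_error", "timeout")]
    print(select_fingerprint(interval_counts, candidates))
```

In this toy data the login/logout mix never changes, so its pairwise divergences are all zero and have no variance, while the disk_error/timeout mix shifts in the third interval and is selected.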
using statistical or mathematical methods · CPC title
in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title
Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level · CPC title
Data acquisition and logging (for input to computer G06F3/00) · CPC title
Dumping, i.e. gathering error/state information after a fault for later diagnosis · CPC title