Cluster-based processing of unstructured log messages
US-2018102938-A1 · Apr 12, 2018 · US
US10353756B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10353756-B2 |
| Application number | US-201715416571-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 26, 2017 |
| Priority date | Oct 11, 2016 |
| Publication date | Jul 16, 2019 |
| Grant date | Jul 16, 2019 |
Official abstract text for this publication.
Some embodiments relate to assigning individual log messages to clusters. An initial cluster assignment may be performed by applying a hash function to one or more non-variable components of the message to generate an initial cluster identifier. Subsequently, clustering may be further refined (e.g., by determining whether to merge clusters based on similarity values). An interface can present a representative message of each cluster and indicate which portions of the message correspond to a variable component. Particular inputs detected at the input corresponding to one of these components can cause other values for the component to be presented. For a given cluster, timestamps of assigned messages can be used to generate a time series, which can facilitate grouping of clusters (with similar or complementary shapes) and/or triggering alerts (with a condition corresponding to a temporal aspect).
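The initial cluster assignment described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; it makes several assumptions that do not come from the source: components are whitespace-delimited tokens, a token containing any digit is treated as a variable component, variable components are replaced by a hypothetical `<*>` placeholder, and SHA-1 stands in for the unspecified hash function. The names `skeleton`, `cluster_id`, and `assign` are illustrative only.

```python
import hashlib
import re
from collections import defaultdict

# Assumption (not from the source): any token containing a digit is variable.
VARIABLE_TOKEN = re.compile(r"\d")
# Hypothetical placeholder marking where a variable component appeared.
PLACEHOLDER = "<*>"


def parse_components(message):
    """Split a log message into whitespace-delimited components."""
    return message.split()


def skeleton(message):
    """Keep non-variable components and mask variable ones with a placeholder."""
    return " ".join(
        PLACEHOLDER if VARIABLE_TOKEN.search(token) else token
        for token in parse_components(message)
    )


def cluster_id(message):
    """Deterministically map the message skeleton to an initial cluster identifier."""
    digest = hashlib.sha1(skeleton(message).encode("utf-8")).hexdigest()
    return digest[:12]  # shortened for readability; an arbitrary choice


def assign(messages):
    """Store each message identifier in association with its cluster identifier.

    Returns a pair: {message_id: cluster_id} and {cluster_id: [message_id, ...]}.
    Message identifiers here are simply list positions (an assumption).
    """
    by_message = {}
    by_cluster = defaultdict(list)
    for message_id, text in enumerate(messages):
        cid = cluster_id(text)
        by_message[message_id] = cid
        by_cluster[cid].append(message_id)
    return by_message, dict(by_cluster)
```

Because the function is deterministic, two messages that differ only in their variable components, such as `User 42 logged in from 10.0.0.1` and `User 7 logged in from 10.0.0.9`, receive the same identifier at ingest time without comparing against existing clusters. The later refinement step (merging clusters by similarity value) is not shown here.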
Opening claim text (preview).
What is claimed is:
1. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform actions including: receiving a plurality of log messages; for each log message of the plurality of log messages: parsing the log message into a plurality of components, each component of the plurality of components corresponding to a part of the log message; determining, for each component of the plurality of components, whether the component is a variable component or a non-variable component; wherein, when the component is identified as a variable component, a cluster that identifies any messages matching the component is defined such that a value for the component is allowed to differ across log messages in the cluster while sharing a same cluster identity; or wherein, when the component is identified as a non-variable component, a cluster that identifies any messages matching the component is defined such that a value for the component must be the same across log messages in the cluster to share the same cluster identity; determining, for each of one or more non-variable components of the plurality of components determined to be a non-variable component, a value for the non-variable component from the log message; and assigning the log message to a cluster of a set of clusters based at least in part on: one or more values of the one or more non-variable components; and one or more rules; and storing a message identifier of the log message in association with a cluster identifier corresponding to the cluster.
2. The computer-program product as recited in claim 1, wherein assigning the log message to the cluster includes: defining a skeleton of the log message based on values for the one or more non-variable components, wherein a value for each of the one or more non-variable components is not included in the skeleton; and using a deterministic function to transform the skeleton of the log message into the cluster identifier, the one or more rules including the deterministic function.
3. The computer-program product as recited in claim 1, wherein parsing the log message into a plurality of components includes applying one or more grammar rules.
4. The computer-program product as recited in claim 1, wherein the actions further include: receiving a query for log data; identifying a set of message identifiers that correspond to the query; identifying a subset of the set of clusters based on the cluster identifiers stored in association with the message identifiers, wherein, for each cluster in the subset, at least some messages of the set of message identifiers is associated with a cluster identifier corresponding to the cluster; and generating a response to the query, the response including a representation of each cluster in the subset.
5. The computer-program product as recited in claim 4, wherein the message identifiers are stored in association with the cluster identifiers prior to receiving the query.
6. The computer-program product as recited in claim 4, wherein, for each log message of the plurality of log messages, the log message is assigned to the cluster at an ingest time in response to receiving the log message from a source, and wherein the ingest time is prior to receiving the query.
7. The computer-program product as recited in claim 4, wherein the actions further include, for each cluster in the subset of the set of clusters: identifying, from amongst the at least some messages associated with the cluster identifier corresponding to the cluster, one or more representative log messages of the cluster, the one or more representative log messages being an incomplete subset of the at least some messages associated with the cluster identifier, wherein the representation of the cluster includes the one or more representative log messages.
8. The computer-program product as recited in claim 4, wherein the actions further include, for each cluster in the set of clusters: identifying, from amongst the at least some messages associated with the cluster identifier corresponding to the cluster, one or more representative log messages of the cluster, the one or more representative log messages being an incomplete subset of the at least some messages associated with the cluster identifier; and performing a comparison processing to determine a similarity value representing a similarity between one or more representative log messages of a first cluster of the subset and one or more representative log messages of a second cluster of the subset; and determining, based on the comparison processing, whether to merge the first cluster with the second cluster in the subset.
9. The computer-program product as recited in claim 4, wherein, for each of at least some of the plurality of log messages, assigning the log message to the cluster includes: using a deterministic function to transform the one or more values of the one or more non-variable components into a preliminary cluster identifier at an ingest time in response to receiving the log message from a source, the one or more rules including the deterministic function; storing, prior to receiving the query, the message identifier of the log message in association with the preliminary cluster identifier, the preliminary cluster identifier; and subsequent to receiving the query, using a merging rule that merges multiple clusters together to assign the log message to the cluster, the one or more rules including the deterministic function.
10. A computer-implemented method comprising: receiving a plurality of log messages; for each log message of the plurality of log messages: parsing the log message into a plurality of components, each component of the plurality of components corresponding to a part of the log message; determining, for each component of the plurality of components, whether the component is a variable component or a non-variable component; wherein, when the component is identified as a variable component, a cluster that identifies any messages matching the component is defined such that a value for the component is allowed to differ across log messages in the cluster while sharing a same cluster identity; or wherein, when the component is identified as a non-variable component, a cluster that identifies any messages matching the component is defined such that a value for the component must be the same across log messages in the cluster to share the same cluster identity; determining, for each of one or more non-variable components of the plurality of components determined to be a non-variable component, a value for the non-variable component from the log message; and assigning the log message to a cluster of a set of clusters based at least in part on: one or more values of the one or more non-variable components; and one or more rules; and storing a message identifier of the log message in association with a cluster identifier corresponding to the cluster.
11. The computer-implemented method as recited in claim 10, wherein assigning the log message to the cluster includes: defining a skeleton of the log message based on values for the one or more non-variable components, wherein a value for each of the one or more non-variable components is not included in the skeleton; and using a deterministic function to transform the skeleton of the log message into the cluster identifier, the one or more rules including the deterministic function.
12. The computer-implemented method as recited in claim 10, wherein parsing the log message into a plurality of compone
Dumping, i.e. gathering error/state information after a fault for later diagnosis · CPC title
Threshold · CPC title
Data acquisition and logging (for input to computer G06F3/00) · CPC title
where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting · CPC title
Event-based monitoring · CPC title