Analyzing Activity Data of an Information Management System
US-2016342805-A1 · Nov 24, 2016 · US
US12423170B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12423170-B2 |
| Application number | US-202217578692-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2022 |
| Priority date | Jan 19, 2022 |
| Publication date | Sep 23, 2025 |
| Grant date | Sep 23, 2025 |
Official abstract text for this publication.
The present disclosure provides systems and methods for generation of parsing scripts or rules for unstructured or semi-structured system log messages, including systems and methods for identifying and clustering of same or substantially similar system log messages using machine learning. Patterns indicative of the same or substantially similar types of system log messages can be generated based on the clustering of the system log messages and calculated similarities of attributes or distances between common features/fields of the system log messages, with the results of the clustering presented for analysis and development or adjustment of parsing scripts.
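The clustering idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the masking rules in `VARIABLE_PATTERNS`, the greedy single-pass grouping, the `max_distance` default, and the use of `difflib.SequenceMatcher` as a character-based distance are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical masks for variable attributes (assumption: IPv4 addresses and
# standalone integers). Real log sources would need their own rules.
VARIABLE_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_variables(message: str) -> str:
    """Replace variable attributes with placeholder tokens."""
    for pattern, token in VARIABLE_PATTERNS:
        message = pattern.sub(token, message)
    return message


def distance(a: str, b: str) -> float:
    """Character-based distance in [0, 1]; 0 means identical strings."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_messages(messages, max_distance=0.3):
    """Greedy clustering: a message joins the first cluster whose
    representative (its first member) is within max_distance."""
    clusters = []  # each cluster is a list of (original, masked) pairs
    for msg in messages:
        masked = mask_variables(msg)
        for cluster in clusters:
            if distance(masked, cluster[0][1]) <= max_distance:
                cluster.append((msg, masked))
                break
        else:
            clusters.append([(msg, masked)])
    return [[original for original, _ in cluster] for cluster in clusters]


logs = [
    "Failed password for root from 10.0.0.1 port 22",
    "Interface eth0 link down",
    "Failed password for admin from 192.168.1.5 port 2222",
    "Interface eth1 link down",
]
for group in cluster_messages(logs):
    print(group)
```

Masking before measuring distance keeps values such as addresses and port numbers from inflating the distance between messages of the same type. The threshold trades cluster purity against the number of clusters and would need tuning per log source.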
Opening claim text (preview).
What is claimed is:
1. A system for generation of parsing scripts or rules for system logs, comprising: an event management center including at least one processor and memory configured to: receive a plurality of system log messages in real-time from a plurality of monitored devices, the plurality of system log messages including a plurality of different types of unstructured or semi-structured system log messages; determine whether one or more parsing scripts or rules are available to parse or normalize at least some of the plurality of system log messages; and if one or more parsing scripts or rules are available to parse or normalize at least some of the plurality of system log messages, apply the one or more parsing scripts or rules thereto; and if one or more of the plurality of system log messages are in an unrecognized format or a parsing script or rule is not available to parse or normalize the system log messages: submit the one or more of the plurality of system log messages to at least one clustering model stored in a memory of or accessible by the at least one processor to form clusters of system log messages of the one or more of the plurality system log messages that are of substantially a same type, wherein the system log messages of a cluster are separated by a distance based on differences between characters in each of the system log messages of the cluster, search for patterns within the system log messages of the cluster, remove one or more variable attributes from each of the plurality of system log messages, determine, via the at least one clustering model, a probability that the patterns between each of the system log messages of the cluster indicates a high confidence or low confidence of relationship between the system log messages, remove one or more of the plurality of the system log messages in each of the clusters based on an indication of low confidence, generate a pattern template for each of the plurality of system log messages within each of the clusters, and generate a new parsing script based on the pattern template.
2. The system of claim 1, wherein the event management center comprises a data center of a managed security service provider.
3. The system of claim 1, wherein the event management center comprises a network server.
4. The system of claim 1, wherein the at least one clustering model is further configured to identify patterns within at least two of the plurality of system log messages of one of the clusters and develop a vocabulary of most commonly used attributes thereof.
5. The system of claim 4, wherein the at least one clustering model is further configured to determine a distance between each of the plurality of system log messages within each cluster based upon a number of non-varying attributes present in each of the plurality of system log messages and clustering the each of the plurality of system log messages based upon a selected distance.
6. The system of claim 1, wherein the event management center is further configured to apply one or more training data sets to the at least one clustering model to form the clusters, the one or more training data sets including historically identified features or attributes indicative of identifiable ones of the plurality of system log messages received by the event management center.
7. The system of claim 1, wherein the at least one clustering model is further configured to group the system log messages into the clusters based upon two or more selected parameters including a selected number of messages, a size of a vocabulary of commonly used attributes, a selected attribute length, a maximum distance between system log messages, and a minimum number of system log messages per cluster.
8. A method of generating parsing scripts or rules for security log data, comprising: receiving security log data in real-time comprising a plurality of different types of unstructured or semi-structured system log messages from a plurality of monitored devices; applying a probabilistic model to identify system log messages having a series of common attributes indicating the system log messages are of a same or substantially same type; clustering system log messages of the same or substantially same type to form clusters; determining a confidence level of matching for each of the system log messages of each of the clusters with other system log messages in a corresponding cluster; removing system log messages with a level of confidence below a selected threshold from the corresponding cluster; and generating one or more regex pattern scripts configured to match an identified type of system log messages based on the clustered system log messages and corresponding confidence levels of each of the clustered system log messages.
9. The method of claim 8, further comprising generating training data sets for training the probabilistic model.
10. The method of claim 9, further comprising updating the training data sets with security log data processed by the probabilistic model.
11. The method of claim 8, further comprising determining whether one or more parsing scripts or rules are available for parsing and/or normalization of the system log messages, and if one or more parsing scripts or rules are available to parse or normalize unstructured data in one or more of the system log messages, applying at least one selected parsing script or rule to the unstructured data for parsing or normalization of the unstructured data into a normalized log.
12. The method of claim 8, further comprising applying historical patterns to the system log messages.
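The last three steps of claim 8 (scoring confidence, removing low-confidence messages, and generating a regex pattern) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method: whitespace tokenization, the majority-token agreement score used as "confidence", the 0.5 default threshold, and the names `majority_tokens`, `confidence`, and `generate_regex` are all hypothetical.

```python
import re
from collections import Counter


def majority_tokens(rows):
    """Most common token at each position across equal-length token rows."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*rows)]


def confidence(row, majority):
    """Assumed confidence score: share of positions agreeing with the majority."""
    if len(row) != len(majority):
        return 0.0
    return sum(tok == maj for tok, maj in zip(row, majority)) / len(majority)


def generate_regex(messages, threshold=0.5):
    """Drop low-confidence messages, then build a regex in which positions
    that vary across the remaining messages become capture groups."""
    rows = [(msg, msg.split()) for msg in messages]
    # Use the most common token count as the cluster's expected shape.
    length = Counter(len(tokens) for _, tokens in rows).most_common(1)[0][0]
    majority = majority_tokens([t for _, t in rows if len(t) == length])
    kept = [(m, t) for m, t in rows if confidence(t, majority) >= threshold]
    if not kept:
        raise ValueError("no message met the confidence threshold")
    parts = []
    for column in zip(*(tokens for _, tokens in kept)):
        # Constant column -> literal text; varying column -> capture group.
        parts.append(re.escape(column[0]) if len(set(column)) == 1 else r"(\S+)")
    pattern = re.compile("^" + r"\s+".join(parts) + "$")
    return pattern, [msg for msg, _ in kept]


cluster = [
    "Failed password for root from 10.0.0.1 port 22",
    "Failed password for admin from 192.168.1.5 port 2222",
    "Failed password for guest from 10.0.0.9 port 22",
    "Disk quota exceeded for user bob on volume7",  # outlier, gets removed
]
pattern, kept = generate_regex(cluster)
print(pattern.pattern)
print(pattern.match("Failed password for alice from 172.16.0.3 port 8022").groups())
```

The resulting pattern treats every varying position as an opaque `\S+` capture. A practical parsing script would likely assign field names and tighter types (for example, an IP address or port) to each capture group.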
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Clustering; Classification · CPC title
using probabilistic model · CPC title
Storage of error reports, e.g. persistent data storage, storage using memory protection · CPC title