Automatic online log template mining

US12450244B2 · US · B2

Patent metadata
Field · Value
Publication number · US-12450244-B2
Application number · US-202016867118-A
Country · US
Kind code · B2
Filing date · May 5, 2020
Priority date · May 5, 2020
Publication date · Oct 21, 2025
Grant date · Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for log message aggregation include determining a first similarity distance score for a first incoming message by comparing the first incoming message to one or more stored templates. It is determined that the first incoming message imperfectly matches a matched template of the one or more stored templates, based on the first similarity distance score. A token in the imperfectly matched template is replaced with a wildcard, to reduce the first similarity distance score.
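
The wildcard step in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, a `*` wildcard symbol, and a simple position-by-position score in place of the edit distance recited in the claims; the names `WILDCARD`, `mismatch_score`, and `generalize` are hypothetical and do not come from the patent.

```python
WILDCARD = "*"  # hypothetical wildcard symbol; the source does not name one


def mismatch_score(template, message):
    """Fraction of positions where the template and message disagree.

    A wildcard in the template matches any token. This positional score is
    a simplified stand-in for the similarity distance score in the abstract.
    """
    if len(template) != len(message):
        return 1.0  # simplification: different lengths never match here
    if not template:
        return 0.0
    misses = sum(
        1 for t, m in zip(template, message) if t != WILDCARD and t != m
    )
    return misses / len(template)


def generalize(template, message):
    """Replace each disagreeing template token with the wildcard."""
    return [
        t if t == m or t == WILDCARD else WILDCARD
        for t, m in zip(template, message)
    ]


template = "Connection from 10.0.0.1 closed".split()
message = "Connection from 10.0.0.2 closed".split()

before = mismatch_score(template, message)  # one of four tokens differs
updated = generalize(template, message)     # the differing token becomes "*"
after = mismatch_score(updated, message)    # the wildcard now absorbs it
```

In this sketch the score for the same message falls once the differing token is replaced, which is the effect the abstract describes as reducing the first similarity distance score.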

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for log message aggregation, comprising: tokenizing a first incoming message to generate tokens; comparing the tokens of the first incoming message to a plurality of stored templates using a token-based edit distance to generate a plurality of respective first similarity distance scores; normalizing the token-based edit distance by dividing by a number of tokens in a shorter of the first incoming message and the one or more stored templates; locating an imperfectly matched template of the plurality of stored templates, based on a corresponding one of the first similarity distance scores being non-zero and less than a threshold distance from the imperfectly matched template and further being less than any other of the plurality of first similarity distance scores; replacing a token in the imperfectly matched template with a wildcard, to reduce the corresponding first similarity distance score; determining anomalous activity according to a pattern of messages matching the imperfectly matched template; and automatically performing a corrective action responsive to the anomalous activity, selected from the group consisting of denying security accesses from an anomalous system, rebooting or restarting a system that has failed, and changing an alert sensitivity for future anomalous behavior.

2. The method of claim 1, wherein the wildcard matches any token in a same position of the first incoming message.

3. The method of claim 1, further comprising pre-processing the first incoming message to replace one or more tokens in the first incoming message with a pre-processing wildcard.

4. The method of claim 1, further comprising: determining a second similarity distance score for a second incoming message by comparing the second incoming message to the one or more stored templates; determining that the second similarity distance score does not match any of the one or more stored templates; and storing a new template that is based on the second incoming message.

5. The method of claim 4, wherein determining that the second incoming message does not match any of the one or more stored templates includes determining that the second incoming message has a second similarity distance score for each of the one or more stored templates that is greater than the threshold distance.

6. The method of claim 4, wherein determining the first similarity distance score and the second similarity distance score are processed in parallel.

7. The method of claim 6, further comprising adding the second message to a shared queue after determining that the second similarity distance score does not match any of the one or more stored templates and before storing the new template.

8. A non-transitory computer readable storage medium comprising a computer readable program for log message aggregation, wherein the computer readable program when executed on a computer causes the computer to perform the steps: tokenizing a first incoming message to generate tokens; comparing the tokens of the first incoming message to a plurality of stored templates using a token-based edit distance to generate a plurality of respective first similarity distance score; normalizing the token-based edit distance by dividing by a number of tokens in a shorter of the first incoming message and the one or more stored templates; locating an imperfectly matched template of the plurality of stored templates, based on a corresponding one of the first similarity distance score being non-zero and less than a threshold distance from the imperfectly matched template and further being less than any other of the plurality of first similarity distance scores; replacing a token in the imperfectly matched template with a wildcard, to reduce the first corresponding similarity distance score; determining anomalous activity according to a pattern of messages matching the imperfectly matched template; and automatically performing a corrective action responsive to the anomalous activity, selected from the group consisting of denying security accesses from an anomalous system, rebooting or restarting a system that has failed, and changing an alert sensitivity for future anomalous behavior.

9. A log aggregation system, comprising: a hardware processor; a memory, configured to store a plurality of templates and program code that, when executed by the hardware processor, is configured to: tokenize a first incoming message to generate tokens; compare the tokens of the first incoming message to the plurality of stored templates using a token-based edit distance to generate a plurality of respective first similarity distance scores; normalizing the token-based edit distance by dividing by a number of tokens in a shorter of the first incoming message and the one or more stored templates; locate an imperfectly matched template of the plurality of stored templates, based on a corresponding one of the first similarity distance score being non-zero and less than a threshold distance from the imperfectly matched template and further being less than any other of the plurality of first similarity distance scores; replace a token in the imperfectly matched template with a wildcard, to reduce the corresponding first similarity distance score determine anomalous activity according to a pattern of messages matching the imperfectly matched template; and automatically perform a corrective action responsive to the anomalous activity, selected from the group consisting of denying security accesses from an anomalous system, rebooting or restarting a system that has failed, and changing an alert sensitivity for future anomalous behavior.

10. The log aggregation system of claim 9, wherein the template updater is further configured to replace a token in the matched template with a new wildcard.

11. The log aggregation system of claim 9, further comprising a message pre-processor, configured to replace one or more tokens in the first incoming message with a pre-processing wildcard.

12. The log aggregation system of claim 9, wherein the template matcher is further configured to determine a second similarity distance score for a second incoming message by comparing the second incoming message to the one or more stored templates, and to determine that the second similarity distance score does not match any of the one or more stored templates, and further comprising a template creator, configured to store a new template that is based on the second incoming message.

13. The log aggregation system of claim 12, wherein the template matcher is further configured to determine that the second incoming message has a second similarity distance score for each of the one or more stored templates that is greater than the threshold distance.

14. The log aggregation system of claim 12, further comprising parallel instances, each having a separate template matcher configured to determine the first similarity distance score and the second similarity distance score in parallel.

15. The log aggregation system of claim 14, wherein the template matcher that determines the second similarity distance score is further configured to add the second message to a shared queue after determining that the second similarity distance score does not match any of the one or more stored templates.

16. The method of claim 1, wherein the token-based edit distance is a Levenshtein metric.
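
The matching steps recited in claims 1, 4, 5, and 16 might be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: whitespace tokenization, the `*` wildcard symbol, the 0.5 threshold, the handling of a score exactly equal to the threshold, and every function name are hypothetical. Wildcard replacement is applied only when the template and message have the same token count, which is a simplification. The anomaly detection and corrective action steps are omitted.

```python
WILDCARD = "*"  # hypothetical wildcard symbol


def tokenize(message):
    """Split on whitespace; the claims do not fix a tokenization scheme."""
    return message.split()


def token_edit_distance(template, tokens):
    """Levenshtein distance over whole tokens; a wildcard matches any token."""
    prev = list(range(len(tokens) + 1))
    for i, t in enumerate(template, start=1):
        curr = [i]
        for j, m in enumerate(tokens, start=1):
            cost = 0 if t == WILDCARD or t == m else 1
            curr.append(min(prev[j] + 1,           # delete a template token
                            curr[j - 1] + 1,       # insert a message token
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(template, tokens):
    """Divide by the token count of the shorter sequence, as in claim 1."""
    shorter = min(len(template), len(tokens))
    if shorter == 0:
        return 0.0 if len(template) == len(tokens) else float("inf")
    return token_edit_distance(template, tokens) / shorter


def process(message, templates, threshold=0.5):
    """Match a message against stored templates, updating the list in place.

    Returns the index of the matched or newly stored template.
    """
    tokens = tokenize(message)
    scores = [normalized_distance(t, tokens) for t in templates]
    best = min(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] >= threshold:
        templates.append(tokens)  # no template is close enough: store a new one
        return len(templates) - 1
    if scores[best] > 0 and len(templates[best]) == len(tokens):
        # Imperfect match: wildcard the positions that disagree.
        templates[best] = [t if t == m or t == WILDCARD else WILDCARD
                           for t, m in zip(templates[best], tokens)]
    return best


templates = []
a = process("user alice logged in", templates)      # stored as a new template
b = process("user bob logged in", templates)        # imperfect match, 1/4
c = process("disk sda1 is almost full", templates)  # too far: new template
```

After these three calls the first template has been generalized to `user * logged in`, so a later message such as `user carol logged in` scores a distance of zero against it.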

Assignees

Inventors

Classifications

  • Binary matching operations · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Calculation of difference between files · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • G06F40/186 · Primary

    Templates · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450244B2 cover?
Methods and systems for log message aggregation include determining a first similarity distance score for a first incoming message by comparing the first incoming message to one or more stored templates. It is determined that the first incoming message imperfectly matches a matched template of the one or more stored templates, based on the first similarity distance score. A token in the imperfe…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2465. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).