Context-based identification of anomalous log data

US11853415B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11853415-B1
Application number: US-202017116419-A
Country: US
Kind code: B1
Filing date: Dec 9, 2020
Priority date: Dec 12, 2019
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are methods, systems, and processes for context-based identification of anomalous log data. Log data with multiple original logs is received at an anomalous log data identification system. A context associated training dataset is generated by splitting a string in a log into multiple split strings, generating a context association between each split string and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch that includes I/O string data for each split string in the log by training each split string against every other split string in the log. A context-based anomalous log data identification model is then trained according to a machine learning technique using the I/O string data batch that includes a list of unique strings in the context associated training dataset. The training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the multiple original logs as anomalous.
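The dataset-generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the whitespace separator, the function names `split_log` and `build_training_dataset`, the dictionary input format, and the sample log strings are all assumptions made for the example. It splits each log string, records a (unique key, split string) association, and pairs every split string with every other split string in the same log to form the I/O string data batch.

```python
from itertools import permutations


def split_log(log_string, sep=" "):
    """Split one log string into non-empty split strings.

    The separator is an assumption; the abstract does not say how strings are split.
    """
    return [tok for tok in log_string.split(sep) if tok]


def build_training_dataset(logs):
    """Build a context associated training dataset from {unique_key: log_string}.

    Returns (associations, io_batch, vocab):
      associations: list of (unique_key, split_string) context associations
      io_batch:     list of (input_string, output_string) pairs, one for every
                    ordered pair of distinct positions within the same log
      vocab:        sorted list of unique split strings across all logs
    """
    associations = []
    io_batch = []
    for key, log_string in logs.items():
        tokens = split_log(log_string)
        associations.extend((key, tok) for tok in tokens)
        # Pair each split string with every other split string in this log.
        for i, j in permutations(range(len(tokens)), 2):
            io_batch.append((tokens[i], tokens[j]))
    vocab = sorted({tok for _, tok in associations})
    return associations, io_batch, vocab


# Hypothetical sample logs keyed by a made-up unique key.
logs = {
    "log-1": "cmd.exe /c whoami",
    "log-2": "powershell.exe -enc",
}
associations, io_batch, vocab = build_training_dataset(logs)
```

Pairs are only formed within a single log, so the unique key acts as the context boundary. A skip-gram style embedding model would be one plausible consumer of `io_batch`, but the abstract only says "a machine learning technique", so that choice is left open here.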

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: performing, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receiving log data comprising a plurality of logs; generating a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and training a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

2. The computer-implemented method of claim 1, further comprising: generating a dense vector for the log.

3. The computer-implemented method of claim 2, wherein generating the dense vector for the log comprises: accessing the list of unique split strings, and averaging a plurality of vectors comprising at least one vector for each unique split string in the list of unique split strings, and the dense vector indicates a mapping of each unique split string in the list of unique split strings to the dense vector being trained.

4. The computer-implemented method of claim 3, further comprising: training the context-based anomalous log data identification model with additional I/O string data generated by the context-based anomalous log data identification system for each log of the plurality of logs.

5. The computer-implemented method of claim 1, wherein the log data comprises process information associated with one or more computing systems generating the log data, and the process information comprises a plurality of process names/hashes.

6. The computer-implemented method of claim 5, wherein training the context-based anomalous log data identification model to perform cluster analysis is based at least on a number of occurrences of a process name/hash of the plurality of process names/hashes in the log.

7. A non-transitory computer readable storage medium comprising program instructions executable to: perform, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receive log data comprising a plurality of logs; generate a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and train a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

8. The non-transitory computer readable storage medium of claim 7, further comprising: generating a dense vector for the log.

9. The non-transitory computer readable storage medium of claim 8, wherein generating the dense vector for the log comprises: accessing the list of unique split strings, and averaging a plurality of vectors comprising at least one vector for each unique split string in the list of unique split strings, and the dense vector indicates a mapping of each unique split string in the list of unique split strings to the dense vector being trained.

10. The non-transitory computer readable storage medium of claim 9, further comprising: training the context-based anomalous log data identification model with additional I/O string data generated by the context-based anomalous log data identification system for each log of the plurality of logs.

11. The non-transitory computer readable storage medium of claim 7, wherein the log data comprises process information associated with one or more computing systems generating the log data, and the process information comprises a plurality of process names/hashes.

12. The non-transitory computer readable storage medium of claim 11, wherein training the context-based anomalous log data identification model to perform cluster analysis is further based at least on a number of occurrences of a process name/hash of the plurality of process names/hashes in the log.

13. A system comprising: one or more processors; and a memory coupled to the one or more processors, wherein the memory stores program instructions executable by the one or more processors to: perform, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receive log data comprising a plurality of logs; generate a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and train a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

14. The system of claim 13, fu
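Claims 2 and 3 describe a dense vector for a log obtained by averaging the vectors of its unique split strings. A minimal sketch of that averaging step is shown below. The function name `dense_log_vector`, the two-dimensional toy embeddings, and the decision to skip split strings that have no vector are assumptions for illustration; in practice the per-string vectors would come from the trained model.

```python
def dense_log_vector(split_strings, embeddings):
    """Average the vectors of the unique split strings in one log.

    split_strings: list of split strings from a single log
    embeddings:    dict mapping split string -> list of floats (assumed to be
                   produced by a previously trained model)
    """
    # Deduplicate while preserving first-seen order.
    unique = list(dict.fromkeys(split_strings))
    # Assumption: split strings without a known vector are skipped.
    vectors = [embeddings[s] for s in unique if s in embeddings]
    if not vectors:
        raise ValueError("no known split strings in this log")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical toy embeddings, not output of any real model.
embeddings = {
    "cmd.exe": [1.0, 0.0],
    "/c": [0.0, 1.0],
    "whoami": [1.0, 1.0],
}
vec = dense_log_vector(["cmd.exe", "/c", "whoami", "/c"], embeddings)
```

Because duplicates are removed before averaging, the repeated `/c` contributes once. A downstream classifier or clustering step could then compare `vec` with vectors from known logs, though the claim text shown here does not fix a particular distance measure or threshold.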

Assignees

Rapid7 Inc

Inventors

Classifications

  • based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

  • Probabilistic graphical models, e.g. probabilistic networks

  • Transformation

  • by using string matching techniques

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11853415B1 cover?
Disclosed herein are methods, systems, and processes for context-based identification of anomalous log data. Log data with multiple original logs is received at an anomalous log data identification system. A context associated training dataset is generated by splitting a string in a log into multiple split strings, generating a context association between each split string and a unique key that…
Who is the assignee on this patent?
Rapid7 Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/552. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 26, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).