What technology area does this patent fall under?

Primary CPC classification G06F21/552. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 26, 2023 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Context-based identification of anomalous log data

US11853415B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11853415-B1
Application number	US-202017116419-A
Country	US
Kind code	B1
Filing date	Dec 9, 2020
Priority date	Dec 12, 2019
Publication date	Dec 26, 2023
Grant date	Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are methods, systems, and processes for context-based identification of anomalous log data. Log data with multiple original logs is received at an anomalous log data identification system. A context associated training dataset is generated by splitting a string in a log into multiple split strings, generating a context association between each split string and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch that includes I/O string data for each split string in the log by training each split string against every other split string in the log. A context-based anomalous log data identification model is then trained according to a machine learning technique using the I/O string data batch that includes a list of unique strings in the context associated training dataset. The training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the multiple original logs as anomalous.
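The dataset-generation step in the abstract appears similar to skip-gram pair generation for word embeddings, with the whole log acting as the context window. The Python sketch below illustrates that step under stated assumptions and is not Rapid7's implementation. The delimiter set, the use of each log's list index as its unique key, and the names `split_log` and `build_training_batch` are all hypothetical.

```python
import re
from itertools import permutations

# Hypothetical delimiter set: whitespace plus common path/argument separators.
# The patent text shown here does not specify how strings are split.
_SPLIT_PATTERN = re.compile(r"[\s\\/=,;:]+")


def split_log(log: str) -> list[str]:
    """Split one log string into its non-empty split strings."""
    return [token for token in _SPLIT_PATTERN.split(log) if token]


def build_training_batch(logs: list[str]):
    """Return (context, io_pairs, vocabulary) for a list of logs.

    context maps a per-log unique key (here the log's list index, a
    hypothetical choice) to that log's split strings. io_pairs pairs each
    split string with every other split string in the same log, in both
    directions. vocabulary is the sorted list of unique split strings.
    """
    context: dict[int, list[str]] = {}
    io_pairs: list[tuple[str, str]] = []
    for key, log in enumerate(logs):
        tokens = split_log(log)
        context[key] = tokens
        # Ordered pairs of tokens at distinct positions within this log only.
        io_pairs.extend(permutations(tokens, 2))
    vocabulary = sorted({t for tokens in context.values() for t in tokens})
    return context, io_pairs, vocabulary
```

For example, the single log `"svchost.exe -k netsvcs"` splits into three strings and yields six ordered I/O pairs, because each string is paired with each of the other two. Pairs never cross log boundaries, which is what ties every string's training context to the log's unique key.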

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: performing, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receiving log data comprising a plurality of logs; generating a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and training a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

2. The computer-implemented method of claim 1, further comprising: generating a dense vector for the log.

3. The computer-implemented method of claim 2, wherein generating the dense vector for the log comprises: accessing the list of unique split strings, and averaging a plurality of vectors comprising at least one vector for each unique split string in the list of unique split strings, and the dense vector indicates a mapping of each unique split string in the list of unique split strings to the dense vector being trained.

4. The computer-implemented method of claim 3, further comprising: training the context-based anomalous log data identification model with additional I/O string data generated by the context-based anomalous log data identification system for each log of the plurality of logs.

5. The computer-implemented method of claim 1, wherein the log data comprises process information associated with one or more computing systems generating the log data, and the process information comprises a plurality of process names/hashes.

6. The computer-implemented method of claim 5, wherein training the context-based anomalous log data identification model to perform cluster analysis is based at least on a number of occurrences of a process name/hash of the plurality of process names/hashes in the log.

7. A non-transitory computer readable storage medium comprising program instructions executable to: perform, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receive log data comprising a plurality of logs; generate a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and train a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

8. The non-transitory computer readable storage medium of claim 7, further comprising: generating a dense vector for the log.

9. The non-transitory computer readable storage medium of claim 8, wherein generating the dense vector for the log comprises: accessing the list of unique split strings, and averaging a plurality of vectors comprising at least one vector for each unique split string in the list of unique split strings, and the dense vector indicates a mapping of each unique split string in the list of unique split strings to the dense vector being trained.

10. The non-transitory computer readable storage medium of claim 9, further comprising: training the context-based anomalous log data identification model with additional I/O string data generated by the context-based anomalous log data identification system for each log of the plurality of logs.

11. The non-transitory computer readable storage medium of claim 7, wherein the log data comprises process information associated with one or more computing systems generating the log data, and the process information comprises a plurality of process names/hashes.

12. The non-transitory computer readable storage medium of claim 11, wherein training the context-based anomalous log data identification model to perform cluster analysis is further based at least on a number of occurrences of a process name/hash of the plurality of process names/hashes in the log.

13. A system comprising: one or more processors; and a memory coupled to the one or more processors, wherein the memory stores program instructions executable by the one or more processors to: perform, by one or more hardware processors with associated memory that implement a context-based anomalous log data identification system: receive log data comprising a plurality of logs; generate a context associated training dataset, comprising splitting a string in a log of the plurality of logs into a plurality of split strings, generating a context association between each of the plurality of split strings and a unique key that corresponds to the log, and generating an input/output (I/O) string data batch comprising I/O string data for each split string in the log by training each split string against every other split string of the plurality of split strings in the log; and train a context-based anomalous log data identification model using the I/O string data batch comprising a list of unique strings in the context associated training dataset and according to a machine learning technique, wherein the training tunes the context-based anomalous log data identification model to classify or cluster a vector associated with a new string in a new log that is not part of the plurality of logs as anomalous, training the context-based anomalous log data identification model to perform cluster analysis is based on whether an executable that is part of the process information is a good executable that is part of a bad path, and the good executable and the bad path are pre-identified based at least on a classifier prior to performing the cluster analysis.

14. The system of claim 13, fu…
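Claims 2 and 3 describe building a dense vector for a log by averaging one vector per unique split string, and claim 1 uses such vectors to classify or cluster new strings as anomalous. The Python sketch below assumes the per-string vectors already exist (for instance, from a model trained on the I/O string pairs described in the abstract) and substitutes a simple nearest-neighbour cosine-similarity check for the claimed cluster analysis. It does not model the claimed "good executable on a bad path" classifier. The names `log_vector`, `cosine`, `is_anomalous`, and the 0.8 threshold are hypothetical.

```python
import math


def log_vector(tokens, embeddings):
    """Average the vectors of a log's unique split strings (cf. claim 3).

    embeddings is assumed to map each known split string to a list of
    floats, all of the same length. Strings without a vector are skipped,
    and None is returned if none remain (both hypothetical choices).
    """
    unique = [t for t in dict.fromkeys(tokens) if t in embeddings]
    if not unique:
        return None
    dim = len(embeddings[unique[0]])
    return [sum(embeddings[t][i] for t in unique) / len(unique) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_anomalous(vector, reference_vectors, threshold=0.8):
    """Flag a log vector whose best cosine similarity to any reference
    (training) log vector is below threshold. This nearest-neighbour rule
    is a stand-in for the claimed classification/cluster analysis."""
    if vector is None:
        return True  # none of this log's strings were seen in training
    return max(cosine(vector, r) for r in reference_vectors) < threshold
```

Deduplicating with `dict.fromkeys` before averaging means a string repeated within one log, such as a flag, contributes once, matching the claim's "at least one vector for each unique split string" reading. A production system would more likely cluster the training vectors than compare against every one.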

Assignees

Rapid7 Inc

Inventors

Wainer Douglas George

Classifications

G06F18/2415
based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate · CPC title
G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06F40/151
Transformation · CPC title
G06F16/90344
by using string matching techniques · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 88196138

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11853415B1 cover?: Disclosed herein are methods, systems, and processes for context-based identification of anomalous log data. Log data with multiple original logs is received at an anomalous log data identification system. A context associated training dataset is generated by splitting a string in a log into multiple split strings, generating a context association between each split string and a unique key that…
Who is the assignee on this patent?: Rapid7 Inc

Related patents

Automatically generating malware definitions using word-level analysis

Analysis of Malware

Cookies watermarking in malware analysis

Processing log files using a database system

Data aggregation and analysis system

Anomaly detection for online endorsement event
