Word embeddings for anomaly classification from event logs

US10530795B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10530795-B2
Application number  US-201715650399-A
Country             US
Kind code           B2
Filing date         Jul 14, 2017
Priority date       Mar 17, 2017
Publication date    Jan 7, 2020
Grant date          Jan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the present disclosure describe systems and methods for rapidly detecting threats or other security breaches in enterprise networks. In particular, all enterprise network communications may be monitored to detect anomalous events. In one example, each event log in a collection of event logs may be evaluated, wherein an event log having one or more features is monitored and identified as being anomalous based on identifying one or more anomalous features therein. Anomalous features are identified as being anomalous based on the existence of one or more features in the event log that deviate from characteristic contextual features. Rules or models may thereafter be applied to each event log containing the anomalous feature.
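To make the idea of a feature that "deviates from characteristic contextual features" concrete, here is a minimal Python sketch. It is not the method of the patent: it swaps the trained embedding model for plain co-occurrence counts, and the log fields, the helper names (features_of, fit_counts, score, anomalous_features) and the 0.2 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

def features_of(log):
    """Turn one event log (a dict) into 'field=value' feature tokens."""
    return [f"{key}={value}" for key, value in sorted(log.items())]

def fit_counts(logs):
    """Count each feature and each ordered pair of co-occurring features."""
    single, pair = Counter(), Counter()
    for log in logs:
        feats = features_of(log)
        single.update(feats)
        pair.update(permutations(feats, 2))
    return single, pair

def score(feature, context, single, pair):
    """Average estimate of P(feature | context feature); 0.0 if unseen."""
    probs = [pair[(c, feature)] / single[c] for c in context if single[c]]
    return sum(probs) / len(probs) if probs else 0.0

def anomalous_features(log, single, pair, threshold=0.2):
    """Return the features whose contextual score falls below the threshold."""
    feats = features_of(log)
    flagged = []
    for feat in feats:
        context = [c for c in feats if c != feat]
        if score(feat, context, single, pair) < threshold:
            flagged.append(feat)
    return flagged

# Hypothetical training logs: alice runs nginx on web01, bob runs postgres on db01.
TRAIN = [
    {"user": "alice", "host": "web01", "proc": "nginx"},
    {"user": "alice", "host": "web01", "proc": "nginx"},
    {"user": "bob", "host": "db01", "proc": "postgres"},
    {"user": "bob", "host": "db01", "proc": "postgres"},
]
single, pair = fit_counts(TRAIN)

# A log whose process has never been seen with this user or host.
suspect = {"user": "alice", "host": "web01", "proc": "postgres"}
flagged = anomalous_features(suspect, single, pair)
print(flagged)
```

In this toy data the process feature is flagged because it never co-occurred with the user or host in the same log, while the user and host features still fit each other. A log flagged this way would then be passed on to the rules or models mentioned in the abstract.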

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for training and applying a model to detect and classify anomalies in event logs, the method comprising: building a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generating a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generating a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; training the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; applying the trained model to a second collection of event logs to generate a classification score for each feature within each event log, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; and based on the classification score of a feature within a particular event log being outside a predetermined threshold: identifying the particular event log having the feature as containing an anomaly; and classifying the feature as being anomalous.

2. The method of claim 1, wherein the vector representation of each feature stored in the V×N-dimension matrix is represented in a column corresponding to a position of the feature in the event log.

3. The method of claim 1, wherein training the model is performed using a Continuous Bag of Words language model.

4. The method of claim 1, wherein training the model is performed using a Skip-Gram language model.

5. The method of claim 1, wherein training and applying the model are performed on a server.

6. The method of claim 1, wherein training is performed on a first server and applying the model is performed on a second server different from the first server.

7. The method of claim 1, further comprising: applying one or more rules to each identified event log having the feature as containing an anomaly; and based on identifying a threat from applying the one or more rules, generating an alert.

8. A system for training and applying a model to detect and classify anomalies in event logs, the system comprising: a computing device including a processor, a memory communicatively coupled to the processor, and a content output device, the memory storing instructions executable by the processor to: build a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generate a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generate a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; train the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; apply the trained model to a second collection of event logs to generate a classification score for each feature within each event log of the second collection, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; based on the classification score of a feature within a particular event log being outside a predetermined threshold: identify the particular event log having the feature as containing an anomaly; and classify the feature as being anomalous; verify each identified event log containing an anomaly; and based on an identification of a threat from application of the one or more rules, generate an alert.

9. The system of claim 8, wherein verifying each identified event log containing an anomaly comprises applying at least one of a rule and a model to each identified event log.

10. The system of claim 9, wherein applying the one or more rules is performed on a second server separate from the first server.

11. The system of claim 8, wherein the vector representation of each feature stored in the V×N-dimension matrix is represented in a column corresponding to a position of the feature in the event log.

12. The system of claim 8, wherein training the model is performed using a Continuous Bag of Words language model.

13. The system of claim 8, wherein training the model is performed using a Skip-Gram language model.

14. A method for training and applying a model to detect and classify anomalies in event logs, the method comprising: building a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generating a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generating a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; training the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; applying the trained model to a second collection of event logs to generate a classification score for each feature within each event log, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; and based on the classification score of a feature within a particular event log being outside a predetermined threshold: identifying the particular event log having the feature as containing an anomaly; and classifying the feature as being anomalous; verifying each identified event log having the feature as containing an anomaly; and based on identifying a threat from applying the one or more rules, generating an alert.
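The steps recited in claim 1 (vocabulary, feature-context pairs, a V×N matrix of feature vectors, training, scoring against a threshold) can be sketched in Python as follows. This is a toy reading of the claim language, not the patented implementation: it assumes each event log is already a list of feature tokens, uses a small Continuous Bag of Words style model with a full softmax (claim 3 names CBOW; claim 4 names Skip-Gram as an alternative), and every name and hyperparameter here (TinyCBOW, n_dims=8, epochs, lr, the 0.2 threshold) is hypothetical.

```python
import math
import random

def build_vocab(logs):
    """Assign an index to every unique feature across the collection."""
    vocab = {}
    for log in logs:
        for feat in log:
            vocab.setdefault(feat, len(vocab))
    return vocab

def feature_context_pairs(log):
    """Pair each feature with the remaining features of the same log."""
    return [(f, log[:i] + log[i + 1:]) for i, f in enumerate(log)]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class TinyCBOW:
    """Hypothetical toy CBOW model; names and hyperparameters are assumptions."""

    def __init__(self, vocab, n_dims=8, seed=0):
        rng = random.Random(seed)
        def rand():
            return [[rng.uniform(-0.5, 0.5) for _ in range(n_dims)] for _ in vocab]
        self.vocab = vocab
        self.w_in = rand()   # V x N matrix: one N-dimensional vector per feature
        self.w_out = rand()  # V x N output weights used to score candidates

    def _hidden(self, context):
        rows = [self.w_in[self.vocab[c]] for c in context]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def _probs(self, hidden):
        return softmax([sum(w * h for w, h in zip(row, hidden)) for row in self.w_out])

    def score(self, feature, context):
        """Contextual likelihood of `feature` given the other features."""
        return self._probs(self._hidden(context))[self.vocab[feature]]

    def train(self, logs, epochs=300, lr=0.2):
        for _ in range(epochs):
            for log in logs:
                for feat, ctx in feature_context_pairs(log):
                    self._step(self.vocab[feat], ctx, lr)

    def _step(self, target, context, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        hidden = self._hidden(context)
        probs = self._probs(hidden)
        grad_hidden = [0.0] * len(hidden)
        for j, row in enumerate(self.w_out):
            err = probs[j] - (1.0 if j == target else 0.0)
            for d, h in enumerate(hidden):
                grad_hidden[d] += err * row[d]  # uses the pre-update weight
                row[d] -= lr * err * h
        for c in context:
            row = self.w_in[self.vocab[c]]
            for d, g in enumerate(grad_hidden):
                row[d] -= lr * g / len(context)

# Hypothetical first collection of event logs, used for training.
TRAIN = [
    ["user=alice", "host=web01", "proc=nginx"],
    ["user=bob", "host=db01", "proc=postgres"],
] * 5

model = TinyCBOW(build_vocab(TRAIN))
model.train(TRAIN)

# Score two candidate features against the same context.
context = ["user=alice", "host=web01"]
normal_score = model.score("proc=nginx", context)
odd_score = model.score("proc=postgres", context)
THRESHOLD = 0.2  # assumed cutoff; the claims only say "predetermined threshold"
is_anomalous = odd_score < THRESHOLD
print(f"{normal_score:.3f} {odd_score:.3f} {is_anomalous}")
```

The sketch scores each feature by the softmax probability the model assigns to it given the rest of the log, which is one plausible interpretation of "contextual likelihood". Features that were never seen during training are not handled here and would need an explicit policy, for example treating them as anomalous by default.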

Assignees

Target Brands Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Machine learning

  • Traffic logging, e.g. anomaly detection

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10530795B2 cover?
Aspects of the present disclosure describe systems and methods for rapidly detecting threats or other security breaches in enterprise networks. In particular, all enterprise network communications may be monitored to detect anomalous events. In one example, each event log in a collection of event logs may be evaluated, wherein an event log having one or more features is monitored and identified…
Who is the assignee on this patent?
Target Brands Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jan 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).