Computational modeling and classification of data streams
US-2018197089-A1 · Jul 12, 2018 · US
US10530795B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10530795-B2 |
| Application number | US-201715650399-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2017 |
| Priority date | Mar 17, 2017 |
| Publication date | Jan 7, 2020 |
| Grant date | Jan 7, 2020 |
Abstract
Aspects of the present disclosure describe systems and methods for rapidly detecting threats or other security breaches in enterprise networks. In particular, all enterprise network communications may be monitored to detect anomalous events. In one example, each event log in a collection of event logs may be evaluated, wherein an event log having one or more features is monitored and identified as being anomalous based on identifying one or more anomalous features therein. Anomalous features are identified as being anomalous based on the existence of one or more features in the event log that deviate from characteristic contextual features. Rules or models may thereafter be applied to each event log containing the anomalous feature.
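The two-stage flow in the abstract (flag event logs whose features deviate from their usual context, then apply rules only to those logs) can be sketched in Python. This is a minimal illustration, assuming each event log is a list of `key=value` feature strings and that a feature's context is every other feature in the same log. It swaps in plain co-occurrence frequencies for the learned model the claims recite, and the names `fit_cooccurrence`, `contextual_score`, `triage`, the `RULES` list, and the 0.3 threshold are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def fit_cooccurrence(logs):
    """Count each feature's occurrences and each ordered feature pair's co-occurrences."""
    single, pair = Counter(), Counter()
    for log in logs:
        feats = set(log)
        single.update(feats)
        pair.update(permutations(feats, 2))
    return single, pair


def contextual_score(feat, log, single, pair):
    """Mean over context features c of the empirical P(feat present | c present)."""
    ctx = [c for c in set(log) if c != feat]
    if not ctx:
        return 1.0  # nothing to compare against; treat as normal
    return sum(pair[(feat, c)] / single[c] if single[c] else 0.0 for c in ctx) / len(ctx)


# Hypothetical rules (assumption): applied only to logs that already contain an anomalous feature.
RULES = [
    ("netcat on server", lambda log: "proc=nc" in log),
    ("ssh from external host", lambda log: "src=external" in log and "port=22" in log),
]


def triage(baseline, incoming, threshold=0.3):
    """Stage 1: find anomalous features per log. Stage 2: run rules on flagged logs only."""
    single, pair = fit_cooccurrence(baseline)
    alerts = []
    for i, log in enumerate(incoming):
        anomalous = [f for f in log if contextual_score(f, log, single, pair) < threshold]
        if not anomalous:
            continue  # normal log: rules are never evaluated
        for name, rule in RULES:
            if rule(log):
                alerts.append((i, name, anomalous))
    return alerts


baseline = [["user=alice", "host=web01", "proc=nginx"],
            ["user=bob", "host=db01", "proc=postgres"]] * 5
incoming = [["user=alice", "host=web01", "proc=nginx"],   # usual context
            ["user=alice", "host=web01", "proc=nc"],      # unseen process
            ["user=bob", "host=web01", "proc=postgres"]]  # unusual host for bob
alerts = triage(baseline, incoming)
print(alerts)  # → [(1, 'netcat on server', ['proc=nc'])]
```

Only the second incoming log raises an alert. The third log also contains an anomalous feature (`host=web01` alongside `user=bob`), but no rule matches it, so it remains an anomaly without being escalated as a threat.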
The invention claimed is:

1. A method for training and applying a model to detect and classify anomalies in event logs, the method comprising: building a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generating a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generating a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; training the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; applying the trained model to a second collection of event logs to generate a classification score for each feature within each event log, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; and based on the classification score of a feature within a particular event log being outside a predetermined threshold: identifying the particular event log having the feature as containing an anomaly; and classifying the feature as being anomalous.

2. The method of claim 1, wherein the vector representation of each feature stored in the V×N-dimension matrix is represented in a column corresponding to a position of the feature in the event log.

3. The method of claim 1, wherein training the model is performed using a Continuous Bag of Words language model.

4. The method of claim 1, wherein training the model is performed using a Skip-Gram language model.

5. The method of claim 1, wherein training and applying the model are performed on a server.

6. The method of claim 1, wherein training is performed on a first server and applying the model is performed on a second server different from the first server.

7. The method of claim 1, further comprising: applying one or more rules to each identified event log having the feature as containing an anomaly; and based on identifying a threat from applying the one or more rules, generating an alert.

8. A system for training and applying a model to detect and classify anomalies in event logs, the system comprising: a computing device including a processor, a memory communicatively coupled to the processor, and a content output device, the memory storing instructions executable by the processor to: build a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generate a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generate a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; train the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; apply the trained model to a second collection of event logs to generate a classification score for each feature within each event log of the second collection, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; based on the classification score of a feature within a particular event log being outside a predetermined threshold: identify the particular event log having the feature as containing an anomaly; and classify the feature as being anomalous; verify each identified event log containing an anomaly; and based on an identification of a threat from application of the one or more rules, generate an alert.

9. The system of claim 8, wherein verifying each identified event log containing an anomaly comprises applying at least one of a rule and a model to each identified event log.

10. The system of claim 9, wherein applying the one or more rules is performed on a second server separate from the first server.

11. The system of claim 8, wherein the vector representation of each feature stored in the V×N-dimension matrix is represented in a column corresponding to a position of the feature in the event log.

12. The system of claim 8, wherein training the model is performed using a Continuous Bag of Words language model.

13. The system of claim 8, wherein training the model is performed using a Skip-Gram language model.

14. A method for training and applying a model to detect and classify anomalies in event logs, the method comprising: building a vocabulary of one or more unique features across a collection of event logs; for each event log of a plurality of event logs included in the collection of event logs: generating a matrix of feature-context pairs for each unique feature, the matrix of feature-context pairs comprising a matrix of the features represented as vectors; and generating a unique vector representation of each feature for each feature-context pair by: initializing each feature to an N-dimensional vector; and generating a V×N-dimension matrix that stores the vector representation of each feature, wherein the unique vector representation of each feature is a vector of size V; training the model using the vector representation of each feature to identify a contextual likelihood of each possible feature-context pair; applying the trained model to a second collection of event logs to generate a classification score for each feature within each event log, the classification score representing a contextual likelihood of the feature appearing within the context included in that event log; and based on the classification score of a feature within a particular event log being outside a predetermined threshold: identifying the particular event log having the feature as containing an anomaly; and classifying the feature as being anomalous; verifying each identified event log having the feature as containing an anomaly; and based on identifying a threat from applying the one or more rules, generating an alert.
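The steps of claims 1 and 3 (build a vocabulary, form feature-context pairs, hold one vector per feature in a V×N matrix, train a Continuous Bag of Words model, then threshold each feature's contextual likelihood in a second collection) can be illustrated with a tiny pure-Python CBOW model. This is a sketch under stated assumptions, not the patented implementation: a feature's context is assumed to be all other features in the same log, the model uses a full softmax (feasible only for small vocabularies), `w_in` follows the usual word2vec layout with one N-dimensional row per feature (which may differ from how the claim words the V×N matrix), and "outside a predetermined threshold" is read as "below 0.05". All class and function names are hypothetical.

```python
import math
import random


def build_vocab(logs):
    """Assign an integer index to every unique feature across the collection."""
    vocab = {}
    for log in logs:
        for feat in log:
            vocab.setdefault(feat, len(vocab))
    return vocab


def feature_context_pairs(log):
    """Pair each feature with the other features of the same log (assumed context window)."""
    return [(f, log[:i] + log[i + 1:]) for i, f in enumerate(log)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CBOW:
    """Tiny full-softmax CBOW model; w_in is the V x N matrix of feature vectors."""

    def __init__(self, vocab, n_dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab, self.n_dim = vocab, n_dim
        # Initialize each feature to a small random N-dimensional vector.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_dim)] for _ in vocab]
        self.w_out = [[0.0] * n_dim for _ in vocab]

    def _hidden(self, ids):
        # Average the input vectors of the context features.
        return [sum(self.w_in[c][k] for c in ids) / len(ids) for k in range(self.n_dim)]

    def _probs(self, h):
        return softmax([sum(w[k] * h[k] for k in range(self.n_dim)) for w in self.w_out])

    def train(self, logs, epochs=200, lr=0.1):
        """Plain SGD on the cross-entropy of predicting each feature from its context."""
        for _ in range(epochs):
            for log in logs:
                for feat, ctx in feature_context_pairs(log):
                    if not ctx:
                        continue
                    ids = [self.vocab[c] for c in ctx]
                    h = self._hidden(ids)
                    p = self._probs(h)
                    grad_h = [0.0] * self.n_dim
                    for j, w in enumerate(self.w_out):
                        err = p[j] - (1.0 if j == self.vocab[feat] else 0.0)
                        for k in range(self.n_dim):
                            grad_h[k] += err * w[k]  # uses w[k] before its update
                            w[k] -= lr * err * h[k]
                    for c in ids:
                        for k in range(self.n_dim):
                            self.w_in[c][k] -= lr * grad_h[k] / len(ids)

    def score(self, feat, ctx):
        """Contextual likelihood P(feat | ctx); unseen features score 0.0."""
        ids = [self.vocab[c] for c in ctx if c in self.vocab]
        if feat not in self.vocab or not ids:
            return 0.0
        return self._probs(self._hidden(ids))[self.vocab[feat]]


def detect(model, logs, threshold=0.05):
    """Return (log index, feature, score) for every feature scoring below threshold."""
    hits = []
    for i, log in enumerate(logs):
        for feat, ctx in feature_context_pairs(log):
            s = model.score(feat, ctx)
            if s < threshold:
                hits.append((i, feat, s))
    return hits


patterns = [["user=alice", "host=web01", "proc=nginx"],
            ["user=bob", "host=db01", "proc=postgres"]]
train_logs = patterns * 5
model = CBOW(build_vocab(train_logs))
model.train(train_logs)

second_collection = [["user=alice", "host=web01", "proc=nginx"],
                     ["user=alice", "host=web01", "proc=nc"]]
hits = detect(model, second_collection)
anomalous_logs = sorted({i for i, _, _ in hits})
print(anomalous_logs)  # indices of event logs identified as containing an anomaly
```

A Skip-Gram variant (claim 4) would invert the prediction, scoring each context feature from the center feature's vector instead of scoring the feature from the averaged context vectors.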
CPC classifications: Probabilistic graphical models, e.g. probabilistic networks; Machine learning; Traffic logging, e.g. anomaly detection