Methods and apparatus for analyzing sequences of application programming interface traffic to identify potential malicious actions
US-2019114417-A1 · Apr 18, 2019 · US
US11301563B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11301563-B2 |
| Application number | US-201916351718-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 13, 2019 |
| Priority date | Mar 13, 2019 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
Official abstract text for this publication.
Mechanisms are provided for detecting abnormal system call sequences in a monitored computing environment. The mechanisms receive, from a computing system resource of the monitored computing environment, a system call of an observed system call sequence for evaluation. A trained recurrent neural network (RNN), trained to predict system call sequences, processes the system call to generate a prediction of a subsequent system call in a predicted system call sequence. Abnormal call sequence logic compares the subsequent system call in the predicted system call sequence to an observed system call in the observed system call sequence and identifies a difference between the predicted system call sequence and the observed system call sequence based on results of the comparing. The abnormal call sequence logic generates an alert notification in response to identifying the difference.
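The predict, compare, and alert flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the trained RNN is replaced here by a plainly simpler stand-in (a most-frequent-successor lookup table) so the example runs with the standard library alone. The function names, the sample system call names, and the alert message format are all hypothetical.

```python
from collections import Counter, defaultdict


def build_stand_in_predictor(training_sequences):
    """Map each call to its most frequent successor.

    Assumption: this lookup table stands in for the trained RNN,
    which the source describes but this sketch does not implement.
    """
    successors = defaultdict(Counter)
    for sequence in training_sequences:
        for current, following in zip(sequence, sequence[1:]):
            successors[current][following] += 1
    return {call: counts.most_common(1)[0][0]
            for call, counts in successors.items()}


def find_differences(predictor, observed_sequence):
    """Return (position, predicted, observed) for every mismatch."""
    differences = []
    for position in range(1, len(observed_sequence)):
        # Predict the next call from the previous observed call.
        # The prediction is None when the previous call was never seen.
        predicted = predictor.get(observed_sequence[position - 1])
        observed = observed_sequence[position]
        if predicted != observed:
            differences.append((position, predicted, observed))
    return differences


def make_alerts(differences):
    """Turn each identified difference into a (hypothetical) alert message."""
    return [f"ALERT: position {p}: expected {pred!r}, observed {obs!r}"
            for p, pred, obs in differences]


# Hypothetical training data representing normal behavior.
normal_sequences = [
    ["open", "read", "close"],
    ["open", "read", "close"],
    ["open", "write", "close"],
]
predictor = build_stand_in_predictor(normal_sequences)

differences = find_differences(predictor, ["open", "read", "exec", "close"])
alerts = make_alerts(differences)
for alert in alerts:
    print(alert)
```

In this sketch every difference produces an alert. Claims 6 and 7 describe stricter conditions (an anomaly count threshold over a time period, and a probability threshold on the prediction) that could be layered on top of `find_differences`.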
Opening claim text (preview).
What is claimed is:
1. A method for detecting abnormal system call sequences in a monitored computing environment, the method comprising: receiving, from a computing system resource of the monitored computing environment, a system call of an observed system call sequence for evaluation; processing, by a trained recurrent neural network (RNN) trained to predict system call sequences, the system call to generate a prediction of a subsequent system call in a predicted system call sequence; comparing, by abnormal call sequence logic, the subsequent system call in the predicted system call sequence to an observed system call in the observed system call sequence; identifying, by the abnormal call sequence logic, a difference between the predicted system call sequence and the observed system call sequence based on results of the comparing; and generating, by the abnormal call sequence logic, an alert notification in response to identifying the difference, wherein processing the system call comprises converting the system call into a vector representation of the system call by performing a first embedding operation on a system call feature of the system call and a separate second embedding operation on one or more argument features of the system call to generate a system call feature embedding comprising machine learned embedding values and one or more argument feature embeddings comprising machine learned embedding values.
2. The method of claim 1, wherein processing the system call further comprises inputting the vector representation of the system call into a long short term memory (LSTM) cell such that the RNN generates, for each system call feature of a plurality of system call features, and each argument feature of a plurality of argument features, probabilities that the corresponding system call feature or the corresponding argument feature is part of a subsequent system call in the predicted system call sequence.
3. The method of claim 2, wherein the prediction of the subsequent system call is generated at least by: generating a plurality of combinations of system call features and argument features from the plurality of system call features and plurality of argument features and, for each combination in the plurality of combinations, combining probabilities of each system call feature and each argument feature of the combination to generate a probability for the combination; and selecting a combination from the plurality of combinations to represent the predicted subsequent system call based on the combined probabilities for the combinations in the plurality of combinations.
4. The method of claim 1, wherein converting the system call into the vector representation of the system call comprises: converting the system call into a tokenized representation of the system call by mapping a system call feature of the system call to a first token and one or more argument features of the system call to one or more second tokens based on a system call feature mapping data structure and an argument feature mapping data structure.
5. The method of claim 4, wherein processing the system call comprises: converting the tokenized representation of the system call to a vector representation of the system call by using the first token to index into a system call feature embedding matrix data structure and retrieving a system call feature embedding corresponding to the first token, and using the at least one or more second tokens to index into an argument feature embedding matrix data structure and retrieving corresponding argument feature embeddings corresponding to the one or more second tokens; and concatenating the system call feature embedding and the one or more argument feature embeddings to generate the vector representation of the system call.
6. The method of claim 1, wherein identifying, by the abnormal call sequence logic, a difference between the predicted system call sequence and the observed system call sequence based on results of the comparing further comprises: identifying the difference as an anomaly; maintaining, over a predetermined period of time, a count of a number of anomalies identified during the predetermined period of time; comparing the count of the number of anomalies to a threshold number of anomalies; and determining that the alert notification is to be generated in response to the number of anomalies being equal to or greater than the threshold number of anomalies.
7. The method of claim 1, wherein identifying, by the abnormal call sequence logic, a difference between the predicted system call sequence and the observed system call sequence based on results of the comparing further comprises: comparing a probability of the predicted system call sequence to a threshold probability value; and in response to the probability of the predicted system call sequence being equal to or greater than the threshold probability value, and the existence of the difference between the predicted system call sequence and the observed system call sequence, determining that the alert notification is to be generated.
8. The method of claim 1, further comprising: automatically performing a responsive action in response to identifying the difference between the predicted system call sequence and the observed system call sequence, wherein the responsive action comprises at least one of quarantining a process that submitted the observed system call sequence, blocking or filtering future system calls from the process that submitted the observed system call sequence, collecting data about the process that submitted the observed system call sequence, or terminating the process that submitted the observed system call sequence.
9. The method of claim 1, further comprising: initializing a system call feature embedding data structure to an initial state; initializing an argument call feature embedding data structure to an initial state; and training the RNN based on a training dataset comprising a plurality of system call sequences, wherein the training of the RNN comprises iteratively modifying embedding values in at least one of the system call feature embedding data structure or the argument call feature embedding data structure to generate trained embedding values in the system call feature embedding data structure and the argument call feature embedding data structure.
10. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to specifically configure the data processing system to: receive, from a computing system resource of the monitored computing environment, a system call of an observed system call sequence for evaluation; process, by a trained recurrent neural network (RNN) of the data processing system, trained to predict system call sequences, the system call to generate a prediction of a subsequent system call in a predicted system call sequence; compare, by abnormal call sequence logic of the data processing system, the subsequent system calls in the predicted system call sequence to an observed system call in the observed system call sequence; identify, by the abnormal call sequence logic, a difference between the predicted system call sequence and the observed system call sequence based on results of the comparing; and generate, by the abnormal call sequence logic, an alert notification in response to identifying the difference, wherein the computer readable program further configures the data processing system to process the system call at least by converting the system call into a vector representation o
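The tokenize, look up, and concatenate steps of claims 4 and 5, and the combination selection of claim 3, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the mapping tables, embedding sizes, and random initial values are hypothetical (in the claims the embedding values are machine learned during training), each combination pairs one system call feature with a single argument feature for brevity, and the probabilities are combined by multiplication, which the claims do not specify.

```python
import itertools
import random

# Hypothetical mapping data structures: feature -> token (claim 4).
SYSCALL_TOKENS = {"open": 0, "read": 1, "write": 2}
ARG_TOKENS = {"/etc/passwd": 0, "/tmp/log": 1, "O_RDONLY": 2}


def init_embedding_matrix(rows, dim, seed):
    """Create an embedding matrix in an initial (random) state.

    Assumption: training would later replace these values with learned ones.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


# Separate embedding matrices for call features and argument features.
SYSCALL_EMB = init_embedding_matrix(len(SYSCALL_TOKENS), dim=4, seed=0)
ARG_EMB = init_embedding_matrix(len(ARG_TOKENS), dim=3, seed=1)


def tokenize(name, args):
    """Map the call name to a first token and each argument to a second token."""
    return SYSCALL_TOKENS[name], [ARG_TOKENS[arg] for arg in args]


def embed(name, args):
    """Index both matrices by token and concatenate the rows (claim 5)."""
    call_token, arg_tokens = tokenize(name, args)
    vector = list(SYSCALL_EMB[call_token])
    for token in arg_tokens:
        vector.extend(ARG_EMB[token])
    return vector


def select_prediction(call_probs, arg_probs):
    """Score every (call, argument) combination and pick the best (claim 3).

    Assumption: probabilities are combined by multiplication.
    """
    combinations = itertools.product(call_probs, arg_probs)
    best = max(combinations, key=lambda c: call_probs[c[0]] * arg_probs[c[1]])
    return best, call_probs[best[0]] * arg_probs[best[1]]


vector = embed("open", ["/etc/passwd", "O_RDONLY"])
print(len(vector))  # 4 call dimensions + 2 * 3 argument dimensions

# Hypothetical per-feature probabilities such as an RNN might output.
best, prob = select_prediction(
    {"open": 0.7, "read": 0.3},
    {"/etc/passwd": 0.6, "/tmp/log": 0.4},
)
print(best)
```

Keeping the call and argument embeddings in separate matrices mirrors the claim language about a first embedding operation and a separate second embedding operation. A real system would feed the concatenated vector into the LSTM cell of claim 2 rather than printing it.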
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title