Deep learning-based analysis of signals for threat detection

US12141280B2 · US · B2

Patent metadata
Publication number: US-12141280-B2
Application number: US-202016917177-A
Country: US
Kind code: B2
Filing date: Jun 30, 2020
Priority date: Jun 30, 2020
Publication date: Nov 12, 2024
Grant date: Nov 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide systems, methods, and non-transitory computer storage media for identifying malicious behavior using a trained deep learning model. At a high level, embodiments of the present disclosure utilize a trained deep learning model that takes a sequence of ordered signals as input to generate a score that indicates whether the sequence is malicious or benign. Initially, process data is collected from a client. After the data is collected, a virtual process tree is generated based on parent and child relationships associated with the process data. Subsequently, embodiments of the present disclosure aggregate signal data with the process data such that each signal is associated with a corresponding process in a chronologically ordered sequence of events. The ordered sequence of events is vectorized and fed into the trained deep learning model to generate a score indicating the level of maliciousness of the sequence of events.
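
The pipeline the abstract describes (build a process tree, attach signals to processes, order the events chronologically, vectorize) can be illustrated with a minimal Python sketch. This is not the patented implementation: the record layout, the process and signal names, the integer-ID encoding, and the padding length are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Process:
    pid: int
    parent_pid: Optional[int]
    name: str
    children: List["Process"] = field(default_factory=list)
    # (timestamp, signal name) pairs attached to this process
    signals: List[Tuple[float, str]] = field(default_factory=list)


def build_process_tree(records):
    """Link processes by parent/child relationship; return (roots, index by pid)."""
    by_pid = {pid: Process(pid, ppid, name) for pid, ppid, name in records}
    roots = []
    for proc in by_pid.values():
        parent = by_pid.get(proc.parent_pid) if proc.parent_pid is not None else None
        if parent is None:
            roots.append(proc)  # no known parent: treat as a tree root
        else:
            parent.children.append(proc)
    return roots, by_pid


def attach_signals(by_pid: Dict[int, Process], signals) -> None:
    """Associate each (timestamp, pid, signal) record with its process."""
    for ts, pid, name in signals:
        if pid in by_pid:
            by_pid[pid].signals.append((ts, name))


def ordered_events(roots):
    """Walk the tree and return (timestamp, process, signal) sorted by time."""
    events = []
    stack = list(roots)
    while stack:
        proc = stack.pop()
        events.extend((ts, proc.name, sig) for ts, sig in proc.signals)
        stack.extend(proc.children)
    return sorted(events)


def vectorize(events, vocab: Dict[str, int], max_len: int = 8) -> List[int]:
    """Map each signal to an integer ID, then truncate or zero-pad to max_len."""
    ids = [vocab.setdefault(sig, len(vocab) + 1) for _, _, sig in events]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical input: (pid, parent pid, process name)
records = [(1, None, "winword.exe"), (2, 1, "powershell.exe"), (3, 2, "reg.exe")]
# Hypothetical signals, deliberately out of order: (timestamp, pid, signal name)
signals = [
    (13.1, 3, "registry_run_key_modified"),
    (10.0, 1, "macro_executed"),
    (12.5, 2, "encoded_command"),
]

roots, by_pid = build_process_tree(records)
attach_signals(by_pid, signals)
events = ordered_events(roots)
vocab: Dict[str, int] = {}
vector = vectorize(events, vocab)
```

The resulting `vector` is the kind of fixed-length integer sequence that a model with an embedding layer could accept as input; the encoding the patent actually uses is not specified on this page.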

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving process data from a client computer system based on activity and behaviors of the client computer system; generating a process tree based on parent and child relationships associated with the process data, wherein the process data is associated with a plurality of processes; associating, in the process tree, each of the plurality of processes with a corresponding signal associated with signal data; based on the process tree comprising the parent and child relationships and a chronology of execution of the plurality processes having corresponding signals, generating a vector of a sequence of events, wherein the vector is associated with scoring a probability that the sequence of events from the vector is malicious, the vector is a representation of the process tree comprising a first process that produces a first signal at a first time and a second process that produces a second signal at a second time, the first process and the second process have a parent and child relationship; inputting the vector into a trained model associated with registry-related features of a plurality of sequences of events that indicate malicious activity, the registry-related features of the plurality of sequences of events correspond to registry-related features in training process data, training signal data, and training chronology of execution and relationship between processes data, the trained model is configured to evaluate the vector for potentially malicious activity; based on inputting the vector into the trained model, generating a score that indicates whether the sequence of events represented by the vector is malicious, wherein the score is generated using the trained model, the parent and child relationships, chronology of execution of the plurality processes represented in the in the sequence of events of the vector, and registry-related features associated with the sequence of events; and based on the score satisfying an alert threshold, causing a security risk mitigation action.

2. The method of claim 1, wherein the signal data is further comprised of at least one of raw signals or signals generated by human-generated logic based on analyzing activity performed by the client.

3. The method of claim 1, wherein the signal data is received from a signal repository comprised of filtered signals based on activity associated with the signals.

4. The method of claim 1, wherein a registry-related feature is associated with a registry modification of a registry key, the registry-related feature is identifiable in the plurality of sequences of events that indicate malicious activity and in training process data, training signal data, or training chronology of execution and relationship between processes data.

5. The method of claim 1, wherein the trained model is comprised of: an embedding layer; two convolutional neural networks; and a bidirectional long short-term memory recurrent neural network.

6. The method of claim 5, wherein the embedding layer compresses the sequence of events into low-dimensional vectors that are further processed by the trained model.

7. The method of claim 1, wherein the score indicates the probability of the sequence of events being malicious.

8. The method of claim 1, wherein the alert threshold is determined based on an indication of a degree of malicious activity or threat to detect on the client computer system.

9. The method of claim 1, wherein predicting whether the sequence of events is malicious is based on using the trained model in combination with a plurality of other models.

10. A behavior scoring computer system comprising: one or more hardware processors; and one or more computer-readable media having executable instructions embodied thereon, which, when executed by the one or more processors, cause the one or more hardware processors to execute: a signal scoring model configured to: receive process data from a client computer based on activity and behaviors of the client computer system; generate a process tree based on parent and child relationships associated with the process data, wherein the process data is associated with a plurality of processes; associate, in the process tree, each of the plurality of processes with a corresponding signal associated with signal data; based on the process tree comprising the parent and child relationships and a chronology of execution of the plurality processes having corresponding signals, generate a vector of a sequence of events, wherein the vector is associated with scoring a probability that the sequence of events from the vector is malicious, the vector is a representation of the process tree comprising a first process that produces a first signal at a first time and a second process that produces a second signal at a second time, the first process and the second process have a parent and child relationship; input the vector into a trained model associated with registry-related features of a plurality of sequences of events that indicate malicious activity, the registry-related features of the plurality of sequences of events correspond to registry-related features in training process data, training signal data, and training chronology of execution and relationship between processes data, the trained model is configured to evaluate the vector for potentially malicious activity; based on inputting the vector into the trained model, generate a score that indicates whether the sequence of events represented by the vector is malicious, wherein the score is generated using the trained model, the parent and child relationships, chronology of execution of the plurality processes represented in the in the sequence of events of the vector, and registry-related features associated with the sequence of events; and based on the score satisfying an alert threshold, causing a security risk mitigation action.

11. The system of claim 10, wherein the signal data is further comprised of at least one of raw signals or signals generated by human-generated logic based on analyzing activity performed by the client computer.

12. The system of claim 10, wherein the signal data is received from a signal repository comprised of filtered signals based on activity associated with the signals.

13. The system of claim 10, wherein a registry-related feature is associated with a registry modification of a registry key, the registry-related feature is identifiable in the plurality of sequences of events that indicate malicious activity and in training process data.

14. The system of claim 10 further comprising, wherein predicting whether the sequence of events is malicious is based on using the trained model in combination with a plurality of other models.

15. One or more computer storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform operations comprising: receiving process data from a client computer system based on activity and behaviors of the client computer system; generating a process tree based on parent and child relationships associated with the process data, wherein the process data is associated with a plurality of processes; associating, in the process tree, each of the plurality of processes with a corresponding signal associated with signal data; based on the process tree comprising the parent and child relationships and a chronology of execution of the plurality processes having corresponding signals, generating a vector of a sequence of events, wherein t
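
The scoring step in claims 1 and 7 through 9 (a probability-like score, an alert threshold, and optional combination with other models) might look roughly like the Python sketch below. The averaging rule, the `>=` comparison, the default threshold of 0.8, and the action labels are assumptions for illustration; the claims do not specify how scores are combined or what the mitigation action is.

```python
from statistics import mean
from typing import Sequence


def combine_scores(scores: Sequence[float]) -> float:
    """Combine per-model probabilities by simple averaging (assumed rule)."""
    if not scores:
        raise ValueError("at least one model score is required")
    return mean(scores)


def should_alert(score: float, threshold: float) -> bool:
    """Treat 'satisfying the alert threshold' as score >= threshold (assumption)."""
    return score >= threshold


def handle_sequence(scores: Sequence[float], threshold: float = 0.8) -> str:
    """Return a hypothetical action label for one scored sequence of events."""
    combined = combine_scores(scores)
    return "mitigate" if should_alert(combined, threshold) else "allow"


# Hypothetical scores from the trained model plus two other models.
action_high = handle_sequence([0.9, 0.85, 0.95])
action_low = handle_sequence([0.2, 0.4])
```

Lowering `threshold` trades more alerts for fewer missed detections, which matches the idea in claim 8 that the threshold reflects the degree of threat to detect.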

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Learning methods · CPC title

  • for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12141280B2 cover?
It covers systems, methods, and non-transitory computer storage media for identifying malicious behavior. A trained deep learning model takes a chronologically ordered sequence of signals, built from a client's process tree, and generates a score indicating whether the sequence is malicious or benign.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F21/56. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 12, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).