Recurrent neural networks for malware analysis

US10691799B2 · US · B2

Patent metadata
Publication number: US-10691799-B2
Application number: US-201615566687-A
Country: US
Kind code: B2
Filing date: Apr 15, 2016
Priority date: Apr 16, 2015
Publication date: Jun 23, 2020
Grant date: Jun 23, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Using a recurrent neural network (RNN) that has been trained to a satisfactory level of performance, highly discriminative features can be extracted by running a sample through the RNN, and then extracting a final hidden state h_i, where i is the number of instructions of the sample. This resulting feature vector may then be concatenated with the other hand-engineered features, and a larger classifier may then be trained on hand-engineered as well as automatically determined features. Related apparatus, systems, techniques and articles are also described.
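
The feature-extraction idea in the abstract can be illustrated with a minimal sketch. It assumes a plain Elman-style RNN with a tanh activation, random untrained weights standing in for a trained network, a made-up three-instruction trace, and two placeholder hand-engineered values. None of these names or numbers come from the patent; they only show the shape of the computation: run the trace through the RNN, keep the final hidden state, and concatenate it with the other features.

```python
import math
import random


def elman_final_hidden_state(inputs, W_xh, W_hh, b_h):
    """Run a simple Elman RNN over `inputs` and return the last hidden state."""
    hidden = [0.0] * len(b_h)
    for x in inputs:
        new_hidden = []
        for j in range(len(b_h)):
            total = b_h[j]
            total += sum(W_xh[j][k] * x[k] for k in range(len(x)))
            total += sum(W_hh[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
    return hidden


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for weights learned by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE = 8, 4  # assumed sizes, chosen for illustration
W_xh = random_matrix(HIDDEN_SIZE, INPUT_SIZE, rng)
W_hh = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng)
b_h = [0.0] * HIDDEN_SIZE

# Hypothetical trace: three instructions, each 8 bytes scaled to [0, 1].
raw_trace = [
    bytes([0x55, 0, 0, 0, 0, 0, 0, 0]),
    bytes([0x48, 0x89, 0xE5, 0, 0, 0, 0, 0]),
    bytes([0xC3, 0, 0, 0, 0, 0, 0, 0]),
]
trace = [[b / 255.0 for b in instruction] for instruction in raw_trace]

# Final hidden state h_i, where i is the number of instructions in the trace.
rnn_features = elman_final_hidden_state(trace, W_xh, W_hh, b_h)

# Placeholder hand-engineered features (hypothetical values).
hand_engineered = [3.0, 0.42]

# Concatenated vector that a separate classifier would be trained on.
combined = rnn_features + hand_engineered
print(len(combined))  # 6
```

In a real pipeline the weights would come from training, and the combined vector would be passed to a classifier that is distinct from the RNN, as the claims require.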

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving or accessing executable code comprising instructions; disassembling the executable code to generate a trace of the instructions; applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector; generating a concatenation of the feature vector with hand-engineered features extracted from the executable code; determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and disallowing, based on the determining, the code from executing; wherein the classifier is different from the RNN.

2. The method of claim 1, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

3. The method of claim 1, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

4. The method of claim 1, wherein an input to the RNN is set to a fixed length of 4 or 8 bytes per instruction.

5. The method of claim 1, wherein an instruction set of the executable code comprises an x86 instruction set.

6. The method of claim 1, wherein the RNN is at least one of an Elman network, a long short-term memory network, a clockwork RNN, or an echo-state network.

7. The method of claim 1, wherein applying the recurrent neural network further comprises applying backpropagation through time (BPTT).

8. The method of claim 1, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.

9. A system comprising: one or more data processors having memory storing instructions, which when executed result in operations comprising: receiving or accessing executable code comprising instructions; disassembling the executable code to generate a trace of the instructions; applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector; generating a concatenation of the feature vector with hand-engineered features extracted from the executable code; determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and disallowing, based on the determining, the code from executing; wherein the classifier is different from the RNN.

10. The system of claim 9, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

11. The system of claim 9, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

12. The system of claim 9, wherein an input to the RNN is set to a fixed length of 4 or 8 bytes per instruction.

13. The system of claim 9, wherein an instruction set of the executable code comprises an x86 instruction set.

14. The system of claim 9, wherein the RNN is at least one of an Elman network, a long short-term memory network, a clockwork RNN, or an echo-state network.

15. The system of claim 9, wherein applying the recurrent neural network further comprises applying backpropagation through time (BPTT).

16. The system of claim 9, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.

17. A non-transitory computer readable storage medium storing one or more programs configured to be executed by one or more data processors, the one or more programs comprising instructions, the instructions comprising: receiving executable code; disassembling the executable code; generating a hidden state for each of a plurality of instructions by applying a recurrent neural network (RNN) to the disassembled executable code to generate a feature vector; and determining, using a classifier, a likelihood that the executable code comprises malicious code based on the feature vector; wherein the classifier is different from the RNN.

18. The non-transitory computer readable storage medium of claim 17, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

19. The non-transitory computer readable storage medium of claim 17, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

20. The non-transitory computer readable storage medium of claim 17, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.
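
The dependent claims on entropy filtering and fixed-length inputs describe preprocessing steps that a short sketch can make concrete. The version below is only an illustration under stated assumptions: it uses Shannon entropy over byte values as the entropy measure, a hypothetical region size and threshold, and zero-padding or truncation to reach the fixed width. The claims do not specify any of these choices, and the function names are invented for this example.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (assumed measure)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def filter_low_entropy(trace, region_size, threshold):
    """Split the trace into regions and keep only regions at or above threshold."""
    kept = []
    for start in range(0, len(trace), region_size):
        region = trace[start:start + region_size]
        if shannon_entropy(b"".join(region)) >= threshold:
            kept.extend(region)
    return kept


def pad_instruction(raw, width=8):
    """Truncate or zero-pad one instruction to a fixed width (assumed scheme)."""
    return raw[:width].ljust(width, b"\x00")


# Hypothetical trace: a run of NOPs followed by a short function body.
trace = [b"\x90"] * 4 + [b"\x55", b"\x48\x89\xe5", b"\x31\xc0", b"\xc3"]

# Region size and threshold are illustrative values, not taken from the patent.
kept = filter_low_entropy(trace, region_size=4, threshold=1.0)
rnn_inputs = [pad_instruction(instruction, width=8) for instruction in kept]

print(len(kept))       # 4: the all-NOP region has zero entropy and is ignored
print(rnn_inputs[1])   # b'H\x89\xe5\x00\x00\x00\x00\x00'
```

Entropy is computed on the raw instruction bytes before padding so that the padding bytes do not skew the measure. Since x86 instructions can be up to 15 bytes long, a fixed width of 4 or 8 bytes implies that some instructions would be truncated under this padding scheme.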

Assignees

Cylance Inc

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Learning methods · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • G06F21/564 (Primary)

    by virus signature recognition · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10691799B2 cover?
Using a recurrent neural network (RNN) that has been trained to a satisfactory level of performance, highly discriminative features can be extracted by running a sample through the RNN, and then extracting a final hidden state h_i, where i is the number of instructions of the sample. This resulting feature vector may then be concatenated with the other hand-engineered features, and a larger class…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/564. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 23, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).