Wavelet decomposition of software entropy to identify malware
US-2016292418-A1 · Oct 6, 2016 · US
US10691799B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10691799-B2 |
| Application number | US-201615566687-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 15, 2016 |
| Priority date | Apr 16, 2015 |
| Publication date | Jun 23, 2020 |
| Grant date | Jun 23, 2020 |
Official abstract text for this publication.
Using a recurrent neural network (RNN) that has been trained to a satisfactory level of performance, highly discriminative features can be extracted by running a sample through the RNN, and then extracting a final hidden state h_i, where i is the number of instructions of the sample. This resulting feature vector may then be concatenated with the other hand-engineered features, and a larger classifier may then be trained on hand-engineered as well as automatically determined features. Related apparatus, systems, techniques and articles are also described.
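The feature-extraction idea in the abstract can be illustrated with a minimal sketch. It assumes a plain Elman-style recurrence with tanh activation, random untrained weights, and a zero-padded 4-byte encoding per instruction (the claims mention fixed lengths of 4 or 8 bytes). The function names, the hidden size, and the two hand-engineered feature values are hypothetical and are not taken from the patent.

```python
import math
import random


def encode_instruction(raw, width=4):
    """Zero-pad or truncate raw instruction bytes to a fixed width, scaled to [0, 1].

    The padding scheme is an assumption made for this sketch.
    """
    padded = raw[:width].ljust(width, b"\x00")
    return [b / 255.0 for b in padded]


def elman_final_hidden_state(trace, w_in, w_rec, bias):
    """Run a simple Elman recurrence over the trace and return the last hidden state."""
    hidden = [0.0] * len(bias)
    for x in trace:  # one input vector per instruction
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[j], x))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def build_features(trace_bytes, hand_features, w_in, w_rec, bias):
    """Concatenate the final hidden state with hand-engineered features."""
    trace = [encode_instruction(ins) for ins in trace_bytes]
    return elman_final_hidden_state(trace, w_in, w_rec, bias) + list(hand_features)


# Demo with untrained random weights (hypothetical sizes).
rng = random.Random(0)
HIDDEN, WIDTH = 3, 4
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(WIDTH)] for _ in range(HIDDEN)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
bias = [0.0] * HIDDEN

# Three x86-64 instructions: push rbp; mov rbp, rsp; ret
trace_bytes = [b"\x55", b"\x48\x89\xe5", b"\xc3"]
features = build_features(trace_bytes, [0.25, 7.0], w_in, w_rec, bias)
print(len(features))
```

The combined vector would then be passed to a separate classifier, which this sketch does not implement. In practice the RNN weights would be trained first, for example with backpropagation through time.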
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving or accessing executable code comprising instructions; disassembling the executable code to generate a trace of the instructions; applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector; generating a concatenation of the feature vector with hand-engineered features extracted from the executable code; determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and disallowing, based on the determining, the code from executing; wherein the classifier is different from the RNN.

2. The method of claim 1, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

3. The method of claim 1, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

4. The method of claim 1, wherein an input to the RNN is set to a fixed length of 4 or 8 bytes per instruction.

5. The method of claim 1, wherein an instruction set of the executable code comprises an x86 instruction set.

6. The method of claim 1, wherein the RNN is at least one of an Elman network, a long short-term memory network, a clockwork RNN, or an echo-state network.

7. The method of claim 1, wherein applying the recurrent neural network further comprises applying backpropagation through time (BPTT).

8. The method of claim 1, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.

9. A system comprising: one or more data processors having memory storing instructions, which when executed result in operations comprising: receiving or accessing executable code comprising instructions; disassembling the executable code to generate a trace of the instructions; applying a recurrent neural network (RNN) to the trace to generate a hidden state corresponding to each instruction to form a feature vector; generating a concatenation of the feature vector with hand-engineered features extracted from the executable code; determining, using a classifier and the concatenation, a likelihood that the executable code comprises malicious code; and disallowing, based on the determining, the code from executing; wherein the classifier is different from the RNN.

10. The system of claim 9, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

11. The system of claim 9, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

12. The system of claim 9, wherein an input to the RNN is set to a fixed length of 4 or 8 bytes per instruction.

13. The system of claim 9, wherein an instruction set of the executable code comprises an x86 instruction set.

14. The system of claim 9, wherein the RNN is at least one of an Elman network, a long short-term memory network, a clockwork RNN, or an echo-state network.

15. The system of claim 9, wherein applying the recurrent neural network further comprises applying backpropagation through time (BPTT).

16. The system of claim 9, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.

17. A non-transitory computer readable storage medium storing one or more programs configured to be executed by one or more data processors, the one or more programs comprising instructions, the instructions comprising: receiving executable code; disassembling the executable code; generating a hidden state for each of a plurality of instructions by applying a recurrent neural network (RNN) to the disassembled executable code to generate a feature vector; and determining, using a classifier, a likelihood that the executable code comprises malicious code based on the feature vector; wherein the classifier is different from the RNN.

18. The non-transitory computer readable storage medium of claim 17, wherein the applying further comprises: dividing the trace into a plurality of regions; determining an entropy of each of the plurality of regions; and ignoring each region with a low entropy.

19. The non-transitory computer readable storage medium of claim 17, wherein the disassembling further comprises: determining an entry point of the executable code; and generating a time-based trace of the instructions based on the entry point.

20. The non-transitory computer readable storage medium of claim 17, wherein applying the recurrent neural network further comprises deobfuscating or decompressing the trace.
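The region-filtering step recited in the entropy claims (dividing the trace into regions, determining an entropy for each, and ignoring low-entropy regions) might look roughly like the sketch below. The claims do not say how entropy is computed, how large a region is, or what counts as "low". Shannon entropy over instruction mnemonics, a region size of 4, and a threshold of 1.0 bit are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(region):
    """Shannon entropy, in bits per symbol, of a sequence of hashable symbols."""
    if not region:
        return 0.0
    counts = Counter(region)
    total = len(region)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_regions(trace, region_size):
    """Divide the trace into consecutive fixed-size regions (the last may be shorter)."""
    return [trace[i:i + region_size] for i in range(0, len(trace), region_size)]


def drop_low_entropy_regions(trace, region_size=4, threshold=1.0):
    """Keep only instructions from regions whose entropy meets the threshold.

    region_size and threshold are assumed values, not taken from the patent.
    """
    kept = []
    for region in split_regions(trace, region_size):
        if shannon_entropy(region) >= threshold:
            kept.extend(region)
    return kept


# A padding run of identical instructions followed by a varied region.
trace = ["nop", "nop", "nop", "nop", "push", "mov", "call", "ret"]
filtered = drop_low_entropy_regions(trace)
print(filtered)
```

Here the run of identical `nop` instructions has zero entropy and is dropped, while the region of four distinct instructions has an entropy of 2 bits and is kept for the RNN.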
Recurrent networks, e.g. Hopfield networks · CPC title
Learning methods · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
by virus signature recognition · CPC title