Neural attention mechanisms for malware analysis
US-9705904-B1 · Jul 11, 2017 · US
US2022391496A9 · US · A9
| Field | Value |
|---|---|
| Publication number | US-2022391496-A9 |
| Application number | US-202117448327-A |
| Country | US |
| Kind code | A9 |
| Filing date | Sep 21, 2021 |
| Priority date | May 20, 2019 |
| Publication date | Dec 8, 2022 |
| Grant date | — |
Abstract:
Disclosed herein are systems and methods for enabling the automatic detection of executable code from a stream of bytes. In some embodiments, the stream of bytes can be sourced from the hidden areas of files that traditional malware detection solutions ignore. In some embodiments, a machine learning model is trained to detect whether a particular stream of bytes is executable code. Other embodiments described herein disclose systems and methods for automatic feature extraction using a neural network. Given a new file, the systems and methods may preprocess the code to be inputted into a trained neural network. The neural network may be used as a “feature generator” for a malware detection model. Other embodiments herein are directed to systems and methods for identifying, flagging, and/or detecting threat actors which attempt to obtain access to library functions independently.
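The abstract says the byte stream can come from "hidden areas" of files that traditional detectors ignore, and claim 7 names the overlay of a PE file as one such portion. As a rough illustration only (an assumption of this sketch, not the patent's own method), the overlay can be located by finding where the last section's raw data ends and taking every byte after it. The parser below is deliberately simplified: it ignores the certificate table and other edge cases, and a real tool would more likely rely on a full PE parser such as the `pefile` package. `build_toy_pe` is a hypothetical helper that builds a minimal, non-loadable one-section image just to exercise the function.

```python
import struct


def overlay_bytes(pe: bytes) -> bytes:
    """Return the bytes that follow the last section's raw data in a PE file.

    Simplified sketch: ignores the certificate table and other edge cases
    that a production PE parser would handle.
    """
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    # e_lfanew at offset 0x3C gives the file offset of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (_machine, num_sections, _timestamp, _sym_ptr, _num_syms,
     opt_header_size, _characteristics) = struct.unpack_from("<HHIIIHH", pe, e_lfanew + 4)
    section_table = e_lfanew + 4 + 20 + opt_header_size
    # Start from the end of the section table so a file without sections is handled.
    end = section_table + 40 * num_sections
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of each 40-byte header.
        size_raw, ptr_raw = struct.unpack_from("<II", pe, section_table + 40 * i + 16)
        if size_raw:
            end = max(end, ptr_raw + size_raw)
    return pe[end:]


def build_toy_pe(overlay: bytes) -> bytes:
    """Hypothetical helper: a minimal one-section PE image (not loadable), for demonstration."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature starts right after the DOS header
    # i386 machine, 1 section, no optional header.
    coff = b"PE\0\0" + struct.pack("<HHIIIHH", 0x14C, 1, 0, 0, 0, 0, 0)
    # .text section: 16 raw bytes at file offset 0x80.
    section = struct.pack("<8sIIIIIIHHI", b".text", 0x10, 0x1000, 0x10, 0x80,
                          0, 0, 0, 0, 0x60000020)
    body = b"\x90" * 0x10  # section data (x86 NOPs)
    return bytes(dos) + coff + section + body + overlay


if __name__ == "__main__":
    print(overlay_bytes(build_toy_pe(b"hidden payload")))
```

The returned overlay bytes would then be the kind of "sequence of bytes from a portion of the file" that the claims feed into the executable-code classifier.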
Claims (preview):
1. (canceled)
2. A computer-implemented method for programmatically identifying executable code within a file, the method comprising: accessing, by a computer system, a sequence of bytes from a portion of the file; extracting, by the computer system from the sequence of bytes, a predetermined number of n-grams, wherein each n-gram comprises a contiguous series of bytes in the sequence of bytes, and wherein each contiguous series of bytes in each n-gram comprises n number of bytes; generating, by the computer system, an array of counters, each counter of the array associated with one of the n-grams, wherein each counter comprises an integer value, the integer value generated based on the frequency of occurrence of the associated n-gram within the sequence of bytes; providing, by the computer system, the array of counters as an input feature for a predictive machine learning model; and determining, by the predictive machine learning model, a model probability value that the sequence of bytes comprises executable code, wherein the computer system comprises a computer processor and an electronic storage medium.
3. The method of claim 2, wherein the executable code is programmatically identified without executing the sequence of bytes on the computer system.
4. The method of claim 2, further comprising flagging, by the computer system, the sequence of bytes or the file for further analysis by a malware detection system when the model probability value that the sequence of bytes comprises executable code is above a predetermined threshold.
5. The method of claim 2, wherein the file comprises an executable file format.
6. The method of claim 5, wherein the file comprises a portable executable (PE) file.
7. The method of claim 6, wherein the portion of the file comprises one or more of a resource, a string, a variable, an overlay, or a section.
8. The method of claim 2, wherein the portion of the file does not comprise executable permissions.
9. The method of claim 2, wherein the n-grams comprise bi-grams.
10. The method of claim 2, wherein n is between 2 and 500.
11. The method of claim 2, wherein the n-grams comprise: a first set of n-grams, wherein n is a first integer for the first set of n-grams; and a second set of n-grams, wherein n is a second integer for the second set of n-grams, and wherein the first integer is different from the second integer.
12. The method of claim 2, wherein the predetermined number of n-grams is 500.
13. The method of claim 2, wherein the predetermined number of n-grams is between 50 and 10,000.
14. The method of claim 2, further comprising normalizing, by the computer system, each counter by the data length of the sequence of bytes.
15. The method of claim 2, wherein the predictive machine learning model comprises a plurality of separate models, each model corresponding to a different machine architecture code.
16. The method of claim 15, wherein the machine architecture code comprises .NET, x86, and/or x64.
17. The method of claim 2, wherein the predictive machine learning model comprises at least one learning algorithm selected from the group of: support vector machines (SVM), linear regression, K-nearest neighbor (KNN) algorithm, logistic regression, naïve Bayes, linear discriminant analysis, decision trees, neural networks, or similarity learning.
18. The method of claim 2, wherein the predictive machine learning model comprises a random forest.
19. The method of claim 18, wherein the random forest comprises a plurality of decision trees, each decision tree trained independently on a training set of bytes.
20. The method of claim 19, wherein the model probability value is determined by averaging a plurality of decision tree probability values, wherein each decision tree probability value is generated by traversal of the sequence of bytes through each individual decision tree of the plurality of decision trees.
21. A computer system for programmatically identifying executable code within a file, the system comprising: one or more computer readable storage devices configured to store a plurality of computer executable instructions; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the plurality of computer executable instructions in order to cause the system to: access a sequence of bytes from a part of the file; extract, from the sequence of bytes, a predetermined number of n-grams, wherein each n-gram comprises a contiguous series of bytes in the sequence of bytes, and wherein each contiguous series of bytes in each n-gram comprises n number of bytes; generate an array of counters, each counter of the array associated with one of the n-grams, wherein each counter comprises an integer value, the integer value generated based on the frequency of occurrence of the associated n-gram within the sequence of bytes; provide the array of counters as an input feature for a predictive machine learning model; and determine, by the predictive machine learning model, a model probability value that the sequence of bytes comprises executable code.
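Claims 2, 4, 12, 14 and 20 together describe a concrete pipeline: count a predetermined set of byte n-grams, optionally normalize the counts by the data length, score the counter array with a model (for example a random forest whose per-tree probabilities are averaged), and flag the bytes when the score exceeds a threshold. The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the patentee's implementation: `select_vocabulary` is a hypothetical helper that picks the n-grams by training-set frequency (the claims only say their number is predetermined, e.g. 500), the "trees" are hypothetical callables standing in for trained decision trees, and the 0.5 threshold is arbitrary.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: a "tree" is any callable mapping a feature vector to P(executable code).
Tree = Callable[[Sequence[float]], float]


def ngrams(data: bytes, n: int) -> list[bytes]:
    """Every contiguous (overlapping) window of n bytes in `data`."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def select_vocabulary(corpus: Sequence[bytes], n: int, k: int) -> list[bytes]:
    """Hypothetical helper: keep the k most frequent n-grams seen in training data."""
    totals = Counter()
    for sample in corpus:
        totals.update(ngrams(sample, n))
    # Highest count first; ties broken by byte value so the order is deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def counter_array(data: bytes, vocab: Sequence[bytes], n: int,
                  normalize: bool = False) -> list[float]:
    """One counter per vocabulary n-gram: how often it occurs in `data`.

    With normalize=True each count is divided by the byte length of `data`.
    """
    counts = Counter(ngrams(data, n))
    vector = [float(counts[gram]) for gram in vocab]  # missing n-grams count as 0
    if normalize and data:
        vector = [c / len(data) for c in vector]
    return vector


def forest_probability(features: Sequence[float], trees: Sequence[Tree]) -> float:
    """Average the per-tree probabilities into one model probability value."""
    return sum(tree(features) for tree in trees) / len(trees)


def flag_for_analysis(probability: float, threshold: float = 0.5) -> bool:
    """Flag the bytes for further analysis when the probability exceeds the threshold."""
    return probability > threshold


if __name__ == "__main__":
    # 55 8B EC is the classic x86 prologue (push ebp; mov ebp, esp), repeated twice.
    sample = b"\x55\x8b\xec\x55\x8b\xec"
    vocab = select_vocabulary([sample], n=2, k=2)
    features = counter_array(sample, vocab, n=2)
    # Two toy stand-in trees (hypothetical, not trained models).
    toy_trees = [lambda f: 1.0 if f[0] > 0 else 0.0, lambda f: 0.5]
    p = forest_probability(features, toy_trees)
    print(vocab, features, p, flag_for_analysis(p))
```

In this toy run both vocabulary bigrams occur twice, the two stand-in trees return 1.0 and 0.5, and their average of 0.75 clears the 0.5 threshold. Arrays built for two different values of n (as in claim 11) could simply be concatenated into one feature vector, though the claims do not say how the sets are combined.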
CPC titles:
- by virus signature recognition
- Test or assess software
- Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
- by adding security routines or objects to programs
- Assessing vulnerabilities and evaluating computer system security