What technology area does this patent fall under?

Primary CPC classification G06F21/54. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 6, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for executable code detection, automatic feature extraction and position independent code detection

US2024184884A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2024184884-A1
Application number	US-202318487657-A
Country	US
Kind code	A1
Filing date	Oct 16, 2023
Priority date	May 20, 2019
Publication date	Jun 6, 2024
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems and methods for enabling the automatic detection of executable code from a stream of bytes. In some embodiments, the stream of bytes can be sourced from the hidden areas of files that traditional malware detection solutions ignore. In some embodiments, a machine learning model is trained to detect whether a particular stream of bytes is executable code. Other embodiments described herein disclose systems and methods for automatic feature extraction using a neural network. Given a new file, the systems and methods may preprocess the code to be inputted into a trained neural network. The neural network may be used as a “feature generator” for a malware detection model. Other embodiments herein are directed to systems and methods for identifying, flagging, and/or detecting threat actors which attempt to obtain access to library functions independently.

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for programmatically identifying executable code within a file, the method comprising: accessing a sequence of bytes from a portion of the file; extracting, from the sequence of bytes, a number of n-grams, wherein each n-gram comprises a contiguous series of bytes in the sequence of bytes, and wherein the contiguous series of bytes of each respective n-gram comprises n number of bytes; generating, an array of counters, each counter of the array associated with one of the n-grams, wherein each counter comprises an integer value based on a frequency of occurrence of the associated n-gram within the sequence of bytes; and applying a predictive model to the array of counters to determine a probability that the sequence of bytes comprises executable code.
2. The computer-implemented method of claim 1, wherein the executable code is programmatically identified without executing the sequence of bytes on a computer system.
3. The computer-implemented method of claim 1, further comprising flagging the sequence of bytes of the file for further analysis by a malware detection system when the probability that the sequence of bytes comprises executable code is above a predetermined threshold.
4. The computer-implemented method of claim 1, wherein the file comprises an executable file format.
5. The computer-implemented method of claim 4, wherein the file comprises a portable executable (PE) file.
6. The computer-implemented method of claim 5, wherein the portion of the file comprises one or more of a resource, a string, a variable, an overlay, or a section.
7. The computer-implemented method of claim 1, wherein the portion of the file does not comprise executable permissions.
8. The computer-implemented method of claim 1, wherein the n-grams comprise bi-grams.
9. The computer-implemented method of claim 1, wherein n is between 2 and 500.
10. The computer-implemented method of claim 1, wherein the number of n-grams corresponds to every n-gram present in the sequence of bytes.
11. The computer-implemented method of claim 1, wherein the n number of bytes in the contiguous series of bytes of each respective n-gram is selected based on the number of n-grams.
12. The computer-implemented method of claim 1, wherein the predetermined number of n-grams is between 50 and 10,000.
13. The computer-implemented method of claim 1, further comprising normalizing each counter by the data length of the sequence of bytes.
14. The computer-implemented method of claim 1, wherein the predictive model comprises a plurality of models, each model of the plurality of models corresponding to a different machine architecture code.
15. The computer-implemented method of claim 14, wherein the machine architecture code comprises .NET, x86, and/or x64.
16. The computer-implemented method of claim 1, wherein the predictive model comprises at least one learning algorithm selected from the group of: support vector machines (SVM), linear regression, K-nearest neighbor (KNN) algorithm, logistic regression, naïve Bayes, linear discriminant analysis, decision trees, neural networks, or similarity learning.
17. The computer-implemented method of claim 1, wherein the model comprises a random forest.
18. The computer-implemented method of claim 17, wherein the random forest comprises a plurality of decision trees, each decision tree trained independently on a training set of bytes.
19. A non-transitory computer readable medium containing program instructions for causing a computer to perform the method of: accessing a sequence of bytes from a portion of the file; extracting, from the sequence of bytes, a number of n-grams, wherein each n-gram comprises a contiguous series of bytes in the sequence of bytes, and wherein the contiguous series of bytes of each respective n-gram comprises n number of bytes; generating, an array of counters, each counter of the array associated with one of the n-grams, wherein each counter comprises an integer value based on a frequency of occurrence of the associated n-gram within the sequence of bytes; and applying a predictive model to the array of counters to determine a probability that the sequence of bytes comprises executable code.
20. A computer system for programmatically identifying executable code within a file, the system comprising: one or more computer readable storage devices configured to store a plurality of computer executable instructions; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the plurality of computer executable instructions in order to cause the system to: access a sequence of bytes from a part of the file; extract, from the sequence of bytes, a number of n-grams, wherein each n-gram comprises a contiguous series of bytes in the sequence of bytes, and wherein the contiguous series of bytes of each respective n-gram comprises n number of bytes; generate an array of counters, each counter of the array associated with one of the n-grams, wherein each counter comprises an integer value based on a frequency of occurrence of the associated n-gram within the sequence of bytes; and apply a predictive model to the array of counters to determine a probability that the sequence of bytes comprises executable code.
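For readers who find the claim language dense, the steps of claim 1 (with the optional normalization of claim 13 and the threshold flag of claim 3) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function names, the fixed n-gram vocabulary, and the `toy_model` weights are all hypothetical and chosen only for illustration. A real system would use a trained model such as the random forest mentioned in claim 17.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous n-byte window (n-gram) in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def counter_array(data: bytes, vocabulary: Sequence[bytes],
                  normalize: bool = False) -> List[float]:
    """Return one counter per vocabulary n-gram, in vocabulary order.

    `vocabulary` is a hypothetical fixed list of equal-length n-grams.
    With normalize=True each counter is divided by the data length,
    in the spirit of claim 13.
    """
    n = len(vocabulary[0])
    counts = ngram_counts(data, n)
    array = [counts[gram] for gram in vocabulary]  # missing n-grams count as 0
    if normalize and data:
        array = [c / len(data) for c in array]
    return array


def toy_model(array: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained predictive model.

    A logistic function over made-up weights; it only shows the shape
    of the interface (counter array in, probability out).
    """
    weights = [1.0, 1.0, -1.0]  # assumed values, not learned from data
    bias = -2.0
    z = sum(w * x for w, x in zip(weights, array)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def flag_for_analysis(array: Sequence[float],
                      model: Callable[[Sequence[float]], float],
                      threshold: float = 0.5) -> bool:
    """Flag the byte sequence when the model's probability exceeds the threshold."""
    return model(array) > threshold


# Example: bytes resembling a repeated x86 prologue (push ebp; mov ebp, esp).
sample = b"\x55\x8b\xec\x55\x8b\xec\x90"
vocab = [b"\x55\x8b", b"\x8b\xec", b"\x00\x00"]
features = counter_array(sample, vocab)
flagged = flag_for_analysis(features, toy_model)
```

Note that the bytes are only counted, never executed, which matches the static character of claim 2. Claim 14's per-architecture models could be approximated by running several such model callables over the same counter array, though how their outputs are combined is not specified in the claim text shown here.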

Assignees

Sentinel Labs Israel Ltd

Inventors

Classifications

G06F21/54 (Primary)
by adding security routines or objects to programs · CPC title
G06F21/566
Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title
G06F2221/033
Test or assess software · CPC title
G06F21/564 (Primary)
by virus signature recognition · CPC title
G06F21/577
Assessing vulnerabilities and evaluating computer system security · CPC title

Patent family

Related publications grouped by family.

View patent family 72241774

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024184884A1 cover?: Disclosed herein are systems and methods for enabling the automatic detection of executable code from a stream of bytes. In some embodiments, the stream of bytes can be sourced from the hidden areas of files that traditional malware detection solutions ignore. In some embodiments, a machine learning model is trained to detect whether a particular stream of bytes is executable code. Other embodi…
Who is the assignee on this patent?: Sentinel Labs Israel Ltd