Automatic threat detection of executable files based on static data analysis

US11409869B2 · US · B2

Patent metadata
Publication number: US-11409869-B2
Application number: US-202016791649-A
Country: US
Kind code: B2
Filing date: Feb 14, 2020
Priority date: May 12, 2015
Publication date: Aug 9, 2022
Grant date: Aug 9, 2022

Abstract

Official abstract text for this publication.

Aspects of the present disclosure relate to threat detection of executable files. A plurality of static data points may be extracted from an executable file without decrypting or unpacking the executable file. The executable file may then be analyzed without decrypting or unpacking the executable file. Analysis of the executable file may comprise applying a classifier to the plurality of extracted static data points. The classifier may be trained from data comprising known malicious executable files, known benign executable files and known unwanted executable files. Based upon analysis of the executable file, a determination can be made as to whether the executable file is harmful.
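The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes three kinds of static data points (file size, byte entropy, and counts of predefined character strings) and a hypothetical `MARKER_STRINGS` list; the abstract does not enumerate the data points, so these choices are illustrative only. The file bytes are read as-is and are never executed, decrypted, or unpacked.

```python
import math
from collections import Counter

# Hypothetical marker strings; the patent text shown here does not list any.
MARKER_STRINGS = (b"UPX", b"LoadLibrary", b"http://")


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_static_data_points(data: bytes) -> dict:
    """Derive properties from the raw bytes only; nothing is run or unpacked."""
    points = {
        "size": len(data),
        "entropy": shannon_entropy(data),
        "has_mz_header": data[:2] == b"MZ",  # DOS/PE magic bytes
    }
    for marker in MARKER_STRINGS:
        points["count_" + marker.decode("ascii")] = data.count(marker)
    return points


# Toy input standing in for the bytes of an executable file.
sample = b"MZ" + b"\x00" * 62 + b"LoadLibrary" + b"UPX"
points = extract_static_data_points(sample)
print(points)
```

High byte entropy is often associated with packed or encrypted content, which is one reason a data point like this can be informative without unpacking; whether the patented system uses it is not stated on this page.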

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying, by a knowledge module, static data points that may be indicative of either a harmful or benign executable file; associating, by the knowledge module, the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting, by the knowledge module, a plurality of static data points from the identified executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.

2. The computer-implemented method according to claim 1, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.

3. The computer-implemented method according to claim 1, wherein the one or more support vector machines builds the probabilistic model by performing data analysis and pattern recognition on the one or more feature vectors.

4. The computer-implemented method according to claim 1, wherein the probabilistic model indicates whether the executable file is harmful.

5. The computer-implemented method according to claim 1, wherein the executable file is identified in response to a detected condition.

6. The computer-implemented method according to claim 5, wherein the detected condition is user request for a file download.

7. The computer-implemented method according to claim 5, wherein the detected condition is the detection of a new file attempting to execute.

8. The computer-implemented method according to claim 1, wherein the plurality of static data points represent predefined character strings in the executable file.

9. The computer-implemented method according to claim 1, wherein a determination of whether the executable file is harmful is used to retrain the classifier.

10. A system comprising: at least one memory; and at least one processor operatively connected with the memory and configured to perform operation of: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the identified executable file; and generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.

11. The system according to claim 10, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.

12. The system according to claim 10, wherein the one or more support vector machines builds the probabilistic model by performing data analysis and pattern recognition on the one or more feature vectors.

13. The system according to claim 10, wherein the probabilistic model indicates whether the executable file is harmful.

14. The system according to claim 10, wherein the plurality of static data points represent predefined character strings in the executable file.

15. A computer-readable storage device containing instructions, that when executed on at least one processor, causing the processor to execute a process comprising: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the identified executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.

16. The computer-readable storage device according to claim 15, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.

17. The computer-readable storage device according to claim 15, wherein the plurality of static data points represent predefined character strings in the executable file.

18. A computer-implemented method comprising: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and evaluating the feature vector using a machine learning model to determine whether the executable file fits into one of the categories of files.

19. The computer-implemented method according to claim 18, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.

20. The computer-implemented method according to claim 18, wherein the machine learning model comprises an artificial neural network.

21. The computer-implemented method according to claim 18, wherein the machine learning model comprises a support vector machine.

22. The computer-implemented method according to claim 18, wherein the machin
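The independent claims recite two steps that a small example may help make concrete: features that are switched on or off depending on whether a static data point lies in a predetermined range, and a support vector machine that yields a probabilistic indication of the file's category. The sketch below is one possible reading, not the patented implementation. The ranges in `RANGES`, the toy training data, the subgradient-descent trainer, and the fixed sigmoid that turns the SVM margin into a probability are all assumptions made for illustration.

```python
import math

# Hypothetical predetermined ranges: a feature is "on" (1.0) only when the
# static data point lies in the closed interval [low, high].
RANGES = {"entropy": (7.0, 8.0), "size": (0, 100_000), "count_UPX": (1, 10**9)}


def build_feature_vector(points: dict) -> list:
    """Turn each feature on or off by a range check; missing points stay off."""
    return [
        1.0 if lo <= points.get(name, float("nan")) <= hi else 0.0
        for name, (lo, hi) in RANGES.items()
    ]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (regularisation), then correct margin violations.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def harmful_probability(w, b, x) -> float:
    """Map the signed margin to (0, 1) with a fixed sigmoid.

    This simplifies Platt scaling, which would fit the sigmoid's parameters.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))


# Toy labelled data (assumed): +1 = harmful category, -1 = benign category.
training = [
    ({"entropy": 7.8, "size": 40_000, "count_UPX": 2}, +1),
    ({"entropy": 7.5, "size": 90_000, "count_UPX": 1}, +1),
    ({"entropy": 4.2, "size": 500_000, "count_UPX": 0}, -1),
    ({"entropy": 5.1, "size": 2_000_000, "count_UPX": 0}, -1),
]
X = [build_feature_vector(p) for p, _ in training]
y = [label for _, label in training]
w, b = train_linear_svm(X, y)

p_harmful = harmful_probability(w, b, build_feature_vector(training[0][0]))
p_benign = harmful_probability(w, b, build_feature_vector(training[2][0]))
print(p_harmful, p_benign)
```

Gating each feature on a range keeps the vector binary, which makes a linear decision boundary easy to interpret; a production system would more likely use a tested SVM library with calibrated probabilities than the hand-written trainer shown here.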

Assignees

Webroot Inc

Inventors

Classifications

  • Static detection · CPC title

  • G06N20/10 · Primary

    using kernel methods, e.g. support vector machines [SVM] · CPC title

  • G06F21/565 · Primary

    by checking file integrity · CPC title

  • Decompilation; Disassembly · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11409869B2 cover?
It covers threat detection of executable files. Static data points are extracted from an executable file without decrypting or unpacking it, and a classifier trained on known malicious, benign, and unwanted executable files is applied to those data points to determine whether the file is harmful.
Who is the assignee on this patent?
Webroot Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 9, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).