Automatic threat detection of executable files based on static data analysis
US-10599844-B2 · Mar 24, 2020 · US
US11409869B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11409869-B2 |
| Application number | US-202016791649-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 14, 2020 |
| Priority date | May 12, 2015 |
| Publication date | Aug 9, 2022 |
| Grant date | Aug 9, 2022 |
Official abstract text for this publication.
Aspects of the present disclosure relate to threat detection of executable files. A plurality of static data points may be extracted from an executable file without decrypting or unpacking the executable file. The executable file may then be analyzed without decrypting or unpacking the executable file. Analysis of the executable file may comprise applying a classifier to the plurality of extracted static data points. The classifier may be trained from data comprising known malicious executable files, known benign executable files and known unwanted executable files. Based upon analysis of the executable file, a determination can be made as to whether the executable file is harmful.
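The extraction step described in the abstract can be illustrated with a minimal sketch that reads data points directly from a file's raw bytes, without executing, decrypting, or unpacking it. The abstract does not enumerate the static data points, so the ones used here (file size, byte entropy, an `MZ` header check, and a small list of strings) and every name in the code are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical strings of interest; the abstract does not list any.
PREDEFINED_STRINGS = (b"CreateRemoteThread", b"VirtualAlloc", b"UPX0")


def shannon_entropy(data: bytes) -> float:
    """Return the byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_static_data_points(data: bytes) -> dict:
    """Collect data points from the raw bytes only; nothing is run or unpacked."""
    points = {
        "file_size": len(data),
        "entropy": shannon_entropy(data),
        "has_mz_header": data[:2] == b"MZ",  # assumed check for a DOS/PE-style header
    }
    for needle in PREDEFINED_STRINGS:
        # One boolean data point per predefined string found in the bytes.
        points["str_" + needle.decode("ascii")] = needle in data
    return points


# Synthetic stand-in for the contents of an executable file.
sample = b"MZ" + bytes(62) + b"VirtualAlloc" + bytes(range(256))
points = extract_static_data_points(sample)
```

In practice the bytes would come from something like `open(path, "rb").read()`. The resulting dictionary is the kind of input a classifier could consume after conversion to a feature vector.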
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: identifying, by a knowledge module, static data points that may be indicative of either a harmful or benign executable file; associating, by the knowledge module, the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting, by the knowledge module, a plurality of static data points from the identified executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.
2. The computer-implemented method according to claim 1, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.
3. The computer-implemented method according to claim 1, wherein the one or more support vector machines builds the probabilistic model by performing data analysis and pattern recognition on the one or more feature vectors.
4. The computer-implemented method according to claim 1, wherein the probabilistic model indicates whether the executable file is harmful.
5. The computer-implemented method according to claim 1, wherein the executable file is identified in response to a detected condition.
6. The computer-implemented method according to claim 5, wherein the detected condition is user request for a file download.
7. The computer-implemented method according to claim 5, wherein the detected condition is the detection of a new file attempting to execute.
8. The computer-implemented method according to claim 1, wherein the plurality of static data points represent predefined character strings in the executable file.
9. The computer-implemented method according to claim 1, wherein a determination of whether the executable file is harmful is used to retrain the classifier.
10. A system comprising: at least one memory; and at least one processor operatively connected with the memory and configured to perform operation of: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the identified executable file; and generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.
11. The system according to claim 10, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.
12. The system according to claim 10, wherein the one or more support vector machines builds the probabilistic model by performing data analysis and pattern recognition on the one or more feature vectors.
13. The system according to claim 10, wherein the probabilistic model indicates whether the executable file is harmful.
14. The system according to claim 10, wherein the plurality of static data points represent predefined character strings in the executable file.
15. A computer-readable storage device containing instructions, that when executed on at least one processor, causing the processor to execute a process comprising: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the identified executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and providing the generated feature vector to one or more support vector machines to build a probabilistic model that indicates whether the executable file fits into one of the categories of files.
16. The computer-readable storage device according to claim 15, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.
17. The computer-readable storage device according to claim 15, wherein the plurality of static data points represent predefined character strings in the executable file.
18. A computer-implemented method comprising: identifying static data points that may be indicative of either a harmful or benign executable file; associating the identified static data points with one of a plurality of categories of files, the plurality of categories of files including harmful files and benign files; identifying an executable file to be evaluated; extracting a plurality of static data points from the executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the static data points based on training data, the training data comprising files known to fit into one of the plurality of categories of files, wherein one or more features of the feature vector are selectively turned on or off based at least in part on evaluation of whether a value of one of the plurality of static data points is within a predetermined range; and evaluating the feature vector using a machine learning model to determine whether the executable file fits into one of the categories of files.
19. The computer-implemented method according to claim 18, wherein the plurality of static data points are extracted without decrypting or unpacking the executable file.
20. The computer-implemented method according to claim 18, wherein the machine learning model comprises an artificial neural network.
21. The computer-implemented method according to claim 18, wherein the machine learning model comprises a support vector machine.
22. The computer-implemented method according to claim 18, wherein the machin
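The independent claims recite two steps that a short sketch may make easier to picture: features that are switched on or off depending on whether a static data point falls within a predetermined range, and a probabilistic output derived from a support vector machine. The following is an illustration under stated assumptions, not the patented implementation. The feature names, ranges, weights, and sigmoid parameters are all hypothetical placeholders, and a hand-set linear decision function with a Platt-style sigmoid stands in for a trained SVM.

```python
import math

# Hypothetical (low, high) ranges; a feature is "on" only when the
# corresponding static data point falls inside its range.
FEATURE_RANGES = {
    "entropy": (7.0, 8.0),
    "file_size": (1, 100_000),
    "import_count": (0, 5),
}


def build_feature_vector(points: dict) -> list:
    """Turn each feature on (1.0) or off (0.0) based on its predetermined range."""
    vector = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = points.get(name)
        is_on = value is not None and low <= value <= high
        vector.append(1.0 if is_on else 0.0)
    return vector


# Placeholder parameters: a real system would learn these from labeled
# training files rather than setting them by hand.
WEIGHTS = [1.5, 0.25, 1.0]
BIAS = -1.5
PLATT_A, PLATT_B = -2.0, 0.0


def probability_harmful(vector: list) -> float:
    """Map a linear decision score to a probability with a Platt-style sigmoid."""
    score = sum(w * x for w, x in zip(WEIGHTS, vector)) + BIAS
    return 1.0 / (1.0 + math.exp(PLATT_A * score + PLATT_B))


suspicious = build_feature_vector({"entropy": 7.6, "file_size": 50_000, "import_count": 2})
ordinary = build_feature_vector({"entropy": 4.2, "file_size": 2_000_000, "import_count": 40})
p_suspicious = probability_harmful(suspicious)
p_ordinary = probability_harmful(ordinary)
```

With these placeholder values, the first input turns every feature on and scores above 0.5, while the second turns every feature off and scores below 0.5. The binary gating is one possible reading of "selectively turned on or off"; the claims do not specify how the range test is encoded in the vector.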
Static detection · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
by checking file integrity · CPC title
Decompilation; Disassembly · CPC title
Machine learning · CPC title