Machine learning based exploit detection
US-2018365573-A1 · Dec 20, 2018 · US
US10599844B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10599844-B2 |
| Application number | US-201514709875-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 12, 2015 |
| Priority date | May 12, 2015 |
| Publication date | Mar 24, 2020 |
| Grant date | Mar 24, 2020 |
Official abstract text for this publication.
Aspects of the present disclosure relate to threat detection of executable files. A plurality of static data points may be extracted from an executable file without decrypting or unpacking the executable file. The executable file may then be analyzed without decrypting or unpacking the executable file. Analysis of the executable file may comprise applying a classifier to the plurality of extracted static data points. The classifier may be trained from data comprising known malicious executable files, known benign executable files and known unwanted executable files. Based upon analysis of the executable file, a determination can be made as to whether the executable file is harmful.
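As a rough illustration of the first step in the abstract, the sketch below reads a few static data points directly from a file's raw bytes, without decrypting, unpacking, or running it. The specific data points (file size, byte entropy, an `MZ` header check) and the `PREDEFINED_STRINGS` list are assumptions chosen for the example; the publication text shown here does not enumerate them.

```python
import math
from collections import Counter

# Hypothetical predefined character strings; not taken from the patent.
PREDEFINED_STRINGS = [b"UPX0", b"VirtualAlloc", b"CreateRemoteThread"]


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_static_data_points(data: bytes) -> dict:
    """Collect data points from the raw bytes only; nothing is unpacked or executed."""
    points = {
        "file_size": len(data),              # numeric value
        "entropy": byte_entropy(data),       # numeric value
        "has_mz_header": data[:2] == b"MZ",  # Boolean value
    }
    # One Boolean data point per predefined string: is it present in the file?
    for s in PREDEFINED_STRINGS:
        points["str:" + s.decode("ascii")] = s in data
    return points


# Synthetic stand-in for an executable's bytes (assumption for the demo).
sample = b"MZ" + b"\x00" * 62 + b"UPX0" + b"VirtualAlloc"
points = extract_static_data_points(sample)
print(points)
```

High byte entropy is a common hint that content is packed or encrypted, which is one reason a data point like this can be useful even when the file is never unpacked.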
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: extracting a plurality of static data points from an executable file without decrypting or unpacking the executable file, wherein the plurality of static data points represent predefined character strings in the executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the plurality of static data points based on a collection of data comprising known malicious executable files, known benign executable files, and known unwanted executable files, wherein the collection of data comprises at least a portion of the plurality of static data points, and wherein one or more features of the feature vector are selectively turned on or off based on whether a value of one or more static data points from the plurality of extracted static data points is within a predetermined range; and evaluating the feature vector using support vector processing to determine whether the executable file is harmful.
2. The computer-implemented method according to claim 1, wherein the extracting the plurality of static data points further comprises classifying the extracted static data points according to a type of data extracted from the executable file, and encoding the extracted static data points for the classifier based on the classification.
3. The computer-implemented method according to claim 2, wherein generating the at least one feature vector comprises selectively setting features of the classifier based on the plurality of extracted static data points.
4. The computer-implemented method according to claim 2, wherein the classifying the at least one static data point comprises classifying the plurality of extracted static data points into categories comprising numeric values, nominal values, string or byte sequences, and Boolean values.
5. The computer-implemented method according to claim 4, wherein analyzing the executable file further comprises generating at least one feature vector from the plurality of extracted static data points, wherein features of the generated feature vector are weighted based at least upon the classified categories and the plurality of extracted static data points.
6. The computer-implemented method according to claim 1, wherein determining whether the executable file is harmful further comprises preventing execution of the executable file when a determined probability value that the executable file is harmful exceeds a threshold value.
7. The computer-implemented method according to claim 6, wherein the threshold value is set based on predetermined false positive range data.
8. The computer-implemented method according to claim 1, wherein the plurality of static data points is extracted using a machine learning technique.
9. The computer-implemented method according to claim 1, wherein a determination of whether the executable file is harmful is used to retrain the classifier.
10. A computer-readable storage device containing instructions, that when executed on at least one processor, causing the processor to execute a process comprising: extracting a plurality of static data points from an executable file without decrypting or unpacking the executable file, wherein the plurality of static data points represent predefined character strings in the executable file; generating a feature vector from the plurality of static data points using a classifier trained to classify the plurality of static data points based on a collection of data comprising known malicious executable files, known benign executable files and known unwanted executable files, wherein the collection of data comprises at least a portion of the plurality of static data points, and wherein one or more features of the feature vector are selectively turned on or off based on whether one or more values of one or more static data points from the plurality of extracted static data points is within a predetermined range; and evaluating the feature vector using support vector processing to determine whether the executable file is harmful.
11. The computer-readable storage device according to claim 10, wherein the extracting of the plurality of static data points executed by the processor further comprises classifying the extracted static data points according to a type of data extracted from the executable file, and encoding the extracted static data points for the classifier based on the classifying.
12. The computer-readable storage device according to claim 11, wherein generating the at least one feature vector comprises selectively setting features of the classifier based on the plurality of extracted static data points.
13. The computer-readable storage device according to claim 11, wherein the classifying the at least one static data point comprises classifying the plurality of extracted static data points into categories comprising numeric values, nominal values, string or byte sequences, and Boolean values.
14. The computer-readable storage device according to claim 13, wherein analyzing the executable file further comprises generating a feature vector from the plurality of extracted static data points, wherein features of the generated feature vector are weighted based at least upon the classified categories and the plurality of extracted static data points.
15. The computer-readable storage device according to claim 10, wherein determining whether the executable file is harmful further comprises preventing execution of the executable file when a determined probability value that the executable file is harmful exceeds a threshold value.
16. The computer-readable storage device according to claim 15, wherein the threshold value is set based on predetermined false positive range data.
17. A system comprising: at least one memory; and at least one processor operatively connected with the memory and configured to perform operation of: extracting a plurality of predefined character strings from an executable file without decrypting or unpacking the executable file; generating a feature vector from the plurality of predefined character strings using a classifier trained to classify the plurality of predefined character strings based on a collection of data comprising known malicious executable files, known benign executable files and known unwanted executable files, wherein the collection of data comprises at least a portion of one or more of the plurality of predefined character strings, and wherein one or more features of the feature vector are selectively turned on or off based on whether a value of one or more predefined character strings from the plurality of predefined character strings is within a predetermined range; and evaluating the feature vector using support vector processing to determine whether the executable file is harmful.
18. The system according to claim 17, wherein the determining further comprises preventing execution of the executable file when a determined probability value that the executable file is harmful exceeds a threshold value, and wherein the threshold value is set based on predetermined false positive range data.
19. The system according to claim 17, wherein the extracting the plurality of predefined character strings further comprises classifying the extracted plurality of predefined character strings according to a type of data extracted from the executable file, and encoding the extracted plurality of predefined character strings for the classifier based on the classification.
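One possible reading of the independent claims, together with the thresholding in claims 6, 15, and 18, can be sketched as follows: each feature is switched on or off depending on whether a data point falls inside a predetermined range, the resulting vector is scored with a linear support-vector-style decision function, and execution is blocked when a probability exceeds a threshold. Everything concrete here is an assumption for illustration: the feature names, the ranges in `FEATURE_RANGES`, the `WEIGHTS` and `BIAS` values (which are not trained), the sigmoid used to turn a score into a probability, and the 0.7 threshold.

```python
import math

# Hypothetical features: (data point name, low, high).
# A feature is "on" (1.0) when the value lies within [low, high], else "off" (0.0).
FEATURE_RANGES = [
    ("entropy", 7.0, 8.0),
    ("file_size", 0, 50_000),
    ("import_count", 0, 5),
]

# Illustrative weights and bias only; a real model would learn these from
# known malicious, benign, and unwanted files.
WEIGHTS = [2.0, 0.5, 1.5]
BIAS = -2.5


def build_feature_vector(points: dict) -> list:
    """Turn each feature on or off based on the predetermined range check."""
    vector = []
    for name, low, high in FEATURE_RANGES:
        value = points.get(name)
        on = value is not None and low <= value <= high
        vector.append(1.0 if on else 0.0)
    return vector


def decision_score(vector: list) -> float:
    """Linear decision function w . x + b, as used by a linear SVM."""
    return sum(w * x for w, x in zip(WEIGHTS, vector)) + BIAS


def harmful_probability(score: float) -> float:
    """Map the score to (0, 1) with a plain sigmoid (an assumed calibration)."""
    return 1.0 / (1.0 + math.exp(-score))


def should_block(points: dict, threshold: float = 0.7) -> bool:
    """Block execution when the probability exceeds the threshold."""
    score = decision_score(build_feature_vector(points))
    return harmful_probability(score) > threshold


# Made-up data points for two files.
packed = {"entropy": 7.6, "file_size": 20_000, "import_count": 2}
plain = {"entropy": 5.1, "file_size": 900_000, "import_count": 120}
print(should_block(packed), should_block(plain))
```

The claims say the threshold is set from predetermined false positive range data; in a sketch like this, that would mean choosing `threshold` from the false positive rate measured on known benign files rather than fixing it by hand.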
Static detection · CPC title
Decompilation; Disassembly · CPC title
Machine learning · CPC title
Test or assess software · CPC title
by checking file integrity · CPC title