Systems and methods for detecting malware
US-10133865-B1 · Nov 20, 2018 · US
US10754948B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10754948-B2 |
| Application number | US-201715490797-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 18, 2017 |
| Priority date | Apr 18, 2017 |
| Publication date | Aug 25, 2020 |
| Grant date | Aug 25, 2020 |
Official abstract text for this publication.
Under one aspect, a method is provided for protecting a device from a malicious file. The method can be implemented by one or more data processors forming part of at least one computing device and can include extracting from the file, by at least one data processor, sequential data comprising discrete tokens. The method also can include generating, by at least one data processor, n-grams of the discrete tokens. The method also can include generating, by at least one data processor, a vector of weights based on respective frequencies of the n-grams. The method also can include determining, by at least one data processor and based on a statistical analysis of the vector of weights, that the file is likely to be malicious. The method also can include initiating, by at least one data processor and responsive to determining that the file is likely to be malicious, a corrective action.
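The first steps of the abstract (discrete tokens, then n-grams, then a vector of weights based on n-gram frequencies) can be illustrated with a minimal sketch. The token values, the fixed `vocabulary`, and the function names below are hypothetical and chosen only for illustration; the patent does not prescribe this code, and relative frequency is just one possible frequency-based weight.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return every contiguous n-gram of a token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_vector(tokens, vocabulary, n=2):
    """Bag-of-n-grams weights: relative frequency of each vocabulary n-gram."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than n: no n-grams, so every weight is zero.
        return [0.0] * len(vocabulary)
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical opcode-like tokens extracted from a file (illustrative only).
tokens = ["push", "mov", "call", "push", "mov", "ret"]

# Hypothetical fixed vocabulary of bigrams that defines the vector layout.
vocabulary = [("push", "mov"), ("mov", "call"), ("call", "push"), ("mov", "ret")]

vector = frequency_vector(tokens, vocabulary)
print(vector)
```

Fixing the vocabulary up front keeps every file's vector the same length, which a downstream statistical model would need; how the vocabulary is actually chosen is not stated in the abstract.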
Opening claim text (preview).
What is claimed:
1. A method for protecting a device from a malicious file, the method being implemented by one or more data processors forming part of at least one computing device and comprising: identifying one or more suitable sections associated with the file for analysis based on characteristic codes therein, the file being a portable executable (PE) file, the suitable sections comprising at least one of: an entry point function of the PE file or a Nullsoft scriptable install system (NSIS) associated with the PE file; extracting from the identified one or more suitable sections of the file, by at least one data processor, sequential data comprising discrete tokens, the discrete tokens being Nullsoft scriptable install system (NSIS) opcodes; generating, by at least one data processor, n-grams of the discrete tokens; generating, by at least one data processor using a bag of words algorithm, a vector of weights based on respective frequencies of the n-grams; determining, by at least one data processor and based on a statistical analysis of the vector of weights, that the file is likely to be malicious; and initiating, by at least one data processor and responsive to determining that the file is likely to be malicious, a corrective action.
2. The method of claim 1, wherein the extracted sequential data comprises operation code.
3. The method of claim 2, wherein the discrete tokens respectively comprise syllables of machine language instructions within the operation code.
4. The method of claim 1, wherein generating the vector of weights comprises: determining, by at least one data processor, a term frequency of each of the n-grams among the other n-grams.
5. The method of claim 4, wherein generating the vector of weights further comprises: determining, by at least one data processor, an inverse document frequency of each of the n-grams within a corpus.
6. The method of claim 5, wherein generating the vector of weights further comprises: generating, by at least one data processor, a dot product of the term frequency and the inverse document frequency for each of the n-grams.
7. The method of claim 1, wherein the statistical analysis of the vector of weights comprises performing a logistic regression on the vector of weights.
8. The method of claim 1, wherein the statistical analysis of the vector of weights comprises inputting the vector of weights to a machine learning model.
9. The method of claim 8, wherein the machine learning model is selected from the group consisting of generalized linear models, ordinary least squares, ridge regression, lasso, multi-task lasso, elastic net, multi-task elastic net, least angle regression, LARS lasso, orthogonal matching pursuit, Bayesian regression, naive Bayesian, logistic regression, stochastic gradient descent, neural networks, Perceptron, passive aggressive algorithms, robustness regression, Huber regression, polynomial regression, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, stochastic gradient descent, nearest neighbor, Gaussian processes, cross-decomposition, decision trees, random forest, and ensemble methods.
10. The method of claim 1, wherein n is at least two.
11. The method of claim 1, wherein the corrective action is selected from the group consisting of quarantining the file, stopping execution of the file, notifying the user that the file likely is malicious, flagging the file, storing the file, generating a hash of the file, transmitting the file or a hash of the file, and reverting to an earlier version of the file or device software.
12. A system for protecting a device from a malicious file, the system comprising: a data processor; and memory storing instructions which, when executed by the data processor, result in operations comprising: identifying one or more suitable sections associated with the file for analysis based on characteristic codes therein, the file being a portable executable (PE) file, the suitable sections comprising at least one of: an entry point function of the PE file or a Nullsoft scriptable install system (NSIS) associated with the PE file; extracting from the one or more suitable sections of the file sequential data comprising discrete tokens; generating n-grams of the discrete tokens, the discrete tokens being JAVASCRIPT tokens; generating, using a bag of words algorithm, a vector of weights based on respective frequencies of the n-grams; based on a statistical analysis of the vector of weights, determining that the file is likely to be malicious; and initiating, responsive to determining that the file is likely to be malicious, a corrective action.
13. The system of claim 12, wherein the extracted sequential data comprises operation code.
14. The system of claim 13, wherein the discrete tokens respectively comprise syllables of machine language instructions within the operation code.
15. The system of claim 12, wherein generating the vector of weights comprises: determining a term frequency of each of the n-grams among the other n-grams.
16. The system of claim 15, wherein generating the vector of weights further comprises: determining an inverse document frequency of each of the n-grams within a corpus.
17. The system of claim 16, wherein generating the vector of weights further comprises: generating a dot product of the term frequency and the inverse document frequency for each of the n-grams.
18. The system of claim 12, wherein the statistical analysis of the vector of weights comprises performing a logistic regression on the vector of weights.
19. The system of claim 12, wherein the statistical analysis of the vector of weights comprises inputting the vector of weights to a machine learning model.
20. The system of claim 19, wherein the machine learning model is selected from the group consisting of generalized linear models, ordinary least squares, ridge regression, lasso, multi-task lasso, elastic net, multi-task elastic net, least angle regression, LARS lasso, orthogonal matching pursuit, Bayesian regression, naive Bayesian, logistic regression, stochastic gradient descent, neural networks, Perceptron, passive aggressive algorithms, robustness regression, Huber regression, polynomial regression, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, stochastic gradient descent, nearest neighbor, Gaussian processes, cross-decomposition, decision trees, random forest, and ensemble methods.
21. The system of claim 12, wherein n is at least two.
22. The system of claim 12, wherein the corrective action is selected from the group consisting of quarantining the file, stopping execution of the file, notifying the user that the file likely is malicious, flagging the file, storing the file, generating a hash of the file, transmitting the file or a hash of the file, and reverting to an earlier version of the file or device software.
23. A non-transitory computer program product storing instructions which, when executed by a data processor forming part of a computing device, result in operations comprising: identifying one or more suitable sections associated with a file for analysis based on characteristic codes therein, the file being a portable executable (PE) file, the suitable sections comprising at least an entry point function of the PE file or a Nullsoft scriptable install system (NSIS) associated with the PE file; extracting from the o
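Claims 4 to 7 (and 15 to 18) describe weighting each n-gram by term frequency and inverse document frequency and then applying logistic regression to the resulting vector. The sketch below is one possible reading under stated assumptions: the "dot product" for each n-gram is interpreted as the per-n-gram product TF × IDF, the smoothed IDF formula is an assumed variant, and the toy corpus, coefficients, bias, and 0.5 threshold are all hypothetical values rather than anything disclosed in the claims.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Contiguous 2-grams of a token sequence."""
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def term_frequency(doc_grams):
    """Share of each n-gram among all n-grams of one document."""
    counts = Counter(doc_grams)
    return {gram: count / len(doc_grams) for gram, count in counts.items()}


def inverse_document_frequency(corpus_grams):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(corpus_grams)
    doc_freq = Counter(gram for doc in corpus_grams for gram in set(doc))
    return {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(doc_grams, idf_table, vocabulary):
    """Per-n-gram product of TF and IDF, ordered by the vocabulary."""
    tf_table = term_frequency(doc_grams)
    return [tf_table.get(g, 0.0) * idf_table.get(g, 0.0) for g in vocabulary]


def malicious_probability(vector, coefficients, bias):
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(coefficients, vector))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy corpus of token sequences (illustrative only).
corpus_tokens = [
    ["push", "mov", "call"],
    ["push", "mov", "ret"],
    ["xor", "xor", "jmp"],
]
corpus_grams = [bigrams(tokens) for tokens in corpus_tokens]
idf_table = inverse_document_frequency(corpus_grams)
vocabulary = sorted(idf_table)

# Hypothetical coefficients standing in for a trained model.
coefficients = [2.0 if "xor" in gram else -1.0 for gram in vocabulary]
bias = -0.5

p_first = malicious_probability(
    tfidf_vector(corpus_grams[0], idf_table, vocabulary), coefficients, bias)
p_third = malicious_probability(
    tfidf_vector(corpus_grams[2], idf_table, vocabulary), coefficients, bias)

# Hypothetical threshold; a real system would then trigger a corrective action.
for name, p in (("first", p_first), ("third", p_third)):
    print(name, "likely malicious" if p > 0.5 else "likely benign")
```

In practice the coefficients would come from fitting a model on labeled files; they are hard-coded here only so the example runs on its own.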
Machine learning · CPC title
by checking file integrity · CPC title
by source code analysis · CPC title
Test or assess a computer or a system · CPC title