Systems and methods for trichotomous malware classification

US10366233B1 · US · B1

Patent metadata
Publication number: US-10366233-B1
Application number: US-201615356526-A
Country: US
Kind code: B1
Filing date: Nov 18, 2016
Priority date: Nov 18, 2016
Publication date: Jul 30, 2019
Grant date: Jul 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed computer-implemented method for trichotomous malware classification may include (1) identifying a sample potentially representing malware, (2) selecting a machine learning model trained on a set of samples to distinguish between malware samples and benign samples, (3) analyzing the sample using a plurality of stochastically altered versions of the machine learning model to produce a plurality of classification results, (4) calculating a variance of the plurality of classification results, and (5) classifying the sample based at least in part on the variance of the plurality of classification results. Various other methods, systems, and computer-readable media are also disclosed.
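Steps (3) through (5) of the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the stand-in model is a single logistic unit, the stochastic alteration is a random dropout mask over its weights, and the function names, the `keep_prob` value, and the `variance_threshold` value are hypothetical.

```python
import math
import random
import statistics


def predict_with_dropout(weights, bias, features, keep_prob, rng):
    """Score one sample with a randomly masked copy of a logistic model."""
    z = bias
    for w, x in zip(weights, features):
        # Keep each weight with probability keep_prob and rescale the kept
        # weights so the expected pre-activation matches the unmasked model.
        if rng.random() < keep_prob:
            z += (w / keep_prob) * x
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_scores(weights, bias, features, n_versions=50, keep_prob=0.8, seed=0):
    """Step (3): run several stochastically altered versions of the model."""
    rng = random.Random(seed)
    return [
        predict_with_dropout(weights, bias, features, keep_prob, rng)
        for _ in range(n_versions)
    ]


def classify(scores, variance_threshold=0.01):
    """Steps (4) and (5): compute the variance, then pick one of three labels."""
    mean = statistics.fmean(scores)
    variance = statistics.pvariance(scores)
    if variance > variance_threshold:
        # The altered versions disagree too much to commit to a verdict.
        return "uncertain", mean, variance
    return ("malware" if mean >= 0.5 else "benign"), mean, variance
```

When the altered versions agree, the variance is small and the sketch falls back to an ordinary malware/benign decision on the mean score; when they disagree, the sample lands in the third, uncertain class.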

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for trichotomous malware classification, at least a portion of the method being performed by one or more computing devices comprising at least one processor, the method comprising: identifying a sample potentially representing malware; selecting a machine learning model trained on a set of samples to distinguish between malware samples and benign samples, the machine learning model including one or more independent processing units; analyzing the sample using a plurality of stochastically altered versions of the machine learning model to produce a plurality of classification results, wherein analyzing the sample includes applying the selected machine learning model through a filter that modifies the operation of the processing units of the machine learning model dynamically as the processing units are applied to the sample; calculating a variance of the plurality of classification results; adjusting the calculated variance by accessing a precision value associated with the machine learning model and adding an inverse of the precision value to the calculated variance to derive a predictive variance of the machine learning model for the sample; and trichotomously classifying the sample based at least in part on the predictive variance of the plurality of classification results.

2. The computer-implemented method of claim 1, further comprising performing a security action to protect the one or more computing devices in response to classifying the sample.

3. The computer-implemented method of claim 1, wherein classifying the sample based at least in part on the variance of the plurality of classification results comprises classifying the sample as an uncertain sample rather than as a malware sample or a benign sample based on the variance exceeding a predetermined threshold.

4. The computer-implemented method of claim 1, wherein classifying the sample based at least in part on the variance of the plurality of classification results comprises: analyzing the sample using the machine learning model to produce a probability that the sample is a malware sample; and determining that the probability that the sample is a malware sample falls within a probability window that is defined at least in part based on the variance of the plurality of classification results.

5. The computer-implemented method of claim 1, wherein the machine learning model comprises a neural network.

6. The computer-implemented method of claim 5, wherein training the neural network comprises applying dropout regularization when training the neural network.

7. The computer-implemented method of claim 5, wherein analyzing the sample using the plurality of stochastically altered versions of the machine learning model to produce the plurality of classification results comprises generating the plurality of stochastically altered versions of the machine learning model by applying, for each stochastically altered version of the machine learning model within the plurality of stochastically altered versions of the machine learning model, a dropout mask randomly generated for the stochastically altered version of the machine learning model.

8. The computer-implemented method of claim 1, wherein the machine learning model comprises a gradient tree boosting model.

9. The computer-implemented method of claim 8, wherein analyzing the sample using the plurality of stochastically altered versions of the machine learning model to produce the plurality of classification results comprises generating the plurality of stochastically altered versions of the machine learning model by, for each stochastically altered version of the machine learning model within the plurality of stochastically altered versions of the machine learning model, randomly masking a subset of features within the stochastically altered version of the machine learning model.

10. The computer-implemented method of claim 1, wherein the machine learning model comprises a random forest model.

11. The computer-implemented method of claim 10, wherein analyzing the sample using the plurality of stochastically altered versions of the machine learning model to produce the plurality of classification results comprises: normalizing each feature within the random forest model to have zero mean and to have unit variance; and for each stochastically altered version of the machine learning model within the plurality of stochastically altered versions of the machine learning model, randomly determining, for at least one split, to replace the use of a feature at the split with the use of a different feature within the machine learning model.

12. A system for trichotomous malware classification, the system comprising: an identification module, stored in a memory, that identifies a sample potentially representing malware; a selection module, stored in the memory, that selects a machine learning model trained on a set of samples to distinguish between malware samples and benign samples, the machine learning model including one or more independent processing units; an analysis module, stored in the memory, that analyzes the sample using a plurality of stochastically altered versions of the machine learning model to produce a plurality of classification results, wherein analyzing the sample includes applying the selected machine learning model through a filter that modifies the operation of the processing units of the machine learning model dynamically as the processing units are applied to the sample; a calculation module, stored in the memory, that calculates a variance of the plurality of classification results and adjusts the calculated variance by accessing a precision value associated with the machine learning model and adding an inverse of the precision value to the calculated variance to derive a predictive variance of the machine learning model for the sample; a classifying module, stored in the memory, that trichotomously classifies the sample based at least in part on the predictive variance of the plurality of classification results; and at least one physical processor configured to execute the identification module, the selection module, the analysis module, the calculation module, and the classifying module.

13. The system of claim 12, further comprising a performing module, stored in memory, that performs a security action in response to classifying the sample.

14. The system of claim 12, wherein the classifying module classifies the sample based at least in part on the variance of the plurality of classification results by classifying the sample as an uncertain sample rather than as a malware sample or a benign sample based on the variance exceeding a predetermined threshold.

15. The system of claim 12, wherein the classifying module classifies the sample based at least in part on the variance of the plurality of classification results by: analyzing the sample using the machine learning model to produce a probability that the sample is a malware sample; and determining that the probability that the sample is a malware sample falls within a probability window that is defined at least in part based on the variance of the plurality of classification results.

16. The system of claim 12, wherein the machine learning model comprises a neural network.

17. The system of claim 16, wherein the selection module further trains the neural network by applying dropout regularization when training the neural network.

18. The system of claim 16, wherein the analysis module analyzes the sa
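
The variance adjustment in claim 1 and the two decision rules in claims 3 and 4 can be sketched as follows. This is an illustrative reading, not the patented implementation. The function names are hypothetical, and the window in `classify_by_window` (centered on 0.5, with a half-width proportional to the predictive standard deviation) is an assumption: the claims only state that the window depends on the variance.

```python
import math
import statistics


def predictive_variance(scores, precision):
    """Claim 1: sample variance plus the inverse of the model's precision value."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return statistics.pvariance(scores) + 1.0 / precision


def classify_by_threshold(scores, precision, threshold):
    """Claim 3 style rule: too much predictive variance means 'uncertain'."""
    if predictive_variance(scores, precision) > threshold:
        return "uncertain"
    return "malware" if statistics.fmean(scores) >= 0.5 else "benign"


def classify_by_window(probability, scores, precision, width_factor=1.0):
    """Claim 4 style rule, under an assumed window shape.

    The window is centered on 0.5 and widens as the predictive variance
    grows; a probability inside the window is treated as 'uncertain'.
    """
    half_width = width_factor * math.sqrt(predictive_variance(scores, precision))
    if abs(probability - 0.5) <= half_width:
        return "uncertain"
    return "malware" if probability > 0.5 else "benign"
```

In both rules a low precision value inflates the predictive variance, so a model with less reliable outputs pushes more samples into the uncertain class even when its stochastic versions happen to agree.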

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Ensemble learning · CPC title

  • G06F21/562 · Primary

    Static detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10366233B1 cover?
The disclosed computer-implemented method for trichotomous malware classification may include (1) identifying a sample potentially representing malware, (2) selecting a machine learning model trained on a set of samples to distinguish between malware samples and benign samples, (3) analyzing the sample using a plurality of stochastically altered versions of the machine learning model to produce…
Who is the assignee on this patent?
Symantec Corp
What technology area does this patent fall under?
Primary CPC classification G06F21/562. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 30, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).