Anomaly based malware detection

US11210394B2 · US · B2

Patent metadata
Publication number: US-11210394-B2
Application number: US-201916661933-A
Country: US
Kind code: B2
Filing date: Oct 23, 2019
Priority date: Nov 21, 2016
Publication date: Dec 28, 2021
Grant date: Dec 28, 2021

Abstract

In one respect, there is provided a system for training a neural network adapted for classifying one or more scripts. The system may include at least one processor and at least one memory. The memory may include program code that provides operations when executed by the at least one processor. The operations may include: reducing a dimensionality of a plurality of features representative of a file set; determining, based at least on a reduced dimensional representation of the file set, a distance between a file and the file set; and determining, based at least on the distance between the file and the file set, a classification for the file. Related methods and articles of manufacture, including computer program products, are also provided.
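The abstract's three operations (reduce dimensionality, measure a file's distance to a file set, classify from that distance) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: it uses a Gaussian random projection (the claims call for a random projection; the target dimension of 2 is our own choice), treats the projected reference set as a single Gaussian instead of detecting clusters, uses a Mahalanobis distance (the distance named in claim 2), and assumes the reference set is known-benign so that a distance above the threshold means malware, as in claim 5. All function names, the seed, and the threshold are hypothetical.

```python
import math
import random

PROJECTED_DIMS = 2  # hypothetical choice; the patent does not fix the target dimension


def random_projection_matrix(n_features, seed=0):
    """Gaussian random projection: PROJECTED_DIMS x n_features, entries ~ N(0, 1/k).

    Each output dimension is a random linear combination of all input
    features, so several features are merged into a single dimension.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(PROJECTED_DIMS)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(PROJECTED_DIMS)]


def project(matrix, features):
    """Multiply the projection matrix by one file's feature vector."""
    return [sum(w * v for w, v in zip(row, features)) for row in matrix]


def fit_file_set(points):
    """Mean and 2x2 sample covariance (sxx, sxy, syy) of the projected file set."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov, ridge=1e-6):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    sxx, syy = sxx + ridge, syy + ridge  # small ridge keeps the matrix invertible
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))


def classify(file_features, benign_set, threshold, seed=0):
    """Label a file 'malware' if it lies too far from a known-benign file set."""
    matrix = random_projection_matrix(len(file_features), seed)
    reduced_set = [project(matrix, f) for f in benign_set]
    mean, cov = fit_file_set(reduced_set)
    dist = mahalanobis_2d(project(matrix, file_features), mean, cov)
    return ("malware" if dist > threshold else "benign"), dist
```

Projecting before fitting the covariance keeps the matrix small (2x2 here), which is what makes the closed-form inverse possible; the same projection matrix must be used for the reference set and for the file being scored.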

Claims

The invention claimed is:

1. A system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor provides operations comprising:
    reducing a dimensionality of a plurality of features representative of a file set that are not distributed in a Gaussian manner, the dimensionality being reduced by generating a random projection of the plurality of features that merges, into a single dimension, two or more of the plurality of features such that the files in the file set conform to a mixture of Gaussian distributions as a result of the random projection, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
    detecting the presence of a plurality of clusters by applying hierarchical Dirichlet processes to the random projection of the plurality of features, wherein each cluster of the plurality of clusters corresponds to a probability distribution of the file set;
    determining a generalized distance between a reduced dimension representation of a file and the clusters;
    determining, based at least on the determined generalized distance, a classification for the file that indicates whether the file is malicious or benign; and
    preventing the file from being accessed when the classification indicates that the file is not safe.

2. The system of claim 1, wherein the generalized distance is a Mahalanobis distance.

3. The system of claim 2, wherein the generalized distance corresponds to an amount of deviation between features of the file and the plurality of features representative of the file set.

4. The system of claim 3, wherein the file set includes at least one file that is known to be a benign file.

5. The system of claim 4, wherein the file is determined to be a malware file when the generalized distance exceeds a threshold value, and wherein the file is determined to be a benign file when the generalized distance does not exceed the threshold value.

6. The system of claim 3, wherein the file set includes at least one file that is known to be a malware file and/or a specific type or family of malware file.

7. The system of claim 6, wherein the file is determined to be a malware file and/or a specific type or family of malware file when the generalized distance does not exceed a threshold value, and wherein the file is determined to be a benign file when the generalized distance exceeds the threshold value.

8. A non-transitory computer-readable storage medium including program code which when executed by at least one processor causes operations comprising:
    reducing a dimensionality of a plurality of features representative of a file set that are not distributed in a Gaussian manner, the dimensionality being reduced by generating a random projection of the plurality of features that merges, into a single dimension, two or more of the plurality of features such that the files in the file set conform to a mixture of Gaussian distributions as a result of the random projection, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
    detecting the presence of a plurality of clusters by applying hierarchical Dirichlet processes to the random projection of the plurality of features, wherein each cluster of the plurality of clusters corresponds to a probability distribution of the file set;
    determining a generalized distance between a reduced dimension representation of a file and the clusters;
    determining, based at least on the determined generalized distance, a classification for the file that indicates whether the file is malicious or benign; and
    preventing the file from being accessed when the classification indicates that the file is not safe.

9. A method for implementation by one or more computing devices comprising:
    reducing a dimensionality of a plurality of features representative of a file set that are not distributed in a Gaussian manner, the dimensionality being reduced by generating a random projection of the plurality of features that merges, into a single dimension, two or more of the plurality of features such that the files in the file set conform to a mixture of Gaussian distributions as a result of the random projection, wherein the files in the file set are not distributed in a Gaussian manner prior to the reducing;
    detecting the presence of a plurality of clusters by applying hierarchical Dirichlet processes to the random projection of the plurality of features, wherein each cluster of the plurality of clusters corresponds to a probability distribution of the file set;
    determining a generalized distance between a reduced dimension representation of a file and the clusters;
    determining, based at least on the determined generalized distance, a classification for the file that indicates whether the file is malicious or benign; and
    preventing the file from being accessed when the classification indicates that the file is not safe.

10. The method of claim 9, wherein the generalized distance is a Mahalanobis distance.

11. The method of claim 9, wherein the generalized distance corresponds to an amount of deviation between features of the file and the plurality of features representative of the file set.

12. The method of claim 11, wherein the file set includes at least one file that is known to be a benign file.

13. The method of claim 12, wherein the file is determined to be a malware file when the generalized distance exceeds a threshold value, and wherein the file is determined to be a benign file when the generalized distance does not exceed the threshold value.

14. The method of claim 13, wherein: the file set includes at least one file that is known to be a malware file and/or a specific type or family of malware file.

15. The method of claim 14, wherein: the file is determined to be a malware file and/or a specific type or family of malware file when the generalized distances between the reduced dimensionality of the plurality of features representative of the file set and the clusters does not exceed a threshold value; and the file is determined to be a benign file, when the generalized distances between the reduced dimensionality of the plurality of features representative of the file set and the [clusters] exceeds the threshold value.
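The claims add two steps beyond the abstract: the reduced representations are grouped into clusters, and the meaning of the threshold flips with the reference set (claims 5 and 13 for a benign set, claims 7 and 15 for a malware set). The sketch below assumes the file set has already been projected and that cluster labels are supplied by the caller; it does not implement hierarchical Dirichlet processes, which would be a substantial model on its own. Taking the generalized distance as the Mahalanobis distance to the nearest cluster is also an assumption, since the claims do not say how per-cluster distances are combined. All function names are hypothetical.

```python
import math
from collections import defaultdict


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cluster_stats(points, labels, ridge=1e-6):
    """Mean and inverse covariance for each cluster.

    `labels` stands in for the output of the clustering step (hierarchical
    Dirichlet processes in the claims); this sketch does not cluster itself.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    stats = {}
    for label, pts in groups.items():
        n, d = len(pts), len(pts[0])
        if n < 2:
            raise ValueError("each cluster needs at least two points")
        mean = [sum(p[i] for p in pts) / n for i in range(d)]
        cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / (n - 1)
                + (ridge if i == j else 0.0)  # ridge keeps the covariance invertible
                for j in range(d)] for i in range(d)]
        stats[label] = (mean, invert(cov))
    return stats


def mahalanobis(x, mean, inv_cov):
    """sqrt((x - mean)^T S^-1 (x - mean)) for one cluster."""
    diff = [a - b for a, b in zip(x, mean)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))


def generalized_distance(x, stats):
    """Assumption: distance to the file set = distance to the nearest cluster."""
    return min(mahalanobis(x, mean, inv) for mean, inv in stats.values())


def classify(x, stats, threshold, reference="benign"):
    """Apply the threshold with the polarity the claims tie to the reference set."""
    dist = generalized_distance(x, stats)
    if reference == "benign":  # claims 5 and 13: far from benign clusters -> malware
        return "malware" if dist > threshold else "benign"
    # claims 7 and 15: within the threshold of malware clusters -> malware
    return "malware" if dist <= threshold else "benign"


def allow_access(x, stats, threshold, reference="benign"):
    """Hypothetical gate: block the file unless it is classified benign."""
    return classify(x, stats, threshold, reference) == "benign"
```

Blocking access, the last element of claims 1, 8 and 9, is reduced here to a boolean returned by `allow_access`; an actual system would enforce that decision wherever files are opened or executed.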

Assignees

Cylance Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks (CPC title)

  • G06F21/563 (primary): by source code analysis (CPC title)

  • Physics (mapped topic)

Frequently asked questions

What does patent US11210394B2 cover?
Per independent claims 1, 8 and 9: a system, storage medium, and method that reduce the dimensionality of file features by random projection, detect clusters with hierarchical Dirichlet processes, classify a file as malicious or benign from its generalized distance to those clusters, and prevent access to a file classified as not safe. The official abstract appears above.
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/563. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 28, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).