Training a machine learning model for container file analysis

US10503901B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10503901-B2
Application number: US-201615345439-A
Country: US
Kind code: B2
Filing date: Nov 7, 2016
Priority date: Sep 1, 2016
Publication date: Dec 10, 2019
Grant date: Dec 10, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one respect, there is provided a system for training a machine learning model to detect malicious container files. The system may include at least one processor and at least one memory. The at least one memory may include program code that provides operations when executed by the at least one processor. The operations may include: training, based on a training data, a machine learning model to enable the machine learning model to determine whether at least one container file includes at least one file rendering the at least one container file malicious; and providing the trained machine learning model to enable the determination of whether the at least one container file includes at least one file rendering the at least one container file malicious. Related methods and articles of manufacture, including computer program products, are also disclosed.
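
The claim text on this page lists per-file features such as file name, path or location, and size. As a minimal sketch of how such features might be gathered before training, the Python below reads them from a ZIP archive. Treating a ZIP archive as the "container file" is an assumption made for illustration, and the helper names `extract_file_features` and `build_demo_container` are hypothetical; neither comes from the patent.

```python
import io
import posixpath
import zipfile
from typing import Dict, List


def extract_file_features(container_bytes: bytes) -> List[Dict[str, object]]:
    """Return one feature record per file stored in a ZIP container.

    Assumption: a ZIP archive stands in for the patent's "container file".
    Only name, path, and size are collected here.
    """
    features: List[Dict[str, object]] = []
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file content
            features.append(
                {
                    "name": posixpath.basename(info.filename),
                    "path": posixpath.dirname(info.filename),
                    "size": info.file_size,  # uncompressed size in bytes
                }
            )
    return features


def build_demo_container() -> bytes:
    """Build a small in-memory ZIP so the sketch runs without external files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docs/readme.txt", "hello")
        archive.writestr("bin/run.sh", "echo hi\n")
    return buffer.getvalue()


demo_features = extract_file_features(build_demo_container())
for record in demo_features:
    print(record)
```

Each record would still need to be encoded numerically (for example, by hashing the name and path) before it could serve as a feature vector; that step is left out because the page does not describe it.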

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one processor; and at least one non-transitory memory including program code which when executed by the at least one processor provides operations comprising: training, based at least on training data, a machine learning model to enable the machine learning model to determine whether at least one container file includes at least one file rendering the at least one container file malicious, each container file encapsulating a plurality of files; and providing the trained machine learning model to enable a determination of whether at least one subsequently received container file includes at least one file rendering the at least one subsequently received container file malicious, the determination comprising a classification of the at least one subsequently received container file which is used to determine whether to access the plurality of files contained within the at least one subsequently received container file; wherein: the training data comprises a plurality of historical container files at least a portion of which are known to include the at least one file rendering the historical container file malicious; for each historical container file, features from each file contained therein are concatenated to form an extended feature space for use during the training; the extended feature space prevents misclassification by the trained machine learning model for different container files storing identical or similar sets of files in a different order; and the features are selected from a group consisting of: file name, file path or location, size, creator, owner, or embedded Universal Resource Locator (URL).

2. The system of claim 1, wherein the at least one file rendering the historical container file malicious comprises a malicious file.

3. The system of claim 2, wherein the malicious file comprises unwanted data, an unwanted portion of a script, and/or an unwanted portion of program code.

4. The system of claim 1, wherein the at least one file rendering the historical container file malicious comprises a benign file rendering the historical container file malicious when combined with another benign file from the historical container file.

5. The system of claim 1, wherein the machine learning model comprises a neural network.

6. The system of claim 5, wherein the neural network comprises a convolutional neural network.

7. The system of claim 1, wherein the machine learning model comprises a pooling layer configured to apply a maximum pooling function to the training data, and wherein applying the maximum pooling function identifies a maximum feature from a plurality of files included in the training data.

8. The system of claim 7, wherein the plurality of files includes a first file, a second file, and a third file.

9. The system of claim 8, wherein the operations further comprise: receiving the training data by at least receiving a first feature vector, a second feature vector, and a third feature vector that include one or more features of the respective first file, the second file, and the third file.

10. The system of claim 9, wherein the machine learning model comprises a convolution layer configured to generate a first feature map by at least applying a first kernel to a plurality of overlapping groups of feature vectors.

11. The system of claim 10, wherein a first overlapping group of feature vectors includes the first feature vector and the second feature vector, and wherein a second overlapping group of feature vectors includes the second feature vector and the third feature vector.

12. The system of claim 11, wherein applying the first kernel includes computing a dot product between features included in the first kernel and features included in the first overlapping group of feature vectors to generate a first entry in the first feature map, and computing another dot product between features included in the first kernel and features included in the second overlapping group of feature vectors to generate a second entry in the first feature map.

13. The system of claim 12, wherein the computing of the dot product and the other dot product detects a presence of the features included in the first kernel in the first and second overlapping group of feature vectors.

14. The system of claim 10, wherein the convolution layer is further configured to generate a second feature map by at least applying a second kernel to the plurality of overlapping groups of feature vectors.

15. The system of claim 14, wherein the first kernel includes a combination of features, and wherein the second kernel includes a different combination of features.

16. The system of claim 1, wherein training the machine learning model includes processing the training data with the machine learning model to detect a presence of the at least one file in the training data, back propagating an error in the detection of the at least one file, and adjusting one or more weights and/or biases applied by the machine learning model to minimize the error in the detection of the at least one file.

17. The system of claim 16, wherein the operations further comprise: receiving another training data; and processing the other training data with the machine learning model to detect a presence of at least one file in the other training data rendering the other training data malicious, wherein the training includes readjusting the one or more weights and/or biases applied by the machine learning model to minimize an error in the detection of the at least one file in the other training data.

18. The system of claim 1, wherein the operations further comprise: determining, using the trained machine learning model, whether the at least one subsequently received container file includes at least one file rendering the at least one subsequently received container file malicious, the determination comprising the classification of the at least one subsequently received container file which is used to determine whether to access the plurality of files contained within the at least one subsequently received container file.

19. A method, comprising: training, based at least on training data, a machine learning model to enable the machine learning model to determine whether at least one container file includes at least one file rendering the at least one container file malicious, each container file encapsulating a plurality of files; and providing the trained machine learning model to enable a determination of whether at least one subsequently received container file includes at least one file rendering the at least one subsequently received container file malicious, the determination comprising a classification of the at least one subsequently received container file which is used to determine whether to access the plurality of files contained within the at least one subsequently received container file; wherein: the training data comprises a plurality of historical container files at least a portion of which are known to include the at least one file rendering the historical container file malicious; for each historical container file, features from each file contained therein are concatenated to form an extended feature space for use during the training; the extended feature space prevents misclassification by the trained machine learning model for different container files storing identical or similar sets of files in a different order; and the features are selected from a group consisting of: file name, file path or location, s
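
Claims 7 through 12 describe a convolution layer that applies a kernel to overlapping groups of per-file feature vectors via dot products, followed by maximum pooling. The following is a rough Python sketch of that arithmetic for three files, not the patented implementation. The two-feature vectors, the kernel values, the group size of two, and the function names `conv_feature_map` and `max_pool` are all illustrative assumptions.

```python
from typing import List, Sequence

Vector = List[float]


def conv_feature_map(
    vectors: Sequence[Vector], kernel: Sequence[float], window: int = 2
) -> List[float]:
    """Apply one kernel to every overlapping group of `window` adjacent vectors.

    Each group is flattened and dotted with the kernel, giving one entry
    of the feature map per group.
    """
    dim = len(vectors[0])
    if len(kernel) != window * dim:
        raise ValueError("kernel length must equal window * vector dimension")
    feature_map: List[float] = []
    for start in range(len(vectors) - window + 1):
        group = [x for vec in vectors[start:start + window] for x in vec]
        feature_map.append(sum(k * x for k, x in zip(kernel, group)))
    return feature_map


def max_pool(feature_map: Sequence[float]) -> float:
    """Maximum pooling: keep the strongest kernel response across all groups."""
    return max(feature_map)


# Hypothetical feature vectors for a first, second, and third file.
first_file = [1.0, 0.0]
second_file = [0.0, 1.0]
third_file = [1.0, 1.0]

# Hypothetical kernel spanning two adjacent files (2 files x 2 features).
first_kernel = [1.0, 0.0, 0.0, 1.0]

fmap = conv_feature_map([first_file, second_file, third_file], first_kernel)
pooled = max_pool(fmap)
print(fmap)    # [2.0, 1.0]
print(pooled)  # 2.0
```

The first entry comes from the group (first file, second file) and the second from the group (second file, third file), matching the overlap described in claim 11. In a trained network the kernel values would be learned through backpropagation rather than fixed by hand as they are here.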

Assignees

Cylance Inc

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06F21/562 · Primary

    Static detection · CPC title

  • Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems · CPC title

  • Test or assess software · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10503901B2 cover?
In one respect, there is provided a system for training a machine learning model to detect malicious container files. The system may include at least one processor and at least one memory. The at least one memory may include program code that provides operations when executed by the at least one processor. The operations may include: training, based on a training data, a machine learning model …
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/562. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 10, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).