Container file analysis using machine learning model

US10637874B2 · US · B2

Patent metadata
Publication number: US-10637874-B2
Application number: US-201615345444-A
Country: US
Kind code: B2
Filing date: Nov 7, 2016
Priority date: Sep 1, 2016
Publication date: Apr 28, 2020
Grant date: Apr 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one respect, there is provided a system for training a machine learning model to detect malicious container files. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one processor provides operations including: processing a container file with a trained machine learning model, wherein the trained machine learning is trained to determine a classification for the container file indicative of whether the container file includes at least one file rendering the container file malicious; and providing, as an output by the trained machine learning model, an indication of whether the container file includes the at least one file rendering the container file malicious. Related methods and articles of manufacture, including computer program products, are also disclosed.

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one processor provides operations comprising: extracting features from each of a plurality of files in a container file; generating, for each file, a feature vector comprising the corresponding extracted features; processing, using the feature vectors, the container file with a trained machine learning model, wherein the trained machine learning model is trained to determine a classification for the container file indicative of whether the container file includes at least one file rendering the container file malicious; and providing, as an output by the trained machine learning model, an indication of whether the container file includes the at least one file rendering the container file malicious; wherein the trained machine learning model is a convolutional neural network that comprises: at least one convolutional layer (i) concurrently processing the plurality of feature vectors in groups of two or more overlapping feature vectors where each group may include at least one feature vector that is included in one or more other groups and (ii) generate a feature map for each group by at least applying at least one kernel to each group; and a pooling layer configured to apply a maximum pooling function to the feature maps, and wherein applying the maximum pooling function identifies a plurality of maximum features from the plurality of feature maps and the classification is based on such maximum features; wherein: features from each file within a container file used to train the machine learning model are concatenated to form an extended feature space for use during the training; the extended feature space prevents misclassification by the trained machine learning model for different container files storing identical or similar sets of files in a different order; and the features are selected from a group consisting of: file name, file path or location, size, creator, owner, or embedded Universal Resource Locator (URL).

2. The system of claim 1, wherein the at least one file rendering the container file malicious comprises a malicious file.

3. The system of claim 2, wherein the malicious file comprises unwanted data, an unwanted portion of a script, and/or an unwanted portion of program code.

4. The system of claim 1, wherein the at least one file rendering the container file malicious comprises a benign file rendering the container file malicious when combined with another benign file from the container file.

5. The system of claim 1, wherein applying the at least one kernel includes computing a dot product between features included in each kernel and features included in a first overlapping group of feature vectors to generate a first entry in the corresponding feature map, and computing another dot product between features included in each kernel and features included in a second overlapping group of feature vectors to generate a second entry in such corresponding feature map.

6. A method for implementation by one or more data processors forming part of at least one computing device, the method comprising: extracting features from each of a plurality of files in a container file; generating, for each file, a feature vector comprising the corresponding extracted features; processing, using the feature vectors, the container file with a trained machine learning model, wherein the trained machine learning model is configured to determine a classification for the container file indicative of whether the container file includes a plurality of files and at least one file rendering the container file malicious, the processing comprising concatenating features extracted from each of the plurality of files in the container file into a feature space for input into the trained machine learning model; and providing, as an output, an indication of whether the container file includes the at least one file rendering the container file malicious; wherein the trained machine learning model is a convolutional neural network that comprises: at least one convolutional layer (i) concurrently processing the plurality of feature vectors in groups of two or more overlapping feature vectors where each group may include at least one feature vector that is included in one or more other groups and (ii) generate a feature map for each group by at least applying at least one kernel to each group; and a pooling layer configured to apply a maximum pooling function to the feature maps, and wherein applying the maximum pooling function identifies a plurality of maximum features from the plurality of feature maps and the classification is based on such maximum features; wherein: features from each file within a container file used to train the machine learning model are concatenated to form an extended feature space for use during the training; the extended feature space prevents misclassification by the trained machine learning model for different container files storing identical or similar sets of files in a different order; and the features are selected from a group consisting of: file name, file path or location, size, creator, owner, or embedded Universal Resource Locator (URL).

7. The method of claim 6, wherein the at least one file rendering the container file malicious comprises a malicious file.

8. The method of claim 6, wherein the at least one file rendering the container file malicious comprises a benign file rendering the container file malicious when combined with another benign file from the container file.

9. The method of claim 6, wherein applying the at least one kernel includes computing a dot product between features included in each kernel and features included in a first overlapping group of feature vectors to generate a first entry in the corresponding feature map, and computing another dot product between features included in each kernel and features included in a second overlapping group of feature vectors to generate a second entry in such corresponding feature map.

10. The method of claim 6, wherein the malicious file comprises unwanted data, an unwanted portion of a script, and/or an unwanted portion of program code.

11. A non-transitory computer-readable storage medium including program code which when executed by at least one processor causes operations comprising: extracting features from each of a plurality of files in a container file; generating, for each file, a feature vector comprising the corresponding extracted features; processing, using the feature vectors, the container file with a trained machine learning model, wherein the trained machine learning model is configured to determine a classification for the container file indicative of whether the container file includes a plurality of files and at least one file rendering the container file malicious, the processing comprising concatenating features extracted from each of the plurality of files in the container file into a feature space for input into the trained machine learning model; and providing, as an output, an indication of whether the container file includes the at least one file rendering the container file malicious; wherein the trained machine learning model is a convolutional neural network that comprises: at least one convolutional layer (i) concurrently processing the plurality of feature vectors in groups of two or more overlapping feature vectors where each group may include at least one feature vector that is included in one or more other groups and (ii) generate a feature map for each group by at least applying at least one kernel
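
As an illustration only, the convolution and pooling steps recited in claims 1 and 5 can be sketched in plain Python. The function names, the group size of two, the stride of one, and the toy numeric vectors and kernels are all assumptions made for this sketch; the claim preview does not specify them. The sketch also follows the reading in claim 5, where each kernel yields one feature map with one entry per overlapping group. Real feature vectors would first need a numeric encoding of fields such as file name or size, which is not shown here.

```python
from typing import List

Vector = List[float]          # one per-file feature vector (hypothetical encoding)
Kernel = List[List[float]]    # one row of weights per vector in a group


def overlapping_groups(vectors: List[Vector], group_size: int = 2) -> List[List[Vector]]:
    """Slide a window of `group_size` over the vectors with stride 1,
    so neighbouring groups share at least one feature vector."""
    return [vectors[i:i + group_size]
            for i in range(len(vectors) - group_size + 1)]


def apply_kernel(group: List[Vector], kernel: Kernel) -> float:
    """Dot product between the kernel weights and the features of one group."""
    return sum(k * f
               for k_row, f_row in zip(kernel, group)
               for k, f in zip(k_row, f_row))


def feature_map(vectors: List[Vector], kernel: Kernel) -> List[float]:
    """One entry per overlapping group, as described in claim 5."""
    return [apply_kernel(g, kernel)
            for g in overlapping_groups(vectors, len(kernel))]


def max_pool(feature_maps: List[List[float]]) -> List[float]:
    """Keep the maximum entry of each feature map."""
    return [max(fm) for fm in feature_maps]


# Toy data (assumed): three files, two numeric features each.
vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
# Two hypothetical kernels, each spanning a group of two vectors.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

maps = [feature_map(vectors, k) for k in kernels]   # [[3.0, 1.0], [0.0, 5.0]]
pooled = max_pool(maps)                             # [3.0, 5.0]
```

In a full classifier the pooled maxima would feed further layers that produce the malicious or benign indication; those layers, and how the kernels are learned during training, are outside this sketch.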

Assignees

Cylance Inc

Inventors

Classifications

  • Static detection · CPC title

  • using kernel methods, e.g. support vector machines [SVM] · CPC title

  • Learning methods · CPC title

  • Event detection, e.g. attack signature detection · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10637874B2 cover?
In one respect, there is provided a system for training a machine learning model to detect malicious container files. The system may include at least one processor and at least one memory. The memory may include program code which when executed by the at least one processor provides operations including: processing a container file with a trained machine learning model, wherein the trained mach…
Who is the assignee on this patent?
Cylance Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Publication date Apr 28, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).