What technology area does this patent fall under?

The primary CPC classification is G06F21/565. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 1, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data-driven identification of malicious files using machine learning and an ensemble of malware detection procedures

US10853489B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10853489-B2
Application number	US-201816165051-A
Country	US
Kind code	B2
Filing date	Oct 19, 2018
Priority date	Oct 19, 2018
Publication date	Dec 1, 2020
Grant date	Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for data-driven ensemble-based malware detection. An exemplary method comprises obtaining a file; extracting metadata from the file; obtaining a plurality of malware detection procedures; selecting a subset of the plurality of malware detection procedures to apply to the file utilizing a likelihood that each of the plurality of malware detection procedures will result in a malware detection for the file based on the extracted metadata; applying the selected subset of the malware detection procedures to the file; and processing results of the subset of malware detection procedures using a machine learning model to determine a probability of the file being malware.
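
The steps in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the patent: the `Detector` interface, the toy detectors, the metadata fields, and the hand-set logistic weights (standing in for a trained machine learning model) are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import exp
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]


@dataclass
class Detector:
    """One malware detection procedure (hypothetical interface)."""
    name: str
    # Estimated likelihood that this procedure flags a file with this metadata.
    likelihood: Callable[[Metadata], float]
    # Runs the procedure: 1.0 for a detection, 0.0 otherwise.
    scan: Callable[[bytes], float]


def extract_metadata(name: str, data: bytes) -> Metadata:
    """Toy metadata: file extension and a coarse size bucket."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return {"ext": ext, "size": "small" if len(data) < 1024 else "large"}


def select_detectors(meta: Metadata, detectors: List[Detector],
                     max_count: int, threshold: float) -> List[Detector]:
    """Keep the highest-likelihood detectors that clear the threshold."""
    scored = [(d.likelihood(meta), d) for d in detectors]
    eligible = [(p, d) for p, d in scored if p >= threshold]
    eligible.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in eligible[:max_count]]


def malware_probability(results: Dict[str, float],
                        weights: Dict[str, float], bias: float) -> float:
    """Stand-in for the trained model: logistic function of weighted results."""
    z = bias + sum(weights.get(name, 0.0) * r for name, r in results.items())
    return 1.0 / (1.0 + exp(-z))


def analyze(name: str, data: bytes, detectors: List[Detector],
            weights: Dict[str, float], bias: float,
            max_count: int = 2, threshold: float = 0.1) -> Tuple[float, List[str]]:
    """Run the whole pipeline; return the probability and the detectors used."""
    meta = extract_metadata(name, data)
    chosen = select_detectors(meta, detectors, max_count, threshold)
    results = {d.name: d.scan(data) for d in chosen}
    return malware_probability(results, weights, bias), [d.name for d in chosen]


# Toy detectors and weights, invented for this example.
DETECTORS = [
    Detector("packer_heuristic",
             lambda m: 0.6 if m["ext"] == "exe" else 0.01,
             lambda b: float(b"UPX" in b)),
    Detector("macro_scan",
             lambda m: 0.5 if m["ext"] == "docm" else 0.02,
             lambda b: float(b"AutoOpen" in b)),
    Detector("signature_match",
             lambda m: 0.3,
             lambda b: float(b"EVIL" in b)),
]
WEIGHTS = {"packer_heuristic": 1.0, "macro_scan": 2.0, "signature_match": 3.0}

if __name__ == "__main__":
    print(analyze("sample.exe", b"MZ UPX EVIL", DETECTORS, WEIGHTS, bias=-2.0))
```

In this sketch the macro scanner is skipped for an `.exe` file because its estimated likelihood falls below the threshold, which is the cost-saving idea the abstract describes: only the procedures likely to produce a detection for a given file are run.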

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining a file; extracting metadata from the file; obtaining a plurality of malware detection procedures; selecting, using at least one processing device, a subset of the plurality of malware detection procedures to apply to the file utilizing a likelihood that each of the plurality of malware detection procedures will result in a malware detection for the file based on the extracted metadata; applying, using the at least one processing device, the selected subset of the malware detection procedures to the file; and processing, using the at least one processing device, results of the subset of malware detection procedures using a machine learning model to determine a probability of the file being malware.

2. The method of claim 1, wherein the step of selecting the subset of the malware detection procedures to apply to the file employs a Bayesian model that determines a probability that a given malware detection procedure will detect malware in the given file based on one or more historical executions of the given malware detection procedure and characteristics of historical files on which the given malware detection procedure was previously executed.

3. The method of claim 2, further comprising the step of updating the Bayesian model as new files are tested by the given malware detection procedure.

4. The method of claim 2, further comprising the step of obtaining a configuration of one or more of a substantially maximum number of malware detection procedures to be executed for a given file, a detection probability threshold, and one or more metadata features to be used for training the Bayesian model.

5. The method of claim 1, wherein the step of processing the results of the subset of the malware detection procedures using the machine learning model employs a supervised machine learning model that processes the results of the subset of the malware detection procedures as an input and models relationships within the results to generate a health score indicating whether the file is malware.

6. The method of claim 5, wherein the supervised machine learning model is trained using a plurality of historical files classified as malware as positive examples and a plurality of historical files classified as non-malicious as negative examples.

7. The method of claim 1, further comprising the step of generating one or more alerts for a detected malware based on one or more of a user configuration, at least one predefined rule and a predefined policy.

8. A system, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps: obtaining a file; extracting metadata from the file; obtaining a plurality of malware detection procedures; selecting a subset of the plurality of malware detection procedures to apply to the file utilizing a likelihood that each of the plurality of malware detection procedures will result in a malware detection for the file based on the extracted metadata; applying the selected subset of the malware detection procedures to the file; and processing results of the subset of malware detection procedures using a machine learning model to determine a probability of the file being malware.

9. The system of claim 8, wherein the step of selecting the subset of the malware detection procedures to apply to the file employs a Bayesian model that determines a probability that a given malware detection procedure will detect malware in the given file based on one or more historical executions of the given malware detection procedure and characteristics of historical files on which the given malware detection procedure was previously executed.

10. The system of claim 9, further comprising the steps of updating the Bayesian model as new files are tested by the given malware detection procedure and obtaining a configuration of one or more of a substantially maximum number of malware detection procedures to be executed for a given file, a detection probability threshold, and one or more metadata features to be used for training the Bayesian model.

11. The system of claim 8, wherein the step of processing the results of the subset of the malware detection procedures using the machine learning model employs a supervised machine learning model that processes the results of the subset of the malware detection procedures as an input and models relationships within the results to generate a health score indicating whether the file is malware.

12. The system of claim 11, wherein the supervised machine learning model is trained using a plurality of historical files classified as malware as positive examples and a plurality of historical files classified as non-malicious as negative examples.

13. The system of claim 8, further comprising the step of generating one or more alerts for a detected malware based on one or more of a user configuration, at least one predefined rule and a predefined policy.

14. A computer program product, comprising a non-transitory machine-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps: obtaining a file; extracting metadata from the file; obtaining a plurality of malware detection procedures; selecting a subset of the plurality of malware detection procedures to apply to the file utilizing a likelihood that each of the plurality of malware detection procedures will result in a malware detection for the file based on the extracted metadata; applying the selected subset of the malware detection procedures to the file; and processing results of the subset of malware detection procedures using a machine learning model to determine a probability of the file being malware.

15. The computer program product of claim 14, wherein the step of selecting the subset of the malware detection procedures to apply to the file employs a Bayesian model that determines a probability that a given malware detection procedure will detect malware in the given file based on one or more historical executions of the given malware detection procedure and characteristics of historical files on which the given malware detection procedure was previously executed.

16. The computer program product of claim 15, further comprising the step of updating the Bayesian model as new files are tested by the given malware detection procedure.

17. The computer program product of claim 15, further comprising the step of obtaining a configuration of one or more of a substantially maximum number of malware detection procedures to be executed for a given file, a detection probability threshold, and one or more metadata features to be used for training the Bayesian model.

18. The computer program product of claim 14, wherein the step of processing the results of the subset of the malware detection procedures using the machine learning model employs a supervised machine learning model that processes the results of the subset of the malware detection procedures as an input and models relationships within the results to generate a health score indicating whether the file is malware.

19. The computer program product of claim 18, wherein the supervised machine learning model is trained using a plurality of historical files classified as malware as positive examples and a plurality of historical files classified as non-malicious as negative examples.

20. The
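
Claims 2 to 4 describe a Bayesian model that estimates, from historical executions, how likely a given procedure is to detect malware in a file with given characteristics, and that is updated as new files are tested. The claims do not say which Bayesian model is used. The sketch below assumes a simple Beta-Bernoulli estimate keyed by procedure and metadata features; the class names, the default configuration values, and the uniform prior are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Metadata = Dict[str, str]
Key = Tuple[str, Tuple[str, ...]]


@dataclass
class SelectionConfig:
    """Hypothetical configuration mirroring the options listed in claim 4."""
    max_procedures: int = 2             # cap on procedures run per file
    detection_threshold: float = 0.2    # minimum estimated detection probability
    features: Tuple[str, ...] = ("ext",)  # metadata features the model conditions on


class DetectionLikelihoodModel:
    """Beta-Bernoulli estimate of P(detection | procedure, metadata features)."""

    def __init__(self, config: SelectionConfig,
                 prior_hits: float = 1.0, prior_misses: float = 1.0) -> None:
        self.config = config
        self.prior_hits = prior_hits
        self.prior_misses = prior_misses
        # Maps (procedure, feature values) to [detections, non-detections].
        self.counts: Dict[Key, List[float]] = {}

    def _key(self, procedure: str, metadata: Metadata) -> Key:
        return procedure, tuple(metadata.get(f, "") for f in self.config.features)

    def update(self, procedure: str, metadata: Metadata, detected: bool) -> None:
        """Record one execution of a procedure on a file (claim 3)."""
        entry = self.counts.setdefault(self._key(procedure, metadata), [0.0, 0.0])
        entry[0 if detected else 1] += 1.0

    def probability(self, procedure: str, metadata: Metadata) -> float:
        """Posterior mean detection rate given the recorded history."""
        hits, misses = self.counts.get(self._key(procedure, metadata), (0.0, 0.0))
        total = self.prior_hits + self.prior_misses + hits + misses
        return (self.prior_hits + hits) / total

    def select(self, procedures: List[str], metadata: Metadata) -> List[str]:
        """Rank procedures by estimated probability, filter, and cap the count."""
        scored = [(self.probability(p, metadata), p) for p in procedures]
        eligible = [(prob, p) for prob, p in scored
                    if prob >= self.config.detection_threshold]
        eligible.sort(key=lambda pair: pair[0], reverse=True)
        return [p for _, p in eligible[: self.config.max_procedures]]


if __name__ == "__main__":
    model = DetectionLikelihoodModel(SelectionConfig())
    doc = {"ext": "docm"}
    for outcome in (True, True, True, False):
        model.update("macro_scan", doc, outcome)
    for _ in range(4):
        model.update("packer_heuristic", doc, False)
    print(model.select(["macro_scan", "packer_heuristic", "signature_match"], doc))
```

With a uniform prior, a procedure that has never been run on a given kind of file starts at an estimated probability of 0.5, so it tends to be tried until evidence accumulates against it. A different prior would change that behaviour; the claims leave this choice open.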

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

Code	CPC title
G06N7/01	Probabilistic graphical models, e.g. probabilistic networks
G06N20/00	Machine learning
G06F2221/033	Test or assess software
G06F21/554	involving event detection and direct action
G06F21/568	eliminating virus, restoring damaged files

Patent family

Related publications grouped by family.

View patent family 70280869

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10853489B2 cover?: Techniques are provided for data-driven ensemble-based malware detection. An exemplary method comprises obtaining a file; extracting metadata from the file; obtaining a plurality of malware detection procedures; selecting a subset of the plurality of malware detection procedures to apply to the file utilizing a likelihood that each of the plurality of malware detection procedures will result in…
Who is the assignee on this patent?: EMC IP Holding Co LLC

Related patents

Computing platform security methods and apparatus

Malware detection and analysis

Method and apparatus for retroactively detecting malicious or otherwise undesirable software
