Automatically grouping malware based on artifacts

US10230749B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10230749-B1
Application number  US-201615056976-A
Country             US
Kind code           B1
Filing date         Feb 29, 2016
Priority date       Feb 29, 2016
Publication date    Mar 12, 2019
Grant date          Mar 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for automatically grouping malware based on artifacts are disclosed. In some embodiments, a system, process, and/or computer program product for automatically grouping malware based on artifacts includes receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware; clustering the plurality of samples based on the extracted features; and performing an action based on the clustering output.
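
The feature-extraction step in the abstract can be illustrated with a small sketch. This is not the patented implementation: the log format, the sample names, and the helper `extract_features` are hypothetical, and the sketch simply assumes that each distinct log line counts as one candidate feature.

```python
from collections import defaultdict

# Hypothetical analysis logs: sample id -> lines emitted by automated analysis.
logs = {
    "sample_a": ["registry,set,HKCU\\Run\\updater", "dns,query,evil.example"],
    "sample_b": ["registry,set,HKCU\\Run\\updater", "file,create,C:\\tmp\\x.dll"],
}

def extract_features(logs):
    """Map each distinct, non-empty log line to the set of samples that produced it."""
    feature_to_samples = defaultdict(set)
    for sample_id, lines in logs.items():
        for line in lines:
            line = line.strip()
            if line:
                feature_to_samples[line].add(sample_id)
    return dict(feature_to_samples)

features = extract_features(logs)

# Lines seen in more than one sample are candidates for grouping samples together.
shared = sorted(f for f, samples in features.items() if len(samples) > 1)
print(shared)
```

Indexing by log line rather than by sample makes it cheap to ask which samples share a given artifact, which is the relationship the later clustering step relies on.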

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; clustering the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: selecting one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collecting the assigned values in an array for each of the plurality of samples; comparing the assigned values of the array between two of the plurality of samples; and calculating a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and performing an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups.

2. The method of claim 1, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

3. The method of claim 1, wherein performing automated malware analysis includes performing a dynamic analysis.

4. The method of claim 1, wherein performing automated malware analysis includes performing a static analysis.

5. The method of claim 1, further comprising: clustering the plurality of samples based on the extracted features using a decision tree clustering process.

6. The method of claim 1, further comprising: clustering the plurality of samples based on the extracted features using a k-means++ clustering process.

7. A system, comprising: a processor configured to: receive a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; process the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; cluster the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: select one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collect the assigned values in an array for each of the plurality of samples; compare the assigned values of the array between two of the plurality of samples; and calculate a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and perform an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups; and a memory coupled to the processor and configured to provide the processor with instructions.

8. The system recited in claim 7, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

9. The system recited in claim 7, wherein performing automated malware analysis includes performing a dynamic analysis.

10. The system recited in claim 7, wherein performing automated malware analysis includes performing a static analysis.

11. The system recited in claim 7, wherein a log file for a sample comprises one or more lines based on results of the automated malware analysis for the sample.

12. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a decision tree clustering process.

13. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a k-means++ clustering process.

14. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a decision tree clustering process; and cluster the plurality of samples based on the extracted features using a k-means++ clustering process.

15. A computer program product, the computer program product being embodied in a non-transitory tangible computer readable storage medium and comprising computer instructions for: receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; clustering the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: selecting one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collecting the assigned values in an array for each of the plurality of samples; comparing the assigned values of the array between two of the plurality of samples; and calculating a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and performing an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups.

16. The computer program product recited in claim 15, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

17. The computer program product recited in claim 15, wherein performing automated malware analysis includes performing a dynamic anal
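
The clustering steps recited in claim 1 (pre-filter features, assign values, collect them in an array per sample, compare arrays, compute a distance, group samples within a threshold, then validate against tags) can be sketched as follows. This is an illustrative reading, not the patented method: the claim does not name a distance metric or a grouping rule, so Jaccard distance and single-link grouping are assumptions, and all feature names, counts, thresholds, and tags are hypothetical.

```python
from itertools import combinations

# Hypothetical features (log lines, abbreviated to ids) observed per sample.
sample_features = {
    "s1": {"f_run_key", "f_dns_evil", "f_common"},
    "s2": {"f_run_key", "f_dns_evil"},
    "s3": {"f_drop_dll", "f_common"},
    "s4": {"f_drop_dll"},
}
# Hypothetical (malware, benign) observation counts per feature.
feature_counts = {
    "f_run_key": (90, 10), "f_dns_evil": (99, 1),
    "f_drop_dll": (80, 20), "f_common": (50, 50),
}
# Hypothetical tags naming previously identified malware groups.
tags = {"s1": "FamilyA", "s2": "FamilyA", "s3": "FamilyB"}

def prefilter(counts, min_malware_ratio=0.75):
    """Keep features whose share of known-malware observations meets the threshold."""
    return sorted(f for f, (mal, ben) in counts.items()
                  if mal / (mal + ben) >= min_malware_ratio)

def to_array(features, selected):
    """Assign 1 if the sample exhibits the selected feature, else 0."""
    return [1 if f in features else 0 for f in selected]

def distance(a, b):
    """Jaccard distance between two binary arrays (an assumed metric)."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    return 0.0 if union == 0 else 1.0 - both / union

def cluster(arrays, max_distance=0.5):
    """Single-link grouping: samples within max_distance share a cluster."""
    parent = {s: s for s in arrays}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for a, b in combinations(arrays, 2):
        if distance(arrays[a], arrays[b]) <= max_distance:
            parent[find(a)] = find(b)
    groups = {}
    for s in arrays:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(g) for g in groups.values())

def validate(clusters, tags):
    """List the known tags found in each cluster; a single tag suggests a consistent group."""
    return [sorted({tags[s] for s in c if s in tags}) for c in clusters]

selected = prefilter(feature_counts)
arrays = {s: to_array(f, selected) for s, f in sample_features.items()}
clusters = cluster(arrays)
print(clusters)                   # [['s1', 's2'], ['s3', 's4']]
print(validate(clusters, tags))   # [['FamilyA'], ['FamilyB']]
```

In this toy data the pre-filter drops `f_common`, which appears as often in benign samples as in malware, so it cannot pull unrelated samples together. The untagged sample `s4` lands in the cluster tagged `FamilyB`, which is the kind of outcome tag-based validation is meant to surface. The dependent claims also recite decision tree and k-means++ clustering; those are not shown here.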

Assignees

Palo Alto Networks Inc

Inventors

Classifications

  • involving long-term monitoring or reporting

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Static detection

  • Test or assess a computer or a system

  • Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10230749B1 cover?
Techniques for automatically grouping malware based on artifacts are disclosed. In some embodiments, a system, process, and/or computer program product for automatically grouping malware based on artifacts includes receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features…
Who is the assignee on this patent?
Palo Alto Networks Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 12, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).