US10230749B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10230749-B1 |
| Application number | US-201615056976-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 29, 2016 |
| Priority date | Feb 29, 2016 |
| Publication date | Mar 12, 2019 |
| Grant date | Mar 12, 2019 |
Official abstract text for this publication.
Techniques for automatically grouping malware based on artifacts are disclosed. In some embodiments, a system, process, and/or computer program product for automatically grouping malware based on artifacts includes receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware; clustering the plurality of samples based on the extracted features; and performing an action based on the clustering output.
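The flow in the abstract, together with the steps spelled out in claim 1 (pre-filter features, collect values in an array per sample, compare arrays, cluster within a distance threshold), can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the function names, the binary feature encoding, the normalized Hamming distance, the `min_ratio` and `max_distance` parameters, the single-linkage grouping, and the sample log lines.

```python
from collections import Counter


def extract_features(log_lines):
    """Treat each distinct, non-empty log line as one candidate feature."""
    return {line.strip() for line in log_lines if line.strip()}


def prefilter(sample_features, malware_labels, min_ratio=0.8):
    """Keep features whose share of occurrences in known-malware samples
    is at least min_ratio (hypothetical stand-in for the claimed
    'threshold association' with known malware)."""
    total, malicious = Counter(), Counter()
    for sample_id, feats in sample_features.items():
        for feat in feats:
            total[feat] += 1
            if malware_labels.get(sample_id, False):
                malicious[feat] += 1
    return sorted(f for f in total if malicious[f] / total[f] >= min_ratio)


def to_array(feats, selected):
    """Assign 1 if the sample exhibits the selected feature, else 0."""
    return [1 if f in feats else 0 for f in selected]


def distance(a, b):
    """Normalized Hamming distance between two equal-length arrays."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(arrays, max_distance):
    """Group samples with union-find: any pair within max_distance is linked."""
    ids = sorted(arrays)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if distance(arrays[a], arrays[b]) <= max_distance:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical analysis logs and verdicts, invented for this example.
logs = {
    "s1": ["created mutex evil_mtx", "connected to 10.0.0.5:443", "read file hosts"],
    "s2": ["created mutex evil_mtx", "connected to 10.0.0.5:443"],
    "s3": ["wrote registry run key", "read file hosts"],
    "s4": ["read file hosts"],
}
labels = {"s1": True, "s2": True, "s3": True, "s4": False}

features = {sid: extract_features(lines) for sid, lines in logs.items()}
selected = prefilter(features, labels, min_ratio=0.8)
arrays = {sid: to_array(feats, selected) for sid, feats in features.items()}
print(cluster(arrays, max_distance=0.25))  # [['s1', 's2'], ['s3'], ['s4']]
```

In this toy data the line "read file hosts" also appears in the benign sample, so it falls below the assumed ratio and is dropped before clustering; the two samples that share the mutex and network lines end up in one group.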
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; clustering the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: selecting one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collecting the assigned values in an array for each of the plurality of samples; comparing the assigned values of the array between two of the plurality of samples; and calculating a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and performing an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups.

2. The method of claim 1, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

3. The method of claim 1, wherein performing automated malware analysis includes performing a dynamic analysis.

4. The method of claim 1, wherein performing automated malware analysis includes performing a static analysis.

5. The method of claim 1, further comprising: clustering the plurality of samples based on the extracted features using a decision tree clustering process.

6. The method of claim 1, further comprising: clustering the plurality of samples based on the extracted features using a k-means++ clustering process.

7. A system, comprising: a processor configured to: receive a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; process the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; cluster the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: select one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collect the assigned values in an array for each of the plurality of samples; compare the assigned values of the array between two of the plurality of samples; and calculate a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and perform an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups; and a memory coupled to the processor and configured to provide the processor with instructions.

8. The system recited in claim 7, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

9. The system recited in claim 7, wherein performing automated malware analysis includes performing a dynamic analysis.

10. The system recited in claim 7, wherein performing automated malware analysis includes performing a static analysis.

11. The system recited in claim 7, wherein a log file for a sample comprises one or more lines based on results of the automated malware analysis for the sample.

12. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a decision tree clustering process.

13. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a k-means++ clustering process.

14. The system recited in claim 7, wherein the processor is further configured to: cluster the plurality of samples based on the extracted features using a decision tree clustering process; and cluster the plurality of samples based on the extracted features using a k-means++ clustering process.

15. A computer program product, the computer program product being embodied in a non-transitory tangible computer readable storage medium and comprising computer instructions for: receiving a plurality of samples for performing automated malware analysis to generate log files based on the automated malware analysis; processing the log files to extract features associated with malware, wherein each of the extracted features corresponds to a line or a sub-line in one or more of the log files determined to be an artifact associated with malware; clustering the plurality of samples based on the extracted features, wherein clustering the plurality of samples based on the extracted features further comprises: selecting one or more of the extracted features and assigning values to each indicator, wherein selecting one or more of the extracted features includes performing a pre-filtering operation to select the extracted features for clustering based on a threshold association between the line or the sub-line in the one or more of the log files and known malware; collecting the assigned values in an array for each of the plurality of samples; comparing the assigned values of the array between two of the plurality of samples; and calculating a distance between the two samples, wherein the samples within a defined threshold of distance are clustered; and performing an action based on an output of clustering the plurality of samples based on the extracted features, wherein the action based on the output of clustering the plurality of samples based on the extracted features further comprises validate the output of clustering the plurality of samples based on the extracted features based on tags to identify previously identified malware groups.

16. The computer program product recited in claim 15, wherein the extracted features correspond to high-risk artifacts, and wherein each high-risk artifact is determined to be associated with a malware sample based on the automated malware analysis.

17. The computer program product recited in claim 15, wherein performing automated malware analysis includes performing a dynamic anal
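
The independent claims end with validating the clustering output against tags that mark previously identified malware groups. One way this check might look is sketched below. The claims do not say how validation is computed, so the purity measure, the function name `validate_clusters`, the report layout, and the family names are all assumptions made for illustration.

```python
from collections import Counter


def validate_clusters(clusters, tags):
    """For each cluster, report the most common known tag and the fraction
    of cluster members carrying it (a simple purity score).

    clusters: list of lists of sample ids
    tags: dict mapping sample id -> previously assigned group tag
          (untagged samples are simply absent)
    """
    report = []
    for members in clusters:
        counts = Counter(tags[m] for m in members if m in tags)
        if counts:
            # Highest count wins; ties are broken alphabetically for determinism.
            tag, hits = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            purity = hits / len(members)
        else:
            tag, purity = None, 0.0
        report.append({"members": members, "dominant_tag": tag, "purity": purity})
    return report


# Hypothetical clusters and tags, invented for this example.
clusters = [["s1", "s2", "s3"], ["s4"]]
tags = {"s1": "FamilyA", "s2": "FamilyA", "s3": "FamilyB"}
for row in validate_clusters(clusters, tags):
    print(row)
```

A cluster whose members mostly share one existing tag suggests the grouping agrees with a known malware group, while a cluster with no tagged members may be a candidate new group for an analyst to review.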
involving long-term monitoring or reporting · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Static detection · CPC title
Test or assess a computer or a system · CPC title
Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title