Automatic generation of generic file signatures

US9762593B1 · US · B1

Patent metadata
Publication number: US-9762593-B1
Application number: US-201414481763-A
Country: US
Kind code: B1
Filing date: Sep 9, 2014
Priority date: Sep 9, 2014
Publication date: Sep 12, 2017
Grant date: Sep 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods to automatically generate signatures used to detect malware are provided. The systems and methods use machine learning techniques, to build an over-trained heuristic model to analyze software, cluster identified patterns, validate the clusters against known reputational metrics, automatically create signatures and, in some examples, deploy such signatures to remote computing devices.
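The "over-trained heuristic model" in the abstract, together with claim 3 (decision trees grown "without restricting the smallest allowable size of nodes"), can be illustrated with a minimal pure-Python sketch. It is not Symantec's implementation. It assumes that static attributes are booleans and that the trees are built random-forest style, using bootstrap samples, random attribute subsets and Gini impurity; the patent text shown here specifies none of these choices. Every name (`grow_tree`, `grow_forest`, `leaf_pattern`, the attribute names and the toy data) is hypothetical. Each sample's "pattern of terminal nodes" is the tuple of leaf ids it reaches, one per tree.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: static attributes are booleans,
# e.g. {"packed": True, "signed": False}. Labels: 1 = malware, 0 = goodware.
Sample = Dict[str, bool]
Node = tuple  # ("leaf",) or ("split", attr, false_child, true_child)


def _gini(ys: List[int]) -> float:
    p = sum(ys) / len(ys)
    return 2.0 * p * (1.0 - p)


def _best_split(samples, labels, idx, candidates) -> Optional[Tuple[str, list, list]]:
    """Return (attr, left_idx, right_idx) with the lowest weighted Gini impurity."""
    best = None
    for a in candidates:
        left = [i for i in idx if not samples[i].get(a, False)]
        right = [i for i in idx if samples[i].get(a, False)]
        if not left or not right:
            continue  # this attribute does not separate the node
        score = sum(len(p) * _gini([labels[i] for i in p]) for p in (left, right))
        if best is None or score < best[0]:
            best = (score, a, left, right)
    return None if best is None else best[1:]


def grow_tree(samples: List[Sample], labels: List[int], attrs: List[str],
              rng: random.Random, n_try: int) -> List[Node]:
    """Over-train one tree: no minimum node size, so splitting continues until
    a node is pure or no attribute can separate its samples."""
    nodes: List[Node] = []

    def grow(idx: List[int]) -> int:
        node_id = len(nodes)
        nodes.append(("leaf",))  # placeholder, replaced below if the node splits
        split = None
        if len({labels[i] for i in idx}) > 1:
            subset = rng.sample(attrs, min(n_try, len(attrs)))
            # Fall back to all attributes if the random subset cannot split.
            split = _best_split(samples, labels, idx, subset) or _best_split(samples, labels, idx, attrs)
        if split is not None:
            attr, left, right = split
            false_child = grow(left)
            true_child = grow(right)
            nodes[node_id] = ("split", attr, false_child, true_child)
        return node_id

    grow(list(range(len(samples))))
    return nodes


def terminal_node(tree: List[Node], sample: Sample) -> int:
    """Index of the terminal (leaf) node the sample reaches in one tree."""
    i = 0
    while tree[i][0] == "split":
        _, attr, false_child, true_child = tree[i]
        i = true_child if sample.get(attr, False) else false_child
    return i


def grow_forest(samples, labels, n_trees=5, n_try=2, seed=0) -> List[List[Node]]:
    rng = random.Random(seed)
    attrs = sorted({a for s in samples for a in s})
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(samples)) for _ in range(len(samples))]  # bootstrap sample
        forest.append(grow_tree([samples[i] for i in boot], [labels[i] for i in boot],
                                attrs, rng, n_try))
    return forest


def leaf_pattern(forest, sample: Sample) -> Tuple[int, ...]:
    """The sample's pattern of terminal nodes: one leaf id per tree."""
    return tuple(terminal_node(tree, sample) for tree in forest)


# Hypothetical toy data for illustration only.
train = [
    {"packed": True, "signed": False, "net_api": True},
    {"packed": True, "signed": False, "net_api": False},
    {"packed": False, "signed": True, "net_api": True},
    {"packed": False, "signed": True, "net_api": False},
    {"packed": True, "signed": True, "net_api": True},
]
labels = [1, 1, 0, 0, 0]

if __name__ == "__main__":
    forest = grow_forest(train, labels, n_trees=4, n_try=1, seed=7)
    for s in train:
        print(leaf_pattern(forest, s))
```

Because no minimum leaf size is enforced, each tree keeps splitting until its nodes are pure, so the leaf patterns are very fine-grained. That is presumably why samples sharing an identical pattern across all trees make a useful cluster.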

First claim

Opening claim text (preview).

What is claimed is:

1. A method for automatically generating signatures for detecting malware, comprising: collecting a set of static attributes from a malware dataset and a goodware dataset; generating a plurality of decision trees from the set of static attributes, wherein each decision tree in the plurality of decision trees comprises a plurality of terminal nodes; identifying, for each sample in a known-file dataset, a pattern of terminal nodes to which the sample is mapped by the plurality of decision trees, wherein the pattern of terminal nodes of the sample comprises a representation of a terminal node from each decision tree within the plurality of decision trees to which the sample has been mapped; generating a cluster of samples comprising samples in the known file dataset that have identical patterns of terminal nodes; validating the cluster of samples against a reputation value range to determine a purity of the cluster of samples; and generating, based at least in part on the purity of the cluster of samples, a signature for identifying additional files that are similar to the samples in the cluster of samples.

2. The method for automatically generating signatures for detecting malware according to claim 1, further comprising: detecting a malicious file that satisfies the signature; and performing, in response to detecting the malicious file, a security action on the malicious file.

3. The method for automatically generating signatures for detecting malware according to claim 1, wherein generating the plurality of decision trees from the set of static attributes comprises over-training the plurality of decision trees without restricting the smallest allowable size of nodes within the plurality of decision trees.

4. The method for automatically generating signatures for detecting malware according to claim 1, wherein: the known file dataset comprises a plurality of files known to be malicious; validating the cluster of samples against the reputation value range to determine the purity of the cluster of samples comprises determining that the cluster of samples is a bad cluster.

5. The method for automatically generating signatures for detecting malware according to claim 1, wherein: the known file dataset comprises a plurality of files known to be benign; validating the cluster of samples against the reputation value range to determine the purity of the cluster of samples comprises determining that the cluster of samples is a good cluster.

6. The method for automatically generating signatures for detecting malware according to claim 1, wherein: the known file dataset comprises a plurality of files known to be malicious; validating the cluster of samples against the reputation value range to determine the purity of the cluster of samples comprises determining that the cluster of samples is a suspected bad cluster.

7. The method for automatically generating signatures for detecting malware according to claim 1, wherein: the known file dataset comprises a plurality of files known to be benign; validating the cluster of samples against the reputation value range to determine the purity of the cluster of samples comprises determining that the cluster of samples is a suspected good cluster.

8. A system to automatically generate signatures used to detect malware, comprising: an attribute collection module, stored in memory, that collects a set of static attributes from a malware dataset and a goodware dataset; a heuristic module, stored in memory, that generates a plurality of decision trees from the set of static attributes, wherein each decision tree in the plurality of decision trees comprises a plurality of terminal nodes; a clustering module, stored in memory, that: identifies, for each sample in a known-file dataset, a pattern of terminal nodes to which the sample is mapped by the plurality of decision trees, wherein the pattern of terminal nodes of the sample comprises a representation of a terminal node from each decision tree within the plurality of decision trees to which the sample has been mapped; and generates a cluster of samples comprising samples in the known file dataset that have identical patterns of terminal nodes; a cluster validation module, stored in memory, that validates the cluster of samples against a reputation value range to determine a purity of the cluster of samples; a signature creation module, stored in memory, that creates, based at least in part on the purity of the cluster of samples, a signature for identifying additional files that are similar to the samples in the cluster of samples; and at least one physical processor that executes the attribute collection module, the heuristic module, the clustering module, the cluster validation module, and the signature creation module.

9. The system according to claim 8, further comprising a security module that: detects a malicious file that satisfies the signature; and performs, in response to detecting the malicious file, a security action on the malicious file.

10. The system according to claim 8, wherein the heuristic module generates the plurality of decision trees from the set of static attributes by over-training the plurality of decision trees without restricting the smallest allowable size of nodes within the plurality of decision trees.

11. The system according to claim 8, wherein: the known file dataset comprises a plurality of files known to be malicious; the cluster validation module validates the cluster of samples against the reputation value range to determine the purity of the cluster of samples by determining that the cluster of samples is a bad cluster.

12. The system according to claim 8, wherein: the known file dataset comprises a plurality of files known to be benign; the cluster validation module validates the cluster of samples against the reputation value range to determine the purity of the cluster of samples by determining that the cluster of samples is a good cluster.

13. The system according to claim 8, wherein: the known file dataset comprises a plurality of files known to be malicious; the cluster validation module validates the cluster of samples against the reputation value range to determine the purity of the cluster of samples by determining that the cluster of samples is a suspected bad cluster.

14. The system according to claim 8, wherein: the known file dataset comprises a plurality of files known to be benign; the cluster validation module validates the cluster of samples against the reputation value range to determine the purity of the cluster of samples by determining that the cluster of samples is a suspected good cluster.

15. A non-transitory computer-readable medium comprising computer executable instructions that when executed by at least one processor of a computing device, cause the computing device to: collect a set of static attributes from a malware dataset and a goodware dataset; generate a plurality of decision trees from the set of static attributes, wherein each decision tree in the plurality of decision trees comprises a plurality of terminal nodes; identify, for each sample in a known-file dataset, a pattern of terminal nodes to which the sample is mapped by the plurality of decision trees, wherein the pattern of terminal nodes of the sample comprises a representation of a terminal node from each decision tree within the plurality of decision trees to which the sample has been mapped; generate a cluster of samples comprising samples in the known file dataset that have identical patterns of terminal nodes; valida…
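The clustering, validation and signature steps of claims 1 and 2, together with the bad, good and suspected cluster outcomes of claims 4 to 7, can be sketched as follows. This is a hedged illustration, not the patented implementation. It assumes that terminal-node patterns are already available (for example from a tree ensemble), that reputation scores lie in [0, 1] with 0 meaning known bad, and that a signature is simply the set of attribute values shared by every member of a bad cluster; the claims say nothing about how signatures are encoded. The ranges, purity thresholds, function names and data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Pattern = Tuple[int, ...]
Signature = FrozenSet[Tuple[str, object]]


def cluster_by_pattern(patterns: Dict[str, Pattern]) -> Dict[Pattern, List[str]]:
    """Group known files whose terminal-node patterns are identical."""
    clusters: Dict[Pattern, List[str]] = defaultdict(list)
    for file_id, pattern in patterns.items():
        clusters[pattern].append(file_id)
    return dict(clusters)


def cluster_purity(members: List[str], reputation: Dict[str, float], lo: float, hi: float) -> float:
    """Fraction of cluster members whose reputation lies in [lo, hi]."""
    return sum(lo <= reputation[m] <= hi for m in members) / len(members)


def label_cluster(members, reputation, bad_range=(0.0, 0.2), good_range=(0.8, 1.0),
                  min_purity=0.95, suspect_purity=0.7) -> str:
    """Hypothetical thresholds mapping purity to bad/good/suspected verdicts."""
    bad = cluster_purity(members, reputation, *bad_range)
    good = cluster_purity(members, reputation, *good_range)
    if bad >= min_purity:
        return "bad"
    if good >= min_purity:
        return "good"
    if bad >= suspect_purity:
        return "suspected bad"
    if good >= suspect_purity:
        return "suspected good"
    return "mixed"


def make_signature(members, attributes) -> Signature:
    """Attribute/value pairs shared by every member of the cluster."""
    common = set(attributes[members[0]].items())
    for m in members[1:]:
        common &= set(attributes[m].items())
    return frozenset(common)


def generate_signatures(patterns, reputation, attributes) -> List[Signature]:
    signatures = []
    for members in cluster_by_pattern(patterns).values():
        if label_cluster(members, reputation) == "bad":
            sig = make_signature(members, attributes)
            if sig:  # an empty signature would match every file
                signatures.append(sig)
    return signatures


def matches(signature: Signature, file_attrs: Dict[str, object]) -> bool:
    return all(file_attrs.get(k) == v for k, v in signature)


def scan(file_attrs, signatures) -> bool:
    """True if any signature matches; a caller would then take a security action."""
    return any(matches(sig, file_attrs) for sig in signatures)


# Hypothetical data: reputation 0 = known bad, 1 = known good.
patterns = {"a": (3, 7, 2), "b": (3, 7, 2), "c": (3, 7, 2), "d": (1, 4, 5), "e": (1, 4, 5)}
reputation = {"a": 0.05, "b": 0.1, "c": 0.0, "d": 0.9, "e": 0.95}
attributes = {
    "a": {"packed": True, "signed": False, "net_api": True},
    "b": {"packed": True, "signed": False, "net_api": False},
    "c": {"packed": True, "signed": False, "net_api": True},
    "d": {"packed": False, "signed": True, "net_api": True},
    "e": {"packed": False, "signed": True, "net_api": False},
}

if __name__ == "__main__":
    sigs = generate_signatures(patterns, reputation, attributes)
    print(sigs)
    print(scan({"packed": True, "signed": False, "net_api": False}, sigs))
```

With this toy data, the cluster of files a, b and c is labelled bad and yields a signature requiring `packed=True` and `signed=False`. `net_api` varies within that cluster and is dropped. The cluster of d and e is labelled good and produces no signature.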

Assignees

Symantec Corp

Inventors

Classifications

  • Event detection, e.g. attack signature detection · CPC title

  • at the network layer · CPC title

  • the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title

  • Protecting data integrity, e.g. using checksums, certificates or signatures · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9762593B1 cover?
Automatically generating malware-detection signatures. Static attributes from malware and goodware datasets are used to grow a set of decision trees. Known files that map to identical patterns of terminal nodes are grouped into clusters, and each cluster is validated against a reputation value range to judge its purity. Based on that purity, a signature is generated to identify files similar to the cluster's members. Dependent claims add detecting a file that satisfies the signature and performing a security action on it.
Who is the assignee on this patent?
Symantec Corp
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Published Sep 12, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).