Systems and methods for identifying associations between malware samples

US9405905B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9405905-B2
Application number  US-201414524325-A
Country             US
Kind code           B2
Filing date         Oct 27, 2014
Priority date       Aug 18, 2011
Publication date    Aug 2, 2016
Grant date          Aug 2, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for identifying associations between binary samples, such as e-mail files and their attachments or a document and an executable program associated with the document. In one implementation, the method includes receiving a plurality of binary samples, and extracting metadata from the plurality of binary samples. The metadata for a binary sample from the plurality of binary samples includes a set of attributes of the binary sample. The method further includes identifying a set of associations between the plurality of binary samples based on the extracted metadata. Each association is characterized by at least one attribute the associated binary samples have in common, and each association has a confidence level indicative of a strength of the association. The method also includes identifying associations with a confidence level that exceeds a predefined threshold.
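The abstract's pipeline can be sketched in Python: extract per-sample metadata, link samples that share an attribute value, attach a confidence level to each link, and keep only the links whose confidence exceeds a threshold. This is a minimal sketch under stated assumptions, not the patented implementation. The attribute names, the weights in `ATTRIBUTE_CONFIDENCE`, the 0.5 default threshold, and the `extract_metadata` helper are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical per-attribute confidence levels (illustrative, not from the patent):
# a shared attribute that is rarely coincidental earns a higher confidence.
ATTRIBUTE_CONFIDENCE = {
    "sha256": 1.0,        # identical content hash
    "c2_domain": 0.8,     # same command-and-control host
    "email_source": 0.6,  # same sending address
    "compile_time": 0.3,  # weak signal on its own
}


@dataclass(frozen=True)
class Association:
    sample_a: str
    sample_b: str
    attribute: str     # the attribute the two samples have in common
    confidence: float  # strength of the association


def extract_metadata(sample: dict) -> dict:
    """Hypothetical extractor: keep the known, non-empty attributes of a sample."""
    return {
        name: value
        for name, value in sample["attrs"].items()
        if name in ATTRIBUTE_CONFIDENCE and value is not None
    }


def find_associations(samples: list) -> list:
    """Link every pair of samples once per attribute value they share."""
    metadata = {s["id"]: extract_metadata(s) for s in samples}
    found = []
    for a, b in combinations(sorted(metadata), 2):
        for name in sorted(metadata[a].keys() & metadata[b].keys()):
            if metadata[a][name] == metadata[b][name]:
                found.append(Association(a, b, name, ATTRIBUTE_CONFIDENCE[name]))
    return found


def above_threshold(associations: list, threshold: float = 0.5) -> list:
    """Keep associations whose confidence exceeds the predefined threshold."""
    return [assoc for assoc in associations if assoc.confidence > threshold]


if __name__ == "__main__":
    samples = [
        {"id": "mail.eml", "attrs": {"email_source": "x@example.com"}},
        {"id": "invoice.doc", "attrs": {"email_source": "x@example.com", "c2_domain": "bad.example"}},
        {"id": "dropper.exe", "attrs": {"c2_domain": "bad.example", "compile_time": "2014-01-01"}},
    ]
    for assoc in above_threshold(find_associations(samples)):
        print(assoc.sample_a, assoc.sample_b, assoc.attribute, assoc.confidence)
```

Here the e-mail and its attachment are linked through their shared sender, and the attachment and the dropper through a shared command-and-control host; the comparison is a strict "exceeds", matching the abstract's wording.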

First claim

Opening claim text (preview).

What is claimed is:

1. A method, performed by a processor, for identifying associations between binary samples, comprising: receiving a plurality of binary samples; determining one or more file types associated with the plurality of binary samples; extracting type-specific metadata from the plurality of binary samples, the type-specific metadata for a binary sample from the plurality of binary samples including a set of attributes of the binary sample that are unique for a file type associated with the binary sample; identifying a set of associations between the plurality of binary samples based on the extracted metadata, each association characterized by at least one attribute in the set of attributes that the associated binary samples have in common; receiving a reference sample corresponding to a known malware sample; identifying that the reference sample is associated with at least one binary sample among the plurality of binary samples; generating data corresponding to a malware alert in response to identifying that the reference sample is associated with the at least one binary sample; communicating the data to a front-end system; and generating, at the front-end system, a display corresponding to the malware alert using the data.

2. The method of claim 1, wherein the set of associations include at least one of a parent-child association, an email source association, or a command and control association.

3. The method of claim 1, wherein the set of attributes include a Hash of the binary sample, and wherein the set of associations include at least one of a binary level fuzzy Hashing association or a string set fuzzy Hashing association.

4. The method of claim 1, further comprising: identifying multiple associations between two binary samples, the multiple associations having respective individual confidence levels; and generating a cumulative confidence level for the two binary samples, by adding up the individual confidence levels of the multiple associations.

5. The method of claim 1, further comprising: identifying that at least one binary sample among the plurality of binary samples is associated with a malware sample; and generating data corresponding to a malware alert indicating that the at least one binary sample is associated with the malware sample.

6. A system for identifying associations between binary samples, comprising: a controller configured to cause a processor to: receive a plurality of binary samples from one or more sample providers; and receive a reference sample corresponding to a known malware sample; one or more processing nodes each configured to: determine one or more file types associated with the plurality of binary samples; extract type-specific metadata from the plurality of binary samples, the type-specific metadata for a binary sample from the plurality of binary samples including a set of attributes of the binary sample that are unique for a file type associated with the binary sample; identify a set of associations between the plurality of binary samples based on the extracted metadata, each association characterized by at least one attribute in the set of attributes that the associated binary samples have in common; identify that the reference sample is associated with at least one binary sample among the plurality of binary samples; generate data corresponding to a malware alert in response to identifying that the reference sample is associated with the at least one binary sample; and communicate the data to a front-end system; and a front-end system configured to cause a processor to generate a display corresponding to the malware alert using the data.

7. The system of claim 6, wherein the set of associations include at least one of a parent-child association, an email source association, or a command and control association.

8. The system of claim 6, wherein the set of attributes include a Hash of the binary sample, and wherein the set of associations include at least one of a binary level fuzzy Hashing association or a string set fuzzy Hashing association.

9. The system of claim 6, wherein the one or more processing nodes are each further configured to: identify multiple associations between two binary samples, the multiple associations having respective individual confidence levels; and generate a cumulative confidence level for the two binary samples, by adding up the individual confidence levels of the multiple associations.

10. The system of claim 6, further comprising a storage device configured to store the plurality of binary samples, the metadata of the plurality of binary samples, the set of associations identified between the plurality of binary samples, and a respective confidence level for each association in the set of the associations, wherein the respective confidence level for each association in the set of associations is indicative of a strength of the association.

11. The system of claim 9, wherein the front-end system is further configured to: receive a selection of a subset of binary samples; retrieve the subset of binary samples and the corresponding associations between the subset of binary samples from the storage device; and generate data used to display the retrieved associations between the subset of binary samples.

12. The system of claim 6, wherein the controller is configured to distribute the plurality of binary samples among the one or more processing nodes through a first in, first out (FIFO) processing queue, for extracting the metadata from the plurality of binary samples.

13. The system of claim 12, wherein the controller is configured to redistribute the plurality of binary samples among the one or more processing nodes based on availability of the one or more processing nodes, for identifying the set of associations between the plurality of binary samples.

14. The system of claim 6, wherein the controller is further configured to receive a malware sample, and wherein the one or more processing nodes are each further configured to: identify that at least one binary sample among the plurality of binary samples is associated with the malware sample; and generate data corresponding to a malware alert indicating that the at least one binary sample is associated with the malware sample.

15. A non-transitory computer-readable medium having stored thereon instructions that, when executed by a processor, performs a method for identifying associations between binary samples, the method comprising: receiving a plurality of binary samples; determining one or more file types associated with the plurality of binary samples; extracting type-specific metadata from the plurality of binary samples, the type-specific metadata for a binary sample from the plurality of binary samples including a set of attributes of the binary sample that are unique for a file type associated with the binary sample; identifying a set of associations between the plurality of binary samples based on the extracted metadata, each association characterized by at least one attribute in the set of attributes that the associated binary samples have in common; receiving a reference sample corresponding to a known malware sample; identifying that the reference sample is associated with at least one binary sample among the plurality of binary samples; generating data corresponding to a malware alert in response to identifying that the reference sample is associated with the at least one binary sample; communicating the data to a front-end system; and generating, at the front-end system, a display corresponding to th
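Claims 1 and 4 add two steps on top of the abstract: adding up the individual confidence levels of multiple associations between the same two samples into a cumulative level, and generating malware-alert data when a sample is associated with a known-malware reference sample. The Python sketch below is one hedged reading of those steps, not the patentee's implementation. The `Association` record, the sample names, the confidence values, the alert dictionary layout, and `render_alert` (a stand-in for the front-end display) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record type; field names and values are illustrative only.
@dataclass(frozen=True)
class Association:
    sample_a: str
    sample_b: str
    kind: str          # e.g. "parent-child", "email source", "command and control"
    confidence: float  # individual confidence level of this one association


def cumulative_confidence(associations: list) -> dict:
    """Claim 4 sketch: add up the individual confidence levels per sample pair."""
    totals = defaultdict(float)
    for assoc in associations:
        if assoc.sample_a == assoc.sample_b:
            continue  # an association needs two distinct samples
        # frozenset makes (a, b) and (b, a) the same pair
        totals[frozenset((assoc.sample_a, assoc.sample_b))] += assoc.confidence
    return dict(totals)


def malware_alerts(associations: list, reference_id: str) -> list:
    """Claim 1 sketch: build alert data for each sample directly associated
    with the known-malware reference sample."""
    alerts = []
    for pair, total in cumulative_confidence(associations).items():
        if reference_id in pair:
            (other,) = pair - {reference_id}
            alerts.append({"sample": other, "reference": reference_id, "confidence": total})
    return sorted(alerts, key=lambda alert: alert["sample"])


def render_alert(alert: dict) -> str:
    """Hypothetical stand-in for the front-end display step."""
    return f"ALERT {alert['sample']} linked to {alert['reference']} (confidence {alert['confidence']:.2f})"


if __name__ == "__main__":
    associations = [
        Association("ref.exe", "dropper.exe", "command and control", 0.8),
        Association("dropper.exe", "ref.exe", "parent-child", 0.6),
        Association("invoice.doc", "dropper.exe", "parent-child", 0.9),
        Association("ref.exe", "mail.eml", "email source", 0.4),
    ]
    for alert in malware_alerts(associations, "ref.exe"):
        print(render_alert(alert))
```

In this reading only samples directly associated with the reference are flagged, so `invoice.doc`, which is linked to `dropper.exe` but not to `ref.exe`, raises no alert. Following associations transitively would be a different design choice that the claims do not spell out.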

Assignees

Verisign Inc

Inventors

Classifications

  • Presentation of query results · CPC title

  • Event detection, e.g. attack signature detection · CPC title

  • by virus signature recognition · CPC title

  • Clustering; Classification · CPC title

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9405905B2 cover?
Systems and methods are disclosed for identifying associations between binary samples, such as e-mail files and their attachments or a document and an executable program associated with the document. In one implementation, the method includes receiving a plurality of binary samples, and extracting metadata from the plurality of binary samples. The metadata for a binary sample from the plurality…
Who is the assignee on this patent?
Verisign Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/565. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 2, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).