Fuzzy hash of behavioral results

US9294501B2 · US · B2

Patent metadata
Publication number: US-9294501-B2
Application number: US-201314042454-A
Country: US
Kind code: B2
Filing date: Sep 30, 2013
Priority date: Sep 30, 2013
Publication date: Mar 22, 2016
Grant date: Mar 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computerized method is described in which a received object is analyzed by a malicious content detection (MCD) system to determine whether the object is malware or non-malware. The analysis may include the generation of a fuzzy hash based on a collection of behaviors for the received object. The fuzzy hash may be used by the MCD system to determine the similarity of the received object with one or more objects in previously classified/analyzed clusters. Upon detection of a “similar” object, the suspect object may be associated with the cluster and classified based on information attached to the cluster. This similarity matching provides 1) greater flexibility in analyzing potential malware objects, which may share multiple characteristics and behaviors but are also slightly different from previously classified objects and 2) a more efficient technique for classifying/assigning attributes to objects.
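As an illustration of the similarity matching the abstract describes, the following Python sketch builds a signature from an ordered list of behaviors and scores two signatures against each other. It is not the patented method and not a real fuzzy-hash library: the shingle-and-Jaccard scheme, the function names `behavior_signature` and `similarity`, the parameter `k`, and the sample behavior strings are all assumptions chosen for brevity.

```python
import hashlib


def behavior_signature(behaviors, k=2):
    """Hash every run of k consecutive behaviors into a short token.

    Stand-in for a fuzzy hash (assumption): similar behavior lists
    share most of their tokens, unlike a single cryptographic digest.
    """
    count = max(1, len(behaviors) - k + 1)
    shingles = [" | ".join(behaviors[i:i + k]) for i in range(count)]
    return frozenset(
        hashlib.sha256(s.encode("utf-8")).hexdigest()[:8] for s in shingles
    )


def similarity(sig_a, sig_b):
    """Jaccard similarity of two signatures, from 0.0 to 1.0."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


# Hypothetical behavior traces, for illustration only.
known = ["file_write temp", "registry_set run_key", "net_connect 443", "process_spawn"]
variant = known[:3] + ["file_delete temp"]
unrelated = ["window_open", "file_read docs"]

print(similarity(behavior_signature(known), behavior_signature(variant)))
print(similarity(behavior_signature(known), behavior_signature(unrelated)))
```

The variant shares two of its three behavior pairs with the known trace, so it scores well above the unrelated trace even though the two lists are not identical. That is the property the abstract relies on when it speaks of objects that are "slightly different from previously classified objects".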

First claim

Opening claim text (preview).

What is claimed:

1. A computerized method for classifying objects in a malware system, comprising: receiving, by a malicious content detection (MCD) system from a client device, an object to be classified; detecting behaviors of the received object, wherein the behaviors are detected after processing the received object; generating a fuzzy hash for the received object based on the detected behaviors, the generating of the fuzzy hash comprises (i) obtaining a reduced amount of data associated with the detected behaviors by retaining a portion of the data associated with the detected behaviors that corresponds to one or more operations conducted during processing of the received object, and removing metadata associated with the one or more operations conducted during the processing of the received object, the metadata including at least one or more identifiers of processes called during the processing of the received object, and (ii) performing a hash operation on the reduced amount of data associated with the detected behaviors; comparing the fuzzy hash for the received object with a fuzzy hash of an object in a preexisting cluster to generate a similarity measure; associating the received object with the preexisting cluster in response to determining that the similarity measure is above a predefined threshold value; creating a new cluster for the received object in response to determining that the similarity measure is below the predefined threshold value; and reporting, by the MCD system, results of either (i) the associating of the received object with the preexisting cluster or (ii) the creating of the new cluster.

2. The computerized method of claim 1, wherein the received object is at least one of a file, a uniform resource locator, a web object, a capture of network traffic for a user over time, and an email message.

3. The computerized method of claim 1, wherein the removed metadata associated with the corresponding operations includes metadata associated with one or more of (1) network calls, (2) modifications to a registry, (3) modifications to a file system, or (4) an application program interface call.

4. The computerized method of claim 1, further comprising: generating a preliminary malware score for the received object based on a comparison of the reduced amount of data associated with the detected behaviors with data associated with known malware behaviors, wherein the preliminary malware score indicates the probability the received object is malware; and generating a final malware score for the received object based on the cluster the received object is associated, wherein the final malware score is greater than the preliminary malware score when the received object is associated with a cluster of objects classified as malware and the final malware score is less than the preliminary malware score when the received object is associated with a cluster of objects classified as non-malware.

5. The computerized method of claim 1, wherein the removing of the metadata associated with the one or more operations comprises removing data that does not identify the received object.

6. The computerized method of claim 5, wherein the removing of the metadata further comprises removing at least a portion of values written to a registry by the received object.

7. The computerized method of claim 1, further comprising: transmitting, by the MCD system, the new cluster or the preexisting cluster with the newly associated received object to another MCD system.

8. The computerized method of claim 1, further comprising: classifying the received object as malware, non-malware, or with an unknown status to match a classification of the preexisting cluster, when the received object is assigned to the preexisting cluster.

9. The computerized method of claim 1, further comprising: assigning a malware family name to the received object to match a malware family name of the preexisting cluster, when the received object is assigned to the preexisting cluster.

10. The computerized method of claim 1, wherein the generating of the fuzzy hash further comprises at least one of (a) retaining one or more image paths in an associated file system corresponding to a location of a file that is generated or modified during the processing of the received object or (b) removing a file name prior to performing the hash operation on the data associated with the detected behaviors.

11. The computerized method of claim 1, wherein the generating of the fuzzy hash comprises retaining only the one or more image paths corresponding to operations conducted during processing of the received object as part of the data associated with the detected behaviors.

12. A non-transitory storage medium including instructions that, when executed by one or more hardware processors, performs a plurality of operations, comprising: detecting behaviors of a received object, wherein the behaviors are detected after processing the received object; generating a fuzzy hash for the received object based on the detected behaviors, the generating of the fuzzy hash comprises (i) obtaining a reduced amount of data associated with the detected behaviors by retaining a portion of the data associated with the detected behaviors that corresponds to one or more operations conducted during processing of the received object, and removing metadata associated with the one or more operations conducted during the processing of the received object, the metadata including at least one or more identifiers of processes called during the processing of the received object metadata, and (ii) performing a hash operation on the reduced amount of data associated with the detected behaviors; comparing the fuzzy hash for the received object with a fuzzy hash of an object in a preexisting cluster to generate a similarity measure; associating the received object with the preexisting cluster in response to determining that the similarity measure is above a predefined threshold value; creating a new cluster for the received object in response to determining that the similarity measure is below the predefined threshold value; and reporting results of either (i) the associating of the received object with the preexisting cluster or (ii) the creating of the new cluster.

13. The non-transitory storage medium of claim 12, wherein the received object is one of a file, a uniform resource locator, a web object, a capture of network traffic for a user over time, and an email message.

14. The non-transitory storage medium of claim 12, wherein the removed metadata associated with the one or more operations includes metadata associated with one or more of (1) network calls, (2) modifications to a registry, (3) modifications to a file system, or (4) an application program interface call.

15. The non-transitory storage medium of claim 12 further includes instructions that, when executed by the one or more hardware processors, perform a plurality of operations comprising: generating a preliminary malware score for the received object based on a comparison of the reduced amount of data associated with the detected behaviors with data associated with known malware behaviors, wherein the preliminary malware score indicates the probability the received object is malware; and generating a final malware score for the received object based on the cluster the received object is associated, wherein the final malware score is greater than the preliminary malware score when the received object is associated with a cluster of objects classified as malware and the final malware score is less than the pr
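The flow recited in claims 1 and 4 (reduce the behavior data, hash it, compare against existing clusters, then associate or create a cluster and adjust the score) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the log line format with `pid=` and `ts=` fields, the per-line hashing, the Jaccard similarity, the `Cluster` class, the 0.6 threshold, and the fixed 0.2 score step are all hypothetical choices made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical log format: "<operation> <target> pid=<n> ts=<n>".
METADATA = re.compile(r"\s+(?:pid|ts)=\S+")


def signature(raw_lines):
    """Strip per-run metadata (process IDs, timestamps), then hash each line."""
    reduced = {METADATA.sub("", line).strip() for line in raw_lines}
    return frozenset(hashlib.sha256(r.encode("utf-8")).hexdigest()[:8] for r in reduced)


def similarity(a, b):
    """Jaccard similarity, used here as a stand-in similarity measure."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


@dataclass
class Cluster:
    label: str  # "malware", "non-malware" or "unknown"
    members: list = field(default_factory=list)


def classify(sig, clusters, threshold=0.6):
    """Join the best-matching cluster if above threshold, else start a new one.

    Returns (cluster, created). A score exactly equal to the threshold
    creates a new cluster here; the claims leave that case open.
    """
    best, best_score = None, 0.0
    for cluster in clusters:
        for member in cluster.members:
            score = similarity(sig, member)
            if score > best_score:
                best, best_score = cluster, score
    if best is not None and best_score > threshold:
        best.members.append(sig)
        return best, False
    new_cluster = Cluster(label="unknown", members=[sig])
    clusters.append(new_cluster)
    return new_cluster, True


def final_score(preliminary, cluster, step=0.2):
    """Raise the score for malware clusters, lower it for non-malware ones.

    The result is clamped to the range 0.0 to 1.0.
    """
    if cluster.label == "malware":
        return min(1.0, preliminary + step)
    if cluster.label == "non-malware":
        return max(0.0, preliminary - step)
    return preliminary


# Two runs of the same sample differ only in process IDs and timestamps.
seen = ["file_write C:/Temp/a.dll pid=101 ts=1", "registry_set HKCU/Run pid=101 ts=2"]
fresh = ["file_write C:/Temp/a.dll pid=555 ts=90", "registry_set HKCU/Run pid=556 ts=91"]
clusters = [Cluster("malware", [signature(seen)])]
cluster, created = classify(signature(fresh), clusters)
print(cluster.label, created, final_score(0.5, cluster))
```

Removing the process identifiers before hashing is what lets the two runs match: the retained operations are identical, so the new object lands in the existing malware cluster and its score moves up from the preliminary value.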

Assignees

Fireeye Inc

Inventors

Classifications

  • G06F21/566 (Primary)

    Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities · CPC title

  • H04L63/145 (Primary)

    the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

  • by monitoring network traffic (monitoring network traffic per se H04L43/00) · CPC title

  • Test or assess a computer or a system · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9294501B2 cover?
A computerized method is described in which a received object is analyzed by a malicious content detection (MCD) system to determine whether the object is malware or non-malware. The analysis may include the generation of a fuzzy hash based on a collection of behaviors for the received object. The fuzzy hash may be used by the MCD system to determine the similarity of the received object with o…
Who is the assignee on this patent?
Fireeye Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/566. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).