Malware data clustering

US11848760B2 · US · B2

Patent metadata
Publication number: US-11848760-B2
Application number: US-202217658893-A
Country: US
Kind code: B2
Filing date: Apr 12, 2022
Priority date: Mar 15, 2013
Publication date: Dec 19, 2023
Grant date: Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
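The flow described in the abstract (start from a seed, add related data, score each cluster, combine the scores into a metascore, rank) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `DataItem` type, the "shares at least one property value" clustering strategy, the two example scores, and the weighted-sum metascore are all assumptions chosen for the sketch, since the abstract does not define them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Hypothetical data item: an identifier plus a set of property values."""
    item_id: str
    properties: frozenset  # e.g. frozenset({("ip", "10.0.0.5")})


def build_cluster(seed, items):
    """Grow a cluster from a seed by repeatedly adding any item that shares
    at least one property value with an item already in the cluster.
    (Assumed clustering strategy; the source leaves the strategy open.)"""
    cluster = {seed.item_id: seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for candidate in items:
            if candidate.item_id in cluster:
                continue
            if current.properties & candidate.properties:
                cluster[candidate.item_id] = candidate
                queue.append(candidate)
    return list(cluster.values())


def cluster_scores(cluster):
    """Two illustrative scores based on attributes of the clustered data."""
    distinct = set().union(*(item.properties for item in cluster))
    return {
        "size": float(len(cluster)),
        "distinct_properties": float(len(distinct)),
    }


def metascore(scores, weights):
    """Weighted sum of cluster scores (an assumed way to combine them)."""
    return sum(weights[name] * value for name, value in scores.items())


def rank_clusters(clusters, weights):
    """Return clusters sorted by descending metascore."""
    return sorted(
        clusters,
        key=lambda c: metascore(cluster_scores(c), weights),
        reverse=True,
    )
```

In this sketch an analyst-facing list would simply be the output of `rank_clusters`, with the highest-metascore cluster first. A real system would likely use an index rather than the linear scan over `items`.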

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: one or more computer readable storage devices configured to store a plurality of beaconing malware-related data items; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute computer readable instructions to cause the computer system to: determine, based on at least some of the beaconing malware-related data items, a plurality of connection pairs, each of the connection pairs indicating communications between a particular internal source within an internal network and a particular external destination that is not within the internal network; identify a plurality of connection pairs having a common internal source and a common external destination; generate a time series of the identified plurality of connection pairs; compute a mean of the time series; based on a determination that the mean satisfies a particular threshold, designate a connection pair associated with the time series as a seed, the designated connection pair including the common internal source and the common external source; and generate a data item cluster based on the designated seed.

2. The computer system of claim 1, wherein the beaconing malware-related data items include at least one of: data items associated with captured communications between an internal network and an external network, users of particular computerized devices, internal Internet Protocol addresses, external Internet Protocol addresses, external domains, internal computerized devices, external computerized devices, data feed items, or host-based events.

3. The computer system of claim 1, wherein the internal source includes at least one of an Internet Protocol address, a range of Internet Protocol addresses, a network address, a computing device, a group of computing devices, or a domain.

4. The computer system of claim 1, wherein generating a data item cluster comprises: adding the designated seed to the data item cluster; adding to the data item cluster, based on a clustering strategy, one or more beaconing malware-related data items determined to be associated with the designated seed; and iteratively adding to the cluster, based on the clustering strategy, one or more additional beaconing malware-related data items associated with one or more previously added beaconing malware-related data items.

5. The computer system of claim 4, wherein generating a data item cluster further comprises: determining the one or more beaconing malware-related data items associated with the designated seed, wherein said determining comprises determining a particular beaconing malware-related data item and the seed are both associated with a common property value.

6. The computer system of claim 4, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: determining a property value associated with the particular added data item; based on the determined property value, determining additional beaconing malware-related data items having a similar property value; and adding the additional beaconing malware-related data items to the cluster.

7. The computer system of claim 6, wherein the determined property value includes at least one of a username, a domain, an Internet Protocol address, a computing device identifier, or an event identifier.

8. The computer system of claim 6, wherein generating a data item cluster further comprises: determining a property value associated with one of the additional beaconing malware-related data items; based on the determined property value associated with the one of the additional beaconing malware-related data items, determining secondary additional beaconing malware-related data items having a similar property value; and adding the secondary additional beaconing malware-related data items to the cluster.

9. The computer system of claim 4, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: in response to determining that another previously generated cluster includes the same particular beaconing malware-related data item, merging the other previously generated cluster into the cluster.

10. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute the computer executable instructions to further cause the computer system to: filter out noise from the time series to generate a filtered time series, wherein the mean is computed for the filtered time series.

11. The computer system of claim 10, wherein filtering out noise from the time series comprises removing connection pairs determined to have a low probability of being related to beaconing malware, and wherein the low probability is determined by at least one of: a frequency of the connection pair, a time period during which the connection pair has occurred, a connection to a known legitimate external domain, a connection made by known legitimate software.

12. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute the computer executable instructions to further cause the computer system to: provide a user interface including: a list of generated clusters, each of the generated clusters in the list selectable by a user; a list of cluster scores associated with a selected one or more of the generated clusters; and a graph including detailed information related to the selected one or more of the cluster scores.

13. A computer-implemented method comprising: by one or more processors executing program instructions: accessing one or more computer readable storage devices configured to store a plurality of beaconing malware-related data items; determining, based on at least some of the beaconing malware-related data items, a plurality of connection pairs, each of the connection pairs indicating communications between a particular internal source within an internal network and a particular external destination that is not within the internal network; identifying a plurality of connection pairs having a common internal source and a common external destination; generating a time series of the identified plurality of connection pairs; computing a mean of the time series; based on a determination that the mean satisfies a particular threshold, designating a connection pair associated with the time series as a seed, the designated connection pair including the common internal source and the common external source; and generating, by the one or more processors, a data item cluster based on the designated seed.

14. The computer-implemented method of claim 13, wherein generating a data item cluster comprises: adding the designated seed to the data item cluster; adding to the data item cluster, based on a clustering strategy, one or more beaconing malware-related data items determined to be associated with the designated seed; and iteratively adding to the cluster, based on the clustering strategy, one or more additional beaconing malware-related data items associated with one or more previously added beaconing malware-related data items.

15. The computer-implemented method of claim 14, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: determining a property value associated with the particular added data item; based on the determined property value, determining additional beaconing malware-related data item

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • H04L63/145 · Primary

    the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title

  • Updating · CPC title

  • Grouping and aggregation · CPC title

  • Query processing support for facilitating data mining operations in structured databases · CPC title

  • using ranking · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11848760B2 cover?
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/145. Mapped technology areas include Electricity.
When was this patent published?
Publication date Dec 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).