Malware data clustering
US-11336681-B2 · May 17, 2022 · US
US11848760B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11848760-B2 |
| Application number | US-202217658893-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 12, 2022 |
| Priority date | Mar 15, 2013 |
| Publication date | Dec 19, 2023 |
| Grant date | Dec 19, 2023 |
Official abstract text for this publication.
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
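The scoring and ranking steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how cluster scores or metascores are computed, so everything specific here is an assumption made for illustration: the `Cluster` class, the two scores (`size` and `mean_risk`), the per-item `risk` attribute, and the weighted-sum metascore are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Cluster:
    """Hypothetical container for a cluster grown from one seed."""
    seed: str
    items: set = field(default_factory=set)
    scores: dict = field(default_factory=dict)
    metascore: float = 0.0


def score_cluster(cluster, attributes):
    """Compute cluster scores from attributes of the data in the cluster.

    Assumption: each item has a numeric 'risk' attribute; the two scores
    below are illustrative, not taken from the patent.
    """
    risks = [attributes[item]["risk"] for item in cluster.items]
    cluster.scores = {
        "size": float(len(cluster.items)),
        "mean_risk": mean(risks),
    }
    return cluster.scores


def metascore(scores, weights):
    """Combine cluster scores into one metascore (assumed: weighted sum)."""
    return sum(weights[name] * value for name, value in scores.items())


def rank_clusters(clusters):
    """Order clusters from highest to lowest metascore."""
    return sorted(clusters, key=lambda c: c.metascore, reverse=True)
```

An analyst-facing list would then show the output of `rank_clusters`, with each cluster's individual scores available for inspection alongside its metascore.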
Opening claim text (preview).
What is claimed is:
1. A computer system comprising: one or more computer readable storage devices configured to store a plurality of beaconing malware-related data items; and one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute computer readable instructions to cause the computer system to: determine, based on at least some of the beaconing malware-related data items, a plurality of connection pairs, each of the connection pairs indicating communications between a particular internal source within an internal network and a particular external destination that is not within the internal network; identify a plurality of connection pairs having a common internal source and a common external destination; generate a time series of the identified plurality of connection pairs; compute a mean of the time series; based on a determination that the mean satisfies a particular threshold, designate a connection pair associated with the time series as a seed, the designated connection pair including the common internal source and the common external source; and generate a data item cluster based on the designated seed.
2. The computer system of claim 1, wherein the beaconing malware-related data items include at least one of: data items associated with captured communications between an internal network and an external network, users of particular computerized devices, internal Internet Protocol addresses, external Internet Protocol addresses, external domains, internal computerized devices, external computerized devices, data feed items, or host-based events.
3. The computer system of claim 1, wherein the internal source includes at least one of an Internet Protocol address, a range of Internet Protocol addresses, a network address, a computing device, a group of computing devices, or a domain.
4. The computer system of claim 1, wherein generating a data item cluster comprises: adding the designated seed to the data item cluster; adding to the data item cluster, based on a clustering strategy, one or more beaconing malware-related data items determined to be associated with the designated seed; and iteratively adding to the cluster, based on the clustering strategy, one or more additional beaconing malware-related data items associated with one or more previously added beaconing malware-related data items.
5. The computer system of claim 4, wherein generating a data item cluster further comprises: determining the one or more beaconing malware-related data items associated with the designated seed, wherein said determining comprises determining a particular beaconing malware-related data item and the seed are both associated with a common property value.
6. The computer system of claim 4, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: determining a property value associated with the particular added data item; based on the determined property value, determining additional beaconing malware-related data items having a similar property value; and adding the additional beaconing malware-related data items to the cluster.
7. The computer system of claim 6, wherein the determined property value includes at least one of a username, a domain, an Internet Protocol address, a computing device identifier, or an event identifier.
8. The computer system of claim 6, wherein generating a data item cluster further comprises: determining a property value associated with one of the additional beaconing malware-related data items; based on the determined property value associated with the one of the additional beaconing malware-related data items, determining secondary additional beaconing malware-related data items having a similar property value; and adding the secondary additional beaconing malware-related data items to the cluster.
9. The computer system of claim 4, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: in response to determining that another previously generated cluster includes the same particular beaconing malware-related data item, merging the other previously generated cluster into the cluster.
10. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute the computer executable instructions to further cause the computer system to: filter out noise from the time series to generate a filtered time series, wherein the mean is computed for the filtered time series.
11. The computer system of claim 10, wherein filtering out noise from the time series comprises removing connection pairs determined to have a low probability of being related to beaconing malware, and wherein the low probability is determined by at least one of: a frequency of the connection pair, a time period during which the connection pair has occurred, a connection to a known legitimate external domain, a connection made by known legitimate software.
12. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute the computer executable instructions to further cause the computer system to: provide a user interface including: a list of generated clusters, each of the generated clusters in the list selectable by a user; a list of cluster scores associated with a selected one or more of the generated clusters; and a graph including detailed information related to the selected one or more of the cluster scores.
13. A computer-implemented method comprising: by one or more processors executing program instructions: accessing one or more computer readable storage devices configured to store a plurality of beaconing malware-related data items; determining, based on at least some of the beaconing malware-related data items, a plurality of connection pairs, each of the connection pairs indicating communications between a particular internal source within an internal network and a particular external destination that is not within the internal network; identifying a plurality of connection pairs having a common internal source and a common external destination; generating a time series of the identified plurality of connection pairs; computing a mean of the time series; based on a determination that the mean satisfies a particular threshold, designating a connection pair associated with the time series as a seed, the designated connection pair including the common internal source and the common external source; and generating, by the one or more processors, a data item cluster based on the designated seed.
14. The computer-implemented method of claim 13, wherein generating a data item cluster comprises: adding the designated seed to the data item cluster; adding to the data item cluster, based on a clustering strategy, one or more beaconing malware-related data items determined to be associated with the designated seed; and iteratively adding to the cluster, based on the clustering strategy, one or more additional beaconing malware-related data items associated with one or more previously added beaconing malware-related data items.
15. The computer-implemented method of claim 14, wherein generating a data item cluster further comprises: for each particular added beaconing malware-related data item: determining a property value associated with the particular added data item; based on the determined property value, determining additional beaconing malware-related data item
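The seed designation of claim 1 and the iterative cluster growth of claims 4 to 6 can be sketched roughly as follows. This is an illustrative reading, not the claimed implementation. The claims do not say what quantity the time series holds or in which direction the threshold applies, so the sketch assumes the series is the intervals between successive connections of a pair and that a mean at or below `max_mean_interval` satisfies the threshold. The function names, the `allowlist` parameter (one possible noise filter in the spirit of claim 11), and the item and property layout are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def designate_seeds(connections, max_mean_interval, allowlist=frozenset()):
    """Return (internal_source, external_destination) pairs chosen as seeds.

    connections: iterable of (timestamp_seconds, internal_src, external_dst).
    Assumption: the time series is the gaps between successive connections,
    and the threshold is satisfied when the mean gap is small enough.
    """
    stamps_by_pair = defaultdict(list)
    for ts, src, dst in connections:
        if dst in allowlist:  # assumed filter: known legitimate destinations
            continue
        stamps_by_pair[(src, dst)].append(ts)

    seeds = []
    for pair, stamps in stamps_by_pair.items():
        if len(stamps) < 2:  # a single connection yields no intervals
            continue
        stamps.sort()
        intervals = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if mean(intervals) <= max_mean_interval:
            seeds.append(pair)
    return seeds


def build_cluster(seed, item_properties):
    """Grow a cluster from a seed by following shared property values.

    item_properties: dict mapping an item id to its set of property values
    (for example usernames, IP addresses, or device identifiers).
    The seed's own property values are taken to be its source and destination.
    """
    cluster = {seed}
    pending = set(seed)   # property values still to be followed
    followed = set()      # property values already followed
    while pending:
        value = pending.pop()
        followed.add(value)
        for item_id, values in item_properties.items():
            if item_id not in cluster and value in values:
                cluster.add(item_id)
                # Newly added items contribute their own property values,
                # which gives the iterative expansion described in claim 4.
                pending |= values - followed
    return cluster


if __name__ == "__main__":
    # Made-up example data: one host contacting one destination every 60 s.
    conns = [(t, "10.0.0.5", "evil.example") for t in range(0, 300, 60)]
    conns += [(0, "10.0.0.9", "site.example"), (5000, "10.0.0.9", "site.example")]
    for seed in designate_seeds(conns, max_mean_interval=120):
        items = {"user:alice": {"10.0.0.5"}, "user:bob": {"10.0.0.9"}}
        print(seed, sorted(map(str, build_cluster(seed, items))))
```

Matching on exact property values is the simplest reading of "common property value" in claim 5. Claim 6 speaks of a "similar" property value, which could call for a looser comparison than the set membership test used here.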
the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title
Updating · CPC title
Grouping and aggregation · CPC title
Query processing support for facilitating data mining operations in structured databases · CPC title
using ranking · CPC title