Machine learning to determine domain reputation, content classification, phishing sites, and command and control sites
US-2021377303-A1 · Dec 2, 2021 · US
US2023114721A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023114721-A1 |
| Application number | US-202117500018-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 13, 2021 |
| Priority date | Oct 13, 2021 |
| Publication date | Apr 13, 2023 |
| Grant date | — |
Official abstract text for this publication.
A method for classifying domains to malware families includes identifying a corpus of malicious domains, identifying one or more suspicious domains, extracting a timeframe corresponding to the one or more suspicious domains, calculating a rank coefficient between the one or more suspicious domains and a current seed domain of the corpus of malicious domains, determining whether the rank correlation coefficient exceeds a rank threshold for the one or more suspicious domains, comparing a number of suspicious domains whose correlation coefficients exceed the rank threshold to a relation threshold, and responsive to determining the number of suspicious domains whose correlation coefficients exceed the rank threshold exceeds the relation threshold, applying a tag to the suspicious domains indicating that the one or more suspicious domains correspond to a same malware family as the current seed domain.
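The thresholding logic in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. The abstract does not name a specific rank correlation, so Spearman's coefficient is assumed here. It is also assumed that each domain is represented by a numeric activity series (for example, query counts per time bucket) over the extracted timeframe. The names `ranks`, `spearman`, and `tag_family` are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (vx * vy) ** 0.5


def tag_family(
    seed_series: Sequence[float],
    suspicious: Dict[str, Sequence[float]],
    rank_threshold: float,
    relation_threshold: int,
    family: str,
) -> Dict[str, str]:
    """Tag the suspicious domains with the seed's family when enough correlate."""
    counter = 0  # number of domains whose coefficient exceeds the rank threshold
    for series in suspicious.values():
        if spearman(series, seed_series) > rank_threshold:
            counter += 1
    if counter > relation_threshold:
        # Assumption: the tag is applied to the whole group of suspicious domains.
        return {domain: family for domain in suspicious}
    return {}
```

With a seed series `[1, 2, 3, 4, 5]` and three suspicious domains, two of which rise monotonically with the seed, a rank threshold of 0.8 and a relation threshold of 1 would tag the group; raising the relation threshold to 2 would leave it untagged. Tagging only the domains that individually exceeded the rank threshold is another plausible reading of the text.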
Opening claim text (preview).
What is claimed is:
1. A computer implemented method for classifying domains to malware families, the method comprising: identifying a corpus of malicious domains; identifying one or more suspicious domains; extracting a timeframe corresponding to the one or more suspicious domains; calculating a rank coefficient between the one or more suspicious domains and a current seed domain of the corpus of malicious domains; determining whether the rank correlation coefficient exceeds a rank threshold for the one or more suspicious domains; comparing a number of suspicious domains whose correlation coefficients exceed the rank threshold to a relation threshold; and responsive to determining the number of suspicious domains whose correlation coefficients exceed the rank threshold exceeds the relation threshold, applying a tag to the suspicious domains indicating that the one or more suspicious domains correspond to a same malware family as the current seed domain.
2. The computer implemented method of claim 1, further comprising incrementing a counter corresponding to a number of times the rank correlation coefficient for a domain exceeds a rank threshold.
3. The computer implemented method of claim 2, wherein comparing a number of suspicious domains whose correlation coefficient exceeds the rank threshold to a relation threshold includes comparing a current count corresponding to the counter to the relation threshold.
4. The computer implemented method of claim 1, further comprising constructing one or more feature vectors corresponding to the one or more suspicious domains.
5. The computer implemented method of claim 4, further comprising clustering the feature vectors.
6. The computer implemented method of claim 5, further comprising determining a distance from a suspicious domain's feature vector to one or more cluster centers corresponding to the clustered feature vectors.
7. The computer implemented method of claim 6, further comprising determining a cluster center to which the one or more feature vectors are closest.
8. A computer program product for classifying domains to malware families, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising instructions to: identify a corpus of malicious domains; identify one or more suspicious domains; extract a timeframe corresponding to the one or more suspicious domains; calculate a rank coefficient between the one or more suspicious domains and a current seed domain of the corpus of malicious domains; determine whether the rank correlation coefficient exceeds a rank threshold for the one or more suspicious domains; compare a number of suspicious domains whose correlation coefficients exceed the rank threshold to a relation threshold; and responsive to determining the number of suspicious domains whose correlation coefficients exceed the rank threshold exceeds the relation threshold, apply a tag to the suspicious domains indicating that the one or more suspicious domains correspond to a same malware family as the current seed domain.
9. The computer program product of claim 8, further comprising instructions to increment a counter corresponding to a number of times the rank correlation coefficient for a domain exceeds a rank threshold.
10. The computer program product of claim 9, wherein comparing a number of suspicious domains whose correlation coefficient exceeds the rank threshold to a relation threshold includes comparing a current count corresponding to the counter to the relation threshold.
11. The computer program product of claim 8, further comprising instructions to construct one or more feature vectors corresponding to the one or more suspicious domains.
12. The computer program product of claim 11, further comprising instructions to cluster the one or more feature vectors.
13. The computer program product of claim 12, further comprising instructions to determine a distance from a suspicious domain's feature vector to one or more cluster centers corresponding to the clustered feature vectors.
14. The computer program product of claim 13, further comprising instructions to determine a cluster center to which the one or more feature vectors are closest.
15. A computer system for classifying domains to malware families, the computer system comprising: one or more computer processors; one or more computer-readable storage media; program instructions stored on the computer-readable storage media for execution by at least one of the one or more processors, the program instructions comprising instructions to: identify a corpus of malicious domains; identify one or more suspicious domains; extract a timeframe corresponding to the one or more suspicious domains; calculate a rank coefficient between the one or more suspicious domains and a current seed domain of the corpus of malicious domains; determine whether the rank correlation coefficient exceeds a rank threshold for the one or more suspicious domains; compare a number of suspicious domains whose correlation coefficients exceed the rank threshold to a relation threshold; and responsive to determining the number of suspicious domains whose correlation coefficients exceed the rank threshold exceeds the relation threshold, apply a tag to the suspicious domains indicating that the one or more suspicious domains correspond to a same malware family as the current seed domain.
16. The computer system of claim 15, further comprising instructions to increment a counter corresponding to a number of times the rank correlation coefficient for a domain exceeds a rank threshold.
17. The computer system of claim 15, further comprising instructions to construct one or more feature vectors corresponding to the one or more suspicious domains.
18. The computer system of claim 17, further comprising instructions to cluster the one or more feature vectors.
19. The computer system of claim 18, further comprising instructions to determine a distance from a suspicious domain's feature vector to one or more cluster centers corresponding to the clustered feature vectors.
20. The computer system of claim 19, further comprising instructions to determine a cluster center to which the one or more feature vectors are closest.
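The dependent claims on feature vectors, clustering, and distance to cluster centers can be illustrated with a short Python sketch. The claims do not name a clustering algorithm or specify which features are used, so everything here is an assumption: a plain k-means loop with a naive initialization stands in for "clustering", Euclidean distance stands in for "distance", and the function names `nearest_center` and `kmeans` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_center(vector: Vector, centers: Sequence[Vector]) -> Tuple[int, float]:
    """Return the index of the closest cluster center and the distance to it."""
    distances = [math.dist(vector, center) for center in centers]
    index = min(range(len(centers)), key=distances.__getitem__)
    return index, distances[index]


def kmeans(vectors: Sequence[Vector], k: int, iterations: int = 10) -> List[List[float]]:
    """Cluster feature vectors with basic k-means (assumed algorithm)."""
    # Naive initialization: use the first k vectors as starting centers.
    centers = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest_center(v, centers)[0]].append(v)
        for i, members in enumerate(groups):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

In use, each suspicious domain would first be turned into a numeric feature vector (the choice of features is left open by the claims), the vectors would be passed to `kmeans`, and `nearest_center` would then report which cluster center a given domain's vector is closest to and how far away it is.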
Traffic logging, e.g. anomaly detection · CPC title
Clustering techniques · CPC title
for managing network security; network security policies in general (filtering policies H04L63/0227) · CPC title
the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms · CPC title
Event detection, e.g. attack signature detection · CPC title