Sinkholing bad network domains by registering the bad network domains on the internet
US-9405903-B1 · Aug 2, 2016 · US
US9756063B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9756063-B1 |
| Application number | US-201414553879-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 25, 2014 |
| Priority date | Nov 25, 2014 |
| Publication date | Sep 5, 2017 |
| Grant date | Sep 5, 2017 |
Official abstract text for this publication.
Host name raw data from access logs of computers is grouped into distinct groups. At least one feature, an alphanumeric or alphabetic-only digest, is extracted from each group and its characters are ordered depending upon their frequency of use. Sampling is performed upon host names from a database of known normal host names to generate groups of randomly selected host names. Similar digests are also extracted from these groups. The digest from the raw data is compared to each of the digests from the normal host names using a string matching algorithm to determine a value. If the value is above a threshold then it is likely that the host names from the raw data group are domain-generated. The suspect host names are used to reference the raw data access log in order to determine which user computers have accessed these host names and these user computers are alerted.
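The digest-and-compare step in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. The abstract names neither the string matching algorithm nor the way distances to several normal digests are combined, so Levenshtein edit distance and "smallest distance to any normal digest" are assumptions. The function names `frequency_digest`, `edit_distance`, and `looks_generated` are hypothetical.

```python
from collections import Counter
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def frequency_digest(host_names, alphabetic_only=False):
    """Return the distinct characters used in a group, most frequent first."""
    allowed = LETTERS if alphabetic_only else LETTERS | DIGITS
    counts = Counter()
    for name in host_names:
        for ch in name.lower():
            if ch in allowed:  # dots, hyphens and other symbols are ignored
                counts[ch] += 1
    # Descending frequency; ties broken by character so the result is stable.
    return "".join(sorted(counts, key=lambda c: (-counts[c], c)))


def edit_distance(a, b):
    """Levenshtein distance (assumed here as the string matching measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def looks_generated(suspect_group, normal_groups, threshold):
    """Flag a suspect group whose digest is far from every normal digest.

    Assumption: the group is flagged when its smallest distance to any
    normal digest exceeds the threshold.
    """
    suspect = frequency_digest(suspect_group)
    distances = [edit_distance(suspect, frequency_digest(group))
                 for group in normal_groups]
    return min(distances) > threshold
```

Passing `alphabetic_only=True` gives the numeral-free digest variant mentioned in the abstract. A suitable threshold would have to be tuned against real data.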
Opening claim text (preview).
I claim:

1. A method of detecting host names generated by a domain generation algorithm, said method comprising: grouping a suspect set of host names obtained from a raw access log of an endpoint computer into a plurality of distinct suspect groups by at least one of a destination IP address and a sub-parent domain, wherein said raw access log reflects Web sites accessed by said endpoint computer over an access period of time and identifies said endpoint computer; extracting, from one of said suspect groups, a suspect alphanumeric digest string in which characters are ordered by frequency of use within said one suspect group; grouping a normal set of host names known to not have been generated randomly into a plurality of distinct normal groups wherein said host names in said normal set were generated by humans; for each of said normal groups, extracting a normal alphanumeric digest string in which characters are ordered by frequency of use within said each normal group; calculating a distance measure between said suspect alphanumeric digest string and said normal alphanumeric digest strings from said normal groups; determining that said one suspect group includes host names generated by a domain generation algorithm, indicative of an opportunity for the endpoint computer to be compromised by malicious software, when said distance measure is above a threshold; identifying said endpoint computer as having accessed host names of said one suspect group; and determining that said endpoint computer has accessed at least a predetermined number of host names from said one suspect group in a predetermined time period and outputting an indication that said endpoint computer has been compromised by said malicious software.

2. The method as recited in claim 1 wherein said suspect alphanumeric digest strings and said normal alphanumeric digest strings do not include numerals.

3. The method as recited in claim 1 further comprising: for each of said suspect groups, extracting a suspect alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use within said each suspect group; for each of said normal groups, extracting a normal alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use within said each normal group; and calculating an alphabetic distance measure between a suspect alphabetic digest string from one of said suspect groups and said normal alphabetic digest strings from said normal groups.

4. A method of detecting host names generated by a domain generation algorithm, said method comprising: grouping a suspect set of host names obtained from raw access logs from a plurality of computers into a plurality of distinct suspect groups by at least one of a destination IP address and a sub-parent domain, wherein each of said computers is an endpoint computer and wherein each of said raw access log reflects Web sites accessed by said endpoint computers over an access period of time and identifies said endpoint computers; for each of said suspect groups, extracting a suspect alphanumeric digest string in which characters are ordered by frequency of use within said each suspect group; grouping a normal set of host names known to not have been generated randomly into a plurality of distinct normal groups; for each of said normal groups, extracting a normal alphanumeric digest string in which characters are ordered by frequency of use within said each normal group; calculating a distance measure between a suspect alphanumeric digest string from one of said suspect groups and said normal alphanumeric digest strings from said normal groups; determining that said one suspect group includes host names generated by a domain generation algorithm when said distance measure is above a threshold; identifying one of said computers as having accessed host names of said one suspect group; and determining that one of said computers has accessed at least a predetermined number of host names from said one suspect group in a predetermined time period and outputting an indication that said one computer has been compromised by malicious software.

5. The method as recited in claim 4, further comprising: cross-referencing said at least one host name generated by a domain generation algorithm with said raw access logs in order to output an identification of one of said computers that has accessed said at least one host name.

6. The method as recited in claim 4 wherein said suspect alphanumeric digest strings and said normal alphanumeric digest strings do not include numerals.

7. The method as recited in claim 4 further comprising: for each of said suspect groups, extracting a suspect alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use within said each suspect group; for each of said normal groups, extracting a normal alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use within said each normal group; and calculating an alphabetic distance measure between a suspect alphabetic digest string from one of said suspect groups and said normal alphabetic digest strings from said normal groups.

8. A method of detecting host names generated by a domain generation algorithm, said method comprising: accessing sample groups of host names from a database of host names, with each of said host names known to not have been generated randomly such that the host names represent a candidate data set of non-malicious host names; for each of said sample groups, extracting a normal alphanumeric digest string in which characters are ordered by frequency of use within said each normal group; grouping a suspect set of host names obtained from a raw access log of a computer into a plurality of distinct suspect groups, wherein said computer is an endpoint computer and wherein said raw access log reflects Web sites accessed by said endpoint computer over an access period of time and identifies said endpoint computer; extracting, from one of said suspect groups, a suspect alphanumeric digest string in which characters are ordered by frequency of use within said suspect group; calculating a distance measure between said suspect alphanumeric digest string and said normal alphanumeric digest strings from said sample groups; determining that said one suspect group includes host names generated by a domain generation algorithm when said distance measure is above a threshold; identifying said computer as having accessed host names of said one suspect group using said raw access log of said endpoint computer; and determining that said computer has accessed at least a predetermined number of host names from said one suspect group in a predetermined time period and outputting an indication that said computer has been compromised by malicious software.

9. The method as recited in claim 8 wherein said suspect alphanumeric digest strings and said normal alphanumeric digest strings do not include numerals.

10. The method as recited in claim 8 further comprising: grouping said suspect set of host names by an IP address of each of said host names or by a sub-parent domain of each of said host names.

11. The method as recited in claim 8 further comprising: for each of said suspect groups, extracting a suspect alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use within said each suspect group; for each of said sample groups, extracting a normal alphabetic digest string that does not include numerals and in which characters are ordered by frequency of use
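The grouping and endpoint-identification steps recited in the independent claims can be sketched as below. This is only an illustration under stated assumptions: the log entry layout (dictionaries with `endpoint`, `time`, `host`, and `dest_ip` keys), the use of numeric timestamps in seconds, and the reading of "predetermined number of host names" as distinct host names are all hypothetical choices, not details taken from the claims. Only the destination-IP grouping option is shown.

```python
from collections import defaultdict


def group_by_destination_ip(entries):
    """Group host names from access-log entries by destination IP address."""
    groups = defaultdict(set)
    for entry in entries:
        groups[entry["dest_ip"]].add(entry["host"])
    return dict(groups)


def compromised_endpoints(entries, suspect_hosts, min_hosts, window_seconds):
    """Return endpoints that reached at least `min_hosts` distinct suspect
    host names within some window of `window_seconds` (assumed semantics)."""
    hits_by_endpoint = defaultdict(list)
    for entry in entries:
        if entry["host"] in suspect_hosts:
            hits_by_endpoint[entry["endpoint"]].append(
                (entry["time"], entry["host"]))

    flagged = set()
    for endpoint, hits in hits_by_endpoint.items():
        hits.sort()  # chronological order
        for i, (start, _) in enumerate(hits):
            # Distinct suspect hosts seen in the window starting at this hit.
            seen = {host for t, host in hits[i:] if t - start <= window_seconds}
            if len(seen) >= min_hosts:
                flagged.add(endpoint)
                break
    return flagged


# Hypothetical access-log entries for two endpoints.
log = [
    {"endpoint": "pc1", "time": 0, "host": "a.x", "dest_ip": "1.1.1.1"},
    {"endpoint": "pc1", "time": 10, "host": "b.x", "dest_ip": "1.1.1.1"},
    {"endpoint": "pc1", "time": 20, "host": "c.x", "dest_ip": "1.1.1.1"},
    {"endpoint": "pc2", "time": 0, "host": "a.x", "dest_ip": "1.1.1.1"},
    {"endpoint": "pc2", "time": 5000, "host": "b.x", "dest_ip": "2.2.2.2"},
]
suspects = {"a.x", "b.x", "c.x"}
flagged = compromised_endpoints(log, suspects, min_hosts=3, window_seconds=60)
```

In this example only `pc1` reaches three suspect host names inside a 60-second window, so it is the only endpoint flagged.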
Processing captured monitoring data, e.g. for logfile generation · CPC title
Traffic logging, e.g. anomaly detection · CPC title
involving long-term monitoring or reporting · CPC title