Bot detection based on divergence and variance

US10243981B2 · US · B2

Patent metadata
Publication number: US-10243981-B2
Application number: US-201615261429-A
Country: US
Kind code: B2
Filing date: Sep 9, 2016
Priority date: Sep 9, 2016
Publication date: Mar 26, 2019
Grant date: Mar 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system automatically detects bots and/or botnets.

First claim

Opening claim text (preview).

What is claimed is:

1. A machine implemented process for detecting bots, comprising: receiving at an analysis server a log file generated by an application server that documents interaction with clients and network resources; the analysis server grouping data of the log file into current time windows of data; creating a distribution of URLs for each of the current time windows; creating a historical distribution of URLs for data from previous time windows of previous log files; creating a score for each of the current time windows representing a divergence of the distribution of URLs for the respective current time window from the historical distribution of URLs; creating a histogram of the scores; determining a customized portion of the histogram that represents suspicious scores, time windows in the customized portion of the histogram are suspicious time windows; identifying a reduced set of suspicious time windows by creating a request matrix for each suspicious time window, determining a first principal weight for each request matrix and discarding suspicious time windows having a first principal weight that is less than a threshold; the analysis server identifying a subset of IP addresses in the reduced set of suspicious time windows as bots; and the application server blocking the bots from further access to the network resources in response to the identifying.

2. The machine implemented process of claim 1, wherein: the creating a score comprises performing a variant of Kullback-Leibler divergence.

3. The machine implemented process of claim 1, wherein: the identifying a subset of IP addresses comprises performing clustering of IP addresses from the reduced set of suspicious time windows.

4. The machine implemented process of claim 1, further comprising: determining correlations between the subset of IP addresses; determining a botnet based on the determined correlations; and reporting the bots and the botnet.

5. The machine implemented process of claim 1, wherein the identifying a reduced set of suspicious time windows further comprises: performing principal component analysis on each request matrix, each request matrix has columns representing IP addresses and rows representing requests; computing each remaining IP address' correlation with the first principal component for its respective request matrix; and creating a list of IP addresses ranked by correlation with first principal component.

6. The machine implemented process of claim 5, wherein the identifying the subset of IP addresses further comprises: creating correlation matrices for request matrices; calculating an average correlation; adding a top IP address which is at the top of the list of IP addresses ranked by correlation to a new cluster; adding other IP addresses from the list of IP addresses ranked by correlation to the new cluster for IP addresses that have a distance from the top IP address that is shorter than a threshold distance based on the average correlation; removing the top IP address and the other IP addresses from list of IP addresses ranked by correlation; and creating additional clusters by repeating the adding a top IP address, adding other IP addresses and removing.

7. The machine implemented process of claim 6, further comprising: determining botnets from the clusters, comprising: for each pair of IP addresses in a cluster for all clusters, creating a line between IP addresses, and combining pairs of IP addresses to form botnets.

8. The machine implemented process of claim 7, further comprising: determining a center of a botnet based on number of lines to an IP address.

9. An apparatus, comprising: a communication interface; a storage medium; and a processor connected to the storage medium and the communication interface, the processor is configured to determine suspicious time windows in a plurality of network requests using a variant of Kullback-Leibler divergence for time windows and identify a subset of requesters of network requests in determined suspicious time windows as bots using principal component analysis and clustering, the processor configured to take action to block the bots from accessing network resources in response to the identifying; the processor is configured to determine suspicious time windows in a plurality of networks requests using a variant of Kullback-Leibler divergence by creating a distribution of URLs for individual time windows, creating a score for each of the individual time windows using the variant of Kullback-Leibler divergence from a historical distribution of URLs, creating a histogram of the scores, determining a customized portion of the histogram that represents suspicious scores, and reporting time windows in the customized portion of the histogram as suspicious time windows.

10. The apparatus of claim 9, wherein: the processor is configured to determine correlations between requesters, determine a botnet based on the determined correlations, determine a center of the botnet and report the bots and the botnet.

11. The apparatus of claim 9, wherein: the processor is configured to identify the subset of requesters of network requests in determined suspicious time windows as bots by identifying a reduced set of time windows from determined suspicious time windows using principal component analysis including creating a request matrix for each suspicious time window, performing principal component analysis on each request matrix, determining a first principal weight for each request matrix, discarding suspicious time windows having a first principal weight that is less than a threshold, computing each remaining IP address' correlation with the first principal component for its respective request matrix, and creating a list of IP addresses ranked by correlation with first principal component.

12. The apparatus of claim 11, wherein: the processor is further configured to identify the subset of requesters of network requests in determined suspicious time windows as bots by clustering including creating correlation matrices for each request matrix not discarded, calculating an average correlation, adding a top IP address which is at the top of the list of IP addresses ranked by correlation to a new cluster, adding other IP addresses from the list of IP addresses ranked by correlation to the new cluster for IP addresses that have a distance from the top IP address that is shorted than a threshold distance based on the average correlation, and removing the top IP address and the other IP addresses from list of IP addresses ranked by correlation.

13. The apparatus of claim 12, wherein: the processor is configured to determine botnets from the cluster by connecting IP addresses pairs of IP addresses and combining connected pairs of IP addresses to form botnets.

14. The apparatus of claim 13, wherein: the processor is configured to determine a center of a botnet based on number of connections to an IP address.

15. A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising: computer readable program code configured to access a current log file that includes request URLs and associated source IP addresses; computer readable program code configured to group request URLs and associated source IP addresses into time windows, create a distribution of URLs for each of the time windows, and create a historical distribution of URLs for data from previous log files; computer readable program code configure
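
The scoring and filtering steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices are assumptions made only for illustration: plain Kullback-Leibler divergence with additive smoothing stands in for the unspecified "variant", the "customized portion of the histogram" is taken to be the top fraction of scores, and the "first principal weight" is read as the share of variance explained by the first principal component (estimated here by power iteration). All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def url_distribution(urls, vocabulary, smoothing=1.0):
    """Smoothed relative frequency of every URL in `vocabulary` (assumed smoothing)."""
    counts = Counter(urls)
    total = len(urls) + smoothing * len(vocabulary)
    return {u: (counts[u] + smoothing) / total for u in vocabulary}


def kl_divergence(p, q):
    """Plain KL divergence D(p || q) in nats; p and q share the same keys."""
    return sum(p[u] * math.log(p[u] / q[u]) for u in p)


def score_windows(windows, historical_urls):
    """One divergence score per current time window against the historical URLs."""
    vocabulary = sorted(set(historical_urls).union(*windows))
    historical = url_distribution(historical_urls, vocabulary)
    return [kl_divergence(url_distribution(w, vocabulary), historical)
            for w in windows]


def suspicious_indices(scores, tail_fraction=0.25):
    """Assumed rule: the highest-scoring fraction of windows is suspicious."""
    k = max(1, round(len(scores) * tail_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def first_principal_weight(matrix, iterations=200):
    """Share of total variance on the first principal component.

    `matrix` has one row per request and one column per IP address.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / n_rows for b in range(n_cols)]
           for a in range(n_cols)]
    total_variance = sum(cov[j][j] for j in range(n_cols))
    if total_variance == 0:
        return 0.0
    # Power iteration from a fixed, normalized starting vector.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v approximates the top eigenvalue.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n_cols))
                     for a in range(n_cols))
    return eigenvalue / total_variance


if __name__ == "__main__":
    history = ["/a", "/b"] * 10 + ["/login"]
    windows = [["/a", "/b", "/a", "/b"], ["/login"] * 4]
    scores = score_windows(windows, history)
    print(scores, suspicious_indices(scores, tail_fraction=0.5))
```

In this reading, a window dominated by a single URL that is rare in the historical data receives a high score, and a suspicious window is kept only if the request columns of its IP addresses move together strongly enough that the first principal weight meets the threshold.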

Assignees

Ca Inc

Inventors

Classifications

  • Traffic logging, e.g. anomaly detection

  • Countermeasures against malicious traffic (countermeasures against attacks on cryptographic mechanisms H04L9/002)

  • Detection or countermeasures against botnets

  • by monitoring network traffic (monitoring network traffic per se H04L43/00)

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10243981B2 cover?
A system automatically detects bots and/or botnets.
Who is the assignee on this patent?
Ca Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Mar 26, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).