Training a machine learning-based traffic analyzer using a prototype dataset
US-2021357815-A1 · Nov 18, 2021 · US
US12244479B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12244479-B2 |
| Application number | US-202318520915-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 28, 2023 |
| Priority date | Jan 28, 2022 |
| Publication date | Mar 4, 2025 |
| Grant date | Mar 4, 2025 |
An anomalous behavior detector has been designed to detect novel behavioral changes of devices based on network traffic data that likely correlate to anomalous behaviors. The anomalous behavior detector uses the local outlier factor (LOF) algorithm with novelty detection. After initial semi-supervised training with a single class training dataset representing stable device behaviors, the obtained model continues learning frontiers that delimit subspaces of inlier observations with live network traffic data. Instead of traffic variables being used as features, the features that form feature vectors are similarities of network traffic variable values across time intervals. A feature vector for the anomalous behavior detector represents stability or similarity of network traffic variables that have been chosen as device identifiers and behavioral indicators.
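The abstract describes a local outlier factor (LOF) model used in novelty mode. It is fit only on feature vectors from stably behaving devices and then scores unseen vectors. Below is a minimal pure-Python sketch of that idea, not the patentee's implementation. The class name `LOFNovelty`, the neighbor count `k`, and the sample similarity vectors are hypothetical. The fixed threshold of 1.5 on the LOF score is an assumption that mirrors the cutoff scikit-learn's `LocalOutlierFactor(novelty=True)` applies when `contamination="auto"`. The continued learning on live traffic mentioned in the abstract is not shown; refitting on an expanded inlier set would be one simple approximation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Neighbor = Tuple[float, int]  # (distance, index into the training set)


class LOFNovelty:
    """Local outlier factor in novelty mode: fit on inliers, score unseen points.

    Hypothetical helper for illustration; `k` and `threshold` are assumptions.
    """

    def __init__(self, k: int = 3, threshold: float = 1.5) -> None:
        self.k = k
        self.threshold = threshold  # LOF above this value is treated as an outlier

    def _knn(self, x: Vector, exclude: int = -1) -> List[Neighbor]:
        # k nearest training points to x (optionally skipping x's own index).
        cands = [(math.dist(x, p), i) for i, p in enumerate(self.train) if i != exclude]
        cands.sort()
        return cands[: self.k]

    def _lrd(self, neighbors: List[Neighbor]) -> float:
        # Local reachability density: inverse of the mean reachability distance,
        # where reach-dist(x, o) = max(dist(x, o), k-distance(o)).
        reach = [max(d, self.k_dist[j]) for d, j in neighbors]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)  # epsilon avoids division by zero

    def fit(self, train: List[Vector]) -> "LOFNovelty":
        if len(train) <= self.k:
            raise ValueError("need more than k training vectors")
        self.train = [tuple(p) for p in train]
        neighbors = [self._knn(p, exclude=i) for i, p in enumerate(self.train)]
        self.k_dist = [nb[-1][0] for nb in neighbors]  # distance to k-th neighbor
        self.lrd = [self._lrd(nb) for nb in neighbors]
        return self

    def score(self, x: Vector) -> float:
        # LOF(x) = mean lrd of x's training neighbors / lrd(x); about 1.0 for inliers.
        nb = self._knn(x)
        return sum(self.lrd[j] for _, j in nb) / len(nb) / self._lrd(nb)

    def is_outlier(self, x: Vector) -> bool:
        return self.score(x) > self.threshold


# Hypothetical similarity feature vectors (one value per monitored variable)
# for devices whose behavior was stable across time intervals.
stable = [
    (1.0, 1.0, 0.9), (0.95, 1.0, 1.0), (1.0, 0.9, 1.0),
    (0.9, 0.95, 0.95), (1.0, 1.0, 1.0), (0.95, 0.9, 0.95),
]
model = LOFNovelty(k=3).fit(stable)
print(model.is_outlier((1.0, 0.95, 1.0)))  # False: close to the stable devices
print(model.is_outlier((0.1, 0.2, 0.0)))   # True: behavior changed sharply
```

Training on a single class of stable observations is what makes this novelty detection rather than plain outlier detection: the model never sees anomalous examples and flags anything whose local density is much lower than that of its nearest stable neighbors.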
The invention claimed is:

1. A method comprising: for each of a set of one or more device identifiers indicated in network traffic, determining similarity measurements for variables across time intervals of the network traffic, wherein the variables are variables previously identified as correlating to device behavior and device identity; for each set of similarity measurements determined for each device identifier, generating a feature vector with the set of similarity measurements; inputting the feature vector into a local outlier factor with novelty detection model that was trained based on network traffic constrained to devices with stable behavior; and indicating detection of an anomaly if the local outlier factor with novelty detection indicates an outlier.

2. The method of claim 1, further comprising training a local outlier factor with novelty detection learner with a training dataset constrained to network traffic of devices with stable behavior.

3. The method of claim 1, wherein determining the similarity measurements comprises computing, for each of the set of device identifiers, a similarity measurement for one or more of the variables across the time intervals corresponding to the device identifier.

4. The method of claim 3, wherein computing the similarity measurement for one or more of the variables comprises computing one of a distance and a similarity coefficient.

5. The method of claim 1, further comprising extracting values of the variables from packet capture data for each time interval and aggregating the values by device identifier and time interval.

6. The method of claim 1, further comprising aggregating values of each variable with multiple values within a time interval into a set of values, wherein determining similarity measurements of variables across time intervals comprises determining similarity measurements between sets of values across the time intervals.

7. The method of claim 1, further comprising: determining that a listing of variables to monitor indicates a set of variables to aggregate within a time interval; and for each of the set of variables detected in a time interval, aggregating values of the set of variables within a time interval into a set of values, wherein determining similarity measurements of variables across time intervals comprises determining similarity measurements between sets of values across the time intervals.

8. A non-transitory, computer-readable medium having program code stored thereon, the program code comprising instructions to: train a local outlier factor with novelty detection learner with a training dataset constrained to observations of stable device behaviors, wherein the observations are extracted from network traffic and correspond to variables that have been selected as representing device behavior; and analyze network traffic to determine anomalous device behavior using a model obtained from the training, wherein the instructions to analyze the network traffic comprise instructions to, for each of a set of one or more device identifiers indicated in network traffic, determine similarity measurements for the variables across time intervals of the network traffic; generate a first feature vector for a first of the device identifiers with corresponding ones of the similarity measurements; and indicate anomalous behavior corresponding to the first device identifier based, at least in part, on whether the model indicates the first feature vector as an outlier.

9. The non-transitory, computer-readable medium of claim 8, wherein the training dataset comprises observations of devices that communicate at a hardware layer of a network communications stack.

10. The non-transitory, computer-readable medium of claim 8, wherein the variables comprise one or more static variables and a plurality of dynamic variables.

11. The non-transitory, computer-readable medium of claim 10, wherein the static variables at least comprise network address and the dynamic variables comprise variables indicating at least one of an application, a service, a domain, a product, and an agent.

12. The non-transitory, computer-readable medium of claim 8, wherein the instructions to determine the similarity measurements comprise instructions to compute, for each of the set of device identifiers, a similarity measurement for each variable across the time intervals corresponding to the device identifier.

13. The non-transitory, computer-readable medium of claim 12, wherein the instructions to compute the similarity measurement for each variable comprise instructions to compute one of a distance and a similarity coefficient for each variable based on values of the variable across the time intervals.

14. The non-transitory, computer-readable medium of claim 8, wherein the instructions to analyze the network traffic further comprise instructions to extract, for each time interval, values of the variables from packet capture data for the time interval and aggregate the values by device identifiers.

15. The non-transitory, computer-readable medium of claim 8, wherein the program code further comprises instructions to aggregate values of each variable with multiple values within a time interval into a set of values, wherein the instructions to determine similarity measurements of variables across time intervals comprise instructions to determine similarity measurements between sets of values across the time intervals.

16. The non-transitory, computer-readable medium of claim 8, wherein the program code further comprises instructions to: determine that a listing of variables to monitor indicates a set of variables to aggregate within a time interval; and for each of the set of variables detected in a time interval, aggregate values of the set of variables within a time interval into a set of values, wherein the instructions to determine similarity measurements of variables across time intervals comprise instructions to determine similarity measurements between the sets of values across the time intervals.

17. An apparatus comprising: a processor; and a machine-readable medium having program code stored thereon, the program code executable by the processor to cause the apparatus to, train a local outlier factor with novelty detection learner with a training dataset constrained to observations of stable device behaviors, wherein the observations are extracted from network traffic and correspond to variables that have been selected as representing device behavior; and analyze network traffic to determine anomalous device behavior using a model obtained from the training, wherein the instructions to analyze the network traffic comprise instructions executable by the processor to cause the apparatus to, for each of a set of one or more device identifiers indicated in the network traffic, determine similarity measurements for the variables across time intervals of the network traffic; generate a first feature vector for a first of the device identifiers with corresponding ones of the similarity measurements; and indicate anomalous behavior corresponding to the first device identifier based, at least in part, on whether the model indicates the first feature vector as an outlier.

18. The apparatus of claim 17, wherein the training dataset comprises observations of devices that communicate at a hardware layer of a network communications stack.

19. The apparatus of claim 17, wherein the variables comprise one or more static variables and a plurality of dynamic variables, wherein the
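Claims 1 and 3 to 6 describe the feature pipeline: values of monitored variables are extracted per time interval, aggregated into sets by device identifier, and compared across intervals with a distance or similarity coefficient. The sketch below assumes packet-capture parsing has already produced one dict per observation. It uses the Jaccard index as the similarity coefficient and averages it over consecutive interval pairs. The record layout, the `VARIABLES` tuple, the device addresses, and the averaging step are hypothetical choices, not details taken from the claims. The resulting per-device vectors are the kind of input a novelty-detection model trained on stable devices would score.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

# Hypothetical dynamic variables; claim 11 lists applications, services,
# domains, products and agents as examples. The device identifier here is a
# network address, one of the static variables named in claim 11.
VARIABLES = ("application", "domain", "user_agent")


def aggregate(records: Iterable[Mapping[str, object]]):
    """Group observed values into sets keyed by device, time interval, variable."""
    agg = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for rec in records:
        dev, ivl = rec["device"], rec["interval"]
        for var in VARIABLES:
            if var in rec:
                agg[dev][ivl][var].add(rec[var])
    return agg


def jaccard(a: Set[object], b: Set[object]) -> float:
    """Jaccard similarity coefficient |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # assumption: "absent in both intervals" counts as stable
    return len(a & b) / len(a | b)


def feature_vectors(records: Iterable[Mapping[str, object]]) -> Dict[str, List[float]]:
    """One vector per device: mean Jaccard similarity per variable across intervals."""
    vectors = {}
    for dev, intervals in aggregate(records).items():
        order = sorted(intervals)
        if len(order) < 2:
            continue  # at least two intervals are needed for a comparison
        vec = []
        for var in VARIABLES:
            sims = [jaccard(intervals[a][var], intervals[b][var])
                    for a, b in zip(order, order[1:])]
            vec.append(sum(sims) / len(sims))
        vectors[dev] = vec
    return vectors


# Hypothetical observations already parsed from packet captures.
records = [
    {"device": "10.0.0.5", "interval": 0, "application": "mqtt", "domain": "broker.example"},
    {"device": "10.0.0.5", "interval": 1, "application": "mqtt", "domain": "broker.example"},
    {"device": "10.0.0.9", "interval": 0, "application": "http", "domain": "update.example"},
    {"device": "10.0.0.9", "interval": 1, "application": "ssh", "domain": "update.example"},
    {"device": "10.0.0.9", "interval": 1, "application": "http", "domain": "unknown.example"},
]
print(feature_vectors(records))
# {'10.0.0.5': [1.0, 1.0, 1.0], '10.0.0.9': [0.5, 0.5, 1.0]}
```

Because each feature is a similarity rather than a raw traffic value, the vector stays comparable across devices that use different applications or domains; only a change in a device's own behavior between intervals pulls its values below 1.0.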
Machine learning · CPC title
Network utilisation, e.g. volume of load or congestion level · CPC title
Traffic logging, e.g. anomaly detection · CPC title