MDL-based clustering for application dependency mapping
US-10326672-B2 · Jun 18, 2019 · US
US2021357815A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021357815-A1 |
| Application number | US-202117386020-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 27, 2021 |
| Priority date | Jan 5, 2017 |
| Publication date | Nov 18, 2021 |
| Grant date | — |
Abstract
In one embodiment, a device in a network generates a feature vector based on traffic flow data regarding one or more traffic flows in the network. The device makes a determination as to whether the generated feature vector is already represented in a training dataset dictionary by one or more feature vectors in the dictionary. The device updates the training dataset dictionary based on the determination by one of: adding the generated feature vector to the dictionary when the generated feature vector is not already represented by one or more feature vectors in the dictionary, or incrementing a count associated with a particular feature vector in the dictionary when the generated feature vector is already represented by the particular feature vector in the dictionary. The device generates a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.
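The dictionary update in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The abstract does not say how "already represented" is decided. The sketch therefore borrows the squared exponential function named in claims 5 and 6 and treats a new vector as represented when its similarity to the closest dictionary entry meets a threshold. `TrainingDatasetDictionary`, `threshold`, and `length_scale` are hypothetical names and parameters. Feature extraction from the traffic flow data is assumed to have already produced a numeric vector.

```python
import math
from typing import List, Sequence


def squared_exponential(x: Sequence[float], y: Sequence[float], length_scale: float = 1.0) -> float:
    """Squared exponential similarity exp(-||x - y||^2 / (2 * l^2)); 1.0 for identical vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


class TrainingDatasetDictionary:
    """Hypothetical dictionary of representative feature vectors with occurrence counts."""

    def __init__(self, threshold: float = 0.9, length_scale: float = 1.0) -> None:
        self.threshold = threshold        # assumed: similarity at or above this means "represented"
        self.length_scale = length_scale  # assumed kernel width
        self.vectors: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, fv: Sequence[float]) -> int:
        """Add fv, or increment the count of the entry that represents it; return that entry's index."""
        best_idx, best_sim = -1, 0.0
        for i, v in enumerate(self.vectors):
            sim = squared_exponential(fv, v, self.length_scale)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.counts[best_idx] += 1    # already represented: increment the matching entry's count
            return best_idx
        self.vectors.append(list(fv))     # not represented: add the vector ...
        self.counts.append(1)             # ... and initialize its count (compare claim 4)
        return len(self.vectors) - 1


if __name__ == "__main__":
    d = TrainingDatasetDictionary(threshold=0.9)
    for fv in [[1.0, 2.0], [1.05, 2.0], [5.0, 5.0]]:
        d.update(fv)
    print(d.counts)  # [2, 1]: the second vector is absorbed by the first entry
```

Matching only against the single most similar entry means each incoming vector increments at most one count, so the counts sum to the number of vectors observed.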
Claims (preview)
What is claimed is:

1. A method comprising: generating, by a device in a network, a feature vector based on traffic flow data regarding one or more traffic flows in the network; making, by the device, a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; updating, by the device, the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein updating the training dataset dictionary comprises one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a count associated with a particular feature vector in the training dataset dictionary when the feature vector is already represented by the particular feature vector in the training dataset dictionary; and generating, by the device, a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.

2. The method as in claim 1, wherein the training dataset comprises a plurality of labels, and wherein the machine learning-based traffic flow analyzer comprises a machine learning-based traffic flow classifier.

3. The method as in claim 1, wherein the traffic flow data comprises header information for one or more encrypted traffic flows.

4. The method as in claim 1, wherein adding the feature vector to the training dataset dictionary comprises: initializing, by the device, a count associated with the feature vector.

5. The method as in claim 1, wherein making the determination as to whether the feature vector is already represented in the training dataset dictionary comprises: computing, by the device, function values between the feature vector and feature vectors already in the training dataset dictionary.

6. The method as in claim 5, wherein the function values are computed using a squared exponential function.

7. The method as in claim 1, wherein generating the training dataset based on the training dataset dictionary comprises: receiving, at the device, one or more parameters indicative of a particular traffic type for a target network to which the machine learning-based traffic flow analyzer is to be deployed; identifying, by the device, one or more feature vectors in the training dataset dictionary that are associated with the particular traffic type; and determining, by the device, a representation of the one or more feature vectors in the training dataset based on the one or more parameters.

8. The method as in claim 7, wherein the representation of the one or more feature vectors in the training dataset excludes the one or more feature vectors in the training dataset dictionary associated with the particular traffic type from the training dataset.

9. The method as in claim 1, wherein the machine learning-based traffic flow analyzer is configured to detect malicious traffic flows.

10. An apparatus, comprising: one or more network interfaces to communicate with a network; a processor coupled to the one or more network interfaces and configured to execute one or more operations; and a memory configured to store a process that is executable by the processor, the process when executed operable to: generate a feature vector based on traffic flow data regarding one or more traffic flows in the network; make a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; update the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein the apparatus updates the training dataset dictionary by one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a count associated with a particular feature vector in the training dataset dictionary when the feature vector is already represented by the particular feature vector in the training dataset dictionary; and generate a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.

11. The apparatus as in claim 10, wherein the training dataset comprises a plurality of labels, and wherein the machine learning-based traffic flow analyzer comprises a machine learning-based traffic flow classifier.

12. The apparatus as in claim 10, wherein the traffic flow data comprises header information for one or more encrypted traffic flows.

13. The apparatus as in claim 10, wherein the apparatus adds the feature vector to the training dataset dictionary by: initializing a count associated with the feature vector.

14. The apparatus as in claim 10, wherein the apparatus makes the determination as to whether the feature vector is already represented in the training dataset dictionary by: computing function values between the feature vector and feature vectors already in the training dataset dictionary.

15. The apparatus as in claim 14, wherein the function values are computed using a squared exponential function.

16. The apparatus as in claim 10, wherein the apparatus generates the training dataset based on the training dataset dictionary by: receiving one or more parameters indicative of a particular traffic type for a target network to which the machine learning-based traffic flow analyzer is to be deployed; identifying one or more feature vectors in the training dataset dictionary that are associated with the particular traffic type; and determining a representation of the one or more feature vectors in the training dataset based on the one or more parameters.

17. The apparatus as in claim 16, wherein the representation of the one or more feature vectors in the training dataset excludes the one or more feature vectors in the training dataset dictionary associated with the particular traffic type from the training dataset.

18. The apparatus as in claim 10, wherein the machine learning-based traffic flow analyzer is configured to detect malicious traffic flows.

19. A tangible, non-transitory, computer-readable medium that stores program instructions causing a device in a network to execute a process comprising: generating, by the device, a feature vector based on traffic flow data regarding one or more traffic flows in the network; making, by the device, a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; updating, by the device, the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein updating the training dataset dictionary comprises one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a cou…
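Claims 7 and 8 describe shaping the training dataset for the traffic types of a target network. The following is a minimal sketch under assumptions, not the claimed method. It assumes each dictionary entry carries a label (compare claim 2) and a hypothetical `traffic_type` tag. It also assumes the "parameters indicative of a particular traffic type" take the form of a per-type weight map, where a weight of 0.0 excludes that type (as in claim 8) and any other value rescales how strongly that type is represented. `Entry`, `generate_training_dataset`, and `type_weights` are illustrative names that do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical training-dataset-dictionary entry."""
    vector: List[float]
    count: int          # how many observed vectors this entry represents
    label: str          # e.g. "benign" or "malicious"
    traffic_type: str   # assumed tag, e.g. "tls-web" or "dns"


def generate_training_dataset(
    entries: List[Entry],
    type_weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[List[float], str, float]]:
    """Return (vector, label, sample_weight) rows for training a flow classifier.

    Types missing from type_weights keep weight 1.0; a weight <= 0.0 drops the type.
    """
    type_weights = type_weights or {}
    rows: List[Tuple[List[float], str, float]] = []
    for e in entries:
        w = type_weights.get(e.traffic_type, 1.0)
        if w <= 0.0:
            continue  # excluded for this target network
        rows.append((e.vector, e.label, w * e.count))  # count becomes a sample weight
    return rows


if __name__ == "__main__":
    entries = [
        Entry([1.0, 2.0], 3, "benign", "tls-web"),
        Entry([5.0, 5.0], 1, "malicious", "tls-web"),
        Entry([0.0, 1.0], 4, "benign", "dns"),
    ]
    # Hypothetical target network where DNS flows should not be represented.
    for row in generate_training_dataset(entries, {"dns": 0.0}):
        print(row)
```

Carrying the dictionary count as a sample weight keeps the dataset compact. A learner that does not accept sample weights could instead repeat each row `count` times.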
CPC classifications (titles)

- Traffic logging, e.g. anomaly detection
- Vulnerability analysis
- Countermeasures against malicious traffic (countermeasures against attacks on cryptographic mechanisms H04L9/002)
- by monitoring network traffic (monitoring network traffic per se H04L43/00)
- Machine learning