Training a machine learning-based traffic analyzer using a prototype dataset

US2021357815A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021357815-A1
Application number  US-202117386020-A
Country             US
Kind code           A1
Filing date         Jul 27, 2021
Priority date       Jan 5, 2017
Publication date    Nov 18, 2021
Grant date          (none listed)

Abstract

Official abstract text for this publication.

In one embodiment, a device in a network generates a feature vector based on traffic flow data regarding one or more traffic flows in the network. The device makes a determination as to whether the generated feature vector is already represented in a training dataset dictionary by one or more feature vectors in the dictionary. The device updates the training dataset dictionary based on the determination by one of: adding the generated feature vector to the dictionary when the generated feature vector is not already represented by one or more feature vectors in the dictionary, or incrementing a count associated with a particular feature vector in the dictionary when the generated feature vector is already represented by the particular feature vector in the dictionary. The device generates a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.
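The add-or-increment update described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the claims mention a squared exponential function for comparing feature vectors, but the `threshold` and `length_scale` values, the class and method names, and the choice to start each count at 1 are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


def squared_exponential(x, y, length_scale=1.0):
    """Return 1.0 for identical vectors, decaying toward 0 as they diverge."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


@dataclass
class TrainingDictionary:
    threshold: float = 0.9      # assumed cutoff for "already represented"
    length_scale: float = 1.0   # assumed kernel width
    vectors: list = field(default_factory=list)
    counts: list = field(default_factory=list)

    def update(self, vector):
        """Add the vector, or increment the count of the entry representing it.

        Returns the index of the entry that was added or incremented.
        """
        best_index, best_value = None, 0.0
        for i, existing in enumerate(self.vectors):
            value = squared_exponential(vector, existing, self.length_scale)
            if value > best_value:
                best_index, best_value = i, value

        if best_index is not None and best_value >= self.threshold:
            # Already represented: count it instead of storing a near-duplicate.
            self.counts[best_index] += 1
            return best_index

        # Not represented: store the vector and initialize its count (assumed 1).
        self.vectors.append(tuple(vector))
        self.counts.append(1)
        return len(self.vectors) - 1
```

With these assumed settings, a vector very close to a stored one increments that entry's count, while a distant vector becomes a new entry, so the dictionary stays compact while the counts record how often each kind of flow was seen.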

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: generating, by a device in a network, a feature vector based on traffic flow data regarding one or more traffic flows in the network; making, by the device, a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; updating, by the device, the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein updating the training dataset dictionary comprises one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a count associated with a particular feature vector in the training dataset dictionary when the feature vector is already represented by the particular feature vector in the training dataset dictionary; and generating, by the device, a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.

2. The method as in claim 1, wherein the training dataset comprises a plurality of labels, and wherein the machine learning-based traffic flow analyzer comprises a machine learning-based traffic flow classifier.

3. The method as in claim 1, wherein the traffic flow data comprises header information for one or more encrypted traffic flows.

4. The method as in claim 1, wherein adding the feature vector to the training dataset dictionary comprises: initializing, by the device, a count associated with the feature vector.

5. The method as in claim 1, wherein making the determination as to whether the feature vector is already represented in the training dataset dictionary comprises: computing, by the device, function values between the feature vector and feature vectors already in the training dataset dictionary.

6. The method as in claim 5, wherein the function values are computed using a squared exponential function.

7. The method as in claim 1, wherein generating the training dataset based on the training dataset dictionary comprises: receiving, at the device, one or more parameters indicative of a particular traffic type for a target network to which the machine learning-based traffic flow analyzer is to be deployed; identifying, by the device, one or more feature vectors in the training dataset dictionary that are associated with the particular traffic type; and determining, by the device, a representation of the one or more feature vectors in the training dataset based on the one or more parameters.

8. The method as in claim 7, wherein the representation of the one or more feature vectors in the training dataset excludes the one or more feature vectors in the training dataset dictionary associated with the particular traffic type from the training dataset.

9. The method as in claim 1, wherein the machine learning-based traffic flow analyzer is configured to detect malicious traffic flows.

10. An apparatus, comprising: one or more network interfaces to communicate with a network; a processor coupled to the one or more network interfaces and configured to execute one or more operations; and a memory configured to store a process that is executable by the processor, the process when executed operable to: generate a feature vector based on traffic flow data regarding one or more traffic flows in the network; make a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; update the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein the apparatus updates the training dataset dictionary by one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a count associated with a particular feature vector in the training dataset dictionary when the feature vector is already represented by the particular feature vector in the training dataset dictionary; and generate a training dataset based on the training dataset dictionary for training a machine learning-based traffic flow analyzer.

11. The apparatus as in claim 10, wherein the training dataset comprises a plurality of labels, and wherein the machine learning-based traffic flow analyzer comprises a machine learning-based traffic flow classifier.

12. The apparatus as in claim 10, wherein the traffic flow data comprises header information for one or more encrypted traffic flows.

13. The apparatus as in claim 10, wherein the apparatus adds the feature vector to the training dataset dictionary by: initializing a count associated with the feature vector.

14. The apparatus as in claim 10, wherein the apparatus makes the determination as to whether the feature vector is already represented in the training dataset dictionary by: computing function values between the feature vector and feature vectors already in the training dataset dictionary.

15. The apparatus as in claim 14, wherein the function values are computed using a squared exponential function.

16. The apparatus as in claim 10, wherein the apparatus generates the training dataset based on the training dataset dictionary by: receiving one or more parameters indicative of a particular traffic type for a target network to which the machine learning-based traffic flow analyzer is to be deployed; identifying one or more feature vectors in the training dataset dictionary that are associated with the particular traffic type; and determining a representation of the one or more feature vectors in the training dataset based on the one or more parameters.

17. The apparatus as in claim 16, wherein the representation of the one or more feature vectors in the training dataset excludes the one or more feature vectors in the training dataset dictionary associated with the particular traffic type from the training dataset.

18. The apparatus as in claim 10, wherein the machine learning-based traffic flow analyzer is configured to detect malicious traffic flows.

19. A tangible, non-transitory, computer-readable medium that stores program instructions causing a device in a network to execute a process comprising: generating, by the device, a feature vector based on traffic flow data regarding one or more traffic flows in the network; making, by the device, a determination as to whether the feature vector is already represented in a training dataset dictionary by one or more feature vectors in the training dataset dictionary; updating, by the device, the training dataset dictionary based on the determination as to whether the feature vector is already represented in the training dataset dictionary by one or more feature vectors in the training dataset dictionary, wherein updating the training dataset dictionary comprises one of: adding the feature vector to the training dataset dictionary when the feature vector is not already represented by one or more feature vectors in the training dataset dictionary, or incrementing a cou
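
The dataset-generation step in claims 7 and 8 (adjusting or excluding a traffic type for a target network) might look like the sketch below. Everything here is hypothetical: the claims do not say how traffic types are tagged, what form the parameters take, or how counts translate into the training dataset. This example assumes each dictionary entry carries a `traffic_type` tag and a `label`, that the parameters are per-type multipliers, and that counts become per-sample weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    vector: tuple
    count: int
    traffic_type: str   # assumed tag, e.g. "dns"
    label: str          # assumed class label, e.g. "benign"


def generate_training_dataset(entries, type_weights=None):
    """Build (vector, label, sample_weight) rows from dictionary entries.

    type_weights maps a traffic type to a multiplier applied to the entry's
    count. A multiplier of 0 excludes that type; unlisted types keep 1.0.
    """
    type_weights = type_weights or {}
    rows = []
    for entry in entries:
        weight = entry.count * type_weights.get(entry.traffic_type, 1.0)
        if weight > 0:
            rows.append((entry.vector, entry.label, weight))
    return rows


# Example: the target network carries no SMTP and relatively little DNS.
entries = [
    DictionaryEntry((0.1, 0.2), 40, "dns", "benign"),
    DictionaryEntry((0.9, 0.7), 5, "tls", "malicious"),
    DictionaryEntry((0.3, 0.3), 10, "smtp", "benign"),
]
dataset = generate_training_dataset(entries, {"smtp": 0.0, "dns": 0.5})
```

Using the counts as sample weights is only one possible reading; replicating each vector according to its count would be another way to reflect how often a flow pattern was observed.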

Assignees

Cisco Tech Inc

Inventors

Classifications

  • Traffic logging, e.g. anomaly detection · CPC title

  • Vulnerability analysis · CPC title

  • Countermeasures against malicious traffic (countermeasures against attacks on cryptographic mechanisms H04L9/002) · CPC title

  • by monitoring network traffic (monitoring network traffic per se H04L43/00) · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021357815A1 cover?
In one embodiment, a device in a network generates a feature vector based on traffic flow data regarding one or more traffic flows in the network. The device makes a determination as to whether the generated feature vector is already represented in a training dataset dictionary by one or more feature vectors in the dictionary. The device updates the training dataset dictionary based on the dete…
Who is the assignee on this patent?
Cisco Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 18, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).