System for classifying encrypted traffic based on data packet

US12438819B2 · US · B2

Patent metadata
Publication number: US-12438819-B2
Application number: US-202318386251-A
Country: US
Kind code: B2
Filing date: Nov 1, 2023
Priority date: Mar 18, 2022
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

Abstract

Official abstract text for this publication.

Disclosed is a system for classifying encrypted traffic based on a data packet. The system includes a traffic capture module, a traffic analysis module, and a traffic classification module. The system collects data packets from a network flow to construct a machine learning model, so as to classify encrypted traffic and identify normal traffic and malicious traffic. In a process of constructing a feature matrix, basic spatial-temporal features, header features, load features, and statistical features are obtained. In addition, behavioral features of the data packets are obtained and used to demonstrate differences between the normal traffic and the malicious traffic. Moreover, the present disclosure focuses on a difference between different versions of an encryption protocol, especially a transport layer security (TLS) protocol, and introduces the difference into a model for analysis, so that the system classifies encrypted traffic in a more efficient manner.
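The abstract lists statistical features as one input to the feature matrix, and claim 1 defines them as the average packet length, the maximum packet length, the average inter-packet delay, the uplink-to-downlink packet-count ratio, and the uplink-to-downlink byte ratio, computed over a flow of all packets exchanged between two IP address and port pairs. The sketch below illustrates one way to compute them. It is not the patent's implementation: the `Packet` record, `flow_key`, and `statistical_features` are hypothetical names, uplink is assumed to mean "sent by a caller-supplied client endpoint", and returning `inf` when there is no downlink traffic is an arbitrary placeholder choice.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record; field names are illustrative assumptions."""
    timestamp: float  # capture time in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    length: int  # packet length in bytes


def flow_key(pkt: Packet) -> tuple:
    """Direction-independent flow key: the two (IP, port) endpoints in sorted order,
    so packets A->B and B->A fall into the same flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (a, b) if a <= b else (b, a)


def statistical_features(packets: list[Packet], client: tuple[str, int]) -> dict[str, float]:
    """Compute the five statistical features listed in claim 1 for one flow.

    Assumption: packets sent by `client` count as uplink, all others as downlink.
    """
    pkts = sorted(packets, key=lambda p: p.timestamp)
    lengths = [p.length for p in pkts]
    # Inter-packet delays between consecutive packets in time order.
    delays = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    up = [p for p in pkts if (p.src_ip, p.src_port) == client]
    down = [p for p in pkts if (p.src_ip, p.src_port) != client]
    up_bytes = sum(p.length for p in up)
    down_bytes = sum(p.length for p in down)
    return {
        "avg_len": mean(lengths),
        "max_len": max(lengths),
        "avg_delay": mean(delays) if delays else 0.0,
        # Ratios are undefined without downlink traffic; inf is a placeholder choice.
        "pkt_ratio": len(up) / len(down) if down else float("inf"),
        "byte_ratio": up_bytes / down_bytes if down_bytes else float("inf"),
    }
```

For example, a four-packet exchange between a client and a server on port 443 (lengths 100, 1500, 200, 1200 bytes) yields an average length of 750 bytes, a maximum of 1500, and an uplink/downlink packet ratio of 1.0.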

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for classifying encrypted traffic based on a data packet, comprising a traffic capture module, a traffic analysis module, and a traffic classification module, wherein

the traffic capture module is configured to filter data packet information in a network flow by identifying an IP address, a port number, a protocol type, and a flag bit in traffic, to obtain flow data, wherein the network flow refers to all data packets transmitted between two IP addresses and ports corresponding to the two IP addresses;

the traffic analysis module is configured to: extract transport layer security (TLS), hypertext transfer protocol (HTTP), and domain name system (DNS) protocol information and related fields from the flow data; extract information about data packets in the flow data; and perform a cluster analysis on information about sizes, flow directions, and delays of the data packets, to extract spatial-temporal features, header features, load features, and statistical features from the flow data, wherein the spatial-temporal features refer to temporal attributes and spatial attributes of data packets that are normally sent in a network traffic transmission process, the header features comprise 5-tuple information of the traffic, DNS information, and HTTP information, the load features refer to content encapsulated in the flow data, and the statistical features comprise an average packet length, a maximum packet length, an average inter-packet delay, a ratio of a quantity of uplink data packets to a quantity of downlink data packets, and a ratio of a quantity of uplink bytes to a quantity of downlink bytes; and

the traffic classification module is configured to classify normal data packets and malicious data packets through k-means clustering, wherein an input dataset is in a format of $D = \{x_1, x_2, \ldots, x_i\}$, and an output is a classification result $C = \{C_1, C_2\}$, wherein $C_1$ and $C_2$ represent labels of normal traffic and malicious traffic respectively; and a specific classification process comprises:

first, randomly selecting two samples from the dataset $D$, to constitute a centroid set $\{\mu_1, \mu_2\}$, wherein a centroid of the set is represented by $\mu_j$;

then, calculating a distance between each sample $x_i$ and the centroid $\mu_j$, wherein the distance is calculated based on the following formula:

$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2$$

next, recalculating a centroid of the set $C$ based on the following formula:

$$\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i$$

subsequently, calculating distances between each sample and the two centroids; allocating each sample to the centroid that is closest to the sample, wherein the centroid and the samples allocated to the centroid constitute a cluster; and after all samples are allocated, outputting a clustering result if no centroid vector is changed, wherein the following clustering result is finally output: $C = \{C_1, C_2\}$;

after categories of the normal data packets and the malicious data packets in the traffic are obtained, calculating a proportion of the normal data packets in the traffic, a proportion of the malicious data packets in the traffic, and a ratio of the normal data packets to the malicious data packets; and adding, as parameters to a feature matrix, the proportion of the normal data packets in the traffic, the proportion of the malicious data packets in the traffic, and the ratio of the normal data packets to the malicious data packets, to finally obtain a sample set $S = \{S_1, S_2 \mid x_i \in S\}$, wherein $x_i$ is a sample in the set $S$;

after the sample set is input, using a light gradient-boosting machine (LightGBM) model for classification, so as to obtain a traffic classification result, wherein a Gini coefficient expression of probability distribution is:

$$\mathrm{Gini}(p) = 2p(1 - p)$$

wherein $p$ represents a probability of being normal traffic; a loss function that is used is a log-likelihood loss function, and the log-likelihood loss function is calculated based on the following formula:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log p_i + (1 - y_i) \log(1 - p_i) \right)$$

wherein $L$ represents the loss function, $N$ represents a quantity of samples, $y_i$ represents a true category of an input instance, and $p_i$ represents a predicted probability that the input instance belongs to a normal traffic category.

2. The system for classifying encrypted traffic based on a data packet according to claim 1, wherein the temporal attributes of the data packets comprise time points at which the data packets are sent and inter-packet delays.

3. The system for classifying encrypted traffic based on a data packet according to claim 1, wherein the spatial attributes of the data packets comprise lengths of the data packets, directions in which the data packets are sent, and a quantity of the data packets.

4. The system for classifying encrypted traffic based on a data packet according to claim 1, wherein the DNS information comprises a DNS domai…
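Claim 1's k-means step fixes k = 2: pick two random samples as initial centroids, assign each sample by squared Euclidean distance $d_{ij} = \lVert x_i - \mu_j \rVert_2^2$, recompute each centroid as its cluster mean, and stop once no centroid changes; the cluster shares then become extra feature-matrix columns. The following is a minimal pure-Python sketch of that procedure, assuming feature vectors are equal-length numeric tuples. `sq_dist`, `kmeans2`, and `proportion_features` are hypothetical names, and the `max_iter` cap and the handling of an empty cluster are safeguards the claim does not mention. The claim also does not say which of the two clusters is the normal one, so the sketch takes that label as a parameter.

```python
import random


def sq_dist(x, mu):
    """Squared Euclidean distance d_ij = ||x_i - mu_j||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def kmeans2(data, seed=0, max_iter=100):
    """Two-cluster k-means following the steps listed in claim 1.

    Returns (labels, centroids); labels[i] is 0 or 1.
    """
    rng = random.Random(seed)
    # Step 1: two randomly chosen samples become the initial centroids.
    centroids = [list(x) for x in rng.sample(data, 2)]
    labels = [0] * len(data)
    for _ in range(max_iter):  # iteration cap is an added safeguard
        # Step 2: allocate each sample to its closest centroid.
        labels = [min((0, 1), key=lambda j: sq_dist(x, centroids[j])) for x in data]
        # Step 3: recompute each centroid as the mean of its cluster members.
        new = []
        for j in (0, 1):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # assumption: keep an empty cluster's centroid
        # Step 4: stop once no centroid vector has changed.
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def proportion_features(labels, normal_label):
    """Proportion of normal samples, proportion of malicious samples,
    and the normal-to-malicious ratio, as listed in claim 1."""
    total = len(labels)
    n_normal = sum(1 for lab in labels if lab == normal_label)
    n_malicious = total - n_normal
    return {
        "p_normal": n_normal / total,
        "p_malicious": n_malicious / total,
        # inf when no malicious samples exist is a placeholder choice.
        "normal_to_malicious": n_normal / n_malicious if n_malicious else float("inf"),
    }
```

On two well-separated groups of points, for example three near the origin and four near (10, 10), the sketch splits them into the two expected clusters; if the larger cluster is taken as normal, the proportions are 4/7 and 3/7 and the ratio is 4/3.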
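Claim 1 states two formulas for the LightGBM stage: the binary Gini expression $\mathrm{Gini}(p) = 2p(1 - p)$ and the mean log-likelihood loss over $N$ samples. The short sketch below only evaluates those two formulas; it is not LightGBM's internal code. It assumes $y_i = 1$ marks normal traffic, which matches $p_i$ being the predicted probability of the normal class, and it clips probabilities away from 0 and 1 to avoid $\log 0$, a numerical safeguard the claim does not mention. `gini` and `log_loss` are hypothetical names.

```python
import math


def gini(p: float) -> float:
    """Binary Gini expression from claim 1: Gini(p) = 2p(1 - p)."""
    return 2 * p * (1 - p)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean log-likelihood loss from claim 1:
    L = -(1/N) * sum(y_i*log(p_i) + (1 - y_i)*log(1 - p_i)).

    y_true[i] is 1 for normal traffic and 0 otherwise (assumed encoding);
    p_pred[i] is the predicted probability of the normal class.
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n
```

The Gini expression peaks at 0.5 when $p = 0.5$ (maximum uncertainty) and is 0 when $p$ is 0 or 1. Two confident correct predictions, such as $p = 0.8$ for a normal sample and $p = 0.2$ for a malicious one, give a loss of $-\log 0.8 \approx 0.223$.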

Assignee

Univ Guangzhou

Classifications (CPC group titles)

  • Event detection, e.g. attack signature detection

  • relying on flow classification, e.g. using integrated services [IntServ]

  • Traffic logging, e.g. anomaly detection

  • relating to the classification model, e.g. parametric or non-parametric approaches

  • with fixed number of clusters, e.g. K-means clustering

Frequently asked questions

Who is the assignee on this patent?
Univ Guangzhou
What technology area does this patent fall under?
The primary CPC classification is H04L47/2441. Mapped technology areas include Electricity.
When was this patent published?
Published Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).