Threat detection and mitigation in a virtualized computing environment
US-2019297096-A1 · Sep 26, 2019 · US
US12184672B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12184672-B2 |
| Application number | US-202017761861-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 18, 2020 |
| Priority date | Oct 28, 2019 |
| Publication date | Dec 31, 2024 |
| Grant date | Dec 31, 2024 |
Official abstract text for this publication.
A method for detecting security based on machine learning in combination with rule matching is provided, including: establishing a machine learning model; training the machine learning model by using a labeled legal traffic and a labeled malicious traffic; collecting a network traffic; preprocessing the collected network traffic; detecting a malicious traffic from the preprocessed network traffic by using a rule-matching-based method; identifying a malicious traffic from the preprocessed network traffic by using the trained machine learning model, including: extracting a feature of the preprocessed network traffic, and identifying the malicious traffic based on the extracted feature by using the trained machine learning model; and integrating the malicious traffic detected by the rule-matching-based method and the malicious traffic identified by the trained machine learning model.
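The final step of the abstract, integrating the two detectors' outputs, can be illustrated with a minimal sketch. The publication does not say how the results are merged; the union-with-provenance approach below is an assumption, as are the names `rule_match`, `integrate`, the substring-based rule check, and the `classifier` callable standing in for the trained machine learning model.

```python
def rule_match(payload, rules):
    """Hypothetical rule check: flag a payload that contains any rule pattern."""
    return any(pattern in payload for pattern in rules)


def integrate(flows, rules, classifier):
    """Merge rule-based and model-based detections (assumed: a simple union).

    flows      -- iterable of (flow_id, payload_bytes, feature_vector)
    rules      -- iterable of byte patterns (stand-in for an IDS rule set)
    classifier -- callable returning True for malicious features
                  (stand-in for the trained machine learning model)
    Returns {flow_id: set of detector names that flagged the flow}.
    """
    flagged = {}
    for flow_id, payload, features in flows:
        sources = set()
        if rule_match(payload, rules):
            sources.add("rule")
        if classifier(features):
            sources.add("ml")
        if sources:
            flagged[flow_id] = sources
    return flagged


if __name__ == "__main__":
    demo_flows = [
        ("a", b"GET /etc/passwd", [0.1]),
        ("b", b"hello", [0.9]),
        ("c", b"hello", [0.1]),
    ]
    demo_rules = [b"/etc/passwd"]
    print(integrate(demo_flows, demo_rules, lambda f: f[0] > 0.5))
```

Keeping the name of the detector that raised each flag is one possible design choice; it lets a later display step show whether a flow was caught by a rule, by the model, or by both.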
Opening claim text (preview).
What is claimed is:
1. A method for detecting security based on machine learning in combination with rule matching, comprising: establishing a machine learning model; training the machine learning model by using a labeled legal traffic and a labeled malicious traffic; collecting a network traffic; preprocessing the collected network traffic; detecting a malicious traffic from the preprocessed network traffic by using a rule-matching-based method; identifying a malicious traffic from the preprocessed network traffic by using the trained machine learning model, comprising: extracting a feature of the preprocessed network traffic, and identifying the malicious traffic based on the extracted feature by using the trained machine learning model; and integrating the malicious traffic detected by the rule-matching-based method and the malicious traffic identified by the trained machine learning model, wherein the method further comprises: sampling the collected network traffic according to a specified sampling rule, and the preprocessing the collected network traffic further comprises: preprocessing the sampled network traffic; and wherein the method further comprises: displaying an integrated result through a visualization technology, wherein the sampling the collected network traffic comprises: sampling the collected network traffic by using a flexible sampling algorithm; wherein the flexible sampling algorithm is a data stream record selection algorithm depending on a size of the data stream; for a data stream set $S=\{X_1,\ldots,X_n\}$ with a size $n$, a data stream $x_i'$ with a size $x_i$ is selected from each $X_i$ through the flexible sampling algorithm with a probability $P(x_i)$, $i=1,\ldots,n$, so as to form a new data stream set $S'=\{x_1',\ldots,x_n'\}$; the flexible sampling algorithm aims to make a total number of byte $X'=\sum_{x_i'\in S'} x_i'/P(x_i)$ calculated by sampling approach a total number of byte $X=\sum_{X_i\in S} X_i$ of a real traffic; where $i=1,\ldots,n$.
2. The method according to claim 1, wherein the training the machine learning model by using a labeled legal traffic and a labeled malicious traffic comprises: extracting a time-based feature, a network-layer-based feature and a TTL-based feature from the labeled legal traffic and the labeled malicious traffic; and training the machine learning model based on the extracted features; and wherein the method further comprises: verifying the trained machine learning model by using a verification data set.
3. The method according to claim 1, wherein the network traffic is collected by a GPU, and a data packet in the network traffic is directly copied from a cache queue of a network card to a user space based on a zero copy technology by using a direct memory access structure.
4. The method according to claim 1, wherein the preprocessing the collected network traffic further comprises: performing a data packet reassembling, a protocol decoding and/or an anomaly detection on a data packet in the collected network traffic; wherein the data packet reassembling comprises a stream reassembling and a fragment reassembling, the protocol decoding is to decode a protocol of the data packet into a unified format, and the anomaly detection at least comprises a port scanning; and wherein a result of the preprocessing is a data after the packet reassembling and the protocol decoding in response to the data packet passing the anomaly detection; and an alarm is generated in response to the data packet failing the anomaly detection.
5. The method according to claim 1, wherein the detecting a malicious traffic from the preprocessed network traffic by using a rule-matching-based method comprises: detecting the malicious traffic by using a PFAC algorithm; wherein a separate thread is created for each byte of an input data stream through the PFAC algorithm, so as to identify a mode starting from a starting position of the thread, and the number of created thread equals to a length of the input data stream; wherein each thread of the PFAC algorithm is only responsible for identifying the mode starting from the starting position of the thread, and terminating in response to the thread failing to find any mode located at the starting position of the thread, without a fault transition by a backtracking state machine; each final state of the PFAC algorithm represents a specified mode, so that a uniqueness of the each final state in the PFAC is maintained without processing a plurality of outputs; wherein a payload of the data stream is matched and verified with a plurality of rules in a rule set of an intrusion detection at the same time and in parallel through the PFAC algorithm, and in response to a match existing, the data stream is marked as the malicious traffic and an alarm is triggered.
6. The method according to claim 1, wherein the extracting a feature of the preprocessed network traffic comprises: extracting a source port, a source address, a destination port, a destination address, an ICMP type, a protocol identifier, an original data length and an original data.
7. The method according to claim 6, wherein the extracting a feature of the preprocessed network traffic comprises: implementing a hash table in a GPU, wherein the hash table is used to maintain and track an index of a feature data of each active traffic in the network traffic, and a specified hash value for each data unit is used to determine a specified data stream; wherein an atomic lock is used on each mutually exclusive hash entry, so that only one thread is allowed to update a hash entry of the thread at each moment; a data stream corresponding to a feature data becomes inactive in response to the feature data being transmitted, so as to trigger an operation of deleting the feature data corresponding to the data stream from the hash table; and for each data stream in the network traffic, a moment of a last-arrived packet is recorded in the hash table, wherein a threshold-based method is used to determine an inactive data stream, the threshold-based method comprises: determining that the feature data corresponding to the data stream is inactive in response to a time interval exceeding a threshold; wherein a feature data of the inactive data stream is output by providing a timing task, and the trained machine learning model is used for classifying based on the feature data.
8. The method according to claim 1, wherein the steps of establishing and training the machine learning model are performed offline, and the steps of the collecting, preprocessing, detecting, identifying and integrating are performed online.
9. A device for detecting security based on machine learning in combination with rule matching, comprising: a processor; and a memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to: establish a machine learning model; train the machine learning model by using a labeled legal traffic and a labeled malicious traffic; collect a network traffic; preprocess the collected network traffic; detect a malicious traffic from the preprocessed network traffic by using a rule-matching-based method; identify a malicious traffic from the preprocessed network traffic by using the trained machine learning model, comprising: extract a feature of the preprocessed network traffic, and identify the malicious traffic based on the extracted feature by using the trained machine learning model; and integrate the malicious traffic detected by the rule-matching-based method and the malicious traffic identified by the trained machine learning model, wherein the instructions, when executed by the processor, further cause the processor to: sampl
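The size-dependent sampling recited in claim 1 can be illustrated with a short sketch of the byte-count estimate, in which each kept data stream of size x contributes x / P(x) to the estimated total. The claim text shown here does not give a formula for P(x); the rule P(x) = min(1, x / threshold) used below is an assumption chosen only for illustration, and the function names and the `threshold` parameter are hypothetical.

```python
import random


def selection_probability(size, threshold):
    """Assumed size-dependent rule (not from the claim): P(x) = min(1, x / threshold)."""
    return min(1.0, size / threshold)


def sample_flows(sizes, threshold, rng):
    """Keep each flow independently with probability P(size).

    Returns (size, probability) pairs for the kept flows. Sizes must be positive.
    """
    kept = []
    for size in sizes:
        p = selection_probability(size, threshold)
        if rng.random() < p:
            kept.append((size, p))
    return kept


def estimate_total_bytes(kept):
    """Estimated total bytes: the sum of size / P(size) over the kept flows."""
    return sum(size / p for size, p in kept)


if __name__ == "__main__":
    rng = random.Random(0)
    flow_sizes = [40, 1500, 60, 9000, 500, 120000]
    kept = sample_flows(flow_sizes, threshold=1000, rng=rng)
    print("true total:", sum(flow_sizes))
    print("estimate from one sample:", estimate_total_bytes(kept))
```

Dividing each kept size by its selection probability makes the estimate equal the true total in expectation, which matches the stated aim that the sampled byte count approach that of the real traffic. Under the assumed rule, large flows are always kept and only small flows are thinned out.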
Machine learning · CPC title
Inference or reasoning models · CPC title
Ensemble learning · CPC title
at the transport layer · CPC title
involving simulating, designing, planning or modelling of a network · CPC title