Dynamic feature selection for joint probabilistic recognition
US-2016063358-A1 · Mar 3, 2016 · US
US9923912B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9923912-B2 |
| Application number | US-201514960086-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 4, 2015 |
| Priority date | Aug 28, 2015 |
| Publication date | Mar 20, 2018 |
| Grant date | Mar 20, 2018 |
Abstract
Techniques are presented that identify malware network communications between a computing device and a server utilizing a detector process. Network traffic records are classified as either malware or legitimate network traffic records and divided into groups of classified network traffic records associated with network communications between the computing device and the server for a predetermined period of time. A group of classified network traffic records is labeled as malicious when at least one of the classified network traffic records in the group is malicious and as legitimate when none of the classified network traffic records in the group is malicious to obtain a labeled group of classified network traffic records. A detector process is trained on individual classified network traffic records in the labeled group of classified network traffic records and network communication between the computing device and the server is identified as malware network communication utilizing the detector process.
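The abstract's grouping and labeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `TrafficRecord` fields, the one-hour `window_seconds` default, and the choice to key groups by fixed, non-overlapping time windows are all hypothetical. The patent only says that records are grouped per device/server pair over a predetermined period, and that a group is malicious if any of its records is malicious.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficRecord:
    # Hypothetical schema; the patent does not define record fields.
    client: str        # the computing device (e.g. an IP address)
    server: str        # the contacted server or domain
    timestamp: float   # seconds since epoch
    malicious: bool    # per-record classification, which may be flawed


def group_and_label(records, window_seconds=3600.0):
    """Group records per (client, server, time window) and label each group.

    A group is labeled malicious (True) if at least one of its records is
    malicious, and legitimate (False) if none is.
    """
    groups = defaultdict(list)
    for r in records:
        # Fixed, non-overlapping windows stand in for the "predetermined period".
        window = int(r.timestamp // window_seconds)
        groups[(r.client, r.server, window)].append(r)
    return {key: (members, any(m.malicious for m in members))
            for key, members in groups.items()}
```

Only the group labels are meant to reach training. The per-record `malicious` flags stand in for the possibly flawed classification that the multi-instance learner is supposed to tolerate.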
Claims (excerpt)
What is claimed is:
1. A computer-implemented method comprising: at a networking device, classifying network traffic records as either malware network traffic records or legitimate network traffic records, wherein a subset of the classified network traffic records is classified with flaws; dividing classified network traffic records into at least one group of classified network traffic records, the at least one group including classified network traffic records associated with network communications between a computing device and a server for a predetermined period of time; labeling the at least one group of classified network traffic records as malicious when at least one of the classified network traffic records in the at least one group is malicious or labeling the at least one group of classified network traffic records as legitimate when none of the classified network traffic records in the at least one group is malicious to obtain at least one labeled group of classified network traffic records; training a detector process on individual classified network traffic records in the at least one labeled group of classified network traffic records to learn a flow-level model based on the labeling of the at least one group of classified network traffic records, wherein the detector process is a Neyman-Pearson (NP) detector process combined with a Multi Instance Learning (MIL) algorithm; and identifying malware network communications between the computing device and the server utilizing the flow-level model of the detector process, wherein the NP detector process reduces a false negative rate of detection results to achieve a predetermined false positive rate of the detection results when identifying the malware network communication, and wherein the MIL algorithm reduces an impact of flawed classified network traffic records on an accuracy of the detector process in identifying malware network communication.
2. The method of claim 1, wherein the network traffic records include proxy logs, and wherein the classifying comprises analyzing proxy log domains of the proxy logs to classify the network traffic records.
3. The method of claim 1, wherein the classifying comprises classifying network traffic records based on blacklists, domain reputation, security reports and sandboxing analysis results.
4. The method of claim 3, further comprising: repeatedly retraining the detector process based on updated blacklists, domain reputation, security reports and sandboxing analysis results.
5. The method of claim 1, wherein a false positive rate of the detector process is determined by an instance that has a maximal distance from a malicious decision hyperplane.
6. The method of claim 1, wherein the MIL algorithm reduces a weighted sum of errors made by the detector process on the at least one labeled group of classified network traffic records and allows the subset of the classified network traffic records that is classified with flaws.
7. The method of claim 1, wherein training the detector process comprises: estimating a number of false positive detection results and a number of false negative detection results for results generated by the NP detector process; formulating a learning criterion for training the NP detector process and solving an optimization problem by using a parameter to weight the estimated numbers of the false positive detection results and the false negative detection results; randomly generating parameters for a stochastic gradient descent (SGD) function; repeatedly executing the SGD function using the randomly generated parameters thereby optimizing operating parameters of the NP detector process.
8. An apparatus comprising: one or more processors; one or more memory devices in communication with the one or more processors; and at least one network interface unit coupled to the one or more processors, wherein the one or more processors are configured to: classify network traffic records as either malware network traffic records or legitimate network traffic records, wherein a subset of the classified network traffic records is classified with flaws; divide classified network traffic records into at least one group of classified network traffic records, the at least one group including classified network traffic records associated with network communications between a computing device and a server for a predetermined period of time; label the at least one group of classified network traffic records as malicious when at least one of the classified network traffic records in the at least one group is malicious or label the at least one group of classified network traffic records as legitimate when none of the classified network traffic records in the at least one group is malicious to obtain at least one labeled group of classified network traffic records; train a detector process on individual classified network traffic records in the at least one labeled group of classified network traffic records to learn a flow-level model based on the labeling of the at least one group of classified network traffic records, wherein the detector process is a Neyman-Pearson (NP) detector process combined with a Multi Instance Learning (MIL) algorithm; and identify malware network communications between the computing device and the server utilizing the flow-level model of the detector process, wherein the NP detector process reduces a false negative rate of detection results to achieve a predetermined false positive rate of the detection results when identifying the malware network communication, and wherein the MIL algorithm reduces an impact of flawed classified network traffic records on an accuracy of the detector process in identifying malware network communication.
9. The apparatus of claim 8, wherein the network traffic records include proxy logs, and wherein the one or more processors are configured to classify network traffic records by analyzing proxy log domains of the proxy logs to classify the network traffic records.
10. The apparatus of claim 8, wherein the one or more processors are configured to classify network traffic records based on blacklists, domain reputation, security reports and sandboxing analysis results.
11. The apparatus of claim 10, wherein the one or more processors are configured to: repeatedly retrain the detector process based on updated blacklists, domain reputation, security reports and sandboxing analysis results.
12. The apparatus of claim 8, wherein a false positive rate of the detector process is determined by an instance that has a maximal distance from a malicious decision hyperplane.
13. The apparatus of claim 8, wherein the MIL algorithm reduces a weighted sum of errors made by the detector process on the at least one labeled group of classified network traffic records and allows the subset of the classified network traffic records that is classified with flaws.
14. The apparatus of claim 8, wherein the one or more processor is configured to train the detector process by: estimating a number of false positive detection results and a number of false negative detection results for results generated by the NP detector process; formulating a learning criterion for training the NP detector process and solving an optimization problem by using a parameter to weight the estimated numbers of the false positive detection results and the false negative detection results; randomly generating parameters for a stochastic gradient descent (SGD) function; repeatedly executing the SGD function using the randomly generated parameters thereby optimizing operating parameters of the
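The training steps in claims 1, 5, 6 and 7 can be sketched as a linear multi-instance detector trained by SGD. This is a hypothetical illustration, not the patented method. The claims name the ingredients (a weighted sum of false positive and false negative errors, a group decision driven by the instance farthest on the malicious side of the hyperplane, randomly generated SGD parameters, repeated SGD runs) but not the exact formulas. The following choices are all assumptions: the hinge-loss surrogate, max-pooling of record scores per group, random initial weights as the "randomly generated parameters", and the rule "fewest misses among runs meeting the target false positive rate" as the Neyman-Pearson step. The names `lam` and `target_fp_rate` are likewise hypothetical.

```python
import random


def score(w, b, x):
    """Signed score w.x + b of one record's feature vector x, proportional to
    its distance from the decision hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w, b, bag):
    """MIL pooling (assumed): a group is scored by its most suspicious record."""
    return max(score(w, b, x) for x in bag)


def error_counts(w, b, bags, labels):
    """Group-level false positives (legitimate groups flagged) and false negatives."""
    fp = sum(1 for bag, y in zip(bags, labels) if not y and bag_score(w, b, bag) > 0)
    fn = sum(1 for bag, y in zip(bags, labels) if y and bag_score(w, b, bag) <= 0)
    return fp, fn


def train_np_mil(bags, labels, lam=0.5, target_fp_rate=0.05, restarts=5,
                 epochs=200, lr=0.05, seed=0):
    """Train a linear detector (w, b) with SGD from several random starts.

    Criterion (assumed): lam * hinge loss on legitimate groups (FP surrogate)
    + (1 - lam) * hinge loss on malicious groups (FN surrogate). Among restarts,
    keep the model with the fewest training false negatives whose false
    positive rate meets target_fp_rate (an NP-style selection).
    """
    rng = random.Random(seed)
    dim = len(bags[0][0])
    n_legit = sum(1 for y in labels if not y) or 1
    best = None
    for _ in range(restarts):
        # Randomly generated starting parameters for this SGD run.
        w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        b = 0.0
        order = list(range(len(bags)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                bag, y = bags[i], labels[i]
                # Only the max-score record of the group receives a gradient,
                # so flawed benign-looking records in malicious groups are tolerated.
                x = max(bag, key=lambda v: score(w, b, v))
                sign = 1.0 if y else -1.0
                weight = (1.0 - lam) if y else lam
                if sign * score(w, b, x) < 1.0:  # hinge term is active
                    w = [wi + lr * weight * sign * xi for wi, xi in zip(w, x)]
                    b += lr * weight * sign
        fp, fn = error_counts(w, b, bags, labels)
        infeasible = fp / n_legit > target_fp_rate
        key = (infeasible, fn, fp)  # feasible first, then fewest misses
        if best is None or key < best[0]:
            best = (key, w, b)
    return best[1], best[2]
```

Raising `lam` penalizes errors on legitimate groups more heavily, so one way to approximate a Neyman-Pearson criterion is to sweep it and keep the feasible model with the fewest misses. Such a sweep is a design choice assumed here and is not stated in the claims.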
Proxies · CPC title
by executing in a restricted environment, e.g. sandbox or secure virtual machine · CPC title
Traffic logging, e.g. anomaly detection · CPC title