Robust representation of network traffic for detecting malware variations

US10187412B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10187412-B2
Application number	US-201514946156-A
Country	US
Kind code	B2
Filing date	Nov 19, 2015
Priority date	Aug 28, 2015
Publication date	Jan 22, 2019
Grant date	Jan 22, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are presented that identify malware network communications between a computing device and a server based on a cumulative feature vector generated from a group of network traffic records associated with communications between computing devices and servers. Feature vectors are generated, each vector including features extracted from the network traffic records in the group. A self-similarity matrix is computed for each feature which is a representation of the feature that is invariant to an increase or a decrease of feature values across all feature vectors in the group. Each self-similarity matrix is transformed into corresponding histograms to be invariant to a number of network traffic records in the group. The cumulative feature vector is a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records and is generated based on the corresponding histograms.
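The representation described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the absolute difference as the distance, the fixed bin count and value range, the use of every matrix entry in the histogram, and all function names are assumptions made for illustration.

```python
import numpy as np


def self_similarity(values):
    """Pairwise distance matrix for one feature across all records in a group.

    Assumption: the distance is the absolute difference, so adding the same
    constant to every value leaves the matrix unchanged (shift invariance).
    """
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])


def matrix_histogram(matrix, bins, value_range):
    """Histogram of all matrix entries, normalized by the number of entries.

    The output length depends only on `bins`, not on how many records the
    group contains. Entries outside `value_range` are ignored by np.histogram.
    """
    counts, _ = np.histogram(matrix, bins=bins, range=value_range)
    return counts / matrix.size


def cumulative_feature_vector(feature_vectors, bins=10, value_range=(0.0, 1000.0)):
    """Concatenate the per-feature histograms into one vector for the group.

    `feature_vectors` has one row per traffic record and one column per feature.
    """
    x = np.asarray(feature_vectors, dtype=float)
    histograms = [
        matrix_histogram(self_similarity(x[:, k]), bins, value_range)
        for k in range(x.shape[1])
    ]
    return np.concatenate(histograms)


# Hypothetical group of three records with two features each.
group = np.array([[120.0, 10.0], [450.0, 10.0], [910.0, 35.0]])
vector = cumulative_feature_vector(group)
```

Under these assumptions the resulting vector has a fixed length (number of features times number of bins) and stays the same when every value of a feature is shifted by a constant or when the records are reordered.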

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: at a networking device, dividing network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time; generating a set of feature vectors, each feature vector of the set of features vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records; computing a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record; transforming each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records; generating a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records; training a classifier based on the cumulative feature vector to produce a trained classifier; classifying, by the trained classifier, the at least one group as being malicious; and identifying a malware network communication between the computing device and the server utilizing the at least one classified group, wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication.

2. The method of claim 1, further comprising: transforming each self-similarity matrix into a locally-scaled self-similarity matrix, each locally-scaled self-similarity matrix being a representation of the one feature of the predefined set of features that is invariant to values of the one feature across all of the feature vectors being multiplied by a common factor.

3. The method of claim 1, wherein generating the cumulative feature vector comprises concatenating the histograms in the set of histograms to form the cumulative feature vector.

4. The method of claim 1, wherein the variations and modifications of the malware network communication include a variation in one or more of: a shift of the flow-based features, a scale of the flow-based features, a permutation of the flow-based features, a number of the flow-based features, or in a size of the at least one group of network traffic records, and further comprising transforming a representation of the at least one group of network traffic records to be invariant against the variations and modifications of the malware network communication.

5. The method of claim 1, wherein the network traffic records include proxy logs and network flow reports, and wherein the predefined set of flow-based feature values includes values describing a structure of a Uniform Resource Locator (URL), a number of bytes transferred from the server to the computing device, a status of a user agent, a Hypertext Transfer Protocol (HTTP) status, a Multipurpose Internet Mail Extension (MIME) type, and a port value.

6. The method of claim 1, wherein the self-similarity matrix is a symmetric positive semidefinite matrix in which the rows and columns represent individual network communications between the computing device and the server.

7. The method of claim 6, further comprising: scaling all values in the self-similarity matrix into an interval [0,1] to produce scale invariance.

8. An apparatus comprising: one or more processors; one or more memory devices in communication with the one or more processors; and at least one network interface unit coupled to the one or more processors, wherein the one or more processors are configured to: divide network traffic records to create at least one group of network traffic records, the at least one group including network traffic records being associated with network communications between a computing device and a server for a predetermined period of time; generate a set of feature vectors, each feature vector of the set of feature vectors representing one of the network traffic records of the network communications included in the at least one group of network traffic records, wherein each feature vector comprises a predefined set of features extracted from one of the network traffic records; compute a self-similarity matrix for each feature of the predefined set of features using all feature vectors generated for the at least one group, each self-similarity matrix being a representation of one feature of the predefined set of features that is invariant to an increase or a decrease of values of the one feature across all of the feature vectors generated for the at least one group of network traffic records, each self-similarity matrix including a plurality of elements in rows and columns, wherein an (i, j)-th element of a self-similarity matrix corresponds to a distance between a feature value of an i-th network traffic record and a feature value of a j-th network traffic record; transform each self-similarity matrix into a corresponding histogram to form a set of histograms, each histogram being a representation of the one feature that is invariant to a number of network traffic records in the at least one group of network traffic records; generate a cumulative feature vector based on the set of histograms, the cumulative feature vector being a cumulative representation of the predefined set of features of all network traffic records included in the at least one group of network traffic records; train a classifier based on the cumulative feature vector to produce a trained classifier; classify, by the trained classifier, the at least one group as being malicious; and identify a malware network communication between the computing device and the server utilizing the at least one classified group, wherein the cumulative feature vector enables detection of variations and modifications of the malware network communication.

9. The apparatus of claim 8, wherein the one or more processors are configured to: transform each self-similarity matrix into a locally-scaled self-similarity matrix, each locally-scaled self-similarity matrix being a representation of the one feature of the predefined set of features that is invariant to values of the one feature across all of the feature vectors being multiplied by a common factor.

10. The apparatus of claim 8, wherein the one or more processors generate the cumulative feature vector by concatenating the histograms in the set of histograms to form the cumulative feature vector.

11. The apparatus of claim 8, wherein the variations and modifications of the malware network communication include a variation in one or more of: a shift of the flow-based features, a scale of the flow-based features, a permutation of the flow-based features, a number of the…
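Claims 2 and 7 describe scaling the self-similarity matrix so that the representation no longer changes when all values of a feature are multiplied by a common factor. One way this could look is sketched below. Dividing by the largest distance in the matrix is an assumption chosen for illustration; the claims do not state which scaling the patent uses, and the function names are hypothetical.

```python
import numpy as np


def scaled_self_similarity(values):
    """Absolute-difference matrix for one feature, scaled into [0, 1].

    Assumption: scaling divides by the largest pairwise distance. If all
    values are equal the matrix is all zeros and is returned unchanged.
    """
    v = np.asarray(values, dtype=float)
    dist = np.abs(v[:, None] - v[None, :])
    peak = dist.max()
    return dist / peak if peak > 0 else dist


def scaled_histogram(values, bins=8):
    """Normalized histogram of the scaled matrix over the fixed range [0, 1]."""
    matrix = scaled_self_similarity(values)
    counts, _ = np.histogram(matrix, bins=bins, range=(0.0, 1.0))
    return counts / matrix.size


# Hypothetical feature values for three records, then the same values
# multiplied by a common factor and shifted by a constant.
original = np.array([2.0, 4.0, 10.0])
modified = 4.0 * original + 16.0

same_matrix = np.allclose(
    scaled_self_similarity(original), scaled_self_similarity(modified)
)
```

Because the matrix entries always lie in [0, 1], the histogram range can be fixed in advance, which avoids having to choose a separate value range for each feature.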

Assignees

Cisco Tech Inc

Classifications

H04L63/1425 · Primary
Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 58096972

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10187412B2 cover?: Techniques are presented that identify malware network communications between a computing device and a server based on a cumulative feature vector generated from a group of network traffic records associated with communications between computing devices and servers. Feature vectors are generated, each vector including features extracted from the network traffic records in the group. A self-simi…
Who is the assignee on this patent?: Cisco Tech Inc
What technology area does this patent fall under?: Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?: Publication date Jan 22, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).