Mdl-based clustering for application dependency mapping

US2016359697A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016359697-A1
Application numberUS-201615145666-A
CountryUS
Kind codeA1
Filing dateMay 3, 2016
Priority dateJun 5, 2015
Publication dateDec 8, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Application dependency mapping (ADM) can be automated in a network. The network can determine an optimum number of clusters for the network using the minimum description length principle (MDL). The network can capture network and associated data using a sensor network that provides multiple perspectives and generate a graph therefrom. The nodes of the graph can include sources, destinations, and destination ports identified in the captured data, and the edges of the graph can include observed flows from the sources to the destinations at the destination ports. Each clustering can be evaluated according to an MDL score. The optimum number of clusters for the network may correspond to the number of clusters of the clustering associated with the minimum MDL score.

First claim

Opening claim text (preview).

1 . A method comprising: capturing network data using at least a first plurality of sensors executing on a plurality of endpoints of a network and a second plurality of sensors executing on a plurality of networking devices connected to the plurality of endpoints; determining a graph using the network data, the graph including at least a plurality of nodes corresponding to the plurality of endpoints and a plurality of edges corresponding to a plurality of observed flows included in the network data; and identifying a first clustering corresponding to a minimum description length (MDL) score that is computed based at least in part on observed edges of the graph and unobserved edges of the graph; and determining an optimum number of clusters for the plurality of endpoints based on a number of clusters for the first clustering. 2 . The method of claim 1 , wherein identifying the first clustering includes: computing a number of the observed edges for a first cluster combination of the first clustering; computing a number of the unobserved edges for the first cluster combination; determining a minimum value between the number of the observed edges and the number of the unobserved edges for the first cluster combination; determining the MDL score for the first clustering by summing the minimum value for the first cluster combination and a respective minimum value for each remaining cluster combination of the first clustering; and determining a minimum MDL score from a respective MDL score of each clustering of the plurality of endpoints, wherein the first clustering corresponds to the minimum MDL score. 3 . The method of claim 2 , wherein the first cluster combination includes a source cluster, a destination cluster, and a destination port. 4 . The method of claim 1 , wherein the network includes a plurality of partitions, and the method further comprises: determining a random order for the plurality of partitions; determining, in the random order, a respective MDL score for each of the plurality of partitions and each potential number of clusters; and determining a respective number of clusters for each of the plurality of partitions based at least in part on the respective MDL score for each of the plurality of partitions and each potential number of clusters. 5 . The method of claim 1 , wherein the graph further includes a second plurality of nodes corresponding to a plurality of destination endpoint ports. 6 . The method of claim 1 , further comprising: capturing data in a second domain corresponding to the network data, wherein the graph further includes a third plurality of nodes corresponding to an attribute of the second domain. 7 . The method of claim 6 , wherein the second domain is at least one of a host domain, a process domain, a user domain, a virtualization domain, or a tenant domain. 8 . The method of claim 1 , further comprising: determining a first weight of a first edge of the plurality of edges based on a number of flows corresponding to the first edge, wherein the MDL score is further based at least in part on the first weight. 9 . The method of claim 1 , wherein each cluster of the first clustering corresponds to a respective endpoint group. 10 . The method of claim 1 , wherein a first endpoint of the plurality of endpoints is a virtual partition, the first plurality of sensors includes a first sensor executing within the virtual partition and a second sensor executing on a physical server hosting the virtual partition. 11 . The method of claim 1 , further comprising: determining an application dependency between a first cluster of the first clustering and a second cluster of the first clustering; and determining a policy allowing traffic from the first cluster to the second cluster. 12 . The method of claim 11 , further comprising: enforcing the policy within a simulated environment corresponding to the network using at least one of historical ground truth traffic data or real time ground truth traffic data. 13 . A system comprising: a processor; and memory including instructions that, upon being executed by the processor, cause the system to: capture network data using at least a first respective sensor executing on each of a plurality of endpoints of a network and a second respective sensor executing on each of a plurality of networking devices connected to the plurality of endpoints; determine a graph using the network data, the graph including at least a plurality of nodes, corresponding to the plurality of endpoints and a plurality of endpoint ports, and a plurality of edges corresponding to a plurality of observed flows included in the network data; and determine an optimum number of clusters for the plurality of endpoints based on a number of clusters of a first clustering that corresponds to a minimum description length (MDL) score. 14 . The system of claim 13 , wherein the instructions upon being executed further cause the system to: for each potential clustering for the plurality of endpoints, compute a respective number of observed edges of the graph for each cluster combination of the potential clustering; compute a respective number of unobserved edges of the graph for each cluster combination of the potential clustering; and determine a respective minimum value between the respective number of the observed edges for each cluster combination and the respective number of the unobserved edges for each cluster combination; determine a respective MDL score for each potential clustering by summing the respective minimum value for each cluster combination of the potential clustering; and determine the minimum MDL score from the respective MDL score for each potential clustering. 15 . The system of claim 13 , wherein the network includes a plurality of partitions, and the instructions upon being executed further cause the system to: for each potential number of clusters, determining a random order for the plurality of partitions; determining, in the random order, a respective MDL score for each of the plurality of partitions for the potential number of clusters; and determining a minimum MDL score from the respective MDL score for each of the plurality of partitions for the potential number of clusters; and determining a respective number of clusters for each of the plurality of partitions based at least in part on a respective minimum MDL score for each of the plurality of partitions and each potential number of clusters. 16 . The system of claim 15 , wherein the instructions upon being executed further cause the system to: capture data in a second domain corresponding to the network data, wherein the graph further includes a third plurality of nodes corresponding to an attribute of the second domain, and wherein the second domain is at least one of a host domain, a process domain, a user domain, a virtualization domain, or a tenant domain. 17 . A non-transitory computer-readable medium having computer readable instructions that, upon being executed by a processor, cause the processor to: capture network data using at least a first plurality of sensors executing on a plurality of endpoints of a network and a second plurality of sensors executing on a plurality of networking devices connected to the plurality of endpoints; determine a graph using the network data, the graph including at least a plurality of nodes, corresponding to the plurality of endpoints and a plurality of endpoint ports, and a plurality of edges corresponding to a plurality of observed flows included in

Assignees

Inventors

Classifications

  • Drawing of charts or graphs · CPC title

  • based on quality criteria · CPC title

  • Policy-based network configuration management · CPC title

  • Vulnerability analysis · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016359697A1 cover?
Application dependency mapping (ADM) can be automated in a network. The network can determine an optimum number of clusters for the network using the minimum description length principle (MDL). The network can capture network and associated data using a sensor network that provides multiple perspectives and generate a graph therefrom. The nodes of the graph can include sources, destinations, an…
Who is the assignee on this patent?
Cisco Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L43/04. Mapped technology areas include Electricity.
When was this patent published?
Publication date Thu Dec 08 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).