Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition Yu Jin, Esam Sharafuddin, Zhi-Li Zhang SIGMETRICS.

Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition
Yu Jin, Esam Sharafuddin, Zhi-Li Zhang
SIGMETRICS / Performance 2009
2009. 7. 8, Seokshin Son, ssson@mmlab.snu.ac.kr

Methodology for network graph analysis
Decompose the graph (extracting community structures or dense subgraphs).
Interpret the subgraphs and check their persistence (identifying non-random substructures).
Understand the formation of network graphs and the associated applications.
[Diagram: decomposition of the graph into subgraphs (interpretable? persistent?), annotated with three steps: 1. Characterizing the graph; 2. Explaining the graph formation; 3. Applications.]

Outline
Application traffic activity graph (TAG)
Decomposing application traffic activity graphs
Interpretation of TAG subgraphs
Temporal properties of TAG subgraphs
Applications
Summary and general remarks

Application traffic activity graphs (TAGs)
Host-to-host communication involves various types of traffic.
We study snapshots of network traffic to capture the temporal correlations.
We split the data by port (application) and create one TAG per application.
Inside hosts are (likely) service requesters and outside hosts are service providers.
TAGs are bipartite and the associated adjacency matrices are binary.
[Figure: traffic between UMN and the Internet, split by application: HTTP (ports 80/443), Email (ports 25/993), Gnutella (ports 6346/6348).]
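A minimal sketch of how such a TAG could be built from flow records. The record layout (inside host, outside host, port), the function name build_tag, and the sample addresses are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def build_tag(flows, app_ports):
    """Build a binary bipartite adjacency matrix for one application.

    flows: iterable of (inside_host, outside_host, port) tuples (assumed layout).
    app_ports: set of ports that define the application, e.g. {80, 443}.
    Returns (A, inside_hosts, outside_hosts); rows are inside hosts,
    columns are outside hosts, and A[i, j] = 1 if the pair communicated.
    """
    # Keep each host pair once: the matrix is binary, not a traffic count.
    edges = {(src, dst) for src, dst, port in flows if port in app_ports}
    inside = sorted({src for src, _ in edges})
    outside = sorted({dst for _, dst in edges})
    row = {host: i for i, host in enumerate(inside)}
    col = {host: j for j, host in enumerate(outside)}
    A = np.zeros((len(inside), len(outside)), dtype=np.uint8)
    for src, dst in edges:
        A[row[src], col[dst]] = 1
    return A, inside, outside

# Hypothetical flow records (documentation-range addresses).
flows = [
    ("10.0.0.1", "203.0.113.9", 80),
    ("10.0.0.1", "203.0.113.9", 80),   # repeated flow, still a single edge
    ("10.0.0.2", "203.0.113.9", 443),
    ("10.0.0.2", "198.51.100.7", 80),
    ("10.0.0.3", "192.0.2.25", 25),    # email traffic, excluded from the HTTP TAG
]
http_tag, http_inside, http_outside = build_tag(flows, {80, 443})
```

Splitting by port first and building one matrix per application keeps each TAG small and lets the applications be compared against each other later.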

Application traffic activity graphs (TAGs) and evolution
[Figure: TAG evolution from 1K to 3K for HTTP, DNS, AOL IM, and Email.]

Characteristics of TAGs
We observe differences across application TAGs in basic statistics such as graph density and average in/out degree.
All TAGs contain a giant connected component (GCC), which accounts for more than 85% of all the edges.
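As an illustration of the GCC statistic, the sketch below computes the share of edges that fall in the largest connected component of a bipartite edge list. The union-find approach and the function name gcc_edge_fraction are choices made here, not something the slides specify.

```python
from collections import Counter

def gcc_edge_fraction(edges):
    """Fraction of edges that lie in the largest connected component.

    edges: iterable of (inside_host, outside_host) pairs.
    Inside and outside hosts are tagged separately so that identical
    labels on the two sides are never merged by accident.
    """
    edges = set(edges)
    if not edges:
        return 0.0
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for inside, outside in edges:
        root_in, root_out = find(("in", inside)), find(("out", outside))
        if root_in != root_out:
            parent[root_in] = root_out

    # Count edges per component, keyed by the component root.
    edge_counts = Counter(find(("in", inside)) for inside, _ in edges)
    return max(edge_counts.values()) / len(edges)
```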

Dense subgraphs in TAGs
Block structures in the adjacency matrices indicate dense subgraphs in TAGs.
Rotating rows and columns of the corresponding adjacency matrices…
[Figure: an example binary adjacency matrix, and reordered adjacency matrices for HTTP, Email, AOL IM, BitTorrent, and DNS.]

Extracting dense subgraphs
Extracting dense subgraphs can be formulated as a co-clustering problem: cluster hosts into inside host groups and outside host groups, then extract the pairs of groups with more edges between them (higher density).
This co-clustering problem can be solved by the tri-nonnegative matrix factorization (tNMF) algorithm, which minimizes the error of approximating the adjacency matrix A by a product of three nonnegative matrices, R × H × C.
We use a hard clustering setting, assigning each host to only one inside/outside group.
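A common way to write such an objective is sketched below. The squared Frobenius norm is an assumption (the slide's own formula is not reproduced here), and any further constraints used by the authors, such as orthogonality of the membership matrices, are left out.

```latex
\min_{R \ge 0,\; H \ge 0,\; C \ge 0} \; \lVert A - R\,H\,C \rVert_F^2,
\qquad
A \in \{0,1\}^{m \times n},\quad
R \in \mathbb{R}^{m \times k},\quad
H \in \mathbb{R}^{k \times r},\quad
C \in \mathbb{R}^{r \times n}
```

Here k is the number of inside host groups and r the number of outside host groups, matching the dimensions given for R and C on the tri-nonnegative matrix factorization slide.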

Tri-nonnegative matrix factorization
A ≈ R × H × C
A: adjacency matrix associated with the TAG (m-by-n)
R: row group membership indicator matrix (m-by-k)
H: group density matrix (k-by-r), proportional to the subgraph densities
C: column group membership indicator matrix (r-by-n)
Under hard clustering, each host belongs to exactly one group, so each row of R and each column of C contains a single 1.
We identify dense subgraphs based on the large entries in H.
R is m-by-k and C is r-by-n, hence the product is a low-rank approximation of A, with rank at most min(k, r).
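The factorization A ≈ R × H × C can be sketched with standard multiplicative updates for the squared Frobenius error. This is a generic tri-factorization scheme swapped in for illustration: it uses random initialization, omits any orthogonality constraints, and is not claimed to be the authors' exact algorithm. The names tnmf and hard_groups are hypothetical.

```python
import numpy as np

def tnmf(A, k, r, n_iter=200, seed=0, eps=1e-9):
    """Approximate A (m-by-n, nonnegative) as R @ H @ C.

    R is m-by-k, H is k-by-r, C is r-by-n, all nonnegative.
    Returns the factors and the squared error after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = rng.random((m, k)) + eps
    H = rng.random((k, r)) + eps
    C = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        R *= (A @ C.T @ H.T) / (R @ H @ C @ C.T @ H.T + eps)
        H *= (R.T @ A @ C.T) / (R.T @ R @ H @ C @ C.T + eps)
        C *= (H.T @ R.T @ A) / (H.T @ R.T @ R @ H @ C + eps)
        errors.append(float(np.linalg.norm(A - R @ H @ C) ** 2))
    return R, H, C, errors

def hard_groups(R, C):
    """Hard clustering: assign each host to its largest-weight group."""
    return R.argmax(axis=1), C.argmax(axis=0)

# Toy adjacency matrix with two dense blocks (inside hosts x outside hosts).
A_demo = np.zeros((5, 7))
A_demo[:3, :4] = 1
A_demo[3:, 4:] = 1
R_demo, H_demo, C_demo, err_demo = tnmf(A_demo, k=2, r=2)
row_groups, col_groups = hard_groups(R_demo, C_demo)
```

After the hard assignment, the large entries of H point to the (inside group, outside group) pairs that form dense subgraphs. Like any iterative scheme of this kind, the result depends on the initialization.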

Subgraph prototypes
Recall that inside (UMN) hosts are (likely) service requesters and outside hosts are service providers.
Based on the number of inside/outside hosts in each subgraph, we propose three prototypes.
In-star: one inside client accesses multiple outside servers.
Out-star: multiple inside clients access one outside server.
Bi-mesh: multiple inside clients interact with many outside servers.
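A small sketch of how a subgraph could be labelled from its host counts. It assumes that "in-star" means a star centered on one inside client and "out-star" a star centered on one outside server; the star_max cut-off and the "single-edge" fallback label are illustrative assumptions, not values from the slides.

```python
def classify_subgraph(n_inside, n_outside, star_max=1):
    """Label a subgraph by prototype from its inside/outside host counts.

    star_max is a hypothetical cut-off: a side with at most star_max
    hosts is treated as the center of a star.
    """
    few_inside = n_inside <= star_max
    few_outside = n_outside <= star_max
    if few_inside and not few_outside:
        return "in-star"    # one inside client, many outside servers
    if few_outside and not few_inside:
        return "out-star"   # many inside clients, one outside server
    if not few_inside and not few_outside:
        return "bi-mesh"    # many clients interacting with many servers
    return "single-edge"    # too small to match any prototype
```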

Characterizing TAGs with subgraph prototypes
Different application TAGs contain different types of subgraphs.
We can distinguish and characterize applications based on their subgraph components.
What do these subgraphs mean?
[Figure: subgraph composition of the HTTP, Email, AOL IM, BitTorrent, and DNS TAGs.]

Interpreting HTTP bi-mesh structures
Most star structures are due to popular servers or active clients.
We can explain more than 80% of the HTTP bi-meshes identified in one day.
Server correlation driven
–Server farms: Lycos, Yahoo, Google
–Correlated service providers
  CDN: LLNW, Akamai, SAVVIS, Level3
  Advertising providers: DoubleClick, etc.
User interests driven
–News: WashingtonPost, New York Times, Cnet
–Media: ImageShack, casalemedia, tl4s2
–Online shopping: Ebay, Costco, Walmart
–Social network: Facebook, MySpace

How are the dense subgraphs connected?
(A) Randomly connected stars
(B) Tree: client/server dual role
(C) Pool
(D) Correlated pool

Evolution of HTTP TAGs
The extracted dense subgraphs are assumed to be non-random (persistent).
However, each subgraph may evolve over time as hosts leave or join it.
A “best effort” similarity metric: hosts are compared at the IP, domain name, or AS level, and two subgraphs are considered similar if the percentage of similar hosts (at a certain level) is greater than η.
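One possible reading of this metric is sketched below. The host record layout (dicts with "ip", "domain" and "asn" keys), the normalization by the smaller subgraph, and the default η = 0.5 are all assumptions made for illustration; the slide only states that similarity is judged by the percentage of similar hosts at a chosen level.

```python
def subgraphs_similar(hosts_a, hosts_b, level, eta=0.5):
    """Decide whether two subgraphs match at a given comparison level.

    hosts_a, hosts_b: lists of host records, e.g.
        {"ip": "203.0.113.9", "domain": "example.com", "asn": 64500}
    level: "ip", "domain", or "asn".
    eta: similarity threshold (hypothetical default).
    """
    keys_a = {host[level] for host in hosts_a}
    keys_b = {host[level] for host in hosts_b}
    if not keys_a or not keys_b:
        return False
    # Share of matching identifiers, relative to the smaller subgraph.
    shared = len(keys_a & keys_b) / min(len(keys_a), len(keys_b))
    return shared > eta

# Hypothetical example: same provider, different server addresses.
morning = [
    {"ip": "203.0.113.9", "domain": "example.com", "asn": 64500},
    {"ip": "203.0.113.10", "domain": "example.com", "asn": 64500},
]
evening = [
    {"ip": "203.0.113.77", "domain": "example.com", "asn": 64500},
    {"ip": "203.0.113.78", "domain": "example.com", "asn": 64500},
]
```

With these records the two subgraphs match at the domain and AS levels but not at the IP level.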

Evolution of HTTP TAGs (2)
TAGs are temporally stable at the domain/AS level.
TAGs are transient at the host level.

Evolution of HTTP TAGs (3)
Subgraphs last from a few hours to a whole day.
Subgraphs become more similar during the daytime.

Application: Identifying unknown traffic
Similarity of UDP port 4000 TAG subgraphs with subgraphs from messenger (chat) traffic.
[Figure: UDP port 4000 subgraphs compared with AOL Messenger and Yahoo! Messenger subgraphs, with the best match marked.]

Application: Storm worm botnet analysis
The Storm worm botnet graph contains many bi-mesh structures and differs significantly from the graphs of typical P2P applications.
[Figure: Storm graph vs. typical P2P graph. Labels: bots query for supernodes; bots acquire commands from supernodes; spam campaign.]

Outline
Application traffic activity graph (TAG)
Decomposing application traffic activity graphs
Interpretation of TAG subgraphs
Temporal properties of TAG subgraphs
Applications
Summary and general remarks

Summary and General Remarks
Many network research questions can be formulated as “graph analysis” problems.
These graphs are mostly not random and have latent structures.
We propose a co-clustering method, based on tri-nonnegative matrix factorization, for decomposing such graphs and revealing their community structures (dense subgraphs).
The obtained subgraphs are meaningful and persistent, which helps in understanding the formation of complicated communication graphs.
We demonstrate various applications based on the decomposition results.

Appendix

Characteristics of app. TAGs
These statistics show differences between the various application TAGs.
They do not explain the formation of TAGs.

tNMF algorithm related
Iterative optimization algorithm
Group density matrix derivation

Practical issues of tNMF
Selection of rank and density
–Linear search for appropriate ranks
–Edge coverage converges once the rank and density are properly chosen
With the above rank selection method, we achieve stable subgraph decomposition results.
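A sketch of an edge-coverage measure that such a rank search could track. Here coverage is assumed to mean the fraction of edges that fall inside group pairs whose density reaches a chosen threshold; this definition, the function name edge_coverage, and the min_density parameter are illustrative assumptions.

```python
import numpy as np

def edge_coverage(A, row_groups, col_groups, k, r, min_density):
    """Fraction of edges covered by blocks with density >= min_density.

    A: binary adjacency matrix (inside hosts x outside hosts).
    row_groups, col_groups: hard group labels per row / per column.
    k, r: number of row groups and column groups.
    """
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    covered = 0
    for p in range(k):
        rows = np.flatnonzero(row_groups == p)
        for q in range(r):
            cols = np.flatnonzero(col_groups == q)
            if rows.size == 0 or cols.size == 0:
                continue
            block = A[np.ix_(rows, cols)]
            # Block density: share of possible host pairs that are connected.
            if block.mean() >= min_density:
                covered += int(block.sum())
    total = int(A.sum())
    return covered / total if total else 0.0
```

In a linear search, one would raise the rank step by step, recompute this value for each decomposition, and stop once it no longer changes noticeably.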

Practical issues of tNMF (2)
Low convergence rate and local minima
–We use SVD as an initialization approach for tNMF.
–Multiple runs are applied when necessary, and the result with the lowest relative square error (RSE) is selected.
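The multiple-run selection can be sketched as follows. RSE is assumed here to be the squared Frobenius error divided by the squared Frobenius norm of A; the factorize callable stands in for any tNMF routine and is hypothetical, and the SVD-based initialization is not shown.

```python
import numpy as np

def relative_square_error(A, R, H, C):
    """Assumed definition: ||A - R H C||_F^2 / ||A||_F^2."""
    return float(np.linalg.norm(A - R @ H @ C) ** 2 / np.linalg.norm(A) ** 2)

def best_of_runs(A, factorize, seeds):
    """Run a factorization several times and keep the lowest-RSE result.

    factorize: callable (A, seed) -> (R, H, C); a placeholder for
    whichever tNMF implementation is used.
    Returns (rse, R, H, C) for the best run.
    """
    best = None
    for seed in seeds:
        R, H, C = factorize(A, seed)
        rse = relative_square_error(A, R, H, C)
        if best is None or rse < best[0]:
            best = (rse, R, H, C)
    return best
```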

Number of subgraphs over time
