Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition Yu Jin, Esam Sharafuddin, Zhi-Li Zhang SIGMETRICS.

SIGMETRICS / Performance; presented by Seokshin Son

Methodology for network graph analysis
- Decompose the graph (extracting community structures or dense subgraphs).
- Interpret the subgraphs and examine their persistence (identifying non-random substructures): is the decomposition interpretable? Is it persistent?
- Understand the formation of network graphs and the associated applications.
The workflow has three steps: (1) characterizing the graph, (2) explaining the graph formation, and (3) applications.

Outline
- Application traffic activity graphs (TAGs)
- Decomposing application traffic activity graphs
- Interpretation of TAG subgraphs
- Temporal properties of TAG subgraphs
- Applications
- Summary and general remarks

Application traffic activity graphs (TAGs)
Host-to-host communication involves various types of traffic. We study snapshots of network traffic to capture temporal correlations. We separate the traffic by port (i.e., by application) and build one TAG per application. Inside (UMN) hosts are (likely) service requesters, and outside hosts are service providers. TAGs are bipartite, and their adjacency matrices are binary.
[Figure: traffic between UMN and the Internet, split by port into per-application TAGs: HTTP (ports 80/443), email (ports 25/993), and Gnutella (ports 6346/6348).]
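A minimal sketch of how one binary, bipartite TAG could be built from a traffic snapshot. It assumes (the slides do not say) that flows arrive as `(inside_ip, outside_ip, port)` triples and that an application is identified by a set of ports; `build_tag` is a hypothetical helper, and the addresses are private or documentation-range examples.

```python
# Sketch: build one binary, bipartite TAG per application from flow records.
# Assumptions (not from the slides): flows are (inside_ip, outside_ip, port)
# triples, and an application is identified by a set of port numbers.

def build_tag(flows, ports):
    """Return (inside_hosts, outside_hosts, adjacency) for one application.

    adjacency[i][j] == 1 if inside host i exchanged traffic with outside
    host j on any port in `ports`, else 0. Rows are inside hosts (service
    requesters), columns are outside hosts (service providers).
    """
    edges = {(inside, outside) for inside, outside, port in flows if port in ports}
    inside_hosts = sorted({i for i, _ in edges})
    outside_hosts = sorted({o for _, o in edges})
    row = {h: idx for idx, h in enumerate(inside_hosts)}
    col = {h: idx for idx, h in enumerate(outside_hosts)}
    adjacency = [[0] * len(outside_hosts) for _ in inside_hosts]
    for inside, outside in edges:
        adjacency[row[inside]][col[outside]] = 1  # binary: repeated flows collapse
    return inside_hosts, outside_hosts, adjacency

# Hypothetical snapshot.
flows = [
    ("10.0.0.1", "203.0.113.5", 80),
    ("10.0.0.1", "203.0.113.5", 443),    # same host pair: still one edge
    ("10.0.0.2", "203.0.113.5", 443),
    ("10.0.0.2", "198.51.100.7", 6346),  # Gnutella port, not HTTP
]
inside, outside, A = build_tag(flows, ports={80, 443})
print(inside, outside, A)
```

Using a set of edges makes the adjacency matrix binary by construction, matching the slide's statement that TAG adjacency matrices are binary.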

Application traffic activity graphs (TAGs) and evolution
[Figure: TAG snapshots for HTTP, DNS, AOL IM, and a fourth application, each shown growing from 1K to 3K.]

Characteristics of TAGs
TAGs of different applications differ in basic statistics such as graph density and average in/out degree. All TAGs contain a giant connected component (GCC), which accounts for more than 85% of all edges.
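The statistics named above can be computed directly from a binary bipartite adjacency matrix. This is a sketch under our own definitions: density is edges divided by (inside hosts × outside hosts), average degrees are taken per side, and the GCC is the component with the most nodes, found by breadth-first search. `tag_stats` is a hypothetical helper.

```python
from collections import deque

def tag_stats(adjacency):
    """Basic statistics of a binary bipartite TAG (rows: inside, cols: outside)."""
    m = len(adjacency)
    n = len(adjacency[0]) if m else 0
    edges = [(i, j) for i in range(m) for j in range(n) if adjacency[i][j]]
    num_edges = len(edges)

    # Node ids: inside host i -> ("in", i), outside host j -> ("out", j).
    neighbors = {("in", i): [] for i in range(m)}
    neighbors.update({("out", j): [] for j in range(n)})
    for i, j in edges:
        neighbors[("in", i)].append(("out", j))
        neighbors[("out", j)].append(("in", i))

    # BFS over all components; keep the one with the most nodes as the GCC.
    seen, gcc = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(gcc):
            gcc = comp
    # An edge lies in the GCC iff its inside endpoint does.
    gcc_edges = sum(1 for i, _ in edges if ("in", i) in gcc)

    return {
        "density": num_edges / (m * n) if m and n else 0.0,
        "avg_out_degree_inside": num_edges / m if m else 0.0,
        "avg_in_degree_outside": num_edges / n if n else 0.0,
        "gcc_edge_fraction": gcc_edges / num_edges if num_edges else 0.0,
    }

A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
stats = tag_stats(A)
print(stats)
```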

Dense subgraphs in TAGs
After the rows and columns of the adjacency matrices are permuted (reordered), block structures emerge; these blocks indicate dense subgraphs in the TAGs.
[Figure: reordered adjacency matrices of the HTTP, AOL IM, BitTorrent, and DNS TAGs.]
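A small sketch of why reordering reveals blocks: once each inside and outside host has a group label (assumed here to be given, e.g., by co-clustering), sorting rows and columns by label makes each group's edges contiguous. `reorder_by_groups` is a hypothetical helper.

```python
import numpy as np

def reorder_by_groups(A, row_labels, col_labels):
    """Permute rows and columns of A so hosts in the same group are contiguous."""
    row_order = np.argsort(row_labels, kind="stable")
    col_order = np.argsort(col_labels, kind="stable")
    return A[np.ix_(row_order, col_order)], row_order, col_order

# Two interleaved groups: rows/cols {0, 2} form one dense block, {1, 3} another.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
B, rows, cols = reorder_by_groups(A, row_labels=[0, 1, 0, 1], col_labels=[0, 1, 0, 1])
print(B)
```

In the original order the dense subgraphs are invisible; after the permutation, `B` is block diagonal.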

Extracting dense subgraphs
Extracting dense subgraphs can be formulated as a co-clustering problem: cluster the hosts into inside-host groups and outside-host groups, then extract the pairs of groups that are connected by many edges (i.e., that have high density). This co-clustering problem can be solved by a tri-nonnegative matrix factorization (tNMF) algorithm, which minimizes

$$\min_{R \ge 0,\, H \ge 0,\, C \ge 0} \; \lVert A - RHC \rVert_F^2$$

where A is the TAG adjacency matrix and R, H, and C are nonnegative factors. We use a hard clustering setting, assigning each host to exactly one inside or outside group.
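The slides do not give the optimization steps, so the following is only a sketch under stated assumptions: random nonnegative initialization, standard Lee and Seung style multiplicative updates for the squared Frobenius loss (each update is the usual ratio of the negative and positive parts of the gradient), and hard clustering by taking the largest membership weight per host. `tri_nmf` and `hard_labels` are hypothetical names, and this is not necessarily the authors' update rule.

```python
import numpy as np

def tri_nmf(A, k, r, iters=500, seed=0, eps=1e-12):
    """Approximate A (m x n, nonnegative) by R @ H @ C, with R (m x k),
    H (k x r), C (r x n) all nonnegative, via multiplicative updates that
    do not increase ||A - R H C||_F^2 (up to the small eps guard)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = rng.random((m, k))
    H = rng.random((k, r))
    C = rng.random((r, n))
    errors = []
    for _ in range(iters):
        HC = H @ C
        R *= (A @ HC.T) / (R @ HC @ HC.T + eps)
        H *= (R.T @ A @ C.T) / (R.T @ R @ H @ C @ C.T + eps)
        RH = R @ H
        C *= (RH.T @ A) / (RH.T @ RH @ C + eps)
        errors.append(np.linalg.norm(A - R @ H @ C, "fro") ** 2)
    return R, H, C, errors

def hard_labels(R, C):
    """Hard clustering: each inside host (row of R) and outside host
    (column of C) joins the single group with the largest weight."""
    return R.argmax(axis=1), C.argmax(axis=0)

A = np.kron(np.eye(2), np.ones((4, 4)))  # two planted 4x4 dense blocks
R, H, C, errors = tri_nmf(A, k=2, r=2)
row_groups, col_groups = hard_labels(R, C)
print(errors[0], errors[-1], row_groups, col_groups)
```

Multiplicative updates keep every factor nonnegative automatically, since each step multiplies by a ratio of nonnegative quantities; the trade-off is slow convergence and sensitivity to initialization.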

Tri-nonnegative matrix factorization

$$A \approx R \times H \times C$$

- A: adjacency matrix associated with the TAG (m-by-n)
- R: row (inside-host) group membership indicator matrix (m-by-k)
- H: k-by-r matrix, proportional to the subgraph density matrix
- C: column (outside-host) group membership indicator matrix (r-by-n)

We identify dense subgraphs based on the large entries in H. Since R is m-by-k and C is r-by-n, the product RHC is a low-rank approximation of A, with rank at most min(k, r).
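One way to see why H tracks subgraph density: if R and C are fixed 0/1 indicator matrices with no empty groups, then $R^\top R$ and $CC^\top$ are diagonal matrices of group sizes, $(R^\top A C^\top)_{pq}$ counts the edges between inside group p and outside group q, and the least-squares $H = (R^\top R)^{-1} R^\top A C^\top (CC^\top)^{-1}$ has entries equal to edges divided by block size, i.e., the block density. The sketch below computes that matrix from hard labels and keeps blocks above a density threshold; the helper names and the threshold value are hypothetical.

```python
import numpy as np

def block_densities(A, row_labels, col_labels, k, r):
    """Density of every (inside group, outside group) block under hard labels.

    Equals (R^T R)^-1 R^T A C^T (C C^T)^-1 for 0/1 indicators R (m x k) and
    C (r x n); computed here as edge counts divided by block sizes.
    Blocks involving an empty group get density 0.
    """
    m, n = A.shape
    R = np.zeros((m, k))
    R[np.arange(m), row_labels] = 1
    C = np.zeros((r, n))
    C[col_labels, np.arange(n)] = 1
    edges = R.T @ A @ C.T                            # edges per block
    sizes = np.outer(R.sum(axis=0), C.sum(axis=1))   # |rows| * |cols| per block
    return np.divide(edges, sizes, out=np.zeros_like(edges), where=sizes > 0)

def dense_blocks(density, threshold):
    """(inside group, outside group) pairs whose density exceeds `threshold`."""
    return [tuple(int(x) for x in idx) for idx in np.argwhere(density > threshold)]

A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]], dtype=float)
density = block_densities(A, row_labels=[0, 0, 1], col_labels=[0, 0, 1], k=2, r=2)
print(density, dense_blocks(density, threshold=0.5))
```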

Subgraph prototypes
Recall that inside (UMN) hosts are (likely) service requesters and outside hosts are service providers. Based on the number of inside and outside hosts in each subgraph, we propose three prototypes:
- In-star: one inside client accesses multiple outside servers.
- Out-star: multiple inside clients access one outside server.
- Bi-mesh: multiple inside clients interact with many outside servers.
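The three prototypes amount to a simple rule on the host counts of a subgraph. A sketch, with a hypothetical `prototype` helper; the one-inside, one-outside case is not covered by the slide and is given a placeholder label here.

```python
def prototype(num_inside, num_outside):
    """Classify a dense subgraph by how many inside/outside hosts it contains."""
    if num_inside < 1 or num_outside < 1:
        raise ValueError("a subgraph needs at least one host on each side")
    if num_inside == 1 and num_outside == 1:
        return "single edge"  # hypothetical label: not defined on the slide
    if num_inside == 1:
        return "in-star"      # one inside client, many outside servers
    if num_outside == 1:
        return "out-star"     # many inside clients, one outside server
    return "bi-mesh"          # many inside clients, many outside servers

print(prototype(1, 5), prototype(7, 1), prototype(3, 4))
```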

Characterizing TAGs with subgraph prototypes
TAGs of different applications contain different types of subgraphs, so applications can be distinguished and characterized by their subgraph components. What do these subgraphs mean?
[Figure: subgraph prototype composition of the HTTP, AOL IM, BitTorrent, and DNS TAGs.]

Interpreting HTTP bi-mesh structures
Most star structures are due to popular servers or active clients. We can explain more than 80% of the HTTP bi-meshes identified in one day:
- Server-correlation driven
  - Server farms: Lycos, Yahoo, Google
  - Correlated service providers: CDNs (LLNW, Akamai, SAVVIS, Level3) and advertising providers (DoubleClick, etc.)
- User-interest driven
  - News: WashingtonPost, New York Times, Cnet
  - Media: ImageShack, casalemedia, tl4s2
  - Online shopping: Ebay, Costco, Walmart
  - Social networks: Facebook, MySpace

How are the dense subgraphs connected?
[Figure: connection patterns among dense subgraphs: (A) randomly connected stars; (B) tree, where hosts play a dual client/server role; (C) pool; (D) correlated pool.]

Evolution of HTTP TAGs
The extracted dense subgraphs are assumed to be non-random (persistent). However, each subgraph may evolve over time as hosts leave or join it. We therefore use a "best-effort" similarity metric: hosts are compared at the IP, domain-name, or AS level, and two subgraphs are considered similar if the percentage of similar hosts (at the chosen level) is greater than a threshold η.
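A sketch of one plausible reading of the best-effort metric: count the fraction of hosts in the earlier subgraph that have a matching host (same IP, same domain, or same AS) in the later subgraph, and call the subgraphs similar when that fraction exceeds η. The directional definition, the `domain_of`/`as_of` lookup tables (which would come from, e.g., reverse DNS and an IP-to-AS mapping), and the function names are all assumptions.

```python
def host_similar(a, b, level, domain_of, as_of):
    """Compare two hosts at the 'ip', 'domain', or 'as' level.
    Hosts missing from a lookup table never match at that level."""
    if level == "ip":
        return a == b
    table = {"domain": domain_of, "as": as_of}[level]
    return a in table and b in table and table[a] == table[b]

def subgraphs_similar(hosts_t1, hosts_t2, level, eta, domain_of=None, as_of=None):
    """True if the share of hosts_t1 with a similar host in hosts_t2 exceeds eta."""
    domain_of = domain_of or {}
    as_of = as_of or {}
    if not hosts_t1:
        return False
    matched = sum(
        any(host_similar(a, b, level, domain_of, as_of) for b in hosts_t2)
        for a in hosts_t1
    )
    return matched / len(hosts_t1) > eta

# Hypothetical subgraph snapshots and domain lookup.
t1 = ["203.0.113.5", "203.0.113.9"]
t2 = ["203.0.113.6", "203.0.113.9"]
domains = {"203.0.113.5": "cdn.example", "203.0.113.6": "cdn.example",
           "203.0.113.9": "news.example"}
print(subgraphs_similar(t1, t2, "ip", 0.6),
      subgraphs_similar(t1, t2, "domain", 0.6, domain_of=domains))
```

Coarser levels tolerate host churn inside the same server farm or provider, which is consistent with the observation that TAGs are stable at the domain/AS level but transient at the host level.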

Evolution of HTTP TAGs (2)
TAGs are temporally stable at the domain/AS level but transient at the host level.

Evolution of HTTP TAGs (3)
Subgraphs last from a few hours to a whole day, and they become more similar during the daytime.

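Application: Identifying unknown traffic
We compare the subgraphs of the UDP port 4000 TAG with subgraphs from messenger (chat) traffic, namely AOL Messenger and Yahoo! Messenger, and identify the best match.
[Figure: similarity of UDP port 4000 TAG subgraphs to AOL Messenger and Yahoo! Messenger subgraphs, with the best match marked.]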
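The slide does not say how subgraphs are compared. As an illustrative assumption only, each TAG can be summarized by the fraction of its dense subgraphs that are in-stars, out-stars, and bi-meshes, and an unknown TAG matched to the known application with the highest cosine similarity. The application names and subgraph lists below are hypothetical, not measurements.

```python
import math

PROTOTYPES = ("in-star", "out-star", "bi-mesh")

def profile(subgraph_types):
    """Fraction of each prototype among a TAG's dense subgraphs."""
    total = len(subgraph_types)
    return [subgraph_types.count(p) / total if total else 0.0 for p in PROTOTYPES]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(unknown, known):
    """Name of the known application whose prototype profile is most similar."""
    target = profile(unknown)
    return max(known, key=lambda name: cosine(target, profile(known[name])))

# Hypothetical labeled subgraph types, for illustration only.
known = {
    "app_a": ["out-star"] * 6 + ["bi-mesh"] * 1,
    "app_b": ["in-star"] * 5 + ["bi-mesh"] * 5,
}
unknown = ["in-star"] * 4 + ["bi-mesh"] * 3
print(best_match(unknown, known))
```

A richer comparison could match individual subgraphs (sizes, densities, host overlap) rather than aggregate profiles; the profile vector is just the simplest instance of "characterizing applications by subgraph components".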

Application: Storm worm botnet analysis
The Storm worm botnet graph contains many bi-mesh structures, which differs significantly from typical P2P applications.
[Figure: Storm vs. typical P2P TAGs, annotated with bots querying for supernodes, bots acquiring commands from supernodes, and spam campaigns.]

Summary and General Remarks
- Many network research questions can be formulated as graph analysis problems.
- These graphs are mostly not random and have latent structures.
- We propose a co-clustering (tri-nonnegative matrix factorization) based method for decomposing such graphs and revealing their community structures (dense subgraphs).
- The obtained subgraphs are meaningful and persistent, which helps in understanding the formation of complicated communication graphs.
- We demonstrate various applications based on the decomposition results.

Appendix

Characteristics of application TAGs
These statistics show differences among the various application TAGs, but they do not explain how the TAGs are formed.

tNMF algorithm
- Iterative optimization algorithm
- Group density matrix derivation

Practical issues of tNMF
Selection of rank and density:
- Linear search for appropriate ranks.
- Edge coverage converges once the rank and density are properly chosen.
With this rank selection method, we achieve stable subgraph decomposition results.
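A sketch of the linear rank search, assuming a fixed density threshold and a hypothetical `edge_coverage(rank)` hook that runs the decomposition at that rank and returns the fraction of TAG edges falling inside the extracted dense subgraphs. The search stops at the first rank beyond which coverage improves by less than a tolerance `tol`, which is also an assumed parameter.

```python
def select_rank(edge_coverage, ranks, tol=0.01):
    """Linear search over candidate ranks: return the first rank after which
    edge coverage improves by less than `tol` (i.e., coverage has converged).
    Returns the last rank tried if coverage never converges."""
    prev_rank, prev_cov = None, None
    for rank in ranks:
        cov = edge_coverage(rank)
        if prev_cov is not None and cov - prev_cov < tol:
            return prev_rank
        prev_rank, prev_cov = rank, cov
    return prev_rank

# Hypothetical coverage curve standing in for real tNMF runs.
coverage = {5: 0.50, 10: 0.70, 15: 0.80, 20: 0.805, 25: 0.81}
print(select_rank(coverage.get, [5, 10, 15, 20, 25], tol=0.01))
```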

Practical issues of tNMF (2)
Low convergence rate and local minima:
- We use SVD as an initialization approach for tNMF.
- Multiple runs are applied when necessary, and the result with the lowest relative square error (RSE) is selected.
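The slide does not specify which SVD-based initialization is used. The sketch below assumes a simple heuristic: absolute values of the leading singular vectors for R and C and the leading singular values on the diagonal of H, shifted slightly above zero so multiplicative updates can still move every entry (NNDSVD is a more careful, well-known alternative). RSE is taken as the squared Frobenius error relative to the squared norm of A, and the best of several runs is the one with the lowest RSE. Function names are hypothetical; k and r must not exceed min(m, n).

```python
import numpy as np

def svd_init(A, k, r, shift=1e-6):
    """Simple SVD-based nonnegative starting point for A ~ R H C."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    R = np.abs(U[:, :k]) + shift
    C = np.abs(Vt[:r, :]) + shift
    H = np.zeros((k, r))
    d = min(k, r)
    H[:d, :d] = np.diag(s[:d])
    H += shift  # keep H strictly positive as well
    return R, H, C

def rse(A, R, H, C):
    """Relative square error ||A - RHC||_F^2 / ||A||_F^2."""
    return np.linalg.norm(A - R @ H @ C) ** 2 / np.linalg.norm(A) ** 2

def best_run(A, runs):
    """Pick the (R, H, C) triple with the lowest RSE among several runs."""
    return min(runs, key=lambda f: rse(A, *f))

A = np.kron(np.eye(2), np.ones((4, 4)))  # two planted dense blocks
init = svd_init(A, 2, 2)
exact = (np.kron(np.eye(2), np.ones((4, 1))), np.eye(2), np.kron(np.eye(2), np.ones((1, 4))))
best = best_run(A, [init, exact])
print(rse(A, *init), rse(A, *best))
```

In practice each candidate in `runs` would be the output of a full tNMF optimization started from a different initialization; the exact factorization above only illustrates the selection rule.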

Number of subgraphs over time
[Figure: number of extracted subgraphs over time.]