Network Service Identification through Hypergraph Clustering

Network Service Identification through Hypergraph Clustering Li PU, Boi FALTINGS @ EPFL 2011.02.15 @ UESTC

Introduction: In a corporate network, what applications are installed on the client machines? What services are installed on the servers? The administrator needs an in-depth overview.

Introduction: Network traffic is collected from the client machines by the Nexthink solution.

Introduction: [Figure: screenshot from NEXThink Finder with the columns Source, User, Application, Port, Destination.] The data covers more than 1000 computers, more than 500 applications and more than 1 million TCP/UDP sessions. Goal: simplify the data so that a network administrator can look at it.

Introduction: Group the ports according to their functionality; a group of ports is a service. Examples: file sharing: 139, 445; antivirus: 1281, 2967, …; malware. This reduces the number of groups the administrator needs to skim over. Assumption: each port belongs to exactly one service.
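
Because each port belongs to exactly one service, a grouping is a partition of the port set and can be stored as a plain port-to-service lookup. A minimal Python sketch; the function name `port_to_service` is hypothetical, and the example uses only the ports shown on the slide (whose antivirus list is truncated there).

```python
def port_to_service(services: dict[str, set[int]]) -> dict[int, str]:
    """Invert a service -> ports map, enforcing one service per port."""
    mapping: dict[int, str] = {}
    for service, ports in services.items():
        for port in ports:
            if port in mapping:
                # A second service claiming the same port violates the assumption.
                raise ValueError(
                    f"port {port} assigned to both {mapping[port]!r} and {service!r}"
                )
            mapping[port] = service
    return mapping


# Hypothetical example built from the ports named on the slide.
services = {
    "file sharing": {139, 445},
    "antivirus": {1281, 2967},
}
lookup = port_to_service(services)
print(lookup[445])  # file sharing
```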

Network Service: A group of ports can be identified as a network service. The ports in a group need not be consecutive. Evidence: the (Application, Port, Destination) records.

Hypergraph Model: A hyperedge contains one or more vertices, and each hyperedge has a weight. Network service identification then becomes a vertex partition. What is a good partition? One that breaks as few (weighted) hyperedges as possible and that isolates small services as well as large ones.
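
A sketch of how the (Application, Port, Destination) evidence could be turned into such a weighted hypergraph. The mapping is an assumption, not taken from the slides: vertices are ports, every application and every destination becomes one hyperedge holding the ports it was seen with, and a hyperedge's weight is its session count. `build_hypergraph` and the session records are hypothetical.

```python
from collections import defaultdict


def build_hypergraph(sessions):
    """Build a weighted hypergraph whose vertices are ports.

    ASSUMPTION (not stated on the slides): each application and each
    destination defines one hyperedge containing the ports it was seen with;
    the hyperedge weight is the number of sessions observed for it.
    """
    members = defaultdict(set)  # hyperedge id -> set of ports
    weight = defaultdict(int)   # hyperedge id -> session count
    for app, port, dest in sessions:
        for edge in (("app", app), ("dest", dest)):
            members[edge].add(port)
            weight[edge] += 1
    return {e: (frozenset(ports), weight[e]) for e, ports in members.items()}


# Hypothetical (application, port, destination) session records.
sessions = [
    ("appA", "TCP139", "10.0.0.1"),
    ("appA", "TCP445", "10.0.0.1"),
    ("appB", "TCP2967", "10.0.0.2"),
    ("appB", "UDP2967", "10.0.0.2"),
]
H = build_hypergraph(sessions)
```

Here `appA` and destination `10.0.0.1` each yield a hyperedge {TCP139, TCP445} of weight 2, so those two ports are pulled toward the same service.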

Cut-based Partitioning: Minimize the cut of the partition. Different cuts for hypergraphs: 1) Number of broken hyperedges [karypis2002multilevel]. It was designed for VLSI applications and is not suitable for general hypergraph partitioning or for network service identification.
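
The first cut simply counts hyperedges whose vertices land in more than one cluster. A minimal sketch (hypothetical `broken_hyperedges`); the optional `weighted` switch, our addition, sums the hyperedge weights instead, matching the weighted criterion of the hypergraph model.

```python
def broken_hyperedges(hyperedges, partition, weighted=False):
    """Count hyperedges whose vertices fall into more than one cluster.

    hyperedges: iterable of (vertex_set, weight) pairs.
    partition:  dict mapping vertex -> cluster label.
    With weighted=True the weights of broken hyperedges are summed instead.
    """
    total = 0
    for vertices, w in hyperedges:
        if len({partition[v] for v in vertices}) > 1:  # spans >1 cluster: broken
            total += w if weighted else 1
    return total


hyperedges = [({"a", "b"}, 3), ({"b", "c"}, 1), ({"a", "c", "d"}, 2)]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
print(broken_hyperedges(hyperedges, partition))                 # 2
print(broken_hyperedges(hyperedges, partition, weighted=True))  # 3
```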

Cut-based Partitioning: 2) Normalized hypergraph cut [zhou2007learning]. First convert the hypergraph into a simple graph, then compute the normalized cut of that simple graph.
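
One way to carry out these two steps, sketched in Python under the assumption that the conversion is the clique expansion used by Zhou et al.: each hyperedge e of weight w(e) contributes w(e)/|e| to every vertex pair it contains. `clique_expansion` and `normalized_cut` are hypothetical names; the k-way form Σ_i cut(V_i, V∖V_i)/vol(V_i) reduces to the usual two-way normalized cut for a bipartition.

```python
from collections import defaultdict
from itertools import combinations


def clique_expansion(hyperedges):
    """Simple weighted graph: each hyperedge adds w(e)/|e| to every pair in e."""
    graph = defaultdict(float)  # frozenset({u, v}) -> edge weight
    for vertices, w in hyperedges:
        for u, v in combinations(sorted(vertices), 2):
            graph[frozenset((u, v))] += w / len(vertices)
    return graph


def normalized_cut(hyperedges, partition):
    """k-way normalized cut: sum over clusters of cut(cluster) / vol(cluster).

    vol uses the hypergraph degree d(v) = sum of w(e) over hyperedges e
    containing v (the expanded-graph degree if self-loops w(e)/|e| are kept).
    """
    cut = defaultdict(float)
    for pair, w in clique_expansion(hyperedges).items():
        u, v = tuple(pair)
        if partition[u] != partition[v]:
            cut[partition[u]] += w
            cut[partition[v]] += w
    vol = defaultdict(float)
    for vertices, w in hyperedges:
        for v in vertices:
            vol[partition[v]] += w
    return sum(cut[c] / vol[c] for c in vol)


hyperedges = [({"a", "b"}, 2.0), ({"b", "c"}, 1.0), ({"c", "d"}, 2.0)]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
print(normalized_cut(hyperedges, partition))  # 0.5/5 + 0.5/5 = 0.2
```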

Cut-based Partitioning: 3) Non-pairwise hypergraph cut. The best partition depends only on the hyperedge weights, not on the hyperedge degrees.
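
To see what independence from the hyperedge degree means, compare the cost that one broken hyperedge adds under the clique-expansion cut (assuming the Zhou et al. weighting w(e)/|e|) with a non-pairwise cut that, as we read the slide, charges a broken hyperedge exactly its weight. Both function names are hypothetical.

```python
def pairwise_contribution(size, inside, weight):
    """Clique-expansion cut from one hyperedge: w * |e∩S| * |e∖S| / |e|."""
    return weight * inside * (size - inside) / size


def nonpairwise_contribution(size, inside, weight):
    """Non-pairwise cut (assumed form): a broken hyperedge costs its weight."""
    return weight if 0 < inside < size else 0.0


# Same weight, different degrees and splits.
for size, inside in [(2, 1), (10, 5), (10, 1)]:
    print(size, inside,
          pairwise_contribution(size, inside, 1.0),
          nonpairwise_contribution(size, inside, 1.0))
# 2 1 0.5 1.0
# 10 5 2.5 1.0
# 10 1 0.9 1.0
```

Under the pairwise cut a large hyperedge split down the middle costs five times more than a broken two-port hyperedge of the same weight; under the non-pairwise cut both cost 1.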

Determining the Number of Clusters: Minimizing the hypergraph cut has a trivial solution in which only one cluster exists (nothing is cut). We therefore need to find the best number of clusters. Graph modularity [newman2004finding] is extended to hypergraphs for this purpose.
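
For reference, the simple-graph modularity that is being extended can be computed as below. The slide does not give the hypergraph version, so this sketch (hypothetical function `modularity`) covers only a weighted simple graph. The one-cluster solution scores Q = 0, so it is preferred only when no split does better.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity of a weighted simple graph.

    edges: list of (u, v, weight); partition: dict vertex -> cluster.
    Q = sum_c [ L_c / m - (d_c / 2m)^2 ], with L_c the edge weight inside c,
    d_c the total degree of c, and m the total edge weight.
    """
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            inside[partition[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
two = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one = {v: "A" for v in range(6)}
print(modularity(edges, two))  # about 0.357 (= 5/14)
print(modularity(edges, one))  # 0.0
```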

Partitioning Algorithm: Build a hierarchical clustering tree using the hypergraph cut, bottom-up (agglomerative approach). Determine the threshold by hypergraph modularity.
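
A compact sketch of this bottom-up procedure, under loudly stated assumptions: the merge step joins the pair of clusters whose union leaves the smallest weighted broken-hyperedge cut, and because the slides do not define the hypergraph modularity, a stand-in (Newman-Girvan modularity of the clique expansion with pair weight w(e)/|e|) picks the level of the tree. All function names are hypothetical, and the brute-force pair search is only meant for toy inputs.

```python
from collections import defaultdict
from itertools import combinations


def cut_weight(hyperedges, labels):
    """Total weight of hyperedges spanning more than one cluster."""
    return sum(w for vs, w in hyperedges if len({labels[v] for v in vs}) > 1)


def stand_in_modularity(hyperedges, labels):
    """ASSUMED stand-in, not the authors' hypergraph modularity:
    Newman-Girvan modularity of the clique expansion."""
    inside, degree, m = defaultdict(float), defaultdict(float), 0.0
    for vs, w in hyperedges:
        for u, v in combinations(vs, 2):
            pw = w / len(vs)
            m += pw
            degree[labels[u]] += pw
            degree[labels[v]] += pw
            if labels[u] == labels[v]:
                inside[labels[u]] += pw
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def merge(labels, a, b):
    """Relabel every vertex of cluster b as cluster a."""
    return {v: (a if c == b else c) for v, c in labels.items()}


def agglomerate(vertices, hyperedges):
    """Merge bottom-up by lowest cut; return the (score, labels) level
    with the highest stand-in modularity."""
    labels = {v: i for i, v in enumerate(vertices)}  # start from singletons
    best = (stand_in_modularity(hyperedges, labels), dict(labels))
    while len(set(labels.values())) > 1:
        clusters = sorted(set(labels.values()))
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: cut_weight(hyperedges, merge(labels, *ab)))
        labels = merge(labels, a, b)
        q = stand_in_modularity(hyperedges, labels)
        if q > best[0]:
            best = (q, dict(labels))
    return best


hyperedges = [({"a", "b"}, 3.0), ({"c", "d"}, 3.0), ({"b", "c"}, 1.0)]
q, labels = agglomerate(["a", "b", "c", "d"], hyperedges)
print(round(q, 3), labels)  # 0.357 {'a': 0, 'b': 0, 'c': 2, 'd': 2}
```

The full tree always ends in one cluster; the modularity score, not the cut, decides where to stop, which mirrors the role the slide gives to hypergraph modularity.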

Results: We compare the following methods: hierarchical clustering based on HC0 (SetE1), hierarchical clustering based on NHC2 (SetE2), and K-means. A synthetic dataset and a Nexthink dataset collected from a real company are used.

Results: Synthetic data (α = max service size / min service size; a second parameter gives the noisy data rate). All performance figures are averaged over 100 runs on randomly generated datasets. Observations: HC0 is the best, with or without unbalanced service sizes. The results are sensitive to noise. Q2 is very similar to PWF even though it is unsupervised.

Results: Synthetic data, performance on individual clusters

α = 2, K-means
  size        3.000  3.000  3.000  6.000  6.000  6.000
  precision   0.753  0.800  0.779  0.886  0.919  0.879
  recall      0.992  0.971  0.953  0.944  0.921  0.913
  F-score     0.815  0.838  0.810  0.891  0.898  0.864

α = 2, HC0
  size        3.000  3.000  3.000  6.000  6.000  6.000
  precision   1.000  1.000  1.000  1.000  1.000  1.000
  recall
  F-score

α = 8, K-means
  size        1.000  1.100  1.300  7.500  7.600  7.900
  precision   0.641  0.679  0.690  0.952  0.935  0.935
  recall      1.000  1.000  0.997  0.873  0.860  0.915
  F-score     0.745  0.774  0.779  0.889  0.868  0.905

α = 8, HC0
  size        1.000  1.100  1.300  7.500  7.600  7.900
  precision   1.000  1.000  1.000  1.000  1.000  0.969
  recall      1.000  1.000  1.000  1.000  1.000  1.000
  F-score     1.000  1.000  1.000  1.000  1.000  0.982
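
Per-cluster precision, recall and F-score can be computed as sketched below. The slides do not say how a true service is paired with a predicted cluster, so the rule used here (take the predicted cluster with the largest overlap) is an assumption, as is the function name `cluster_scores`.

```python
def cluster_scores(true_service, predicted_clusters):
    """Precision, recall and F-score of one true service against the
    predicted cluster that overlaps it most (ASSUMED matching rule)."""
    best = max(predicted_clusters, key=lambda c: len(c & true_service))
    overlap = len(best & true_service)
    precision = overlap / len(best)
    recall = overlap / len(true_service)
    # F-score is the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f


p, r, f = cluster_scores({1, 2, 3}, [{1, 2, 3, 4}, {5, 6}])
print(p, r, round(f, 3))  # 0.75 1.0 0.857
```

Since the table reports averages over 100 runs, its F-score column presumably averages per-run F-scores and so need not equal the harmonic mean of the averaged precision and recall beside it: for the first K-means cluster at α = 2, the averages 0.753 and 0.992 give about 0.856, against the reported 0.815.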

Results: Real data, collected by NEXThink in the client’s corporate network

HC0
  Cluster 2. Ports: TCP139, TCP445. Applications: system. Destinations: 10.130.10.111, 10.130.10.107, 10.130.10.226, 10.130.10.98, …
  Cluster 3. Ports: TCP3464, TCP3466. Applications: nvdkit.exe, radconct.exe, radstgms.exe, … Destinations: 10.130.10.94, 10.144.0.5, 10.136.0.5, 10.60.15.5, 10.140.1.5, 10.20.3.8, ...
  Cluster 5. Ports: TCP2967, UDP1281, UDP2967, UDP38293. Applications: rtvscan.exe, savroam.exe. Destinations: 10.130.10.98, 10.144.0.5, 10.136.0.5, 10.2.0.5, 10.60.15.5, 10.20.3.8, ...

HC0
  Cluster k1. Ports: TCP2638. Applications: vlaknagl.exe, novoterm.exe, tpmeritve.exe. Destinations: 10.21.49.7
  Cluster k2. Ports: UDP2638. Applications: novoterm.exe, tpmeritve.exe. Destinations: 10.255.255.255
  Cluster k3. Ports: TCP50000. Applications: corporateebankmain.exe, commonupdt.exe, corporateebank.exe

K-means
  Cluster k1. Ports: TCP2638, TCP8290, TCP16384. Applications: vlaknagl.exe, novoterm.exe, tpmeritve.exe, hpqscnvw.exe, agentservice.exe, ... Destinations: 10.21.49.7, 10.136.10.2, 10.0.21.105, 10.130.11.86, 10.100.0.15
  Cluster k2. Ports: UDP2638, UDP138, TCP2869. Applications: novoterm.exe, tpmeritve.exe, system, svchost.exe, wmpnetwk.exe, ... Destinations: 10.255.255.255, 10.136.0.36, 10.200.21.74, 10.200.255.255, 10.136.0.30, ...
  Cluster k3. Ports: TCP50000, TCP40000, TCP1233. Applications: corporateebankmain.exe, commonupdt.exe, corporateebank.exe, mmc.exe, java.exe, ... Destinations: 10.21.49.7, 10.130.10.111, 10.150.31.8

Discussion: HC0 produces better results than NHC2 on the synthetic data. But is it suitable for all hypergraph structures, given that NHC2 is easier to deal with? The Nexthink dataset is interesting; can we explore it further? Where can we find killer applications of community detection techniques, for both simple graphs and hypergraphs?

Thank you