Network Service Identification through Hypergraph Clustering
Li PU, Boi EPFL UESTC

Introduction
In a corporate network, what applications are installed on the client machines, and what services are installed on the servers? The administrator needs an in-depth overview.

Introduction
Network traffic is collected from the client machines by the Nexthink solution.

Introduction
[Figure: screenshot from NEXThink Finder with columns Source, User, Application, Port, Destination]
Scale: >1000 computers, >500 applications, >1 million TCP/UDP sessions. The data must be simplified before a network administrator can look at it.

Introduction
Group the ports according to their functionality: a group of ports = a service. Examples: file sharing (139, 445); antivirus (1281, 2967, …); malware. This reduces the number of groups the administrator needs to skim over.
Assumption: each port belongs to exactly one service.

Network Service
A group of ports can be identified as a network service; the ports in a group need not be consecutive. Evidence: the application, port and destination observed in each session.
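One way to turn session records into a weighted hypergraph is sketched below. It assumes, purely for illustration, that each (application, destination) pair defines one hyperedge whose vertices are the ports that pair was seen using, weighted by its number of sessions. The record layout, the destination names and the function `build_hypergraph` are hypothetical; they are not the Nexthink schema or necessarily the authors' construction.

```python
from collections import defaultdict


def build_hypergraph(sessions):
    """Build a weighted hypergraph from (application, port, destination) records.

    Hypothetical construction: one hyperedge per (application, destination)
    pair, containing every port that pair used; weight = number of sessions.
    """
    ports_of = defaultdict(set)
    weight_of = defaultdict(int)
    for app, port, dest in sessions:
        key = (app, dest)
        ports_of[key].add(port)
        weight_of[key] += 1
    # Each hyperedge is stored as (frozenset of ports, weight).
    return [(frozenset(ports_of[k]), weight_of[k]) for k in ports_of]


sessions = [
    ("system", "TCP139", "fileserver"),
    ("system", "TCP445", "fileserver"),
    ("system", "TCP445", "fileserver"),
    ("rtvscan.exe", "TCP2967", "avserver"),
    ("rtvscan.exe", "UDP2967", "avserver"),
]
for ports, w in build_hypergraph(sessions):
    print(sorted(ports), w)
```

Ports shared by many applications end up in many hyperedges, which is what lets a partition of the ports reflect services rather than individual programs.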

Hypergraph Model
A hyperedge contains one or more vertices, and each hyperedge has a weight. Network service identification then becomes a vertex partition problem.
What is a good partition? One that breaks as few (weighted) hyperedges as possible, and that isolates small services as well as large ones.

Cut-based Partitioning
Minimize the cut of the partition. Hypergraphs admit different cuts:
1) Number of broken hyperedges [karypis2002multilevel]. Designed for VLSI applications; not suitable for general hypergraph partitioning or for network service identification.
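A minimal sketch of cut 1), generalized to weighted hyperedges (the weights come from the hypergraph model above; with all weights equal to 1 it is the plain count of broken hyperedges). The function name and the data layout are illustrative only.

```python
def broken_hyperedge_cut(hyperedges, cluster_of):
    """Total weight of hyperedges whose vertices fall in more than one cluster.

    hyperedges: list of (iterable of vertices, weight)
    cluster_of: dict mapping each vertex to its cluster label
    """
    total = 0.0
    for vertices, weight in hyperedges:
        if len({cluster_of[v] for v in vertices}) > 1:
            total += weight
    return total


edges = [({139, 445}, 3.0), ({1281, 2967}, 2.0), ({445, 2967}, 1.0)]
partition = {139: "A", 445: "A", 1281: "B", 2967: "B"}
print(broken_hyperedge_cut(edges, partition))  # 1.0: only {445, 2967} is broken
```

Putting every vertex in a single cluster gives a cut of 0, which is why the cut alone cannot decide how many clusters to keep.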

2) Normalized hypergraph cut [zhou2007learning]: first convert the hypergraph into a simple graph, then compute the standard normalized cut on that graph.
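The two steps can be sketched as follows. The conversion uses the clique expansion as we understand the Zhou et al. construction: each hyperedge e spreads w(e)/δ(e) over every ordered pair of its vertices, self-pairs included, so a vertex's degree is the total weight of its hyperedges. The exact normalization behind NHC2 in these slides may differ; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def clique_expansion(hyperedges):
    """Convert a weighted hypergraph into a weighted simple graph.

    Each hyperedge e with weight w(e) and degree delta(e) = |e| adds
    w(e) / delta(e) to A[u][v] for every ordered pair (u, v) of its vertices,
    including u == v, so each vertex's degree equals the total weight of the
    hyperedges containing it.
    """
    A = defaultdict(lambda: defaultdict(float))
    for vertices, w in hyperedges:
        vs = list(vertices)
        for u, v in product(vs, repeat=2):
            A[u][v] += w / len(vs)
    return A


def normalized_cut(A, cluster_of):
    """k-way normalized cut: sum over clusters C of cut(C, rest) / vol(C)."""
    cut = defaultdict(float)
    vol = defaultdict(float)
    for u, row in A.items():
        for v, a in row.items():
            vol[cluster_of[u]] += a
            if cluster_of[u] != cluster_of[v]:
                cut[cluster_of[u]] += a
    return sum(cut[c] / vol[c] for c in vol)


edges = [({1, 2, 3}, 3.0), ({3, 4}, 1.0)]
partition = {1: "A", 2: "A", 3: "A", 4: "B"}
print(round(normalized_cut(clique_expansion(edges), partition), 4))  # 0.55
```

Here vol(A) = 10, vol(B) = 1 and the broken hyperedge {3, 4} contributes 0.5 to each side's cut, giving 0.5/10 + 0.5/1 = 0.55.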

3) Non-pairwise hypergraph cut: the best partition depends only on the hyperedge weights, not on the hyperedge degrees.
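To make the distinction concrete, the sketch below compares the cost of breaking a single hyperedge under two rules. The pairwise cost is the clique-expansion cost w(e)·n_in·n_out/δ(e) (an assumption about how a pairwise cut such as 2) charges a broken hyperedge); the non-pairwise cost is our reading of cut 3), which charges the full weight once the hyperedge is broken. Both function names are hypothetical.

```python
def pairwise_cost(weight, n_in, n_out):
    """Pairwise (clique-expansion) cost of a hyperedge with n_in vertices on one
    side of the cut and n_out on the other: w(e) * n_in * n_out / delta(e)."""
    return weight * n_in * n_out / (n_in + n_out)


def non_pairwise_cost(weight, n_in, n_out):
    """Non-pairwise cost: the whole weight w(e) is charged as soon as the
    hyperedge is broken, whatever its degree or the way it is split."""
    return weight if n_in > 0 and n_out > 0 else 0.0


for split in [(1, 1), (1, 3), (2, 2), (1, 9)]:
    print(split, pairwise_cost(1.0, *split), non_pairwise_cost(1.0, *split))
```

With unit weight, the pairwise cost ranges from 0.5 to 1.0 depending on the hyperedge degree and the split, while the non-pairwise cost is always 1.0.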

Determining Number of Clusters
Minimizing the hypergraph cut has a trivial solution in which only one cluster exists, since then no hyperedge is broken. We therefore need to find the best number of clusters. For this purpose, graph modularity [newman2004finding] is extended to hypergraphs.

Partitioning Algorithm
Build a hierarchical clustering tree based on the hypergraph cut, bottom-up (agglomerative approach). Determine the threshold at which to cut the tree using hypergraph modularity.
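A minimal sketch of the bottom-up procedure, under simplifying assumptions of our own: each merge greedily removes the most weight from the pairwise (clique-expanded) cut, and the tree is thresholded with Newman modularity on the same expanded graph. The HC0 and NHC2 trees in the slides may use different merge costs and a different hypergraph modularity; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def clique_expansion(hyperedges):
    # A[u, v] += w(e) / |e| for every ordered pair (u, v) in e, including u == v.
    A = defaultdict(float)
    for vertices, w in hyperedges:
        vs = list(vertices)
        for u, v in product(vs, repeat=2):
            A[u, v] += w / len(vs)
    return A


def modularity(A, clusters):
    # Newman modularity of the weighted graph A for a list of vertex sets.
    total = sum(A.values())
    q = 0.0
    for c in clusters:
        inside = sum(A.get((u, v), 0.0) for u in c for v in c)
        attached = sum(a for (u, _), a in A.items() if u in c)
        q += inside / total - (attached / total) ** 2
    return q


def connection(A, x, y):
    # Weight from cluster x to cluster y (A is symmetric); merging x and y
    # removes this weight, in each direction, from the pairwise cut.
    return sum(A.get((u, v), 0.0) for u in x for v in y)


def agglomerate(hyperedges):
    """Build the tree bottom-up and return the level with the highest modularity."""
    A = clique_expansion(hyperedges)
    vertices = sorted({v for vs, _ in hyperedges for v in vs})
    clusters = [frozenset([v]) for v in vertices]  # leaves: one vertex per cluster
    best, best_q = list(clusters), modularity(A, clusters)
    while len(clusters) > 1:
        # Greedy step: merge the pair of clusters joined by the largest weight.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: connection(A, clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        q = modularity(A, clusters)
        if q > best_q:  # threshold: keep the level with maximal modularity
            best, best_q = list(clusters), q
    return best, best_q


edges = [({139, 445}, 3.0), ({1281, 2967, 38293}, 2.0), ({445, 2967}, 0.5)]
clusters, q = agglomerate(edges)
print([sorted(c) for c in clusters], round(q, 3))
```

A purely greedy merge on the non-pairwise cut can stall on hyperedges with more than two vertices, because merging two of their three vertices does not yet repair the hyperedge; the pairwise rule used here avoids that, at the cost of depending on the hyperedge degree.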

Results
We compare the following methods:
1) hierarchical clustering based on HC0 (SetE1)
2) hierarchical clustering based on NHC2 (SetE2)
3) K-means
Two datasets are used: a synthetic dataset and a Nexthink dataset collected from a real company.

Results
Synthetic data: α = max service size / min service size; the noise parameter is the rate of noisy data. All performance values are averaged over 100 runs on randomly generated datasets.
Observations: HC0 performs best, with or without unbalanced service sizes. The results are sensitive to noise. Q2 behaves very similarly to PWF even though Q2 is unsupervised.
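If PWF denotes the pairwise F-measure (an assumption; the slide does not expand the acronym), it can be computed from a ground-truth service labelling as sketched below: a pair of ports is positive when both are placed in the same cluster. The function name is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(true_label, pred_label):
    """Pairwise precision, recall and F-measure between two clusterings.

    A pair of items counts as positive when a clustering puts both items in
    the same cluster; predicted positive pairs are compared with true ones.
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(true_label), 2):
        same_true = true_label[a] == true_label[b]
        same_pred = pred_label[a] == pred_label[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


truth = {139: "file", 445: "file", 1281: "av", 2967: "av"}
pred = {139: 0, 445: 0, 1281: 0, 2967: 1}
print(pairwise_f_measure(truth, pred))
```

In this example one true pair is recovered, two pairs are wrongly joined and one true pair is split, so precision is 1/3, recall 1/2 and F 0.4. Computing it requires ground truth, which is why an unsupervised score such as Q2 tracking it is useful.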

Results
Synthetic data: performance on individual clusters

α = 2, K-means
size: 3.000, 3.000, 3.000, 6.000, 6.000, 6.000
precision: 0.753, 0.800, 0.779, 0.886, 0.919, 0.879
recall: 0.992, 0.971, 0.953, 0.944, 0.921, 0.913
F-score: 0.815, 0.838, 0.810, 0.891, 0.898, 0.864

α = 2, HC0
size: 3.000, 3.000, 3.000, 6.000, 6.000, 6.000
precision: 1.000, 1.000, 1.000, 1.000, 1.000, 1.000
recall: (not available)
F-score: (not available)

α = 8, K-means
size: 1.000, 1.100, 1.300, 7.500, 7.600, 7.900
precision: 0.641, 0.679, 0.690, 0.952, 0.935, 0.935
recall: 1.000, 1.000, 0.997, 0.873, 0.860, 0.915
F-score: 0.745, 0.774, 0.779, 0.889, 0.868, 0.905

α = 8, HC0
size: 1.000, 1.100, 1.300, 7.500, 7.600, 7.900
precision: 1.000, 1.000, 1.000, 1.000, 1.000, 0.969
recall: 1.000, 1.000, 1.000, 1.000, 1.000, 1.000
F-score: 1.000, 1.000, 1.000, 1.000, 1.000, 0.982
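The slides do not say how a predicted cluster is matched to a true service. A common choice, assumed here, is to score each true service against the predicted cluster that overlaps it most; the function name is hypothetical.

```python
def per_service_scores(services, clusters):
    """For each true service, score the predicted cluster that overlaps it most.

    services, clusters: lists of sets of ports.
    Returns a list of (precision, recall, f_score), one per service.
    """
    scores = []
    for s in services:
        best = max(clusters, key=lambda c: len(c & s))
        overlap = len(best & s)
        if overlap == 0:
            scores.append((0.0, 0.0, 0.0))
            continue
        precision = overlap / len(best)
        recall = overlap / len(s)
        f = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f))
    return scores


services = [{139, 445}, {1281, 2967, 38293}]
clusters = [{139, 445, 2967}, {1281, 38293}]
for p, r, f in per_service_scores(services, clusters):
    print(f"{p:.3f} {r:.3f} {f:.3f}")
```

Because each number in the tables is an average over 100 runs, an averaged F-score need not equal the harmonic mean of the averaged precision and recall.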

Results
Real data, collected by NEXThink in the client's corporate network
Each row: cluster: ports; applications; destinations

HC0
2: TCP139, TCP445; system; …
3: TCP3464, TCP3466; nvdkit.exe, radconct.exe, radstgms.exe, …; …
5: TCP2967, UDP1281, UDP2967, UDP38293; rtvscan.exe, savroam.exe; …

HC0
k1: TCP2638; vlaknagl.exe, novoterm.exe, tpmeritve.exe
k2: UDP2638; novoterm.exe, tpmeritve.exe
k3: TCP50000; corporateebankmain.exe, commonupdt.exe, corporateebank.exe

K-means
k1: TCP2638, TCP8290, TCP16384; vlaknagl.exe, novoterm.exe, tpmeritve.exe, hpqscnvw.exe, agentservice.exe, ...; …
k2: UDP2638, UDP138, TCP2869; novoterm.exe, tpmeritve.exe, system, svchost.exe, wmpnetwk.exe; …
k3: TCP50000, TCP40000, TCP1233; corporateebankmain.exe, commonupdt.exe, corporateebank.exe, mmc.exe, java.exe, ...; …

Discussion
HC0 produces better results than NHC2 on the synthetic data. But is it suitable for all hypergraph structures, given that NHC2 is easier to deal with?
The Nexthink dataset is interesting. Can we explore it further?
Where can we find killer applications of community detection techniques (for both simple graphs and hypergraphs)?

Thank you