1 Automatic Discovery of Botnet Communities on Large-Scale Communication Networks
Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, in ACM Symposium on InformAtion, Computer and Communications Security (ASIACCS’09)
Reporter: 高嘉男
Advisor: Chin-Laung Lei
2009/09/28
2 Outline
Introduction
Methodology
Traffic Classification
  ◦ Payload signature based classification
  ◦ Identifying unknown traffic applications
Botnet Detection
Experimental Evaluation
Conclusions
3 Life-cycle of an IRC Botnet
4 Approaches of Botnet Detection
Honeypots
  ◦ Capture malware & understand the behavior of botnets
Passive anomaly analysis
  ◦ Usually independent of the traffic content
  ◦ Examples: BotSniffer & BotMiner
Traffic application classification
  ◦ Classify traffic into IRC traffic & non-IRC traffic
  ◦ Can only detect IRC-based botnets
5 Two Challenges of Botnet Detection
Detecting new (or recently appeared) botnets
  ◦ Centralized C&C structure -> decentralized (P2P) structure
  ◦ Network protocols: IRC or HTTP -> self-developed protocols
Identifying the applications behind network traffic
  ◦ Port number: limited information
  ◦ Examine the payload of network flows and then create signatures for each application
    Legal issues related to privacy
    Encrypted traffic
    40% of network flows cannot be classified
6 Methodology
7 Payload Signature based Classification Characteristics of bit strings in the payload
8 Payload Signature based Classification (cont’d)
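A minimal sketch of payload signature matching, assuming each application is described by a byte pattern at the start of the flow payload. The `SIGNATURES` table and the function name `classify_payload` are hypothetical and chosen for illustration; they are not the signature set used in the paper.

```python
import re

# Hypothetical signatures for illustration only (not the paper's signature set).
# Each pattern is matched against the first bytes of a flow's payload.
SIGNATURES = {
    "HTTP": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
    "IRC": re.compile(rb"^(NICK|USER|JOIN|PRIVMSG) "),
    "SSH": re.compile(rb"^SSH-\d\.\d"),
}


def classify_payload(payload: bytes) -> str:
    """Return the first application whose signature matches, else 'unknown'."""
    for app, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return app
    return "unknown"
```

Flows that match no signature stay "unknown" and are handed to the next stage, which labels them through their association with known flows.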
9 Identifying Unknown Traffic Applications
Basic idea:
  ◦ Association relationship between known traffic & unknown traffic
Step 1:
  ◦ Cluster flows in terms of the src IP & the dst IP
  ◦ Generate a set of rectangles -> community
Step 2:
  ◦ Cluster flows in terms of the dst IP & the dst port
  ◦ Generate a set of rectangles -> application community
Label each application community
  ◦ Assign unknown flows according to the probability of known flows
10 Identifying Unknown Traffic Applications (cont’d)
11 Identifying Unknown Traffic Applications (cont’d)
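A rough sketch of the labeling step, under two simplifying assumptions that are not from the paper: an application community is approximated by exact grouping on (dst IP, dst port) instead of the rectangle clustering, and "probability of known flows" is reduced to taking the most frequent known label in the community. The flow dictionary keys and the name `label_unknown_flows` are hypothetical.

```python
from collections import Counter, defaultdict


def label_unknown_flows(flows):
    """Assign each unknown flow the most frequent known label of its community.

    flows: list of dicts with keys 'dst_ip', 'dst_port', 'label'
           (label is None for unknown flows).
    Assumption: a community is approximated by the exact (dst_ip, dst_port) pair.
    """
    communities = defaultdict(list)
    for flow in flows:
        communities[(flow["dst_ip"], flow["dst_port"])].append(flow)

    labeled = []
    for members in communities.values():
        # Count only the flows already labeled by the signature stage.
        known = Counter(f["label"] for f in members if f["label"] is not None)
        best = known.most_common(1)[0][0] if known else None
        for f in members:
            labeled.append({**f, "label": f["label"] or best})
    return labeled
```

A community with no known flows at all keeps its flows unlabeled (`None`) in this sketch.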
12 Botnet Detection
Objective:
  ◦ Differentiate botnet behavior from normal traffic on a specific application community
Concept:
  ◦ Temporal-frequent characteristics of the 256 ASCII binary bytes in the payload over a time period
Botnet behavior:
  ◦ Response time of bots: immediate and accurate once they receive commands
  ◦ Bots might be synchronized with each other
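To make the byte-frequency idea concrete, here is a small sketch that turns the payload bytes a flow carries during one time window into a 256-entry frequency vector. Normalizing to relative frequencies and the name `byte_frequency_vector` are assumptions made for illustration; the paper may define the vector differently.

```python
def byte_frequency_vector(payload: bytes) -> list:
    """Relative frequency of each of the 256 byte values in a payload.

    Assumption: 'payload' holds the bytes of one flow within one time window.
    """
    counts = [0] * 256
    for b in payload:          # iterating over bytes yields ints in 0..255
        counts[b] += 1
    total = len(payload)
    if total == 0:
        return [0.0] * 256     # empty payload: all-zero vector
    return [c / total for c in counts]
```

If bots react to the same command in a synchronized way, their vectors for the same window would be expected to look alike, which is what the detection metric measures.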
13 Detection Algorithm
14 Detection Algorithm (cont’d)
Metric: standard deviation σ_m for each cluster m
  ◦ The higher the average σ_m over the 256 ASCII characters for flows in cluster m, the more normal cluster m is.
Given the frequency vectors for n flows:
  ◦ σ_j = standard deviation of the j-th ASCII character over the n flows
  ◦ σ_m = average of σ_j over the 256 ASCII characters for the flows in cluster m
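The metric can be sketched as follows, assuming each flow in a cluster is already represented by a 256-entry frequency vector. Using the population standard deviation (`statistics.pstdev`) is an assumption, since the slide does not say whether the population or sample form is meant; `average_byte_stddev` is a hypothetical name.

```python
from statistics import pstdev


def average_byte_stddev(freq_vectors):
    """Average, over the 256 byte values, of the per-byte standard deviation.

    freq_vectors: list of n frequency vectors (each of length 256), one per flow.
    Assumption: population standard deviation is used for each byte value.
    """
    if not freq_vectors:
        raise ValueError("need at least one frequency vector")
    # sigma_j: spread of the j-th byte's frequency across the n flows
    sigmas = [pstdev([v[j] for v in freq_vectors]) for j in range(256)]
    # sigma_m: mean of sigma_j over all 256 byte values
    return sum(sigmas) / 256
```

Following the slide, a cluster with a higher value is considered more normal, so the cluster with the lowest value is the most bot-like candidate.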
15 Detection Algorithm (cont’d)
16 Tested Network Topology
17 Evaluation on Traffic Classification Part of known traffic → label them as unknown
18 Evaluation on Botnet Detection
19 Evaluation on Botnet Detection (cont’d)
20 Conclusions
They propose a novel application discovery approach for automatically classifying network applications on a large-scale WiFi ISP network.
They develop a generic algorithm to discriminate general botnet behavior from normal network traffic on a specific application community, based on n-gram (frequent characteristics) of flow payload over a time period (temporal characteristics).
Evaluation results show that their approach obtains a very high detection rate (approaching 100% for IRC bots) with a low false alarm rate when detecting IRC botnet traffic.
21 Reference
Lu, W., M. Tavallaee, and A.A. Ghorbani, “Automatic Discovery of Botnet Communities on Large-Scale Communication Networks”, in ACM Symposium on InformAtion, Computer and Communications Security (ASIACCS’09), Sydney, Australia, 2009.