BotCop: An Online Botnet Traffic Classifier 鍾錫山 Jan. 4, 2010
Reference Wei Lu, Mahbod Tavallaee, Goaletsa Rammidi, Ali A. Ghorbani, "BotCop: An Online Botnet Traffic Classifier," CNSR, pp. 70-77, 2009 Seventh Annual Communication Networks and Services Research Conference.
Outline Introduction, Traffic classification, Botnet detection, Experimental evaluation, Conclusions. 2016/1/11
Introduction Honeypots: used to capture malware, understand the basic behavior of botnets, and create bot binaries or botnet signatures. This approach is based on existing botnets and provides no solution for new botnets. Automatic botnet detection: ◦ (1) passive anomaly analysis. ◦ (2) traffic classification.
Hierarchical Framework At the higher level, all unknown network traffic is labeled and classified into different network application communities. ◦ P2P, HTTP Web, Chat, DataTransfer, Online Games, Mail Communication, Multimedia (streaming and VoIP) and Remote Access. At the lower level, focusing on each application community, we investigate and apply the temporal-frequent characteristics of network flows to differentiate malicious botnet behavior from normal application traffic.
Traffic Classification First, we model and generate signatures for more than 470 applications according to the port numbers and protocol specifications of these applications. Second, concentrating on unknown flows that cannot be identified by signatures, we investigate their temporal-frequent characteristics and use a decision tree to assign them to the already labeled applications. Fred-eZone, a free WiFi network for Fredericton, Canada.
Signature-Based Classifier For most applications, the initial protocol handshake steps are usually different and can therefore be used for classification.
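The handshake idea above can be sketched as a prefix match on the first payload bytes of a flow. This is a minimal illustration, not the BotCop signature set: the `SIGNATURES` table and the function name `classify_by_signature` are hypothetical, and the prefixes are only well-known opening bytes of a few protocols.

```python
# Illustrative handshake prefixes only (assumption); the classifier described
# in the slides uses signatures for more than 470 applications.
SIGNATURES = {
    "BitTorrent": [b"\x13BitTorrent protocol"],
    "HTTPWeb": [b"GET ", b"POST ", b"HEAD "],
    "SSH": [b"SSH-"],
    "IRC": [b"NICK ", b"USER "],
}


def classify_by_signature(payload: bytes) -> str:
    """Return the first application whose handshake prefix matches, else 'unknown'."""
    for app, prefixes in SIGNATURES.items():
        if any(payload.startswith(prefix) for prefix in prefixes):
            return app
    return "unknown"
```

Flows that come back as "unknown" are the ones that would be passed on to the decision tree stage.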
Decision Tree Based Classifier A general result is that about 40% of flows cannot be classified by the current payload signature based classification method. Extend n-gram frequency into the temporal domain. Generate a set of 256-dimensional vectors representing the temporal-frequent characteristics of the 256 ASCII binary bytes in the payload over a predefined time interval. The n-gram (n = 1 in particular) is computed over a one-second time interval for both the source flow payload and the destination flow payload.
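A rough sketch of the 1-gram temporal-frequent vector described above, assuming a flow direction is available as a list of (timestamp, payload) pairs. The function names and the choice to normalize counts into relative frequencies are assumptions; the slides do not say whether raw counts or ratios are used.

```python
from collections import defaultdict


def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = [0] * 256
    for value in payload:
        counts[value] += 1
    total = len(payload)
    return [c / total for c in counts] if total else [0.0] * 256


def temporal_frequency(packets, window=1.0):
    """Group one direction of a flow into fixed time windows.

    packets: iterable of (timestamp_seconds, payload_bytes).
    Returns {window_index: 256-dimensional frequency vector}.
    """
    buckets = defaultdict(bytearray)
    for timestamp, payload in packets:
        buckets[int(timestamp // window)].extend(payload)
    return {w: byte_frequency(bytes(buf)) for w, buf in sorted(buckets.items())}
```

Calling `temporal_frequency` once on the source payloads and once on the destination payloads gives the two views used by the source flow based and destination flow based classifiers.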
Figure: Temporal-frequent metric for source flow payload of the LimeWire application. Figure: Temporal-frequent metric for source flow payload of the BitTorrent application.
Figure: Temporal-frequent metric for source flow payload of the HTTPWeb application. Figure: Temporal-frequent metric for source flow payload of the SecureWeb application.
Profiling Applications We denote the 256-dimensional n-gram byte distribution as a vector $\langle f_{0t}, f_{1t}, \ldots, f_{255t} \rangle$, where $f_{it}$ is the frequency of the $i$-th ASCII character on the flow payload over a time window $t$. Given n historical known flows for each specific application, we define an n × 256 matrix for profiling applications.
A Typical Decision Tree
Botnet Detection Botnet behavior: ◦ Response time. ◦ Synchronized.
Botnet Detection Approach Input: a set of N data objects. Initialization: each cluster contains only one data instance. Repeat: find the closest pair of clusters and merge them into a single cluster. Until: the target number of clusters is reached.
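The merge loop above is a standard agglomerative clustering procedure and can be sketched as follows. Two points are assumptions here, since the slide does not state them: the distance between clusters is taken as the Euclidean distance between centroids, and the stopping count `k` is left as a parameter. The function names are hypothetical.

```python
import math


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    dim = len(cluster[0])
    return [sum(point[i] for point in cluster) / len(cluster) for i in range(dim)]


def agglomerative(points, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Initialization: each cluster contains only one data instance.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters
```

This naive version rescans all pairs on every merge, which is fine for a few hundred flows but would need a distance cache for larger inputs.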
Experimental evaluation The botnet traffic was collected on a honeypot deployed on a real network and aggregated into 243 flows. Traffic traces collected over 2 days are used for training, and the real-time traffic flows collected on the 3rd day are used for testing. The size of the input data for training the decision tree is 11000× typical applications belonging to 8 typical application groups.
Applications in training dataset
Distribution of "unknown" application flows More than 90,000 flows were collected over the testing day and identified as unknown.
Source Flow Based Decision Tree Classifier Total number of flows correctly identified: %
Destination Flow Based Decision Tree Classifier Total number of flows correctly identified: %
IRC Application Communities
Conclusions Unknown applications on the current network are first classified into different application communities. Then, focusing on each application community, a temporal-frequent characteristic is used to differentiate botnet behavior from normal application traffic. Open question: how to evaluate the approach on the P2P community and measure its performance on P2P based botnets?