BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)
CoNEXT 2012
Motivation
- Bots are a sophisticated type of malware; multiple bots under a single controller form a botnet.
- Distinct characteristic: the command and control (C&C) channel.
- Threats raised by bots: spam, information theft (e.g., credit card data), identity theft, click fraud, distributed denial of service (DDoS) attacks.
- Estimated revenue for a single botnet: $2M-$600M.
[Figure: botmaster sending commands over the C&C channel to victim hosts.]
Challenge
How to detect bot infections?
- Classically: on the end host, with an anti-virus scanner. But this requires installation on every machine.
- Complementary approach: network based.
  - Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., ...): typical behavior (spam, DDoS traffic), anomaly detection (Giroire et al.), packet analysis (HTTP structure, payloads, typical signatures).
  - Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, ...): two or more hosts perform the same malicious activity.
Challenge and Solution Approach
- Existing vertical approaches typically rely on scanning, spam, or DDoS traffic and require packet inspection.
- Existing horizontal approaches require multiple infected hosts in a single domain and are also triggered by noisy activity (e.g., BotMiner).
Contribution: vertical detection of single bot infections without packet inspection!
- The botmaster establishes C&C connections frequently to disseminate orders, so C&C connections show patterns.
- Use these statistical properties of C&C communication.
- Core assumption: periodic behavior!
Methodology
Basic machine learning approach: learn about bot behavior in a training phase (a), then use the learned behavior in a detection phase (b).
Training:
- Observe malware in a controlled environment.
- Extract flows and build traces.
- Perform statistical analysis to obtain "features".
- Create models that describe the malware.
Methodology – Detection Phase
Detection:
- Obtain traffic.
- Perform the same analysis as in training.
- Compare the statistical features of the traffic with the models.
During the whole process: no deep packet inspection!
Methodology – Details
Analysis is performed on flows. A flow is a connection from A to B, described by:
- source IP address
- destination IP address
- source port
- destination port
- transport protocol ID
- start time
- duration of the connection
- number of bytes
- number of packets
This information is easy to obtain in real-world environments, e.g., from NetFlow (see the sketch below).
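To make the rest of the pipeline concrete, here is a minimal sketch of such a flow record in Python. The field names are illustrative, not taken from the paper; the byte count is split into source and destination directions because later features use both.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    """A NetFlow-style flow record (illustrative field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # transport protocol ID, e.g. 6 for TCP
    start_time: float   # seconds since epoch
    duration: float     # seconds
    bytes_sent: int     # source -> destination bytes
    bytes_recv: int     # destination -> source bytes
    packets: int
```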
Methodology – Details cont'd
A trace is a chronologically ordered sequence of flows between the same endpoints. It represents long-term communication behavior (see the sketch below).
[Figure: an example trace plotted in two dimensions, time and duration.]
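A rough sketch of how flows could be grouped into traces, building on the `Flow` record above. The grouping key (source IP, destination IP, destination port, protocol) is an assumption derived from the listed flow attributes; it captures "the same host repeatedly talking to the same service".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_traces(flows: List[Flow]) -> Dict[Tuple, List[Flow]]:
    """Group flows into traces and order each trace chronologically.

    The grouping key (src IP, dst IP, dst port, protocol) is an assumption.
    """
    traces: Dict[Tuple, List[Flow]] = defaultdict(list)
    for f in flows:
        traces[(f.src_ip, f.dst_ip, f.dst_port, f.protocol)].append(f)
    for key in traces:
        traces[key].sort(key=lambda f: f.start_time)
    return dict(traces)
```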
Distinguishing Characteristics
Bot traffic is more regular than normal, benign traffic!
[Figure: periodicity of bot vs. benign traces; the lower the bar, the more periodic the traffic.]
Methodology – Features
Statistical features are used to describe a trace (see the sketch below):
- average time between two flows
- average duration of flows
- average number of source bytes
- average number of destination bytes
- a Fourier transform to detect underlying communication frequencies (more robust than simple averaging)
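A minimal sketch of computing these features with NumPy. The FFT part, which bins flow start times into a fixed-width 0/1 signal and takes the strongest non-DC peak, is my own simplification of "detect underlying communication frequencies", not the paper's exact method.

```python
import numpy as np

def trace_features(trace, bin_seconds: float = 60.0):
    """Per-trace statistical features (sketch, not the paper's exact code)."""
    starts = np.array([f.start_time for f in trace])
    gaps = np.diff(starts)  # time between consecutive flows

    features = {
        "avg_interval": float(gaps.mean()) if len(gaps) else 0.0,
        "avg_duration": float(np.mean([f.duration for f in trace])),
        "avg_src_bytes": float(np.mean([f.bytes_sent for f in trace])),
        "avg_dst_bytes": float(np.mean([f.bytes_recv for f in trace])),
    }

    # FFT-based frequency estimate: bin flow start times into a 0/1 signal
    # and take the strongest non-DC peak (a simplification).
    if len(starts) > 2:
        n_bins = int((starts[-1] - starts[0]) / bin_seconds) + 1
        if n_bins >= 4:
            signal = np.zeros(n_bins)
            signal[((starts - starts[0]) / bin_seconds).astype(int)] = 1.0
            spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
            freqs = np.fft.rfftfreq(n_bins, d=bin_seconds)
            peak = 1 + int(np.argmax(spectrum[1:]))  # skip the DC bin
            features["fft_period"] = 1.0 / freqs[peak]
    return features
```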
Methodology – Models
Example scenario: multiple binary versions of the same bot family generated traces.
- For the time-interval feature, the resulting model might state: "Intervals of 8, 20, or 210 minutes are typical for this bot."
- Clusters with low standard deviation are trustworthy representations of malware behavior.
- Very small (one-element) clusters are dropped (see the sketch below).
[Figure: feature clustering of observed intervals (7.5, 8, 9, 17, 18, 20, 22, 24, 190, 230, and 912 minutes) into cluster centroids of roughly 8.2, 20, and 210 minutes; the isolated 912-minute value forms a one-element cluster and is dropped.]
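The slide does not show the clustering algorithm itself. Purely as an illustration, here is a naive greedy one-dimensional clustering that groups sorted interval values lying close to the running cluster mean, returns (centroid, standard deviation) pairs, and drops one-element clusters; the tolerance and minimum cluster size are assumptions.

```python
import numpy as np

def cluster_intervals(values, rel_tol: float = 0.25, min_size: int = 2):
    """Greedy 1-D clustering of interval values (minutes).

    Not the paper's algorithm; just an illustration of turning raw interval
    observations into cluster centroids and dropping tiny clusters.
    """
    clusters = []
    for v in sorted(values):
        if clusters and abs(v - np.mean(clusters[-1])) <= rel_tol * np.mean(clusters[-1]):
            clusters[-1].append(v)
        else:
            clusters.append([v])
    # Keep centroid and standard deviation of sufficiently large clusters.
    return [(float(np.mean(c)), float(np.std(c)))
            for c in clusters if len(c) >= min_size]

# Example with the values from the figure above:
# cluster_intervals([7.5, 8, 9, 17, 18, 20, 22, 24, 190, 230, 912])
# -> roughly [(8.2, ...), (20.2, ...), (210.0, ...)]; 912 is dropped.
```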
Methodology – Model Matching
Compare a trace to the cluster centroids of a malware family model:
1. If a trace feature "hits" a model cluster, increase the scoring value based on the cluster's quality.
2. Take the model with the highest scoring value.
3. If that scoring value exceeds the threshold, consider the model matched.
(Some more math is involved: quality of the matching trace, the clustering algorithm, minimal trace length, etc. A simplified sketch follows below.)
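A simplified sketch of this matching loop. It assumes a model maps each feature name to (centroid, standard deviation) clusters as in the clustering sketch above, that a feature "hits" a cluster when it lies within two standard deviations of the centroid, and that tighter clusters contribute more to the score; the paper's actual scoring function is more involved.

```python
def score_trace(features: dict, model: dict, hit_width: float = 2.0) -> float:
    """Score how well a trace's features match one malware-family model.

    The hit test and the quality weight are simplifying assumptions.
    """
    score = 0.0
    for name, clusters in model.items():
        value = features.get(name)
        if value is None:
            continue
        for centroid, std in clusters:
            if abs(value - centroid) <= hit_width * std:
                # Tighter clusters (low relative stddev) contribute more.
                quality = 1.0 / (1.0 + std / max(centroid, 1e-9))
                score += quality
                break
    return score

def best_match(features, models, threshold):
    """Pick the highest-scoring family; report a match only above threshold."""
    family, score = max(((f, score_trace(features, m)) for f, m in models.items()),
                        key=lambda x: x[1])
    return (family, score) if score > threshold else (None, score)
```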
Evaluation
The method is implemented in BotFinder and evaluated on six representative malware families and two datasets:
- LabCapture: 2.5 months of lab traffic from 60 machines. Full traffic capture, which allows verification; it should contain only benign traffic.
- ISPNetFlow: one month of NetFlow data from a large network, reflecting 540 Terabytes of data, or 150 MegaBytes(!) per second of traffic. No ground truth, but it is possible to compare against blacklisted IP addresses and judge usability.
Evaluation – Cross Validation
Execution (see the sketch below):
- Split the ground-truth malware dataset randomly into a training set and a detection set.
- Mix the detection set with all traces from the LabCapture dataset.
- Train BotFinder on the training set.
- Run BotFinder against the detection set.
- Repeat the experiment 50 times per acceptance threshold.
Result summary: 77% detection rate with low false positives (1 out of 5 million traces).
[Figure: the ground-truth training data is split into a training set and a detection set; the detection set is mixed with the LabCapture traces before detection.]
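A compact sketch of that experimental loop. The `train` and `detect` callables, the 50/50 split fraction, and the way the detection rate is computed are placeholders standing in for BotFinder's actual training and detection stages.

```python
import random

def cross_validate(malware_traces, benign_traces, train, detect,
                   threshold, runs: int = 50, train_frac: float = 0.5):
    """Repeat the split / mix / train / detect experiment (sketch)."""
    detection_rates = []
    for _ in range(runs):
        shuffled = random.sample(malware_traces, len(malware_traces))
        cut = int(len(shuffled) * train_frac)
        train_set, detect_set = shuffled[:cut], shuffled[cut:]
        models = train(train_set)                    # build family models
        flagged = detect(detect_set + benign_traces, models, threshold)
        detection_rates.append(
            sum(1 for t in detect_set if t in flagged) / max(len(detect_set), 1))
    return sum(detection_rates) / runs
```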
Evaluation – Cross Validation
[Figure: cross-validation results.]
Evaluation – Comparison to BotHunter
- BotHunter* is an optimized Snort Intrusion Detection System; it requires packet inspection and leverages anomaly detection.
- BotHunter produced many false positives, typically raised by IRC activity or binary downloads.
- Detection results: BotFinder detection rate 77.5%, BotHunter detection rate 10%.
- BotFinder outperformed BotHunter and shows relatively high detection rates and low false positives.
- Caveat: is the experimental setup perhaps not reproducing elements crucial to BotHunter?
*: http://www.bothunter.net
Evaluation – ISPNetFlow
- Challenging to analyze, as only minimal information (only internal IP ranges) is available.
- 542 traces (out of more than 1 billion traces) are identified by BotFinder as malicious.
- On average: 14.6 alerts per day.
Evaluation – ISPNetFlow cont'd
- Speed is sufficient for large networks: 3 minutes for 15M NetFlow records (~15 minutes of ISPNetFlow traffic, 800 MB file size). Processing is dominated by feature extraction and is easy to parallelize.
- Detailed investigation of the IP addresses behind the raised alarms: comparison of the external IPs with publicly available blacklists*.
- Result: 56% of all IPs are known to be malicious! The "false positives" show a large cluster of connections to Apple; with Apple whitelisted, 61% of all raised alerts connect to known malicious pages.
- Strong support that BotFinder works!
*: rbls.org
Bot Evolution
Botmasters may try to evade detection by changing their communication patterns:
- introduction of randomized intervals
- introduction of large gaps between flows
- IP or domain flux (fast-changing C&C servers)
Randomization impact: randomizing individual features does not significantly impact detection (a toy illustration follows below).
[Figure: detection rate under randomization, annotated "Lower limit!".]
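A toy illustration, not from the paper, of why randomizing intervals alone does not hide the average-interval feature: zero-mean jitter around a base period leaves the mean inter-flow time essentially unchanged. The 20-minute period and the jitter range are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

base_period = 20 * 60   # hypothetical C&C interval: 20 minutes, in seconds
jitter = 0.5            # up to +/-50% uniform randomization per flow
n_flows = 200

# Randomized inter-flow intervals around the base period.
intervals = base_period * (1 + rng.uniform(-jitter, jitter, size=n_flows))

print(f"mean interval: {intervals.mean() / 60:.1f} min "
      f"(base period: {base_period / 60:.0f} min)")
# The mean stays close to 20 min, so the average-interval feature still
# matches the trained cluster despite per-flow randomization.
```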
FFT Peak Detection with Gaps
[Figure: the FFT-based frequency feature still finds the underlying communication frequency when the trace contains large gaps.]
Anti-Domain Flux
- Problem: fast C&C domain/IP changes. When the C&C IP address changes, the trace "breaks" and BotFinder cannot create a sufficiently long trace (subtrace 1: A to C&C IP 1; subtrace 2: A to C&C IP 2).
- Idea: look at each source IP and compare all of its connections with each other. When two connections look very similar, combine them into one (see the sketch below).
- This is inherently a horizontal correlation per source IP!
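A rough sketch of that idea: per source IP, compare the statistical features of its subtraces and merge those that look alike. The similarity test (relative difference of the averaged features below a tolerance) is an assumption for illustration, not the paper's exact criterion; `trace_features` is the feature sketch from earlier.

```python
def similar(f1: dict, f2: dict, tol: float = 0.2) -> bool:
    """Two subtraces 'look similar' if their averaged features are close.
    The feature set and tolerance are illustrative assumptions."""
    keys = ("avg_interval", "avg_duration", "avg_src_bytes", "avg_dst_bytes")
    return all(abs(f1[k] - f2[k]) <= tol * max(abs(f1[k]), abs(f2[k]), 1e-9)
               for k in keys)

def merge_subtraces(subtraces_by_src):
    """Per source IP, combine subtraces (e.g., to different C&C IPs) whose
    features match, so a long enough trace survives domain/IP flux."""
    merged = {}
    for src_ip, subtraces in subtraces_by_src.items():
        groups = []
        for trace in subtraces:
            feats = trace_features(trace)   # from the feature sketch above
            for group in groups:
                if similar(group["feats"], feats):
                    group["flows"].extend(trace)
                    break
            else:
                groups.append({"feats": feats, "flows": list(trace)})
        merged[src_ip] = [sorted(g["flows"], key=lambda f: f.start_time)
                          for g in groups]
    return merged
```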
Additional Pre-Processing
- How can one check that this recombination works? Split real C&C traces and random other long traces (from real traffic), and see whether BotFinder recombines them.
- "Low" overhead: an 85% increase in the ISPNetFlow dataset.
[Figure: recombination results for the two cases; the large distance between them is good.]
Conclusion
- High detection rates - nearly 80% - with low false positives and no need for packet inspection!
- BotFinder shows better results than BotHunter.
- 61% of the BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts!
- BotFinder is robust against potential evasion strategies.
Questions
Thank you for your attention! Any questions?