BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)
CoNEXT 2012
Motivation
- Bots are a sophisticated type of malware; multiple bots under a single controller form a botnet.
- Distinct characteristic: the command and control (C&C) channel.
- Threats raised by bots: spam, information theft (e.g., credit card data), identity theft, click fraud, distributed denial of service (DDoS) attacks.
- Estimated revenue for a single botnet: $2M-$600M.
[Figure: botmaster sending commands over the C&C channel to victim hosts.]
Challenge
How to detect bot infections?
- Classically: on the end host, with an anti-virus scanner. But this requires installation on every machine.
- Complementary approach: network based.
  - Vertical correlation (single end host) (Rishi, BotHunter, Wurzinger et al., ...): typical behavior (spam, DDoS traffic), anomaly detection (Giroire et al.), packet analysis (HTTP structure, payloads, typical signatures).
  - Horizontal correlation (multiple end hosts) (BotSniffer, BotMiner, TAMD, ...): two or more hosts perform the same malicious activity.
Challenge and Solution Approach
- Existing vertical approaches typically rely on scanning, spam, or DDoS traffic and require packet inspection.
- Existing horizontal approaches require multiple infected hosts in a single domain and are also triggered by noisy activity (e.g., BotMiner).
Contribution: vertical detection of single bot infections without packet inspection!
- The botmaster establishes C&C connections frequently to disseminate orders, so C&C connections show patterns.
- Use these statistical properties of C&C communication.
- Core assumption: periodic behavior!
Methodology
Basic machine learning approach: learn about bot behavior in a training phase (a), then use the learned behavior in a detection phase (b).
Training:
- Observe malware in a controlled environment.
- Extract flows and build traces.
- Perform statistical analysis to obtain "features".
- Create models that describe the malware.
Methodology – Detection Phase
Detection:
- Obtain traffic.
- Perform the same analysis as in training.
- Compare the statistical features of the traffic with the models.
During the whole process: no deep packet inspection!
Methodology – Details
Analysis is performed on flows. A flow is a connection from A to B, described by:
- source IP address
- destination IP address
- source port
- destination port
- transport protocol ID
- start time
- duration of the connection
- number of bytes
- number of packets
This information is easy to obtain in real-world environments, e.g., from NetFlow (see the sketch below).
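To make the rest of the pipeline concrete, here is a minimal sketch of such a flow record in Python. The field names are illustrative, not taken from the paper; the byte count is split into source and destination directions because later features use both.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    """A NetFlow-style flow record (illustrative field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # transport protocol ID, e.g. 6 for TCP
    start_time: float   # seconds since epoch
    duration: float     # seconds
    bytes_sent: int     # source -> destination bytes
    bytes_recv: int     # destination -> source bytes
    packets: int
```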
Methodology – Details cont'd
A trace is a chronologically ordered sequence of flows between the same endpoints. It represents long-term communication behavior (see the sketch below).
[Figure: an example trace plotted in two dimensions, time and duration.]
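A rough sketch of how flows could be grouped into traces, building on the `Flow` record above. The grouping key (source IP, destination IP, destination port, protocol) is an assumption derived from the listed flow attributes; it captures "the same host repeatedly talking to the same service".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_traces(flows: List[Flow]) -> Dict[Tuple, List[Flow]]:
    """Group flows into traces and order each trace chronologically.

    The grouping key (src IP, dst IP, dst port, protocol) is an assumption.
    """
    traces: Dict[Tuple, List[Flow]] = defaultdict(list)
    for f in flows:
        traces[(f.src_ip, f.dst_ip, f.dst_port, f.protocol)].append(f)
    for key in traces:
        traces[key].sort(key=lambda f: f.start_time)
    return dict(traces)
```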
Distinguishing Characteristics
Bot traffic is more regular than normal, benign traffic!
[Figure: periodicity of bot vs. benign traces; the lower the bar, the more periodic the traffic.]
Methodology – Features
Statistical features are used to describe a trace (see the sketch below):
- average time between two flows
- average duration of flows
- average number of source bytes
- average number of destination bytes
- a Fourier transform to detect underlying communication frequencies (more robust than simple averaging)
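A minimal sketch of computing these features with NumPy. The FFT part, which bins flow start times into a fixed-width 0/1 signal and takes the strongest non-DC peak, is my own simplification of "detect underlying communication frequencies", not the paper's exact method.

```python
import numpy as np

def trace_features(trace, bin_seconds: float = 60.0):
    """Per-trace statistical features (sketch, not the paper's exact code)."""
    starts = np.array([f.start_time for f in trace])
    gaps = np.diff(starts)  # time between consecutive flows

    features = {
        "avg_interval": float(gaps.mean()) if len(gaps) else 0.0,
        "avg_duration": float(np.mean([f.duration for f in trace])),
        "avg_src_bytes": float(np.mean([f.bytes_sent for f in trace])),
        "avg_dst_bytes": float(np.mean([f.bytes_recv for f in trace])),
    }

    # FFT-based frequency estimate: bin flow start times into a 0/1 signal
    # and take the strongest non-DC peak (a simplification).
    if len(starts) > 2:
        n_bins = int((starts[-1] - starts[0]) / bin_seconds) + 1
        if n_bins >= 4:
            signal = np.zeros(n_bins)
            signal[((starts - starts[0]) / bin_seconds).astype(int)] = 1.0
            spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
            freqs = np.fft.rfftfreq(n_bins, d=bin_seconds)
            peak = 1 + int(np.argmax(spectrum[1:]))  # skip the DC bin
            features["fft_period"] = 1.0 / freqs[peak]
    return features
```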
Methodology – Models
Example scenario: multiple binary versions of the same bot family generated traces.
- For the time-interval feature, the resulting model might state: "Intervals of 8, 20, or 210 minutes are typical for this bot."
- Clusters with low standard deviation are trustworthy representations of malware behavior.
- Very small (one-element) clusters are dropped (see the sketch below).
[Figure: feature clustering of observed intervals (7.5, 8, 9, 17, 18, 20, 22, 24, 190, 230, and 912 minutes) into cluster centroids of roughly 8.2, 20, and 210 minutes; the isolated 912-minute value forms a one-element cluster and is dropped.]
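The slide does not show the clustering algorithm itself. Purely as an illustration, here is a naive greedy one-dimensional clustering that groups sorted interval values lying close to the running cluster mean, returns (centroid, standard deviation) pairs, and drops one-element clusters; the tolerance and minimum cluster size are assumptions.

```python
import numpy as np

def cluster_intervals(values, rel_tol: float = 0.25, min_size: int = 2):
    """Greedy 1-D clustering of interval values (minutes).

    Not the paper's algorithm; just an illustration of turning raw interval
    observations into cluster centroids and dropping tiny clusters.
    """
    clusters = []
    for v in sorted(values):
        if clusters and abs(v - np.mean(clusters[-1])) <= rel_tol * np.mean(clusters[-1]):
            clusters[-1].append(v)
        else:
            clusters.append([v])
    # Keep centroid and standard deviation of sufficiently large clusters.
    return [(float(np.mean(c)), float(np.std(c)))
            for c in clusters if len(c) >= min_size]

# Example with the values from the figure above:
# cluster_intervals([7.5, 8, 9, 17, 18, 20, 22, 24, 190, 230, 912])
# -> roughly [(8.2, ...), (20.2, ...), (210.0, ...)]; 912 is dropped.
```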
Methodology – Model Matching
Compare a trace to the cluster centroids of a malware family model:
1. If a trace feature "hits" a model cluster, increase the scoring value based on the cluster's quality.
2. Take the model with the highest scoring value.
3. If that scoring value exceeds the threshold, consider the model matched.
(Some more math is involved: quality of the matching trace, the clustering algorithm, minimal trace length, etc. A simplified sketch follows below.)
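A simplified sketch of this matching loop. It assumes a model maps each feature name to (centroid, standard deviation) clusters as in the clustering sketch above, that a feature "hits" a cluster when it lies within two standard deviations of the centroid, and that tighter clusters contribute more to the score; the paper's actual scoring function is more involved.

```python
def score_trace(features: dict, model: dict, hit_width: float = 2.0) -> float:
    """Score how well a trace's features match one malware-family model.

    The hit test and the quality weight are simplifying assumptions.
    """
    score = 0.0
    for name, clusters in model.items():
        value = features.get(name)
        if value is None:
            continue
        for centroid, std in clusters:
            if abs(value - centroid) <= hit_width * std:
                # Tighter clusters (low relative stddev) contribute more.
                quality = 1.0 / (1.0 + std / max(centroid, 1e-9))
                score += quality
                break
    return score

def best_match(features, models, threshold):
    """Pick the highest-scoring family; report a match only above threshold."""
    family, score = max(((f, score_trace(features, m)) for f, m in models.items()),
                        key=lambda x: x[1])
    return (family, score) if score > threshold else (None, score)
```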
Evaluation
The method is implemented in BotFinder and evaluated on six representative malware families and two datasets:
- LabCapture: 2.5 months of lab traffic from 60 machines. Full traffic capture, which allows verification; it should contain only benign traffic.
- ISPNetFlow: one month of NetFlow data from a large network, reflecting 540 Terabytes of data, or 150 MegaBytes(!) per second of traffic. No ground truth, but it is possible to compare against blacklisted IP addresses and judge usability.
Evaluation – Cross Validation
Execution (see the sketch below):
- Split the ground-truth malware dataset randomly into a training set and a detection set.
- Mix the detection set with all traces from the LabCapture dataset.
- Train BotFinder on the training set.
- Run BotFinder against the detection set.
- Repeat the experiment 50 times per acceptance threshold.
Result summary: 77% detection rate with low false positives (1 out of 5 million traces).
[Figure: the ground-truth training data is split into a training set and a detection set; the detection set is mixed with the LabCapture traces before detection.]
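A compact sketch of that experimental loop. The `train` and `detect` callables, the 50/50 split fraction, and the way the detection rate is computed are placeholders standing in for BotFinder's actual training and detection stages.

```python
import random

def cross_validate(malware_traces, benign_traces, train, detect,
                   threshold, runs: int = 50, train_frac: float = 0.5):
    """Repeat the split / mix / train / detect experiment (sketch)."""
    detection_rates = []
    for _ in range(runs):
        shuffled = random.sample(malware_traces, len(malware_traces))
        cut = int(len(shuffled) * train_frac)
        train_set, detect_set = shuffled[:cut], shuffled[cut:]
        models = train(train_set)                    # build family models
        flagged = detect(detect_set + benign_traces, models, threshold)
        detection_rates.append(
            sum(1 for t in detect_set if t in flagged) / max(len(detect_set), 1))
    return sum(detection_rates) / runs
```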
Evaluation – Cross Validation
[Figure: cross-validation results.]
Evaluation – Comparison to BotHunter
- BotHunter* is an optimized Snort Intrusion Detection System; it requires packet inspection and leverages anomaly detection.
- BotHunter produced many false positives, typically raised by IRC activity or binary downloads.
- Detection results: BotFinder detection rate 77.5%, BotHunter detection rate 10%.
- BotFinder outperformed BotHunter and shows relatively high detection rates and low false positives.
- Caveat: is the experimental setup perhaps not reproducing elements crucial to BotHunter?
*: http://www.bothunter.net
Evaluation – ISPNetFlow
- Challenging to analyze, as only minimal information (only internal IP ranges) is available.
- 542 traces (out of more than 1 billion traces) are identified by BotFinder as malicious.
- On average: 14.6 alerts per day.
Evaluation – ISPNetFlow cont'd
- Speed is sufficient for large networks: 3 minutes for 15M NetFlow records (~15 minutes of ISPNetFlow traffic, 800 MB file size). Processing is dominated by feature extraction and is easy to parallelize.
- Detailed investigation of the IP addresses behind the raised alarms: comparison of the external IPs with publicly available blacklists*.
- Result: 56% of all IPs are known to be malicious! The "false positives" show a large cluster of connections to Apple; with Apple whitelisted, 61% of all raised alerts connect to known malicious pages.
- Strong support that BotFinder works!
*: rbls.org
Bot Evolution
Botmasters may try to evade detection by changing their communication patterns:
- introduction of randomized intervals
- introduction of large gaps between flows
- IP or domain flux (fast-changing C&C servers)
Randomization impact: randomizing individual features does not significantly impact detection (a toy illustration follows below).
[Figure: detection rate under randomization, annotated "Lower limit!".]
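A toy illustration, not from the paper, of why randomizing intervals alone does not hide the average-interval feature: zero-mean jitter around a base period leaves the mean inter-flow time essentially unchanged. The 20-minute period and the jitter range are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

base_period = 20 * 60   # hypothetical C&C interval: 20 minutes, in seconds
jitter = 0.5            # up to +/-50% uniform randomization per flow
n_flows = 200

# Randomized inter-flow intervals around the base period.
intervals = base_period * (1 + rng.uniform(-jitter, jitter, size=n_flows))

print(f"mean interval: {intervals.mean() / 60:.1f} min "
      f"(base period: {base_period / 60:.0f} min)")
# The mean stays close to 20 min, so the average-interval feature still
# matches the trained cluster despite per-flow randomization.
```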
FFT Peak Detection with Gaps
[Figure: the FFT-based frequency feature still finds the underlying communication frequency when the trace contains large gaps.]
Anti-Domain Flux
- Problem: fast C&C domain/IP changes. When the C&C IP address changes, the trace "breaks" and BotFinder cannot create a sufficiently long trace (subtrace 1: A to C&C IP 1; subtrace 2: A to C&C IP 2).
- Idea: look at each source IP and compare all of its connections with each other. When two connections look very similar, combine them into one (see the sketch below).
- This is inherently a horizontal correlation per source IP!
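A rough sketch of that idea: per source IP, compare the statistical features of its subtraces and merge those that look alike. The similarity test (relative difference of the averaged features below a tolerance) is an assumption for illustration, not the paper's exact criterion; `trace_features` is the feature sketch from earlier.

```python
def similar(f1: dict, f2: dict, tol: float = 0.2) -> bool:
    """Two subtraces 'look similar' if their averaged features are close.
    The feature set and tolerance are illustrative assumptions."""
    keys = ("avg_interval", "avg_duration", "avg_src_bytes", "avg_dst_bytes")
    return all(abs(f1[k] - f2[k]) <= tol * max(abs(f1[k]), abs(f2[k]), 1e-9)
               for k in keys)

def merge_subtraces(subtraces_by_src):
    """Per source IP, combine subtraces (e.g., to different C&C IPs) whose
    features match, so a long enough trace survives domain/IP flux."""
    merged = {}
    for src_ip, subtraces in subtraces_by_src.items():
        groups = []
        for trace in subtraces:
            feats = trace_features(trace)   # from the feature sketch above
            for group in groups:
                if similar(group["feats"], feats):
                    group["flows"].extend(trace)
                    break
            else:
                groups.append({"feats": feats, "flows": list(trace)})
        merged[src_ip] = [sorted(g["flows"], key=lambda f: f.start_time)
                          for g in groups]
    return merged
```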
Additional Pre-Processing
- How can one check that this recombination works? Split real C&C traces and random other long traces (from real traffic), and see whether BotFinder recombines them.
- "Low" overhead: an 85% increase in the ISPNetFlow dataset.
[Figure: recombination results for the two cases; the large distance between them is good.]
Conclusion
- High detection rates - nearly 80% - with low false positives and no need for packet inspection!
- BotFinder shows better results than BotHunter.
- 61% of the BotFinder-flagged connections in the ISPNetFlow dataset were destined to known, blacklisted hosts!
- BotFinder is robust against potential evasion strategies.
Questions
Thank you for your attention! Any questions?