Automatically Generating Models for Botnet Detection Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel, Engin Kirda Vienna University.

Slides:



Advertisements
Similar presentations
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Advertisements

Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
A Survey of Botnet Size Measurement PRESENTED: KAI-HSIANG YANG ( 楊凱翔 ) DATE: 2013/11/04 1/24.
An Introduction of Botnet Detection – Part 2 Guofei Gu, Wenke Lee (Georiga Tech)
Greg Williams CS691 Summer Honeycomb  Introduction  Preceding Work  Important Points  Analysis  Future Work.
Polymorphic blending attacks Prahlad Fogla et al USENIX 2006 Presented By Himanshu Pagey.
 Firewalls and Application Level Gateways (ALGs)  Usually configured to protect from at least two types of attack ▪ Control sites which local users.
BotMiner Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology.
Intrusion Detection Systems and Practices
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin.
EECS Presentation Web Tap: Intelligent Intrusion Detection Kevin Borders.
5/1/2006Sireesha/IDS1 Intrusion Detection Systems (A preliminary study) Sireesha Dasaraju CS526 - Advanced Internet Systems UCCS.
Unsupervised Intrusion Detection Using Clustering Approach Muhammet Kabukçu Sefa Kılıç Ferhat Kutlu Teoman Toraman 1/29.
Botnet Dection system. Introduction  Botnet problem  Challenges for botnet detection.
Detecting Botnets Using Hidden Markov Models on Network Traces Wade Gobel Bio-Grid, Summer 2008.
Botnets Usman Jafarey Including slides from The Zombie Roundup by Cooke, Jahanian, McPherson of the University of Michigan.
Leveraging User Interactions for In-Depth Testing of Web Application Sean McAllister Secure System Lab, Technical University Vienna, Austria Engin Kirda.
BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection F. Tegeler, X. Fu (U Goe), G. Vigna, C. Kruegel (UCSB)
SMS Mobile Botnet Detection Using A Multi-Agent System Abdullah Alzahrani, Natalia Stakhanova, and Ali A. Ghorbani Faculty of Computer Science, University.
2009/9/151 Rishi : Identify Bot Contaminated Hosts By IRC Nickname Evaluation Reporter : Fong-Ruei, Li Machine Learning and Bioinformatics Lab In Proceedings.
BOTNETS & TARGETED MALWARE Fernando Uribe. INTRODUCTION  Fernando Uribe   IT trainer and Consultant for over 15 years specializing.
WAC/ISSCI Automated Anomaly Detection Using Time-Variant Normal Profiling Jung-Yeop Kim, Utica College Rex E. Gantenbein, University of Wyoming.
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee College of Computing, Georgia Institute of Technology USENIX Security '08 Presented by Lei Wu.
Automated malware classification based on network behavior
1 Measurements and Mitigation of Peer-to-Peer-based Botnets: A Case Study on Storm Worm T. Holz, M. Steiner, F. Dahl, E. Biersack, and F. Freiling - Proceedings.
Lucent Technologies – Proprietary Use pursuant to company instruction Learning Sequential Models for Detecting Anomalous Protocol Usage (work in progress)
B OTNETS T HREATS A ND B OTNETS DETECTION Mona Aldakheel
An Evaluation model of botnet based on peer to peer Gao Jian KangFeng ZHENG,YiXian Yang,XinXin Niu 2012 Fourth International Conference on Computational.
A Statistical Anomaly Detection Technique based on Three Different Network Features Yuji Waizumi Tohoku Univ.
Intrusion Detection Jie Lin. Outline Introduction A Frame for Intrusion Detection System Intrusion Detection Techniques Ideas for Improving Intrusion.
Intrusion Detection for Grid and Cloud Computing Author Kleber Vieira, Alexandre Schulter, Carlos Becker Westphall, and Carla Merkle Westphall Federal.
Speaker : Hong-Ren Jiang A Novel Testbed for Detection of Malicious Software Functionality 1.
Amir Houmansadr CS660: Advanced Information Assurance Spring 2015
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu, Roberto Perdisci, Junjie Zhang, and.
Speaker:Chiang Hong-Ren Botnet Detection by Monitoring Group Activities in DNS Traffic.
Behavior-based Spyware Detection By Engin Kirda and Christopher Kruegel Secure Systems Lab Technical University Vienna Greg Banks, Giovanni Vigna, and.
11 Automatic Discovery of Botnet Communities on Large-Scale Communication Networks Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani - in ACM Symposium on InformAtion,
Topics to be covered 1. What are bots,botnet ? 2.How does it work? 4.Prevention of botnet. 3.Types of botnets.
Principles of Computer Security: CompTIA Security + ® and Beyond, Third Edition © 2012 Principles of Computer Security: CompTIA Security+ ® and Beyond,
A Multifaceted Approach to Understanding the Botnet Phenomenon Authors : Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis Computer Science.
Automated Classification and Analysis of Internet Malware M. Bailey J. Oberheide J. Andersen Z. M. Mao F. Jahanian J. Nazario RAID 2007 Presented by Mike.
Automatically Generating Models for Botnet Detection Presenter: 葉倚任 Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel,
IEEE Communications Surveys & Tutorials 1st Quarter 2008.
Week 10-11c Attacks and Malware III. Remote Control Facility distinguishes a bot from a worm distinguishes a bot from a worm worm propagates itself and.
Christopher Kruegel University of California Engin Kirda Institute Eurecom Clemens Kolbitsch Thorsten Holz Secure Systems Lab Vienna University of Technology.
Wide-scale Botnet Detection and Characterization Anestis Karasaridis, Brian Rexroad, David Hoeflin In First Workshop on Hot Topics in Understanding Botnets,
Presented by: Akbar Saidov Authors: M. Polychronakis, K. G. Anagnostakis, E. P. Markatos.
Attack signatures derived from Metasploit Final Presentation E. Ramirez A. Zoghbi
1 HoneyNets. 2 Introduction Definition of a Honeynet Concept of Data Capture and Data Control Generation I vs. Generation II Honeynets Description of.
Intrusion Detection System (IDS). What Is Intrusion Detection Intrusion Detection is the process of identifying and responding to malicious activity targeted.
Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006 N Provos, J McClain, K Wang Dhruv Sharma
Cryptography and Network Security Sixth Edition by William Stallings.
nd Joint Workshop between Security Research Labs in JAPAN and KOREA Polymorphic Worm Detection by Instruction Distribution Kihun Lee HPC Lab., Postech.
Automated Worm Fingerprinting Authors: Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Publish: OSDI'04. Presenter: YanYan Wang.
I NTRUSION P REVENTION S YSTEM (IPS). O UTLINE Introduction Objectives IPS’s Detection methods Classifications IPS vs. IDS IPS vs. Firewall.
Effective Anomaly Detection with Scarce Training Data Presenter: 葉倚任 Author: W. Robertson, F. Maggi, C. Kruegel and G. Vigna NDSS
Speaker:Chiang Hong-Ren An Investigation and Implementation of Botnet Detection Schemes.
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Presented by D Callahan.
Role Of Network IDS in Network Perimeter Defense.
Speaker: Hom-Jay Hom Date:2009/10/20 Botnet Research Survey Zhaosheng Zhu. et al July 28-August
2009/6/221 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Reporter : Fong-Ruei, Li Machine.
Using Honeypots to Improve Network Security Dr. Saleh Ibrahim Almotairi Research and Development Centre National Information Centre - Ministry of Interior.
DIVYA K 1RN09IS016 RNSIT1. Cloud computing provides a framework for supporting end users easily through internet. One of the security issues is how to.
Botnets Usman Jafarey Including slides from The Zombie Roundup by Cooke, Jahanian, McPherson of the University of Michigan.
A lustrum of malware network communication: Evolution & insights
BotCatch: A Behavior and Signature Correlated Bot Detection Approach
NetSpy: Automatic Generation of Spyware Signatures for NIDS
Data Mining & Machine Learning Lab
Presentation transcript:

Automatically Generating Models for Botnet Detection Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel, Engin Kirda Vienna University of Technology Institute Eurecom, Sophia Antipolis, France University of Mannheim, Germany University of California, Santa Barbara 14th European Symposium on Research in Computer Security (ESORICS), 2009

2 Outline Introduction System Overview ▫ Detection Models ▫ Model Generation Analyzing Bot Activity ▫ Locating Bot Responses ▫ Extracting Model Generation Data Generating Detection Models ▫ Command Model Generation ▫ Response Model Generation ▫ Mapping Models into Bro Signatures Evaluation Related Work Limitations Conclusions 2

3 Introduction: abstract aims to detect bots independent of any prior information about the C&C channels or propagation vectors without requiring multiple infections for correlation target the characteristic fact that every bot receives commands from the botmaster to which it responds in a specific way models are generated automatically from network traffic traces recorded from actual bot instances 3

4 Introduction: botnet A bot is a type of malware that is written with the intent of compromising and taking control of hosts on the Internet. It is typically installed on the victim’s computer by either ▫ exploiting a software vulnerability in the web browser or the operating system, or ▫ by using social engineering techniques to trick the victim into installing the bot herself. The distinguishing characteristic of a bot is its ability to establish a command and control (C&C) channel that allows an attacker to remotely control or update a compromised machine. 4

5 Introduction: past approaches host-based analysis techniques ▫ such as anti-virus (AV) software network-level ▫ vertical correlation techniques  These techniques focus on the detection of individual bots.  by checking for traffic patterns or content that reveal C&C traffic or malicious, bot-related activities  require prior knowledge about the C&C channels and the propagation vectors of the bots that they can detect ▫ horizontal correlation approaches  analyze the network traffic for patterns that indicate that two or more hosts behave similarly  a command that is sent to several members, bots react in the same fashion  cannot detect individual bots, need at least two bots 5

6 Introduction: goal The authors propose a detection approach to identify single, bot-infected machines ▫ without any prior knowledge about C&C mechanisms ▫ or the way in which a bot propagates Their detection model leverages the characteristic behavior of a bot ▫ it (bot) receives commands from the botmaster, and ▫ carries out some actions in response to these commands Similar to previous work, they assume that the command and response activity results in some kind of network communication that can be observed. 6

7 Introduction: basic idea Dynamic approach ▫ by launching a bot in a controlled environment and recording its network activity (traces), we can observe the commands that this bot receives as well as the corresponding responses. Points ▫ They present techniques that allow identifying points in a network trace that likely correlate with response activity. Response and Command ▫ They analyze the traffic that precedes this response to find the corresponding command. Model ▫ generate detection models that can be deployed to scan network traffic for similar activity  Capture the command and response activity  Automated mechanism to generate bot detection models 7

8 System Overview: bot collection The input to the system is a collection of bot binaries form the wild. ▫ honeynet systems such as Nepenthes [2], ▫ or through Anubis [5], a malware collection and analysis platform The output of our system is a number of models that can be used to detect instances of different bot families. 8

9 System Overview: Detection Models In their system, a detection model has two states. ▫ (1) The first state of the model specifies signs in the network traffic that indicate that a particular bot command is sent.  For example, such a sign could be the occurrence of the string.advscan, which is a frequently-used command to instruct an IRC bot to start scanning.  Once such a command is identified, the detection model is switched into the second state. ▫ (2) This second state specifies the signs that represent a particular bot response.  Such a sign could be the fact that the number of new connections opened by a host is above a certain threshold, which indicates that a scan is in progress.  When a model is in the second state and the system identifies activity that matches the specified behavior, a bot infection is reported. ▫ (X) If no activity is found that matches the specification of the second state for a certain time period, the model is switched back to the first state. 9

10 System Overview: correlation (logical) model instance ▫ When a command is found to be sent to host x, only the model for this host is switched to the second state. ▫ Therefore, there is no correlation between the activity of different hosts. ▫ For example, when a scan command is sent to host x, while immediately thereafter, host y initiates a scan, no alert is raised. Two states: content-based and network-based. 10

11 System Overview: model generation Finding responses ▫ They first look for the activity that indicates that a response has occurred. (rather than identify command first!)  Because a response launched by a bot is often more visible in the network trace than an incoming command. ▫ Once a bot response is identified, it is characterized by a behavior profile. (Sec 3) Finding commands ▫ The network traffic before this point must contain the command that has caused this response. ▫ Before each point at which a significant change in traffic behavior is detected, we extract a snippet, a small section of the network trace. ▫ Typically, different commands will lead to responses that are different. ▫ They cluster those traffic snippets that lead to similar responses, assuming that they contain the same command. ▫ Once clusters of related network snippets have been identified, they search them for sets of common (string) tokens. (Sec 4) 11

12 System Overview: model generation Putting it all together. ▫ Extracted tokens can be directly used to represent the bot command in the first state of the detection model. ▫ For the second state, the author leverage the network behavior profiles that characterize bot response activity.  A bot detection model consists of a set of tokens that represent the bot command, followed by a network-level description of the expected response 12

13 Analyzing Bot Activity: Locating Bot Responses by checking for sudden changes in the network traffic ▫ most current bot do so Change point detection (CPD) problem ▫ CPD algorithms operate on time series, that is, on chronologically ordered sequences of data values. ▫ Their goal is to find those points in time at which the data values change abruptly. 13

14 Analyzing Bot Activity: CPD & CUSUM The authors characterize the bot’s behavior using the features during a given time interval. ▫ traffic profile (a normalized vector) CUSUM (cumulative sum) for solving CPD problem ▫ The ordered sequence of value d(t) forms the input of next step. (if the d(t) is sufficiently large and a local maximum, a change point at time interval t is determined.) 14 Euclidean distance

15 Analyzing Bot Activity: CUSUM parameters local_max ▫ an upper bound for the normal, expected deviation of the present traffic form past cusum_max ▫ For each time interval t, CUSUM adds d(t)-local_max to a cumulative sum S. ▫ It determines the upper bound that S may reach before a change point is reported. If there are consecutive time intervals that cause S exceed cusum_max, adjust the time interval. ▫ Their optimized t is 50 seconds. 15

16 Extracting Model Generation Data After the change points are found … ▫ First, extract the snippet of the traffic (contain the commends)  the time interval having change point  + next10 seconds of the following interval (in case the change point occurs close to the interval boundary)  + 30 seconds of the previous interval to cover the command response delay ▫ Second, extract the information required for creating model for response behavior  behavior profile (capture the network-level activity) 16

17 Generating Detection Models Given ▫ a set of network traffic snippets ▫ with their associated response behavior profiles They are first applied a two phase clustering ▫ (1) arrange snippets such that those are put together in a cluster that likely contain the same command ▫ (2) group the contents of the snippets in each cluster such that elements in a group share commonalities that can be leveraged by the token extraction algorithm. 17

18 Generating Detection Models Observations ▫ The network traffic of a bot responding to a certain command will look similar to the traffic generated by this bot executing the same command at some later time. ▫ The same bot executing a different command will generate traffic that looks different. Find behavior clustering ▫ To identify behavior clusters, they perform hierarchical clustering based on the normalized response behavior profiles. ▫ Then Sec 4.1, extract command in the same cluster ▫ Sec 4.2, the behavior profile are used to model the response activity. 18

19 Command Model Generation Use a signature generation technique that produces token sequences. To find common tokens, they use the longest common subsequence algorithm (based on suffix arrays). Note: different commands may lead to similar responses which may be clustered together. ▫ They employ a standard complete-link, hierarchical clustering algorithm to find payloads that are similar. 19

20 Response Model Generation computing the element-wise average of the (vectors of the) individual behavior profiles ▫ The result is another behavior profile vector that captures the aggregate of the behaviors combined in the respective behavior cluster. Example ▫ They define a threshold of 1,000 for the number of UDP packets within one time interval (50 seconds), 100 for HTTP packets, 10 for SMTP packets, and 20 for the number of different IPs. ▫ When a response profile exceeds none of these thresholds, the corresponding behavior cluster (and its token sequences) are not used to generate a detection model. 20

21 Mapping Models into Bro Signatures encode the model’s set of token sequences as well as its behavior profile ▫ (state 1) token  regular expression operator ▫ (state 2) if token is matched, Bro starts to record the traffic of the host that triggered a signature (for 50 seconds). ▫ When the observed traffic exceeds, for at least one of these four features, the corresponding value stored in the response profile, it is considered as a match. 21

22 Evaluation 416 different bot + 30 well-known bot “Storm Worm” ▫ Manually analysis: 16 IRC bot families, 1 http, 1 p2p 22

23 Example 23 IRC protocol headers to transmit message instruct bot starting scanning scanning parameter Snort signature

24 Detection Capability Total 446 network traces ▫ training set: 25% of one bot family’s traces ▫ test set: remaining ones ▫ detection rate: 88% ▫ The remaining 12% did not even trigger state 1 (token). ▫ BotHunter  manually-developed signatures  69% 24

25 Real-World Deployment: False positive Aachen ▫ 130 token sequence matches, but not second state.  Pure token matching might cause FP. Greece ▫ These 11 hosts were responsible for 60 alerts. ▫ Manually checking, all FP. Alerts/day is acceptable! 25

26 Related Work Network intrusion detection ▫ Bro, Snort, model normal network traffic Signature generation ▫ Early Bird, Autograph, Polygraph, and Hamsa Botnet analysis and defense ▫ horizontal correlation  receiving the same commands and reacting in lockstep ▫ vertical correlation  BotHunter, Snort 26

27 Limitations A possible way of handling this evasion attempt is to randomize the time window, making it harder for the adversary to select an appropriate delay. Encrypted traffic ▫ difficult to recognize commands 27

28 Comments It is a signature generation paper based on a set of bot traces. ▫ token (content-based) ▫ traffic profile (network-based) Command + network activity is an interesting perspective. (vs C&C) ▫ Token generation, profile extraction are not main contribution. Only FP can be examined. (Assume captured traffic is clean.) Produce Bro signature. 28