BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection Guofei Gu1,2, Roberto Perdisci3, Junjie Zhang1,

Presentation transcript:

BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection
  Guofei Gu (1, 2), Roberto Perdisci (3), Junjie Zhang (1), and Wenke Lee (1)
  (1) Georgia Tech, (2) Texas A&M University, (3) Damballa, Inc.

Roadmap
  Introduction
    Botnet problem
    Challenges for botnet detection
    Related work
  BotMiner
    Motivation
    Design
    Evaluation
  Conclusion

What Is a Bot/Botnet?
  Bot
    A malware instance that runs autonomously and automatically on a compromised computer (zombie) without the owner’s consent
    Profit-driven, professionally written, widely propagated
    A remote control facility (C&C): IRC, HTTP, P2P
    A spreading mechanism to propagate: remote vulnerability scan, email, drive-by download, IM
  Botnet (bot army): network of bots controlled by criminals
    Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
    Architecture: centralized (e.g., IRC, HTTP), distributed (e.g., P2P)
    “25% of Internet PCs are part of a botnet!” (Vint Cerf)
  [Figure: botmaster, C&C, bots]

Botnets Are Used For …
  All DDoS attacks
  Spam
  Click fraud
  Information theft
  Phishing attacks
  Distributing other malware, e.g., spyware
  95% of email is spam

Challenges for Botnet Detection
  Bots are stealthy on the infected machines
    We focus on a network-based solution
  Bot infection is usually a multi-faceted and multi-phased process
    Looking at only one specific aspect is likely to fail
  Bots are dynamically evolving
    Static and signature-based approaches may not be effective
  Botnets can have very flexible C&C channel designs
    A solution very specific to one botnet instance is not desirable

Why Are Existing Techniques Not Enough?
  Traditional AV tools
    Bots use packers, rootkits, and frequent updates to easily defeat AV tools
  Traditional IDS/IPS
    Look at only a specific aspect
    Do not have the big picture
  Honeypots
    Not a good botnet detection tool
    Not scalable, mostly passively waiting
    Bots can detect/discover honeypots/honeynets

Existing Botnet Detection Work
  [Binkley, Singh 2006]: IRC-based bot detection that combines IRC statistics and TCP work weight
  Rishi [Goebel, Holz 2007]: signature-based IRC bot nickname detection
  [Livadas et al. 2006, Karasaridis et al. 2007] (BBN, AT&T): network-flow-level detection of IRC botnets
  BotHunter [Gu et al., Security’07]: dialog correlation to detect bots based on an infection dialog model
  BotSniffer [Gu et al., NDSS’08]: spatial-temporal correlation to detect centralized botnet C&C
  TAMD [Yen, Reiter 2008]: traffic aggregation to detect botnets that use a centralized C&C structure

Why BotMiner?
  Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models, …
  Examples: Nugache, Storm, …

BotMiner: Protocol- and Structure-Independent Detection
  Horizontal correlation
    Bots are for long-term use
    Botnet: communication and activities are coordinated/similar
  [Figure: enterprise-like network connected to the Internet]

Revisit the Definition of a Botnet
  “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  We need to monitor two planes
    C-plane (C&C communication plane): “who is talking to whom”
    A-plane (malicious activity plane): “who is doing what”

BotMiner Architecture

BotMiner C-plane Clustering
  What characterizes a communication flow (C-flow) between a local host and a remote service?
    <protocol, srcIP, dstIP, dstPort>
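The C-flow key above can be illustrated with a minimal sketch that groups flow records by <protocol, srcIP, dstIP, dstPort>. The `Flow` record and its field names are assumptions made for illustration; the slides do not specify a record format.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """One flow record. Field names are illustrative, not from the slides."""
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float     # start time in seconds
    duration: float  # duration in seconds
    packets: int
    n_bytes: int


def group_c_flows(flows):
    """Group flows that share the <protocol, srcIP, dstIP, dstPort> key."""
    c_flows = defaultdict(list)
    for f in flows:
        c_flows[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(c_flows)


flows = [
    Flow("tcp", "10.0.0.5", "203.0.113.9", 6667, 0.0, 2.0, 10, 800),
    Flow("tcp", "10.0.0.5", "203.0.113.9", 6667, 1800.0, 1.0, 6, 420),
    Flow("tcp", "10.0.0.7", "203.0.113.9", 6667, 30.0, 4.0, 12, 900),
]
c_flows = group_c_flows(flows)
for key, members in c_flows.items():
    print(key, len(members))
```

Each dictionary entry then stands for one C-flow, that is, all traffic between one local host and one remote service.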

How to Capture “Talking in What Kind of Patterns”?
  Temporal statistical distribution information in
    BPS (bytes per second)
    FPH (flows per hour)
  Spatial statistical distribution information in
    BPP (bytes per packet)
    PPF (packets per flow)
  Content independent
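As a rough illustration of the four statistics, the sketch below derives FPH, PPF, BPP, and BPS samples from the flows of a single C-flow. The input layout (start time, duration, packet count, byte count) is an assumption, and so is the choice to count only hours that contain at least one flow.

```python
from collections import Counter


def c_flow_features(flows):
    """flows: list of (start_s, duration_s, packets, n_bytes) for one C-flow.

    Returns one list of samples per statistic. No payload is inspected,
    so the features are content independent.
    """
    # Bucket flows by the hour in which they start (assumed definition of FPH).
    per_hour = Counter(int(start // 3600) for start, _d, _p, _b in flows)
    return {
        "FPH": [per_hour[h] for h in sorted(per_hour)],
        "PPF": [p for _s, _d, p, _b in flows],
        "BPP": [b / p for _s, _d, p, b in flows if p > 0],
        "BPS": [b / d for _s, d, _p, b in flows if d > 0],
    }


flows = [(0.0, 2.0, 10, 800), (1800.0, 1.0, 6, 420), (4000.0, 4.0, 12, 900)]
features = c_flow_features(flows)
print(features)
```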

Two-step Clustering of C-flows
  Why multi-step? How?
  Coarse-grained clustering
    Using a reduced feature space: mean and variance of the distribution of FPH, PPF, BPP, BPS for each C-flow (2*4=8)
    Efficient clustering algorithm: X-means
  Fine-grained clustering
    Using the full feature space (13*4=52)
  What’s left?
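The two feature spaces can be sketched as follows. The 8-dimensional vector follows the slide directly (mean and variance of each of the four distributions). The slide does not say how the 13 values per feature are obtained, so the 52-dimensional vector here assumes a 13-bin normalized histogram with caller-supplied bin boundaries; the boundaries and sample values are hypothetical. The clustering step itself (X-means for the coarse pass) is not shown.

```python
import bisect
import statistics

FEATURES = ("FPH", "PPF", "BPP", "BPS")


def reduced_vector(samples):
    """8-d vector: mean and variance of each of the four distributions."""
    vec = []
    for name in FEATURES:
        vec.append(statistics.mean(samples[name]))
        vec.append(statistics.pvariance(samples[name]))
    return vec


def full_vector(samples, edges):
    """52-d vector: a 13-bin normalized histogram per feature (assumed form).

    edges[name] holds 12 ascending bin boundaries, which yield 13 bins.
    """
    vec = []
    for name in FEATURES:
        counts = [0] * 13
        for value in samples[name]:
            counts[bisect.bisect_left(edges[name], value)] += 1
        total = len(samples[name]) or 1  # avoid division by zero
        vec.extend(c / total for c in counts)
    return vec


# Hypothetical samples and bin boundaries, for illustration only.
samples = {
    "FPH": [2, 1],
    "PPF": [10, 6, 12],
    "BPP": [80.0, 70.0, 75.0],
    "BPS": [400.0, 420.0, 225.0],
}
edges = {
    "FPH": [k * 1 for k in range(1, 13)],
    "PPF": [k * 2 for k in range(1, 13)],
    "BPP": [k * 20 for k in range(1, 13)],
    "BPS": [k * 50 for k in range(1, 13)],
}
coarse = reduced_vector(samples)
fine = full_vector(samples, edges)
print(len(coarse), len(fine))
```

The point of the split is cost: the cheap 8-dimensional pass runs over all C-flows, and the 52-dimensional pass only has to refine the clusters that the first pass produced.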

A-plane Clustering
  Capture “activities in what kind of patterns”

Cross-plane Correlation
  Botnet score s(h) for every host h
  Similarity score between hosts hi and hj
    Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
  Hierarchical clustering
  [Figure: A-clusters Ai and Aj]
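The grouping rule stated on this slide can be sketched literally: hosts with identical A-cluster membership that also share at least one C-cluster end up in the same group. This is a simplification under stated assumptions. The botnet score s(h) and the similarity function are not given in the transcript and are not reproduced here, and the transitive merging used below stands in for the hierarchical clustering step. Host and cluster names are hypothetical.

```python
from collections import defaultdict


def group_hosts(a_membership, c_membership):
    """a_membership, c_membership: dict mapping host -> set of cluster ids.

    Returns sorted groups (size >= 2) of hosts that have the same A-clusters
    and are linked, directly or transitively, by a shared C-cluster.
    """
    by_a = defaultdict(list)
    for host, a_clusters in a_membership.items():
        if a_clusters:
            by_a[frozenset(a_clusters)].append(host)

    groups = []
    for hosts in by_a.values():
        merged = []  # list of (host set, union of their C-clusters)
        for host in hosts:
            c_ids = set(c_membership.get(host, ()))
            if not c_ids:
                continue
            members, remaining = {host}, []
            for g_hosts, g_c_ids in merged:
                if g_c_ids & c_ids:  # shares a C-cluster: merge
                    members |= g_hosts
                    c_ids |= g_c_ids
                else:
                    remaining.append((g_hosts, g_c_ids))
            remaining.append((members, c_ids))
            merged = remaining
        groups.extend(sorted(g) for g, _ in merged if len(g) > 1)
    return sorted(groups)


a_membership = {"h1": {"scan"}, "h2": {"scan"}, "h3": {"scan"}, "h4": {"spam"}}
c_membership = {"h1": {"c1"}, "h2": {"c1", "c2"}, "h3": {"c9"}, "h4": {"c1"}}
result = group_hosts(a_membership, c_membership)
print(result)
```

In this example h4 shares a C-cluster with h1 and h2 but performs a different activity, and h3 performs the same activity but talks to a different service, so only h1 and h2 are grouped.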

Evaluation Traces

Evaluation Results: False Positives

Evaluation Results: Detection Rate

Summary and Future Work
  BotMiner
    New botnet detection system based on horizontal correlation
    Independent of botnet C&C protocol and structure
    Real-world evaluation shows promising results
  Future work
    More efficient clustering, more robust features
    New, faster detection system using active techniques
      BotMiner: offline correlation, requires a relatively long time for detection
      BotProbe: fast detection by observing at most one round of C&C
    New real-time solution for very high-speed and very large networks

Correlation-based Botnet Detection Framework
  Vertical correlation: BotHunter (Security’07)
  Horizontal correlation: BotSniffer (NDSS’08), BotMiner (Security’08)
  Cause-effect correlation: BotProbe
  BotHunter, regardless of architecture…
  BotSniffer: mainly centralized
  BotMiner:
  [Figure: enterprise-like network and the Internet, with a time axis]

Appendix: Limitation and Discussion
  Evading C-plane monitoring and clustering
    Misuse whitelist
    Manipulate communication patterns
  Evading A-plane monitoring and clustering
    Very stealthy activity
    Individualize bots’ communication/activity
  Evading cross-plane analysis
    Extremely delayed task

High-Speed Packet Sampling
  Traffic arrives at high rates
    High volume
    Some analysis scales with the size of the input
  Possible approaches
    Random packet sampling
    Targeted packet sampling

Approach
  Idea: bias sampling of traffic towards subpopulations based on conditions of traffic
  Two modules
    Counting: count statistics of each traffic flow
    Sampling: sample packets based on (1) overall target sampling rate and (2) input conditions
  [Figure: the traffic stream passes through the counting and sampling modules; inputs are the overall sampling rate and the input conditions; outputs are the instantaneous sampling probability and the traffic subpopulations]

Challenges
  How to specify subpopulations?
    Solution: multi-dimensional array specification
  How to maintain counts for each subpopulation?
    Solution: rotating array of counting Bloom filters
  How to derive instantaneous sampling probabilities from overall constraints?
    Solution: multi-dimensional counter array, and scaling based on target rates

Specifying Subpopulations
  Idea: use a concatenation of header fields (“tuples”) as a “key” for a subpopulation
  These keys specify a group of packets that will be counted together

    # base sampling rate
    sampling_rate = 0.01
    # number of tuples
    tuples = 2
    # number of conditions
    conditions = 1
    # tuple definitions
    tuple_1 := srcip.dstip
    tuple_2 := srcip.srcport.dstport
    # condition : sampling budget
    tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5

  tuple_1: count groups of packets with the same source and destination IP address
  tuple_2: count groups of packets with the same source IP, source port, and destination port
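A minimal sketch of how the two tuple definitions could act as counting keys. The packet dictionaries and the `TUPLES` table are illustrative assumptions; the actual FlexSample implementation is not shown in the slides, and exact dictionary counters stand in here for its counting structures.

```python
from collections import Counter

# Tuple definitions mirroring the configuration on the slide.
TUPLES = {
    "tuple_1": ("srcip", "dstip"),
    "tuple_2": ("srcip", "srcport", "dstport"),
}


def make_key(packet, fields):
    """Concatenate the selected header fields into one hashable key."""
    return tuple(packet[f] for f in fields)


def count_packets(packets):
    """Count packets per key, separately for every tuple definition."""
    counts = {name: Counter() for name in TUPLES}
    for pkt in packets:
        for name, fields in TUPLES.items():
            counts[name][make_key(pkt, fields)] += 1
    return counts


packets = [
    {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "srcport": 40000, "dstport": 22},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "srcport": 40000, "dstport": 23},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "srcport": 40000, "dstport": 23},
]
counts = count_packets(packets)
print(counts["tuple_1"])
print(counts["tuple_2"])
```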

Sampling Rates for Subpopulations
  Operator specifies
    Overall sampling rate
    Conditional rate within each class
  FlexSample computes instantaneous sampling probabilities based on this
  Reading the configuration
    sampling_rate = 0.01
      Sample one in 100 packets on average
    tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
      Within the 1/100 “budget”, half of sampled packets should come from groups satisfying this condition

Examining the Condition
  Condition: tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
    with tuple_1 := srcip.dstip and tuple_2 := srcip.srcport.dstport
  Biases sampling towards packets from (source IP, destination IP) pairs which
    Have sent at least 30 packets
    Have sent packets to at least 5 distinct ports
  Application: portscan
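The condition can be read as two half-open interval tests on the current counts. The sketch below evaluates it exactly as written in the configuration, that is, a tuple_1 count in (30, inf] and a tuple_2 count in (0, 5]. The function names are hypothetical.

```python
import math


def in_half_open(value, low, high):
    """True when value lies in the interval (low, high]."""
    return low < value <= high


def matches_condition(tuple_1_count, tuple_2_count):
    """Evaluate: tuple_1 in (30, inf] AND tuple_2 in (0, 5]."""
    return (in_half_open(tuple_1_count, 30, math.inf)
            and in_half_open(tuple_2_count, 0, 5))


print(matches_condition(31, 1))   # many packets per IP pair, few per port pair
print(matches_condition(30, 1))   # 30 is outside (30, inf]
print(matches_condition(100, 6))  # 6 is outside (0, 5]
```

A host pair that exchanges many packets while each (source port, destination port) combination sees only a handful is the pattern the slide associates with a portscan.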

Sampling Lookup Table
  Problem: conditions may not be completely specified
  Solution: sampling budget lookup table
    Lookup table for allocating sampling “budget” to each class
    [Table: sampling budget per class, including deduced values]
  Condition used in the example: tuple_1 in (30, inf] AND tuple_2 in (0, 5]: 0.5
  Next problem: determining which condition each packet satisfies

Counting Subpopulations
  Each packet belongs to a particular range in n-dimensional space
  Counts for each condition
    Maintain a counter (counting Bloom filter) for each tuple in every subcondition
    Rotate counters to expunge “stale” values
  Details
    1. Number of counters
    2. How often to rotate
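A counting Bloom filter with rotation might look like the sketch below. The filter size, the number of hash functions, the number of slots, and the SHA-256-based hashing are all assumptions chosen for illustration. The slide leaves the number of counters and the rotation interval as open details.

```python
import hashlib


class CountingBloomFilter:
    """Approximate per-key counter. Estimates never fall below the true count."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.counters = [0] * size

    def _indexes(self, key):
        # Derive `hashes` positions from salted SHA-256 digests of the key.
        found = set()
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            found.add(int.from_bytes(digest[:8], "big") % self.size)
        return found

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def estimate(self, key):
        return min(self.counters[idx] for idx in self._indexes(key))


class RotatingCounter:
    """Array of filters. The oldest one is dropped on each rotation."""

    def __init__(self, slots=4, **filter_args):
        self._filter_args = filter_args
        self.filters = [CountingBloomFilter(**filter_args) for _ in range(slots)]

    def add(self, key):
        self.filters[-1].add(key)  # always count into the newest filter

    def estimate(self, key):
        return sum(f.estimate(key) for f in self.filters)

    def rotate(self):
        self.filters.pop(0)  # expunge the stalest counts
        self.filters.append(CountingBloomFilter(**self._filter_args))


counter = RotatingCounter(slots=4)
for _ in range(5):
    counter.add("10.0.0.1|10.0.0.2")
counter.rotate()
for _ in range(2):
    counter.add("10.0.0.1|10.0.0.2")
print(counter.estimate("10.0.0.1|10.0.0.2"))
```

With this layout, a count ages out after as many rotations as there are slots, so the rotation interval times the slot count sets how long a packet stays "fresh".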

Deriving Instantaneous Sampling Rates
  Problem: traffic rates are dynamic
    Relative fractions of packets in each class may change
  Solution: count packets in each sampling class, and adjust probabilities to rebalance according to the lookup table
    Instantaneous rate = overall rate * (target rate) / (actual rate)
    Keep track of the actual rate using the Bloom filter array and an EWMA
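The rebalancing formula can be sketched as below. It assumes that "target rate" means the share of the sampling budget assigned to a class and that "actual rate" means the class's observed share of the traffic, smoothed with an EWMA. That interpretation, the class name, and the smoothing weight `alpha` are assumptions, not details given on the slide.

```python
class ClassRateTracker:
    """Tracks one sampling class and derives its instantaneous probability."""

    def __init__(self, overall_rate, target_fraction, alpha=0.1):
        self.overall_rate = overall_rate        # e.g. 0.01 (one in 100)
        self.target_fraction = target_fraction  # budget share, e.g. 0.5
        self.alpha = alpha                      # EWMA weight (assumed value)
        # Start from the target so that no correction is applied initially.
        self.actual_fraction = target_fraction

    def update(self, observed_fraction):
        """Fold a new measurement of the class's traffic share into the EWMA."""
        self.actual_fraction = (self.alpha * observed_fraction
                                + (1 - self.alpha) * self.actual_fraction)

    def sampling_probability(self):
        """Instantaneous rate = overall rate * target rate / actual rate."""
        if self.actual_fraction <= 0:
            return 1.0
        p = self.overall_rate * self.target_fraction / self.actual_fraction
        return min(1.0, p)  # a probability cannot exceed 1


tracker = ClassRateTracker(overall_rate=0.01, target_fraction=0.5, alpha=1.0)
tracker.update(0.05)  # the class currently makes up 5% of the traffic
print(tracker.sampling_probability())
```

Under this reading, a class that holds 5% of the traffic but is budgeted half of the samples is sampled at roughly ten times the base rate, while over-represented classes are sampled below it.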

Example Evaluation: Portscan
  Setup
    Parameters as above
    Nmap scan injected into a full one-hour trace from a department network
  Results
    FlexSample can capture 10x more of the portscan packets if all of the sampling budget is allocated to the portscan class
    Bias can be configured

Other Applications
  Recovering unique “conversations” in sampled traffic
  Identifying DDoS attacks
  Identifying heavy hitters, high-degree nodes, etc.

Open Challenges
  Specifying ranges and classes for specific applications
  Scaling the counter array as the number of tuples and ranges increases
  Simultaneously satisfying multiple objectives

Next Steps: BotMiner Integration
  Determine
    The traffic rates that BotMiner can support for online analysis
    The subpopulations that will yield the highest detection rates
  Evaluation on traffic traces that contain botnets of interest