Automating Analysis of Large-Scale Botnet Probing Events Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* Lab for Internet and Security Technology (LIST)

Presentation transcript:

Automating Analysis of Large-Scale Botnet Probing Events Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson* Lab for Internet and Security Technology (LIST) Northwestern University * UC Berkeley / ICSI

2 Motivation Does this attack specifically target us? Can we answer this question with only the limited information observed locally in the enterprise? [Figure labels: Administrators, IPv4 Space, Enterprise, Botnets]

3 Motivation Can we infer the probing strategy used by botnets? Can we infer whether a botnet probing attack specifically targets a certain network, or whether we are just part of a larger, indiscriminate attack? Can we extrapolate global properties of a botnet from limited local information?

4 Agenda Motivation Basic framework Discover the botnet probing strategies Extrapolate global properties Evaluation Conclusions

5 Botnet Probing Events Big spikes in the number of probers are mainly caused by botnets

6 System Framework See the paper for subtle system details.

8 Discover the Botnet Probing Strategies
Use statistical tests to understand probing strategies
–Leverage existing statistical tests
  Monotonic trend checking: detect whether bots probe the IP space monotonically
  Uniformity checking: detect whether bots scan the IP range uniformly
–Design our own
  Hitlist (liveness) checking: detect whether bots avoid the dark IP space
  Dependency checking: do the bots scan independently, or are they coordinated?

9 Design Space

10 Hitlist Checking
Configure the sensor to be half darknet and half honeynet
Use the metric θ = (# of sources in darknet) / (# of sources in honeynet), with threshold 0.5
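The θ metric above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the slide does not say which side of the 0.5 threshold indicates a hitlist. The sketch assumes that θ below the threshold means the bots largely avoided the unresponsive darknet half.

```python
def hitlist_theta(darknet_sources, honeynet_sources):
    """Return theta = (# distinct sources in darknet) / (# distinct sources in honeynet)."""
    dark = len(set(darknet_sources))
    live = len(set(honeynet_sources))
    if live == 0:
        raise ValueError("no sources observed in the honeynet half")
    return dark / live


def suggests_hitlist(theta, threshold=0.5):
    # Assumption (not stated on the slide): theta below the threshold means
    # the darknet half was avoided, which is what a liveness hitlist would do.
    return theta < threshold


theta = hitlist_theta(
    ["10.0.0.1", "10.0.0.2"],
    ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"],
)
print(theta, suggests_hitlist(theta))
```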

11 Agenda Motivation Basic framework Discover the botnet probing strategies Extrapolate global properties –Global scan scope, total # of bots, total # of scans, total scan rate for each bot Evaluation Conclusions

12 Extrapolate Global Properties: Basic Ideas and Validation
Observe the packet fields that change in regular patterns across consecutive probes.
–IPID: a field in the IP header used for fragment reassembly
–Ephemeral port number: the source port used by bots
–Each increments by a fixed amount per scan
Validation
–IPID continuity: all versions of Windows and MacOS
–Ephemeral port number continuity: botnet source code study (Agobot, Phatbot, Spybot, SDbot, rxBot, etc.)
–Control experiments with NAT

13 Estimate Global Scan Rate of Each Bot
Count the IPID and ephemeral port number changes
–Recover from overflow (wraparound) of the IPID and ephemeral port number
–Estimate the rate with linear regression when the correlation coefficient > 0.99
–Counter overestimation: use the lesser of the two estimates
[Figure axes: T, IPID]
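The three steps on this slide (undo counter overflow, fit a line, keep the smaller estimate) can be sketched as follows. The helper names are hypothetical, the unwrapping assumes at most one overflow between consecutive samples, and `port_modulus` (the size of the ephemeral port range) is an assumed parameter because it depends on the bot's operating system.

```python
import math


def unwrap_counter(values, modulus=65536):
    """Undo wraparound, assuming at most one overflow between samples."""
    unwrapped, offset, prev = [], 0, None
    for v in values:
        if prev is not None and v < prev:
            offset += modulus  # counter wrapped since the previous sample
        unwrapped.append(v + offset)
        prev = v
    return unwrapped


def fit_line(times, counts):
    """Least-squares slope and Pearson correlation of counts against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((c - mean_c) ** 2 for c in counts)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def scan_rate(times, ipids, ports, port_modulus, min_r=0.99):
    """Return the smaller accepted rate (counter steps per time unit), or None."""
    rates = []
    for raw, modulus in ((ipids, 65536), (ports, port_modulus)):
        slope, r = fit_line(times, unwrap_counter(raw, modulus))
        if r > min_r:  # accept only near-perfectly linear counters
            rates.append(slope)
    return min(rates) if rates else None


# Made-up samples: the IPID wraps at 65536, the port wraps in a 1025..5000 range.
times = [0, 1, 2, 3, 4]
ipids = [65526, 65531, 0, 5, 10]
ports = [4990, 4996, 1026, 1032, 1038]
print(scan_rate(times, ipids, ports, port_modulus=3976))
```

The slope is in counter increments per time unit; dividing by the fixed per-scan increment would turn it into scans per time unit.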

14 Extrapolate Global Scan Scope
Total scans from bot i: scan rate R_i × scan time T_i, e.g. 100 × 1000 = 100,000
Scans from bot i observed locally: n_i = 100
Local/global ratio, aggregated over multiple bots
[Figure labels: IPv4 Space, Botnets]
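A minimal sketch of the arithmetic on this slide, under two assumptions that the slide does not spell out: the aggregate ratio is the sum of locally observed scans divided by the sum of estimated total scans, and the global scope is the sensor size divided by that ratio. The function names are hypothetical.

```python
def local_global_ratio(bots):
    """bots: list of (local_scans n_i, scan_rate R_i, scan_time T_i) tuples."""
    local = sum(n for n, _, _ in bots)
    total = sum(rate * duration for _, rate, duration in bots)
    return local / total


def global_scope(sensor_addresses, ratio):
    # Assumption: probes are spread evenly, so the sensor sees the same
    # fraction of the scanned addresses as it sees of the scans.
    return sensor_addresses / ratio


# Slide example: R_i = 100, T_i = 1000, n_i = 100, so the ratio is about 0.001.
ratio = local_global_ratio([(100, 100, 1000)])
# Ten /24 networks give 2,560 sensor addresses.
print(ratio, global_scope(2560, ratio))
```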

15 Extrapolate Global # of Bots
Idea: similar to mark and recapture
Assumption: all bots have the same global scan range
Example: bots seen in the first half m1 = 1000, in the second half m2 = 1000, in both halves m12 = 250
Estimated total: M = m1 × m2 / m12 = 1000 × 1000 / 250 = 4000
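The estimate M = m1 × m2 / m12 is the classic two-sample mark-and-recapture (Lincoln-Petersen) estimator. A short sketch, with a hypothetical function name and integer stand-ins for source addresses:

```python
def estimate_total_bots(first_half_sources, second_half_sources):
    """Mark-and-recapture estimate M = m1 * m2 / m12 from two sets of sources."""
    first = set(first_half_sources)
    second = set(second_half_sources)
    overlap = len(first & second)
    if overlap == 0:
        raise ValueError("no overlap between the two halves; estimate undefined")
    return len(first) * len(second) / overlap


# Slide example: m1 = 1000, m2 = 1000, m12 = 250.
first = range(0, 1000)
second = range(750, 1750)
print(estimate_total_bots(first, second))
```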

17 Dataset
Based on a honeynet of ten /24 networks at a national lab (LBNL)
293 GB of packet traces over 24 months
203 botnet probing events observed in total
–Average observed # of bots per event is 980
Mainly on SMB/WINRPC, VNC, Symantec, MSSQL, HTTP, Telnet
Size of the system: 13,900 lines: Bro (6,000), Python (4,000), C++ (2,500), R (1,400)

18 Property Checking Results
More than 80% uniform scanning
Validated the results through visualization and found them to be highly accurate

19 Extrapolation Results
Most extrapolated global scopes are of /8 size, which means the botnets do not specifically target the enterprise (LBNL).
Validation based on DShield data
–DShield: the largest Internet alert repository
–Find the /8 prefixes in DShield with sufficient source (bot) overlap with the honeynet events
  Due to the incompleteness of DShield data, 12 events were validated
–Calculate the scan scope in each /8 based on the sensor coverage ratio

20 Extrapolation Validation
Define the scope factor as max(DShield/Honeynet, Honeynet/DShield)
CDF of the scope factor: 75% within 1.35, all within 1.5
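The scope factor is symmetric: it is 1 when the two scope estimates agree and grows as they diverge in either direction. A small sketch with hypothetical function names and made-up inputs:

```python
def scope_factor(dshield_scope, honeynet_scope):
    """max(DShield/Honeynet, Honeynet/DShield); always >= 1."""
    return max(dshield_scope / honeynet_scope, honeynet_scope / dshield_scope)


def fraction_within(factors, bound):
    """Share of events whose scope factor is at most `bound` (one CDF point)."""
    return sum(1 for f in factors if f <= bound) / len(factors)


# Made-up scope estimates, only to show the calculation.
factors = [scope_factor(d, h) for d, h in [(100, 100), (100, 120), (140, 100), (100, 150)]]
print(factors, fraction_within(factors, 1.35))
```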

21 Conclusions
Developed a set of statistical approaches to assess four properties of botnet probing strategies
Designed approaches to extrapolate the global properties of a scan event from a limited local view
Through real-world validation based on DShield, we show that our schemes are promisingly accurate

22 Backup

23 Event size distribution

24 Extrapolate the scope
Local/global ratio = probes observed locally / (estimated global probing rate × probing time window)

25 Monotonic trend checking
Goal: detect whether the bots probe the IP space monotonically
–E.g. simple sequential probing
Technique:
–Mann-Kendall trend test
–Intuition: check whether the aggregated sign value, the sum of sign(A_{i+1} - A_i), falls outside the range that randomness can achieve
–When most (>80%) senders in an event follow a trend, we label the event as following a trend
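A sketch of the trend check using the textbook Mann-Kendall test, which sums sign(x_j - x_i) over all pairs i < j rather than only consecutive pairs as the slide's notation suggests. The normal approximation, the absence of a tie correction, the 0.05 significance level, and the function names are assumptions of this sketch, not details from the slides.

```python
import math
from statistics import NormalDist


def mann_kendall_s(seq):
    """S statistic: sum of sign(x_j - x_i) over all pairs i < j."""
    s = 0
    for i in range(len(seq) - 1):
        for j in range(i + 1, len(seq)):
            s += (seq[j] > seq[i]) - (seq[j] < seq[i])
    return s


def has_trend(seq, alpha=0.05):
    """Two-sided Mann-Kendall test via the normal approximation (no tie correction)."""
    n = len(seq)
    s = mann_kendall_s(seq)
    if n < 3 or s == 0:
        return False
    var = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink |S| by one before standardizing.
    z = (s - 1) / math.sqrt(var) if s > 0 else (s + 1) / math.sqrt(var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def event_follows_trend(per_sender_targets, alpha=0.05, min_fraction=0.8):
    """Label an event as trending when more than 80% of senders show a trend."""
    flags = [has_trend(targets, alpha) for targets in per_sender_targets]
    return sum(flags) / len(flags) > min_fraction


sequential = list(range(20))                # simple sequential probing
shuffled = [10, 1, 9, 2, 8, 3, 7, 4, 6, 5]  # no consistent direction
print(has_trend(sequential), has_trend(shuffled))
```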

26 Uniformity Checking
Goal: detect whether the botnet scans the IP range uniformly
Technique:
–Chi-square test
–Intuition: put the addresses into bins; the number of scans observed in each bin should be similar
–Significance level of 0.5%
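The binning idea can be sketched with the standard library alone. The bin count of 10 and the function names are assumptions, and the critical value comes from the Wilson-Hilferty approximation rather than an exact chi-square quantile, so borderline cases may be decided differently than with an exact test. The input must be non-empty, with every offset in the range 0 to `range_size - 1`.

```python
from statistics import NormalDist


def bin_counts(offsets, range_size, bins):
    """Count probes per equal-width bin; offsets are positions in the sensor range."""
    counts = [0] * bins
    for off in offsets:
        counts[off * bins // range_size] += 1
    return counts


def chi_square_stat(counts):
    """Pearson chi-square statistic against equal expected counts per bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_critical(df, alpha):
    """Approximate upper-tail critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * df)
    return df * (1 - a + z * a ** 0.5) ** 3


def looks_uniform(offsets, range_size, bins=10, alpha=0.005):
    # Null hypothesis: probes are uniform; reject when the statistic is too large.
    counts = bin_counts(offsets, range_size, bins)
    return chi_square_stat(counts) <= chi_square_critical(bins - 1, alpha)


even = list(range(2560))                # one probe per sensor address
skewed = [0] * 1000 + list(range(2560))  # heavy concentration on one address
print(looks_uniform(even, 2560), looks_uniform(skewed, 2560))
```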

27 Dependency Checking
Goal: do the bots try to stay out of each other's way?
Idea: count the number of addresses that receive zero scans and compare it with the confidence interval of the independent random case.
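One way to obtain the confidence interval for the independent random case is the classical occupancy model: if n probes land independently and uniformly on N addresses, the mean and variance of the number of untouched addresses have closed forms. The sketch below uses those moments with a normal approximation. The model, the 95% level, and the function and label names are assumptions, not details given on the slide.

```python
import math
from statistics import NormalDist


def zero_hit_interval(num_addresses, num_probes, confidence=0.95):
    """Interval for the count of untouched addresses under independent uniform probing."""
    n_addr, n_probe = num_addresses, num_probes
    p0 = (1 - 1 / n_addr) ** n_probe   # one given address stays untouched
    p00 = (1 - 2 / n_addr) ** n_probe  # two given addresses both stay untouched
    mean = n_addr * p0
    var = n_addr * p0 + n_addr * (n_addr - 1) * p00 - mean ** 2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(max(var, 0.0))
    return mean - half_width, mean + half_width


def classify(observed_zero, num_addresses, num_probes, confidence=0.95):
    low, high = zero_hit_interval(num_addresses, num_probes, confidence)
    if observed_zero < low:
        return "coordinated"   # fewer gaps than chance: bots avoid overlapping
    if observed_zero > high:
        return "overlapping"   # more gaps than chance: probes pile up
    return "independent"


print(zero_hit_interval(100, 100))
print(classify(0, 100, 100), classify(37, 100, 100))
```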