Local Worm Detection using Honeypots Justin Miller Jan 25, 2007 Don’t become a stat… Use HoneyStat!
Original Paper HoneyStat: Local Worm Detection Using Honeypots By: D.Dagon, X.Qin, G.Gu, W.Lee, J.Grizzard, J.Levine, H.Owen Georgia Institute of Technology
Background Worm detection systems Detection in local networks HoneyStat nodes Data collection Improvements in HoneyStat
Worm Detection Relied on artifacts incidental to worm infection Measure incoming scan rates Filter results for small networks Increase data collection Global monitoring centers Doesn’t help local networks
Worm Detection Proposition: use honeypots to improve accuracy of alerts (local intrusion detection) Honeypot – computer system set up as a trap for attackers
Honeypot Network decoy Distracts attackers Gather early warnings about new attacks Facilitate in-depth analysis of adversary’s strategy
Honeypot Use Gather info about how human attackers operate Labor-intensive log analysis (1:40): about 40 hours of analysis (1 work week) per hour of logged data Virtual honeypots Used to prevent OS fingerprinting
Honeypot Use Detect/disable worms (honeyd) Not ready for early warning IDS Know attack pattern Catch zero day worms – already know system vulnerability
Worm Detection Worm propagation proposals Model to study worm spreading Early detection proposals Statistical models analyze repeated outgoing connections Worm info collected at routers
Objective Early worm detection challenges Large space to monitor Coordinated responses Focus on local networks Detection using local honeypots Lower false positive rate of worms
Infection Cycles 3 actions result from infection Memory events Network events Disk events Describe worm installation on compromised system
Memory Events Begins with probe for victim Provides port Victim shell listens on port 4444 Honeypot acknowledges incoming packets Infection begins corrupting process
Network Events Blaster shell remains open for only one connection Instructs victim to download “egg” program Honeypot initiates TCP or UDP traffic
Disk Events Occur after Blaster “egg” is downloaded Disk writes – become active after system reboot Not all worms have disk writes
Data Capture Most worms follow similar cycle Traditional worm detection Usually at start or end of cycle Activity in middle of cycle can be tracked Intrusion detection based on scan rates has high rate of noise
HoneyStat Node Minimal honeypot created in an emulator Covers large address space Honeypots remain idle until HoneyStat event occurs
HoneyStat Data Data recorded includes: OS/patch level of host Type of event Trace file of all prior network activity
HoneyStat Events Events forwarded to analysis node Usually central server Places alert events in queue Perform statistical analysis
Data Analysis Check if event corresponds to an active honeypot Update previous event to include new event Reset honeypot if the event involved network events (downloading an egg or initiating outgoing scans)
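The bookkeeping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the HoneyStat implementation: the names `HoneyStatEvent`, `AnalysisNode`, `handle`, `active`, `completed`, and `resets` are all hypothetical, and the record fields simply mirror the data items listed on the HoneyStat Data slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HoneyStatEvent:
    """Hypothetical event record; fields follow the HoneyStat Data slide."""
    honeypot_id: str
    kind: str                      # "memory", "network", or "disk"
    os_patch_level: str
    trace: List[str] = field(default_factory=list)  # prior network activity


class AnalysisNode:
    """Hypothetical analysis node that tracks events per active honeypot."""

    def __init__(self) -> None:
        self.active: Dict[str, List[HoneyStatEvent]] = {}
        self.completed: List[List[HoneyStatEvent]] = []
        self.resets: List[str] = []

    def handle(self, event: HoneyStatEvent) -> None:
        # Check whether the event belongs to an already-active honeypot
        # and extend that honeypot's event history with the new event.
        history = self.active.setdefault(event.honeypot_id, [])
        history.append(event)

        # A network event (egg download or outgoing scan) means the
        # honeypot is compromised: archive its history and reset it.
        if event.kind == "network":
            self.completed.append(self.active.pop(event.honeypot_id))
            self.resets.append(event.honeypot_id)


node = AnalysisNode()
node.handle(HoneyStatEvent("hp1", "memory", "winxp-sp1"))
node.handle(HoneyStatEvent("hp1", "network", "winxp-sp1"))
```

After the second call the honeypot `hp1` is no longer active, its two events sit together in `completed`, and `resets` records that it should be restored to a clean state.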
Data Analysis Analysis node examines basic properties of the event HoneyStat event is correlated with other observed events Search for worm pattern Objective: Zero-day worms Statistical analysis identifies worm behavior
Logistic Regression Analyzes port correlation Non-linear transformation of linear regression model Honeypot event is dichotomous Awake (1) or asleep (0)
Logistic Regression Model is binary expectation of the honeypot state j: counter for honeypot events i: counter for each individual port traffic for a specific honeypot
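The equation from this slide did not survive extraction. For orientation only, the textbook form of a logistic regression model is shown here, using the slide's indices (j for honeypot events, i for ports). The symbols $a_j$ (honeypot state at event j) and $x_{i,j}$ (traffic on port i before event j) are assumptions, and the coefficient indexing may differ from the original slide, which writes βi,j.

```latex
\[
E[a_j \mid x_{1,j},\dots,x_{k,j}]
  = \Pr(a_j = 1)
  = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_{i,j}\right)}}
\]
\[
\log\frac{\Pr(a_j = 1)}{1 - \Pr(a_j = 1)} = \beta_0 + \sum_{i=1}^{k} \beta_i x_{i,j}
\]
```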
Logistic Regression Measures inverse of time between honeypot events Resolve equation after each event Identify candidate ports that explain why honeypots become active Also finds traffic patterns Traffic measured for last 5 minutes
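One way to picture the 5-minute traffic window is as a per-port feature vector built from the trace that precedes a honeypot event. The sketch below uses simple packet counts per port as the feature, which is an assumption made for illustration (the slide describes an inverse-time measure); `port_features`, `WINDOW_SECONDS`, and the `(timestamp, port)` trace layout are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

WINDOW_SECONDS = 5 * 60  # traffic is measured for the last 5 minutes


def port_features(trace: Sequence[Tuple[float, int]],
                  event_time: float,
                  ports: Sequence[int]) -> List[int]:
    """Count packets per port in the window ending at event_time.

    trace is a sequence of (timestamp_seconds, destination_port) pairs.
    """
    counts = Counter(
        port
        for ts, port in trace
        if 0 <= event_time - ts <= WINDOW_SECONDS
    )
    return [counts.get(p, 0) for p in ports]


# Hypothetical trace: only packets between t=300 and t=600 are counted.
trace = [(10, 135), (400, 135), (550, 445), (590, 135), (700, 139)]
features = port_features(trace, event_time=600, ports=[135, 139, 445])
# features == [2, 0, 1]
```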
Logistic Analysis Estimate $\beta_{i,j}$ coefficients (MLE) Find coefficients that minimize prediction error Find which variables significantly affect honeypot activity Single variable = ALERT!
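To make the estimation step concrete, here is a small pure-Python sketch that fits logistic regression coefficients by gradient ascent on the log-likelihood, one common way to compute an MLE. It is not the authors' code. The ten sample events are made up for illustration: each row is a pair of hypothetical indicators (traffic seen on port 135, traffic seen on port 445) and each label is the honeypot state (1 = awake, 0 = asleep). Significance testing of the coefficients is omitted.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 lr: float = 0.1,
                 steps: int = 5000) -> List[float]:
    """Gradient ascent on the log-likelihood; returns [b0, b1, ..., bk]."""
    beta = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            err = y - sigmoid(z)          # residual for this sample
            grad[0] += err                # intercept term
            for i, v in enumerate(x):
                grad[i + 1] += err * v
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: (port_135_seen, port_445_seen) -> honeypot awake?
rows = [(1, 0), (1, 0), (1, 0), (1, 1), (1, 1),
        (1, 1), (0, 1), (0, 1), (0, 0), (0, 0)]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]

beta = fit_logistic(rows, labels)
b0, b135, b445 = beta
```

In this toy data the honeypot is awake in 4 of the 6 events with port 135 traffic and in only 1 of the 4 events without it, so the fitted coefficient for port 135 comes out positive and larger than the one for port 445. A single dominant variable like this is the situation the slide flags as an alert.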
Practical Aspects Properly identify worm outbreaks Low false positive rate Sample data from 6 honeypots active during Blaster worm
Worm Detection Logit Analysis of Multiple HoneyStat Events
Worm Detection Scans on ports 135, 139, 445 No test can focus on 135 alone Leads to pattern for 1 worm Require: 10 sample events Not sure of effective sample size
Benefits Accurate data stream Events result from successful attack Reduces amount of data to process Detects zero day worms Identifies the ports worms enter and exit through Finds presence and also explains worm activity
False Positives Identify wrong network traffic Worm present, HoneyStat identifies wrong source Repeated human break-ins could be identified as a worm Disregard manual break-ins These are more dangerous than robotic worms
Sample Data Tested HoneyStat on the Internet Injected a worm attack at Georgia Tech Log from 2002-2004 Random sample of 250+ synthetic honeypot events 0 false positives
HoneyStat as IDS Low false positive rate Good for local IDS Effectively detects worms using random scan techniques Will attack honeypots
HoneyStat as IDS What about non-random worms? $\Omega$ = entire IPv4 space ($2^{32}$) $T$ = # of potential victims $N$ = total vulnerable machines $n_t$ = # of victims at time $t$ $s$ = scan rate
HoneyStat as IDS $k_{i+1} = s n_i T / \Omega$: # of scans entering space $T$ at time $i+1$ $P = 1 - (1 - 1/T)^{k_{i+1}}$: probability of a host being hit
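HoneyStat as IDS Worm propagation equation: $n_{i+1} = n_i + [N - n_i]\,[1 - (1 - 1/T)^{s n_i T / \Omega}]$ $T$ and $\Omega$ are big, so $(1 - 1/T)^{s n_i T / \Omega} \approx 1 - s n_i / \Omega$, reducing to: $n_{i+1} \approx n_i + [N - n_i]\, s n_i / \Omega$ Same as previous models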
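The propagation recurrence can be checked numerically. The sketch below steps the full equation and compares it with the first-order approximation for large T, in which the per-host hit probability becomes s·n_i/Ω. All parameter values (N, s, T, and the starting count) are made up for illustration and do not come from the paper; the function names are hypothetical.

```python
OMEGA = 2 ** 32  # entire IPv4 space


def scans_into_space(s: float, n_i: float, T: float) -> float:
    """k_{i+1}: number of scans entering space T at time i+1."""
    return s * n_i * T / OMEGA


def hit_probability(T: float, k: float) -> float:
    """Probability that a given host in T is hit by at least one of k scans."""
    return 1.0 - (1.0 - 1.0 / T) ** k


def step_exact(n_i: float, N: float, s: float, T: float) -> float:
    k = scans_into_space(s, n_i, T)
    return n_i + (N - n_i) * hit_probability(T, k)


def step_approx(n_i: float, N: float, s: float) -> float:
    """First-order approximation, valid when T and OMEGA are large."""
    return n_i + (N - n_i) * s * n_i / OMEGA


# Illustrative (assumed) parameters.
N = 360_000      # total vulnerable machines
s = 6_000        # scans per infected host per time tick
T = 2 ** 24      # size of the targeted space
n0 = 10.0        # initial victims

exact_next = step_exact(n0, N, s, T)
approx_next = step_approx(n0, N, s)

# Iterate the exact recurrence for a few ticks.
history = [n0]
for _ in range(50):
    history.append(step_exact(history[-1], N, s, T))
```

With these values the exact and approximate one-step results agree closely, which is the sense in which the reduced equation matches earlier random-scan models.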
HoneyStat as IDS Local early worm detection Machines can be multihomed Each searches hundreds of IP addresses $D = 2^{11}$, $\alpha = 0.25$ First victim found after 0.19% of vulnerable hosts are infected
Contributions Statistical techniques used in worm detection Previously applied time series-based statistical analysis Logistic regression detects worm outbreaks
Weakness Honeypot evasion Attackers have worms detect and avoid honeypot traps Attackers make observations about victim’s machine Effective sample size unknown
Improvements Reduce the traffic window measured for the logistic model to under 5 minutes Studies recent network events Improve quality of data Avoid linear identification of multiple worms Best Subsets logistic regression Study effective sample size
Conclusion Further research for local IDS Logistic regression detects worm outbreaks Honeypots create accurate alerts 3 classes: memory, disk, network events Logit analysis eliminates noise Extensive data traces identify worm activity
Questions?