Design of a System for Real-Time Worm Detection
Bharath Madhusudan, John Lockwood
Department of Computer Science and Engineering, Washington University, St. Louis
©2004 IEEE
Presented by Stephen Karg, November 14, 2005

Contributions
The Problems:
1. Many IDSs have limited effectiveness because they can only filter known worms.
2. Dark-space scan detection can’t defend against hit-list worms.
Proposed Solutions:
1. Monitor network traffic to automatically detect new worms in real time.
2. Analyze packet content, not headers, to derive a signature for the new worm.

Their Goals
Low reaction time
High throughput
Low cost
Low false-positive rate
Robust to simple countermeasures

System Properties
Designed to work in tandem with a signature-based IDS.
Frequently occurring content = new signature.
Hardware-based system to keep pace with high-volume traffic (Gigabit Ethernet).
Centralized monitoring.
Computationally intensive, hence the need for a hardware-based system.

General Algorithm
1. Hash over a sliding 10-byte window of the packet-content data stream (header data stripped). Each payload therefore yields multiple hashes, which gets around basic metamorphism (shuffling blocks, etc.).
2. On-chip vector of counters*, one per hash value. 3-stage pipeline: one read/increment/write per clock cycle.
3. If a counter exceeds the threshold, the offending signature is hashed to off-chip SRAM.
4. Iff a 2nd signature that matches the first is hashed to the same SRAM bucket, an alert is thrown. This last step reduces false positives.
* Counters are 8-bit and are periodically reduced by the average count (called timeouts).

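The four steps above can be sketched in software. This is a minimal illustration, not the paper's circuit: CRC32 stands in for the hardware hash (which the slides do not specify), and `NUM_COUNTERS`, `THRESHOLD`, `SRAM_BUCKETS` and the counter reset after a report are all assumptions made for the example.

```python
import zlib

WINDOW = 10           # sliding-window width in bytes (from the slide)
COUNTER_MAX = 255     # counters are 8-bit (from the slide)
NUM_COUNTERS = 512    # hypothetical counter-vector size
THRESHOLD = 3         # hypothetical count threshold
SRAM_BUCKETS = 4096   # hypothetical size of the off-chip table


class WormDetector:
    """Software sketch of the counting and two-stage alert logic."""

    def __init__(self):
        self.counters = [0] * NUM_COUNTERS
        self.sram = {}     # bucket index -> stored signature (stand-in for SRAM)
        self.alerts = []

    def process(self, payload: bytes) -> None:
        # Step 1: hash every 10-byte window of the payload.
        for i in range(len(payload) - WINDOW + 1):
            sig = payload[i:i + WINDOW]
            idx = zlib.crc32(sig) % NUM_COUNTERS
            # Step 2: saturating 8-bit increment of the matching counter.
            self.counters[idx] = min(self.counters[idx] + 1, COUNTER_MAX)
            # Step 3: above threshold, pass the signature to the SRAM stage.
            if self.counters[idx] > THRESHOLD:
                self.counters[idx] = 0   # assumption: restart counting after a report
                self._report(sig)

    def _report(self, sig: bytes) -> None:
        # Step 4: alert only if the same signature already sits in the bucket.
        bucket = zlib.crc32(sig[::-1]) % SRAM_BUCKETS
        if self.sram.get(bucket) == sig:
            self.alerts.append(sig)
        else:
            self.sram[bucket] = sig

    def timeout(self) -> None:
        # Periodic "timeout": subtract the average count, flooring at zero.
        avg = sum(self.counters) // len(self.counters)
        self.counters = [max(c - avg, 0) for c in self.counters]
```

Feeding the same payload repeatedly to `process()` first fills the SRAM bucket and only later raises an alert, which is the false-positive reduction described in step 4.
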
Design Considerations
1. Throughput:
– Steps 1 & 2 are implemented in parallel using multiple window/vector pairs. Counters are aggregated.
2. Benign Strings:
– False-positive potential with regularly occurring strings (e.g. the first several bytes of an HTTP request).
– The system administrator can reconfigure the system to ignore them.

Design Considerations (cont.)
3. False-Positives:
Potential counter-attack: flood the IDS with packet(s) repeating the same string.
Solution: count any given signature only once per window of size T (a different, larger window than the 10-byte hashing window).
A Bloom filter is used (prior research).
1. False positives can be kept low using a proven formula.
2. Signatures over the window are stored compactly and queried efficiently with dual-ported on-chip memory.
4. Threshold vs. timeout relationship:
Reduces to a well-studied problem in hashing, so the false-positive rate can again be calculated and minimized.

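The once-per-window rule can be illustrated with a small Bloom filter. This is a sketch under assumptions: the slides give neither the filter size, the number of hash functions, nor the exact formula used, so `m_bits`, `k_hashes`, the SHA-256-based hashing and the textbook approximation `(1 - e^(-kn/m))^k` are choices made here, not the paper's.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (hypothetical sizes)."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Textbook approximation of the Bloom-filter false-positive rate."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def count_once_per_window(signatures, counts: dict) -> None:
    """Count each signature at most once within one window of traffic."""
    seen = BloomFilter(m_bits=8192, k_hashes=4)   # fresh filter per window
    for sig in signatures:
        if sig in seen:
            continue          # repeat within this window: ignored
        seen.add(sig)
        counts[sig] = counts.get(sig, 0) + 1
```

Calling `count_once_per_window` once per window of size T means a flood of identical strings raises the count by at most one per window, which blunts the counter-attack described above.
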
Performance Evaluation
The “normal” packet stream is a 2-day trace of UC Berkeley FTP server traffic.
– What about other types of traffic? Notably SMTP.
Worm-like data was inserted into the above stream.
– Does the stream reflect epidemic behavior?
Worms are detected, but are they detected in time?
– Perhaps reaction/containment is out of scope here.
Would have liked to see performance on a sandboxed subnet with real traffic and real worms.

Evaluation Results
Detecting larger worms is more difficult.

Signature Length (in Bytes)    Concentration in Trace Data
500                            1%
1000                           2%
5000                           3%
                               %
                               %

– If the worm size exceeds the number of buckets/counters, all of them will be incremented as it passes, so none stands out.
– The prototype has 64x512 counters (each with a 10B window, ~276KB).

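The "no stand-out" effect can be shown with a toy experiment. This is only a sketch: CRC32 and a 512-entry counter vector are assumptions for illustration, and the random payloads are synthetic, not the trace data from the paper.

```python
import random
import zlib

WINDOW = 10          # sliding-window width in bytes (from the slides)
NUM_COUNTERS = 512   # hypothetical counter-vector size


def random_payload(n_bytes: int, seed: int = 0) -> bytes:
    """Synthetic payload with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


def fraction_of_counters_touched(payload: bytes) -> float:
    """Share of the counter vector incremented by one pass of the payload."""
    touched = set()
    for i in range(len(payload) - WINDOW + 1):
        touched.add(zlib.crc32(payload[i:i + WINDOW]) % NUM_COUNTERS)
    return len(touched) / NUM_COUNTERS


small = fraction_of_counters_touched(random_payload(50))
large = fraction_of_counters_touched(random_payload(5000))
print(f"50-byte payload touches {small:.1%} of counters")
print(f"5000-byte payload touches {large:.1%} of counters")
```

A payload with far more windows than counters raises nearly every counter together, so no single counter rises above the rest.
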
Evaluation Results (cont.)
Memory collisions decrease as more dual-ported memory blocks are used.
– Not surprising, but the tests show the hardware requirements (and diminishing returns): 64 blocks give a 0.02 collision rate.
– Also shows the empirical collision rate to be consistently below the theoretical calculations.

Functional prototype
– 64 Block RAMs
– Calculates 4 hash values per clock cycle.
– Targeted to run on the FPX platform with FPGA hardware.
– Circuit implementation runs at 91.5 MHz.
– Introduces a pipeline delay of 70 ns into the datapath.
– Allows processing at OC-48 line speeds.
– Conclusion: real-time performance.

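A quick arithmetic check of the OC-48 claim, assuming each of the 4 hashes per cycle corresponds to the window advancing by one byte (the slides do not say this explicitly) and taking the standard OC-48 line rate of 2488.32 Mbit/s:

```python
CLOCK_HZ = 91.5e6          # circuit clock (from the slide)
HASHES_PER_CYCLE = 4       # hash values per clock cycle (from the slide)
BYTES_PER_HASH = 1         # assumption: one byte of payload consumed per hash
PIPELINE_DELAY_S = 70e-9   # pipeline delay (from the slide)
OC48_BPS = 2.48832e9       # OC-48 line rate in bits per second


def throughput_bps(clock_hz: float, hashes_per_cycle: int,
                   bytes_per_hash: int) -> float:
    """Payload bits scanned per second under the stated assumption."""
    return clock_hz * hashes_per_cycle * bytes_per_hash * 8


rate = throughput_bps(CLOCK_HZ, HASHES_PER_CYCLE, BYTES_PER_HASH)
delay_cycles = PIPELINE_DELAY_S * CLOCK_HZ

print(f"scan rate: {rate / 1e9:.3f} Gbit/s")
print(f"OC-48:     {OC48_BPS / 1e9:.3f} Gbit/s")
print(f"pipeline delay in clock cycles: {delay_cycles:.1f}")
```

Under that assumption the scan rate comes out above the OC-48 line rate, which is consistent with the slide's claim.
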
Conclusions
A move towards more automated NIDS.
– Yes, remove the slow humans from the equation.
– Performance is impressive considering a speed-of-light adversary.
Exploits the parallelism afforded by hardware to scan a much larger amount of traffic than traditional software implementations of a similar algorithm.
– But do we need to add the H/W requirement & cost?
– Does every packet need to be seen to spot a trend?
– Could software use sampling to produce the same results? Or will it fall too far behind the growth in bandwidth?

Conclusions (cont.)
The authors argue that a centralized NIDS is much easier to deploy and maintain than a host-based system.
– Sure, but is it as effective? (Wu’s presentation)
System robust to “simple” countermeasures.
– Perhaps the paper’s greatest weakness: only the simplest metamorphism is defended against (block reordering, some nop insertion).
– Instruction replacement: UNDETECTED
– Instruction reordering: UNDETECTED
– Polymorphic decryptor engines: UNDETECTED
– Or just pad with garbage until 277KB long!

Questions? Thanks.