1 Issues in Benchmarking Intrusion Detection Systems Marcus J. Ranum
2 IDS Benchmarking? How hard can it be to benchmark intrusion detection systems? –Very! –There are lots of ways to get it wrong Accidentally Deliberately –Avoiding doing it wrong does not necessarily mean you’ve done it right
3 What’s an IDS? IDS = Intrusion Detection System –Primary criterion for measurement is the IDS’ ability to detect intrusions –Secondary criteria for measurement are other issues: False positives - false alarms False negatives - real attacks that are missed Performance impact - throughput delay or CPU usage on host processor
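The measurement criteria above can be turned into simple ratios once a benchmark run has been scored. The following is a minimal sketch, assuming the tester already knows which alerts correspond to real attacks; the names `DetectionCounts`, `detection_rate`, and `false_alarm_ratio` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    true_positives: int   # real attacks that raised an alert
    false_positives: int  # alerts with no real attack behind them (false alarms)
    false_negatives: int  # real attacks that raised no alert (misses)


def detection_rate(c: DetectionCounts) -> float:
    """Fraction of real attacks that the IDS detected."""
    attacks = c.true_positives + c.false_negatives
    return c.true_positives / attacks if attacks else 0.0


def false_alarm_ratio(c: DetectionCounts) -> float:
    """Fraction of all alerts that were false alarms."""
    alerts = c.true_positives + c.false_positives
    return c.false_positives / alerts if alerts else 0.0
```

Reporting both ratios together matters: an IDS that alerts on everything scores a perfect detection rate while being useless in practice.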
4 Types of IDS Primary Types: –Network IDS (NIDS) –Host IDS (HIDS) Hybrid Types: –Per-Host Network IDS (PH-NIDS) –Load Balanced Network IDS (LB-NIDS) –Firewall IDS (FW-IDS)
5 Properties of: Network IDS Collect packets in promiscuous mode Issues: –Packet collection rate - what is the maximum throughput? –Reassembly/defragmentation/reordering - what about traffic spoofing? –Selective analysis - is the IDS choosing to ignore some traffic in order to optimize?
6 Properties of: Host IDS Operate on host logs and processes –Sometimes forwards audit records to a central location for analysis Issues: –CPU usage on host –What about packet-oriented attacks? –Per-platform (individual) view of attacks - single system is monitored per agent
7 Properties of: Per-Host Network IDS Network IDS “shim” layer inserted into network stack on each host Issues: –Has properties of a network IDS –But: Traffic is processed per-host only Does not have same performance as NIDS “Local” only view of traffic (but no drops)
8 Properties of: Load-Balanced Network IDS Use a load-balancing pre-processor to “spread” load across multiple NIDS Issues: –Can scale to “infinite” bandwidth –Total cost of solution is not single unit pricing (requires switch + multiple NIDS)
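One common way to spread load so that each NIDS still sees whole sessions is to hash the flow identifier. This is a sketch of that idea only, not a description of any particular product; `flow_key` and `pick_sensor` are hypothetical names, and the use of CRC32 is an assumption chosen for simplicity.

```python
import zlib


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key for a flow.

    The two endpoints are sorted so that packets travelling in either
    direction of the same session produce the same key.
    """
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()


def pick_sensor(src_ip, src_port, dst_ip, dst_port, proto, sensor_count):
    """Map a flow to one of sensor_count NIDS units (0-based index)."""
    key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
    return zlib.crc32(key) % sensor_count
```

Keeping both directions of a session on the same sensor is what allows that sensor to do reassembly and state tracking; a per-packet round robin would not.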
9 Properties of: Firewall IDS Place network IDS capability in a firewall or bridge type device Issues: –No packet loss issues (retransmits take care of packets that are lost) –(May) slow down network throughput
10 Other Issues Other things affecting speed and detection ability: –IP fragment re-assembly –TCP packet re-ordering –TCP state/sequence tracking –Analyzing only selected sessions
11 Fragment Re-assembly Re-assembling fragments takes significant CPU time as well as memory to buffer packets –IDS can be negatively impacted by faked fragments intended to consume extra memory –How does IDS handle fragmented attacks? Simply alert “I see fragmented traffic” or de-fragment then apply IDS logic?
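To make the memory cost concrete, here is a minimal sketch of a reassembly buffer with a hard memory budget. It is an illustration under simplifying assumptions: offsets are plain byte offsets (real IP uses 8-byte units), overlapping fragments are not resolved, and there is no timeout. `FragmentBuffer` and its methods are hypothetical names.

```python
class FragmentBuffer:
    """Buffers fragments per datagram key until the datagram is complete."""

    def __init__(self, max_bytes=4 * 65535):
        self.max_bytes = max_bytes  # budget protecting against faked fragments
        self.used = 0               # bytes currently buffered
        self.pending = {}           # key -> {"parts": {offset: bytes}, "total": int or None}

    def add(self, key, offset, data, more_fragments):
        """Store one fragment; return the full payload once it is complete."""
        entry = self.pending.setdefault(key, {"parts": {}, "total": None})
        if offset not in entry["parts"]:
            if self.used + len(data) > self.max_bytes:
                self._drop(key)     # over budget: give up on this datagram
                return None
            entry["parts"][offset] = data
            self.used += len(data)
        if not more_fragments:      # the last fragment fixes the total length
            entry["total"] = offset + len(data)
        return self._try_complete(key)

    def _drop(self, key):
        entry = self.pending.pop(key, None)
        if entry:
            self.used -= sum(len(d) for d in entry["parts"].values())

    def _try_complete(self, key):
        entry = self.pending[key]
        if entry["total"] is None:
            return None
        payload = bytearray()
        for off in sorted(entry["parts"]):
            if off != len(payload):
                return None         # gap (or overlap): not complete yet
            payload += entry["parts"][off]
        if len(payload) != entry["total"]:
            return None
        self._drop(key)             # release the memory held by this datagram
        return bytes(payload)
```

An IDS that de-fragments before applying its logic would run signatures on the value returned by `add`, whereas one that merely alerts on fragmented traffic never needs this buffer at all.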
12 Packet Re-ordering Re-ordering packets requires significant CPU as well as memory for packet buffering –IDS can be impacted by unintentional or deliberate packet drops since it tries to buffer out-of-sequence packets –How does IDS handle re-ordering? Does it just flag out-of-sequence packets, or does it re-order then apply IDS logic?
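The difference between flagging out-of-sequence packets and actually re-ordering them can be sketched as follows. This is a simplified illustration that ignores sequence-number wraparound and partially overlapping segments; `StreamReorderer` is a hypothetical name, and the `max_buffered` limit is an assumed safeguard against the buffering cost described above.

```python
class StreamReorderer:
    """Releases TCP payload bytes in sequence order for one direction of a stream."""

    def __init__(self, initial_seq, max_buffered=64):
        self.next_seq = initial_seq       # next sequence number we expect
        self.max_buffered = max_buffered  # cap on held out-of-order segments
        self.waiting = {}                 # seq -> payload held until its turn

    def push(self, seq, data):
        """Accept one segment; return whatever bytes are now in order."""
        if seq < self.next_seq:
            return b""                    # old data or retransmission: ignore
        if (seq != self.next_seq and seq not in self.waiting
                and len(self.waiting) >= self.max_buffered):
            return b""                    # buffer full: segment is dropped
        self.waiting[seq] = data
        out = bytearray()
        while self.next_seq in self.waiting:
            chunk = self.waiting.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

If a segment is lost on the wire, everything behind it sits in `waiting` until the cap is hit, which is why packet drops affect an IDS that re-orders more than one that does not.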
13 TCP State Tracking Tracking TCP states requires maintaining per-session information –IDS is impacted by number of simultaneous streams –IDS is impacted by randomized traffic –IDS is harder to fool with faked out-of-sequence FIN packets
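A minimal sketch of per-session tracking shows both the cost (one table entry per live stream) and the benefit (a forged FIN with the wrong sequence number does not end tracking). It follows one direction of each session only and is an assumption-laden illustration; `SessionTable` and its methods are hypothetical names.

```python
class SessionTable:
    """Tracks the expected next sequence number for each session key."""

    def __init__(self):
        self.sessions = {}  # key -> expected next sequence number

    def on_syn(self, key, seq):
        # A SYN consumes one sequence number, so data starts at seq + 1.
        self.sessions[key] = seq + 1

    def on_data(self, key, seq, length):
        """Return True if the segment is in sequence for a tracked session."""
        expected = self.sessions.get(key)
        if expected is None or seq != expected:
            return False
        self.sessions[key] = expected + length
        return True

    def on_fin(self, key, seq):
        """Close the session only if the FIN carries the expected sequence number."""
        if self.sessions.get(key) != seq:
            return False    # unknown session or out-of-sequence FIN: keep state
        del self.sessions[key]
        return True
```

The size of `sessions` grows with the number of simultaneous streams, and randomized traffic that never matches an entry still costs a lookup per packet.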
14 Analyzing Selected Sessions IDS can “optimize” performance by only reassembling or tracking TCP sessions related to known signatures –IDS might have extremely good performance against random traffic but poor performance against (e.g.) Web traffic –Tradeoff is coverage versus performance; vendors do not usually document this
15 Naïve Simulation Network [Diagram labels: Test Network; Attack Generator; Attack Stream; Target Host; NIDS]
16 What’s Wrong? The Naïve test network permits traffic that is not likely to be seen in a “real world” deployment - e.g.: ARP cache poisoning (you see a lot of this on DEFCON CTF networks) The presence of a router would “smooth” spikes somewhat and actually achieve higher sustained loads
17 Naïve Simulation Network #2 [Diagram labels: Test Network #1; Attack Generator; SmartBits Load Generator; Router w/some screening; Test Network #2; Attack Stream; Target Host; NIDS]
18 What’s Wrong? SmartBits style traffic generators do not generate “real” TCP traffic –This penalizes IDS that actually look at streams and try to reassemble them (which are desirable properties of a good IDS)
19 Skunking a Benchmark [Diagram labels: Test Network; Attack Generator; SmartBits Load Generator; Attack Stream; three Target Hosts w/Host-Net]
20 What’s Wrong? Packet style counts are not relevant to host-network IDS
21 Skunking a Benchmark: #2 [Diagram labels: Test Network; Attack Generator; SmartBits Load Generator; Attack Stream; Target Host; NIDS with selective detection turned on]
22 What’s Wrong? IDS with selective detection can be configured to only look at traffic aimed at the local subnet –SmartBits style generators’ random traffic largely gets seen and discarded
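The effect is easy to estimate. The sketch below assumes, purely for illustration, that the load generator picks IPv4 destination addresses uniformly at random and that the IDS inspects only packets addressed to one monitored subnet; `inspected_fraction` is a hypothetical helper, not a feature of any real product.

```python
import ipaddress
import random


def inspected_fraction(monitored_cidr, packet_count, seed=0):
    """Fraction of randomly addressed packets that fall inside the monitored subnet."""
    net = ipaddress.ip_network(monitored_cidr)
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    inspected = 0
    for _ in range(packet_count):
        dst = ipaddress.ip_address(rng.getrandbits(32))  # random IPv4 destination
        if dst in net:
            inspected += 1
    return inspected / packet_count
```

For a /24 subnet the expected fraction is 2^-24, so under these assumptions nearly all of the generated load is discarded after a single address comparison and the IDS appears to keep up with any packet rate.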
23 Effective Simulation Network [Diagram labels: Test Network; Recorded attack and normal traffic on hard disk; Replayed packets dumped back onto network; NIDS]
24 What’s Wrong? Nothing: –Predictable baseline –Can verify traffic rate with simple math –Can scale load arbitrarily (use multiple machines each with different capture data) –Traffic is real including “real” data contents –NIDS cannot be configured to watch a specific machine (there are no targets)
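The "simple math" for verifying the traffic rate might look like the following sketch, which assumes the capture is available as a list of (timestamp in seconds, packet length in bytes) pairs. `offered_load_bps` and `aggregate_load_bps` are hypothetical names.

```python
def offered_load_bps(packets):
    """Average load in bits per second for one capture.

    packets: list of (timestamp_seconds, length_bytes) in capture order.
    """
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure a rate")
    duration = packets[-1][0] - packets[0][0]
    if duration <= 0:
        raise ValueError("timestamps must span a positive interval")
    total_bits = 8 * sum(length for _, length in packets)
    return total_bits / duration


def aggregate_load_bps(captures):
    """Total load when several machines each replay a different capture."""
    return sum(offered_load_bps(c) for c in captures)
```

Because the replayed data is fixed, the same numbers can be recomputed before every run, which is what makes the baseline predictable.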
25 Tools to Use Fragrouter - generates fragmented packets Whisker - generates out-of-sequence packets Pcap-pace - replays packets from a hard disk with original inter-packet timing
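Replaying with original inter-packet timing amounts to scheduling each packet at its recorded offset from the first one. The following is a generic sketch of that pacing loop and is not the implementation of any tool listed above; `replay`, its `speedup` parameter, and the injectable `send`, `sleep`, and `clock` callables are all assumptions made for illustration.

```python
import time


def replay(packets, send, speedup=1.0, sleep=time.sleep, clock=time.monotonic):
    """Send packets with the spacing recorded in their capture timestamps.

    packets: iterable of (timestamp_seconds, raw_bytes) in capture order.
    send:    callable that puts raw_bytes onto the network.
    speedup: 1.0 keeps original timing; 2.0 replays twice as fast.
    """
    start_capture = None
    start_wall = None
    for timestamp, raw in packets:
        if start_capture is None:
            # The first packet anchors capture time to wall-clock time.
            start_capture, start_wall = timestamp, clock()
        else:
            due = start_wall + (timestamp - start_capture) / speedup
            delay = due - clock()
            if delay > 0:
                sleep(delay)  # wait until this packet's scheduled time
        send(raw)
```

Scheduling against the start time, instead of sleeping for each individual gap, keeps small delays in `send` from accumulating into drift over a long capture.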
26 Summary It’s easy to skunk an intrusion detection benchmark It’s hard to design a good intrusion detection benchmark If you want to see if a given system works, the best way to find out is to try it on your actual network