Download presentation
Presentation is loading. Please wait.
Published byEmery Paul Evans Modified over 9 years ago
1
1 Network-based Intrusion Detection, Mitigation and Forensics System Yan Chen Department of Electrical Engineering and Computer Science Northwestern University Lab for Internet & Security Technology (LIST) http://list.cs.northwestern.edu
2
2 The Spread of Sapphire/Slammer Worms
3
3 Current Intrusion Detection Systems (IDS) Mostly host-based and not scalable to high-speed networks –Slammer worm infected 75,000 machines in <10 mins –Host-based schemes inefficient and user dependent Have to install IDS on all user machines ! Mostly simple signature-based –Cannot recognize unknown anomalies/intrusions –New viruses/worms, polymorphism
4
4 Current Intrusion Detection Systems (II) Statistical detection –Unscalable for flow-level detection IDS vulnerable to DoS attacks –Overall traffic based: inaccurate, high false positives Cannot differentiate malicious events with unintentional anomalies –Anomalies can be caused by network element faults –E.g., router misconfiguration, link failures, etc.
5
5 Network-based Intrusion Detection, Mitigation, and Forensics System Online traffic recording [SIGCOMM IMC 2004, INFOCOM 2006, ToN to appear] –Reversible sketch for data streaming computation –Record millions of flows (GB traffic) in a few hundred KB –Small # of memory access per packet –Scalable to large key space size (2 32 or 2 64 ) Online sketch-based flow-level anomaly detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 06] –Adaptively learn the traffic pattern changes –As a first step, detect TCP SYN flooding, horizontal and vertical scans even when mixed Online stealthy spreader (botnet scan) detection [IWQoS 2007]
6
6 Network-based Intrusion Detection, Mitigation, and Forensics System (II) Integrated approach for false positive reduction Polymorphic worm signature generation & detection [IEEE Symposium on Security and Privacy 2006] [IEEE ICNP 2007 to appear] Accurate network diagnostics [ACM SIGCOMM 2006] [IEEE INFOCOM 2007] Scalable distributed intrusion alert fusion w/ DHT [SIGCOMM Workshop on Large Scale Attack Defense 2006] Large-scale botnet event forensics using honeynet [work in progress]
7
7 System Architecture Remote aggregated sketch records Streaming packet data Part II Per-flow monitoring & detection Reversible sketch monitoring Filtering Sketch based statistical anomaly detection (SSAD) Local sketch records Sent out for aggregation Per-flow monitoring Normal flows Suspicious flows Intrusion or anomaly alarms Keys of suspicious flows Keys of normal flows Data path Control path Modules on the critical path Signature -based detection Polymorphic worm detection Part I Sketch- based monitoring & detection Modules on the non-critical path Network fault diagnosis
8
8 System Deployment Attached to a router/switch as a black box Edge network detection particularly powerful Original configuration Monitor each port separately Monitor aggregated traffic from all ports Router LAN Inter net Switch LAN (a) Router LAN Inter net LAN (b) HPNAIDM system scan port Splitter Router LAN Inter net LAN (c) Splitter HRAID system Switch HPNAIDM system HPNAIDM system
9
Detecting Stealthy Spreaders Using Online Outdegree Histograms Yan Gao 1, Yao zhao 1, Robert Schweller 1, Shobha Venkataraman 2, Yan Chen 1, Dawn Song 2 and Ming-Yang Kao 1 1. Northwestern University 2. Carnegie Mellon University
10
10 Outline Motivation Problem definition System design Evaluation Conclusion
11
11 Motivation High-speed network monitoring –Small amount of memory usage –Small number of memory accesses per packet Superspreaders vs. Stealthy spreaders –Superspreaders: sources that connect a large number of distinct destinations e.g. a compromised host doing fast scanning for worm propagation –Stealthy spreaders: a number of sources that send more than a certain number of connections (unsuccessful) to distinct destinations e.g. botnet scans or moderate worm propagation
12
12 Existing Data Streaming Algorithms Online entropy estimation approaches Chakrabarti et al. [STACS 06] and Guha et al. [ACM SODA 06] –Pros: detect unexpected changes in the network traffic –Cons: lose some concrete distribution information Online histogram estimation algorithms Gibbons et al. [VLDB 97] and Gilbert et al. [STOC 02] –Pros: provide more information on the features of network traffic –Cons: cannot record the number of unique items Superspreader detection schemes Venkataraman et al. [NDSS 05] and Zhao et al. [IMC 05] –Pros: detect sources with an very large outdegree –Cons: memory usage unscalable to small/medium outdegrees such as bot scans Superspreader detection is a special case of spreader detection
13
13 Outline Motivation Problem definition System design Evaluation Conclusion
14
14 Problem Definitions Two high-level problems Construct an approximation of the outdegree histogram online Directly detect the presence of stealthy spreaders without constructing the complete outdegree histogram
15
15 Problem Definition Input: stream of (Src, Dst) pairs S Output z --- of which powers define the buckets of the histogram (z=2) … 2020 21212 2323 2424 2525 2626 2727 … Histogram Number of sources Number of unique destinations
16
16 Problem Definition Input: stream of (SIP, DIP) pairs S Output W i --- the set of sources A source s is in W i if and only if the number of unique destinations that s connects to is in the range of [z i, z i+1 ) … 2020 21212 2323 2424 2525 2626 2727 … Histogram Number of sources Number of unique destinations
17
17 Problem Definition Input: stream of (SIP, DIP) pairs S Output … 2020 21212 2323 2424 2525 2626 2727 … Histogram Number of sources Number of unique destinations m i = |W i | Creating an approximate histogram is to estimate m i for each bucket
18
18 Contribution Study the problem of detecting stealthy spreaders online –With constant small memory –With small memory accesses per packet Design the algorithm to detect stealthy spreaders online by approximating the outdegree histogram –Data recording phase Sampling and coupon collection-based algorithms –Spreader detection phase Linear regression to find bins where attacks happen Show that the change of approximated histogram reveals the presence of anomalies
19
19 Outline Motivation Problem definition System design Evaluation Conclusion
20
20 Recording Phase: Sampling Algorithm Fast: update a smaller number of counters per packet (src, dst) Packet 2 -3 ≤ h(src) ≤ 2 -2 src Sampling algorithm
21
21 Recording Phase: Coupon Collecting Algorithm Accurate: create a better approximation interim structure (src, dst) Packet 2 -3 ≤ h(src) ≤ 2 -2 (src,g 0 (dst))(src,g 1 (dst)) (src,g 2 (dst))(src,g 3 (dst))(src,g d (dst)) Coupon collecting algorithm : uniform random hash function for hashing dst to an integer in [1, 2 i ]
22
22 Outdegree histogram construction Interim data structure -> final outdegree histogram Using linear programming method Build a convex hull Other constraints: Find the lower and upper bounds for m i Solution –Directly use the interim data structure Pros: Obtain a reasonably accurate histogram for normal network traffic Cons: Fail to accurately estimate the outdegree histogram for anomalous traffic Spreader Detection Phase
23
23 System Design Change detection –The change of the interim data structure of two time intervals Stealthy spreader detection k i ’ > c h (threshold) System architecture
24
24 Spreader Detection Phase The real scan event Number of distinct destination Number of scanners One Peak Close to 0
25
25 Spreader Detection Phase Linear regression for coupon collecting algorithm –Mean squared error as the fitting metric Bucket Example of linear regression Value of counting
26
26 Outline Motivation Problem definition System design Evaluation Conclusion
27
27 Evaluation Methodology Traffic traces –OC-48 CAIDA data on Aug. 14 th, 2002 –The average packet rate: 191K/s –The average flow rate: 3.75K/s A real scanning event collected from one class B honeynet on Jan 7 th, 2007 –Port 23 –2.5 hours –1,607 unique sources –1,700,236 scan sessions Synthetic scanning traces
28
28 Simulation Results Synthetic stealthy scan Estimate ratio The estimate ratio of scan outdegree Percentage of detection results False negative: 17.8% The estimation error within 20%: 33.9% False negative: 0 The estimation error within 20%: 76.1% Estimate ratio = Attack intensity =
29
29 Synthetic stealthy scan Simulation Results Estimate ratio CDF of estimate ratio for spreader intensity estimation Cumulative percentage (%) 35% 80%
30
30 Simulation Results Real stealthy scan Number of distinct destination The histogram of outdegree of scanners collected in the honeynet Number of scanners Estimation: 90 Ground truth: 87
31
31 Simulation Results Real stealthy scan Estimate ratio CDF of estimate ratios of scan outdegree estimation Cumulative percentage (%) 80% Mix the 5-min data of a real scanning event with 5-min normal traffic of CAIDA data (distribution over 30 such intervals)
32
32 Online Performance Memory consumption –Our method: O(c log(m)) Constant memory: 24×1KB = 24KB –Superspreader: When k is small, the memory usage is closer to the size of the entire data stream N. Memory access per packet –Single memory access per packet for each distinct counting structure –Speed up: processing in parallel or in pipeline Speed –3.2GHz Pentium 4 computer –Recording: 200 seconds for each 5-min CAIDA data interval –Detection: less than 0.1 second
33
33 Conclusion Propose the stealthy spreader detection problem Design an online outdegree histogram based stealthy spreader detection algorithm –Propose two randomized algorithms for recording phase –Propose the linear regression based approach for stealthy spreader detection
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.