Introspective Networks
George Varghese
University of California, San Diego
Network Evolution?
1. Basic: stateless, transparent. Tools: protocol design (e.g., soft state).
2. Active: customizable, reconfigurable. Tools: code safety (e.g., sandboxing).
3. Cognitive: intelligent, reasoning. Tools: AI (e.g., multi-agent systems).
4. Introspective: pattern detection and response. Tools: streaming algorithms, statistical inference (e.g., Bloom filters, sampling).
What is Introspection?
Detecting patterns in data traffic, either in real time or based on packet logs. Examples:
Measurement Introspection: Identify resource usage patterns for better resource management.
Security Introspection: Identify attack patterns to mitigate or prevent attacks.
Fault Introspection: Identify fault or anomaly patterns to allow automated fault repair.
Motivated by market pull and technology push.
Market Pull 1: Better ROI for ISPs
Better ROI: Optimize resources (BGP policy, OSPF weights, lighting up fibers, adding bandwidth) based on resource usage patterns.
Better Isolation: Better QoS during attacks (200 msec versus 2000 msec delay during Slammer) is a major differentiator.
Competitive Edge: Just as banks use data mining to better manage loan portfolios, ISPs can better manage their “bandwidth portfolio”.
Sprint Monitoring Proposal, IETF BOF 2003
[Figure: an ISP serving Customer Sites 1, 2, and 3; reroute or add B/W.]
Market Pull 2: Costs of (In)Security
Cost: Too many isolated perimeter solutions (firewalls, IDS devices, patches). Total cost of ownership (TCO) is very high.
Delay: By the time the perimeter detects an attack, the damage is already done.
Complexity: End users must find and install patches, or routers must support traceback, which could be used for detection.
Gartner Research: Security solutions deployed within enterprises by 2004 and within ISPs by 2006.
[Figure: an attacker, Zombie 1 through Zombie N, the ISP, and a victim behind an IDS and firewall (patches); traceback.]
Technology Push: Streaming Algorithms and Hardware Gates
Algorithms: Recent major thrust in streaming algorithms in databases, web analysis, theory, and networks.
Hardware: Memory accesses remain expensive (< 100) and SRAM is not scaling as fast as the number of connections (< 32 Mbits), but gates are plentiful.
Mapping: Many randomized streaming algorithms (e.g., Bloom filters, min-wise hashing) developed to find patterns in disk logs map well to network ASICs.
Opportunity: Invent or adapt streaming algorithms for networking patterns.
Concerns about Network Introspection
Speed: Can hardware run fast enough? Recall IP lookups in the 1990s; surprisingly complex things (branch predictors, TCP offload) are done routinely today. Even if not, algorithms can mine packet logs offline for insight.
Inflexibility: Hardware is not easy to change. Design hardware to identify useful “primitive” patterns that can be combined. Network processors (ISCA 2003) can offer flexibility and speed.
End-to-end argument: This is not a simple, stateless core. But introspection is not required for correctness of basic forwarding, only as an optimization or value-add.
Introspection as Pattern Detection
Within-Packet Patterns: Prefix matches, classification, signature detection (e.g., Code Red payload).
Across-Packet Patterns: Scheduling, timing, heavy-hitters, large flows, partial completion.
[Figure: a sequence of packets from sources S1, S2, and S5 arriving at a router.]
Pattern Detection Algorithm Requirements
Low memory: On-chip SRAM is limited to around 32 Mbits. This is not a constant, but it is not scaling with the number of concurrent conversations.
Small processing: For wire speed at 40 Gbps with 40-byte packets, there are 8 nsec per packet. With 1 nsec SRAM, that is 8 memory accesses. A factor of 30 in parallelism buys 240 accesses.
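The per-packet budget on this slide can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function name and parameters are assumptions, not part of the talk.

```python
def per_packet_budget(link_gbps, packet_bytes, sram_access_ns, parallelism=1):
    """Return (time per packet in ns, memory accesses available per packet)."""
    # Bits divided by Gbit/s gives nanoseconds directly.
    packet_time_ns = packet_bytes * 8 / link_gbps
    # Whole accesses that fit in one packet time, multiplied by parallel units.
    accesses = int(packet_time_ns / sram_access_ns) * parallelism
    return packet_time_ns, accesses


# The slide's numbers: 40 Gbps link, 40-byte packets, 1 ns SRAM.
packet_ns, serial_accesses = per_packet_budget(40, 40, 1.0)
_, parallel_accesses = per_packet_budget(40, 40, 1.0, parallelism=30)
print(packet_ns, serial_accesses, parallel_accesses)
```

With these inputs the budget is 8 ns per packet, 8 serial accesses, and 240 accesses with a factor of 30 in parallelism.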
Talk Outline
Part 1: Motivation
Part 2: Basic Patterns and Algorithms (heavy-hitters, many flows, partial completion)
Part 3: Combining patterns to solve useful application problems
Part 4: Conclusions
Pattern 1: Heavy-hitters
Heavy-hitters: In a measurement interval (e.g., 10 minutes), detect the flows (e.g., sources) on a link that send more than a threshold (say, 1% of the traffic).
[Figure: a packet sequence in which source S2 is 30 percent of the traffic.]
Estan, Varghese, ACM TOCS 2003
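Heavy-Hitters via Multistage Filters
[Figure: field extraction feeds Hash 1, Hash 2, and Hash 3, which index the counters of Stage 1, Stage 2, and Stage 3. Each packet increments one counter per stage; a comparator raises an ALERT if all counters are above the threshold.]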
Multistage Filters in Action
[Figure: counters of Stage 1, Stage 2, and Stage 3 against the threshold. Grey = other flows, yellow = small flow, green = large flow.]
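A minimal software sketch of the multistage filter described on these slides may help make the mechanism concrete. It is an illustration under assumptions, not the hardware design from the talk: the class name, the use of salted BLAKE2b as the per-stage hash, byte counting, and the "reached the threshold" comparison are all choices made here.

```python
import hashlib


class MultistageFilter:
    """Parallel multistage filter: one counter array per stage (illustrative)."""

    def __init__(self, stages, buckets, threshold):
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]

    def _bucket(self, stage, flow_id):
        # Salting with the stage number gives each stage its own hash function.
        digest = hashlib.blake2b(
            flow_id.encode(), digest_size=8, salt=stage.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def add(self, flow_id, nbytes):
        """Count one packet; return True if every stage counter reached the threshold."""
        passed = True
        for stage, counters in enumerate(self.counters):
            b = self._bucket(stage, flow_id)
            counters[b] += nbytes
            if counters[b] < self.threshold:
                passed = False
        return passed


# Usage: one large flow among many small ones (all values are made up).
mf = MultistageFilter(stages=3, buckets=1000, threshold=1000)
for i in range(50):
    mf.add(f"small-{i}", 10)
alerts = [mf.add("large", 100) for _ in range(10)]
```

A flow that sends at least the threshold always passes, because each of its counters receives all of its traffic. A small flow passes only if it shares a hot bucket in every stage, which is what the analysis slide bounds.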
Multistage Filter Analysis
Assume a 1 percent threshold. Bound the probability that a flow F of 0.1% or less gets through 6 stages of 1000 buckets each.
Why trouble?: F can fall into a "hot" bucket only if the traffic of all other flows in that bucket sums to more than 0.9%.
Single-stage probability: At most 100/0.9 ≈ 111 buckets can be over 0.9% before we bring on F. Thus the probability that F falls in a "hot" bucket is less than 111/1000 = 0.111.
Multistage probability: To be branded, F must be unlucky in all 6 stages, with a probability of no more than (0.111)^6 ≈ 1.9 × 10^-6, which is very small.
Thus at most 1000 false positives with very high probability.
Pattern 2: Partial Completion
Partial Completion: In a measurement interval, detect the flows (e.g., destinations) that have several Start packets (e.g., SYN) without the corresponding End (e.g., FIN).
[Figure: a sequence of SYN and FIN packets to destinations X, Y, and Z in which destination X has 3 partial completions.]
Partial Completion Filters
[Figure: field extraction feeds Hash 1, Hash 2, and Hash 3, which index the counters of Stage 1, Stage 2, and Stage 3. Counters are incremented for SYN and decremented for FIN; a comparator raises an ALERT if all counters are above the threshold.]
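The increment-for-SYN, decrement-for-FIN rule can be sketched in software as follows. This is a hedged illustration, not the talk's implementation: the class name, the salted BLAKE2b hashes, and the threshold of 6 used in the example are assumptions made here.

```python
import hashlib


class PartialCompletionFilter:
    """Multistage counters that go up on SYN and down on FIN (illustrative)."""

    def __init__(self, stages, buckets, threshold):
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]

    def _bucket(self, stage, key):
        # A different salt per stage stands in for independent hash functions.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=stage.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def update(self, key, kind):
        """Apply one SYN or FIN; return True if all stage counters exceed the threshold."""
        delta = {"SYN": 1, "FIN": -1}[kind]
        flagged = True
        for stage, counters in enumerate(self.counters):
            b = self._bucket(stage, key)
            counters[b] += delta
            if counters[b] <= self.threshold:
                flagged = False
        return flagged


# Usage: completed connections cancel out; a flooded destination does not.
pcf = PartialCompletionFilter(stages=3, buckets=1000, threshold=6)
for i in range(100):
    pcf.update(f"benign-{i}", "SYN")
    pcf.update(f"benign-{i}", "FIN")
flood = [pcf.update("victim", "SYN") for _ in range(10)]
```

Because each completed connection contributes +1 and then -1, benign destinations leave the counters at zero, and only the destination with unmatched SYNs climbs past the threshold.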
Analysis 1: Benign but Malformed Connections
[Figure: a timeline split into Intervals 1 through 4, showing a long-lived connection (SYN x, FIN x), SYN y retransmissions, and FIN z retransmissions.]
Model benign but malformed connections as adding an extra SYN or FIN to an interval with probability 0.5.
Analysis 2: Using Gaussian Approximation
[Figure: probability versus counter values, with the region greater than 6 marked.]
Probability of false positives =
Probability of false negatives =
Pattern 3: Many Flows
Many Flows: In a measurement interval, find whether the number of flows exceeds a threshold.
[Figure: a packet sequence with 6 packets but only 4 distinct sources.]
Simple Bitmap Counting
Hash: based on flow identifier F; each packet sets one bit.
Estimate: based on the number of bits set.
Problem: the bitmap takes too much memory to count a large number of flows.
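A small sketch of bitmap counting follows. The slide only says the estimate is based on the number of bits set; the specific estimator used here, b · ln(b / z) for a bitmap of b bits with z still zero, is the standard linear-counting formula and is an assumption about what the talk intends. The function name and the BLAKE2b hash are also choices made for illustration.

```python
import hashlib
import math


def bitmap_flow_estimate(flow_ids, num_bits):
    """Estimate the number of distinct flows from a hashed bitmap."""
    bitmap = [False] * num_bits
    for flow_id in flow_ids:
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        # Repeated packets of one flow hit the same bit, so they count once.
        bitmap[int.from_bytes(digest, "big") % num_bits] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        return float("inf")  # bitmap saturated: too many flows for this size
    # Linear-counting estimate; corrects for flows that collide on one bit.
    return num_bits * math.log(num_bits / zeros)


# Usage: 1000 distinct flows, each sending 3 packets, into an 8192-bit map.
packets = [f"flow-{i}" for i in range(1000)] * 3
estimate = bitmap_flow_estimate(packets, 8192)
```

The saturation case shows the memory problem named on the slide: accuracy requires a bitmap sized in proportion to the number of flows.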
Sampled Bitmap Counting
Solution: keep only a sample of the bitmap.
Estimate: scale up the sampled count.
Problem: inaccurate if there are too few or too many flows.
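One way to read "keep only a sample of the bitmap" is to hash flows into a large virtual bitmap but store only a slice of it, then scale the slice's estimate by the sampling factor. The sketch below follows that reading; the function name, the `sampling_factor` parameter, and the linear-counting estimator are assumptions, not details given in the talk.

```python
import hashlib
import math


def sampled_bitmap_estimate(flow_ids, num_bits, sampling_factor):
    """Estimate distinct flows while storing 1/sampling_factor of a virtual bitmap."""
    virtual_bits = num_bits * sampling_factor
    bitmap = [False] * num_bits
    for flow_id in flow_ids:
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        position = int.from_bytes(digest, "big") % virtual_bits
        # Only flows that hash into the stored slice are recorded.
        if position < num_bits:
            bitmap[position] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        return float("inf")  # too many flows for this slice
    sampled = num_bits * math.log(num_bits / zeros)
    # Scale the sampled count back up to the whole flow population.
    return sampling_factor * sampled


# Usage: 20000 flows counted with only 1024 bits of memory.
flows = [f"flow-{i}" for i in range(20000)]
estimate = sampled_bitmap_estimate(flows, num_bits=1024, sampling_factor=16)
```

With very few flows almost none land in the stored slice, and with very many the slice saturates, which matches the accuracy problem stated on the slide.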
Multi-resolution Bitmap Counting
Solution: multiple bitmaps, each covering a different range.
Estimate: use the first bitmap that has less than 93.1% of its bits set; count, then scale.
[Figure: one bitmap per range, e.g., 1-10 flows.]
Outline of Talk
Part 1: Motivation
Part 2: Basic Patterns and Algorithms
Part 3: Combining base patterns to solve useful application problems (traffic matrix, DoS, worms)
Part 4: Conclusions
Application 1: Traffic Matrix
Each entry router uses a multistage filter on traffic to destination prefixes to isolate the subnets that receive large traffic.
Aggregating across all entry routers gives the “dominant” part of the traffic matrix.
ATT reports rule for prefixes.
[Figure: an ISP serving Customer Sites 1, 2, and 3; reroute or add B/W.]
Application 2: Process Logs to Find Large Bandwidth Usage Patterns
Multidimensional analysis via our tool.
Old methods look at a single dimension at a time.
Estan, Savage, Varghese, SIGCOMM 2003
Application 3: DoS Attacks
Bandwidth attacks (e.g., Smurf): Pound the victim with large traffic of a certain type. Use the heavy-hitter pattern relative to traffic type (e.g., ICMP) to find attacked destinations.
Partial completion attacks (e.g., TCP SYN flood): May not use unusual bandwidth, but are characterized by partial connections. Use the partial completion pattern?
Syn-Flood Detection Options
[Figure: Attacker 1 through Attacker n in attacker ISPs, the network core, and the victim in the victim ISP. Options shown: Back-Scatter Detection, Syn-Kill, Syn-Defender, Multops, Syn-cookie/cache, Syn-Dog, TraceBack.]
OUR SOLUTION: Partial Completion Filters in the network.
PCF Deployment Options
[Figure: Attacker 1 through Attacker n in attacker ISPs, the network core, a back-scatter vantage point, and the victim in the victim ISP.]
Destination-based SYN-FIN PCF for detection and defense (can be spoofed).
Source-based SYN-ACK/FIN PCF for back-scatter detection (spoof-proof).
Application 4: Worm Detection
Concrete approaches to worm containment: routers block packets with a specific code signature.
Manual signature extraction: slow, with enormous effort for each new worm.
Automatic signature extraction of a specific worm by automatically detecting an abstract worm.
[Figure: Infected 1 through Infected N, the ISP, a new victim, and an inactive address.]
Abstract Worm Definition
F1, Content Repetition: The payload of the worm is seen frequently at the router.
F2, Increasing Infection Levels: The same content is dispersed to an increasing number of distinct source-destination pairs.
O1, Random Probing: The worm replicates by probing random IP addresses.
O2, Code Fragments: The worm payload contains content that has some resemblance to code.
Abstract Worm Detection
F1, Content Repetition: Use the heavy-hitter pattern with a hash H of the content as index.
F2, Increasing Infection Levels: Use the many-flows pattern with content hash H as index.
O1, Random Probing: Count destinations sent content with hash H in a sample of unused address space (Telescope, Moore et al.).
O2, Code Fragments: Simple offline tests that check, say, for 8086 control-transfer opcodes.
The first 3 tests need low memory and small processing.
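The way F1 and F2 combine can be sketched by keying both tests on the same content hash. For readability this illustration uses an exact dictionary and exact sets in place of the low-memory multistage filter and bitmap counters the talk calls for, so it shows the logic only, not the memory behavior. The function name, the thresholds, and the use of SHA-1 as the content hash are assumptions.

```python
import hashlib
from collections import defaultdict


def suspicious_content(packets, repeat_threshold, pair_threshold):
    """Return content hashes that repeat often AND reach many distinct pairs."""
    repeats = defaultdict(int)  # F1: how often each content hash is seen
    pairs = defaultdict(set)    # F2: distinct (source, destination) pairs per hash
    for src, dst, payload in packets:
        h = hashlib.sha1(payload).hexdigest()
        repeats[h] += 1
        pairs[h].add((src, dst))
    return {
        h for h in repeats
        if repeats[h] >= repeat_threshold and len(pairs[h]) >= pair_threshold
    }


# Usage with made-up traffic: a worm-like payload spreads across five pairs,
# while a popular file is fetched 20 times between a single pair.
worm = b"\x90\x90 exploit bytes"
popular = b"GET /index.html"
trace = [(f"10.0.0.{i}", f"10.0.1.{i}", worm) for i in range(5)]
trace += [("10.0.0.9", "10.0.2.1", popular)] * 20
flagged = suspicious_content(trace, repeat_threshold=5, pair_threshold=5)
```

Requiring both conditions is what separates worm-like spread from ordinary popular content, which repeats often but between few endpoints.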
Spectre of Polymorphism
Syntactic Polymorphism: Fragmentation on links with different MTU sizes, offsets, no-ops. (Use Rabin fingerprints at sampled offsets, but this does not help in case of encryption.)
Semantic Polymorphism: Code rewriting at each new source. (Hard to detect, but raises the bar by requiring a small compiler in the worm payload.)
EarlyBird Experience
System: Uses 39-byte Rabin fingerprints on tcpdump output and looks for content repetition above a low threshold; currently uses large memory.
Deployment: Sniffs on the uplink of a lab switch. 9 day period between May 2nd and May 10th million packets
Latent Worms Found:
-- (742 pairs) TCP/139 NetBios Attack
-- (51 pairs) Code Red TCP/80 GET /default.ida
-- Linux Slapper and 1 Unicode exploit
False positives: "robots.txt", "SSH SSH Secure Shell for Windows", some VNC strings
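To show why fingerprinting every 39-byte window finds the same content at any offset, here is a sketch using a simple polynomial rolling hash. It is a stand-in with the same sliding-window property, not a true Rabin fingerprint (which works with polynomials over GF(2)), and it is not EarlyBird's code; the function name, base, and modulus are assumptions.

```python
def rolling_fingerprints(data, window=39, base=257, mod=(1 << 61) - 1):
    """Return one fingerprint per window-sized substring of data."""
    if len(data) < window:
        return []
    # Weight of the byte that leaves the window on each slide.
    high = pow(base, window - 1, mod)
    fp = 0
    for byte in data[:window]:
        fp = (fp * base + byte) % mod
    fingerprints = [fp]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and add the newest byte in O(1).
        fp = ((fp - data[i - window] * high) * base + data[i]) % mod
        fingerprints.append(fp)
    return fingerprints


# Usage: the same 39-byte content at different offsets in two packets.
content = bytes(range(39))
first = rolling_fingerprints(b"AAAAA" + content + b"tail")
second = rolling_fingerprints(b"BBBBBBBBBBBB" + content)
shared = set(first) & set(second)
```

The two packets share a fingerprint even though the content sits at offset 5 in one and offset 12 in the other, which is the property that content-repetition counting relies on.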
Recent Experience with EarlyBird
On Monday afternoon, Aug 11th, found 133 repetitions of content for an RPC service. Lab machines stayed up but received many infection attempts.
Major security companies were already on the lookout for this, so MSBlaster was detected quickly.
On the evening of Monday, Aug 11th, my home computer began rebooting every few minutes saying “mumble RPC mumble”.
Conclusions
Measurement introspection can improve ISP ROI, and security introspection can reduce TCO.
Base patterns can be implemented at high speeds.
Base patterns can be combined to solve useful application problems (traffic matrix, DoS, worms, etc.).
Only scratching the surface: fault introspection, etc.
Joint work with collaborators
Stefan Savage (AutoFocus, EarlyBird)
Students in Internet Algorithmics Lab: Ramana Kompella, Cristian Estan, Sumeet Singh