Security III “Outbreak!”


CSE 454 Security III: “Outbreak!”

Internet Outbreaks: Epidemiology and Defenses Stefan Savage Collaborative Center for Internet Epidemiology and Defenses Department of Computer Science & Engineering University of California at San Diego In collaboration with Cristian Estan, Justin Ma, David Moore, Vern Paxson (ICSI), Colleen Shannon, Sumeet Singh, Alex Snoeren, Stuart Staniford (Nevis), Amin Vahdat, George Varghese, Geoff Voelker, Michael Vrable, Nick Weaver (ICSI)

Collaborative Center for Internet Epidemiology and Defenses (CCIED) Joint project (UCSD/ICSI) Other PIs: Vern Paxson, Nick Weaver, Geoff Voelker, George Varghese, plus ~15 staff and students Funded by NSF with additional support from Microsoft, Intel, HP, and UCSD’s CNS Three key areas of interest: infrastructure and analysis for understanding large-scale Internet threats; automated defensive technologies; forensic and legal requirements Slides © Stefan Savage, UCSD

How Chicken Little sees the Internet… Slides © Stefan Savage, UCSD

Why Chicken Little is a naïve optimist Imagine the following species: Poor genetic diversity; heavily inbred Lives in “hot zone”; thriving ecosystem of infectious pathogens Instantaneous transmission of disease Immune response 10-1M times slower Poor hygiene practices What would its long-term prognosis be? What if diseases were designed… Trivial to create a new disease Highly profitable to do so Slides © Stefan Savage, UCSD

Threat transformation Traditional threats Attacker manually targets high-value system/resource Defender increases cost to compromise high-value systems Biggest threat: insider attacker Modern threats Attacker uses automation to target all systems at once (can filter later) Defender must defend all systems at once Biggest threats: software vulnerabilities & naïve users Slides © Stefan Savage, UCSD

Large-scale technical enablers Unrestricted connectivity Large-scale adoption of IP model for networks & apps Software homogeneity & user naiveté Single bug = mass vulnerability in millions of hosts Trusting users (“ok”) = mass vulnerability in millions of hosts Few meaningful defenses Effective anonymity (minimal risk) Slides © Stefan Savage, UCSD

Driving Economic Forces No longer just for fun, but for profit SPAM forwarding (MyDoom.A backdoor, SoBig), credit card theft (Korgo), DDoS extortion, etc. Symbiotic relationship: worms, bots, SPAM, etc. Fluid third-party exchange market (millions of hosts for sale) Going rate for SPAM proxying: 3–10 cents/host/week Seems small, but a 25k botnet gets you $40k–$130k/yr Generalized search capabilities are next “Virtuous” economic cycle: the bad guys have a large incentive to get better Slides © Stefan Savage, UCSD

Today’s focus: Outbreaks Acute epidemics of infectious malcode designed to actively spread from host to host over the network E.g. Worms, viruses (ignore pedantic distinctions) Why epidemics? Epidemic spreading is the fastest method for large-scale network compromise Why fast? Slow infections allow much more time for detection, analysis, etc (traditional methods may cope) Slides © Stefan Savage, UCSD

A pretty fast outbreak: Slammer (2003) First ~1 min: behaves like a classic random-scanning worm Doubling time of ~8.5 seconds (CodeRed doubled every 40 min) After ~1 min: worm starts to saturate access bandwidth Some hosts issue >20,000 scans per second Self-interfering (no congestion control) Peaks at ~3 min: >55 million IP scans/sec 90% of the Internet scanned in <10 min Infected ~100k hosts (conservative) See Moore et al, IEEE Security & Privacy, 1(4), 2003 for more details Slides © Stefan Savage, UCSD

Was Slammer really fast? Yes, it was orders of magnitude faster than CR No, it was poorly written and unsophisticated Who cares? It is literally an academic point The current debate is whether one can get < 500ms Bottom line: way faster than people! Slides © Stefan Savage, UCSD

How to think about worms Reasonably well described as infectious epidemics Simplest model: homogeneous random contacts Classic SI model N: population size S(t): susceptible hosts at time t I(t): infected hosts at time t β: contact rate i(t) = I(t)/N, s(t) = S(t)/N (courtesy Paxson, Staniford, Weaver) Slides © Stefan Savage, UCSD
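In equations, this is the standard SI dynamics implied by the slide's definitions (a reconstruction; the slide's own formula image is not in the transcript):

\begin{align*}
\frac{di}{dt} &= \beta \, i(t)\,\bigl(1 - i(t)\bigr), \\
i(t) &= \frac{i(0)\,e^{\beta t}}{1 - i(0) + i(0)\,e^{\beta t}}.
\end{align*}

The logistic solution explains the characteristic worm propagation curve: slow start while I(t) is small, explosive growth, then saturation as susceptible hosts run out.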

What’s important? There are lots of improvements to the model… Chen et al, Modeling the Spread of Active Worms, Infocom 2003 (discrete time) Wang et al, Modeling Timing Parameters for Virus Propagation on the Internet, ACM WORM ’04 (delay) Ganesh et al, The Effect of Network Topology on the Spread of Epidemics, Infocom 2005 (topology) … but the bottom line is the same. We care about two things: How likely is it that a given infection attempt is successful? Target selection (random, biased, hitlist, topological, …) Vulnerability distribution (e.g. density – S(0)/N) How frequently are infections attempted? β: contact rate Slides © Stefan Savage, UCSD

What can be done? Reduce the number of susceptible hosts Prevention: reduce S(t) while I(t) is still small (ideally reduce S(0)) Reduce the contact rate Containment: reduce β while I(t) is still small Slides © Stefan Savage, UCSD

Prevention: Software Quality Goal: eliminate vulnerability Static/dynamic testing (e.g. Cowan, Wagner, Engler, etc) Software process, code review, etc. Active research community Taken seriously in industry Security code review alone for Windows Server 2003 ~ $200M Traditional problems: soundness, completeness, usability Practical problems: scale and cost Slides © Stefan Savage, UCSD

Prevention: Software Heterogeneity Goal: reduce impact of vulnerability Use software diversity to tolerate attack Exploit existing heterogeneity Junqueira et al, Surviving Internet Catastrophes, USENIX ’05 Create artificial heterogeneity (hot topic) Forrest et al, Building Diverse Computer Systems, HotOS ’97 Large contemporary literature Open questions: class of vulnerabilities that can be masked, strength of protection, cost of support Slides © Stefan Savage, UCSD

Prevention: Software Updating Goal: reduce window of vulnerability Most worms exploit a known vulnerability (patched 1 day to 3 months earlier) Window shrinking: automated patch → exploit Patch deployment challenges: downtime, Q/A, etc. Rescorla, Is Finding Security Holes a Good Idea?, WEIS ’04 Network-based filtering: decouple “patch” from code E.g. TCP packet to port 1434 and > 60 bytes Wang et al, Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits, SIGCOMM ’04 Symantec: Generic Exploit Blocking Automated patch creation: fix the vulnerability on-line Sidiroglou et al, Building a Reactive Immune System for Software Services, USENIX ’05 Anti-worms: block the vulnerability and propagate Castaneda et al, Worm vs WORM: Preliminary Study of an Active Counter-Attack Mechanism, WORM ’04 (these approaches range from proactive to reactive) Slides © Stefan Savage, UCSD

Prevention: Hygiene Enforcement Goal: keep susceptible hosts off network Only let hosts connect to network if they are “well cared for” Recently patched, up-to-date anti-virus, etc… Automated version of what they do by hand at NSF Cisco Network Admission Control (NAC) Slides © Stefan Savage, UCSD

What can be done? Reduce the number of susceptible hosts Prevention: reduce S(t) while I(t) is still small (ideally reduce S(0)) Reduce the contact rate Containment: reduce β while I(t) is still small Slides © Stefan Savage, UCSD

Containment Reduce contact rate Slow down: throttle connection rate to slow spread Twycross & Williamson, Implementing and Testing a Virus Throttle, USENIX Sec ’03 Important capability, but the worm still spreads… Quarantine: detect and block the worm Slides © Stefan Savage, UCSD
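A minimal sketch of the connection-rate throttling idea in the spirit of Twycross & Williamson (the class name, parameter values, and queue policy are illustrative assumptions; the published throttle hooks the host network stack rather than living in application code):

from collections import deque

class VirusThrottle:
    """Rate-limit connections to NEW destinations: benign hosts rarely
    contact many new machines per second, while scanning worms do."""
    def __init__(self, allowed_per_sec=1, working_set_size=5, max_queue=100):
        self.working_set = deque(maxlen=working_set_size)  # recently contacted hosts
        self.delay_queue = deque()                         # pending new destinations
        self.allowed_per_sec = allowed_per_sec
        self.max_queue = max_queue

    def request(self, dst_ip):
        if dst_ip in self.working_set:
            return "allow"               # recently contacted host: pass through
        self.delay_queue.append(dst_ip)  # new destination: delay it
        # a rapidly growing delay queue is itself a strong infection signal
        return "alarm" if len(self.delay_queue) > self.max_queue else "delayed"

    def tick_1s(self):
        """Once per second, release a few queued new destinations."""
        for _ in range(min(self.allowed_per_sec, len(self.delay_queue))):
            self.working_set.append(self.delay_queue.popleft())

An infected host scanning thousands of addresses per second overflows the delay queue almost immediately, which both slows the worm (reducing β) and flags the host.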

Defense requirements We can define reactive defenses in terms of: Reaction time – how long to detect, propagate information, and activate a response Containment strategy – how malicious behavior is identified and stopped Deployment scenario – who participates in the system Given these, what are the engineering requirements for any effective defense? Slides © Stefan Savage, UCSD

Methodology Simulate spread of a worm across the Internet topology Infected hosts attempt to spread at a fixed rate (probes/sec) Target selection is uniformly random over the IPv4 space Source data Vulnerable hosts: 359,000 IP addresses of CodeRed v2 victims Internet topology: AS routing topology derived from RouteViews Simulation of defense System detects infection within the reaction time Subset of network nodes employ a containment strategy Evaluation metric: % of vulnerable hosts infected in 24 hours 100 runs of each set of parameters (95th percentile taken) – systems must plan for reasonable situations, not the average case See Moore et al, Internet Quarantine: Requirements for Containing Self-Propagating Code, Infocom 2003 for more details Slides © Stefan Savage, UCSD
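A toy version of this methodology (not the paper's simulator: it ignores topology and deployment, uses the slide's population and probe rate, assumes an initial infected count, and models reaction time as one global switch after which a fraction `coverage` of infection attempts is filtered):

def simulate(vulnerable=359_000, probe_rate=10.0, reaction_time=25 * 60,
             coverage=1.0, horizon=24 * 3600, i0=25):
    """Discrete-time epidemic over the 2^32 IPv4 space, 1-second steps.
    i0 (initial infected hosts) is an assumed parameter."""
    N = 2 ** 32
    infected = float(i0)
    for t in range(horizon):
        filtered = coverage if t >= reaction_time else 0.0
        p_hit = (vulnerable - infected) / N   # chance a random probe finds a victim
        new = infected * probe_rate * p_hit * (1.0 - filtered)
        infected = min(vulnerable, infected + new)
    return infected / vulnerable

# CodeRed-like worm, perfect containment deployed after 25 minutes:
print(f"{simulate():.2%} of vulnerable hosts infected after 24h")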

Naïve model: Universal deployment Assume every host employs the containment strategy Two containment strategies: Address filtering: block traffic from malicious source IP addresses Reaction time is relative to each infected host MUCH easier to implement Content filtering: block traffic based on a signature of the content Reaction time is from the first infection How quickly does each strategy need to react? How sensitive is reaction time to worm probe rate? Slides © Stefan Savage, UCSD

How quickly does each strategy need to react? [Graphs: % infected at 24 hours (95th percentile) vs. reaction time – in minutes for address filtering, in hours for content filtering.] To contain worms to 10% of vulnerable hosts after 24 hours of spreading at 10 probes/sec (CodeRed-like): Address filtering: reaction time must be < 25 minutes Content filtering: reaction time must be < 3 hours Slides © Stefan Savage, UCSD

How sensitive is reaction time to worm probe rate? [Graph, content filtering: required reaction time vs. worm probes/second.] Reaction times must be fast when probe rates get high: 10 probes/sec: reaction time must be < 3 hours 1000 probes/sec: reaction time must be < 2 minutes Slides © Stefan Savage, UCSD

Limited network deployment Depending on every host to implement containment is probably a bit optimistic: Installation and administration costs System communication overhead A more realistic scenario is limited deployment in the network: Customer Network: firewall-like inbound filtering of traffic ISP Network: traffic through border routers of large transit ISPs How effective are the deployment scenarios? How sensitive is reaction time to worm probe rate under limited network deployment? Slides © Stefan Savage, UCSD

How effective are the deployment scenarios? [Graph: % infected at 24 hours (95th percentile, 25–100%) for a CodeRed-like worm, under deployment at the top 10, 20, 30, 40, and 100 ISPs, and at all networks.] Slides © Stefan Savage, UCSD

How sensitive is reaction time to worm probe rate? [Graph: required reaction time vs. probes/second, deployment at the top 100 ISPs.] Above 60 probes/sec, containment to 10% of hosts within 24 hours is impossible for the top 100 ISPs, even with instantaneous reaction. Slides © Stefan Savage, UCSD

Defense requirements summary Reaction time: required reaction times are a couple of minutes or less for CodeRed-style worms (seconds for worms like Slammer) Containment strategy: content filtering is far more effective than address blacklisting for a given reaction speed Deployment scenarios: need nearly all customer networks to provide containment; need at least the top 40 ISPs to provide containment (top 100 ideal) Is this possible? Let’s see… Slides © Stefan Savage, UCSD

Outbreak Detection/Monitoring Two classes of detection Scan detection: detect that a host is infected by observing its infection attempts Signature inference: automatically identify a content signature for the exploit (sharable) Two classes of monitors Ex situ: “canary in the coal mine” Network telescopes HoneyNets/honeypots In situ: real activity as it happens Slides © Stefan Savage, UCSD

Network Telescopes An infected host scans for other vulnerable hosts by randomly generating IP addresses A network telescope monitors a large range of unused IP addresses and will therefore receive scans from infected hosts Very scalable: UCSD monitors 17M+ addresses Slides © Stefan Savage, UCSD
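Back-of-the-envelope math for why this works (a reconstruction from the slide's numbers, not a formula shown in the talk): a telescope watching m of the 2^32 IPv4 addresses sees any given uniform random scan with probability m/2^32, so a host scanning at rate r is first observed after about 2^32/(m r) seconds in expectation:

\[
P(\text{scan hits telescope}) = \frac{m}{2^{32}},
\qquad
E[\text{time to first hit}] \approx \frac{2^{32}}{m\,r}.
\]

With m = 2^{24} (about 17M addresses, as at UCSD) and a CodeRed-like r = 10 probes/sec, a single infected host is seen within about 2^{8}/10 ≈ 26 seconds.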

Telescopes + Active Responders Problem: Telescopes are passive, can’t respond to TCP handshake Is a SYN from a host infected by CodeRed or Welchia? Dunno. What does the worm payload look like? Dunno. Solution: proxy responder Stateless: TCP SYNACK (Internet Motion Sensor), per-protocol responders (iSink) Stateful: Honeyd Can differentiate and fingerprint payload False positives generally low since no regular traffic Slides © Stefan Savage, UCSD

HoneyNets Problem: telescopes don’t reveal what a worm/virus would do – no code ever executes, after all Solution: redirect scans to real “infectable” hosts (honeypots) Individual hosts or VM-based: Collapsar, HoneyStat, Symantec Can reduce false positives/negatives with host analysis (e.g. TaintCheck, Vigilante, Minos) and behavioral/procedural signatures Challenges Scalability Liability (honeywall) Isolation (2000 IP addrs → 40 physical machines) Detection (VMware-detection code in the wild) Slides © Stefan Savage, UCSD

The Scalability/Fidelity tradeoff [Spectrum, from most scalable to highest fidelity: nothing → network telescopes (passive) → telescopes + responders (iSink, Internet Motion Sensor) → VM-based honeynets → live honeypots.] Slides © Stefan Savage, UCSD

New CCIED project: large-scale high-fidelity honeyfarm Goal: emulate a significant fraction of Internet hosts (1M) Multiplex a large address space onto a smaller number of servers Temporal & spatial multiplexing [Architecture diagram: 64 /16 prefixes advertised to the global Internet, GRE tunnels to a gateway, VM management, physical honeyfarm servers.] Potemkin VMM: large numbers of VMs per host Delta virtualization: copy-on-write VM images Flash cloning: on-demand VM creation (<1 ms) Slides © Stefan Savage, UCSD

Overall limitations of telescope, honeynet, etc. monitoring Depends on worms scanning the monitored space What if they don’t scan that range (smart bias)? What if they propagate via e-mail or IM? Inherent tradeoff between liability exposure and detectability Honeypot-detection software exists It doesn’t necessarily reflect what’s happening on your network (can’t count on it for local protection) Hence, we’re always interested in native detection as well Slides © Stefan Savage, UCSD

Scan Detection Idea: detect the worm’s infection attempts In the small: ZoneAlarm; but how to do it in the network? Indirect scan detection Wong et al, A Study of Mass-mailing Worms, WORM ’04 Whyte et al, DNS-based Detection of Scanning Worms in an Enterprise Network, NDSS ’05 Direct scan detection Weaver et al, Very Fast Containment of Scanning Worms, USENIX Sec ’04 Threshold Random Walk – bias source based on connection success rate (Jung et al); use approximate state for a fast hardware implementation Can support multi-gigabit implementation, detect a scan within 10 attempts Few false positives: Gnutella (peer finding/accessing), Windows File Sharing (benign scanning) Venkataraman et al, New Streaming Algorithms for Fast Detection of Superspreaders (recent work) Slides © Stefan Savage, UCSD

Signature inference Challenge: need to automatically learn a content “signature” for each new worm – potentially in less than a second! Singh et al, Automated Worm Fingerprinting, OSDI ’04 Kim et al, Autograph: Toward Automated, Distributed Worm Signature Detection, USENIX Sec ’04 Slides © Stefan Savage, UCSD

Approach Monitor the network and look for strings common to traffic with worm-like behavior Signatures can then be used for content filtering Example – Kibvu.B signature captured by Earlybird on May 14th, 2004:
PACKET HEADER SRC: 11.12.13.14.3920 DST: 132.239.13.24.5000 PROT: TCP
PACKET PAYLOAD (CONTENT):
00F0 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
0100 90 90 90 90 90 90 90 90 90 90 90 90 4D 3F E3 77 ............M?.w
0110 90 90 90 90 FF 63 64 90 90 90 90 90 90 90 90 90 .....cd.........
0120 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
0130 90 90 90 90 90 90 90 90 EB 10 5A 4A 33 C9 66 B9 ..........ZJ3.f.
0140 66 01 80 34 0A 99 E2 FA EB 05 E8 EB FF FF FF 70 f..4...........p
. . .
Slides © Stefan Savage, UCSD

Content sifting Assume there exists some (relatively) unique invariant bitstring W across all instances of a particular worm (true today, not tomorrow…) Two consequences Content prevalence: W will be more common in traffic than other bitstrings of the same length Address dispersion: the set of packets containing W will address a disproportionate number of distinct sources and destinations Content sifting: find W’s with high content prevalence and high address dispersion and drop that traffic Slides © Stefan Savage, UCSD

The basic algorithm [Animation, five frames: a detector in the network – e.g. a firewall, router, or switch at the entrance to a network or data center – watches traffic among hosts A–E and cnn.com while implementing content sifting.] The detector maintains two tables keyed by content substring: a prevalence table counting how many times each substring has been seen, and an address dispersion table counting the distinct sources and destinations associated with it. As copies of the same content flow from A to B, then A to C, then D to B, and so on, its prevalence count climbs (1, 2, 3, …) while the dispersion table accumulates sources (A, B, D, …) and destinations (B, C, E, …); ordinary traffic rarely scores high on both measures at once. Slides © Stefan Savage, UCSD
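A minimal sketch of these two tables in code (illustrative only – the thresholds are assumptions echoing later slides, and the following slides replace these naïve tables with multistage filters and scalable bitmaps):

from collections import defaultdict

prevalence = defaultdict(int)                     # substring -> times seen
dispersion = defaultdict(lambda: (set(), set()))  # substring -> (sources, destinations)

def sift(substring, src_ip, dst_ip, prevalence_thold=3, dispersion_thold=30):
    """Update both tables for one observed packet substring and report
    whether it now looks worm-like (prevalent AND widely dispersed)."""
    prevalence[substring] += 1
    sources, destinations = dispersion[substring]
    sources.add(src_ip)
    destinations.add(dst_ip)
    return (prevalence[substring] > prevalence_thold
            and len(sources) > dispersion_thold
            and len(destinations) > dispersion_thold)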

Challenges Computation To support a 1Gbps line rate we have 12 µs to process each packet Dominated by memory references; state is expensive Content sifting requires looking at every byte in a packet A naïve scheme that indexes all 1461 40-byte substrings of a 1500-byte packet makes ~6000 memory references (~4 per byte); at 60 ns per DRAM reference that is far below line rate State On a fully loaded 1Gbps link a naïve implementation can easily consume 100MB/sec for tables – the prevalence table alone could fill a gigabyte in under 10 seconds Prevailing network devices operate in the 100Mbps–10Gbps range, so the two challenges are to dramatically reduce both memory and computation; the rest of the talk focuses on that. Slides © Stefan Savage, UCSD

Kim et al’s solution: Autograph Pre-filter flows for those that exhibit scanning behavior (i.e. low TCP connection ratio) HUGE reduction in input, fewer prevalent substrings Don’t need to track dispersion at all Fewer possibilities of false positives However, only works with TCP scanning worms Not UDP (Slammer), e-mail viruses (MyDoom), IM-based worms (Bizex), P2P (Benjamin) Alternatives? More efficient algorithms. Slides © Stefan Savage, UCSD

Which substrings to index? Approach 1: index all substrings Way too many substrings → too much computation → too much state Approach 2: index the whole packet Very fast but trivially evadable (e.g., Witty, e-mail viruses) Approach 3: index all contiguous substrings of a fixed length S Can capture all signatures of length S and larger [Illustration: fixed-length windows sliding over the byte stream A B C D E F G H I J K.] Slides © Stefan Savage, UCSD

How to represent substrings? Store a hash instead of the literal substring to reduce state Use an incremental hash to reduce computation Rabin fingerprint is one such efficient incremental hash function [Rabin81, Manber94] One multiplication, addition and mask per byte [Illustration: packets P1 (“RAND ABCD OM”) and P2 (“R ABCD ANDOM”) both contain the substring ABCD, and its fingerprint, 11000000, is the same regardless of position.] Slides © Stefan Savage, UCSD
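A sketch of the incremental idea as a polynomial rolling hash (Rabin fingerprints proper work over GF(2) polynomials; the base, mask, and class here are illustrative assumptions): sliding the window by one byte costs one multiply, one add, and one subtract/mask instead of rehashing all S bytes.

class RollingHash:
    """Rolling hash over a fixed-size window: updating the hash from
    bytes b[i..i+S) to b[i+1..i+1+S) is O(1), not O(S)."""
    def __init__(self, window, base=257, mask=(1 << 64) - 1):
        self.S, self.base, self.mask = window, base, mask
        self.msb_weight = pow(base, window - 1, 1 << 64)  # weight of oldest byte
        self.h = 0
        self.buf = []

    def push(self, byte):
        if len(self.buf) == self.S:          # window full: drop the oldest byte
            oldest = self.buf.pop(0)
            self.h = (self.h - oldest * self.msb_weight) & self.mask
        self.buf.append(byte)
        self.h = (self.h * self.base + byte) & self.mask
        return self.h if len(self.buf) == self.S else None  # fingerprint once full

rh = RollingHash(window=4)
fingerprints = [fp for fp in (rh.push(b) for b in b"RANDABCDOM") if fp is not None]
# the fingerprint emitted for "ABCD" is identical wherever the substring appears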

How to subsample? Approach 1: sample packets If we choose 1 in N, detection will be slowed by a factor of N Approach 2: sample at particular byte offsets Susceptible to simple evasion attacks No guarantee that we will sample the same substring in every packet Approach 3: sample based on the hash of the substring Slides © Stefan Savage, UCSD

Value sampling [Manber ’94] Sample a hash only if its last N bits equal the value V The number of bits N can be dynamically set The value V can be randomized for resiliency Ptrack = probability of selecting at least one substring of length S in an L-byte invariant For 1/64 sampling (last 6 bits equal to 0) and 40-byte substrings, Ptrack = 99.64% for a 400-byte invariant [Illustration: sliding windows over A B C D E F G H I J K; fingerprints 11000000 and 10000000 are SAMPLEd, 11000001 and 11000010 are IGNOREd.] Slides © Stefan Savage, UCSD
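The tracking probability works out as follows (a reconstruction consistent with the slide's numbers; it treats the hash values of overlapping substrings as independent):

\[
P_{\text{track}} = 1 - \left(1 - \tfrac{1}{64}\right)^{L - S + 1}
\]

For S = 40 and L = 400 there are 361 overlapping substrings, giving 1 − (63/64)^{361} ≈ 0.996, matching the slide’s 99.64%.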

Observation: High-prevalence strings are rare [Graph: cumulative fraction of signatures vs. number of repeats.] Only 0.6% of the 40-byte substrings repeat more than 3 times in a minute; in real traffic, most substrings appear just once. So content prevalence acts as a high-pass filter: a data structure that tracks only the substrings that repeat avoids keeping state for the other 99%+, an improvement of two or more orders of magnitude. Slides © Stefan Savage, UCSD

Efficient high-pass filters for content Only want to keep state for prevalent substrings Chicken-and-egg: how do you count strings without maintaining state for them? Multistage filters: a randomized technique for counting “heavy hitter” network flows with low state and few false positives [Estan02] Instead of using the flow id, use the content hash Three orders of magnitude memory savings Slides © Stefan Savage, UCSD

Finding “heavy hitters” via multistage filters [Diagram: the extracted field (content hash) is fed to independent hash functions Hash 1–3, each indexing its own array of counters (Stage 1–3); each indexed counter is incremented, and a comparator raises an ALERT if all counters are above the threshold.] Slides © Stefan Savage, UCSD
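A compact sketch of the structure the diagram describes (stage count, table size, and threshold are illustrative assumptions; per a later slide, Earlybird's filter is 2MB):

import hashlib

class MultistageFilter:
    """A key is reported as a heavy hitter only if its counter exceeds the
    threshold in EVERY stage; rare keys that collide with a heavy key in
    one stage are almost always filtered out by the other stages."""
    def __init__(self, stages=3, counters_per_stage=4096, threshold=3):
        self.tables = [[0] * counters_per_stage for _ in range(stages)]
        self.threshold = threshold

    def _index(self, key: bytes, stage: int) -> int:
        # independent hash per stage via a per-stage salt
        digest = hashlib.blake2b(key, salt=bytes([stage]) * 16).digest()
        return int.from_bytes(digest[:4], "big") % len(self.tables[stage])

    def add(self, key: bytes) -> bool:
        over_everywhere = True
        for stage, table in enumerate(self.tables):
            i = self._index(key, stage)
            table[i] += 1
            over_everywhere = over_everywhere and table[i] > self.threshold
        return over_everywhere  # True => ALERT: treat this content as prevalent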

Multistage filters in action [Diagram: the counter arrays for stages 1–3 with a threshold line; grey = other hashes, yellow = a rare hash, green = a common hash. The common hash pushes its counter over the threshold in every stage; the rare hash does not.] Slides © Stefan Savage, UCSD

Observation: High address dispersion is rare too A naïve implementation might maintain a list of sources (or destinations) for each string hash But dispersion only matters if it is over a threshold, so approximate counting suffices Trades accuracy for state in the data structure Scalable bitmap counters Similar to multi-resolution bitmaps [Estan03] Reduce memory by 5x for a modest accuracy error The design exploits a property of worm outbreaks: the number of source and destination IPs associated with the content only grows. Slides © Stefan Savage, UCSD

Scalable Bitmap Counters Hash: based on the source (or destination) address Sample: keep only a sample of the bitmap Estimate: scale up the sampled count Adapt: periodically increase the scaling factor With three 32-bit bitmaps, error factor = 2/(2^numBitmaps − 1) = 2/7 ≈ 28.5% The intuition: a bitmap with one bit per possible source would need a million bits to count a million sources. Instead, hash each source; only hashes landing in the sampled portion set a bit, and the final bit count is scaled up by the sampling factor. Starting with a small scaling factor and growing it matches how an outbreak’s counts grow. Slides © Stefan Savage, UCSD
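A sketch of the sample-and-scale idea as a single bitmap with a linear-counting estimator (a simplification of the slide's scheme: the real design uses multiple bitmaps and the Adapt step to grow the scaling factor as the count rises):

import hashlib
import math

class SampledBitmapCounter:
    """Approximate distinct-count: hash each item; only items whose hash
    falls in the sampled 1/scale of the hash space set a bit, and the
    estimate scales the observed bit count back up."""
    def __init__(self, bits=32, scale=1024):
        self.bits, self.scale = bits, scale
        self.bitmap = 0

    def add(self, item: bytes):
        h = int.from_bytes(hashlib.sha256(item).digest()[:8], "big")
        if h % self.scale == 0:                          # sampled region only
            self.bitmap |= 1 << ((h // self.scale) % self.bits)

    def estimate(self) -> float:
        zero_bits = self.bits - bin(self.bitmap).count("1")
        if zero_bits == 0:
            return math.inf                              # saturated: time to Adapt
        # linear-counting estimate for the sampled region, scaled back up
        return self.scale * self.bits * math.log(self.bits / zero_bits)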

Content sifting summary Index fixed-length substrings using incremental hashes Subsample hashes as a function of hash value Multistage filters to filter out uncommon strings Scalable bitmaps to tell if the number of distinct addresses per hash crosses the threshold Now it’s fast enough to implement Slides © Stefan Savage, UCSD

Software prototype: Earlybird [Architecture: an EarlyBird sensor (C, libpcap, Linux 2.6, AMD Opteron 242 at 1.6GHz) taps traffic and sends summary data to an EarlyBird aggregator (C, MySQL + rrdtools, Apache + PHP, Linux 2.6) for reporting & control, which also feeds other sensors and blocking devices.] Setup 1: a large fraction of UCSD campus traffic Traffic mix: approximately 5000 end-hosts, dedicated servers for campus-wide services (DNS, email, NFS, etc.) Line rate varies between 100 and 500Mbps Setup 2: a fraction of local ISP traffic Traffic mix: dialup customers, leased-line customers Line rate roughly 100Mbps Slides © Stefan Savage, UCSD

Content Sifting in Earlybird [Flowchart; parenthesized numbers are the slide’s per-operation cost annotations. For each packet (e.g. payload “IAMAWORM”): compute Key = RabinHash(“IAMA”) and value-sample the key (0.349, 0.037); update the 2MB multistage filter (0.146); if prevalence > threshold, look up ADTEntry = Find(Key) in the address dispersion table (0.021); if found, update the entry’s repeats/sources/destinations (0.027), otherwise create & insert a new entry (0.37). Dispersion uses scalable bitmaps with three 32-bit stages; each entry is 28 bytes.] Slides © Stefan Savage, UCSD

Content sifting overhead Mean per-byte processing cost 0.409 microseconds, without value sampling 0.042 microseconds, with 1/64 value sampling (~60 microseconds for a 1500 byte packet, can keep up with 200Mbps) Additional overhead in per-byte processing cost for flow-state maintenance (if enabled): 0.042 microseconds Slides © Stefan Savage, UCSD

Experience Generally… ahem… good. Detected and automatically generated signatures for every known worm outbreak over eight months Can produce a precise signature for a new worm in a fraction of a second Known worms detected: CodeRed, Nimda, WebDav, Slammer, Opaserv, … Unknown worms (with no public signatures) detected: MsBlaster, Bagle, Sasser, Kibvu, … Slides © Stefan Savage, UCSD

Sasser [Two screenshots: Earlybird’s view of the Sasser outbreak.] Slides © Stefan Savage, UCSD

Kibvu A more obscure worm with slower spread (1.5 packets/minute inbound) Consequently, slower detection (42 minutes to reach a dispersion of 30) Response time is the wrong metric for such slow worms [Graph: address dispersion climbing over time through 1, 4, 9, and finally 30.] Slides © Stefan Savage, UCSD

False Negatives Easy to prove presence, impossible to prove absence Live evaluation: over 8 months, detected every worm outbreak reported on popular security mailing lists Offline evaluation: several traffic traces run against both Earlybird and the Snort IDS (with all worm-related signatures) Some worms were not detected by Snort but were detected by Earlybird; the converse was never true Slides © Stefan Savage, UCSD

False Positives Common protocol headers Mainly HTTP and SMTP headers Distributed (P2P) system protocol headers Handled by a procedural whitelist covering a small number of popular protocols, e.g. this Gnutella handshake: GNUTELLA CONNECT/0.6 X-Max-TTL: 3 X-Dynamic-Querying: 0.1 X-Version: 4.0.4 X-Query-Routing: 0.1 User-Agent: LimeWire/4.0.6 Vendor-Message: 0.1 X-Ultrapeer-Query-Routing: … Non-worm epidemic activity: SPAM and BitTorrent SPAM is relayed through many hosts (one-to-many); BitTorrent deliberately replicates the same content among many peers (many-to-many) Too many false positives would render an automated system unusable, so they must be rare or low-impact; SPAM is bursty, so rate-limiting it is a reasonable response, and enterprises often view flagging SPAM and P2P as a feature rather than a weakness Slides © Stefan Savage, UCSD

Limitations/ongoing work Variant content: polymorphism, metamorphism Newsome et al, Polygraph: Automatically Generating Signatures for Polymorphic Worms, Oakland ’05 Network evasion: normalization at high speed is tricky End-to-end encryption vs. content-based security Privacy vs. security policy Self-tuning thresholds Slow/stealthy worms DoS via manipulation Principal downside: large classes of worms that spread topologically can’t be detected this way Slides © Stefan Savage, UCSD

Summary Internet-connected hosts are highly vulnerable to worm outbreaks Millions of hosts can be “taken” before anyone realizes If only 10,000 hosts are targeted, no one may notice Prevention is a critical element, but there will always be outbreaks Containment requires a fully automated response Scaling issues favor network-based defenses Different detection strategies, monitoring approaches Very active research community Content sifting: automatically sift bad traffic from good Slides © Stefan Savage, UCSD