Large-Scale IP Traceback in High-Speed Internet : Practical Techniques and Theoretical Foundation Jun (Jim) Xu Networking & Telecommunications Group College.

Presentation transcript:

Large-Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation
Jun (Jim) Xu
Networking & Telecommunications Group, College of Computing, Georgia Institute of Technology
(Joint work with Jun Li, Minho Sung, Li Li)
2004 IEEE Symposium on Security and Privacy

Introduction
Internet DDoS attacks are a real threat
- on websites
  · Yahoo, CNN, Amazon, eBay, etc. (Feb. 2000) → services were unavailable for several hours
- on Internet infrastructure
  · 13 root DNS servers (Oct. 2002) → 7 of them were shut down completely
First step in countering an attack: identifying the attackers
- IP spoofing enables attackers to hide their identity
- IP traceback: a mechanism to trace the attack sources

State of IP Traceback
Assumptions "inherited" from the literature
- attackers send lots of packets
- the traceback scheme uses limited space in the IP header
- attackers are aware of the traceback effort and can try to sabotage it
Two main types of proposed traceback techniques
(1) Probabilistic Packet Marking (PPM) scheme
  a. routers: probabilistically mark each packet with partial path information, using a coding algorithm
  b. victim: reconstructs the attack paths, using a decoding algorithm

State of IP Traceback (Cont.)
Two main types of proposed traceback techniques
(2) Hash-based scheme
  a. routers: store packet digests
  b. victim: uses recursive lookups to reconstruct the attack path
[Figure: path from the attacker to the victim; routers hold packet digests and are asked "Have you seen this packet?", answering "yes"]
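To make the "store packet digests" step concrete, here is a minimal sketch of a router-side digest table. It assumes a Bloom filter (the talk mentions Bloom filters in a later slide) built on SHA-256; the class name `DigestTable`, its default sizes, and the hashing construction are illustrative assumptions, not the scheme's actual implementation.

```python
import hashlib


class DigestTable:
    """Toy Bloom filter of packet digests (hypothetical helper, for illustration)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, packet: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(bytes([i]) + packet).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def record(self, packet: bytes) -> None:
        """Router side: remember that this packet was forwarded."""
        for pos in self._positions(packet):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def seen(self, packet: bytes) -> bool:
        """Answer "Have you seen this packet?" (false positives are possible)."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(packet)
        )


table = DigestTable()
table.record(b"attack-packet-1")
print(table.seen(b"attack-packet-1"))
```

A Bloom filter never forgets a recorded packet but may wrongly report an unrecorded one, which is why the number of hash functions matters in the parameter tuning discussed later.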

Scalability Problems of Two Approaches
PPM schemes
- limited marking field (17 bits)
- cannot scale to a large number of attackers
Hash-based scheme
- records 100% of the packet digests
- infeasible for high-speed links
Our objective: design a traceback scheme that scales both to the number of attackers and to high link speeds

Outline of the talk
- Overview of our solution
- Design detail
- Information-theoretic framework
- Performance evaluation
- Related work, future work, conclusion

Design Overview
Our idea: store digests of sampled packets only
- use a small sampling rate p (such as 3.3%)
- small storage and computational cost
- can scale to OC-192 or OC-768 link speeds
- lets us get past the DRAM/SRAM speed barrier
The challenges of sampling
- single-packet traceback is not possible: a larger number of attack packets must be collected
- independent random sampling will not work: the "correlation factor" must be improved
[Figure: path from the attacker to the victim; packet digests at each router; correlation between routers]

Information-theoretic framework overview
Information-theoretic framework to solve an optimization problem
(1) Given a fixed resource constraint (e.g., an average of 0.4 bits per packet in the Bloom filter), what is the best setting for the number of hash functions and the sampling probability?
- relationship between the resource constraint and the two parameters:
  resource constraint = number of hash functions × sampling probability
- two tradeoffs:
  · a higher number of hash functions gives a lower false positive rate in the Bloom filter
  · a higher sampling probability gives a higher sampling correlation (easier traceback)
- example: when s = 0.4, which setting is best? (8 hash functions, 5% sampling) vs. (12 hash functions, 3.3% sampling) vs. (16 hash functions, 2.5% sampling)
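As a quick arithmetic check of the constraint s = k × p, the following sketch derives the sampling probability for each candidate number of hash functions at s = 0.4. The function name `sampling_probability` is made up for illustration; only the relation s = k × p comes from the slide.

```python
def sampling_probability(s: float, k: int) -> float:
    """Sampling probability p implied by the resource constraint s = k * p."""
    return s / k


for k in (8, 12, 16):
    p = sampling_probability(0.4, k)
    print(f"k = {k:2d} hash functions -> p = {p:.1%}")
```

The three settings in the slide all lie on the same budget line; choosing among them requires the entropy criterion introduced later in the talk.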

Information-theoretic framework overview
Information-theoretic framework to establish a lower bound
(2) What is the lower bound on the amount of evidence needed to achieve a certain level of traceback accuracy?
- there is a tradeoff between the number of attack packets used for traceback (the evidence) and the accuracy of the traceback
- example: find the minimum amount of evidence needed to identify more than 90% of the attack sources

Outline of the talk
- Overview of our solution
- Design detail
- Information-theoretic framework
- Performance evaluation
- Related work, future work, conclusion

One-bit Random Marking and Sampling (ORMS)
Basic idea
- each router samples packets with probability p
- ORMS makes the correlation factor larger than 50%: a router samples more than 50% of the packets that were sampled at the previous router
- a 1-bit mark is used to coordinate the sampling
Sampling rule at each router
- sample all marked packets (a fraction p/2 of the traffic)
- sample each unmarked packet with probability p/(2-p)
- total sampling probability: p/2 + (1 - p/2) × p/(2-p) = p
- of the sampled packets, p/2 are "sample and mark" and p/2 are "sample and not mark"
- correlation factor (fraction sampled by both routers): 1/(2-p) (> 50% because 0 < p < 1)
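The two ORMS quantities can be checked with exact rational arithmetic. This is a sketch under one stated assumption, read off the figure labels: of the packets a router samples, half (p/2 of all traffic) leave marked and half leave unmarked. The function name `orms_rates` is hypothetical.

```python
from fractions import Fraction


def orms_rates(p):
    """Return (total sampling probability, correlation factor) for rate p."""
    p = Fraction(p)
    marked_in = p / 2                # fraction of arriving packets carrying the mark
    sample_unmarked = p / (2 - p)    # probability of sampling an unmarked packet

    # All marked packets are sampled; unmarked ones with probability p/(2-p).
    total = marked_in + (1 - marked_in) * sample_unmarked

    # Assumption: the upstream router's sampled packets leave as p/2 marked
    # (always re-sampled downstream) and p/2 unmarked (re-sampled w.p. p/(2-p)).
    sampled_by_both = p / 2 + (p / 2) * sample_unmarked
    correlation = sampled_by_both / p
    return total, correlation


total, correlation = orms_rates(Fraction(1, 30))   # p close to 3.3%
print(total, correlation, float(correlation))
```

Under this assumption the total simplifies to p and the correlation factor to 1/(2-p), so even a very small p keeps the overlap between neighboring routers just above one half.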

One-bit Random Marking and Sampling (ORMS)
Why not trajectory sampling?
- the attacker can use hash values that escape sampling
Designing a tamper-resistant scheme
- Why save p/2 of marked packets and p/2 of unmarked packets? Why not simply save all packets that are marked with 1?
- r: rate of marked packets
  · tampering: an attacker can make r anything in 0 ≤ r ≤ 1
  · jump-start: bring r to p/2 using a dual leaky bucket
  · stationary: r = p/2
- "jump-start" at the first hop using a dual-leaky-bucket scheme
[Figure labels: victim; attacker; other host; "send marked normal traffic"; "send unmarked attack traffic"]

Traceback Processing
1. Collect a set of attack packets L_v
2. Check router S, a neighbor of the victim, with L_v
3. Check each router R (a neighbor of S) with L_s
[Figure: path from the attacker to the victim with packet digests at routers S and R. Query with L_v and L_s: "Have you seen any of these packets?" "Yes." Reply to a convicted router: "You are convicted! Use this evidence to make your L_s."]

Traceback Processing
4. Pass L_v to R, to be used to make a new L_s
5. Repeat these steps
[Figure: same path as before, with routers S and R and lists L_v and L_s. "Have you seen any of these packets?" "Yes." "You are convicted! Use this evidence to make your L_s."]
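Steps 1 to 5 can be sketched as a breadth-first walk away from the victim. This is a simplified illustration, not the scheme's actual procedure: exact Python sets stand in for each router's digest table, a router is convicted on any single match (the slides do not give a conviction threshold), and L_s is taken to be the part of L_v that the convicted router has seen. All names (`traceback`, `neighbors`, `digests`) are hypothetical.

```python
from collections import deque


def traceback(victim_neighbors, neighbors, digests, attack_packets):
    """Return the set of convicted routers.

    victim_neighbors: routers adjacent to the victim
    neighbors: dict mapping a router to its upstream neighbor routers
    digests: dict mapping a router to the set of packets it recorded
    attack_packets: the evidence collected by the victim (L_v)
    """
    l_v = set(attack_packets)                       # step 1
    convicted = set()
    queue = deque()

    for s in victim_neighbors:                      # step 2: check S with L_v
        if l_v & digests.get(s, set()):
            convicted.add(s)
            queue.append(s)

    while queue:                                    # step 5: repeat
        s = queue.popleft()
        l_s = l_v & digests.get(s, set())           # S builds L_s from L_v (step 4)
        for r in neighbors.get(s, []):              # step 3: check R with L_s
            if r not in convicted and l_s & digests.get(r, set()):
                convicted.add(r)
                queue.append(r)
    return convicted


topology = {"S": ["R1", "R2"], "R1": ["A"]}
seen = {"S": {1, 2, 3}, "R1": {2, 9}, "R2": {7}, "A": {2}}
print(sorted(traceback(["S"], topology, seen, [1, 2, 3, 4])))
```

Querying R with L_s rather than the full L_v means R is tested only against packets its downstream neighbor also sampled, which is where a high correlation factor pays off.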

Outline of the talk
- Overview of our solution
- Design detail
- Information-theoretic framework
- Performance evaluation
- Related work, future work, conclusion

Why do we need a theoretical foundation?
Information-theoretic framework
- view the traceback system as a communication channel
- tradeoff between the sampling rate and the size of the packet digest: the optimal parameter setting maximizes the channel capacity (i.e., the mutual information)
- tradeoff between the number of packets and the traceback accuracy: through Fano's inequality, information theory lets us derive a lower bound on the number of packets (evidence) needed to achieve a certain level of traceback accuracy

Information Theory Background
Concepts
- Entropy H(X): measures the uncertainty of X
- Conditional entropy H(X|Y): measures how much uncertainty about X remains given the observation of Y
Fano's inequality
- Given an observation of Y, our estimate of X is X̂ = g(Y)
- We denote the error probability as p_e = Pr[X̂ ≠ X]
- H(p_e) ≥ H(X|Y), if X is binary-valued

Applications of Information Theory
- What we can observe: X_t1 + X_f1 and Y_t + Y_f
- What we want to estimate: Z
- Question: how do we maximize our accuracy in estimating Z?
- Answer: minimize H(Z | X_t1 + X_f1, Y_t + Y_f)
- N_p: number of packets in L_v
[Figure: victim, routers R1 and R2, lists L_v and L_s; quantities X_t1, X_f1, X_t2, Y_t, Y_f; Z = 1. Legend: true positive, false positive]

Applications of Information Theory
Parameter tuning:
- k: number of hash functions in a Bloom filter
- s: average number of bits for each packet
- p: sampling probability
- to maximize our accuracy in estimating Z, we would like to compute
  k* = argmin_k H(Z | X_t1 + X_f1, Y_t + Y_f)
  subject to the resource constraint s = k × p

Applications of Information Theory
[Figure: parameter tuning under the resource constraint s = k × p = 0.4]

Applications of Information Theory
Lower bound on the number of packets needed to achieve a certain level of traceback accuracy:
- Fano's inequality: H(p_e) ≥ H(Z | X_t1 + X_f1, Y_t + Y_f)
[Figure] Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Outline of the talk
- Overview of our solution
- Design detail
- Information-theoretic framework
- Performance evaluation
- Related work, future work, conclusion

Simulation set-up
Three topologies
- Skitter data I, Skitter data II, Bell Labs data (routes from a single host to 192,900, 158,181, and 86,813 destinations, respectively)
Host setting
- Victim: all three topologies consist of routes from a single origin to many destinations; this origin is assumed to be the victim
- Attackers: randomly distributed among the destination hosts
Performance metrics
- False Negative Ratio (FNR): the ratio of the number of missed routers to the number of infected routers
- False Positive Ratio (FPR): the ratio of the number of incorrectly convicted routers to the number of convicted routers
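The two metrics translate directly into set arithmetic. In this sketch the function names are made up, the router identifiers are arbitrary, and returning 0.0 for an empty denominator is a convention chosen here rather than something the slides specify.

```python
def false_negative_ratio(infected, convicted) -> float:
    """Missed routers divided by infected routers."""
    infected, convicted = set(infected), set(convicted)
    if not infected:
        return 0.0          # convention assumed here for an empty denominator
    return len(infected - convicted) / len(infected)


def false_positive_ratio(infected, convicted) -> float:
    """Incorrectly convicted routers divided by convicted routers."""
    infected, convicted = set(infected), set(convicted)
    if not convicted:
        return 0.0          # convention assumed here for an empty denominator
    return len(convicted - infected) / len(convicted)


infected = {"a", "b", "c", "d"}
convicted = {"a", "b", "e"}
print(false_negative_ratio(infected, convicted))   # 2 missed out of 4 infected
print(false_positive_ratio(infected, convicted))   # 1 wrong out of 3 convicted
```

Note that the two ratios use different denominators, so they cannot be summed or compared as if they were fractions of the same population.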

Simulation results
[Figure: false negative and false positive ratios on the Skitter I topology]
Parameters: s = 0.4, k = 12, p = 3.3% (12 × 3.3% ≈ 0.4)

Verification of Theoretical Analysis
[Figure: parameter tuning]
Parameters: 1000 attackers, s = k × p = 0.4

Verification of Theoretical Analysis
[Figure: error levels for different values of k]
Parameters: 2000 attackers, N_p = 200,000

Verification of Theoretical Analysis
[Figure: lower bound on the number of packets needed to achieve a certain level of traceback accuracy]
Parameters: s = 0.4, k = 12, p = 3.3%

Outline of the talk
- Overview of our solution
- Design detail
- Information-theoretic framework
- Performance evaluation
- Related work, future work, conclusion

Related work (not exhaustive)
PPM (Probabilistic Packet Marking) traceback schemes
- S. Savage et al., Practical network support for IP traceback, SIGCOMM
- M. T. Goodrich, Efficient packet marking for large-scale IP traceback, ACM CCS 2002
Hash-based traceback scheme
- Snoeren et al., Hash-based IP traceback, SIGCOMM 2001
Analysis of the traceback scheme and lower bounds
- M. Adler, Tradeoffs in PPM for IP traceback, ACM STOC 2002

Discussion and future work
1. Is the correlation factor 1/(2-p) optimal for coordination using one bit?
2. What if we use more than one bit to coordinate sampling?
3. How can PPM and the hash-based scheme be combined optimally? This is a network information theory question.
4. How can we know with 100% certainty that some packets are attack packets? What if we only know with p-certainty?

Conclusion
A new approach to IP traceback is presented
- by using sampling, the scheme can scale to very high link speeds
- ORMS, a novel sampling technique, is introduced
Analysis using an information-theoretic framework
- allows us to compute the optimal parameters
- can be used to compute the tradeoff between the amount of evidence and the traceback accuracy
Simulation study
- demonstrates the high performance of the scheme even with thousands of attackers and a very low sampling rate (3.3%)