CSE7701: Research Seminar on Networking


http://arl.wustl.edu/~jst/cse/770/
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
Paper by: Nathan Tuck (UCSD), Timothy Sherwood (UCSB), Brad Calder (UCSD), George Varghese (UCSD)
Published in: IEEE INFOCOM 2004
Reviewed by: Haoyu Song
Discussion Leader: Chip Kastner

Outline
- Introduction
  - IDS
  - Snort
  - String Matching
- State of the Art in String Matching
  - Boyer-Moore
  - Aho-Corasick
  - SFK Search
  - Wu-Manber
- Modified Aho-Corasick Algorithm
  - Multibit Trie and Tree Bitmaps
  - Bitmap Compression
  - Path Compression
- Results
  - Hardware
  - Software
- Conclusions

Intrusion Detection Systems (IDS)
- A growing market
- IDS vs. Internet firewall
  - Firewall: header only
  - IDS: header + payload
- IDS types
  - Signature based
  - Anomaly based
- Signature-based IDS rules
  - Header fields (5-tuple + flags)
  - String pattern(s), with length and location
  - Associated action

Motivation and Challenges
- String matching is compute-intensive
  - More resources and lower throughput
  - More complicated than packet header classification
- Increasing line rates: GE, OC48, 10GE, OC192, OC768, ...
- Increasing number of rules: on the order of thousands and still growing
- Goal: multi-pattern matching in real time

Snort
- An open-source, lightweight intrusion detection system
- Over 1500 rules extracted by network security experts
- Software-based system
- String length distribution: from 1 byte to 121 bytes
- Number of rules: grew by a factor of 2.5 in 3 years

How Does Snort Do It?
- Two-dimensional linked list
  - Rule Tree Nodes (RTN): header rules
  - Option Tree Nodes (OTN): signatures
- String matching algorithms: Boyer-Moore, Aho-Corasick, SFK, Wu-Manber, etc.
- Performance
  - 30%~80% of CPU time is spent on string matching alone
  - Offline inspection
  - Selective online inspection

Multi-Pattern String Matching
- Searching text streams for a set of strings
- Precise matching
  - Aho-Corasick
  - Commentz-Walter
  - Wu-Manber
- Imprecise matching (with false positives)
  - Parallel Bloom filter
  - Exclusion-based string matching
- Approximate matching
  - Tolerates some errors: character substitution, deletion, or insertion

Boyer-Moore Algorithm
- The best single-pattern matching algorithm
- Bad character heuristic
  - Figure: example on the text "a b b a x a b a c b a b x b a c" (positions 0, 1, 2, ...)
- Good suffix heuristic
  - Figure: example on the text "a b a a b a b a c b a c a b a b"
- Both heuristics can be preprocessed into lookup tables
- O(mn) worst-case time complexity; O(n/m) best-case performance
- Both heuristics can be used in multi-pattern matching algorithms
- Use with caution: may affect network security!
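The bad character heuristic can be sketched in C as follows. This is a minimal illustration, not code from the paper or from Snort: it uses only the bad character rule (no good suffix table), and the function name `bm_bad_char_search` and its signature are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first occurrence of
 * pat (length m) in text (length n), or -1 if there is none.
 * Only the bad character rule is used to compute shifts. */
long bm_bad_char_search(const char *text, size_t n, const char *pat, size_t m)
{
    long last[256];   /* last[c] = rightmost index of byte c in pat, or -1 */

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* Preprocessing: build the lookup table once per pattern. */
    for (int c = 0; c < 256; c++)
        last[c] = -1;
    for (size_t i = 0; i < m; i++)
        last[(unsigned char)pat[i]] = (long)i;

    size_t s = 0;     /* current alignment of pat against text */
    while (s <= n - m) {
        long j = (long)m - 1;

        /* Compare right to left. */
        while (j >= 0 && pat[j] == text[s + j])
            j--;
        if (j < 0)
            return (long)s;

        /* Align the mismatched text byte with its rightmost occurrence
         * in pat; always advance by at least one position. */
        long shift = j - last[(unsigned char)text[s + j]];
        s += (shift > 0) ? (size_t)shift : 1;
    }
    return -1;
}
```

On the slide's example text, searching for the (assumed) pattern "abac" skips four positions at once when the mismatched byte is 'x', which does not occur in the pattern. That skip is the source of the O(n/m) best case.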

SFK Search Algorithm
- Compact memory usage: binary trie
- A bad character table for fast shifts
- When a match fails, backtrack the pointer to the starting match point
- Worst case: m*n memory references
- In Snort, may need to traverse 20 trie nodes per character
- Figure: binary trie with match branches (h, e, s, r, i) and no-match branches (!h, !e) over states 1 to 11

Wu-Manber Algorithm
- The shift table uses the bad character heuristic, but for a block of characters
- A hash table is used when the shift fails
- All strings have the same length
- Good for the average case
- Figure: shift table and hash table for the member set { cat, car, bar, foo, for }

Aho-Corasick Algorithm
- Pattern tree as a state machine
  - Goto function (black arrows)
  - Failure function (blue arrows)
  - Output function (red dots)
- O(n) search time
- High fanout (256), low memory efficiency
- Figure: state machine with states 1 to 9 for the string set { he, she, his, hers }
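The three functions named on this slide (goto, failure, output) can be illustrated with a small C sketch. It is a textbook-style construction under stated assumptions, not the paper's implementation: the names `ac_machine`, `ac_build`, and `ac_search` are hypothetical, the state table has a fixed capacity of 64 states, and at most 32 patterns are supported because outputs are kept in a bitmask.

```c
#include <string.h>

#define AC_MAX_STATES 64    /* assumed capacity, enough for a toy set */
#define AC_ALPHABET   256

struct ac_machine {
    int goto_fn[AC_MAX_STATES][AC_ALPHABET]; /* -1 means "no edge" */
    int fail[AC_MAX_STATES];                 /* failure function */
    unsigned out[AC_MAX_STATES];             /* bit i set: pattern i ends here */
    int nstates;
};

void ac_build(struct ac_machine *ac, const char *const *pats, int npats)
{
    int queue[AC_MAX_STATES], head = 0, tail = 0;

    memset(ac->goto_fn, -1, sizeof ac->goto_fn);
    memset(ac->fail, 0, sizeof ac->fail);
    memset(ac->out, 0, sizeof ac->out);
    ac->nstates = 1;                         /* state 0 is the root */

    /* Phase 1: insert every pattern into the trie (goto function). */
    for (int p = 0; p < npats; p++) {
        int s = 0;
        for (const unsigned char *c = (const unsigned char *)pats[p]; *c; c++) {
            if (ac->goto_fn[s][*c] < 0)
                ac->goto_fn[s][*c] = ac->nstates++;
            s = ac->goto_fn[s][*c];
        }
        ac->out[s] |= 1u << p;
    }

    /* Phase 2: breadth-first pass computing failure links and
     * merging the outputs reachable through them. */
    for (int c = 0; c < AC_ALPHABET; c++) {
        int t = ac->goto_fn[0][c];
        if (t < 0)
            ac->goto_fn[0][c] = 0;           /* root loops on unmatched bytes */
        else
            queue[tail++] = t;               /* depth-1 states fail to root */
    }
    while (head < tail) {
        int r = queue[head++];
        for (int c = 0; c < AC_ALPHABET; c++) {
            int t = ac->goto_fn[r][c];
            if (t < 0)
                continue;
            int f = ac->fail[r];
            while (ac->goto_fn[f][c] < 0)
                f = ac->fail[f];
            ac->fail[t] = ac->goto_fn[f][c];
            ac->out[t] |= ac->out[ac->fail[t]];
            queue[tail++] = t;
        }
    }
}

/* Returns a bitmask of all patterns found anywhere in text. */
unsigned ac_search(const struct ac_machine *ac, const char *text)
{
    unsigned found = 0;
    int s = 0;
    for (const unsigned char *c = (const unsigned char *)text; *c; c++) {
        while (ac->goto_fn[s][*c] < 0)       /* follow failure links */
            s = ac->fail[s];
        s = ac->goto_fn[s][*c];
        found |= ac->out[s];
    }
    return found;
}
```

Built from the slide's string set { he, she, his, hers }, this sketch produces a root plus nine states, matching the states 1 to 9 in the figure. Each `goto_fn` row holds 256 entries, which is the high-fanout, low-memory-efficiency layout the slide points out.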

Aho-Corasick Data Structure Optimization
- Precompute the next state for every character from every state in the FSM:
    struct aho_state {
        struct aho_state *next_state[256];
        struct rule *rule_list;
    };
- One memory reference per character
- The unoptimized data structure needs two memory references per character (via amortized analysis)
- The unoptimized data structure can be optimized for space efficiency
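With the next state precomputed for every byte, the scan loop reduces to one table lookup per input character. The sketch below reuses the slide's `struct aho_state`; the function `aho_scan` and its hit-counting behavior are assumptions for illustration, and `struct rule` is left opaque because the slides do not describe its contents.

```c
#include <stddef.h>

struct rule;   /* opaque here: rule contents are not given in the slides */

struct aho_state {
    struct aho_state *next_state[256];  /* fully populated: never NULL */
    struct rule *rule_list;             /* non-NULL if some rule matches here */
};

/* Hypothetical scan loop: returns how many input positions end in a
 * state with a non-empty rule list. Assumes every next_state entry has
 * been precomputed, so no failure pointers are followed at scan time. */
size_t aho_scan(const struct aho_state *start,
                const unsigned char *buf, size_t len)
{
    const struct aho_state *s = start;
    size_t hits = 0;

    for (size_t i = 0; i < len; i++) {
        s = s->next_state[buf[i]];      /* the single memory reference */
        if (s->rule_list != NULL)
            hits++;
    }
    return hits;
}
```

With 4-byte pointers, each state occupies 256 * 4 + 4 = 1028 bytes, so the price of one reference per character is memory.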

IP Lookup vs. String Matching
- Both can be abstracted as longest prefix matching (LPM) problems
- Both have trie-based solutions
  - IP lookup
    - Multibit trie
    - Lulea algorithm: leaf pushing
    - Eatherton algorithm: tree bitmaps
  - Multi-pattern string matching
    - Aho-Corasick
    - SFK Search
- Idea: apply IP lookup techniques to string matching
  - Modified Aho-Corasick algorithm with memory efficiency

Unibit Trie for IP Lookup
- Worst-case lookup time is proportional to the length of the IP address
- Prefix table:
    Prefix    Next hop
    *         a
    00*       b
    010*      c
    11*       d
    111*      e
    11010*    f
- Figure: unibit trie built from the prefixes above
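A unibit trie lookup over this slide's prefix table can be sketched as below. The names `unibit_node`, `unibit_insert`, and `unibit_lookup` are hypothetical, next hops are stored as single characters to mirror the table, and prefixes are passed as strings of '0' and '1' characters. All of these are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

struct unibit_node {
    struct unibit_node *child[2];
    char next_hop;              /* 0 means "no prefix ends at this node" */
};

/* Insert a prefix written as a string of '0'/'1' characters ("" is the
 * default route). Returns 0 on success, -1 if allocation fails. */
int unibit_insert(struct unibit_node *root, const char *bits, char hop)
{
    struct unibit_node *n = root;
    for (; *bits; bits++) {
        int b = *bits - '0';    /* assumes only '0' or '1' appear */
        if (n->child[b] == NULL) {
            n->child[b] = calloc(1, sizeof *n->child[b]);
            if (n->child[b] == NULL)
                return -1;
        }
        n = n->child[b];
    }
    n->next_hop = hop;
    return 0;
}

/* Walk one address bit per step, most significant bit first, and
 * remember the last next hop seen: that is the longest matching prefix. */
char unibit_lookup(const struct unibit_node *root, uint32_t addr)
{
    char best = 0;
    const struct unibit_node *n = root;

    for (int i = 31; n != NULL; i--) {
        if (n->next_hop)
            best = n->next_hop;
        if (i < 0)
            break;              /* all 32 bits consumed */
        n = n->child[(addr >> i) & 1u];
    }
    return best;
}
```

The loop visits at most 33 nodes for a 32-bit address, which is the "proportional to the length of the IP address" worst case stated on the slide.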

Multibit Trie
- Walk n bits at a time
- Speeds up the lookup by a factor of n
- Memory inefficient
- Figure: multibit trie with nodes n1 to n4 holding prefixes a to f

Tree Bitmap
- Prefixes in the same node are stored in consecutive memory locations, from top to bottom and from left to right, indexed by the internal bitmap
- Child nodes of the same node are stored in consecutive memory locations, from left to right, indexed by the expanding path bitmap
- Root node n1:
    Internal bitmap:        1 0 0 1 0 0 1
    Expanding path bitmap:  0 0 1 0 0 0 1 1
    Next hop pointer   -> a
    Child node pointer -> n2
- Figure: tree bitmap nodes n1 to n4 holding prefixes a to f
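The way the two bitmaps index consecutive memory can be sketched for a 3-bit stride, which is what the root node's 7-bit and 8-bit bitmaps suggest. The names `tb_node`, `tb_internal_match`, and `tb_child_index` are hypothetical, and so is the bit order: the leftmost bit on the slide is treated as position 0 and stored in the least significant bit.

```c
#include <stdint.h>

#define TB_STRIDE 3   /* assumed stride: 7 internal bits, 8 path bits */

struct tb_node {
    uint8_t internal;  /* bit p set: a prefix is stored at position p,
                          positions ordered *, 0*, 1*, 00*, 01*, 10*, 11* */
    uint8_t external;  /* bit c set: a child exists for 3-bit chunk c */
};

static int tb_popcount(unsigned v)
{
    int n = 0;
    while (v) {
        v &= v - 1;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Longest prefix inside the node that matches chunk. Returns the index
 * into the node's consecutive next-hop array, or -1 if nothing matches. */
int tb_internal_match(const struct tb_node *n, unsigned chunk)
{
    chunk &= (1u << TB_STRIDE) - 1u;
    for (int len = TB_STRIDE - 1; len >= 0; len--) {
        unsigned pos = (1u << len) - 1u + (chunk >> (TB_STRIDE - len));
        if (n->internal & (1u << pos))
            return tb_popcount(n->internal & ((1u << pos) - 1u));
    }
    return -1;
}

/* Index of the child for chunk in the node's consecutive child array,
 * or -1 if the path does not continue. */
int tb_child_index(const struct tb_node *n, unsigned chunk)
{
    chunk &= (1u << TB_STRIDE) - 1u;
    if (!(n->external & (1u << chunk)))
        return -1;
    return tb_popcount(n->external & ((1u << chunk) - 1u));
}
```

Under that bit order, the slide's root node is `internal = 0x49` and `external = 0xC4`. Chunk 000 then resolves to next-hop index 1 (prefix 00*, next hop b), and chunk 010 resolves to child index 0. Counting the set bits before a position replaces one stored pointer per entry, which is where the space saving comes from.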

Optimizations for Aho-Corasick Algorithm (1): Bitmap Compression
- Benefit: 1028 bytes/node -> 44 bytes/node
- Cost 1: unoptimized data structure, 2 memory references per character in the worst case
- Cost 2: popcount over up to 256 prior bits in the bitmap
- Node layout: bitmap (e.g. 00000001000000000010000000), fail ptr, rule ptr (= null), next ptr
- Figure: Aho-Corasick state machine (states 1 to 9, edges h, e, r, s, i) with the bitmap-compressed node layout
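A bitmap-compressed node and its transition step might look like the following sketch. The field set (bitmap, fail pointer, rule pointer, next pointer) follows the slide; the names `bm_state`, `bm_goto`, and `bm_step`, the bit order within the bitmap, and the layout of children as one contiguous array are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

struct rule;   /* opaque: not described in the slides */

struct bm_state {
    uint32_t bitmap[8];         /* 256 bits, one per possible next byte */
    struct bm_state *fail;      /* failure pointer */
    struct rule *rule_list;     /* rule pointer, NULL if no match here */
    struct bm_state *next;      /* first child; children are contiguous */
};

static unsigned bm_popcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Child state for byte c, or NULL if this node has no edge for c.
 * The child's slot is the number of set bits before bit c. */
struct bm_state *bm_goto(const struct bm_state *s, unsigned char c)
{
    unsigned word = c >> 5, bit = c & 31u;
    unsigned rank = 0;

    if (!(s->bitmap[word] & ((uint32_t)1 << bit)))
        return NULL;
    for (unsigned w = 0; w < word; w++)
        rank += bm_popcount32(s->bitmap[w]);
    rank += bm_popcount32(s->bitmap[word] & (((uint32_t)1 << bit) - 1u));
    return s->next + rank;
}

/* One input byte: follow failure pointers until some state has an
 * edge for c, falling back to the root. */
const struct bm_state *bm_step(const struct bm_state *root,
                               const struct bm_state *s, unsigned char c)
{
    for (;;) {
        struct bm_state *t = bm_goto(s, c);
        if (t != NULL)
            return t;
        if (s == root)
            return root;
        s = s->fail;
    }
}
```

With 4-byte pointers the node is 32 + 4 + 4 + 4 = 44 bytes, matching the slide; on a 64-bit build it would be larger. The loop in `bm_goto` is the "popcount up to 256 prior bits" cost, and the loop in `bm_step` shows why failure pointers are followed again at scan time.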

Optimizations for Aho-Corasick Algorithm (2): Path Compression
- Benefit 1: decreases the total space (4:1 compression ratio)
- Benefit 2: decreases the number of memory references
- Cost 1: complex data structure; a failure pointer may point into the middle of another path-compressed node
- Cost 2: software implementation penalty from too many unpredictable, data-dependent branches
- Figure: path-compressed node for the paths "he" and "hers", holding per-character failure pointers (fpt1, fpt2, fpt3), rule pointers (rpt1, null, rpt3), and a null next pointer

Data Structure Size for Snort Rule Set
- 20 times smaller than Wu-Manber
- 50 times smaller than Aho-Corasick
- Similar to SFK Search
- When the number of rules increases 2.5x, the data structure size goes up by only 30%

Intrusion Detection in Hardware
- Accessible memory width of 128 bytes
  - Has to be on-chip
- Worst case
  - 20 nodes/character in SFK Search
  - 80 rules/character for Wu-Manber
  - 1 or 2 nodes/character in Aho-Corasick
- Performance
  - 2 times that of naive Aho-Corasick
  - 8 times that of SFK Search
  - 3.25 times that of Wu-Manber

Intrusion Detection in Software
- Figure: software performance charts (series labeled 1GHz, 2.5GHz, 1.3GHz)
  - Average case: real packet trace
  - Worst case: synthetic packet trace

Conclusions
- A good review of multi-pattern string matching algorithms
- Borrows the tree bitmap idea to compress the data structure and improve the memory efficiency of the Aho-Corasick algorithm
- Deterministic time complexity is good for the security of the IDS itself
- Evaluates both hardware and software implementations
- The promising solution lies in hardware

Questions & Discussion