A High Throughput String Matching Architecture for Intrusion Detection and Prevention Lin Tan, Timothy Sherwood Appeared in ISCA 2005 Presented by: Sailesh.

Presentation transcript:

A High Throughput String Matching Architecture for Intrusion Detection and Prevention Lin Tan, Timothy Sherwood Appeared in ISCA 2005 Presented by: Sailesh Kumar Discussion Leader: Max Podlesny

2 - Sailesh Kumar - 9/19/2015 Overview
• Overview of IDS/IPS systems
  » String matching and regular expression matching
• String matching algorithms
  » Aho-Corasick algorithm
• Overview of the proposed architecture
• Analysis of the proposed architecture
• Conclusion

IDS Overview
• An intrusion detection system (IDS) must be capable of distinguishing between normal (not security-critical) and abnormal user activities
  » However, translating user behavior into a security-related decision is often not easy
• One approach is anomaly detection
  » Anomaly detectors construct profiles that represent normal usage
  » If the profile of the current data doesn't match the normal profile => a possible attack
  » Typically results in a high false-positive rate, and a few false negatives too
  » An intelligent intruder can train the system to behave otherwise
[Figure: activity spectrum labeled Normal, Unpredictable, Abnormal]

IDS Overview
• Another approach is signature detection
• Misbehavior signatures fall into two categories:
  » Attack signatures: action patterns that may pose a security threat. Typically presented as a series of activities, possibly interleaved with neutral ones
    – E.g. a sequence of flags set on a specific sequence of packets
  » Selected text strings: signatures that match text strings indicating suspicious actions (e.g. accessing /etc/passwd)
    – Effective against attacks that exploit programming flaws
• Effective, but not immune to novel attacks
  » Difficulties in updating information on new types of attacks
  » Still the most widely used approach

IDS Overview
• At the heart of most signature-based IDS systems lies a string (or regular expression) matching engine
  » Since this engine has to scan every byte of data, it can become the throughput bottleneck
• This paper presents a high-throughput string matching architecture
• Advantages of the proposed scheme:
  » Claimed to be space efficient
    – Entire rule set can be stored on-chip
  » Claimed to achieve high throughput
• Limitations (covered later)
  » Newer systems do regex matching, where the FSM is typically too large to fit on chip, even with the proposed scheme
  » Simple FSM compression schemes may beat the proposed idea

String Matching Algorithm (Aho-Corasick)
• For a set P of patterns, it builds a DFA that recognizes every occurrence of any pattern in P in the input

String Matching Algorithm (Aho-Corasick)
• Let's say P is {he, she, his, hers}
[Figure: Aho-Corasick DFA for P, with the initial state, accepting states, and state transition function labeled. Example from Lin Tan et al.]
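The construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `build_aho_corasick_dfa` and `scan` are hypothetical, and any character without an explicit transition is assumed to return to state 0. Inserting the patterns in the order he, she, his, hers yields 10 states (0 to 9), with states 2, 5, 7 and 9 accepting.

```python
from collections import deque

def build_aho_corasick_dfa(patterns):
    """Return (dfa, accept). dfa[s].get(ch, 0) is the next state;
    accept[s] is the set of patterns that end when state s is reached."""
    trie, accept = [{}], [set()]
    for p in patterns:                      # 1. build the keyword trie
        s = 0
        for ch in p:
            if ch not in trie[s]:
                trie.append({})
                accept.append(set())
                trie[s][ch] = len(trie) - 1
            s = trie[s][ch]
        accept[s].add(p)

    fail = [0] * len(trie)                  # failure link of each state
    dfa = [dict() for _ in trie]
    dfa[0] = dict(trie[0])
    queue = deque(trie[0].values())
    while queue:                            # 2. breadth-first DFA completion
        s = queue.popleft()
        dfa[s] = dict(dfa[fail[s]])         # inherit edges of the failure state
        for ch, t in trie[s].items():
            fail[t] = dfa[fail[s]].get(ch, 0)
            accept[t] |= accept[fail[t]]    # e.g. "she" also ends "he"
            dfa[s][ch] = t                  # own trie edge overrides inherited one
            queue.append(t)
    return dfa, accept

def scan(text, dfa, accept):
    """Consume one character per step and report (start index, pattern)."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = dfa[state].get(ch, 0)
        for p in accept[state]:
            hits.append((i - len(p) + 1, p))
    return hits

dfa, accept = build_aho_corasick_dfa(["he", "she", "his", "hers"])
print(sorted(scan("ushers", dfa, accept)))
```

Storing only the non-root transitions in a dictionary keeps the sketch short; a hardware table would instead hold all 256 next-state pointers per state.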

Properties of DFA created by Aho-Corasick
• Consumes one character per state traversal
  » A modified automaton is also used in many systems, which rescans a character if there is no outgoing edge for it (failure pointer)
    – Reduces the DFA size tremendously
[Figure: two drawings of the automaton for P, one with all transitions and one with fewer edges. Example from Lin Tan et al.]

How to implement efficiently on-chip
• Problem: too big to fit on-chip
  » ~10,000 nodes for the SNORT rule set with ~1,000 strings
  » 256 out edges per node
  » Requires 10,000 × 256 × 14 bits ≈ 4.5 MB (~5 MB)
• Solution: partition each state machine into small state machines such that
  » Each machine handles only a subset of the strings
  » Each machine has only a few outgoing edges
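As a quick check of the storage estimate on this slide, the arithmetic can be spelled out. The only assumption is that the factor of 14 is the width in bits of a next-state pointer, i.e. the smallest width that can address about 10,000 states.

```python
import math

nodes = 10_000                                # approximate state count from the slide
edges_per_node = 256                          # one next-state pointer per byte value
pointer_bits = math.ceil(math.log2(nodes))    # smallest pointer width for 10,000 states

total_bits = nodes * edges_per_node * pointer_bits
total_mb = total_bits / 8 / 1e6               # decimal megabytes
print(pointer_bits, total_bits, total_mb)
```

The result is about 4.5 MB, which the slide rounds up to roughly 5 MB.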

Bit-Splitting Algorithm Overview
• Bit-Split String Matching Algorithm
  » Reduces out edges from 256 to 2
• Partition the rule set P into smaller sets P_0 to P_n
• Build an AC state machine for each subset P_i
• For each P_i, rip its DFA apart into 8 tiny state machines, B_i0 through B_i7
• Each binary FSM operates on 1 bit of the 8-bit input characters
  » A match is announced only if all 8 machines find a match

Bit-splitting String Matching Architecture
• Let's say we have 4 strings in P: {he, she, his, hers}
• Consider FSM 3
  » It looks at the 3rd bit of each input character and makes state transitions based on whether that bit is 0 or 1
• For example, let's say our input stream is hxhe
[Figure: example from Lin Tan et al.]

Bit FSM Construction
• Use the subset construction algorithm to build the bit FSMs
[Figure: the DFA for P alongside bit FSM B_3 under construction. Each B_3 state is labeled with the set of original DFA states it represents: b0 {0}, b3 {0,1,2,6}, b4 {0,1,4}, b5 {0,3,7,8}, b6 {0,1,2,5,6}, b7 {0,3,9}; the sets for b1 and b2 are garbled in the transcript. Example from Lin Tan et al.]

Bit FSM Construction
• Remove the non-accepting states from the state sets of the small FSMs
[Figure: pruned B_3, where each state keeps only the accepting states of the original DFA: b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}. Example from Lin Tan et al.]

Bit-splitting String Matching Architecture
• Build all bit FSMs B_i
• A match is announced if all bit FSMs announce a match
[Figure: example from Lin Tan et al.]

An Example Matching
[Figure: the DFA for P_0 with pruned bit FSMs B_3 and B_4 processing the input hxhe. B_3 states: b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}. B_4 states: b0 {}, b1 {}, b2 {}, b3 {}, b4 {2}, b5 {}, b6 {2,5}, b7 {}, b8 {2,7}, b9 {9}. Example from Lin Tan et al.]
• There is a match only if all the state machines agree; in the figure they agree on accepting state 2
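The bit-split construction and the matching example can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the hardware design from the paper: the function names (`build_dfa`, `bit_split`, `match`) are hypothetical, patterns are byte strings, bit k is taken to be `(c >> k) & 1` with bit 0 as the least significant bit (the slide's bit numbering may differ), and each bit-FSM state stores its partial match vector as a Python set of pattern indices rather than a g-bit vector.

```python
from collections import deque

def build_dfa(patterns):
    """Aho-Corasick DFA over bytes; dfa[s].get(c, 0) is the next state and
    accept[s] holds the indices of the patterns that end in state s."""
    trie, accept = [{}], [set()]
    for idx, p in enumerate(patterns):
        s = 0
        for c in p:
            if c not in trie[s]:
                trie.append({})
                accept.append(set())
                trie[s][c] = len(trie) - 1
            s = trie[s][c]
        accept[s].add(idx)
    fail = [0] * len(trie)
    dfa = [dict() for _ in trie]
    dfa[0] = dict(trie[0])
    queue = deque(trie[0].values())
    while queue:
        s = queue.popleft()
        dfa[s] = dict(dfa[fail[s]])
        for c, t in trie[s].items():
            fail[t] = dfa[fail[s]].get(c, 0)
            accept[t] |= accept[fail[t]]
            dfa[s][c] = t
            queue.append(t)
    return dfa, accept

def bit_split(dfa, accept, bit):
    """Subset construction for one bit position. Returns (trans, pmv):
    trans[i] = (next state on 0, next state on 1); pmv[i] = partial matches."""
    start = frozenset([0])
    index, order, trans = {start: 0}, [start], []
    i = 0
    while i < len(order):
        row = []
        for b in (0, 1):
            # every original state reachable on any byte whose chosen bit is b
            target = frozenset(dfa[s].get(c, 0)
                               for s in order[i]
                               for c in range(256) if (c >> bit) & 1 == b)
            if target not in index:
                index[target] = len(order)
                order.append(target)
            row.append(index[target])
        trans.append(tuple(row))
        i += 1
    # pruning step: keep only which patterns are accepted inside each subset
    pmv = [set().union(*(accept[s] for s in subset)) for subset in order]
    return trans, pmv

def match(text, machines, patterns):
    """Step all bit FSMs in lockstep and intersect their partial matches."""
    states, hits = [0] * len(machines), []
    for pos, c in enumerate(text):
        alive = set(range(len(patterns)))
        for k, (trans, pmv) in enumerate(machines):
            states[k] = trans[states[k]][(c >> k) & 1]
            alive &= pmv[states[k]]
        for idx in sorted(alive):
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits

patterns = [b"he", b"she", b"his", b"hers"]
dfa, accept = build_dfa(patterns)
machines = [bit_split(dfa, accept, k) for k in range(8)]
print(match(b"hxhe", machines, patterns))
```

A single bit FSM reports a pattern whenever the recent input agrees with it on that one bit, so it can fire spuriously on its own. A pattern survives the intersection only when all eight bit positions agree over its full length, which is why the combined result equals that of the original DFA.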

How to map it to hardware efficiently
• Problem:
  » Every state in the bit FSMs B_i might contain up to S accepting states of the original FSM D
    – Note that # of accepting states = # of strings = S
  » Even if we keep a bit-vector to represent the accepting states of the original D, we need S bits per state of B_i
  » S is typically > 1000, so this is not practical
• Solution:
  » Partition the rule set P into smaller subsets P_i, each with g rules
    – Build FSM D_i for each P_i and bit-split D_i into B_i0, …, B_i7
  » Now every state in B_ij can have at most g accepting states, hence only g bits are needed

How to map it to hardware efficiently
• Also note that any D FSM can be ripped into 2 or 4 bit FSMs as well
  » When ripped into 4 bit FSMs, each one consumes 2 bits of each input character
  » When ripped into 2 bit FSMs, each one consumes 4 bits

Each State Machine Tile
• g = 16
• # of states = 256 (8-bit state encoding)
• D FSM is ripped into 4 FSMs (each accepts 2 bits of input)
[Figure: example from Lin Tan et al.]

Hardware Implementation
[Figure: example from Lin Tan et al.]

Non Interrupting Update
• Issues:
  » Must copy the state of every flip-flop of the affected module onto the replacement module
    – E.g. the current state pointer of the affected module must be copied into the replacement module
    – These issues are not mentioned in the paper

Optimal partitioning
• There are two types of partitions
  » One is the partition of the set of strings
  » The other is the partition of each D FSM into multiple bit FSMs
    – The 8-way binary partition is the easiest to understand
    – However, it is also possible to split into 2 or 4 partitions, with each partition consuming 4 or 2 bits of the input character respectively
• Terminology
  » S = total # of strings
  » g = # of strings per group (# of groups = S/g)
  » n = # of partitions of each state machine (1, 2, 4, 8)
  » L = average # of characters per string
• How to choose g and n?
  » Choose the values that consume minimum area

Optimal partitioning
• Total number of bits needed is:
[Equation not preserved in the transcript; its labeled terms are:]
  » Total # of states in each bit FSM
  » Total # of modules (string set partitions)
  » # of FSM partitions
  » # of bits to encode a state
  » # of outgoing edges per state
  » # of partial match bits
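The equation itself did not survive in the transcript, so the following is only a plausible reconstruction from the labeled terms, not necessarily the exact expression in the paper. It assumes that each state stores one next-state pointer per outgoing edge plus a g-bit partial match vector, and that each bit FSM has about g × L states; both are assumptions made here for illustration. The function name `total_bits` is hypothetical.

```python
import math

def total_bits(S, g, n, L):
    """Estimated storage in bits for S strings split into groups of g,
    with each group's DFA split into n bit FSMs (n in 1, 2, 4, 8)."""
    modules = math.ceil(S / g)               # string set partitions
    states = g * L                           # ASSUMPTION: states per bit FSM
    edges = 2 ** (8 // n)                    # each FSM consumes 8/n input bits
    state_bits = (states - 1).bit_length()   # bits to encode one state
    per_state = edges * state_bits + g       # pointers + partial match bits
    return modules * n * states * per_state

# Sweep the design space for the setting quoted on the next slide
# (1000 signatures, average length 12) and report the smallest estimate.
candidates = [(total_bits(1000, g, n, 12), g, n)
              for g in (4, 8, 16, 32, 64)
              for n in (1, 2, 4, 8)]
print(min(candidates))
```

With n = 1 every state needs 256 pointers, and with n = 8 the per-state overhead of the partial match vector is paid eight times, which is the tension the choice of g and n has to balance.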

Optimal partitioning
• Plot from the review of Mike Wilson (1000 signatures, average length 12)
  » Typical optimal # of FSM partitions = 2 or 4

Partitioning String Sets
• The analysis in the paper claims that partitioning the string set always reduces space
  » Not quite right when there is overlap among the strings
    – With overlapping strings, fewer states are needed without partitioning

Performance Results - Memory
[Figure: from Lin Tan et al.]

Performance Results - Throughput
[Figure: from Lin Tan et al.]

Strength of the Algorithm
• Can be very effective for dense DFAs, where every state has plenty of outgoing edges
• In this case, path compression will not help much
• However, ripping the state machine apart into bit state machines reduces the number of outgoing edges to 16
  » (2 edges × 8 FSMs) or (4 edges × 4 FSMs)
  » For dense graphs, up to a 16× reduction in state space

Some Issues
• Tuck [30] used bitmap compression and path compression to reduce the memory needed for the SNORT rules to 1.1 MB
• Note that Tuck didn't do any string set partitioning
  » Without any partitioning, bit-splitting will consume > 2 MB
[Figure: from Lin Tan et al.]

Conclusion
• Novel Bit-Split String Matching Algorithm
  » Reduces out edges from 256 to 2
  » Can be extremely effective for dense graphs
• Performance/area beats the best techniques by a factor of 10 or more
• 0.4 MB and 10 Gbps for the Snort rule set (> 10,000 characters)

Thank you and Questions