2019/5/14 New Shift table Algorithm For Multiple Variable Length String Pattern Matching Author: Punit Kanuga Presenter: Yi-Hsien Wu Conference: 2015.

Slides:



Advertisements
Similar presentations
1 Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen.
Advertisements

Space-for-Time Tradeoffs
1 A simple fast hybrid pattern- matching algorithm Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan
A Fast String Matching Algorithm The Boyer Moore Algorithm.
1 FPGA-based ROM-free network intrusion detection using shift-OR circuit Department of Computer Science and Information Engineering National Cheng Kung.
Improved TCAM-based Pre-Filtering for Network Intrusion Detection Systems Department of Computer Science and Information Engineering National Cheng Kung.
Smith Algorithm Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp Adviser:
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
Thopson NFA Presenter: Yuen-Shuo Li Date: 2014/5/7 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
Packet Classification using Rule Caching Author: Nitesh B. Guinde, Roberto Rojas-Cessa, Sotirios G. Ziavras Publisher: IISA, 2013 Fourth International.
Leveraging Traffic Repetitions for High- Speed Deep Packet Inspection Author: Anat Bremler-Barr, Shimrit Tzur David, Yotam Harchol, David Hay Publisher:
A Regular Expression Matching Algorithm Using Transition Merging Department of Computer Science and Information Engineering National Cheng Kung University,
MCS 101: Algorithms Instructor Neelima Gupta
Pattern-Based DFA for Memory- Efficient and Scalable Multiple Regular Expression Matching Author: Junchen Jiang, Yang Xu, Tian Pan, Yi Tang, Bin Liu Publisher:IEEE.
StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection Author: Xiaofei Wang, Junchen Jiang, Yi Tang, Bin Liu, and Xiaojun Wang Publisher:
Regular Expression Matching for Reconfigurable Packet Inspection Authors: Jo˜ao Bispo, Ioannis Sourdis, Jo˜ao M.P. Cardoso and Stamatis Vassiliadis Publisher:
MCS 101: Algorithms Instructor Neelima Gupta
Fundamental Data Structures and Algorithms
Updating Designed for Fast IP Lookup Author : Natasa Maksic, Zoran Chicha and Aleksandra Smiljani´c Conference: IEEE High Performance Switching and Routing.
Binary-tree-based high speed packet classification system on FPGA Author: Jingjiao Li*, Yong Chen*, Cholman HO**, Zhenlin Lu* Publisher: 2013 ICOIN Presenter:
Lightweight Traffic-Aware Packet Classification for Continuous Operation Author: Shariful Hasan Shaikot, Min Sik Kim Presenter: Yen-Chun Tseng Date: 2014/11/26.
Packet Classification Using Dynamically Generated Decision Trees
An Improved Multi-Pattern Matching Algorithm for Large-Scale Pattern Sets Author : Zhan Peng, Yu-Ping Wang and Jin-Feng Xue Conference: IEEE 10th International.
ICS220 – Data Structures and Algorithms Analysis Lecture 14 Dr. Ken Cosh.
LOP_RE: Range Encoding for Low Power Packet Classification Author: Xin He, Jorgen Peddersen and Sri Parameswaran Conference : IEEE 34th Conference on Local.
String Searching 2 of 2. String search Simple search –Slide the window by 1 t = t +1; KMP –Slide the window faster t = t + s – M[s] –Never recheck the.
Scalable Multi-match Packet Classification Using TCAM and SRAM Author: Yu-Chieh Cheng, Pi-Chung Wang Publisher: IEEE Transactions on Computers (2015) Presenter:
JA-trie: Entropy-Based Packet Classification Author: Gianni Antichi, Christian Callegari, Andrew W. Moore, Stefano Giordano, Enrico Anastasi Conference.
CSG523/ Desain dan Analisis Algoritma
2018/4/23 Dynamic Load-balanced Path Optimization in SDN-based Data Center Networks Author: Yuan-Liang Lan , Kuochen Wang and Yi-Huai Hsu Presenter: Yi-Hsien.
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
A DFA with Extended Character-Set for Fast Deep Packet Inspection
COMP261 Lecture 20 String Searching 2 of 2.
2018/6/26 An Energy-efficient TCAM-based Packet Classification with Decision-tree Mapping Author: Zhao Ruan, Xianfeng Li , Wenjun Li Publisher: 2013.
13 Text Processing Hongfei Yan June 1, 2016.
String Processing.
Knuth-Morris-Pratt algorithm
Regular Expression Matching in Reconfigurable Hardware
Statistical Optimal Hash-based Longest Prefix Match
SigMatch Fast and Scalable Multi-Pattern Matching
Chapter 7 Space and Time Tradeoffs
Parallel Processing Priority Trie-based IP Lookup Approach
Pattern Matching 12/8/ :21 PM Pattern Matching Pattern Matching
2018/12/29 A Novel Approach for Prefix Minimization using Ternary trie (PMTT) for Packet Classification Author: Sanchita Saha Ray, Abhishek Chatterjee,
Binary Prefix Search Author: Yeim-Kuan Chang
2019/1/1 High Performance Intrusion Detection Using HTTP-Based Payload Aggregation 2017 IEEE 42nd Conference on Local Computer Networks (LCN) Author: Felix.
2019/1/3 Exscind: Fast Pattern Matching for Intrusion Detection Using Exclusion and Inclusion Filters Next Generation Web Services Practices (NWeSP) 2011.
Memory-Efficient Regular Expression Search Using State Merging
Pattern Matching 1/14/2019 8:30 AM Pattern Matching Pattern Matching.
Space-for-time tradeoffs
A Small and Fast IP Forwarding Table Using Hashing
Space-for-time tradeoffs
A New String Matching Algorithm Based on Logical Indexing
Knuth-Morris-Pratt Algorithm.
Pattern Matching Pattern Matching 5/1/2019 3:53 PM Spring 2007
Space-for-time tradeoffs
Pattern Matching 4/27/2019 1:16 AM Pattern Matching Pattern Matching
Compact DFA Structure for Multiple Regular Expressions Matching
2019/5/3 A De-compositional Approach to Regular Expression Matching for Network Security Applications Author: Eric Norige Alex Liu Presenter: Yi-Hsien.
2019/5/5 A Flexible Wildcard-Pattern Matching Accelerator via Simultaneous Discrete Finite Automata Author: Hsiang-Jen Tsai, Chien-Chih Chen, Yin-Chi Peng,
Pipelined Architecture for Multi-String Matching
Space-for-time tradeoffs
MA/CSSE 473 Day 27 Student questions Leftovers from Boyer-Moore
A Hybrid IP Lookup Architecture with Fast Updates
2019/7/26 OpenFlow-Enabled User Traffic Profiling in Campus Software Defined Networks Presenter: Wei-Li,Wang Date: 2016/1/4 Author: Taimur Bakhshi and.
An Improved Wu-Manber Multiple Patterns Matching Algorithm
2019/9/3 Adaptive Hashing Based Multiple Variable Length Pattern Search Algorithm for Large Data Sets 比對 Simple Pattern 的方法是基於 Hash 並且可以比對不同長度的 Pattern。
Towards TCAM-based Scalable Virtual Routers
Packet Classification Using Binary Content Addressable Memory
Presentation transcript:

2019/5/14 New Shift table Algorithm For Multiple Variable Length String Pattern Matching Author: Punit Kanuga Presenter: Yi-Hsien Wu Conference: 2015 International Conference on Circuit, Power and Computing Technologies [ICCPCT] Date: 2017/9/5 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C. CSIE CIAL Lab 1

Outline Introduction Related Work Concept And Algorithm Result And Discussions Conclusion National Cheng Kung University CSIE Computer & Internet Architecture Lab

2019/5/14 Introduction Multiple string pattern matching can be summarised as a problem where objective is to simultaneously search presence of a set of strings (called patterns) in larger string set (called text). A shift table is a table which reflects maximum number of characters which can be skipped in text when mismatch happens while searching pattern in it ensuring that no possible match is missed. Overall, shift table reduces search time for searching patterns in text. When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

2019/5/14 Related Work Knuth, Morris, Pratt (KMP) algorithm : search patterns within a larger string in time proportional to sum of lengths of all patterns . Example : Text : BBC ABCDAB ABCDABCDABDE Pattern : ABCDABD When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

2019/5/14 Related Work Case 1 : If text and searched character are not matched , shift right one character . Case 2 : If matched , continue to compare the character of pattern , until not match . When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Related Work Case 2 : Start matched : Not matched : Pattern A B C D 2019/5/14 Related Work Case 2 : Start matched : Not matched : Shift = Number of have matched - Partial match Value (Last matched character). (6 – 2 = 4) When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) Pattern A B C D Partial match value 1 2 National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Related Work Partial matched value : (Pattern : ABCDABD) 2019/5/14 Related Work Partial matched value : (Pattern : ABCDABD) “A” : Prefix - NULL ; Suffix - NULL ; Length of common elements : 0 “AB” : Prefix - A ; Suffix - B ; Length of common elements : 0 “ABC” : Prefix – A,AB ; Suffix – BC,C ; Length of common elements : 0 “ABCD” : Prefix – A,AB,ABC ; Suffix – BCD,CD,D ; Length of common elements : 0 “ABCDA” : Prefix – A,AB,ABC,ABCD ; Suffix – BCDA,CDA,DA,A ; Length of common elements : 1 (A) . “ABCDAB” : Prefix - A, AB, ABC, ABCD, ABCDA ; Suffix - BCDAB, CDAB, DAB, AB, B ; Length of common elements : 2 (AB) . “ABCDABD” : Prefix - A, AB, ABC, ABCD, ABCDA, ABCDAB ; Suffix - BCDABD, CDABD, DABD, ABD, BD, D ; Length of common elements : 0 When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Related Work Boyer Moore : 2019/5/14 Related Work Boyer Moore : Text : HERE IS A SIMPLE EXAMPLE ; Pattern : EXAMPLE Compare the character from right to left , use 2 rules (bad character rule 、 good suffix rule) to choose the max shift. Bad character rule : If the character in text is not match , find the character if it exist in the pattern , if true shift to align , else shift the full length. Good suffix rule : shift the good suffix to align the pattern from head. When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Related Work Bad character rule : 1. Find if “s” exists in 2019/5/14 Related Work Bad character rule : 1. Find if “s” exists in pattern (EXAMPLE) , shift 7 . 2. “P” exists in pattern , shift 2 to Align. Size of shift = address of character – address of the rightest character exist in pattern . (If is not existed , minus -1) Shift : 6 – (-1) = 7 Shift : 6 – 4 = 2 When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

2019/5/14 Related Work Good suffix rule : “E” , “LE” , “PLE” , “MPLE” are all good suffix . Find if these suffix exist in the head of the pattern . So “E” exist , shift to align . Size of Shift = address of good suffix – character exist in pattern (If is not existed , minus -1). Shift : 6 – 0 = 6 When the controller is responsible for all the tasks it becomes a bottleneck and the solution does not scale well. (reason 2) National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

Concept And Algorithm Define Pattern : string which has to be searched . Mismatch : Character in pattern where matching fails . Join : Part of pattern matched before mismatch . Leftover : Part of pattern which remained to match after mismatch. Thus, overall pattern will be sum of “ leftover, mismatch and join “. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Concept And Algorithm For multiple patterns, consider window of len-1 length, where len is the length of smallest pattern across multiple patterns. Consider set of patterns P {aaba, aabab, aababc, aababcd,aababcde, abcb, zmnd, qope,jmqfm}. Minimum length of pattern here is 4. Thus, size of window would be 3. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Concept And Algorithm Consider pattern “aababc” : Window 1 : aab If mismatch happens at last character b, then we can shift this window by 1 towards right with respect to text to get nearest different character. Thus we consider shift table as {(b,1)}, ie, shift value of character b is 1. If mismatch happen at middle character a, join = b and leftover is a. Here we are sure that last character that was matched in text was b. There is no match of any part of join in leftover. So now shift table would be {(a,3), (b,1)}. If mismatch happens at first character a, join = ab and leftover is NULL. There is no match of any part of join in leftover, neither ab or b alone. Now shift table remains as {(a,3), (b,1)}. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Concept And Algorithm Window 2: aba If mismatch happens at last character a, shift value for a will be 1 which is smaller than existing value for a in table. Thus, shift table will update as {(a,1), (b,1)}. If mismatch happen at middle character b, join = a and leftover is a. Last matched character in text was a. Thus, we can shift pattern by 2 position to get align a (of join) with a (of text). But existing shift value for a is 1 which is less than 2. Thus, shift table remains as {(a,1), (b,1)}. If mismatch happens at first character a, join = ba and leftover = NULL. There is no match of any part of join in leftover. So shift table remains as {(a,1), (b,1)}. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Concept And Algorithm Window 3 : bab , Window 4 : abc Same as window 2 and window 1 . So shift table after input pattern “aababc” is {(a,1), (b,1),(c,1)}. And continue to input another pattern “jmqfm” . After calculate the shift table is {(a,1), (b,1),(c,1),(q,1), (m,1),(j,3),(f,1)}. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Concept And Algorithm Shift table after input all of pattern in pattern set : ‘*’ represents all other characters of alphabet set except those mentioned. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Algorithm

Result And Discussions 2019/5/14 Result And Discussions For a single pattern with : P: Length of single pattern , W: Size of window Mathematically runtime of this algorithm on single pattern will be O(P-W+1). Effectively, runtime complexity will be O(P) considering pattern to be much larger than window. For a set of variable length patterns with : X: Number of pattern , N: Sum of lengths of all patterns M: Average pattern length , L: Smallest pattern length, thus W=L-1 Mathematically runtime of generating shift table by presented algorithm is O(N/M*(M-W+1)). Considering, N is much larger in comparison to L, effectively runtime complexity for presented algorithm can be given as O(N). National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab

2019/5/14 Conclusion A shift table algorithm which is able to solve problem of variable length multiple string pattern matching is presented. This is inspired by Boyer Moore concept for single pattern matching. It further successfully enhances that concept to gain promising results. With use of table to search occurrence of join in leftover this algorithm takes O(|P|) time where |P| defines sum of all pattern lengths. As shown in results section, it produces shift table which works well with existing multiple string pattern matching algorithms. This shift table algorithm ensures that we are able to pre-process pattern set for successful search in case of both static as well as dynamic text. National Cheng Kung University CSIE Computer & Internet Architecture Lab CSIE CIAL Lab