String Matching of Regular Expression


1 String Matching of Regular Expression

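2 Introduction
Regular expression (RE): a generalized string description built from basic strings, Kleene star (*), concatenation, and union (|). Here m is the number of characters in the RE, excluding operator symbols.
Nondeterministic finite automaton (NFA): a state may have more than one next transition. Converting an RE to an NFA requires O(m) states.
Deterministic finite automaton (DFA): a state has only one next transition for each input character. Converting an RE to a DFA may require up to 2^m states, and a classical DFA representation uses (m+1) 2^(m+1) |Σ| bits.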

3 RE to NFA Construction
Thompson’s construction: produces up to 2m states; the NFA is not null-free; uses m(2^(m+1) + |Σ|) bits.
Glushkov’s construction: produces exactly m+1 states; the NFA is null-free; uses (m+1)(2^(m+1) + |Σ|) bits.

4 Thompson’s Construction

5 Thompson’s Construction
Example

6 Glushkov Construction
RE = (AT|GA((AG|AAA)∗))
Marked RE = (A1T2|G3A4((A5G6|A7A8A9)∗))
The Glushkov construction uses the following sets:
First(RE): the set of positions at which reading can start. Ex: First((A1T2|G3A4((A5G6|A7A8A9)∗))) = {1, 3}.
Last(RE): the set of positions at which a string that has been read can be recognized. Ex: Last((A1T2|G3A4((A5G6|A7A8A9)∗))) = {2, 4, 6, 9}.
Follow(RE, x): all the positions in RE reachable from position x. Ex: Follow((A1T2|G3A4((A5G6|A7A8A9)∗)), 6) = {5, 7}.
EmptyRE: {ε} if ε belongs to L(RE), and ∅ otherwise.
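The definitions above can be checked mechanically. The following is a minimal sketch, assuming the marked RE is given as a small syntax tree; the class names (Sym, Cat, Alt, Star) and the helper word() are illustrative and do not come from the slides.

```python
from dataclasses import dataclass

# Illustrative syntax-tree nodes for a marked regular expression.
@dataclass(frozen=True)
class Sym:            # one character with its position number
    char: str
    pos: int

@dataclass(frozen=True)
class Cat:            # concatenation
    left: object
    right: object

@dataclass(frozen=True)
class Alt:            # union (|)
    left: object
    right: object

@dataclass(frozen=True)
class Star:           # Kleene star
    inner: object

def nullable(n):
    """True if the empty string belongs to the language of n."""
    if isinstance(n, Sym):
        return False
    if isinstance(n, Cat):
        return nullable(n.left) and nullable(n.right)
    if isinstance(n, Alt):
        return nullable(n.left) or nullable(n.right)
    return True  # Star

def first(n):
    if isinstance(n, Sym):
        return {n.pos}
    if isinstance(n, Alt):
        return first(n.left) | first(n.right)
    if isinstance(n, Cat):
        return first(n.left) | (first(n.right) if nullable(n.left) else set())
    return first(n.inner)  # Star

def last(n):
    if isinstance(n, Sym):
        return {n.pos}
    if isinstance(n, Alt):
        return last(n.left) | last(n.right)
    if isinstance(n, Cat):
        return last(n.right) | (last(n.left) if nullable(n.right) else set())
    return last(n.inner)  # Star

def follow(n, table=None):
    """Return a dict mapping every position x to Follow(RE, x)."""
    if table is None:
        table = {}
    if isinstance(n, Sym):
        table.setdefault(n.pos, set())
    elif isinstance(n, Alt):
        follow(n.left, table)
        follow(n.right, table)
    elif isinstance(n, Cat):
        follow(n.left, table)
        follow(n.right, table)
        for p in last(n.left):          # end of left part continues into right part
            table[p] |= first(n.right)
    else:  # Star: the end of the body loops back to its start
        follow(n.inner, table)
        for p in last(n.inner):
            table[p] |= first(n.inner)
    return table

def word(chars, start):
    """Concatenate the characters of a string, numbering positions from start."""
    node = Sym(chars[0], start)
    for i, c in enumerate(chars[1:], start + 1):
        node = Cat(node, Sym(c, i))
    return node

# Marked RE = (A1T2|G3A4((A5G6|A7A8A9)*))
RE = Alt(word("AT", 1),
         Cat(word("GA", 3), Star(Alt(word("AG", 5), word("AAA", 7)))))

print(sorted(first(RE)), sorted(last(RE)), sorted(follow(RE)[6]))
```

For the example RE this reproduces the three sets listed on the slide, and nullable(RE) is False, so EmptyRE is ∅.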

7 Glushkov Construction
Start with an initial set of m+1 states. Mark the final states using Last(RE). Create the transition links from the initial state using First(RE), and from every other state x using Follow(RE, x).
RE = (A1T2|G3A4((A5G6|A7A8A9)∗))
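As an illustration of these steps, here is a minimal sketch that builds the Glushkov NFA for the example RE from hard-coded First, Last and Follow sets and runs it on whole strings. The function names build_glushkov and accepts are hypothetical, the Follow sets other than Follow(RE, 6) were worked out by hand for this sketch, and state 0 is assumed not to be final because the example RE does not accept the empty string.

```python
# Position labels and sets for the marked RE (A1T2|G3A4((A5G6|A7A8A9)*)).
LABEL = {1: "A", 2: "T", 3: "G", 4: "A", 5: "A", 6: "G", 7: "A", 8: "A", 9: "A"}
FIRST = {1, 3}
LAST = {2, 4, 6, 9}
FOLLOW = {1: {2}, 2: set(), 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}

def build_glushkov(label, first, last, follow):
    """Return (delta, finals) for states 0..m, where state 0 is the initial state."""
    delta = {}

    def link(src, targets):
        # Every arrow entering state t is labelled with the character at position t.
        for t in targets:
            delta.setdefault((src, label[t]), set()).add(t)

    link(0, first)                      # initial state uses First(RE)
    for x, targets in follow.items():   # every other state uses Follow(RE, x)
        link(x, targets)
    return delta, set(last)

def accepts(delta, finals, text):
    """Simulate the NFA on the whole text with a set of active states."""
    active = {0}
    for ch in text:
        active = set().union(*(delta.get((s, ch), set()) for s in active))
    return bool(active & finals)

DELTA, FINALS = build_glushkov(LABEL, FIRST, LAST, FOLLOW)
for s in ["AT", "GA", "GAAG", "GAAAA", "GAA"]:
    print(s, accepts(DELTA, FINALS, s))
```

The automaton has no null transitions, which is the null-free property the slides attribute to Glushkov's construction.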

8 Bit Parallel Automata
Ex: the Shift-And automaton, described by its update function, state mask, and occurrence table.
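For reference, a minimal sketch of Shift-And for plain string matching follows. It assumes the usual formulation, in which bit i of the state mask D is set when the first i+1 pattern characters match the text ending at the current position; the function name shift_and is illustrative.

```python
def shift_and(pattern, text):
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # Occurrence table: B[c] has bit i set when pattern[i] == c.
    B = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)   # bit of the last pattern position
    D = 0                   # state mask, no prefix matched yet
    hits = []
    for j, ch in enumerate(text):
        # Update function: advance every matched prefix by one character,
        # start a new prefix, and keep only those consistent with ch.
        D = ((D << 1) | 1) & B.get(ch, 0)
        if D & accept:
            hits.append(j - m + 1)
    return hits

print(shift_and("GAAG", "TGAAGAAG"))
```

Python integers are unbounded, so the sketch does not need the pattern to fit in one machine word; a C implementation would normally require m to be at most the word size.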

9 Thompson BPA
Notation:
D: state mask.
E: null-closure of D, that is, the states reachable from D with null input.
B: precomputed table holding the bit mask of the states reachable by each letter; it has |Σ| entries (alphabet), each m+1 bits wide (pattern).
S: string length.
Tj: current text character.

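10 Glushkov BPA
Notation:
D: state mask.
T[D]: Follow of D.
B: built by the Glushkov construction.
Tj: current text character.
T table: which states can be reached from the active states; it has one entry for each of the 2^(m+1) possible values of D, each m+1 bits wide (states).
B table: bit mask of the states reachable by each letter; it has |Σ| entries (alphabet), each m+1 bits wide (pattern).
The new D is obtained by combining the two table lookups with a bitwise AND (&).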
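The table-driven search can be sketched as follows for the example RE (A1T2|G3A4((A5G6|A7A8A9)∗)). This is an illustration under stated assumptions, not the presenter's code: bit i of D stands for state i, the First and Follow sets are hard-coded, the names build_b, build_t and search are hypothetical, and the initial state is kept active at every step so that a match may start at any text position.

```python
LABEL = {1: "A", 2: "T", 3: "G", 4: "A", 5: "A", 6: "G", 7: "A", 8: "A", 9: "A"}
FIRST = {1, 3}
LAST = {2, 4, 6, 9}
FOLLOW = {1: {2}, 2: set(), 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}
M = len(LABEL)  # m = 9 positions, so m+1 = 10 states

def mask(states):
    """Bit mask with bit s set for every state s."""
    bits = 0
    for s in states:
        bits |= 1 << s
    return bits

def build_b(label):
    """B[c]: mask of the states whose incoming transitions are labelled c."""
    B = {}
    for pos, ch in label.items():
        B[ch] = B.get(ch, 0) | (1 << pos)
    return B

def build_t(first, follow, m):
    """T[D]: mask of all states reachable in one step from the states in D."""
    reach = [mask(first)] + [mask(follow[i]) for i in range(1, m + 1)]
    T = [0] * (1 << (m + 1))            # initialised to zero, 2^(m+1) entries
    for D in range(1, 1 << (m + 1)):
        low = D & -D                    # lowest active state in D
        T[D] = T[D ^ low] | reach[low.bit_length() - 1]
    return T

def search(text, B, T, final):
    """Return every text index at which some match of the RE ends."""
    D = 1                               # only the initial state 0 is active
    ends = []
    for j, ch in enumerate(text):
        # One transition: states reachable from D, filtered by the current character.
        # OR-ing in bit 0 is this sketch's stand-in for a self-loop on state 0.
        D = (T[D] & B.get(ch, 0)) | 1
        if D & final:
            ends.append(j)
    return ends

B = build_b(LABEL)
T = build_t(FIRST, FOLLOW, M)
FINAL = mask(LAST)
print(search("CCGAAGT", B, T, FINAL))
```

The single AND works because, in a Glushkov automaton, all transitions entering a given state carry the same character, so filtering T[D] by B[Tj] gives exactly the next set of active states.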

11 Glushkov Search Algorithm
Build B Table

12 Glushkov Search Algorithm
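Build the T table, with every entry initialized to zero. The table has one entry for each of the 2^(m+1) possible sets of active states D, and each entry is m+1 bits wide (one bit per state).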

13 Glushkov Search Algorithm
Compute First, Last, Follow and Empty

14 Performance Comparison
Forward algorithms compared: DFA, Glushkov with BuildT, Thompson’s construction, and Glushkov with BuildTree.
Measured for each test pattern: preprocessing time and searching time.

15 Reference
G. Navarro and M. Raffinot. Compact DFA representation for fast regular expression search. In Proceedings of the 5th Workshop on Algorithm Engineering, number 2141 in Lecture Notes in Computer Science, pages 1-12, 2001.

