String Matching of Regular Expressions
Introduction
- Regular Expression (RE): a generalized string description built from
  - Basic strings
  - Kleene star (*)
  - Concatenation
  - Union (|)
- Nondeterministic Finite Automaton (NFA)
  - More than one next transition is possible
  - RE to NFA requires O(m) states, where m is the number of characters in the RE (operators excluded)
- Deterministic Finite Automaton (DFA)
  - Only one next transition
  - RE to DFA may require 2^m states
  - Using (m+1)(2^(m+1)|Σ|) bits
RE to NFA Construction
- Thompson's construction
  - Produces up to 2m states
  - Not a null-free NFA (it contains ε-transitions)
  - Using m(2^(2m+1)+|Σ|) bits
- Glushkov's construction
  - Produces exactly m+1 states
  - Null-free NFA
  - Using (m+1)(2^(m+1)+|Σ|) bits
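To get a feel for the space bounds, the short sketch below evaluates them for a given m and |Σ|. The formulas are taken as worst-case bounds from the abstract of the paper cited on the Reference slide; the function names are illustrative only.

```python
def classical_dfa_bits(m: int, sigma: int) -> int:
    """Worst case for a classical DFA table: (m+1) * 2^(m+1) * |Sigma| bits."""
    return (m + 1) * 2 ** (m + 1) * sigma


def thompson_bits(m: int, sigma: int) -> int:
    """Bit-parallel Thompson (Wu and Manber): m * (2^(2m+1) + |Sigma|) bits."""
    return m * (2 ** (2 * m + 1) + sigma)


def glushkov_bits(m: int, sigma: int) -> int:
    """Compact Glushkov representation: (m+1) * (2^(m+1) + |Sigma|) bits."""
    return (m + 1) * (2 ** (m + 1) + sigma)


if __name__ == "__main__":
    # m = 9 characters (as in the AT|GA((AG|AAA)*) example), DNA alphabet of size 4
    for name, f in [("classical DFA", classical_dfa_bits),
                    ("Thompson", thompson_bits),
                    ("Glushkov", glushkov_bits)]:
        print(f"{name}: {f(9, 4)} bits")
```

For m = 9 and |Σ| = 4, the Glushkov-based encoding is the smallest of the three.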
Thompson’s Construction
Thompson’s Construction Example
Glushkov Construction
- RE = (AT|GA((AG|AAA)*))
- Marked RE = (A1T2|G3A4((A5G6|A7A8A9)*)), used in the Glushkov construction
- First(RE): the set of positions at which reading can start.
  - Ex: First(A1T2|G3A4((A5G6|A7A8A9)*)) = {1, 3}
- Last(RE): the set of positions at which a string read can be recognized.
  - Ex: Last(A1T2|G3A4((A5G6|A7A8A9)*)) = {2, 4, 6, 9}
- Follow(RE, x): all the positions in RE that can be reached directly after position x.
  - Ex: Follow((A1T2|G3A4((A5G6|A7A8A9)*)), 6) = {5, 7}
- EmptyRE is {ε} if ε belongs to L(RE) and ∅ otherwise.
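The sets First, Last and Follow can be computed in one recursive pass over the parse tree of the marked RE. The sketch below is one possible way to do this, not the authors' implementation; the tuple-based tree encoding and the helper names (sym, seq, analyse) are assumptions made for illustration.

```python
def sym(char, pos):
    """Leaf node: character `char` at marked position `pos`."""
    return ("sym", char, pos)


def seq(*nodes):
    """Left-fold several nodes into binary concatenation nodes."""
    result = nodes[0]
    for node in nodes[1:]:
        result = ("cat", result, node)
    return result


def analyse(node, follow):
    """Return (first, last, nullable) for `node`; fill follow[pos] in place."""
    kind = node[0]
    if kind == "sym":
        pos = node[2]
        follow.setdefault(pos, set())
        return {pos}, {pos}, False
    if kind == "alt":
        f1, l1, n1 = analyse(node[1], follow)
        f2, l2, n2 = analyse(node[2], follow)
        return f1 | f2, l1 | l2, n1 or n2
    if kind == "cat":
        f1, l1, n1 = analyse(node[1], follow)
        f2, l2, n2 = analyse(node[2], follow)
        for x in l1:                     # last of the left part precedes first of the right
            follow[x] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return first, last, n1 and n2
    if kind == "star":
        f, l, _ = analyse(node[1], follow)
        for x in l:                      # the star loops back to its own first positions
            follow[x] |= f
        return f, l, True
    raise ValueError(f"unknown node kind: {kind}")


# Marked RE from the slide: (A1T2|G3A4((A5G6|A7A8A9)*))
RE = ("alt",
      seq(sym("A", 1), sym("T", 2)),
      seq(sym("G", 3), sym("A", 4),
          ("star", ("alt",
                    seq(sym("A", 5), sym("G", 6)),
                    seq(sym("A", 7), sym("A", 8), sym("A", 9))))))

follow = {}
first, last, nullable = analyse(RE, follow)
```

Here `nullable` plays the role of EmptyRE: it is True exactly when ε belongs to L(RE).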
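Glushkov Construction
- Create the initial set of m+1 states (state 0 plus one state per marked position)
- Mark the final states using Last(RE)
- Create the transitions using First(RE) for state 0 and Follow(RE, x) for every position x
- RE = (A1T2|G3A4((A5G6|A7A8A9)*))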
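The three steps can be sketched directly from the First, Last and Follow sets. In the sketch below the sets are hard-coded for the example RE: First, Last and Follow(6) are the values given in these slides, while the remaining Follow entries were worked out by hand and should be read as an assumption. The function names are illustrative.

```python
# Characters at marked positions 1..9 of (A1T2|G3A4((A5G6|A7A8A9)*))
MARKED = "ATGAAGAAA"
FIRST = {1, 3}
LAST = {2, 4, 6, 9}
# Follow sets; entries other than Follow(6) were derived by hand (assumption)
FOLLOW = {1: {2}, 2: set(), 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}


def build_glushkov(marked, first, last, follow):
    """Return (delta, finals) for the Glushkov NFA with states 0..m."""
    m = len(marked)
    delta = {}                              # (state, char) -> set of target states
    for src in range(m + 1):
        targets = first if src == 0 else follow[src]
        for y in targets:
            # every transition into position y is labelled with the character at y
            delta.setdefault((src, marked[y - 1]), set()).add(y)
    # state 0 would also be final if the RE accepted the empty string (not the case here)
    return delta, set(last)


def accepts(word, delta, finals):
    """Simulate the NFA on the whole word using a set of active states."""
    active = {0}
    for c in word:
        active = {y for x in active for y in delta.get((x, c), ())}
    return bool(active & finals)


DELTA, FINALS = build_glushkov(MARKED, FIRST, LAST, FOLLOW)
```

Because all transitions entering a state carry the same character, the automaton needs no ε-transitions, which is the property the bit-parallel simulation relies on.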
Bit-Parallel Automata
- Example: Shift-And automaton
  - Update function
  - State mask
  - Occurrence table
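For a plain string pattern, the Shift-And automaton can be sketched as follows. This is the textbook formulation rather than anything specific to these slides: B is the occurrence table (one bit mask per character), D is the state mask, and the update function is D ← ((D << 1) | 1) & B[c]. The function name and the choice to report end positions are assumptions, and the pattern is assumed to be non-empty.

```python
def shift_and(pattern, text):
    """Return the 0-based end positions of all occurrences of pattern in text."""
    m = len(pattern)                      # assumed to be at least 1
    # Occurrence table: bit i of B[c] is set iff pattern[i] == c
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)

    D = 0                                 # state mask: bit i set iff pattern[:i+1] matches here
    accept = 1 << (m - 1)
    ends = []
    for j, c in enumerate(text):
        # shift in a fresh start, then keep only states consistent with character c
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & accept:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(shift_and("GAAG", "AGAAGAAG"))
```

Each text character costs a constant number of word operations as long as the pattern fits in a machine word; Python integers are unbounded, so the sketch works for any pattern length.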
Thompson BPA
- Notation
  - D: state mask
  - E: null-closure of D (the states reachable from D with null input)
  - B: precomputed table
  - S: string length
  - Tj: current character
- B table: bit mask of the states reachable by each letter
  - |Σ| entries (alphabet), m+1 bits each (pattern)
Glushkov BPA
- Notation
  - D: state mask
  - T[D]: Follow of D
  - B: built by the Glushkov construction
  - Tj: current character
- T table: which states can be reached from the active states
  - 2^(m+1) entries (one per set of active states D), m+1 bits each (states)
- B table: bit mask of the states reachable by each letter
  - |Σ| entries (alphabet), m+1 bits each (pattern)
- Update: D ← T[D] & B[Tj]
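The update D ← T[D] & B[Tj] can be tried out on the example RE (A1T2|G3A4((A5G6|A7A8A9)*)). The sketch below makes several assumptions: bit x of a mask stands for state x, the Follow sets other than Follow(6) were derived by hand, T is filled naively for clarity, and state 0 is kept active after every step so that matches may start anywhere in the text. All names are illustrative.

```python
MARKED = "ATGAAGAAA"                      # characters at marked positions 1..9
FIRST = {1, 3}
LAST = {2, 4, 6, 9}
FOLLOW = {1: {2}, 2: set(), 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}


def mask(positions):
    """Bit mask with bit p set for every state p in positions."""
    return sum(1 << p for p in positions)


def build_tables(marked, first, last, follow):
    m = len(marked)
    # B[c]: states that are entered by reading character c
    B = {}
    for pos, c in enumerate(marked, start=1):
        B[c] = B.get(c, 0) | (1 << pos)
    # follow_mask[x]: states reachable from state x (state 0 uses First)
    follow_mask = [mask(first)] + [mask(follow[x]) for x in range(1, m + 1)]
    # T[D]: union of follow_mask[x] over all states x active in D (naive fill)
    T = [0] * (1 << (m + 1))
    for D in range(1 << (m + 1)):
        for x in range(m + 1):
            if (D >> x) & 1:
                T[D] |= follow_mask[x]
    return B, T, mask(last)


def search(text, B, T, final):
    """Return 0-based end positions of substrings of text that match the RE."""
    D = 1                                 # only the initial state 0 is active
    ends = []
    for j, c in enumerate(text):
        D = (T[D] & B.get(c, 0)) | 1      # "| 1" keeps state 0 active (assumption)
        if D & final:
            ends.append(j)
    return ends


B, T, FINAL = build_tables(MARKED, FIRST, LAST, FOLLOW)
```

The search loop performs one table lookup and one AND per text character, independent of how many states are active.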
Glushkov Search Algorithm
- Build B table
Glushkov Search Algorithm
- Build T table
  - Initialized to zero
  - 2^(m+1) entries (one per set of active states D), m+1 bits each (states)
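The T table can be filled in time proportional to its size by reusing entries that are already computed: the entry for a set whose highest active state is i equals the Follow mask of i ORed with the entry for the remaining states. The sketch below follows that idea; the function name, the bit numbering (bit x for state x) and the hand-derived Follow sets for the example RE are assumptions.

```python
def build_T(follow_mask):
    """follow_mask[x] is the bit mask of states reachable from state x (0..m)."""
    n = len(follow_mask)                  # n = m + 1 states
    T = [0] * (1 << n)                    # table initialized to zero
    for i in range(n):
        high = 1 << i
        for j in range(high):
            # set {i} plus the set encoded by j: reuse the entry already stored for j
            T[high + j] = follow_mask[i] | T[j]
    return T


def mask(positions):
    return sum(1 << p for p in positions)


# Example RE (A1T2|G3A4((A5G6|A7A8A9)*)); state 0 uses First = {1, 3}
FOLLOW = {1: {2}, 2: set(), 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}
FOLLOW_MASK = [mask({1, 3})] + [mask(FOLLOW[x]) for x in range(1, 10)]
T = build_T(FOLLOW_MASK)
```

Each of the 2^(m+1) entries is written exactly once with a single OR, so the fill costs O(2^m) word operations.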
Glushkov Search Algorithm
- Compute First, Last, Follow and Empty
Performance Comparison
- Forward algorithms compared
  - DFA
  - Glushkov with BuildT
  - Thompson's construction
  - Glushkov with BuildTree
- Measured per test pattern: preprocessing time and searching time
Reference
- G. Navarro and M. Raffinot. Compact DFA representation for fast regular expression search. In Proceedings of the 5th Workshop on Algorithm Engineering, number 2141 in Lecture Notes in Computer Science, pages 1-12, 2001.