1
Functional Design and Programming
Lecture 10: Regular expressions and finite state machines
2
Literature
- These notes
- Randal C. Nelson’s notes on finite automata and regular expressions: http://www.cs.rochester.edu/u/nelson/courses/csc_173/fa/
3
Exercises
Consider:
- alphabetic identifiers in Standard ML;
- file names that end with “.txt”;
- words in a text that contain the subsequence “s”, “e”, “c”, “r”, “e”, “t”;
- comments in XML.
For each of the above:
- Write a regular expression that generates the strings.
- Give a finite state automaton that recognizes the strings.
See home page for more exercises.
4
Overview
- Regular expressions
- Finite state automata
- Applications of finite automata and regular expressions
- Regular expressions and context-free grammars
- Construction of finite automata
- Implementation of deterministic finite automata
5
Regular expressions
Expressions that describe (possibly infinite) sets of strings.
Examples:
- .*\.sml : strings ending with “.sml”
- .*glei.* : strings containing the substring “glei”
6
Regular expressions: Definition
Reg. exp.   Set denoted
ε           {ε}
a           {a}
RQ          {st : s ∈ L(R), t ∈ L(Q)}
R|Q         L(R) ∪ L(Q)
R*          {s_1 s_2 ... s_n : n ≥ 0, s_i ∈ L(R)}
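The definition above can be turned into a small matcher. The following is a minimal sketch, not the lecture's own method: it uses Brzozowski derivatives, and the names regexp, nullable, deriv and matches are chosen here for illustration. The extra constructor Empty (the empty set) is not part of the definition above; it is added only because the derivative technique needs it.

```sml
(* Abstract syntax for the core regular expressions (names are illustrative). *)
datatype regexp =
    Empty                      (* the empty set; needed only by deriv *)
  | Eps                        (* the set containing just the empty string *)
  | Sym of char                (* a denotes {a} *)
  | Seq of regexp * regexp     (* RQ *)
  | Alt of regexp * regexp     (* R|Q *)
  | Star of regexp             (* R* *)

(* nullable r is true iff the empty string is in L(r). *)
fun nullable Empty = false
  | nullable Eps = true
  | nullable (Sym _) = false
  | nullable (Seq (r, q)) = nullable r andalso nullable q
  | nullable (Alt (r, q)) = nullable r orelse nullable q
  | nullable (Star _) = true

(* deriv c r denotes the strings s such that c followed by s is in L(r). *)
fun deriv c Empty = Empty
  | deriv c Eps = Empty
  | deriv c (Sym a) = if a = c then Eps else Empty
  | deriv c (Seq (r, q)) =
      let val left = Seq (deriv c r, q)
      in if nullable r then Alt (left, deriv c q) else left end
  | deriv c (Alt (r, q)) = Alt (deriv c r, deriv c q)
  | deriv c (Star r) = Seq (deriv c r, Star r)

(* matches r s is true iff s is in L(r). *)
fun matches r s =
  nullable (List.foldl (fn (c, r') => deriv c r') r (String.explode s))
```

For instance, matches (Seq (Star (Sym #"a"), Sym #"b")) "aab" checks the string “aab” against a*b. The sketch does not simplify intermediate expressions, so they can grow with the input; it is meant to mirror the definition, not to be efficient.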
7
Derived regular expressions
Reg. exp.   Definition
R+          RR*
R?          R|ε
[a-z]       a | b | ... | z
[^a-z]      disjunction of all symbols but a, b, ..., z
.           disjunction of all symbols
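Derived forms need no new constructors; they can be written as ordinary functions that expand into the core syntax. A minimal sketch, assuming a core datatype with the illustrative names below (repeated here so the block stands alone); plus, opt and range are hypothetical helper names:

```sml
(* Core syntax, repeated so this block is self-contained. *)
datatype regexp =
    Eps
  | Sym of char
  | Seq of regexp * regexp
  | Alt of regexp * regexp
  | Star of regexp

fun plus r = Seq (r, Star r)      (* R+ = RR* *)
fun opt r = Alt (r, Eps)          (* R? = R | empty string *)

(* range (lo, hi) builds lo | ... | hi; if lo >= hi it is just lo. *)
fun range (lo, hi) =
  if lo >= hi then Sym lo
  else Alt (Sym lo, range (Char.succ lo, hi))
```

With these, [a-z]+ would be written plus (range (#"a", #"z")).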
8
Finite State Machines
Finite state automaton: a description of an abstract machine with a finite number of different states and transitions on input symbols between states.
Finite state transducer: like a finite state automaton, but additionally with output symbols on transitions.
9
Finite State Automata
A finite state automaton is a 5-tuple consisting of:
- a finite set Σ of characters or symbols (alphabet),
- a finite set Q of states,
- a start state q_0 ∈ Q,
- a subset F ⊆ Q of accepting (or final) states,
- a set of transitions (q, a, q’) with q, q’ ∈ Q, a ∈ Σ, written q --a--> q’.
10
Finite state automata
An FSM accepts a string s = a_1 a_2 ... a_n if there is a sequence of transitions that starts in the start state, reads s, and ends in a final state:
q_0 --a_1--> q_1 --a_2--> ... --a_n--> q_n, with q_n ∈ F.
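The acceptance condition can be checked directly by tracking the set of states reachable after each input symbol. The following is a minimal sketch under some assumptions: states are integers, symbols are characters, the alphabet and state set are left implicit in the transition list, and the names fsa, step, accepts and endsInAb are made up for this illustration.

```sml
(* A (possibly nondeterministic) finite state automaton. *)
type fsa = { start : int,
             final : int list,
             trans : (int * char * int) list }

fun member (x, xs) = List.exists (fn y => y = x) xs

(* step m qs a: targets of all a-transitions leaving a state in qs. *)
fun step (m : fsa) qs a =
  List.map (fn (_, _, q') => q')
           (List.filter (fn (q, b, _) => b = a andalso member (q, qs))
                        (#trans m))

(* accepts m s: true iff some path labelled s leads from the start state
   to a final state. *)
fun accepts (m : fsa) s =
  let
    val reached =
      List.foldl (fn (a, qs) => step m qs a) [#start m] (String.explode s)
  in
    List.exists (fn q => member (q, #final m)) reached
  end

(* Example: strings over {a, b} that end in "ab". *)
val endsInAb : fsa =
  { start = 0,
    final = [2],
    trans = [(0, #"a", 0), (0, #"b", 0), (0, #"a", 1), (1, #"b", 2)] }
```

Here accepts endsInAb "aab" follows both a-transitions out of state 0 at once, which is why the automaton does not need to be deterministic.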
11
Finite state automata...
An FSM recognizes the language (set of strings) L which it accepts.
An FSM is deterministic if no state has more than one transition on any given symbol.
Theorem: The same class of languages over Σ is definable (recognizable) by finite state machines, deterministic finite state machines and regular expressions.
12
Applications of regular expressions and FSMs
- Text searching and processing
- State-based protocols
- Dialog/interaction control
- Hardware verification
- Protocol verification
- Programming language processing
- Natural language processing
13
Regular expressions and context-free grammars
Regular expressions can be understood as restricted CFGs: RE = CFG including * (Kleene closure), with no (mutually) recursive definitions of nonterminals.
Regular definitions: a sequence of definitions r_i = R_i for variables r_i such that each R_i is a regular expression with possible occurrences of r_1, ..., r_{i-1}.
14
Regular expressions and finite state automata
Regular expressions are often a convenient way of specifying a desired language.
Deterministic finite state machines are a good model for implementing recognition of the language efficiently.
15
Construction of finite automata
- Regular expression -> nondeterministic finite state automaton (NFA): Thompson’s construction
- NFA -> deterministic finite state automaton (DFA): subset construction
- DFA -> NFA: trivial
- Finite automaton -> regular expression: path algebra construction
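The subset construction can be sketched in a few lines. This is a simplified illustration, assuming an NFA without ε-transitions, integer states, and DFA states represented as sorted, duplicate-free lists of NFA states; the names nfa, insert, member, move and subset are hypothetical.

```sml
type nfa = { start : int,
             final : int list,
             trans : (int * char * int) list }

(* Insert into a sorted, duplicate-free integer list. *)
fun insert (x, []) = [x]
  | insert (x, y :: ys) =
      if x < y then x :: y :: ys
      else if x = y then y :: ys
      else y :: insert (x, ys)

fun member (x, xs) = List.exists (fn y => y = x) xs

(* move m qs a: the set of NFA states reached from the set qs on symbol a. *)
fun move (m : nfa) qs a =
  List.foldl (fn ((q, b, q'), acc) =>
                if b = a andalso member (q, qs) then insert (q', acc) else acc)
             [] (#trans m)

(* subset m alphabet: the reachable DFA states and the DFA transitions. *)
fun subset (m : nfa) alphabet =
  let
    fun explore ([], seen, edges) = (seen, edges)
      | explore (qs :: todo, seen, edges) =
          let
            val out = List.map (fn a => (qs, a, move m qs a)) alphabet
            (* target sets not yet discovered *)
            val new =
              List.foldl (fn ((_, _, t), acc) =>
                            if member (t, seen) orelse member (t, acc)
                            then acc else t :: acc)
                         [] out
          in
            explore (todo @ new, seen @ new, edges @ out)
          end
  in
    explore ([[#start m]], [[#start m]], [])
  end
```

A DFA state is accepting when it contains at least one final NFA state; the empty list, if it is reached, plays the role of a dead state. List append is used for brevity and would be replaced by a proper set or queue structure in a serious implementation.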
16
Implementation of deterministic FSMs (1)
Table-based implementation:
- Represent states by indexes 0, ..., n-1.
- Represent characters by indexes 0, ..., m-1 (e.g., m = 256).
- Represent transitions by a two-dimensional vector T (vector of vectors of indices, or a 2-dimensional array/vector) such that T(q, a) = SOME q’ if (q, a, q’) is a transition. [SML: Vector.sub(Vector.sub(T, q), a)]
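A minimal sketch of the table-based representation, assuming a two-symbol alphabet {a, b} instead of m = 256 to keep the table readable. The example language (a*b) and the names accepting, index, run and accepts are made up for illustration.

```sml
(* DFA for a*b. States: 0 (start), 1 (accepting). Symbols: a = 0, b = 1. *)
val T : int option vector vector =
  Vector.fromList
    [ Vector.fromList [SOME 0, SOME 1],   (* state 0: a -> 0, b -> 1 *)
      Vector.fromList [NONE, NONE] ]      (* state 1: no transitions *)

val accepting = Vector.fromList [false, true]

(* Map a character to its symbol index; NONE if outside the alphabet. *)
fun index #"a" = SOME 0
  | index #"b" = SOME 1
  | index _ = NONE

(* run q cs: follow the table from state q over the characters cs. *)
fun run q [] = Vector.sub (accepting, q)
  | run q (c :: cs) =
      (case index c of
           NONE => false
         | SOME a =>
             (case Vector.sub (Vector.sub (T, q), a) of
                  NONE => false
                | SOME q' => run q' cs))

fun accepts s = run 0 (String.explode s)
```

Each step costs two constant-time vector lookups, at the price of storing n * m table entries even when most of them are NONE.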
17
Implementation of deterministic FSMs (2)
Sparse table-based implementation:
- Represent states by indexes 0, ..., n-1.
- Represent transitions by a one-dimensional vector T of association lists such that lookup(Vector.sub(T, q), a) = SOME q’ if T(q, a) = q’.
Optimization: Use a better data structure than association lists (e.g., hash tables, search trees).
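The sparse variant stores only the transitions that exist. A minimal sketch, again for the made-up example language a*b; lookup follows the tupled call shape used above, while accepting, run and accepts are illustrative names.

```sml
(* lookup (al, a): the target state for symbol a in association list al. *)
fun lookup ([], _) = NONE
  | lookup ((b, q') :: rest, a) =
      if a = b then SOME q' else lookup (rest, a)

(* One association list per state; missing entries mean "no transition". *)
val T : (char * int) list vector =
  Vector.fromList
    [ [(#"a", 0), (#"b", 1)],   (* state 0 (start) *)
      [] ]                      (* state 1 (accepting) *)

val accepting = Vector.fromList [false, true]

(* run q cs: follow transitions from state q over the characters cs. *)
fun run q [] = Vector.sub (accepting, q)
  | run q (c :: cs) =
      (case lookup (Vector.sub (T, q), c) of
           NONE => false
         | SOME q' => run q' cs)

fun accepts s = run 0 (String.explode s)
```

Lookup is linear in the number of outgoing transitions of a state, which is the cost that hash tables or search trees would reduce.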
18
Implementation of deterministic FSMs (3)
Functional implementation: datatype S = STATE of (symbol -> S) * bool.
- Represent states by values of type S.
- Transitions are represented as part of the state (first component).
- Whether a state is accepting or not is represented by the second component.
- Execute a transition by applying the first component of the state: fun trans (STATE (t, f)) a = t a
Note: Intuitively, trans corresponds to a curried version of T: Q × Σ -> Q. It can be implemented more efficiently than the uncurried version (in principle).
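A minimal sketch of the functional representation, assuming symbol = char and using the made-up example language a*b. Standard ML only allows recursive definitions of functions, so the states are wrapped in functions of type unit -> S here; q0, q1, dead, isFinal and accepts are hypothetical names.

```sml
type symbol = char

datatype S = STATE of (symbol -> S) * bool

(* Take one transition: apply the function stored in the state. *)
fun trans (STATE (t, _)) a = t a

fun isFinal (STATE (_, f)) = f

(* DFA for a*b. Each state is built on demand by a unit function so that
   the states can refer to each other recursively. *)
fun q0 () = STATE ((fn #"a" => q0 () | #"b" => q1 () | _ => dead ()), false)
and q1 () = STATE ((fn _ => dead ()), true)
and dead () = STATE ((fn _ => dead ()), false)

(* Run the automaton from the start state over all characters of s. *)
fun accepts s =
  isFinal (List.foldl (fn (a, q) => trans q a) (q0 ()) (String.explode s))
```

Unlike the table-based versions, missing transitions are modelled here by an explicit non-accepting dead state, so trans is a total function.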
19
Other problems
- Minimize a DFA.
- Decide whether two DFAs are equivalent.