1 Lexical Analysis
  Uses formalism of Regular Languages
    Regular Expressions
    Deterministic Finite Automata (DFA)
    Non-deterministic Finite Automata (NDFA)
  RE -> NDFA -> DFA -> minimal DFA
  (F)Lex uses RE as input, builds lexer
2 DFAs: Formal Definition
  DFA M = (Q, Σ, δ, q0, F)
    Q = states, a finite set
    Σ = alphabet, a finite set
    δ = transition function, a function in Q × Σ → Q
    q0 = initial/starting state, q0 ∈ Q
    F = final states, F ⊆ Q
3 DFAs: Example
  Strings over {a,b} with next-to-last symbol = a
  [State diagram: four states labeled …aa, …ab, …ba, …bb (the last two symbols read), with edges labeled a and b.]
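The example DFA can be sketched as a transition table. This is a minimal sketch, not the slide's exact diagram: the state names drop the leading "…", and starting in "bb" (as if no a had been read yet) is an assumption, since the start state is not recoverable from the slide text.

```python
# Each state records the last two symbols read.
# Assumption: the start state is "bb", i.e. "no 'a' seen in the last two positions".
DFA_DELTA = {
    ("aa", "a"): "aa", ("aa", "b"): "ab",
    ("ab", "a"): "ba", ("ab", "b"): "bb",
    ("ba", "a"): "aa", ("ba", "b"): "ab",
    ("bb", "a"): "ba", ("bb", "b"): "bb",
}
DFA_START = "bb"
# Final states are those whose next-to-last symbol is 'a'.
DFA_FINAL = {"aa", "ab"}


def dfa_accepts(word):
    """Run the DFA on `word` (a string over {a, b}) and report acceptance."""
    state = DFA_START
    for symbol in word:
        # Exactly one transition per (state, symbol): no choice involved.
        state = DFA_DELTA[(state, symbol)]
    return state in DFA_FINAL
```

For instance, `dfa_accepts("bab")` is true and `dfa_accepts("abb")` is false.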
4 Nondeterministic Finite Automata
  “Nondeterminism” implies having a choice.
  Multiple possible transitions from a state on a symbol.
    δ(q,a) is a set of states
    δ : Q × Σ → Pow(Q)
    Can be empty, so no need for error/nonsense state.
  Acceptance: does there exist a path to a final state? I.e., try all choices.
  Also allow transitions on no input (ε):
    δ : Q × (Σ ∪ {ε}) → Pow(Q)
5 NFAs: Formal Definition
  NFA M = (Q, Σ, δ, q0, F)
    Q = states, a finite set
    Σ = alphabet, a finite set
    δ = transition function, a total function in Q × (Σ ∪ {ε}) → Pow(Q)
    q0 = initial state, q0 ∈ Q
    F = final states, F ⊆ Q
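6 NFAs: Example
  Strings over {a,b} with next-to-last symbol = a
  Loop until we “guess” which is the next-to-last a.
  [State diagram: the start state loops on a and b; an a-edge leaves it, and one further symbol leads to the final state.]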
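"Try all choices" can be simulated by tracking the set of states the NFA could be in. A minimal sketch, assuming a three-state version of this example; the state names q0, q1, q2 are hypothetical, not taken from the slide.

```python
# Transition function into sets of states; a missing entry means the empty set.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},  # the "guess": stay in the loop, or bet this a is next-to-last
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
NFA_START = "q0"
NFA_FINAL = {"q2"}


def nfa_accepts(word):
    """Accept if some sequence of choices ends in a final state."""
    current = {NFA_START}
    for symbol in word:
        # Follow every possible transition from every state we might be in.
        current = {t for s in current for t in NFA_DELTA.get((s, symbol), set())}
    return bool(current & NFA_FINAL)
```

Note that q2 has no outgoing transitions at all: its δ entries are empty sets, so no explicit error state is needed.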
7 NFAs: Example
  Strings over {0,1,2} having (either 0-or-more 0’s or 0-or-more 1’s) followed by 0-or-more 2’s
  [State diagram: states labeled 0s, 1s, and 2s, with edges labeled 0, 1, and 2.]
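With ε-transitions, the simulation also has to follow moves on no input. A minimal sketch, assuming one plausible layout for this example: a hypothetical state "start" with ε-edges to "0s" and "1s", and ε-edges from each of those to "2s". The slide's actual edges are not recoverable from the text.

```python
EPSILON = ""  # label used here for ε-moves (an assumption of this sketch)

EPS_DELTA = {
    ("start", EPSILON): {"0s", "1s"},  # guess: a run of 0s or a run of 1s
    ("0s", "0"): {"0s"},
    ("0s", EPSILON): {"2s"},
    ("1s", "1"): {"1s"},
    ("1s", EPSILON): {"2s"},
    ("2s", "2"): {"2s"},
}
EPS_FINAL = {"2s"}


def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in EPS_DELTA.get((s, EPSILON), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def eps_nfa_accepts(word):
    """Simulate the ε-NFA by alternating symbol moves and ε-closures."""
    current = eps_closure({"start"})
    for symbol in word:
        moved = {t for s in current for t in EPS_DELTA.get((s, symbol), set())}
        current = eps_closure(moved)
    return bool(current & EPS_FINAL)
```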
8 Regular Expressions
  Regular expression (over Σ):
    ∅
    ε
    a, where a ∈ Σ
    r+r’, r r’, r*, where r, r’ regular (over Σ)
  Notational shorthand:
    r^0 = ε, r^i = r r^(i-1)
    r^+ = r r*
9 RE -> NFA
  Defined inductively on structure of RE.
  This construction produces NFA with single final state.
  6 cases: ∅, ε, a, r’+r’’, r’r’’, r’*
10 RE -> NFA: ∅
  Accepts nothing since no edge to final state.
  [Diagram: states q0 and qf, with no edge to qf.]
11 RE -> NFA: ε
  [Diagram: state q0.]
12 RE -> NFA: a
  [Diagram: q0 and qf, joined by an edge labeled a.]
13 RE -> NFA: r’+r’’
  ε edges guess whether to use r’ or r’’.
  [Diagram: q0 and qf around the sub-automata for r’ (q’0 to q’f) and r’’ (q’’0 to q’’f).]
14 RE -> NFA: r’r’’
  Could conflate q0 with q’0, q’’f with qf.
  [Diagram: q0, then the sub-automaton for r’ (q’0 to q’f), then the sub-automaton for r’’ (q’’0 to q’’f), then qf.]
15 RE -> NFA: r’*
  Can loop r’ as many times as desired or skip it.
  [Diagram: q0 and qf around the sub-automaton for r’ (q’0 to q’f).]
16 RE -> NFA: Example
  (0+01)*
  [Diagram: NFA assembled from the sub-automata for 0 and 01.]
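The six cases can be sketched as a recursive builder and tried on (0+01)*. This is a sketch under stated assumptions, not the slides' exact diagrams: the tuple encoding of regular expressions is hypothetical, the ε case uses a separate qf reached by an ε-edge, and sequencing links the parts with ε-edges rather than conflating states.

```python
import itertools

EPS = None  # label for ε-edges (assumption of this sketch)
_fresh = itertools.count()


def build(regex, edges):
    """Return (q0, qf) for `regex`, appending (src, label, dst) edges to `edges`.

    Hypothetical encoding: ("empty",), ("eps",), ("sym", a),
    ("alt", r1, r2), ("cat", r1, r2), ("star", r).
    """
    q0, qf = next(_fresh), next(_fresh)
    kind = regex[0]
    if kind == "empty":
        pass  # no edge to qf, so nothing is accepted
    elif kind == "eps":
        edges.append((q0, EPS, qf))
    elif kind == "sym":
        edges.append((q0, regex[1], qf))
    elif kind == "alt":
        for sub in regex[1:]:  # ε-edges guess which alternative to use
            s0, sf = build(sub, edges)
            edges.extend([(q0, EPS, s0), (sf, EPS, qf)])
    elif kind == "cat":
        a0, af = build(regex[1], edges)
        b0, bf = build(regex[2], edges)
        edges.extend([(q0, EPS, a0), (af, EPS, b0), (bf, EPS, qf)])
    elif kind == "star":
        s0, sf = build(regex[1], edges)
        # enter, loop back, skip entirely, or leave after an iteration
        edges.extend([(q0, EPS, s0), (sf, EPS, s0), (q0, EPS, qf), (sf, EPS, qf)])
    else:
        raise ValueError(f"unknown regex node: {kind!r}")
    return q0, qf


def closure(states, edges):
    """States reachable from `states` through ε-edges only."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result


def matches(regex, word):
    """Build the NFA for `regex` and simulate it on `word`."""
    edges = []
    q0, qf = build(regex, edges)
    current = closure({q0}, edges)
    for symbol in word:
        moved = {d for s, label, d in edges if s in current and label == symbol}
        current = closure(moved, edges)
    return qf in current


# (0+01)* from the slide
EXAMPLE = ("star", ("alt", ("sym", "0"), ("cat", ("sym", "0"), ("sym", "1"))))
```

Every case allocates two fresh states, so the result has a single final state but many more states than necessary.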
17 RE -> NFA: Notes
  Most constructions produce very large NFAs.
    Not optimal for size.
    But easy to construct.
18 NFA -> DFA
  Subset Construction
    Complicated but well described in the text
    Section (pp ), Algorithm 3.20 (2nd edition)
    In section 3.6 (pp ) in 1st edition
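The idea of the subset construction is that each DFA state stands for the set of NFA states reachable so far. The following is a sketch of the standard construction in that spirit, not a transcription of the textbook's algorithm; the function names and the dictionary encoding of δ are assumptions.

```python
from collections import deque

EPS_LABEL = ""  # label for ε-moves (assumption of this sketch)


def closure_of(states, delta):
    """ε-closure of a set of NFA states, returned as a hashable frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS_LABEL), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(delta, start, finals, alphabet):
    """Return (d_start, d_delta, d_finals, d_states) for the equivalent DFA."""
    d_start = closure_of({start}, delta)
    d_delta = {}
    d_states = {d_start}
    queue = deque([d_start])
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            moved = {t for s in subset for t in delta.get((s, a), set())}
            target = closure_of(moved, delta)
            d_delta[(subset, a)] = target
            if target not in d_states:  # a subset not seen before: explore it
                d_states.add(target)
                queue.append(target)
    # A subset is final if it contains at least one final NFA state.
    d_finals = {subset for subset in d_states if subset & set(finals)}
    return d_start, d_delta, d_finals, d_states


# Hypothetical three-state NFA for "next-to-last symbol = a".
NTL_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"},
    ("q1", "b"): {"q2"},
}
D_START, D_DELTA, D_FINALS, D_STATES = subset_construction(NTL_DELTA, "q0", {"q2"}, "ab")
```

Only reachable subsets are generated. On this three-state NFA the result has four DFA states, the same count as the DFA example for this language.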
19 Minimizing DFA
  Partition states of DFA, D, into two sets: final states and non-final states.
  Continue until no more partitions are needed:
    For each partition, P, split the DFA states of P so that, within each subpartition, all DFA states go to the same partition on each input symbol, x.
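The refinement loop above can be sketched as follows. This is a minimal sketch assuming a total transition function stored as a dictionary; the function name and the five-state example DFA (the next-to-last-a DFA plus a redundant hypothetical "start" state) are illustrative assumptions.

```python
def minimize_partitions(states, alphabet, delta, finals):
    """Return the final partition of `states`; each block is one minimal-DFA state."""
    finals = set(finals)
    # Initial split: final states versus non-final states (drop an empty side).
    partition = [block for block in (finals, set(states) - finals) if block]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # Which block does s go to on each input symbol?
                signature = tuple(block_of[delta[(s, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # nothing was split: done
            return refined
        partition = refined


MIN_DELTA = {
    ("start", "a"): "ba", ("start", "b"): "bb",
    ("aa", "a"): "aa", ("aa", "b"): "ab",
    ("ab", "a"): "ba", ("ab", "b"): "bb",
    ("ba", "a"): "aa", ("ba", "b"): "ab",
    ("bb", "a"): "ba", ("bb", "b"): "bb",
}
BLOCKS = minimize_partitions(
    ["start", "aa", "ab", "ba", "bb"], "ab", MIN_DELTA, {"aa", "ab"}
)
```

Here "start" and "bb" end up in the same block because they move to the same blocks on both a and b, so the minimal DFA has four states.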