1 Regular Expressions
Definitions
Equivalence to Finite Automata
2 RE’s: Introduction
• Regular expressions describe languages by an algebra.
• They describe exactly the regular languages.
• If E is a regular expression, then L(E) is the language it defines.
• We’ll describe RE’s and their languages recursively.
3 Operations on Languages
• RE’s use three operations: union, concatenation, and Kleene star.
• The union of languages is ordinary set union, since languages are sets.
• Example: {01, 111, 10} ∪ {00, 01} = {01, 111, 10, 00}.
4 Concatenation
• The concatenation of languages L and M is denoted LM.
• It contains every string wx such that w is in L and x is in M.
• Example: {01, 111, 10}{00, 01} = {0100, 0101, 11100, 11101, 1000, 1001}.
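A minimal sketch of union and concatenation, assuming finite languages are modeled as Python sets of strings. The helper name `concat` is an illustrative choice, not something from the slides.

```python
# Finite languages modeled as sets of strings (an assumption for illustration).
L = {"01", "111", "10"}
M = {"00", "01"}

def concat(L, M):
    """Return LM: every string wx with w in L and x in M."""
    return {w + x for w in L for x in M}

union = L | M            # union is plain set union
product = concat(L, M)   # concatenation pairs every w with every x

print(sorted(union))
print(sorted(product))
```

Because the result is a set, a string that can be formed in more than one way still appears only once.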
5 Kleene Star
• If L is a language, then L*, the Kleene star or just “star,” is the set of strings formed by concatenating zero or more strings from L, in any order.
• L* = {ε} ∪ L ∪ LL ∪ LLL ∪ …
• Example: {0, 10}* = {ε, 0, 10, 00, 010, 100, 1010, …}
• L^+ = LL* = L ∪ LL ∪ LLL ∪ …
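Since L* is usually infinite, a program can only enumerate part of it. The sketch below lists the strings of L* up to a length bound; the function name `star_up_to` and the bound are assumptions made for illustration.

```python
def concat(L, M):
    """Return LM: every string wx with w in L and x in M."""
    return {w + x for w in L for x in M}

def star_up_to(L, max_len):
    """Strings of L* whose length is at most max_len."""
    result = {""}      # concatenating zero strings gives ε
    frontier = {""}
    while frontier:
        # Append one more string from L; keep only new strings within the bound.
        frontier = {s for s in concat(frontier, L) if len(s) <= max_len} - result
        result |= frontier
    return result

print(sorted(star_up_to({"0", "10"}, 3), key=lambda s: (len(s), s)))
```

The loop stops once a round adds nothing new, which must happen because only finitely many strings fit under the bound.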
6 RE’s: Definition
• Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.
  ◦ Note: {a} is the language containing one string, and that string is of length 1.
• Basis 2: ε is a RE, and L(ε) = {ε}.
• Basis 3: ∅ is a RE, and L(∅) = ∅.
7 RE’s: Definition – (2)
• Induction 1: If E1 and E2 are regular expressions, then E1 + E2 is a regular expression, and L(E1 + E2) = L(E1) ∪ L(E2).
• Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular expression, and L(E1E2) = L(E1)L(E2).
• Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*.
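The three basis cases and three inductive cases map naturally onto a small syntax tree. The sketch below is one possible encoding, with hypothetical class names (`Sym`, `Eps`, `Empty`, `Union`, `Concat`, `Star`) and a function `lang` that computes only the strings of L(E) up to a length bound n, since L(E) may be infinite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:      # Basis 1: a single symbol a
    a: str

@dataclass(frozen=True)
class Eps:      # Basis 2: ε
    pass

@dataclass(frozen=True)
class Empty:    # Basis 3: ∅
    pass

@dataclass(frozen=True)
class Union:    # Induction 1: E1 + E2
    e1: object
    e2: object

@dataclass(frozen=True)
class Concat:   # Induction 2: E1 E2
    e1: object
    e2: object

@dataclass(frozen=True)
class Star:     # Induction 3: E*
    e: object

def lang(e, n):
    """Strings of L(e) of length at most n, one case per rule of the definition."""
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Union):
        return lang(e.e1, n) | lang(e.e2, n)
    if isinstance(e, Concat):
        return {w + x for w in lang(e.e1, n) for x in lang(e.e2, n) if len(w + x) <= n}
    if isinstance(e, Star):
        base, result, frontier = lang(e.e, n), {""}, {""}
        while frontier:  # keep appending strings of L(E) until nothing new fits
            frontier = {w + x for w in frontier for x in base if len(w + x) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

example = Concat(Sym("0"), Union(Sym("1"), Sym("0")))   # 0(1+0)
print(sorted(lang(example, 4)))
```

Truncating at each step is safe because every piece of a string of length at most n also has length at most n.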
8 Precedence of Operators
• Parentheses may be used wherever needed to influence the grouping of operators.
• Order of precedence is * (highest), then concatenation, then + (lowest).
  ◦ Example: 01* + 1 is grouped as (0(1*)) + 1.
9 Examples: RE’s
• L(01) = {01}.
• L(01+0) = {01, 0}.
• L(0(1+0)) = {01, 00}.
  ◦ Note order of precedence of operators.
• L(0*) = {ε, 0, 00, 000, …}.
• L((0+10)*(ε+1)) = all strings of 0’s and 1’s without two consecutive 1’s.
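The last example can be spot-checked by brute force with Python's `re` module. Note the notation difference: Python writes union as `|` (its `+` means "one or more"), and (ε+1) becomes `1?`. The length bound of 8 is an arbitrary choice, so this is a sanity check rather than a proof.

```python
import re
from itertools import product

# (0+10)*(ε+1) from the slide, translated into Python regex syntax.
pattern = re.compile(r"(0|10)*1?")

def matches(s):
    """True if the whole string s is in the language of the pattern."""
    return pattern.fullmatch(s) is not None

# Collect every binary string of length 0..8 on which the RE and the
# "no two consecutive 1's" description disagree.
mismatches = [
    s
    for n in range(9)
    for s in ("".join(bits) for bits in product("01", repeat=n))
    if matches(s) != ("11" not in s)
]
print(mismatches)
```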
10 Exercise
• Which answer is correct in each case?
  ◦ ∅L = ∅ or L or {ε}?
  ◦ {ε}L = ∅ or L or {ε}?
• L((0+1)*0 + 1(0+1)*) =
  a. All nonempty strings over {0,1} with odd length
  b. All nonempty strings over {0,1} with 0 as the last character OR 1 as the first character
11 Equivalence of RE’s and Finite Automata
• We need to show that for every RE, there is a finite automaton that accepts the same language.
  ◦ Pick the most powerful automaton type: the ε-NFA.
• And we need to show that for every finite automaton, there is a RE defining its language.
  ◦ Pick the most restrictive type: the DFA.
12 Converting a RE to an ε-NFA
• Proof is an induction on the number of operators (+, concatenation, *) in the RE.
• We always construct an automaton of a special form (next slide).
13 Form of ε-NFA’s Constructed
[Figure: the constructed ε-NFA, with one start state and one “final” state.]
• No arcs from outside, no arcs leaving.
• Start state: the only state with external predecessors.
• “Final” state: the only state with external successors.
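14 RE to ε-NFA: Basis
[Figure: three two-state automata, one per basis case.]
• Symbol a: one arc labeled a from the start state to the final state.
• ε: one arc labeled ε from the start state to the final state.
• ∅: no arc from the start state to the final state.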
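15 RE to ε-NFA: Induction 1 – Union
[Figure: the automaton for E1 + E2, built from the automata for E1 and for E2 with four ε arcs: from a new start state to the start state of each, and from the final state of each to a new final state.]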
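16 RE to ε-NFA: Induction 2 – Concatenation
[Figure: the automaton for E1E2, built from the automaton for E1 followed by the automaton for E2, joined by one ε arc from the final state of the first to the start state of the second.]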
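17 RE to ε-NFA: Induction 3 – Closure
[Figure: the automaton for E*, built from the automaton for E with four ε arcs: from a new start state to the start state of E, from the final state of E to a new final state, from the final state of E back to its start state, and from the new start state directly to the new final state.]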
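The basis and the three inductive constructions can be sketched as a recursive builder. This is an illustrative sketch under several assumptions: REs are encoded as nested tuples such as `("union", e1, e2)`, states are integers, the label `None` stands for ε, and the names `Builder`, `eclose`, and `accepts` are hypothetical. Each call to `build` returns a (start, final) pair, matching the one-start, one-final form of the constructed automata.

```python
from collections import defaultdict
from itertools import count

class Builder:
    """Builds ε-NFA pieces, each with exactly one start and one final state."""
    def __init__(self):
        self.delta = defaultdict(set)   # (state, label) -> next states; label None is ε
        self.ids = count()

    def arc(self, p, label, q):
        self.delta[(p, label)].add(q)

    def build(self, e):
        kind = e[0]
        if kind == "concat":
            s1, f1 = self.build(e[1])
            s2, f2 = self.build(e[2])
            self.arc(f1, None, s2)               # one ε arc joins the two pieces
            return s1, f2
        s, f = next(self.ids), next(self.ids)    # fresh start and final states
        if kind == "sym":
            self.arc(s, e[1], f)
        elif kind == "eps":
            self.arc(s, None, f)
        elif kind == "empty":
            pass                                  # no arc, so f is unreachable
        elif kind == "union":
            for sub in e[1:]:
                si, fi = self.build(sub)
                self.arc(s, None, si)
                self.arc(fi, None, f)
        elif kind == "star":
            si, fi = self.build(e[1])
            self.arc(s, None, si)                 # enter E
            self.arc(fi, None, si)                # repeat E
            self.arc(fi, None, f)                 # leave after one or more passes
            self.arc(s, None, f)                  # zero passes through E
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return s, f

def eclose(delta, states):
    """All states reachable from `states` using ε arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, None), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(e, w):
    """Build the ε-NFA for e and simulate it on the string w."""
    b = Builder()
    start, final = b.build(e)
    current = eclose(b.delta, {start})
    for ch in w:
        moved = set()
        for p in current:
            moved |= b.delta.get((p, ch), set())
        current = eclose(b.delta, moved)
    return final in current

# (0+10)*(ε+1) in the tuple encoding assumed above.
no_11 = ("concat",
         ("star", ("union", ("sym", "0"), ("concat", ("sym", "1"), ("sym", "0")))),
         ("union", ("eps",), ("sym", "1")))
print(accepts(no_11, "0101"), accepts(no_11, "0110"))
```

Rebuilding the automaton on every call to `accepts` keeps the sketch short; a real implementation would build once and reuse it.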
18 Exercise
• What is the correct RE for the ε-NFA on the previous slide if we remove the ε-edge from the leftmost state to the rightmost state?
  a. E*
  b. EE*
  c. ε
  d. Eε
19 DFA-to-RE
• A strange sort of induction.
• States of the DFA are named 1, 2, …, n.
• Induction is on k, the maximum state number we are allowed to traverse along a path.
20 k-Paths
• A k-path is a path through the graph of the DFA that goes through no state numbered higher than k.
• Endpoints are not restricted; they can be any state.
• n-paths are unrestricted.
• The RE for the DFA is the union of the RE’s for the n-paths from the start state to each final state.
21 Example: k-Paths
[Figure: the example DFA, not shown.]
• 0-paths from 2 to 3: RE for labels = 0.
• 1-paths from 2 to 3: RE for labels = (not shown)
• 2-paths from 2 to 3: RE for labels = (10)*0+1(01)*1
• 3-paths from 2 to 3: RE for labels = ??
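22 DFA-to-RE
• Basis: k = 0; only arcs or a node by itself.
• Induction: construct RE’s for paths allowed to pass through state k from paths allowed only up to k-1.
• Writing R_{ij}^{(k)} for the RE of the k-paths from state i to state j:
  R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}
  ◦ R_{ij}^{(k-1)}: doesn’t go through k.
  ◦ R_{ik}^{(k-1)}: goes from i to k the first time.
  ◦ (R_{kk}^{(k-1)})^*: zero or more times from k to k.
  ◦ R_{kj}^{(k-1)}: then, from k to j.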
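The k-path induction can be sketched directly as a table computation. The code below is an illustration under stated assumptions: it emits Python `re` syntax (union written `|`, not `+`), represents ∅ as `None` and ε as the empty string, and uses hypothetical helper names. The two-state DFA for "an odd number of 1's" is an example chosen here, not the DFA from the slides.

```python
import re

def union(r, s):
    if r is None:                 # ∅ is the identity for union
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"

def concat(r, s):
    if r is None or s is None:    # ∅ annihilates concatenation
        return None
    return r + s                  # ε is "", so it drops out automatically

def star(r):
    return "" if r is None or r == "" else f"(?:{r})*"   # ∅* = ε* = ε

def dfa_to_re(n, delta, start, finals):
    """States are 1..n; delta maps (state, symbol) -> state."""
    # Basis k = 0: single arcs, plus ε when i == j (a node by itself).
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = "" if i == j else None
            for (p, a), q in delta.items():
                if p == i and q == j:
                    r = union(r, re.escape(a))
            R[i, j] = r
    # Induction: additionally allow paths to pass through state k.
    for k in range(1, n + 1):
        R = {
            (i, j): union(R[i, j], concat(concat(R[i, k], star(R[k, k])), R[k, j]))
            for i in range(1, n + 1)
            for j in range(1, n + 1)
        }
    # Union of the n-path REs from the start state to each final state.
    result = None
    for f in finals:
        result = union(result, R[start, f])
    return result

# Hypothetical DFA: state 1 = even number of 1's (start), state 2 = odd (final).
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
regex = dfa_to_re(2, delta, 1, {2})
print(regex)
print(bool(re.fullmatch(regex, "0100")), bool(re.fullmatch(regex, "0110")))
```

The resulting expression is correct but far from minimal; the algebraic laws for RE's are what one would use to simplify it.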
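23 Illustration of Induction
[Figure: states i, j, and k, with the states numbered below k grouped together. Labeled paths: paths from i to j not going through k; the path from i to k; paths from k to k, taken several times; the path from k to j.]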
24 Summary
• The three types of automata we discussed (DFA, NFA, ε-NFA) and regular expressions all define exactly the same set of languages: the regular languages.
25 Algebraic Laws for RE’s
• Union and concatenation behave sort of like addition and multiplication.
  ◦ + is commutative and associative; concatenation is associative.
  ◦ Concatenation distributes over +.
  ◦ Exception: Concatenation is not commutative.
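These laws can be spot-checked on small finite languages. The three sample languages below are arbitrary choices for illustration, and passing checks on samples illustrates the laws without proving them.

```python
def concat(L, M):
    """Return LM: every string wx with w in L and x in M."""
    return {w + x for w in L for x in M}

L, M, N = {"0", "01"}, {"1"}, {"", "10"}

assert L | M == M | L                                       # + is commutative
assert (L | M) | N == L | (M | N)                           # + is associative
assert concat(concat(L, M), N) == concat(L, concat(M, N))   # concatenation is associative
assert concat(L, M | N) == concat(L, M) | concat(L, N)      # distributes from the left
assert concat(M | N, L) == concat(M, L) | concat(N, L)      # distributes from the right
assert concat(L, M) != concat(M, L)                         # concatenation is not commutative
```

Because concatenation is not commutative, the left and right distributive laws have to be checked separately.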
26 Identities and Annihilators
• ∅ is the identity for +: R + ∅ = R.
• ε is the identity for concatenation: εR = Rε = R.
• ∅ is the annihilator for concatenation: ∅R = R∅ = ∅.
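At the level of languages, these three facts are statements about the empty set and the set {ε}. A minimal check, assuming languages are Python sets of strings and using an arbitrary sample language R:

```python
def concat(L, M):
    """Return LM: every string wx with w in L and x in M."""
    return {w + x for w in L for x in M}

R = {"0", "10", "111"}   # arbitrary sample language
EMPTY = set()            # the language ∅
EPS = {""}               # the language {ε}

assert R | EMPTY == R                                   # ∅ is the identity for +
assert concat(EPS, R) == R == concat(R, EPS)            # ε is the identity for concatenation
assert concat(EMPTY, R) == EMPTY == concat(R, EMPTY)    # ∅ is the annihilator for concatenation
```

The annihilator law holds because there is no string in ∅ to pair with any string of R, so no concatenated string can be formed.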