Subject Name: FORMAL LANGUAGES AND AUTOMATA THEORY
Subject Code: 10CS56
Prepared By: Mrs. Pramela Devi, Mrs. Annapoorani, Mrs. Madhusmitha
Department: CSE
Date: 30.08.2014
2/24/2019
1. AN APPLICATION OF FINITE AUTOMATA
Text Search
  i) NFA for text search
  ii) DFA to recognize a set of keywords
Compiler Construction
  i) Lexical Analysis
  ii) Syntax Analysis
  iii) Code Optimization
  iv) Code Generation
2. FINITE AUTOMATA WITH EPSILON TRANSITIONS
i) Allowing transitions on the empty string ε does not expand the class of languages that finite automata can accept, but it does give us some added programming convenience.
ii) We can allow explicit ε-transitions in finite automata, i.e., a transition from one state to another without consuming any input symbol. This sometimes makes NFAs easier to construct.
iii) Definition: ε-NFAs are those NFAs with at least one explicit ε-transition defined.
2.1 THE FORMAL NOTATION FOR AN EPSILON NFA
An ε-NFA can be defined as A = (Q, Σ, δ, q0, F).
The transition function δ is now a function that takes as arguments:
  i) a state in Q, and
  ii) a member of Σ ∪ {ε}; that is, an input symbol or the symbol ε.
We require that ε not be a symbol of the alphabet Σ, to avoid any confusion.
2.2 EPSILON CLOSURE
ECLOSE(q) is defined as follows:
Basis: State q is in ECLOSE(q).
Induction: If state p is in ECLOSE(q) and there is a transition from state p to state r labeled ε, then r is in ECLOSE(q). That is, if δ is the transition function of the ε-NFA involved and p is in ECLOSE(q), then ECLOSE(q) also contains all the states in δ(p, ε).
[Figure: start state q and states r and s, with arcs labeled ε and 1.]
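The basis and induction steps above can be sketched as a small worklist search. This is a minimal sketch, not a definitive implementation: the dictionary encoding of δ, the use of the empty string "" to stand for ε, and the three-state automaton (q to r on ε, r to s on 1) are assumptions made for illustration.

```python
def eclose(delta, q):
    """Return ECLOSE(q): every state reachable from q using only ε-arcs.

    delta maps (state, symbol) to a set of states; "" stands for ε (assumption).
    """
    closure = {q}          # basis: q is in ECLOSE(q)
    stack = [q]
    while stack:
        p = stack.pop()
        # induction: every state in δ(p, ε) also belongs to ECLOSE(q)
        for r in delta.get((p, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


# Hypothetical automaton: q --ε--> r --1--> s
delta = {("q", ""): {"r"}, ("r", "1"): {"s"}}
closure_of_q = eclose(delta, "q")   # contains q and r, but not s
closure_of_s = eclose(delta, "s")   # contains only s
```

The arc labeled 1 is never followed, because only ε-arcs contribute to the closure.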
2.4 EXTENDED TRANSITIONS AND LANGUAGES FOR ε-NFA
δ*(q, w) is the extended transition function (ETF). It represents the set of states that can be reached along a path whose labels, when concatenated, form the string w.
The recursive definition of δ* is:
Basis: δ*(q, ε) = ECLOSE(q)
Induction: Let w = xa, where a is the last symbol of w. Then δ*(q, w) can be computed as follows:
  i) Let δ*(q, x) = {p1, p2, ..., pk}.
  ii) Let δ(p1, a) ∪ δ(p2, a) ∪ ... ∪ δ(pk, a) = {r1, r2, ..., rm}.
  iii) Then δ*(q, w) = ECLOSE(r1) ∪ ECLOSE(r2) ∪ ... ∪ ECLOSE(rm).
The language of an ε-NFA is the set of strings w such that δ*(q0, w) contains at least one accepting state.
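The recursive definition of δ* can be sketched iteratively, processing w one symbol at a time. This is a sketch under stated assumptions: the dictionary encoding of δ, "" standing for ε, the helper names eclose_set, delta_hat and accepts, and the two-state ε-NFA for a*b* are all hypothetical.

```python
def eclose_set(delta, states):
    """ε-closure of a set of states; "" stands for ε (assumption)."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def delta_hat(delta, q, w):
    """Compute δ*(q, w) for an ε-NFA."""
    current = eclose_set(delta, {q})         # basis: δ*(q, ε) = ECLOSE(q)
    for a in w:
        moved = set()
        for p in current:                    # step ii): union of δ(pi, a)
            moved |= delta.get((p, a), set())
        current = eclose_set(delta, moved)   # step iii): union of ECLOSE(rj)
    return current


def accepts(delta, start, finals, w):
    """w is accepted when δ*(start, w) contains an accepting state."""
    return bool(delta_hat(delta, start, w) & finals)


# Hypothetical ε-NFA for a*b*: q0 loops on a, q0 --ε--> q1, q1 loops on b
delta = {("q0", "a"): {"q0"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q1"}}
in_language = accepts(delta, "q0", {"q1"}, "aabb")
not_in_language = accepts(delta, "q0", {"q1"}, "ba")
```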
2.5 ELIMINATING ε-TRANSITIONS
Let E = (QE, Σ, δE, q0, FE) be an ε-NFA.
Goal: To build a DFA D = (QD, Σ, δD, qD, FD) such that L(D) = L(E).
Construction:
  QD = all reachable subsets of QE, factoring in ε-closures
  qD = ECLOSE(q0)
  FD = the subsets S in QD such that S ∩ FE ≠ Φ
  δD: for each subset S in QD and for each input symbol a ∈ Σ:
    Let R = the union of δE(p, a) over all p in S        // go to destination states
    δD(S, a) = the union of ECLOSE(r) over all r in R    // from there, take a union of all their ε-closures
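The construction above can be sketched as a breadth-first search over subsets, so that only reachable subsets are ever built. This is a minimal sketch, not a definitive implementation: the dictionary encoding of δE, "" standing for ε, the function names, and the ε-NFA for a*b* are assumptions for illustration.

```python
from collections import deque


def eclose_set(delta, states):
    """ε-closure of a set of states; "" stands for ε (assumption)."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def enfa_to_dfa(delta, alphabet, start, finals):
    """Return (QD, δD, qD, FD) for the ε-NFA given by delta, start and finals."""
    q_d = frozenset(eclose_set(delta, {start}))       # qD = ECLOSE(q0)
    dfa_delta, seen, queue = {}, {q_d}, deque([q_d])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            R = set()
            for p in S:                               # go to destination states
                R |= delta.get((p, a), set())
            T = frozenset(eclose_set(delta, R))       # then take their ε-closures
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    f_d = {S for S in seen if S & finals}             # subsets that meet FE
    return seen, dfa_delta, q_d, f_d


def dfa_accepts(dfa_delta, q_d, f_d, w):
    S = q_d
    for a in w:
        S = dfa_delta[(S, a)]
    return S in f_d


# Hypothetical ε-NFA for a*b*: q0 loops on a, q0 --ε--> q1, q1 loops on b
delta = {("q0", "a"): {"q0"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q1"}}
states, dfa_delta, q_d, f_d = enfa_to_dfa(delta, "ab", "q0", {"q1"})
```

For this example the DFA has three states: {q0, q1}, {q1}, and the empty set, which acts as the dead state.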
3. REGULAR EXPRESSION
A regular expression, or RE, describes strings of characters (words, phrases, or any arbitrary text). It is a pattern that matches certain strings and does not match others.
A regular expression is a sequence of characters that specifies a pattern; regular expressions are language-defining symbols used to generate patterns of strings.
A regular expression is an algebraic formula whose value is a set of strings, called the language of the expression.
Operators:
  Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is also a regular expression. L(R1|R2) = L(R1) U L(R2).
  Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular expression. L(R1R2) = L(R1) concatenated with L(R2).
  Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression. L(R1*) = {ε} U L(R1) U L(R1R1) U L(R1R1R1) U ...
Closure has the highest precedence, followed by concatenation, followed by union.
Examples:
  i) The set of strings over {0,1} that end in 3 consecutive 1's: (0 | 1)* 111, or equivalently (0 + 1)* 111
  ii) The set of strings over {0,1} that have at least one 1: 0* 1 (0 + 1)*
  iii) The set of strings over {0,1} that have at most one 1: 0* | 0* 1 0*
  iv) Consider Σ = {a}. The language L in which every word has odd length: a (aa)*
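The four example expressions can be checked by brute force against their English descriptions. The sketch below uses Python's re module, where union is written | rather than +; the helper names words and agrees, and the length bound of 8, are assumptions made for illustration.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# (pattern, alphabet, predicate describing the intended language)
cases = [
    (r"(0|1)*111", "01", lambda w: w.endswith("111")),
    (r"0*1(0|1)*", "01", lambda w: w.count("1") >= 1),
    (r"0*|0*10*", "01", lambda w: w.count("1") <= 1),
    (r"a(aa)*", "a", lambda w: len(w) % 2 == 1),
]


def agrees(pattern, alphabet, predicate, max_len=8):
    """True if the pattern matches exactly the strings the predicate accepts."""
    rx = re.compile(pattern)
    return all(
        (rx.fullmatch(w) is not None) == predicate(w)
        for w in words(alphabet, max_len)
    )


results = [agrees(p, alphabet, pred) for p, alphabet, pred in cases]
```

fullmatch is used rather than search because a regular expression in the formal sense must describe the whole string, not a substring of it.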
3.3 FINITE AUTOMATA TO REGULAR EXPRESSION
We construct REs for the labels of restricted sets of paths.
  Basis: single arcs or no arc at all.
  Induction: paths that are allowed to traverse the next state in order.
A k-path is a path through the graph of the DFA that goes through no state numbered higher than k. Endpoints are not restricted; they can be any state.
Let R_ij^(k) be the regular expression for the set of labels of k-paths from state i to state j.
Basis: k = 0. R_ij^(0) = sum of the labels of the arcs from i to j, or ∅ if there is no such arc. Add ε if i = j.
k-Path Inductive Case
A k-path from i to j either:
  i) never goes through state k, or
  ii) goes through state k one or more times.
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Final Step
The RE with the same language as the DFA is the sum (union) of the R_ij^(n), where:
  n is the number of states; i.e., paths are unconstrained.
  i is the start state.
  j is one of the final states.
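The k-path recurrence can be sketched directly as a triple loop over k, i and j. This is a sketch under stated assumptions, not a definitive implementation: ∅ is represented by None, ε by the empty string, output uses Python re syntax (| for union), input symbols are assumed to be plain alphanumeric characters, and the two-state DFA for "even number of 1's" is hypothetical. The resulting expression is not simplified, so it is long but equivalent.

```python
import re
from itertools import product


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    if r == s:
        return r
    if r == "":
        return f"(?:{s})?"       # ε + s
    if s == "":
        return f"(?:{r})?"       # r + ε
    return f"(?:{r}|{s})"


def concat(r, s):
    if r is None or s is None:   # ∅ concatenated with anything is ∅
        return None
    return r + s


def star(r):
    if r is None or r == "":     # ∅* = ε* = ε
        return ""
    return f"(?:{r})*"


def dfa_to_regex(n, arcs, start, finals):
    """States are numbered 1..n; arcs maps (i, j) to a list of symbols."""
    # Basis: R[i][j] = R_ij^(0)
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = None
            for a in arcs.get((i, j), []):
                r = union(r, a)
            R[i][j] = union(r, "") if i == j else r
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in range(1, n + 1):
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(concat(R[i][k], star(R[k][k])), R[k][j])
                new[i][j] = union(R[i][j], through_k)
        R = new
    # Final step: union of R_(start, f)^(n) over the final states f
    result = None
    for f in finals:
        result = union(result, R[start][f])
    return result


# Hypothetical DFA: state 1 (start, accepting) and state 2; 0 loops, 1 switches
arcs = {(1, 1): ["0"], (1, 2): ["1"], (2, 1): ["1"], (2, 2): ["0"]}
pattern = re.compile(dfa_to_regex(2, arcs, start=1, finals=[1]))

all_agree = all(
    (pattern.fullmatch("".join(t)) is not None) == ("".join(t).count("1") % 2 == 0)
    for n in range(8)
    for t in product("01", repeat=n)
)
```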
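3.5 DFA TO RE: STATE ELIMINATION
State elimination removes the states of the automaton one at a time and replaces the affected edges with regular expressions that include the behavior of the eliminated state.
Consider a generic state s about to be eliminated. It has predecessors q1, ..., qk and successors p1, ..., pm. Qi labels the arc from qi to s, Pj labels the arc from s to pj, S labels the loop on s, and Rij labels the direct arc from qi to pj.
After s is eliminated, the arc from qi to pj is labeled Rij + Qi S* Pj; for example, R11 + Q1 S* P1 from q1 to p1 and Rkm + Qk S* Pm from qk to pm.
[Figure: state s with predecessors q1..qk and successors p1..pm, before and after elimination.]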
DFA to RE: State Elimination Continued…
Starting with intermediate states and then moving to accepting states, apply the state elimination process to produce an equivalent automaton with regular expression labels on the edges. The result will be a one- or two-state automaton with a start state and an accepting state.
If the two states are different, the automaton has a loop R on the start state, an arc S from the start state to the accepting state, a loop U on the accepting state, and an arc T from the accepting state back to the start state. We can describe this automaton as: (R+SU*T)*SU*
[Figure: two-state automaton with edges labeled R, S, U and T.]
If the start state is also an accepting state, then we must also perform a state elimination from the original automaton that gets rid of every state but the start state. This leaves a single state with a loop R, which we can describe simply as R*.
[Figure: one-state automaton with a loop labeled R.]
DFA to RE: State Elimination Continued…
If there are n accepting states, we must repeat the above steps for each accepting state to get n different regular expressions R1, R2, …, Rn. For each repetition we turn every other accepting state into a non-accepting state.
The desired regular expression for the automaton is then the union of the n regular expressions: R1 + R2 + … + Rn
Example: Convert the following automaton to a RE.
First convert the edges to REs: the edge labeled 0,1 becomes 0+1.
[Figure: automaton with start state 3 and states 1 and 2, before and after relabeling the edge 0,1 as 0+1.]
DFA to RE: State Elimination Continued…
Example Continued…
Eliminate state 1:
Now the automaton has two states: the start state 3 and the final state 2.
[Figure: start state 3 with a loop labeled 0+10, an arc labeled 11 from state 3 to state 2, and a loop labeled 0+1 on state 2.]
The equivalent regular expression is (0+10)*11(0+1)*
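One elimination step can be sketched as a function over a dictionary of edge labels. This is a sketch under stated assumptions: the original figure is not recoverable from the text, so the arcs below are a hypothetical automaton chosen to be consistent with the labels shown after eliminating state 1; the helper names wrap and eliminate are also assumptions, and labels use the textbook + for union.

```python
def wrap(r):
    """Parenthesize a label that contains a union before concatenating it."""
    return f"({r})" if "+" in r else r


def eliminate(edges, s):
    """Remove state s from edges {(from, to): label}.

    Each path q -> s -> p is replaced by the label Q S* P, added (with +)
    to any existing direct label from q to p.
    """
    loop = edges.get((s, s))
    if loop is None:
        star = ""
    elif len(loop) == 1:
        star = f"{loop}*"
    else:
        star = f"({loop})*"
    preds = [(q, r) for (q, t), r in edges.items() if t == s and q != s]
    succs = [(p, r) for (t, p), r in edges.items() if t == s and p != s]
    new = {k: v for k, v in edges.items() if s not in k}
    for q, Q in preds:
        for p, P in succs:
            via = wrap(Q) + star + wrap(P)
            new[(q, p)] = f"{new[(q, p)]}+{via}" if (q, p) in new else via
    return new


# Hypothetical arcs: 3 is the start state, 2 is the accepting state
edges = {(3, 3): "0", (3, 1): "1", (1, 3): "0", (1, 2): "1", (2, 2): "0+1"}
reduced = eliminate(edges, 1)

# No arc leads back from state 2 to state 3, so (R+SU*T)*SU* reduces to R*SU*
R, S, U = reduced[(3, 3)], reduced[(3, 2)], reduced[(2, 2)]
expression = f"({R})*{S}({U})*"
```

With these assumed arcs, the loop on state 3 becomes 0+10, the arc from 3 to 2 becomes 11, and the expression is (0+10)*11(0+1)*.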
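DFA to RE: State Elimination Continued…
Example 2: An automaton that accepts strings with an even number of 1's.
Eliminate state 2:
[Figure: the original automaton with start state 1 and states 2 and 3; after eliminating state 2, the arc from state 1 to state 3 is labeled 10*1 and the loop on state 3 is labeled 0+10*1.]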
DFA to RE: State Elimination Continued…
There are two accepting states. Turn off state 3 first:
  This is just 0*; we can ignore going to state 3, since we would "die" there.
Then turn off state 1:
  This is just 0*10*1(0+10*1)*
Combine the two results to get 0* + 0*10*1(0+10*1)*
[Figure: the two-state automaton with start state 1 and state 3, edges labeled 10*1 and 0+10*1, shown once for each choice of accepting state.]
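The combined expression can be checked by brute force against the property "even number of 1's". The sketch below translates it into Python re syntax, with | in place of +; the length bound of 10 and the variable names are assumptions made for illustration.

```python
import re
from itertools import product

# 0* + 0*10*1(0+10*1)* written in Python re syntax
even_ones = re.compile(r"0*|0*10*1(0|10*1)*")

mismatches = [
    "".join(bits)
    for n in range(11)
    for bits in product("01", repeat=n)
    if (even_ones.fullmatch("".join(bits)) is not None)
    != ("".join(bits).count("1") % 2 == 0)
]
# mismatches stays empty when the expression and the property agree
```

The first alternative covers strings with no 1's at all, and the second covers strings whose 1's come in pairs, which together give every even count.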
Converting a RE to an Automaton
We have shown that we can convert an automaton to a RE. To show equivalence we must also go in the other direction and convert a RE to an automaton.
The easiest way to do this is to convert a RE to an ε-NFA.
Inductive construction: start with a simple basis and use it to build more complex parts of the NFA.
Basis: R = a, R = ε, R = Ø
[Figure: basis automata for R = a (one arc labeled a), R = ε (one arc labeled ε), and R = Ø (no arc).]
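Converting a RE to an Automaton Continued…
Induction: R = S+T, R = ST, R = S*
[Figure: ε-NFA constructions for R = S+T, R = ST and R = S*, built from the automata for S and T joined by ε-transitions.]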
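Converting a RE to an Automaton Continued…
Convert R = (ab+a)* to an NFA.
We proceed in stages, starting from simple elements and working our way up:
  i) a can be converted as a single arc labeled a.
  ii) b can be converted as a single arc labeled b.
  iii) ab can be converted by joining the automata for a and b with an ε-transition.
  iv) (ab+a)* can be converted by combining the automata for ab and a with the union and closure constructions.
[Figure: the automata for each stage, with arcs labeled a, b and ε.]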
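The staged construction can be sketched with one small function per case. This is a minimal sketch, not the slides' exact automaton: it follows the usual inductive construction with fresh start and final states, and the tuple encoding (start, final, delta), the function names, and "" standing for ε are assumptions for illustration.

```python
from itertools import count

_ids = count()   # supplies fresh state numbers


def _merge(*deltas):
    out = {}
    for d in deltas:
        for key, targets in d.items():
            out.setdefault(key, set()).update(targets)
    return out


def _eps(delta, p, q):
    delta.setdefault((p, ""), set()).add(q)    # "" stands for ε


def symbol(a):                                  # basis: R = a
    s, f = next(_ids), next(_ids)
    return s, f, {(s, a): {f}}


def union(m, n):                                # R = S+T
    s, f = next(_ids), next(_ids)
    delta = _merge(m[2], n[2])
    for start, final in (m[:2], n[:2]):
        _eps(delta, s, start)
        _eps(delta, final, f)
    return s, f, delta


def concat(m, n):                               # R = ST
    delta = _merge(m[2], n[2])
    _eps(delta, m[1], n[0])
    return m[0], n[1], delta


def star(m):                                    # R = S*
    s, f = next(_ids), next(_ids)
    delta = _merge(m[2])
    _eps(delta, s, m[0])
    _eps(delta, m[1], f)
    _eps(delta, m[1], m[0])                     # go around again
    _eps(delta, s, f)                           # or skip S entirely
    return s, f, delta


def accepts(nfa, w):
    start, final, delta = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for r in delta.get((p, ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for a in w:
        moved = set()
        for p in current:
            moved |= delta.get((p, a), set())
        current = closure(moved)
    return final in current


# (ab+a)* built in the same stages as above
nfa = star(union(concat(symbol("a"), symbol("b")), symbol("a")))
```

For example, accepts(nfa, "aab") is true because the string splits as a followed by ab, while accepts(nfa, "abb") is false.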
4. APPLICATIONS OF REGULAR EXPRESSION
Regular expressions in Unix
Unix regular expressions allow us to write character classes. The rules for character classes are:
  The symbol dot (.) stands for any character.
  The sequence [a1 a2 a3 ... an] stands for a1 + a2 + a3 + ... + an.
  [x-y] stands for all the characters from x to y in the ASCII sequence.
  For example, the set of letters and digits can be expressed as [A-Za-z0-9], and the set of characters that can appear in a signed decimal number can be expressed as [-+.0-9].
Special notations for character classes are:
  [:digit:] is the set of ten digits, same as [0-9]
  [:alpha:] stands for any alphabetic character, same as [A-Za-z]
  [:alnum:] stands for digits and letters, same as [A-Za-z0-9]
Operators used in regular expressions are:
  ?    zero or one of
  +    one or more of
  *    zero or more of
  {n}  n copies of; for example, R{5} is shorthand for RRRRR
Lexical Analysis
Another application of regular expressions is in the lexical analysis phase of compiler design. The lexical analyzer scans the source program and recognizes all tokens. Keywords and identifiers are common examples of tokens.
Contd…
Example of lex input to recognize tokens:
  else                    {return(ELSE);}
  [A-Za-z][A-Za-z0-9]*    {code to enter the found identifier in the symbol table;
                           return(ID);}
  >=                      {return(GE);}
  =                       {return(EQ);}
Finding Patterns in Text
Regular expressions are used to search efficiently for a set of words in a large repository such as the web.
For example, a regular expression used to search for a street address is:
  '[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* (Street|St\.|Avenue|Ave\.|Road|Rd\.)'
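The street-address pattern can be tried out with Python's re module, which accepts the same syntax for this expression. This is a small sketch; the sample text and addresses are invented for illustration.

```python
import re

# House number, optional letter, one or more capitalized words, street type
ADDRESS = re.compile(
    r"[0-9]+[A-Z]? [A-Z][a-z]*( [A-Z][a-z]*)* (Street|St\.|Avenue|Ave\.|Road|Rd\.)"
)

# Hypothetical text to search
text = "Send it to 123 Main St. or to 45B Martin Luther King Avenue by Friday."
found = [m.group(0) for m in ADDRESS.finditer(text)]
# found == ["123 Main St.", "45B Martin Luther King Avenue"]
```

The backslash before each dot matters: an unescaped dot would stand for any character, so "St." would also match strings such as "Stx".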