
Alphabets: An Alphabet is a finite set of symbols. We will usually use $\Sigma$ to denote the alphabet of input symbols or "terminal characters."
String: A String (or "sentence" or "word") is a finite sequence of symbols.
The set $\Sigma^+$: the set of all strings over $\Sigma$ of length 1 or more.
The length of $x$ is $|x|$. The empty string is $\varepsilon$; $|\varepsilon| = 0$ and $\varepsilon \in \Sigma^*$.
Concatenation: $xy$ or $x \cdot y$. $x\varepsilon = \varepsilon x = x$; $x^2 = xx$.
$\Sigma^*$: the set of all strings over $\Sigma$ of length 0 or more, so $\Sigma^* = \Sigma^+ \cup \{\varepsilon\}$.
$x$ is a Prefix of $y$ if there exists a $z$ such that $y = xz$. $x$ is a Proper Prefix of $y$ if $x$ is a prefix of $y$ and $x \neq y$ (equivalently, $z \neq \varepsilon$).
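The definitions above can be tried out in a few lines of Python. This is a minimal sketch, not part of the original slides: strings are Python `str` values, $\varepsilon$ is the empty string `""`, and since $\Sigma^+$ and $\Sigma^*$ are infinite, the helpers (hypothetical names) only enumerate strings up to a chosen length.

```python
from itertools import product

def strings_of_length(sigma, n):
    """All strings over the alphabet sigma with length exactly n."""
    return ["".join(t) for t in product(sorted(sigma), repeat=n)]

def sigma_plus_up_to(sigma, max_len):
    """The strings of Sigma^+ with length 1..max_len (Sigma^+ itself is infinite)."""
    return [s for n in range(1, max_len + 1) for s in strings_of_length(sigma, n)]

def is_prefix(x, y):
    """x is a prefix of y iff y = xz for some string z."""
    return y.startswith(x)

def is_proper_prefix(x, y):
    """x is a prefix of y and x != y (equivalently z != epsilon)."""
    return is_prefix(x, y) and x != y

# Sigma^* restricted to length <= 2 is Sigma^+ (up to length 2) plus the empty string.
print([""] + sigma_plus_up_to({"0", "1"}, 2))  # ['', '0', '1', '00', '01', '10', '11']
```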

A Language is some subset of $\Sigma^*$. Terminals are members of $\Sigma$. Another set of symbols (alphabet) are the non-terminals (or variables, or syntactic categories), which represent strings of terminals. Vocabulary symbols are terminals or non-terminals.
Concatenation of languages: $L \cdot M = \{xy \mid x \in L,\ y \in M\}$.
$L^i$ = concatenation of $L$ with itself $i$ times; $L^0 = \{\varepsilon\}$.
$L \cup M = \{x \mid x \in L \text{ or } x \in M\}$.
The closure of $L$ is $L^* = \bigcup_{i=0}^{\infty} L^i = L^0 \cup L^1 \cup L^2 \cup \cdots$
The positive closure of $L$ is $L^+ = \bigcup_{i=1}^{\infty} L^i = L \cdot L^*$.
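A minimal Python sketch of these operations on finite languages (sets of strings, with $\varepsilon$ as `""`). Because $L^*$ is infinite whenever $L$ contains a non-empty string, the hypothetical helper `closure_up_to` returns only the members of $L^*$ up to a length bound; union is Python's built-in set operator `|`.

```python
def concat(L, M):
    """L . M = {xy | x in L, y in M}."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: L concatenated with itself i times, with L^0 = {epsilon}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, max_len):
    """Members of L^* = L^0 u L^1 u L^2 u ... with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more factor from L, keeping short, unseen ones.
        frontier = {s for s in concat(frontier, L) if len(s) <= max_len} - result
        result |= frontier
    return result

print(sorted(closure_up_to({"ab"}, 6)))  # ['', 'ab', 'abab', 'ababab']
```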

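A Production rule is written as $\alpha \rightarrow \beta$ or $\alpha ::= \beta$. A phrase structure grammar $G$ is a quadruple $(N, \Sigma, P, S)$, where
$N$: finite set of non-terminals;
$\Sigma$: finite set of terminals (the alphabet);
$P$: a finite set of productions;
$S$: the start symbol ($S \in N$).
Example: $G_1 = (\{A, S\}, \{0, 1\}, P, S)$, where $P$ is: $S \rightarrow 0A1$, $0A \rightarrow 00A1$, $A \rightarrow \varepsilon$.
If $\gamma\alpha\delta$ is a string in $(N \cup \Sigma)^*$ and $\alpha \rightarrow \beta$ is a production in $G$, then we say $\gamma\alpha\delta$ directly derives $\gamma\beta\delta$ and write $\gamma\alpha\delta \Rightarrow \gamma\beta\delta$.
$\Rightarrow^+$: derives in one or more steps. $\Rightarrow^*$: derives in zero or more steps.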
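The relation $\Rightarrow$ can be checked mechanically. A minimal Python sketch, assuming every grammar symbol is a single character so that sentential forms are plain strings; `directly_derives` and `is_derivation` are hypothetical helper names. It verifies the $G_1$ derivation $S \Rightarrow 0A1 \Rightarrow 00A11 \Rightarrow 0011$.

```python
def directly_derives(u, v, productions):
    """True iff u => v, i.e. u = g + alpha + d and v = g + beta + d for some alpha -> beta."""
    for alpha, beta in productions:
        start = u.find(alpha)
        while start != -1:  # try every occurrence of alpha in u
            if u[:start] + beta + u[start + len(alpha):] == v:
                return True
            start = u.find(alpha, start + 1)
    return False

def is_derivation(forms, productions):
    """True iff each sentential form directly derives the next one."""
    return all(directly_derives(u, v, productions) for u, v in zip(forms, forms[1:]))

# G1: S -> 0A1, 0A -> 00A1, A -> epsilon (epsilon written as "").
G1 = [("S", "0A1"), ("0A", "00A1"), ("A", "")]
print(is_derivation(["S", "0A1", "00A11", "0011"], G1))  # True
```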

If $S \Rightarrow^* \alpha$, then $\alpha$ is called a Sentential Form of $G$. If $S \Rightarrow^* x$ with $x \in \Sigma^*$, then $x$ is called a Sentence of $G$. The language generated by $G$, written $L(G)$, is $\{x \mid x \in \Sigma^* \text{ and } S \Rightarrow^* x\}$.
Now, for $G_1 = (\{A, S\}, \{0, 1\}, P, S)$ with $P$: $S \rightarrow 0A1$, $0A \rightarrow 00A1$, $A \rightarrow \varepsilon$, we have $L(G_1) = \{0^n 1^n \mid n > 0\}$.
CONVENTIONS:
Terminals: a, b, c, d, 0, 1, +, (, ), begin
Non-terminals: A, B, C, D, S
Vocabulary symbols: U, V, W, X, Y, Z
Strings of terminals: u, v, w, x, y, z
Strings of vocabulary symbols: $\alpha, \beta, \gamma$

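Type 0: Unrestricted Grammars: any $\alpha \rightarrow \beta$.
Type 1: Context-Sensitive Grammars (CSG): for all $\alpha \rightarrow \beta$, $|\alpha| \le |\beta|$.
Type 2: Context-Free Grammars (CFG): for all $\alpha \rightarrow \beta$, $\alpha \in N$ (i.e., $A \rightarrow \beta$).
Type 3: Right- (or Left-) Linear Grammars: all productions are of the form $A \rightarrow x$ or $A \rightarrow xB$ (for left-linear grammars, $A \rightarrow x$ or $A \rightarrow Bx$), with $x \in \Sigma^*$.
$G_2 = (\{S, B, C\}, \{a, b, c\}, P, S)$, $P$: $S \rightarrow aSBC$, $S \rightarrow abC$, $CB \rightarrow BC$, $bB \rightarrow bb$, $bC \rightarrow bc$, $cC \rightarrow cc$. Which type is it? What is its language?
$G_3$: $S \rightarrow S + S$, $S \rightarrow S * S$, $S \rightarrow (S)$, $S \rightarrow a$. Which type is it? What is its language?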
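One way to explore the $G_2$ question experimentally: every production of $G_2$ satisfies $|\alpha| \le |\beta|$, so a sentential form longer than $n$ can never derive a sentence of length $n$, and a breadth-first search over sentential forms can safely discard longer forms. A minimal Python sketch, assuming single-character symbols (all names are hypothetical); it lists the short sentences of $G_2$ and is evidence about $L(G_2)$, not a proof.

```python
from collections import deque

# G2 with one character per symbol: S, B, C are non-terminals; a, b, c are terminals.
G2 = [("S", "aSBC"), ("S", "abC"), ("CB", "BC"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
TERMINALS = set("abc")

def successors(form, productions):
    """Every sentential form that `form` directly derives."""
    for lhs, rhs in productions:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)

def sentences_up_to(productions, start, max_len):
    """Sentences of length <= max_len; only valid for non-contracting productions."""
    seen, queue, sentences = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if set(form) <= TERMINALS:
            sentences.add(form)
            continue
        for nxt in successors(form, productions):
            if len(nxt) <= max_len and nxt not in seen:  # longer forms can never shrink
                seen.add(nxt)
                queue.append(nxt)
    return sentences

print(sorted(sentences_up_to(G2, "S", 9), key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```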

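An Ambiguous Grammar is one for which some sentence has two or more different parse trees.
// Show that the last grammar on the previous slide ($G_3$) is ambiguous. //
// Try to prove that the following CFG is ambiguous: $S \rightarrow AB \mid CD$, $A \rightarrow 0A \mid \varepsilon$, $B \rightarrow 1B2 \mid \varepsilon$, $C \rightarrow 2C \mid \varepsilon$, $D \rightarrow 0D1 \mid \varepsilon$ //
// Try to prove that the following CFG is ambiguous (?!): $S \rightarrow \text{if } X \text{ then } S \mid M$, $M \rightarrow \text{if } X \text{ then } M \text{ else } S$, $X \rightarrow X + T \mid T$, $T \rightarrow T * F \mid F$, $F \rightarrow (X) \mid a$ //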
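To show that $G_3$ is ambiguous it suffices to exhibit one sentence with two parse trees. A minimal Python sketch (the function name is hypothetical) that counts the parse trees of a string in $G_3$ by dynamic programming over substrings: the root of each tree uses exactly one of $S \rightarrow S + S$, $S \rightarrow S * S$, $S \rightarrow (S)$, $S \rightarrow a$, so the counts for the alternatives simply add up.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Number of distinct parse trees for s in G3: S -> S+S | S*S | (S) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        """Parse trees deriving the substring s[i:j] from S."""
        total = 1 if s[i:j] == "a" else 0                 # S -> a
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)                   # S -> (S)
        for k in range(i + 1, j - 1):                      # S -> S op S, op at position k
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

for sentence in ["a", "a+a", "a+a*a", "(a+a)*a", "a+a+a+a"]:
    print(sentence, count_parse_trees(sentence))  # counts: 1, 1, 2, 1, 5
```

Any count of 2 or more (here `a+a*a`) is a witness that $G_3$ is ambiguous.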

Is $L = \{0^n 1^n \mid n \ge 1\}$ a Context-Free Language? Yes, since $S \rightarrow 0S1 \mid 01$ generates $L$.
A RECOGNIZER is a machine (system) with a finite description that takes a terminal string for some grammar and determines whether the string is in the language generated by the grammar. A PARSER can, in addition, find a derivation for the string.
PARSING alternatives: suppose we want to parse $\mathrm{id} * \mathrm{id} + \mathrm{id}$ in $G_0$: $E \rightarrow E + T \mid T$, $T \rightarrow T * F \mid F$, $F \rightarrow (E) \mid \mathrm{id}$. Its parse tree is:
[Figure: parse tree of id * id + id in $G_0$, with root $E \rightarrow E + T$]
This parse tree might be created by a leftmost derivation or by a rightmost derivation, as follows:
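The difference between a recognizer and a parser can be sketched for the grammar $S \rightarrow 0S1 \mid 01$ in Python (hypothetical function names, strings over {0, 1}): `recognize` only answers yes or no, while `parse` also returns a derivation as a list of sentential forms.

```python
def recognize(w):
    """Recognizer: True iff S =>* w for the grammar S -> 0S1 | 01."""
    if w == "01":
        return True
    return len(w) >= 4 and w[0] == "0" and w[-1] == "1" and recognize(w[1:-1])

def parse(w):
    """Parser: return a derivation of w as a list of sentential forms, or None."""
    if w == "01":
        return ["S", "01"]                        # S => 01
    if len(w) >= 4 and w[0] == "0" and w[-1] == "1":
        inner = parse(w[1:-1])
        if inner is not None:
            # S => 0S1, then derive the middle part inside the outer 0 ... 1
            return ["S"] + ["0" + form + "1" for form in inner]
    return None

print(recognize("000111"))  # True
print(parse("000111"))      # ['S', '0S1', '00S11', '000111']
```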

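$E \Rightarrow_{lm} E + T \Rightarrow_{lm} T + T \Rightarrow_{lm} T * F + T \Rightarrow_{lm} F * F + T \Rightarrow_{lm} \mathrm{id} * F + T \Rightarrow_{lm} \mathrm{id} * \mathrm{id} + T \Rightarrow_{lm} \mathrm{id} * \mathrm{id} + F \Rightarrow_{lm} \mathrm{id} * \mathrm{id} + \mathrm{id}$
Try the rightmost derivation yourself!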
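The leftmost derivation above can be replayed mechanically, and the same helper answers the exercise by expanding the rightmost non-terminal instead. A minimal Python sketch: sentential forms are lists of symbols so that `id` is one token, and the production sequences are written out by hand (the names are hypothetical).

```python
NONTERMINALS = {"E", "T", "F"}

def derive(steps, leftmost=True):
    """Apply (lhs, rhs) productions of G0 to the leftmost or rightmost non-terminal."""
    form = ["E"]
    forms = ["E"]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"cannot expand {lhs}: the chosen non-terminal is {form[i]}")
        form = form[:i] + rhs.split() + form[i + 1:]
        forms.append(" ".join(form))
    return forms

LEFTMOST = [("E", "E + T"), ("E", "T"), ("T", "T * F"), ("T", "F"),
            ("F", "id"), ("F", "id"), ("T", "F"), ("F", "id")]
RIGHTMOST = [("E", "E + T"), ("T", "F"), ("F", "id"), ("E", "T"),
             ("T", "T * F"), ("F", "id"), ("T", "F"), ("F", "id")]

print(" => ".join(derive(RIGHTMOST, leftmost=False)))
# E => E + T => E + F => E + id => T + id => T * F + id => T * id + id => F * id + id => id * id + id
```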

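Pumping Lemma for Regular Sets
Let $L$ be a regular set. Then there exists a constant $p > 0$ depending on $L$ such that every $w \in L$ with $|w| \ge p$ can be written as $w = xyz$, where $|y| > 0$, $|xy| \le p$, and $xy^k z \in L$ for all $k \ge 0$.
Pf: Let $M$ be a deterministic finite automaton that accepts $L$, and let $p$ be the number of states in $M$. Select $w \in L$ with $|w| \ge p$ and write $w = a_1 a_2 a_3 \cdots a_{n-1} a_n$ ($n \ge p$); while reading $w$, $M$ passes through the states $s_0, s_1, s_2, \ldots, s_{n-1}, s_n$, where $s_0$ is the start state and $s_n$ is a final state. Since $n + 1 > p$, not all of these states can be distinct; in fact some state already repeats among $s_0, \ldots, s_p$, so $s_i = s_j$ for some $0 \le i < j \le p$. Now let $x = a_1 a_2 \cdots a_i$, $y = a_{i+1} \cdots a_j$, $z = a_{j+1} \cdots a_n$. Since reading $y$ leads from $s_i$ back to the same state $s_j = s_i$, we can delete $y$ from $w$ or insert $y$ any number of times, and $M$ will still go from the start state to the same final state. So $xy^k z \in L$ for every $k \ge 0$. QED.
[Figure: "The feeling of pumping": x, y, z]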
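The proof is constructive: run the automaton, find the first repeated state, and cut $w$ there. A minimal Python sketch, assuming a DFA given as a dictionary; the example DFA (strings over {0, 1} with an even number of 0s, so $p = 2$) and all names are hypothetical illustrations, not from the slides.

```python
def run(dfa, w):
    """The states s_0, s_1, ..., s_n that the DFA visits while reading w."""
    states = [dfa["start"]]
    for a in w:
        states.append(dfa["delta"][(states[-1], a)])
    return states

def accepts(dfa, w):
    return run(dfa, w)[-1] in dfa["final"]

def pump_split(dfa, w):
    """Return (x, y, z) with w = xyz, cut at the first repeated state s_i = s_j."""
    seen = {}
    for j, state in enumerate(run(dfa, w)):
        if state in seen:  # first repeat happens at j <= p, so |xy| <= p
            i = seen[state]
            return w[:i], w[i:j], w[j:]
        seen[state] = j
    return None  # all states distinct: only possible when |w| < number of states

EVEN_ZEROS = {
    "start": "E",
    "final": {"E"},
    "delta": {("E", "0"): "O", ("E", "1"): "E", ("O", "0"): "E", ("O", "1"): "O"},
}

x, y, z = pump_split(EVEN_ZEROS, "0011")
print((x, y, z))                                                  # ('', '00', '11')
print(all(accepts(EVEN_ZEROS, x + y * k + z) for k in range(5)))  # True
```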

Qz1: Prove that $L = \{0^n 1^n \mid n \ge 1\}$ is NOT regular.
Pf: Assume $L$ is regular. Pick a "large enough" string in $L$, say $w = 0^p 1^p$. Now show that no substring $y$ of $w$ can be pumped. One of the following must be true:
(1) $y = 0^i$ for some $i \ge 1$, but $xy^2z = 0^{p+i} 1^p \notin L$.
(2) $y = 1^i$ for some $i \ge 1$, but $xy^2z = 0^p 1^{p+i} \notin L$.
(3) $y = 0^i 1^j$ for some $i, j \ge 1$, but $xy^2z = 0^{p-i} 0^i 1^j 0^i 1^j 1^{p-j} \notin L$.
Therefore, $L$ is NOT regular. QED.

Qz2: Prove that $L = \{0^p \mid p \text{ is a prime number}\}$ is NOT regular.
Pf: Assume $L$ is regular [so every sufficiently long string of $L$ is pumpable]. Let $w = xyz \in L$, where $x = 0^p$, $y = 0^q$, $z = 0^r$, $p, r \ge 0$, $q > 0$. Then $xy^nz = 0^{p+nq+r} \in L$ for each $n \ge 0$; that is, $p + nq + r$ is prime for each $n \ge 0$. But this is impossible: let $n = p + 2q + r + 2$; then $p + nq + r = (q+1)(p+2q+r)$, which is a product of two natural numbers, each greater than 1 (since $q \ge 1$). So for $n = p + 2q + r + 2$, $p + nq + r$ is NOT prime. This contradicts the assumption that $xy^nz \in L$ for each $n \ge 0$. QED.
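The key arithmetic step of Qz2 can be sanity-checked numerically. This only checks the identity on small values and does not replace the algebra; a minimal Python sketch with hypothetical helper names:

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def pumped_length(p, q, r):
    """|x y^n z| = p + n*q + r for the slide's choice n = p + 2q + r + 2."""
    n = p + 2 * q + r + 2
    return p + n * q + r

for p in range(10):
    for q in range(1, 10):  # q > 0 because y is non-empty
        for r in range(10):
            assert pumped_length(p, q, r) == (q + 1) * (p + 2 * q + r)
            assert not is_prime(pumped_length(p, q, r))
print("identity and compositeness hold for all p, r < 10 and 0 < q < 10")
```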

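Nondeterministic Finite Automata
[Figure: an example NFA with states $q_0, q_1, q_2, q_3, q_4$, start state $q_0$, and transitions labelled 0 and 1]
Is it Greek to you? $\Delta, \delta$: delta; $\Sigma, \sigma$: sigma; $\tau$: tau; $\Gamma, \gamma$: gamma; $\Phi, \phi$: phi; $Z, \zeta$: zeta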

Def: A DFA is $M = (K, \Sigma, \delta, S_0, F)$, where $K$ is the set of states, $\Sigma$ is the alphabet, $S_0 \in K$ is the start state, $F \subseteq K$ is the set of final states, and $\delta: K \times \Sigma \rightarrow K$ is the transition function.
Theorem: Let $L$ be a set accepted by a nondeterministic finite state automaton. Then there exists a deterministic finite state automaton that accepts $L$.
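The theorem is usually proved by the subset construction, in which each DFA state is a set of NFA states. A minimal Python sketch, assuming an NFA without $\varepsilon$-moves given as a dictionary from (state, symbol) to a set of states; the example NFA (strings over {0, 1} ending in 01) and the helper names are hypothetical, not the NFA pictured on the previous slide.

```python
from collections import deque

def subset_construction(nfa_delta, start, finals, alphabet):
    """NFA (no epsilon-moves) -> DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta, queue, seen = {}, deque([start_set]), {start_set}
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # All NFA states reachable from some state in S on symbol a.
            T = frozenset(q2 for q in S for q2 in nfa_delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_finals = {S for S in seen if S & finals}  # accepting if it contains an NFA final state
    return seen, dfa_delta, start_set, dfa_finals

def dfa_accepts(dfa, w):
    _, delta, start, finals = dfa
    S = start
    for a in w:
        S = delta[(S, a)]
    return S in finals

ENDS_IN_01 = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
dfa = subset_construction(ENDS_IN_01, "q0", {"q2"}, "01")
print(dfa_accepts(dfa, "1101"), dfa_accepts(dfa, "0110"))  # True False
```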

Prove that the grammar $G$ with productions $S \rightarrow 0S1 \mid 01$ generates exactly $L = \{0^n 1^n \mid n \ge 1\}$.
PROOF: First show $L(G) \subseteq L$ (i.e., the grammar generates only strings in $L$).
Inductive hypothesis: if $w \in L(G)$ is derived in $k$ steps, then $w \in L$.
Basis: $k = 1$. The only one-step derivation is $S \Rightarrow 01$, and $01 \in L$.
Inductive step: assume the inductive hypothesis is true for all $k \le k_0$ ($k_0 \ge 1$); show it is true for $k = k_0 + 1 > 1$. Since $k > 1$, the first step must be $S \Rightarrow 0S1$, so the derivation has the form $S \Rightarrow 0S1 \Rightarrow^* 0x1 = w$, where $S \Rightarrow^* x$ takes $k_0$ steps. By the hypothesis $x \in L$, say $x = 0^i 1^i$ with $i \ge 1$. Then $w = 0x1 = 0^{i+1} 1^{i+1} \in L$.
Now show $L \subseteq L(G)$ (i.e., the grammar generates all strings of $L$).
Inductive hypothesis: if $w \in L$ and $|w| = 2k$, then $w \in L(G)$.
Basis: $k = 1$. The only string in $L$ of length 2 is $01$, and $S \Rightarrow 01$, so $01 \in L(G)$.
Inductive step: assume the inductive hypothesis is true for $k = k_0 \ge 1$; show it is true for $k = k_0 + 1 > 1$. Since $|w| = 2k$, $w = 0^k 1^k$. By the inductive hypothesis $0^{k-1} 1^{k-1} \in L(G)$, and thus $S \Rightarrow^* 0^{k-1} 1^{k-1}$. So $S \Rightarrow 0S1 \Rightarrow^* 0\,0^{k-1} 1^{k-1}\,1 = w$ is a valid derivation for $w$. Thus $w \in L(G)$.
Since $L(G) \subseteq L$ and $L \subseteq L(G)$, we conclude $L = L(G)$.

A Push-Down Automaton (PDA) is a septuple $P = (Q, \Sigma, \Gamma, \delta, q_0, Z, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite input alphabet, $\Gamma$ is a finite stack alphabet, $\delta$ maps elements of $Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma$ into finite subsets of $Q \times \Gamma^*$, $q_0 \in Q$ is the start state, $Z \in \Gamma$ is the start stack symbol, and $F \subseteq Q$ is the set of final states.
Example: Let $P = (\{q_0, q_1, q_2\}, \{0, 1\}, \{Z, 0\}, \delta, q_0, Z, \{q_0\})$, where
$\delta(q_0, 0, Z) = \{(q_1, 0Z)\}$
$\delta(q_1, 0, 0) = \{(q_1, 00)\}$
$\delta(q_1, 1, 0) = \{(q_2, \varepsilon)\}$
$\delta(q_2, 1, 0) = \{(q_2, \varepsilon)\}$
$\delta(q_2, \varepsilon, Z) = \{(q_0, \varepsilon)\}$
Is $L(P) = \{0^n 1^n \mid n \ge 1\}$? Why?

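A Configuration of $P$ is a triple $(q, w, \alpha) \in Q \times \Sigma^* \times \Gamma^*$.
A Move $(q, aw, Z\alpha) \vdash (q_i, w, \gamma_i\alpha)$, with $a \in \Sigma \cup \{\varepsilon\}$, occurs if $(q_i, \gamma_i) \in \delta(q, a, Z)$.
An Initial Configuration is $(q_0, w, Z)$.
A string $w$ is Accepted by $P$ if $(q_0, w, Z) \vdash^* (q, \varepsilon, \alpha)$ for some $q \in F$, $\alpha \in \Gamma^*$.
The Language Accepted by $P$, $L(P)$, is the set of all strings $P$ accepts.
Continuing the example from the previous slide: $(q_0, 0011, Z) \vdash (q_1, 011, 0Z) \vdash (q_1, 11, 00Z) \vdash (q_2, 1, 0Z) \vdash (q_2, \varepsilon, Z) \vdash (q_0, \varepsilon, \varepsilon)$ (here $\varepsilon$ stands in for the empty input and the empty stack).
Now, try to build a PDA that accepts $L = \{ww^R \mid w \in \{0, 1\}^+\}$.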
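The trace above can be reproduced with a short simulator. A minimal Python sketch, assuming stacks are strings with the top at index 0 and `""` standing for $\varepsilon$; since every $\delta(q, a, Z)$ of this example has at most one element, the hypothetical `DELTA` table stores that single move directly. Note that under the acceptance definition above the empty string is accepted too, because $q_0$ is both the start state and a final state, so $(q_0, \varepsilon, Z) \vdash^* (q_0, \varepsilon, Z)$ in zero moves.

```python
DELTA = {
    ("q0", "0", "Z"): ("q1", "0Z"),
    ("q1", "0", "0"): ("q1", "00"),
    ("q1", "1", "0"): ("q2", ""),
    ("q2", "1", "0"): ("q2", ""),
    ("q2", "", "Z"): ("q0", ""),
}
FINAL = {"q0"}

def trace(w, state="q0", stack="Z"):
    """Configurations (state, remaining input, stack) visited from (q0, w, Z)."""
    configs = [(state, w, stack)]
    while stack:
        top = stack[0]
        if (state, "", top) in DELTA:             # epsilon-move
            state, push = DELTA[(state, "", top)]
        elif w and (state, w[0], top) in DELTA:   # move that reads one input symbol
            state, push = DELTA[(state, w[0], top)]
            w = w[1:]
        else:
            break                                 # no move is possible: P halts
        stack = push + stack[1:]                  # replace the top symbol by gamma
        configs.append((state, w, stack))
    return configs

def accepted(w):
    """w is accepted iff some reachable configuration is (q, epsilon, alpha) with q in F."""
    return any(rest == "" and state in FINAL for state, rest, _ in trace(w))

for config in trace("0011"):
    print(config)
```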

$\delta(q_0, 0, Z) = \{(q_0, 0Z)\}$
$\delta(q_0, 1, Z) = \{(q_0, 1Z)\}$
$\delta(q_0, 0, 1) = \{(q_0, 01)\}$
$\delta(q_0, 1, 0) = \{(q_0, 10)\}$
$\delta(q_0, 0, 0) = \{(q_0, 00), (q_1, \varepsilon)\}$
$\delta(q_0, 1, 1) = \{(q_0, 11), (q_1, \varepsilon)\}$
$\delta(q_1, 0, 0) = \{(q_1, \varepsilon)\}$
$\delta(q_1, 1, 1) = \{(q_1, \varepsilon)\}$
$\delta(q_1, \varepsilon, Z) = \{(q_1, \varepsilon)\}$
Since $\delta(q_0, 0, 0)$ and $\delta(q_0, 1, 1)$ each contain two moves, this is a Nondeterministic PDA.
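Because this PDA is nondeterministic, a simulator has to explore every possible move. A minimal Python sketch using breadth-first search over configurations; it assumes the PDA accepts by empty stack (input consumed and stack empty), which is what the final move $\delta(q_1, \varepsilon, Z) = \{(q_1, \varepsilon)\}$ suggests, since this slide does not list final states. Names are hypothetical.

```python
from collections import deque

# delta from the slide; "" stands for epsilon, stacks are strings with the top at index 0.
DELTA = {
    ("q0", "0", "Z"): {("q0", "0Z")},
    ("q0", "1", "Z"): {("q0", "1Z")},
    ("q0", "0", "1"): {("q0", "01")},
    ("q0", "1", "0"): {("q0", "10")},
    ("q0", "0", "0"): {("q0", "00"), ("q1", "")},
    ("q0", "1", "1"): {("q0", "11"), ("q1", "")},
    ("q1", "0", "0"): {("q1", "")},
    ("q1", "1", "1"): {("q1", "")},
    ("q1", "", "Z"): {("q1", "")},
}

def accepts_by_empty_stack(w):
    """Explore all configurations (state, remaining input, stack) breadth-first."""
    start = ("q0", w, "Z")
    queue, seen = deque([start]), {start}
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and stack == "":
            return True
        if stack == "":
            continue                       # empty stack: no further moves
        top, below = stack[0], stack[1:]
        moves = [(q, rest, push + below) for q, push in DELTA.get((state, "", top), set())]
        if rest:
            moves += [(q, rest[1:], push + below)
                      for q, push in DELTA.get((state, rest[0], top), set())]
        for config in moves:
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False

for w in ["00", "0110", "100001", "010", "01", ""]:
    print(repr(w), accepts_by_empty_stack(w))
```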

A Deterministic PDA is one in which
(1) for all $q \in Q$ and $Z \in \Gamma$, whenever $\delta(q, \varepsilon, Z) \neq \emptyset$, then $\delta(q, a, Z) = \emptyset$ for all $a \in \Sigma$; and
(2) for all $q \in Q$, $a \in \Sigma \cup \{\varepsilon\}$, and $Z \in \Gamma$, $\delta(q, a, Z)$ contains at most one element.
Converting a CFG to a PDA:
For each production $A \rightarrow \alpha$, make $(q, \alpha) \in \delta(q, \varepsilon, A)$.
For each $a \in \Sigma$, make $(q, \varepsilon) \in \delta(q, a, a)$.
How do we show whether some specific language $L$ is a CFL?
1. If $L$ is NOT a CFL, we may prove it by the pumping lemma for CFLs.
2. If $L$ is a CFL, we may prove it by (a) giving a deterministic or nondeterministic pushdown automaton for $L$ (but sometimes no DPDA exists, since DPDAs accept only a proper subset of all CFLs), or (b) giving a context-free grammar for $L$.
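The CFG-to-PDA construction above yields a one-state PDA that expands non-terminals with $\varepsilon$-moves and matches terminals against the input. A minimal Python sketch, assuming single-character symbols, acceptance by empty stack, and an $\varepsilon$-free grammar: then every stack symbol must produce at least one input symbol, so configurations whose stack is longer than the remaining input can be pruned and the search terminates. The grammar $S \rightarrow 0S0 \mid 1S1 \mid 00 \mid 11$ for $\{ww^R \mid w \in \{0, 1\}^+\}$ is chosen here for illustration and is not from the slides; the function names are hypothetical.

```python
from collections import deque

def cfg_to_pda(productions, terminals):
    """delta for a one-state PDA; keys are (input, top) with "" as epsilon."""
    delta = {}
    for lhs, rhs in productions:
        delta.setdefault(("", lhs), []).append(rhs)  # (q, alpha) in delta(q, eps, A)
    for a in terminals:
        delta.setdefault((a, a), []).append("")      # (q, eps) in delta(q, a, a)
    return delta

def pda_accepts(delta, start, w):
    """Breadth-first search over configurations (input position, stack)."""
    queue, seen = deque([(0, start)]), {(0, start)}
    while queue:
        pos, stack = queue.popleft()
        if pos == len(w) and stack == "":
            return True
        if stack == "":
            continue
        top, below = stack[0], stack[1:]
        moves = [(pos, push + below) for push in delta.get(("", top), [])]
        if pos < len(w):
            moves += [(pos + 1, push + below) for push in delta.get((w[pos], top), [])]
        for config in moves:
            # Prune: with an epsilon-free grammar each stack symbol yields >= 1 input symbol.
            if len(config[1]) <= len(w) - config[0] and config not in seen:
                seen.add(config)
                queue.append(config)
    return False

EVEN_PALINDROMES = [("S", "0S0"), ("S", "1S1"), ("S", "00"), ("S", "11")]
delta = cfg_to_pda(EVEN_PALINDROMES, "01")
print(pda_accepts(delta, "S", "0110"), pda_accepts(delta, "S", "010"))  # True False
```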

Theorem (pumping lemma for CFLs): For any CFL $L$, there exists a constant $p$ depending on $L$ such that every $z \in L$ with $|z| \ge p$ may be written as $z = uvwxy$ such that
1. $|vx| \ge 1$ (i.e., $v$ and $x$ are not both $\varepsilon$);
2. $|vwx| \le p$;
3. $uv^iwx^iy \in L$ for all $i \ge 0$.
(The proof is similar to the one for regular languages.)
Prove that $L = \{a^i b^i c^i \mid i \ge 0\}$ is NOT a CFL.
Proof: If it were, then by the pumping lemma for CFLs there would be a $p > 0$ such that every $z \in L$ with $|z| \ge p$ can be pumped. Let $z = a^p b^p c^p = uvwxy$ such that (i) $|vx| \ge 1$, (ii) $|vwx| \le p$, (iii) $uv^iwx^iy \in L$ for all $i \ge 0$.

But:
(1) Suppose $vwx = a^j$, $j \le p$. Then $uwy = a^{p-l} b^p c^p \notin L$, where $l = |vx| \ge 1$. This contradicts (iii), which with $i = 0$ says $uwy \in L$. The same argument holds for $vwx = b^j$ or $vwx = c^j$.
(2) Suppose $vwx = a^j b^k$, $j + k \le p$. Then $uwy = a^{p-l'} b^{p-l''} c^p \notin L$, since $|vx| \neq 0$ implies $l' \neq 0$ or $l'' \neq 0$ (or both). This contradicts (iii) with $i = 0$. The same argument holds for $vwx = b^j c^k$.
(3) $vwx$ cannot have the form $a^j b^p c^k$ with $j, k \ge 1$: since $|vwx| \le p$, $vwx$ cannot contain both a's and c's.
Thus, there are no pumpable substrings, and $L$ cannot be context-free.
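For a fixed small $p$ the case analysis can be confirmed by brute force: enumerate every split $z = uvwxy$ of $z = a^p b^p c^p$ with $|vx| \ge 1$ and $|vwx| \le p$, and check whether pumping stays in $L$. A minimal Python sketch (hypothetical names); it checks a single value of $p$ and is an illustration, not a proof.

```python
def in_L(s):
    """Membership in {a^i b^i c^i | i >= 0}."""
    n, rem = divmod(len(s), 3)
    return rem == 0 and s == "a" * n + "b" * n + "c" * n

def pumpable_splits(z, p):
    """All splits z = uvwxy with |vx| >= 1, |vwx| <= p and uv^iwx^iy in L for i = 0, 2."""
    found = []
    n = len(z)
    for start in range(n + 1):                          # vwx = z[start:end]
        for end in range(start, min(n, start + p) + 1):
            for v_end in range(start, end + 1):
                for x_start in range(v_end, end + 1):
                    u, v, w = z[:start], z[start:v_end], z[v_end:x_start]
                    x, y = z[x_start:end], z[end:]
                    if not v and not x:
                        continue                        # condition |vx| >= 1
                    if all(in_L(u + v * i + w + x * i + y) for i in (0, 2)):
                        found.append((u, v, w, x, y))
    return found

p = 4
print(pumpable_splits("a" * p + "b" * p + "c" * p, p))  # []
```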

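Begin by extending FIRST and FOLLOW to $\mathrm{FIRST}_k$ and $\mathrm{FOLLOW}_k$:
$\mathrm{FIRST}_k(\alpha) = \{ w \mid (|w| < k \text{ and } \alpha \Rightarrow^* w) \text{ or } (|w| = k \text{ and } \alpha \Rightarrow^* wx \text{ for some } x) \}$
The domain of $\mathrm{FIRST}_k$ is extended to sets of strings in the natural way.
$\mathrm{FOLLOW}_k(A) = \{ w \mid S \Rightarrow^* \alpha A \beta \text{ and } w \in \mathrm{FIRST}_k(\beta) \}$
$G$ is LL($k$) for some fixed $k$ iff, whenever there are two leftmost derivations
$S \Rightarrow^*_{lm} wA\alpha \Rightarrow_{lm} w\beta\alpha \Rightarrow^*_{lm} wx$ and $S \Rightarrow^*_{lm} wA\alpha \Rightarrow_{lm} w\gamma\alpha \Rightarrow^*_{lm} wy$
with $\beta \neq \gamma$, then $\mathrm{FIRST}_k(x) \neq \mathrm{FIRST}_k(y)$.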

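Example: $S \rightarrow Abc \mid aAcb$, $A \rightarrow \varepsilon \mid b \mid c$.
For the left-sentential form $S$: $\mathrm{FIRST}_1(Abc) = \{b, c\}$, $\mathrm{FIRST}_1(aAcb) = \{a\}$.
For the left-sentential form $Abc$: $\mathrm{FIRST}_1(\varepsilon bc) = \{b\}$, $\mathrm{FIRST}_1(bbc) = \{b\}$, $\mathrm{FIRST}_1(cbc) = \{c\}$, so one symbol of lookahead cannot separate $A \rightarrow \varepsilon$ from $A \rightarrow b$; with two symbols, $\mathrm{FIRST}_2(\varepsilon bc) = \{bc\}$, $\mathrm{FIRST}_2(bbc) = \{bb\}$, $\mathrm{FIRST}_2(cbc) = \{cb\}$.
In the left-sentential form $aAcb$: $\mathrm{FIRST}_2(\varepsilon cb) = \{cb\}$, $\mathrm{FIRST}_2(bcb) = \{bc\}$, $\mathrm{FIRST}_2(ccb) = \{cc\}$.
There are no multiply defined entries, so we know the grammar is LL(2).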
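Because the alternatives of $A$ and the remainder of each left-sentential form are terminal strings here, $\mathrm{FIRST}_k$ of each candidate is just its first $k$ symbols, and the LL($k$) check at $A$ reduces to comparing prefixes. A minimal Python sketch (hypothetical names):

```python
A_ALTERNATIVES = ["", "b", "c"]  # A -> epsilon | b | c

def lookaheads(alternatives, rest, k):
    """FIRST_k(alpha + rest) for each terminal-string alternative alpha."""
    return [(alpha + rest)[:k] for alpha in alternatives]

def has_conflict(alternatives, rest, k):
    """True iff two alternatives share a k-symbol lookahead (a multiply defined entry)."""
    firsts = lookaheads(alternatives, rest, k)
    return len(set(firsts)) < len(firsts)

for rest in ("bc", "cb"):  # what follows A in Abc and in aAcb
    for k in (1, 2):
        print(rest, k, lookaheads(A_ALTERNATIVES, rest, k), has_conflict(A_ALTERNATIVES, rest, k))
```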

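| | FIRST$_1$ | FOLLOW$_1$ | FIRST$_2$ | FOLLOW$_2$ |
|---|---|---|---|---|
| S | a, b, c | $\$$ | ab, ac, bb, bc, cb | $\$\$$ |
| A | $\varepsilon$, b, c | b, c | $\varepsilon$, b, c | bc, cb |

Some grammars are not LL($k$) for any $k$. For instance, the grammar $S \rightarrow A \mid B$, $A \rightarrow aAb \mid 0$, $B \rightarrow aBbb \mid 1$, with $L(G) = \{a^n 0 b^n \mid n \ge 0\} \cup \{a^n 1 b^{2n} \mid n \ge 0\}$, is not LL($k$). Assume it were. For any $n$, $S \Rightarrow A \Rightarrow^* a^n 0 b^n$ and $S \Rightarrow B \Rightarrow^* a^n 1 b^{2n}$. Let $k = 2m$, $m \in \mathbb{Z}^+$; then $\mathrm{FIRST}_k(a^{2m} 0 b^{2m}) = \mathrm{FIRST}_k(a^{2m} 1 b^{4m}) = a^{2m}$, but $A \neq B$. Since $k$ can be chosen arbitrarily large (and an LL($k$) grammar is also LL($k+1$)), $G$ is not LL($k$) for any $k$.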
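The $\mathrm{FIRST}_k$/$\mathrm{FOLLOW}_k$ table at the start of this slide can be recomputed by iterating to a fixed point. A minimal Python sketch for the grammar $S \rightarrow Abc \mid aAcb$, $A \rightarrow \varepsilon \mid b \mid c$, assuming single-character symbols, `""` for $\varepsilon$, and an end marker `$` repeated $k$ times after the start symbol (matching the table's $\$\$$ for $k = 2$); the function names are hypothetical.

```python
GRAMMAR = {"S": ["Abc", "aAcb"], "A": ["", "b", "c"]}  # one character per symbol
START, END = "S", "$"

def concat_k(X, Y, k):
    """k-truncated concatenation of two sets of terminal strings."""
    return {(x + y)[:k] for x in X for y in Y}

def first_of(alpha, first, k):
    """FIRST_k of a string of grammar symbols, given FIRST_k of each non-terminal."""
    result = {""}
    for sym in alpha:
        result = concat_k(result, first.get(sym, {sym}), k)  # a terminal t has FIRST_k = {t}
    return result

def first_k(grammar, k):
    first = {A: set() for A in grammar}
    changed = True
    while changed:  # iterate until no FIRST_k set grows
        changed = False
        for A, alternatives in grammar.items():
            for alpha in alternatives:
                new = first_of(alpha, first, k) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first

def follow_k(grammar, k, first):
    follow = {A: set() for A in grammar}
    follow[START] = {END * k}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for alpha in alternatives:
                for i, B in enumerate(alpha):
                    if B in grammar:
                        # FOLLOW_k(B) includes FIRST_k(rest of alpha) concatenated with FOLLOW_k(A)
                        new = concat_k(first_of(alpha[i + 1:], first, k), follow[A], k) - follow[B]
                        if new:
                            follow[B] |= new
                            changed = True
    return follow

f2 = first_k(GRAMMAR, 2)
print({A: sorted(s) for A, s in f2.items()})  # {'S': ['ab', 'ac', 'bb', 'bc', 'cb'], 'A': ['', 'b', 'c']}
print({A: sorted(s) for A, s in follow_k(GRAMMAR, 2, f2).items()})
```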

