1 c Chuen-Liang Chen, NTUCS&IE / 51 CONTEXT-FREE GRAMMARS Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, TAIWAN

2 Parsing
function:
  - checking the syntactic validity of the input string
  - producing the structure of the corresponding parse tree
callees:
  - scanner (when a token is needed)
  - semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
top-down parsing
  - begins at the start symbol and expands nonterminals in depth-first manner (predictive in nature)
  - left-most derivation
  - pre-order traversal of the parse tree
  - e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing
bottom-up parsing
  - begins from the terminal string and determines the productions used to generate the leaves
  - right-most derivation in reverse order
  - post-order traversal of the parse tree
  - e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]

3 Definitions about context-free grammar (1/2)
context-free grammar -- G = (V_t, V_n, S, P)
  - V_t -- set of terminal symbols
  - V_n -- set of nonterminal symbols
      - a, b, c, ... ∈ V_t
      - A, B, C, ... ∈ V_n
      - U, V, W, ... ∈ V = V_t ∪ V_n
      - u, v, w, ... ∈ V_t*
      - α, β, γ, ... ∈ V*
  - S -- start symbol, goal symbol; S ∈ V_n
  - P -- set of production rules of the form: A → α
derivation by production rule A → γ
  - one step derivation: α A β ⇒ α γ β
  - left-most derivation: u A β ⇒_lm u γ β
  - right-most derivation: α A v ⇒_rm α γ v
  - one or more steps derivation: ⇒+, ⇒+_lm, ⇒+_rm
  - zero or more steps derivation: ⇒*, ⇒*_lm, ⇒*_rm

4 Definitions about context-free grammar (2/2)
set of sentential forms -- SF(G) = { α | S ⇒* α }
  - left-most sentential form -- an α such that S ⇒*_lm α
  - right-most sentential form -- an α such that S ⇒*_rm α
context-free language -- L(G) = SF(G) ∩ V_t*
parse tree, derivation tree --
  - graphic representation of derivations
  - root -- start symbol
  - leaf nodes -- grammar symbols or λ
  - interior nodes -- nonterminals
  - offspring of a nonterminal -- a production
for a given sentential form --
  - phrase -- a sequence of symbols derived from a single nonterminal
  - simple phrase, prime phrase -- minimal phrase
  - handle -- left-most simple phrase

5 Example of context-free grammar
grammar G_0 --
    E → Prefix ( E ) | V Tail
    Prefix → F | λ
    Tail → + E | λ
left-most derivation --
    E ⇒_lm Prefix ( E )
      ⇒_lm F ( E )
      ⇒_lm F ( V Tail )
      ⇒_lm F ( V + E )
      ⇒_lm F ( V + V Tail )
      ⇒_lm F ( V + V )
right-most derivation --
    E ⇒_rm Prefix ( E )
      ⇒_rm Prefix ( V Tail )
      ⇒_rm Prefix ( V + E )
      ⇒_rm Prefix ( V + V Tail )
      ⇒_rm Prefix ( V + V )
      ⇒_rm F ( V + V )
right-most sentential forms --
    1. E
    2. Prefix ( E )
    3. Prefix ( V Tail )
    4. Prefix ( V + E )
    5. Prefix ( V + V Tail )
    6. Prefix ( V + V )
    7. F ( V + V )
    8. and so on
L(G_0) ⊇ { F ( V + V ) }

6 Example of left-most derivation
parse trees of left-most derivations
  - blue symbols: left-most sentential forms
[Figure: sequence of parse trees, one per step of the left-most derivation of F ( V + V ) from E]

8 Example of top-down parsing
trace of top-down parsing (left-most derivation)
  - legend -- orange: just derived (predicted); blue: just read (matched); black: derived or read; green: un-processed (parse stack)
[Figure: sequence of partial parse trees built top-down for the input F ( V + V )]
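
The predict-and-match steps of top-down parsing can be sketched as a recursive descent recognizer for the grammar E → Prefix ( E ) | V Tail, Prefix → F | λ, Tail → + E | λ. This is only an illustration, not code from the slides: the one-character token encoding and the names `recognize_g0`, `parse_E`, `parse_Prefix`, `parse_Tail`, and `match` are assumptions, and each λ-production is chosen by checking that the lookahead is a token that may follow the nonterminal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed token encoding: each character is one token
 * ('F', 'V', '(', ')', '+'); '\0' marks the end of input. */
static const char *cursor;              /* next unread token (the lookahead) */

/* Consume the lookahead if it is the expected terminal. */
static bool match(char expected) {
    if (*cursor != expected) return false;
    cursor++;
    return true;
}

static bool parse_E(void);

/* Prefix -> F | lambda */
static bool parse_Prefix(void) {
    if (*cursor == 'F') return match('F');      /* predict Prefix -> F */
    return *cursor == '(';                      /* predict lambda: '(' may follow Prefix */
}

/* Tail -> + E | lambda */
static bool parse_Tail(void) {
    if (*cursor == '+') return match('+') && parse_E();
    return *cursor == ')' || *cursor == '\0';   /* predict lambda: ')' or end may follow Tail */
}

/* E -> Prefix ( E ) | V Tail */
static bool parse_E(void) {
    if (*cursor == 'F' || *cursor == '(')       /* predict E -> Prefix ( E ) */
        return parse_Prefix() && match('(') && parse_E() && match(')');
    if (*cursor == 'V')                         /* predict E -> V Tail */
        return match('V') && parse_Tail();
    return false;                               /* no production fits the lookahead */
}

/* Returns true when the whole token string derives from E. */
bool recognize_g0(const char *tokens) {
    assert(tokens != NULL);
    cursor = tokens;
    return parse_E() && *cursor == '\0';
}
```

Calling `recognize_g0("F(V+V)")` follows the same left-most derivation as the trace: every function call corresponds to a predicted production and every `match` to a token just read. One lookahead token is enough here because the alternatives of each nonterminal start with different tokens.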

10 Example of right-most derivation (1/2)
parse trees of right-most derivations and the corresponding sentential forms, phrases, simple phrases, and handles
  - blue symbols: sentential form; phrases, simple phrases, and handles are marked in the figure
[Figure: parse trees for the right-most sentential forms E, Prefix ( E ), Prefix ( V Tail ), and Prefix ( V + E )]

11 Example of right-most derivation (2/2)
[Figure: parse trees for the right-most sentential forms Prefix ( V + V Tail ), Prefix ( V + V ), and F ( V + V )]

13 Example of bottom-up parsing
trace of bottom-up parsing (inverse order of right-most derivation)
  - legend -- blue: just read (shifted); orange: just derived (reduced to); pink: not read; green: derived or read (parse stack)
[Figure: sequence of partial parse trees built bottom-up for the input F ( V + V )]

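14 Examples - pork chop noodle special
example 1
    pork-chop-noodle-special → iced-black-tea pork-chop-noodles orange-slices
    pork-chop-noodles → fried-pork-chop noodle-soup
  - lookahead is unnecessary
example 2
    pork-chop-noodle-special → iced-black-tea pork-chop-noodles service orange-slices
    pork-chop-noodles → fried-pork-chop noodle-soup | noodle-soup fried-pork-chop
    service → taro-ice-cream | forget-it (λ)
  - lookahead is required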

15 Ambiguity of grammar
ambiguous grammar: some string has two different parse trees (i.e., two different structures)
  - example: E → E - E | id
  - [Figure: two different parse trees for id - id - id]
for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same
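
To make the ambiguity concrete, the number of distinct parse trees can be counted. The sketch below assumes the ambiguous grammar E → E - E | id and counts the parse trees of a string of n ids separated by minus signs; the function name `count_parse_trees` and the bound `MAX_IDS` are hypothetical choices for this illustration.

```c
#include <assert.h>

#define MAX_IDS 16      /* arbitrary bound so the table fits on the stack */

/* Number of parse trees of "id - id - ... - id" (num_ids ids)
 * under the assumed grammar  E -> E - E | id. */
unsigned long count_parse_trees(int num_ids) {
    unsigned long trees[MAX_IDS + 1] = {0};
    assert(num_ids >= 1 && num_ids <= MAX_IDS);
    trees[1] = 1;                               /* E -> id */
    for (int n = 2; n <= num_ids; n++)
        for (int left = 1; left < n; left++)    /* E -> E - E, `left` ids in the left operand */
            trees[n] += trees[left] * trees[n - left];
    return trees[num_ids];
}
```

For three ids the function returns 2, matching the two structures (id - id) - id and id - (id - id). An unambiguous grammar would give exactly one tree for every string in the language.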

16 First set and Follow set (1/2)
First(α) = { a ∈ V_t | α ⇒* a β } ∪ ( if α ⇒* λ then { λ } else ∅ )
  - set of all terminals that can begin a sentential form derived from α
  - First_k(α) -- set of k-symbol terminal strings that can begin a sentential form derived from α
  - QUIZ: for what?
Follow(A) = { a ∈ V_t | S ⇒+ α A a β } ∪ ( if S ⇒+ α A then { λ } else ∅ )
  - set of all terminals that may follow A in some sentential form
  - Follow_k(A) -- set of k-symbol terminal strings that may follow A in some sentential form
  - QUIZ: for what?

17 First set and Follow set (2/2)
example 1 --
    E → Prefix ( E )
    E → V Tail
    Prefix → F | λ
    Tail → + E | λ
example 2 --
    S → a S e | B
    B → b B e | C
    C → c C e | d
example 3 --
    S → A B c
    A → a | λ
    B → b | λ

18 Algorithms for First & Follow sets (1/6)
    typedef int symbol;     /* a symbol in the grammar */
    /*
     * The symbolic constants used below, NUM_TERMINALS,
     * NUM_NONTERMINALS, and NUM_PRODUCTIONS are determined
     * by the grammar. MAX_RHS_LENGTH should simply be
     * "big enough."
     */
    #define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)
    typedef struct gram {
        symbol terminals[NUM_TERMINALS];
        symbol nonterminals[NUM_NONTERMINALS];
        symbol start_symbol;
        int num_productions;
        struct prod {
            symbol lhs;
            int rhs_length;
            symbol rhs[MAX_RHS_LENGTH];
        } productions[NUM_PRODUCTIONS];
        symbol vocabulary[VOCABULARY];
    } grammar;
    typedef struct prod production;
    typedef symbol terminal;
    typedef symbol nonterminal;

19 Algorithms for First & Follow sets (2/6)
    typedef short boolean;
    typedef boolean marked_vocabulary[VOCABULARY];
    /*
     * Mark those vocabulary symbols found to derive λ (directly or indirectly).
     * C cannot return an array by value, so the function returns a pointer
     * to its static marked_vocabulary array.
     */
    boolean *mark_lambda(const grammar g)
    {
        static marked_vocabulary derives_lambda;
        boolean changes;                /* any changes during last iteration? */
        boolean rhs_derives_lambda;     /* does the RHS derive λ? */
        symbol v;                       /* a word in the vocabulary */
        production p;                   /* a production in the grammar */
        int i, j;                       /* loop variables */
        for (v = 0; v < VOCABULARY; v++)
            derives_lambda[v] = FALSE;  /* initially, nothing is marked */

20 Algorithms for First & Follow sets (3/6)
        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                if (!derives_lambda[p.lhs]) {
                    if (p.rhs_length == 0) {
                        /* derives λ directly */
                        changes = derives_lambda[p.lhs] = TRUE;
                        continue;
                    }
                    /* does each part of RHS derive λ? */
                    rhs_derives_lambda = derives_lambda[p.rhs[0]];
                    for (j = 1; j < p.rhs_length; j++)
                        rhs_derives_lambda = rhs_derives_lambda && derives_lambda[p.rhs[j]];
                    if (rhs_derives_lambda)
                        changes = derives_lambda[p.lhs] = TRUE;
                }
            }
        } while (changes);
        return derives_lambda;
    }

21 Algorithms for First & Follow sets (4/6)
    /* Pseudo-C: set operations (∪, ∈, -, SET_OF) are written in set notation. */
    typedef set_of_terminal_or_lambda termset;
    termset follow_set[NUM_NONTERMINALS];
    termset first_set[VOCABULARY];
    boolean *derives_lambda;    /* = mark_lambda(g), with mark_lambda as defined above */
    termset compute_first(string_of_symbols alpha)
    {
        int i, k;
        termset result;
        k = length(alpha);
        if (k == 0)
            result = SET_OF( λ );
        else {
            result = first_set[alpha[0]] - SET_OF( λ );
            for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
                result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
            if (i == k && λ ∈ first_set[alpha[k-1]])
                result = result ∪ SET_OF( λ );
        }
        return result;
    }

22 Algorithms for First & Follow sets (5/6)
    extern grammar g;
    void fill_first_set(void)
    {
        nonterminal A;
        terminal a;
        production p;
        boolean changes;
        int i, j;
        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            if (derives_lambda[A])
                first_set[A] = SET_OF( λ );
            else
                first_set[A] = ∅;
        }
        for (i = 0; i < NUM_TERMINALS; i++) {
            a = g.terminals[i];
            first_set[a] = SET_OF( a );
            for (j = 0; j < NUM_NONTERMINALS; j++) {
                A = g.nonterminals[j];
                if (there exists a production A → a β)
                    first_set[A] = first_set[A] ∪ SET_OF( a );
            }
        }
        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
                if ( first_set changed )
                    changes = TRUE;
            }
        } while (changes);
    }
QUIZ: termination?
QUIZ: correctness?

23 Algorithms for First & Follow sets (6/6)
    void fill_follow_set(void)
    {
        nonterminal A, B;
        int i;
        boolean changes;
        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            follow_set[A] = ∅;
        }
        follow_set[g.start_symbol] = SET_OF( λ );
        do {
            changes = FALSE;
            for (each production A → α B β) {
                /*
                 * I.e. for each production and each
                 * occurrence of a nonterminal in its
                 * right-hand side.
                 */
                follow_set[B] = follow_set[B] ∪ (compute_first( β ) - SET_OF( λ ));
                if ( λ ∈ compute_first( β ) )
                    follow_set[B] = follow_set[B] ∪ follow_set[A];
                if ( follow_set[B] changed )
                    changes = TRUE;
            }
        } while (changes);
    }
QUIZ: termination?
QUIZ: correctness?
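
The set-notation pseudo-code for First and Follow can be made concrete with bit masks. The following is a minimal sketch, not the slides' implementation: it hard-codes the grammar S → A B c, A → a | λ, B → b | λ, represents a termset as an `unsigned` with one bit per terminal plus an assumed extra `LAMBDA` bit, and folds the λ-marking pass into the same fixed-point loop instead of precomputing `derives_lambda`. The enum constants, `BIT`, `LAMBDA`, and the array-based grammar encoding are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of  S -> A B c,  A -> a | lambda,  B -> b | lambda.
 * Terminals are numbered first so that symbol < NUM_TERMINALS means "terminal". */
enum { T_a, T_b, T_c, N_S, N_A, N_B, NUM_SYMBOLS };
enum { NUM_TERMINALS = 3, NUM_PRODUCTIONS = 5, MAX_RHS = 3 };

typedef unsigned termset;               /* bit t set <=> terminal t is in the set */
#define BIT(t)  (1u << (t))
#define LAMBDA  BIT(NUM_TERMINALS)      /* extra bit standing for lambda */

typedef struct { int lhs, rhs_length, rhs[MAX_RHS]; } production;

static const production productions[NUM_PRODUCTIONS] = {
    { N_S, 3, { N_A, N_B, T_c } },
    { N_A, 1, { T_a } },
    { N_A, 0, { 0 } },                  /* A -> lambda */
    { N_B, 1, { T_b } },
    { N_B, 0, { 0 } },                  /* B -> lambda */
};

termset first_set[NUM_SYMBOLS];
termset follow_set[NUM_SYMBOLS];

/* First of the symbol string alpha[0..k-1], using the current first_set. */
static termset compute_first(const int *alpha, int k) {
    termset result = 0;
    assert(k >= 0);
    for (int i = 0; i < k; i++) {
        result |= first_set[alpha[i]] & ~LAMBDA;
        if (!(first_set[alpha[i]] & LAMBDA))
            return result;              /* alpha[i] cannot vanish: stop here */
    }
    return result | LAMBDA;             /* all symbols (or none) derive lambda */
}

void fill_first_set(void) {
    bool changes;
    for (int s = 0; s < NUM_SYMBOLS; s++)
        first_set[s] = (s < NUM_TERMINALS) ? BIT(s) : 0;
    do {                                /* iterate until no set grows */
        changes = false;
        for (int i = 0; i < NUM_PRODUCTIONS; i++) {
            const production *p = &productions[i];
            termset updated = first_set[p->lhs] | compute_first(p->rhs, p->rhs_length);
            if (updated != first_set[p->lhs]) { first_set[p->lhs] = updated; changes = true; }
        }
    } while (changes);
}

void fill_follow_set(void) {            /* call after fill_first_set() */
    bool changes;
    for (int s = 0; s < NUM_SYMBOLS; s++) follow_set[s] = 0;
    follow_set[N_S] = LAMBDA;           /* end of input may follow the start symbol */
    do {
        changes = false;
        for (int i = 0; i < NUM_PRODUCTIONS; i++) {
            const production *p = &productions[i];
            for (int j = 0; j < p->rhs_length; j++) {
                int B = p->rhs[j];
                if (B < NUM_TERMINALS) continue;    /* Follow is kept for nonterminals only */
                termset rest = compute_first(p->rhs + j + 1, p->rhs_length - j - 1);
                termset updated = follow_set[B] | (rest & ~LAMBDA);
                if (rest & LAMBDA) updated |= follow_set[p->lhs];
                if (updated != follow_set[B]) { follow_set[B] = updated; changes = true; }
            }
        }
    } while (changes);
}
```

After `fill_first_set()` and `fill_follow_set()`, the sets for this grammar are First(S) = {a, b, c}, First(A) = {a, λ}, First(B) = {b, λ}, Follow(S) = {λ}, Follow(A) = {b, c}, and Follow(B) = {c}. Both loops terminate because each pass can only add elements to finitely many finite sets, which is one way to approach the termination quiz.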

24 Tracing examples
example 1 --
    E → Prefix ( E )
    E → V Tail
    Prefix → F | λ
    Tail → + E | λ
example 2 --
    S → a S e | B
    B → b B e | C
    C → c C e | d
example 3 --
    S → A B c
    A → a | λ
    B → b | λ
[Figure: numbered trace markers attached to the grammar symbols of each example]

25 From extended BNF to CFG
  - { }
  - QUIZ: how, systematically?

26 Other types of grammars
regular grammar -- A → a B or C → λ
  - QUIZ: how?
context-free grammar -- A → α
context-sensitive grammar -- α A β → α γ β
type 0 grammar -- α → β
regular grammar: too simple, e.g., it cannot specify { [^i ]^i | i ≥ 1 }
  - QUIZ: how to specify { [^i ]^i | i ≥ 1 } by context-free grammar?
context-sensitive, type 0: no sufficiently efficient parsers
context-free grammar: a balance between generality and practicality
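
One possible answer to the last quiz is the context-free grammar S → [ S ] | [ ], which generates i opening brackets followed by i closing brackets. The recognizer below is a sketch under that assumption; the grammar is one of several that work, and the names `parse_S` and `is_bracket_run` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed grammar:  S -> [ S ] | [ ]
 * Returns the position just past one S, or NULL if no S starts at p. */
static const char *parse_S(const char *p) {
    if (*p != '[') return NULL;
    p++;
    if (*p == '[') {                    /* lookahead '[' selects S -> [ S ] */
        p = parse_S(p);
        if (p == NULL) return NULL;
    }
    if (*p != ']') return NULL;
    return p + 1;
}

/* True when text is exactly i '[' followed by i ']' for some i >= 1. */
bool is_bracket_run(const char *text) {
    assert(text != NULL);
    const char *end = parse_S(text);
    return end != NULL && *end == '\0';
}
```

The recursion depth plays the role of the counter that a finite automaton lacks, which is why a regular grammar cannot express this language while a context-free grammar can.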

