1 Programming Languages (CS 550) Lecture 1 Summary Grammars and Parsing Jeremy R. Johnson.


2 2 Theme  Context-free grammars provide a convenient formalism for describing the syntax of programming languages. Moreover, a parser (a recognizer of the valid strings of the language the grammar defines) can be constructed automatically from a context-free grammar; typically a few additional restrictions are imposed on the grammar to make the parser easier to construct and more efficient. In this lecture we review grammars as a means of describing syntax and show how to construct a parser from a grammar, either by hand or using automated tools such as bison.

3 3 Outline  Motivating Example  Regular Expressions and Scanning  Context Free Grammars  Derivations and Parse Trees  Ambiguous Grammars  Parsing  Recursive Descent Parsing  Shift Reduce Parsing  Parser Generators  Syntax Directed Translation and Attribute Grammars

4 4 Motivating Example  Write a function, L = ReadList(), that reads an arbitrary order list and constructs a recursive data structure L to represent it  (a1,…,an), ai an integer or recursively a list  Assume the input is a stream of tokens - e.g. ‘(‘, integer, ‘,’, ‘)’ and the variable Token contains the current token  Assume the functions  GetToken() – advance to the next token  Match(token) – if token = Token then GetToken() else error  M = Comp(e,L) – construct list M by inserting element e in the front of L. E.g. Comp(1,(2,3)) = (1,2,3)  M = Reverse(L) – M = the reverse of the list L.

5 5 Solution
L = ReadList() {
    Match('(');
    L = NULL;
    while Token ≠ ')' do
        /* read element */
        if Token == NUMBER then
            x = Token.value; Match(NUMBER);
        else if Token == '(' then
            x = ReadList();
        else
            error();
        endif;
        L = Comp(x,L);
        if Token ≠ ')' then Match(','); endif;
    enddo;
    Match(')');
    return Reverse(L);
}
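A minimal Python sketch of the same loop, under stated assumptions: the `tokenize` helper, the use of Python ints for NUMBER tokens, and the use of Python lists for L are illustrative choices, not part of the slides. Because elements are appended at the end of a Python list, no Reverse step is needed.

```python
import re

def tokenize(text):
    """Hypothetical scanner: returns '(', ')', ',' and int tokens.
    Characters that match none of these are silently skipped."""
    return [int(t) if t.isdigit() else t
            for t in re.findall(r"[0-9]+|[(),]", text)]

def read_list(tokens, pos=0):
    """Parse the list starting at tokens[pos]; return (value, next_pos)."""
    def match(expected, pos):
        if pos >= len(tokens) or tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r} at token {pos}")
        return pos + 1

    pos = match("(", pos)
    items = []
    while pos < len(tokens) and tokens[pos] != ")":
        if isinstance(tokens[pos], int):       # NUMBER
            items.append(tokens[pos])
            pos += 1
        elif tokens[pos] == "(":               # nested list
            sub, pos = read_list(tokens, pos)
            items.append(sub)
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
        if pos < len(tokens) and tokens[pos] != ")":
            pos = match(",", pos)
    pos = match(")", pos)
    return items, pos
```

For example, `read_list(tokenize("(1,(2,3))"))[0]` yields `[1, [2, 3]]`. Like the pseudocode, this sketch also accepts a trailing comma before the closing parenthesis.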

6 6 List Grammar
<list> → ( ) | ( <sequence> )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER

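7 7 Derivation and Parse Tree
<list> → ( <sequence> )
→ ( <listelement>, <sequence> )
→ ( NUMBER, <sequence> ) = (1, <sequence> )
→ (1, <listelement>, <sequence> )
→ (1, NUMBER, <sequence> ) = (1, 2, <sequence> )
→ (1, 2, <listelement> )
→ (1, 2, NUMBER) = (1,2,3)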

8 8 Derivation and Parse Tree [Figure: parse tree for the list (1,2,3)]

9 9 Parsing and Scanning  Recognizing valid programming language syntax is split into two stages  scanning - group input character stream into tokens  parsing – group tokens into programming language structures  Tokens are described by regular expressions  Programming language structures by context free grammars  Separating into parsing and scanning simplifies both the description and recognition and makes maintenance easier
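The scanning stage can be sketched in Python with the standard `re` module. The token names (NUMBER, LPAREN, and so on) and the `scan` function are illustrative assumptions chosen to fit the list example; they do not come from the slides.

```python
import re

# Each token kind is described by a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA",  r","),
    ("SKIP",   r"[ \t\n]+"),   # whitespace separates tokens but is not one
    ("ERROR",  r"."),          # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def scan(text):
    """Group the character stream into (kind, text) tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {m.group()!r} at offset {m.start()}")
        tokens.append((kind, m.group()))
    return tokens
```

A parser would then consume this token list and never look at individual characters, which is the separation of concerns the slide describes.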

10 10 Regular Expressions  Alphabet = Σ  A language over Σ is a subset of the strings in Σ*  Regular expressions describe certain types of languages  ∅ is a regular expression  ε, denoting {ε}, is a regular expression  For each a in Σ, a denoting {a} is a regular expression  If r and s are regular expressions denoting languages R and S respectively, then (r + s), (rs), and (r*) are regular expressions, denoting R ∪ S, RS, and R* respectively  E.g. 00, (0+1)*, (0+1)*00(0+1)*, 00*11*22*, (1+10)*
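The example expressions can be tried out with Python's `re` module. This is only a sketch: the slide's union operator `+` is written `|` in Python, and the `EXAMPLES` table and `in_language` helper are names introduced here for illustration.

```python
import re

# Slide notation -> Python notation (union '+' becomes '|').
EXAMPLES = {
    "00":             r"00",
    "(0+1)*":         r"(0|1)*",
    "(0+1)*00(0+1)*": r"(0|1)*00(0|1)*",   # strings containing 00
    "00*11*22*":      r"00*11*22*",        # 0s, then 1s, then 2s; at least one of each
    "(1+10)*":        r"(1|10)*",
}

def in_language(slide_regex, s):
    """True if the whole string s belongs to the language of slide_regex."""
    return re.fullmatch(EXAMPLES[slide_regex], s) is not None
```

`re.fullmatch` is used rather than `re.search` because membership in the language requires the entire string to match.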

11 11 Grammar  Non-terminal symbols  Terminal symbols  Start symbol  Productions (rules)  Context-Free Grammars (a rule cannot depend on context)  Regular grammar

12 12 Example   if then | if then else   identifier | identifier,   begin end   | ;   =   A | B | C   + | - |

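13 13 Expression Grammars
First grammar:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
Second grammar (same <assign> and <id> rules):
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>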

14 14 Exercise 1  Show a derivation and corresponding parse tree, using the first expression grammar, for the string  A = B*(A+C)  Show that the second expression grammar is ambiguous by showing two distinct parse trees for the string  A = B+C*A

15 15 Parse Tree [Figure: parse tree for A = B * (A + C)]

16 16 Ambiguous Grammar [Figure: two distinct parse trees for A = B + C * A]
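Ambiguity can also be checked by counting parse trees. The sketch below assumes the ambiguous rules have the shape expr → expr + expr | expr * expr | id, handles only the right-hand side of the assignment, and ignores parentheses; `count_parse_trees` is a name introduced here.

```python
from functools import lru_cache

OPERATORS = {"+", "*"}

def count_parse_trees(tokens):
    """Count parse trees under expr -> expr + expr | expr * expr | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of trees deriving tokens[i:j]."""
        if j - i == 1:
            return 0 if tokens[i] in OPERATORS else 1
        total = 0
        # Try every operator as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A count greater than one for some string shows the grammar is ambiguous; for the token sequence B + C * A the function returns 2, matching the two trees of the exercise.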

17 17 Unambiguous Expression Grammar
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

18 18 Exercise 2  Show the derivation and parse tree using the unambiguous expression grammar for  A = B+C*A  Convince yourself that this grammar is unambiguous (ideally give a proof)

19 19 Solution 2 [Figure: parse tree for A = B + C * A using the unambiguous expression grammar]

20 Sketch of Proof  Induction on the length of the input string  Base case: length = 1  Otherwise, 3 cases to consider  ( expr1 ): induct on expr1  expr1 + term1 (+ rightmost): induct on expr1 and term1  term1 * factor1 (no +, * rightmost): induct on term1 and factor1

21 21 Recursive Descent Parsing  Turn nonterminals into mutually recursive procedures corresponding to the production rules.  Procedure attempts to match sequence of terminals and nonterminals in rhs of rule.  Determine which rule to apply by looking at next token.  Predictive parsing.  Not all CFGs can be parsed this way

22 22 List Grammar
<list> → ( ) | ( <sequence> )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER

23 23 Recursive Descent Parser
list() {
    match('(');
    if token ≠ ')' then
        seq();
    endif;
    match(')');
}

24 24 Recursive Descent Parser
seq() {
    elt();
    if token = ',' then
        match(',');
        seq();
    endif;
}

25 25 Recursive Descent Parser
elt() {
    if token = '(' then
        list();
    else
        match(NUMBER);
    endif;
}
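The three procedures list(), seq(), and elt() can be put together into a runnable recognizer. This Python sketch assumes the scanner has already produced a list of token strings ("(", ")", ",", "NUMBER"); the class name, the `accepts` helper, and the "$" end marker are illustrative additions.

```python
class ListRecognizer:
    """One method per nonterminal of the list grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def list_(self):
        self.match("(")
        if self.token != ")":
            self.seq()
        self.match(")")

    def seq(self):
        self.elt()
        if self.token == ",":
            self.match(",")
            self.seq()

    def elt(self):
        if self.token == "(":
            self.list_()
        else:
            self.match("NUMBER")


def accepts(tokens):
    """True if tokens form a valid list followed by end of input."""
    parser = ListRecognizer(tokens)
    try:
        parser.list_()
        parser.match("$")
        return True
    except SyntaxError:
        return False
```

Each method decides which rule to apply by looking only at the current token, which is what makes the parser predictive.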

26 26 Exercise 3  Removing left recursion  Rules of the form <S> → <S> α [left recursive] cause an infinite loop in a recursive descent parser  Left recursion can be systematically removed: the rules <A> → <A> α | β are replaced by <A> → β <A'> and <A'> → α <A'> | ε  Remove left recursion from the unambiguous expression grammar
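The transformation is mechanical enough to sketch in Python. The representation is an assumption made for illustration: each right-hand side is a list of symbol names, the empty list stands for ε, and the new nonterminal is named by appending a prime. Only immediate left recursion is handled.

```python
def remove_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | b1 | ...  as
    A -> b1 A' | ...  and  A' -> a1 A' | ... | epsilon.
    Returns a dict from nonterminal name to its list of right-hand sides."""
    recursive = [rhs[1:] for rhs in productions
                 if rhs and rhs[0] == nonterminal]
    others = [rhs for rhs in productions
              if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}      # nothing to do
    tail = nonterminal + "'"
    return {
        nonterminal: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],  # [] is epsilon
    }
```

Calling it on the rule pair expr → expr + term | term returns the two rules expr → term expr' and expr' → + term expr' | ε.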

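27 27 Solution 3  Remove left recursion from the unambiguous expression grammar
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
Gets transformed into
<expr> → <term> <expr'>
<expr'> → + <term> <expr'> | ε
<term> → <factor> <term'>
<term'> → * <factor> <term'> | ε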
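With left recursion removed, the grammar can be parsed by recursive descent. The Python sketch below assumes single-token identifiers and a factor rule of the form ( expr ) | id; it returns nested tuples as a simple tree. Passing the left operand into the primed procedures is a design choice made here so that + and * stay left associative.

```python
def parse_expr(tokens):
    """Parse tokens with the transformed grammar; return a tuple tree."""
    tokens = list(tokens) + ["$"]        # "$" marks end of input
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                          # expr -> term expr'
        return expr_tail(term())

    def expr_tail(left):                 # expr' -> + term expr' | epsilon
        if peek() == "+":
            advance()
            return expr_tail(("+", left, term()))
        return left

    def term():                          # term -> factor term'
        return term_tail(factor())

    def term_tail(left):                 # term' -> * factor term' | epsilon
        if peek() == "*":
            advance()
            return term_tail(("*", left, factor()))
        return left

    def factor():                        # factor -> ( expr ) | id
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "*", ")", "$"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    tree = expr()
    if peek() != "$":
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

For the tokens of B+C*A this yields `("+", "B", ("*", "C", "A"))`, so * binds tighter than +, as in the unambiguous grammar.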

28 28 EBNF List Grammar  Zero or more repetitions: { }  Optional: [ ]
<list> → ( ) | ( <sequence> )
<sequence> → <listelement> { , <listelement> }
<listelement> → <list> | NUMBER

29 29 Recursive Descent EBNF Parser
list() {
    match('(');
    if token ≠ ')' then
        elt();
        while token = ',' do    /* { ',' elt } */
            match(',');
            elt();
        enddo;
    endif;
    match(')');
}

30 30 Parser and Scanner Generators  Tools exist (e.g. yacc/bison [1] for C/C++, PLY for Python, CUP for Java) to automatically construct a parser from a restricted set of context free grammars (LALR(1) grammars for yacc/bison and the derivatives CUP and PLY)  These tools use table driven bottom up parsing techniques (commonly shift/reduce parsing)  Similar tools (e.g. lex/flex for C/C++, JFlex for Java) exist, based on the theory of finite automata, to automatically construct scanners from regular expressions  [1] bison is the GNU version of yacc

31 31 Yacc (bison) Example
%{
#include <stdio.h>            /* printf */
int yylex(void);              /* declarations expected by newer C compilers */
int yyerror(const char *);
%}
%token NUMBER /* needed to communicate with scanner */
%%
list: '(' sequence ')' { printf("L -> ( seq )\n"); }
    | '(' ')' { printf("L -> () \n "); }
    ;
sequence: listelement ',' sequence { printf("seq -> LE,seq\n"); }
    | listelement { printf("seq -> LE\n"); }
    ;
listelement: NUMBER { printf("LE -> %d\n",$1); }
    | list { printf("LE -> L\n"); }
    ;
%%
/* since no code here, default main constructed that simply calls parser. */

32 32 Lex (flex) Example
%{
#include <stdlib.h>   /* atoi */
#include "list.tab.h"
extern int yylval;
%}
%%
[0-9]+   { yylval = atoi(yytext); return NUMBER; }
[ \t\n]  ;
"("      return yytext[0];
")"      return yytext[0];
","      return yytext[0];
"$"      return 0;
%%

33 33 Building bison/flex Parser  Tools available on tux  You can download them for free  Available as part of many Linux distributions (if not installed, get the appropriate package)  Can be used through Cygwin under Windows  Build instructions
    bison -d list.y                    => list.tab.c and list.tab.h
    flex list.l                        => lex.yy.c
    gcc list.tab.c lex.yy.c -ly -lfl   => a.out or a.exe

34 34 Executing Parser  The program expects the user to enter a string followed by ctrl-D, indicating end of file, or to redirect input from a file.
E.g. with valid input:
    $ ./a.exe
    (1,2,3)
    LE -> 1
    LE -> 2
    LE -> 3
    seq -> LE
    seq -> LE,seq
    seq -> LE,seq
    L -> ( seq )
E.g. input with a syntax error:
    $ ./a.exe
    (1,2,3(
    LE -> 1
    LE -> 2
    LE -> 3
    seq -> LE
    seq -> LE,seq
    seq -> LE,seq
    syntax error

35 35 Recursive Descent Reader
List list() {
    L = NULL;
    match('(');
    if token ≠ ')' then
        L = seq();
    endif;
    match(')');
    return L;
}

36 36 Recursive Descent Reader
List seq() {
    x = elt();
    if token = ',' then
        match(',');
        M = seq();
        L = Comp(x,M);
    else
        L = Comp(x,NULL);
    endif;
    return L;
}

37 37 Recursive Descent Reader
Element elt() {
    if token = '(' then
        x = list();
    else
        x = token.val;     /* read the value before match() advances */
        match(NUMBER);
    endif;
    return x;
}

38 38 Attribute Grammars  Associate attributes with symbols  Associate attribute computation rules with productions  Fill in values as input parsed (decorate parse tree)  Synthesized vs. inherited attributes

39 39 Example Attribute Grammar
<list> → ( ) | ( <sequence> )
    list.val = NULL
    list.val = sequence.val
<sequence> → <listelement> , <sequence> | <listelement>
    seq0.val = Comp(listelement.val,seq1.val)
    seq0.val = Comp(listelement.val,NULL)
<listelement> → <list> | NUMBER
    listelement.val = list.val
    listelement.val = NUMBER.val
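Since every attribute here is synthesized, the rules can be evaluated during a recursive descent parse by having each procedure return its `val`. A Python sketch, assuming NUMBER tokens are given as Python ints, lists are Python lists, and NULL is the empty list; the function names are introduced here for illustration.

```python
def comp(x, lst):
    """Comp from the slides: insert x at the front of lst."""
    return [x] + lst

def evaluate(tokens):
    """Return list.val for the token sequence."""
    tokens = list(tokens) + ["$"]            # "$" marks end of input
    pos = 0

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def list_val():
        match("(")
        val = []                             # list.val = NULL
        if tokens[pos] != ")":
            val = sequence_val()             # list.val = sequence.val
        match(")")
        return val

    def sequence_val():
        elt = listelement_val()
        if tokens[pos] == ",":
            match(",")
            return comp(elt, sequence_val()) # seq0.val = Comp(le.val, seq1.val)
        return comp(elt, [])                 # seq0.val = Comp(le.val, NULL)

    def listelement_val():
        nonlocal pos
        if tokens[pos] == "(":
            return list_val()                # listelement.val = list.val
        if not isinstance(tokens[pos], int):
            raise SyntaxError(f"expected NUMBER, found {tokens[pos]!r}")
        val = tokens[pos]                    # listelement.val = NUMBER.val
        pos += 1
        return val

    val = list_val()
    match("$")
    return val
```

The return values play the role of the decorations on the parse tree: each call returns the attribute of the node it recognized.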

40 40 Decorated Parse Tree [Figure: parse tree for (1,2,3) decorated with attribute values: Val = 1, Val = 2, and Val = 3 at the elements, and Val = (3), Val = (2,3), and Val = (1,2,3) higher in the tree]

41 41 Yacc Example with Attributes
/* This grammar is ambiguous and will cause shift/reduce conflicts */
%token NUMBER
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); }
    ;
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;
%%

42 42 Shift Reduce Parsing  Bottom up parsing  LR(1), LALR(1)  Conflicts & ambiguities
    |1+2*3
    1|+2*3                    [shift]
    <expr>|+2*3               [reduce]
    <expr>+|2*3               [shift]
    <expr>+2|*3               [shift]
    <expr>+<expr>|*3          [reduce]
    <expr>+<expr>|*3          [shift/reduce conflict]
    <expr>+<expr>*|3          [shift]
    <expr>+<expr>*3|          [shift]
    <expr>+<expr>*<expr>|     [reduce]
    <expr>+<expr>|            [reduce]
    <expr>                    [reduce & accept]
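The effect of resolving the conflict by precedence can be illustrated with a small stack-based evaluator in Python. This is a simplified operator-precedence sketch, not the table-driven LR algorithm that yacc generates: it has no parentheses, it skips the NUMBER-to-expression reductions, and the `PRECEDENCE` table and function name are assumptions made here. All operators are treated as left associative, and "/" is integer division.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def shift_reduce_eval(tokens):
    """Evaluate int/operator tokens; return (value, list of actions)."""
    stack, actions = [], []

    def can_reduce():
        # Top of stack looks like: int operator int
        return (len(stack) >= 3 and isinstance(stack[-1], int)
                and stack[-2] in OPS and isinstance(stack[-3], int))

    for tok in list(tokens) + ["$"]:         # "$" marks end of input
        if isinstance(tok, int):
            stack.append(tok)
            actions.append("shift")
            continue
        if tok != "$" and tok not in PRECEDENCE:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Reduce while the operator on the stack binds at least as tightly
        # as the incoming one; otherwise fall through and shift.
        while can_reduce() and (tok == "$" or
                                PRECEDENCE[stack[-2]] >= PRECEDENCE[tok]):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(OPS[op](left, right))
            actions.append("reduce")
        if tok != "$":
            stack.append(tok)
            actions.append("shift")

    if len(stack) != 1 or not isinstance(stack[0], int):
        raise SyntaxError("malformed expression")
    return stack[0], actions
```

On 1+2*3 the evaluator shifts the * rather than reducing 1+2, giving 7; on 1*2+3 it reduces 1*2 before shifting the +, giving 5.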

43 43 Yacc Example (precedence rules)
/* precedence rules added to resolve conflicts and remove ambiguity */
%token NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); }
    ;
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '-' expression %prec UMINUS { $$ = -$2; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;

44 44 Exercise 4  Show that the following grammar is ambiguous.
<stmt> → <ifstmt> | <bs>
<ifstmt> → IF <exp> THEN <stmt> | IF <exp> THEN <stmt> ELSE <stmt>
This is called the “dangling else” problem. See if.y for a yacc/bison version of this grammar, in which <exp> and <bs> are replaced by the tokens EXP and BS:
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN stmt { printf("ifstmt -> IF EXP THEN stmt\n"); }
    | IF EXP THEN stmt ELSE stmt { printf("ifstmt -> IF EXP THEN stmt ELSE stmt\n"); }
    ;

45 45 First Parse Tree

46 46 Second Parse Tree

47 47 Shift/Reduce Conflict

48 48 Output from bison $ bison -d if.y if.y: conflicts: 1 shift/reduce

49 49 Exercise 5  Can you use yacc's precedence rules to remove the ambiguity?

50 50 Solution 5  Convention is to associate the ELSE clause with the nearest IF statement.  Force ELSE to have higher precedence than THEN.  This removes the shift/reduce conflict and forces yacc to shift on the previous example.
%token IF THEN ELSE EXP BS
%nonassoc THEN
%nonassoc ELSE
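The nearest-IF convention is also what a hand-written recursive descent parser does naturally: after parsing the THEN branch it takes an ELSE if one is next, which corresponds to choosing shift over reduce. A Python sketch over the token names IF, EXP, THEN, ELSE, and BS; the tuple tree format and function name are assumptions made for illustration.

```python
def parse_stmt(tokens):
    """Parse stmt -> IF EXP THEN stmt [ELSE stmt] | BS; return a tuple tree."""
    tokens = list(tokens) + ["$"]            # "$" marks end of input
    pos = 0

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected}, found {tokens[pos]}")
        pos += 1

    def stmt():
        if tokens[pos] == "IF":
            match("IF")
            match("EXP")
            match("THEN")
            then_part = stmt()
            if tokens[pos] == "ELSE":        # greedy: bind ELSE to this IF
                match("ELSE")
                return ("if", then_part, stmt())
            return ("if", then_part)
        match("BS")
        return "BS"

    tree = stmt()
    match("$")
    return tree
```

For IF EXP THEN IF EXP THEN BS ELSE BS the result is `("if", ("if", "BS", "BS"))`: the ELSE belongs to the inner IF and the outer IF has no ELSE branch.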

51 51 Shift/Reduce Conflict Removed

52 52 Exercise 6  Can you come up with an unambiguous grammar for if statements that always associates the else with the closest if?

53 53 Solution 6  Separate if statements into matched (with ELSE clause and recursively matched stmts) and unmatched  This forces the matched if statement to the end
stmt: matched { printf("stmt -> matched \n "); }
    | unmatched { printf("stmt -> unmatched \n "); }
    ;
matched: BS { printf("matched -> BS \n"); }
    | IF EXP THEN matched ELSE matched { printf("matched -> IF EXP THEN matched ELSE matched \n"); }
    ;
unmatched: IF EXP THEN stmt { printf("unmatched -> IF EXP THEN stmt \n"); }
    | IF EXP THEN matched ELSE unmatched { printf("unmatched -> IF EXP THEN matched ELSE unmatched \n"); }
    ;

54 54 Unambiguous Parse Tree

55 55 No Shift/Reduce Conflict

56 56 Exercise 7  Can you change the syntax for if statements to remove the ambiguity? Hint: try to use syntax to denote the beginning and end of the statements in the if statement.

57 57 Solution 7  This is the best solution since the matching of IF statement and ELSE clause is visually clear. You do not have to remember unnatural precedence rules.  Such a language choice helps prevent logic bugs
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt} \n"); }
    | IF EXP THEN '{' stmt '}' ELSE '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt } ELSE { stmt }\n"); }
    ;

