1 Programming Languages (CS 550) Lecture 1 Summary Grammars and Parsing Jeremy R. Johnson.

2 Theme  Context free grammars provide a nice formalism for describing syntax of programming languages. Moreover, there is a mechanism for automatically constructing a parser (a recognizer of valid strings in the grammar) from context free grammars (typically a few additional restrictions are enforced to make it easier to construct the parser and the parser more efficient). In this lecture we review grammars as a means of describing syntax and show how, either by hand or using automated tools such as bison, to construct a parser from the grammar.

3 Outline  Motivating Example  Regular Expressions and Scanning  Context Free Grammars  Derivations and Parse Trees  Ambiguous Grammars  Parsing  Recursive Descent Parsing  Shift Reduce Parsing  Parser Generators  Syntax Directed Translation and Attribute Grammars

4 Motivating Example  Write a function, L = ReadList(), that reads an arbitrary order list and constructs a recursive data structure L to represent it  (a1,…,an), ai an integer or recursively a list  Assume the input is a stream of tokens - e.g. ‘(‘, integer, ‘,’, ‘)’ and the variable Token contains the current token  Assume the functions  GetToken() – advance to the next token  Match(token) – if token = Token then GetToken() else error  M = Comp(e,L) – construct list M by inserting element e in the front of L. E.g. Comp(1,(2,3)) = (1,2,3)  M = Reverse(L) – M = the reverse of the list L.

5 List Grammar
<list> → ( <sequence> ) | ( )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER

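6 Derivation and Parse Tree
<list> → ( <sequence> )
→ ( <listelement> , <sequence> )
→ ( NUMBER , <sequence> ) = (1, <sequence> )
→ (1, <listelement> , <sequence> )
→ (1, NUMBER , <sequence> ) = (1, 2, <sequence> )
→ (1, 2, <listelement> )
→ (1, 2, NUMBER) = (1,2,3)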

7 Derivation and Parse Tree  [Figure: parse tree for the list (1,2,3)]
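The derivation above can be replayed mechanically. The following is a minimal Python sketch, assuming the non-terminals are named `<list>`, `<sequence>` and `<listelement>` as in the yacc version of the grammar; the function name `leftmost_derivation` and the encoding of each step as an index into the list of alternatives are illustrative choices, not part of the lecture.

```python
# List grammar: each non-terminal maps to its alternatives, in the order
# they appear in the grammar.
GRAMMAR = {
    "<list>": [["(", "<sequence>", ")"], ["(", ")"]],
    "<sequence>": [["<listelement>", ",", "<sequence>"], ["<listelement>"]],
    "<listelement>": [["<list>"], ["NUMBER"]],
}


def leftmost_derivation(choices, start="<list>"):
    """Apply one production per choice to the leftmost non-terminal.

    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    # The production choices that derive a three-element list such as (1,2,3).
    for step in leftmost_derivation([0, 0, 1, 0, 1, 1, 1]):
        print(step)
```

Each printed line is one sentential form; the last one contains only terminals.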

8 Parsing and Scanning  Recognizing valid programming language syntax is split into two stages  scanning - group input character stream into tokens  parsing – group tokens into programming language structures  Tokens are described by regular expressions  Programming language structures by context free grammars  Separating into parsing and scanning simplifies both the description and recognition and makes maintenance easier
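As an illustration of the scanning stage, here is a small Python sketch that groups characters into tokens for the list language. The token representation (a `(kind, value)` pair) and the name `scan` are assumptions made for this example; a generated scanner such as one built with lex/flex would be organised differently.

```python
import re

# One token is either an integer or one of the punctuation characters ( ) ,
# Leading whitespace before a token is skipped.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([(),]))")


def scan(text):
    """Group the input characters into a list of (kind, value) tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.group(1) is not None:
            tokens.append(("NUMBER", int(m.group(1))))
        else:
            tokens.append((m.group(2), m.group(2)))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(scan("(1, 23)"))
```

The parser then works on this token list and never looks at individual characters, which is the separation of concerns described above.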

9 Regular Expressions  Alphabet = Σ  A language over Σ is a subset of the strings in Σ*  Regular expressions describe certain types of languages  ∅ is a regular expression  ε, denoting {ε}, is a regular expression  For each a in Σ, a, denoting {a}, is a regular expression  If r and s are regular expressions denoting languages R and S respectively, then (r + s), (rs), and (r*) are regular expressions  E.g. 00, (0+1)*, (0+1)*00(0+1)*, 00*11*22*, (1+10)*
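The example expressions can be tried out with Python's `re` module. In this sketch the only translation needed is that union, written `+` on the slide, is written `|` in Python; the helper name `in_language` is made up for the example.

```python
import re

# The slide's examples, with '+' (union) rewritten as '|' for Python's re module.
EXAMPLES = {
    "00": r"00",
    "(0+1)*": r"(0|1)*",
    "(0+1)*00(0+1)*": r"(0|1)*00(0|1)*",
    "00*11*22*": r"00*11*22*",
    "(1+10)*": r"(1|10)*",
}


def in_language(slide_regex, s):
    """Return True if the whole string s is in the language of slide_regex."""
    return re.fullmatch(EXAMPLES[slide_regex], s) is not None


if __name__ == "__main__":
    # Strings over {0,1} that contain two consecutive zeros.
    print(in_language("(0+1)*00(0+1)*", "10011"))
    print(in_language("(0+1)*00(0+1)*", "0101"))
```

`re.fullmatch` is used rather than `re.search` because membership in the language requires the entire string to match.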

10 Grammar  Non-terminal symbols  Terminal symbols  Start symbol  Productions (rules)  Context-Free Grammars (rule can not depend on context)  Regular grammar

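11 Example
<if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>
<ident_list> → identifier | identifier, <ident_list>
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>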

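12 Expression Grammars
<assign> → <id> = <expr>
<id> → A | B | C
First grammar: <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
Second grammar: <expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>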

13 Exercise 1  Show a derivation and corresponding parse tree, using the first expression grammar, for the string  A = B*(A+C)  Show that the second expression grammar is ambiguous by showing two distinct parse trees for the string  A = B+C*A

14 Parse Tree  [Figure: parse tree for A = B * (A + C)]

15 Ambiguous Grammar  [Figure: two distinct parse trees for A = B + C * A]
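Ambiguity can also be checked by counting parse trees. The sketch below assumes the ambiguous expression grammar has the form E → E + E | E * E | ( E ) | id (the symbol names are assumptions) and counts how many distinct parse trees derive a given token sequence. A count greater than one means the string has more than one parse tree.

```python
from functools import lru_cache

OPERATORS = {"+", "*"}
PUNCTUATION = {"+", "*", "(", ")"}


def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from E.
        n = 0
        if j - i == 1 and tokens[i] not in PUNCTUATION:
            n += 1                                   # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:               # E -> E op E, split at k
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("B + C * A".split()))
    print(count_parse_trees("B * ( A + C )".split()))
```

For `B + C * A` the two trees correspond to grouping the string as (B + C) * A and as B + (C * A).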

16 Unambiguous Expression Grammar
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

17 Exercise 2  Show the derivation and parse tree using the unambiguous expression grammar for  A = B+C*A  Convince yourself that this grammar is unambiguous (ideally give a proof)

18 Recursive Descent Parser
list() {
  match('(');
  if token ≠ ')' then
    seq();
  endif;
  match(')');
}

19 Recursive Descent Parser
seq() {
  elt();
  if token = ',' then
    match(',');
    seq();
  endif;
}

20 Recursive Descent Parser
elt() {
  if token = '(' then
    list();
  else
    match(NUMBER);
  endif;
}
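The three procedures list(), seq() and elt() translate almost line for line into a runnable recognizer. This Python sketch assumes the tokens arrive as a list in which punctuation is a string and each NUMBER is a Python int; the wrapper name `recognize` and the `$` end marker are choices made for the example.

```python
def recognize(tokens):
    """Return True if tokens form a valid list, e.g. ['(', 1, ',', 2, ')']."""
    toks = list(tokens) + ["$"]          # "$" marks the end of the input
    pos = 0

    def kind():
        # Integers are NUMBER tokens; everything else is its own kind.
        return "NUMBER" if isinstance(toks[pos], int) else toks[pos]

    def match(expected):
        nonlocal pos
        if kind() != expected:
            raise SyntaxError(f"expected {expected}, found {toks[pos]!r}")
        pos += 1

    def list_():
        match("(")
        if kind() != ")":
            seq()
        match(")")

    def seq():
        elt()
        if kind() == ",":
            match(",")
            seq()

    def elt():
        if kind() == "(":
            list_()
        else:
            match("NUMBER")

    try:
        list_()
        match("$")                       # the whole input must be consumed
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(recognize(["(", 1, ",", 2, ",", 3, ")"]))
    print(recognize(["(", 1, ",", 2, ",", 3, "("]))
```

There is one function per non-terminal, and each function chooses its production by looking only at the current token.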

21 Parser and Scanner Generators  Tools exist (e.g. yacc/bison for C/C++, PLY for Python, CUP for Java) to automatically construct a parser from a restricted set of context free grammars (LALR(1) grammars for yacc/bison and the derivatives CUP and PLY); bison is the GNU version of yacc  These tools use table driven bottom up parsing techniques (commonly shift/reduce parsing)  Similar tools (e.g. lex/flex for C/C++, JFlex for Java) exist, based on the theory of finite automata, to automatically construct scanners from regular expressions
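To give a feel for shift/reduce parsing, here is a hand-written Python sketch for the list grammar. It is not the table-driven LALR(1) algorithm that yacc/bison generate: it picks each reduction by inspecting the top of the stack and one lookahead token, which happens to be enough for this small grammar. The symbol names `L`, `seq` and `LE` abbreviate list, sequence and listelement.

```python
def shift_reduce_trace(tokens):
    """Parse tokens bottom-up and return the reductions in the order performed."""
    toks = ["NUMBER" if isinstance(t, int) else t for t in tokens] + ["$"]
    stack, trace, pos = [], [], 0
    while True:
        look = toks[pos]                         # one token of lookahead
        if stack[-1:] == ["NUMBER"]:
            stack[-1:] = ["LE"]
            trace.append("LE -> NUMBER")
        elif stack == ["L"] and look == "$":
            return trace                         # accept
        elif stack[-1:] == ["L"]:
            stack[-1:] = ["LE"]
            trace.append("LE -> L")
        elif stack[-3:] == ["LE", ",", "seq"]:
            stack[-3:] = ["seq"]
            trace.append("seq -> LE,seq")
        elif stack[-1:] == ["LE"] and look != ",":
            stack[-1:] = ["seq"]
            trace.append("seq -> LE")
        elif stack[-3:] == ["(", "seq", ")"]:
            stack[-3:] = ["L"]
            trace.append("L -> ( seq )")
        elif stack[-2:] == ["(", ")"]:
            stack[-2:] = ["L"]
            trace.append("L -> ()")
        elif look == "$":
            raise SyntaxError("syntax error")    # nothing to reduce or shift
        else:
            stack.append(look)                   # shift
            pos += 1


if __name__ == "__main__":
    for reduction in shift_reduce_trace(["(", 1, ",", 2, ",", 3, ")"]):
        print(reduction)
```

Because the grammar is right recursive, all the elements are shifted onto the stack before the first `seq` reduction happens.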

22 Yacc (bison) Example
%{
#include <stdio.h>   /* for printf */
%}
%token NUMBER /* needed to communicate with scanner */
%%
list: '(' sequence ')' { printf("L -> ( seq )\n"); }
    | '(' ')' { printf("L -> () \n "); }
    ;
sequence: listelement ',' sequence { printf("seq -> LE,seq\n"); }
    | listelement { printf("seq -> LE\n"); }
    ;
listelement: NUMBER { printf("LE -> %d\n",$1); }
    | list { printf("LE -> L\n"); }
    ;
%%
/* since no code here, default main constructed that simply calls parser. */

23 Lex (flex) Example
%{
#include <stdlib.h>   /* for atoi */
#include "list.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
[ \t\n] ;
"(" return yytext[0];
")" return yytext[0];
"," return yytext[0];
"$" return 0;
%%

24 Building bison/flex Parser  Tools available on tux  You can download them for free  Available as part of many Linux distributions (if not installed get the appropriate package)  Can be used through Cygwin under Windows  Build instructions  bison -d paren.y => paren.tab.c and paren.tab.h  flex paren.l => lex.yy.c  gcc paren.tab.c lex.yy.c -ly -lfl => a.out or a.exe

25 Executing Parser  The program expects the user to enter a string followed by ctrl-D indicating end of file, or to redirect input from a file.
E.g. with valid input
$ ./a.exe
(1,2,3)
LE -> 1
LE -> 2
LE -> 3
seq -> LE
seq -> LE,seq
seq -> LE,seq
L -> ( seq )
E.g. input with a syntax error
$ ./a.exe
(1,2,3(
LE -> 1
LE -> 2
LE -> 3
seq -> LE
seq -> LE,seq
seq -> LE,seq
syntax error

26 Recursive Descent Reader
List list() {
  L = NULL;
  match('(');
  if token ≠ ')' then
    L = seq();
  endif;
  match(')');
  return L;
}

27 Recursive Descent Reader
List seq() {
  x = elt();
  if token = ',' then
    match(',');
    M = seq();
    L = Comp(x,M);
  else
    L = Comp(x,NULL);
  endif;
  return L;
}

28 Recursive Descent Reader
Element elt() {
  if token = '(' then
    x = list();
  else
    match(NUMBER);
    x = NUMBER.val;
  endif;
  return x;
}
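Putting the three reader procedures together gives a working ReadList. The Python sketch below assumes tokens are given as a list with ints for NUMBER tokens, represents the empty list NULL as `[]`, and implements Comp by list concatenation; `read_list` and `comp` are names chosen for the example.

```python
def comp(e, lst):
    """Comp(e, L): construct a new list with element e in front of L."""
    return [e] + lst


def read_list(tokens):
    """Build a nested Python list from tokens such as ['(', 1, ',', 2, ')']."""
    toks = list(tokens) + ["$"]          # "$" marks the end of the input
    pos = 0

    def kind():
        return "NUMBER" if isinstance(toks[pos], int) else toks[pos]

    def match(expected):
        nonlocal pos
        if kind() != expected:
            raise SyntaxError(f"expected {expected}, found {toks[pos]!r}")
        tok = toks[pos]
        pos += 1
        return tok                       # the matched token carries its value

    def list_():
        result = []                      # NULL, the empty list
        match("(")
        if kind() != ")":
            result = seq()
        match(")")
        return result

    def seq():
        x = elt()
        if kind() == ",":
            match(",")
            return comp(x, seq())
        return comp(x, [])

    def elt():
        if kind() == "(":
            return list_()
        return match("NUMBER")

    value = list_()
    match("$")
    return value


if __name__ == "__main__":
    print(read_list(["(", 1, ",", "(", 2, ",", 3, ")", ",", 4, ")"]))
```

Because seq() is right recursive, the list is built back to front: the recursive call returns the tail and Comp puts the current element in front of it, so no call to Reverse is needed.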

29 Attribute Grammars  Associate attributes with symbols  Associate attribute computation rules with productions  Fill in values as input parsed (decorate parse tree)  Synthesized vs. inherited attributes

30 Example Attribute Grammar
<list> → ( <sequence> )    list.val = sequence.val
<list> → ( )    list.val = NULL
<sequence> → <listelement> , <sequence>    seq0.val = Comp(listelement.val,seq1.val)
<sequence> → <listelement>    seq0.val = Comp(listelement.val,NULL)
<listelement> → <list>    listelement.val = list.val
<listelement> → NUMBER    listelement.val = NUMBER.val

31 Decorated Parse Tree  [Figure: parse tree for (1,2,3) decorated with val attributes, from Val = 1, Val = 2, Val = 3 at the leaves through Val = (3) and Val = (2,3) up to Val = (1,2,3) at the root]
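Since val is a synthesized attribute, it can be computed by a bottom-up walk over the parse tree. In this Python sketch a parse tree node is a tuple whose first entry is the symbol name and whose remaining entries are its children; that encoding, and the function name `val`, are assumptions made for the example.

```python
def val(node):
    """Compute the synthesized attribute val of a parse-tree node."""
    label, children = node[0], node[1:]
    if label == "NUMBER":
        return children[0]                       # NUMBER.val comes from the scanner
    if label == "list":
        if len(children) == 2:                   # list -> ( )
            return []
        return val(children[1])                  # list -> ( sequence )
    if label == "sequence":
        head = val(children[0])
        if len(children) == 3:                   # sequence -> listelement , sequence
            return [head] + val(children[2])
        return [head]                            # sequence -> listelement
    if label == "listelement":
        return val(children[0])                  # listelement -> list | NUMBER
    raise ValueError(f"unknown node label {label!r}")


# Parse tree for (1,2): terminals are plain strings, non-terminals are tuples.
TREE = (
    "list", "(",
    ("sequence",
     ("listelement", ("NUMBER", 1)), ",",
     ("sequence", ("listelement", ("NUMBER", 2)))),
    ")",
)

if __name__ == "__main__":
    print(val(TREE))
```

Each branch of the function corresponds to one production and its attribute rule; a parser generator runs the same rules as actions while it reduces.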

32 Yacc Example with Attributes
/* This grammar is ambiguous and will cause shift/reduce conflicts */
%token NUMBER
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); };
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;
%%

34 Yacc Example (precedence rules)
/* precedence rules added to resolve conflicts and remove ambiguity */
%token NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); };
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '-' expression %prec UMINUS { $$ = -$2; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;
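The effect of the precedence declarations can be imitated by hand with precedence climbing, a different technique from the LALR tables yacc builds but one that yields the same grouping for this grammar. In the Python sketch below the numeric levels 1, 2 and 3 stand for the three declaration lines, lowest first; all names are illustrative.

```python
import re

# Binding strength, lowest first, in the order of the yacc declarations:
# %left '-' '+'   then   %left '*' '/'   then   %nonassoc UMINUS
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
UMINUS = 3


def c_div(a, b):
    """Integer division truncating toward zero, like C's '/' on ints."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def evaluate(text):
    # Characters that are not digits, operators or parentheses are ignored.
    tokens = re.findall(r"\d+|[-+*/()]", text) + ["$"]
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "-":                           # unary minus binds tightest
            return -expression(UMINUS)
        if tok == "(":
            value = expression(1)
            if tokens[pos] != ")":
                raise SyntaxError("expected )")
            pos += 1
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while tokens[pos] in PRECEDENCE and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Requiring a strictly higher level on the right makes op left associative.
            right = expression(PRECEDENCE[op] + 1)
            if op == "+":
                left = left + right
            elif op == "-":
                left = left - right
            elif op == "*":
                left = left * right
            else:
                left = c_div(left, right)
        return left

    value = expression(1)
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2+3*4"), evaluate("2-3-4"), evaluate("-2*3"))
```

Left associativity shows up in `2-3-4`, which groups as (2-3)-4, and the higher level of `*` shows up in `2+3*4`, which groups as 2+(3*4).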

35 Exercise 3  Removing left recursion  Rules of the form <S> → <S> α [left recursive] cause an infinite loop in a recursive descent parser  Left recursion can be systematically removed:
<A> → <A> α | β
becomes
<A> → β <A'>
<A'> → α <A'> | ε
Remove left recursion from the unambiguous expression grammar
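The transformation for immediate left recursion is mechanical enough to code directly. This Python sketch represents each production as a list of symbols and ε as the empty list; the function name and the convention of naming the new non-terminal by appending a prime are assumptions. The demonstration uses the statement_list rule from the yacc example, with `NEWLINE` as a stand-in name for the '\n' token, so the exercise itself is left open.

```python
def remove_left_recursion(nonterminal, productions):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | ε.

    productions is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping each resulting non-terminal to its productions.
    """
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}        # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [p + [new] for p in others],
        new: [p + [new] for p in recursive] + [[]],   # [] stands for ε
    }


if __name__ == "__main__":
    rules = [
        ["statement", "NEWLINE"],
        ["statement_list", "statement", "NEWLINE"],
    ]
    for lhs, rhss in remove_left_recursion("statement_list", rules).items():
        for rhs in rhss:
            print(lhs, "->", " ".join(rhs) if rhs else "ε")
```

The rewritten grammar generates the same strings, but every production now begins with a symbol other than the non-terminal being expanded, so a recursive descent parser consumes input before it recurses.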

36 Exercise 4  Show that the following grammar is ambiguous.
<stmt> → <ifstmt> | <bs>
<ifstmt> → IF <exp> THEN <stmt> | IF <exp> THEN <stmt> ELSE <stmt>
This is called the “dangling else” problem  See if.y for a yacc/bison version of this grammar, in which <exp> and <bs> are replaced by the tokens EXP and BS
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN stmt { printf("ifstmt -> IF EXP THEN stmt\n"); }
    | IF EXP THEN stmt ELSE stmt { printf("ifstmt -> IF EXP THEN stmt ELSE stmt\n"); }
    ;
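One way to see the ambiguity concretely is to count the parse trees of a nested if statement. The Python sketch below works on the token-level grammar stmt → BS | IF EXP THEN stmt | IF EXP THEN stmt ELSE stmt, which is the yacc grammar with the ifstmt rule folded into stmt; the function name is made up for the example.

```python
from functools import lru_cache


def count_stmt_trees(tokens):
    """Count parse trees for stmt -> BS | IF EXP THEN stmt | IF EXP THEN stmt ELSE stmt."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] can be derived from stmt.
        if j - i == 1:
            return 1 if tokens[i] == "BS" else 0
        if tokens[i:i + 3] != ("IF", "EXP", "THEN"):
            return 0
        n = count(i + 3, j)                      # IF EXP THEN stmt
        for k in range(i + 4, j - 1):
            if tokens[k] == "ELSE":              # IF EXP THEN stmt ELSE stmt
                n += count(i + 3, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_stmt_trees("IF EXP THEN IF EXP THEN BS ELSE BS".split()))
```

In one tree the ELSE belongs to the inner IF and in the other it belongs to the outer IF.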

37 First Parse Tree

38 Second Parse Tree

39 Shift/Reduce Conflict

40 Output from bison
$ bison -d if.y
if.y: conflicts: 1 shift/reduce

41 Exercise 5  Can you use yacc's precedence rules to remove the ambiguity?

42 Solution 5  Convention is to associate the ELSE clause with the nearest IF statement.  Force ELSE to have higher precedence than THEN  This removes the shift/reduce conflict and forces yacc to shift on the previous example
%token IF THEN ELSE EXP BS
%nonassoc THEN
%nonassoc ELSE
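A recursive descent parser gets the nearest-IF convention by always consuming an ELSE as soon as it sees one, which plays the same role as the shift that the precedence declarations force. The Python sketch below returns the parse as nested tuples; the tuple layout and the name `parse_stmt` are choices made for the example.

```python
def parse_stmt(tokens):
    """Parse IF/THEN/ELSE tokens, attaching each ELSE to the nearest IF.

    Returns "BS" for a basic statement, ("if", then_part) for an IF without
    ELSE and ("if", then_part, else_part) for an IF with ELSE.
    """
    toks = list(tokens) + ["$"]          # "$" marks the end of the input
    pos = 0

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected}, found {toks[pos]}")
        pos += 1

    def stmt():
        if toks[pos] == "IF":
            match("IF")
            match("EXP")
            match("THEN")
            then_part = stmt()
            if toks[pos] == "ELSE":      # greedy: take the ELSE right away
                match("ELSE")
                return ("if", then_part, stmt())
            return ("if", then_part)
        match("BS")
        return "BS"

    tree = stmt()
    match("$")
    return tree


if __name__ == "__main__":
    print(parse_stmt("IF EXP THEN IF EXP THEN BS ELSE BS".split()))
```

In the result the inner tuple has three entries and the outer one has two, so the ELSE clause was attached to the inner IF.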

43 Shift/Reduce Conflict Removed

44 Exercise 6  Can you come up with an unambiguous grammar for if statements that always associates the else with the closest if?

45 Solution 6  Separate if statements into matched (with ELSE clause and recursively matched stmts) and unmatched  This forces the matched if statement to the end
stmt: matched { printf("stmt -> matched \n "); }
    | unmatched { printf("stmt -> unmatched \n "); }
    ;
matched: BS { printf("matched -> BS \n"); }
    | IF EXP THEN matched ELSE matched { printf("matched -> IF EXP THEN matched ELSE matched \n"); }
    ;
unmatched: IF EXP THEN stmt { printf("unmatched -> IF EXP THEN stmt \n"); }
    | IF EXP THEN matched ELSE unmatched { printf("unmatched -> IF EXP THEN matched ELSE unmatched \n"); }
    ;

46 Unambiguous Parse Tree

47 No Shift/Reduce Conflict

48 Exercise 7  Can you change the syntax for if statements to remove the ambiguity? Hint: try to use syntax to denote the beginning and end of the statements in the if statement.

49 Solution 7  This is the best solution since the match between an IF statement and its ELSE clause is visually clear. You do not have to remember unnatural precedence rules.  Such a language choice helps prevent logic bugs
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt} \n"); }
    | IF EXP THEN '{' stmt '}' ELSE '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt } ELSE { stmt }\n"); }
    ;
