Published by Poppy Adams. Modified over 9 years ago.
1
1 Programming Languages (CS 550)
Lecture 1 Summary: Grammars and Parsing
Jeremy R. Johnson
2
2 Theme
Context-free grammars provide a convenient formalism for describing the syntax of programming languages. Moreover, a parser (a recognizer of valid strings in the grammar) can be constructed automatically from a context-free grammar; typically a few additional restrictions are imposed to make the parser easier to construct and more efficient. In this lecture we review grammars as a means of describing syntax and show how to construct a parser from a grammar, either by hand or with automated tools such as bison.
3
3 Outline
Motivating Example
Regular Expressions and Scanning
Context Free Grammars
Derivations and Parse Trees
Ambiguous Grammars
Parsing
  Recursive Descent Parsing
  Shift Reduce Parsing
  Parser Generators
Syntax Directed Translation and Attribute Grammars
4
4 Motivating Example
Write a function, L = ReadList(), that reads an arbitrarily nested list and constructs a recursive data structure L to represent it.
  The list has the form (a1,…,an), where each ai is an integer or, recursively, a list.
Assume the input is a stream of tokens, e.g. '(', integer, ',', ')', and that the variable Token contains the current token.
Assume the following functions:
  GetToken(): advance to the next token
  Match(token): if token = Token then GetToken() else error
  M = Comp(e,L): construct list M by inserting element e at the front of L, e.g. Comp(1,(2,3)) = (1,2,3)
  M = Reverse(L): M is the reverse of the list L
5
5 List Grammar
<list> → ( <sequence> ) | ( )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER
6
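6 Derivation and Parse Tree
<list> → ( <sequence> )
       → ( <listelement>, <sequence> )
       → ( NUMBER, <sequence> ) = (1, <sequence> )
       → (1, <listelement>, <sequence> )
       → (1, NUMBER, <sequence> ) = (1, 2, <sequence> )
       → (1, 2, <listelement> )
       → (1, 2, NUMBER) = (1,2,3)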
7
7 Derivation and Parse Tree
[Figure: parse tree for the list (1, 2, 3)]
8
8 Parsing and Scanning
Recognizing valid programming language syntax is split into two stages:
  scanning: group the input character stream into tokens
  parsing: group tokens into programming language structures
Tokens are described by regular expressions.
Programming language structures are described by context-free grammars.
Separating scanning from parsing simplifies both description and recognition and makes maintenance easier.
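The scanning stage can be sketched in a few lines of Python. This is a minimal illustration for the list language only, assuming the tokens '(', ')', ',' and NUMBER; the names tokenize and TOKEN_RE are illustrative and do not come from the lecture.

```python
import re

# One alternative per token class; leading whitespace is matched and skipped.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([(),]))")

def tokenize(text):
    """Group a character stream into (kind, value) tokens."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.group(1) is not None:
            tokens.append(("NUMBER", int(m.group(1))))
        else:
            tokens.append((m.group(2), m.group(2)))
        pos = m.end()
    return tokens
```

A parser would then consume this token list instead of raw characters.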
9
9 Regular Expressions
Alphabet = Σ
A language over Σ is a subset of the strings in Σ*
Regular expressions describe certain types of languages
  ∅ is a regular expression
  ε = {ε} is a regular expression
  For each a in Σ, a, denoting {a}, is a regular expression
  If r and s are regular expressions denoting languages R and S respectively, then (r + s), (rs), and (r*) are regular expressions denoting R ∪ S, RS, and R* respectively
E.g. 00, (0+1)*, (0+1)*00(0+1)*, 00*11*22*, (1+10)*
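The example expressions can be tried out with Python's re module, which writes union as | rather than +. The sketch below is only a convenience for experimenting; the names EXAMPLES and in_language are illustrative.

```python
import re

# The slide's examples, with union "+" rewritten as "|" for Python's re module.
EXAMPLES = {
    "00": r"00",
    "(0+1)*": r"(0|1)*",
    "(0+1)*00(0+1)*": r"(0|1)*00(0|1)*",
    "00*11*22*": r"00*11*22*",
    "(1+10)*": r"(1|10)*",
}

def in_language(slide_regex, s):
    """Return True if the whole string s is in the language of the expression."""
    return re.fullmatch(EXAMPLES[slide_regex], s) is not None
```

For instance, (0+1)*00(0+1)* accepts exactly the binary strings containing two consecutive zeros, and (1+10)* rejects any string in which a 0 is not preceded by a 1.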
10
10 Grammar
Non-terminal symbols
Terminal symbols
Start symbol
Productions (rules)
Context-free grammars (a rule cannot depend on context)
Regular grammars
11
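11 Example
(The angle-bracketed nonterminals were lost from the slide text; the names below are reconstructed.)
<if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>
<ident_list> → identifier | identifier, <ident_list>
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>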
12
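12 Expression Grammars
(The angle-bracketed nonterminals were lost from the slide text; the names below are reconstructed.)
First grammar:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
Second grammar:
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>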
13
13 Exercise 1
Show a derivation and the corresponding parse tree, using the first expression grammar, for the string A = B*(A+C).
Show that the second expression grammar is ambiguous by giving two distinct parse trees for the string A = B+C*A.
14
14 Parse Tree
[Figure: parse tree for A = B * (A + C)]
15
15 Ambiguous Grammar
[Figure: two distinct parse trees for A = B + C * A, one with + at the root of the expression and one with * at the root]
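Ambiguity can be checked mechanically for short strings by enumerating every parse tree. The sketch below assumes the ambiguous grammar has the usual shape E → E + E | E * E | id (parentheses omitted for brevity) and that the input is a well-formed alternation of identifiers and operators; parse_trees is an illustrative name.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under E -> E + E | E * E | id."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees
```

For the tokens B + C * A this yields two trees, ('+', 'B', ('*', 'C', 'A')) and ('*', ('+', 'B', 'C'), 'A'), which is exactly what it means for the grammar to be ambiguous.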
16
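16 Unambiguous Expression Grammar
(The angle-bracketed nonterminals were lost from the slide text; the names below are reconstructed.)
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>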
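A layered grammar of this kind (expression, term, factor) gives * higher precedence than + and makes both operators left associative. The following sketch assumes that three-level shape and single-character tokens; the left-recursive rules are implemented as loops, and all function names are illustrative.

```python
def parse_expr(tokens):
    """expr -> expr + term | term, with the left recursion written as a loop."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = ("+", tree, right)
    return tree, rest

def parse_term(tokens):
    """term -> term * factor | factor."""
    tree, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        tree = ("*", tree, right)
    return tree, rest

def parse_factor(tokens):
    """factor -> ( expr ) | id, where id is A, B or C."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    if tokens[0] == "(":
        tree, rest = parse_expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return tree, rest[1:]
    if tokens[0] in ("A", "B", "C"):
        return tokens[0], tokens[1:]
    raise SyntaxError(f"unexpected token {tokens[0]!r}")
```

Calling parse_expr(list("B+C*A")) returns a single tree in which * is nested below +, in contrast to the two trees allowed by the ambiguous grammar.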
17
17 Exercise 2
Show the derivation and parse tree for A = B+C*A using the unambiguous expression grammar.
Convince yourself that this grammar is unambiguous (ideally give a proof).
18
18 Recursive Descent Parser
    list()
    {
        match('(');
        if token ≠ ')' then   /* non-empty list */
            seq();
        endif;
        match(')');
    }
19
19 Recursive Descent Parser
    seq()
    {
        elt();
        if token = ',' then
            match(',');
            seq();
        endif;
    }
20
20 Recursive Descent Parser
    elt()
    {
        if token = '(' then
            list();
        else
            match(NUMBER);
        endif;
    }
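The three procedures list(), seq() and elt() translate almost line for line into a runnable recognizer. This Python sketch uses a crude built-in scanner and an empty string as the end-of-input marker; both are assumptions made for the example, as is the name recognize.

```python
import re

def recognize(text):
    """Return True if text is a valid list such as "(1,(2,3))"."""
    # Crude scanner: numbers or single characters; "" marks end of input.
    tokens = re.findall(r"\d+|\S", text) + [""]
    pos = 0

    def match(expected):
        nonlocal pos
        tok = tokens[pos]
        ok = tok.isdigit() if expected == "NUMBER" else tok == expected
        if not ok:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1

    def list_():
        match("(")
        if tokens[pos] != ")":
            seq()
        match(")")

    def seq():
        elt()
        if tokens[pos] == ",":
            match(",")
            seq()

    def elt():
        if tokens[pos] == "(":
            list_()
        else:
            match("NUMBER")

    try:
        list_()
        match("")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False
```

Each function inspects only the current token to decide which production to apply, which is what makes the list grammar suitable for recursive descent.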
21
21 Parser and Scanner Generators
Tools exist (e.g. yacc/bison [1] for C/C++, PLY for Python, CUP for Java) to automatically construct a parser from a restricted class of context-free grammars (LALR(1) grammars for yacc/bison and the derivatives CUP and PLY).
These tools use table-driven bottom-up parsing techniques (commonly shift/reduce parsing).
Similar tools (e.g. lex/flex for C/C++, JFlex for Java), based on the theory of finite automata, automatically construct scanners from regular expressions.
[1] bison is the GNU version of yacc.
22
22 Yacc (bison) Example
    %{
    #include <stdio.h>   /* for printf in the actions */
    %}
    %token NUMBER /* needed to communicate with scanner */
    %%
    list: '(' sequence ')' { printf("L -> ( seq )\n"); }
        | '(' ')' { printf("L -> () \n "); }
        ;
    sequence: listelement ',' sequence { printf("seq -> LE,seq\n"); }
        | listelement { printf("seq -> LE\n"); }
        ;
    listelement: NUMBER { printf("LE -> %d\n",$1); }
        | list { printf("LE -> L\n"); }
        ;
    %%
    /* No code here: the default main from the yacc library (-ly) simply calls the parser. */
23
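23 Lex (flex) Example
    %{
    #include <stdlib.h>      /* for atoi */
    #include "list.tab.h"    /* token definitions generated by bison -d; the name must match the grammar file */
    extern int yylval;
    %}
    %%
    [0-9]+   { yylval = atoi(yytext); return NUMBER; }
    [ \t\n]  ;
    "("      return yytext[0];
    ")"      return yytext[0];
    ","      return yytext[0];
    "$"      return 0;
    %%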
24
24 Building the bison/flex Parser
Tools available on tux
  You can download them for free
  Available as part of many Linux distributions (if not installed, get the appropriate package)
  Can be used through Cygwin under Windows
Build instructions
    bison -d paren.y                    => paren.tab.c and paren.tab.h
    flex paren.l                        => lex.yy.c
    gcc paren.tab.c lex.yy.c -ly -lfl   => a.out or a.exe
25
25 Executing Parser
The program expects the user to enter a string followed by ctrl-D (end of file), or to redirect input from a file.
E.g. with valid input
    $ ./a.exe
    (1,2,3)
    LE -> 1
    LE -> 2
    LE -> 3
    seq -> LE
    seq -> LE,seq
    seq -> LE,seq
    L -> ( seq )
E.g. input with a syntax error
    $ ./a.exe
    (1,2,3(
    LE -> 1
    LE -> 2
    LE -> 3
    seq -> LE
    seq -> LE,seq
    seq -> LE,seq
    syntax error
26
26 Recursive Descent Reader
    List list()
    {
        match('(');
        L = NULL;             /* empty list unless a sequence follows */
        if token ≠ ')' then
            L = seq();
        endif;
        match(')');
        return L;
    }
27
27 Recursive Descent Reader
    List seq()
    {
        x = elt();
        if token = ',' then
            match(',');
            M = seq();
            L = Comp(x,M);
        else
            L = Comp(x,NULL);
        endif;
        return L;
    }
28
28 Recursive Descent Reader
    Element elt()
    {
        if token = '(' then
            x = list();
        else
            match(NUMBER);
            x = NUMBER.val;
        endif;
        return x;
    }
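The reader can be sketched in Python by having each procedure return the value it has built. Nested Python lists stand in for the recursive data structure, [] plays the role of NULL, and comp mirrors Comp(e,L); the scanner and the name read_list are assumptions made for this example.

```python
import re

def read_list(text):
    """Parse text such as "(1,(2,3))" into nested Python lists."""
    tokens = re.findall(r"\d+|\S", text) + [""]  # "" marks end of input
    pos = 0

    def match(expected):
        nonlocal pos
        tok = tokens[pos]
        ok = tok.isdigit() if expected == "NUMBER" else tok == expected
        if not ok:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1

    def comp(e, lst):
        return [e] + lst          # insert e at the front of lst

    def list_():
        match("(")
        result = []               # NULL: the empty list
        if tokens[pos] != ")":
            result = seq()
        match(")")
        return result

    def seq():
        x = elt()
        if tokens[pos] == ",":
            match(",")
            return comp(x, seq())
        return comp(x, [])

    def elt():
        if tokens[pos] == "(":
            return list_()
        tok = tokens[pos]
        match("NUMBER")
        return int(tok)

    result = list_()
    match("")                     # the whole input must be consumed
    return result
```

Because seq() is right recursive, the list is built back to front by comp, so no call to Reverse is needed in this formulation.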
29
29 Attribute Grammars
Associate attributes with symbols
Associate attribute computation rules with productions
Fill in values as the input is parsed (decorate the parse tree)
Synthesized vs. inherited attributes
30
30 Example Attribute Grammar
<list> → ( <sequence> )                      list.val = sequence.val
       | ( )                                 list.val = NULL
<sequence> → <listelement> , <sequence>      seq0.val = Comp(listelement.val,seq1.val)
           | <listelement>                   seq0.val = Comp(listelement.val,NULL)
<listelement> → <list>                       listelement.val = list.val
              | NUMBER                       listelement.val = NUMBER.val
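Since val is a synthesized attribute, it can be computed bottom-up over a finished parse tree. The sketch below assumes a parse tree encoded as nested tuples of the form (symbol, child, ...), with punctuation kept as plain strings; this encoding and the function name val are illustrative choices, not part of the lecture.

```python
def val(node):
    """Compute the synthesized attribute val for a parse-tree node."""
    symbol, *children = node
    if symbol == "NUMBER":
        return children[0]                     # NUMBER.val is set by the scanner
    if symbol == "listelement":
        return val(children[0])                # list.val or NUMBER.val
    if symbol == "sequence":
        head = val(children[0])
        # children are (listelement, ",", sequence) or just (listelement,)
        tail = val(children[2]) if len(children) == 3 else []
        return [head] + tail                   # Comp(listelement.val, ...)
    if symbol == "list":
        # children are ("(", sequence, ")") or ("(", ")")
        return val(children[1]) if len(children) == 3 else []
    raise ValueError(f"unknown symbol {symbol!r}")
```

For the tree of (1,2), the inner sequence node evaluates to [2], the outer one to [1, 2], and the root list inherits that value from its sequence child.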
31
31 Decorated Parse Tree
[Figure: parse tree for (1,2,3) decorated with attribute values, from Val = 1, Val = 2 and Val = 3 at the elements, through Val = (3) and Val = (2,3), up to Val = (1,2,3)]
32
32 Yacc Example with Attributes
    /* This grammar is ambiguous and will cause shift/reduce conflicts */
    %{
    #include <stdio.h>   /* for printf in the actions */
    %}
    %token NUMBER
    %%
    statement_list: statement '\n'
        | statement_list statement '\n'
        ;
    statement: expression { printf("= %d\n", $1); };
    expression: expression '+' expression { $$ = $1 + $3; }
        | expression '-' expression { $$ = $1 - $3; }
        | expression '*' expression { $$ = $1 * $3; }
        | expression '/' expression
            { if ($3 == 0) yyerror("division by zero");
              else $$ = $1 / $3; }
        | '(' expression ')' { $$ = $2; }
        | NUMBER { $$ = $1; }
        ;
    %%
33
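33 Shift Reduce Parsing
Bottom-up parsing
LR(1), LALR(1)
Conflicts and ambiguities
Trace for 1+2*3 (| separates the stack from the remaining input):
    |1+2*3
    1|+2*3                        [shift]
    <expression>|+2*3             [reduce]
    <expression>+|2*3             [shift]
    <expression>+2|*3             [shift]
    <expression>+<expression>|*3  [reduce]
    <expression>+<expression>|*3  [shift/reduce conflict]
    <expression>+<expression>*|3  [shift]
    <expression>+<expression>*3|  [shift]
    <expression>+<expression>*<expression>|  [reduce]
    <expression>+<expression>|    [reduce]
    <expression>|                 [reduce & accept]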
34
34 Yacc Example (precedence rules)
    /* precedence rules added to resolve conflicts and remove ambiguity */
    %{
    #include <stdio.h>   /* for printf in the actions */
    %}
    %token NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS
    %%
    statement_list: statement '\n'
        | statement_list statement '\n'
        ;
    statement: expression { printf("= %d\n", $1); };
    expression: expression '+' expression { $$ = $1 + $3; }
        | expression '-' expression { $$ = $1 - $3; }
        | expression '*' expression { $$ = $1 * $3; }
        | expression '/' expression
            { if ($3 == 0) yyerror("division by zero");
              else $$ = $1 / $3; }
        | '-' expression %prec UMINUS { $$ = -$2; }
        | '(' expression ')' { $$ = $2; }
        | NUMBER { $$ = $1; }
        ;
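The effect of the precedence declarations on the shift/reduce decision can be imitated with a small hand-written loop. This is an operator-precedence style sketch, not the LALR(1) tables bison generates; it handles only non-negative integers and the four binary operators, assumes well-formed input, and uses Python's floor division for '/'. The names PREC, OPS and evaluate are illustrative.

```python
import operator
import re

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # mirrors %left '-' '+' then %left '*' '/'
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def evaluate(text):
    """Evaluate text by shifting and reducing; return (value, trace)."""
    tokens = re.findall(r"\d+|[-+*/]", text) + ["$"]   # "$" marks end of input
    stack, trace = [], []
    for tok in tokens:
        if tok.isdigit():
            stack.append(int(tok))
            trace.append("shift")
            continue
        # Lookahead is an operator or "$". Reduce while the operator on the
        # stack binds at least as tightly (ties reduce: left associativity).
        while len(stack) >= 3 and (tok == "$" or PREC[stack[-2]] >= PREC[tok]):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(OPS[op](left, right))
            trace.append("reduce")
        if tok != "$":
            stack.append(tok)
            trace.append("shift")
    if len(stack) != 1:
        raise SyntaxError("malformed expression")
    return stack[0], trace
```

On 1+2*3 the loop shifts '*' rather than reducing 1+2, because '*' has higher precedence than the '+' on the stack; on 8-3-2 it reduces 8-3 first, because '-' is left associative.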
35
35 Exercise 3: Removing left recursion
Left-recursive rules such as <S> → <S> α cause an infinite loop in a recursive descent parser.
Left recursion can be removed systematically. The rule
    <S> → <S> α | β
is replaced by
    <S> → β <S'>
    <S'> → α <S'> | ε
Remove left recursion from the unambiguous expression grammar.
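The standard transformation for immediate left recursion can be written as a short function. In this sketch a production is a list of symbols, the empty list stands for ε, and the new nonterminal is named by appending a prime; the representation and the name remove_left_recursion are assumptions made for illustration.

```python
def remove_left_recursion(nonterminal, productions):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    productions is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping each nonterminal to its new right-hand sides.
    """
    new_nt = nonterminal + "'"
    # Split the alternatives into left-recursive ones (with A removed) and the rest.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    return {
        nonterminal: [rhs + [new_nt] for rhs in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is epsilon
    }
```

For example, S → S a | b becomes S → b S' and S' → a S' | ε, which generates the same strings (b followed by any number of a's) without ever calling S before consuming input.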
36
36 Exercise 4
Show that the following grammar is ambiguous. (The angle-bracketed nonterminals were lost from the slide text; the names below are reconstructed from the yacc version.)
    <stmt> → <ifstmt> | <bs>
    <ifstmt> → IF <exp> THEN <stmt>
             | IF <exp> THEN <stmt> ELSE <stmt>
This is called the "dangling else" problem.
See if.y for a yacc/bison version of this grammar, in which <exp> and <bs> are replaced by the tokens EXP and BS:
    stmt: ifstmt { printf("stmt -> ifstmt\n"); }
        | BS { printf("stmt -> BS\n"); }
        ;
    ifstmt: IF EXP THEN stmt
            { printf("ifstmt -> IF EXP THEN stmt\n"); }
        | IF EXP THEN stmt ELSE stmt
            { printf("ifstmt -> IF EXP THEN stmt ELSE stmt\n"); }
        ;
37
37 First Parse Tree
38
38 Second Parse Tree
39
39 Shift/Reduce Conflict
40
40 Output from bison
    $ bison -d if.y
    if.y: conflicts: 1 shift/reduce
41
41 Exercise 5 Can you use yacc's precedence rules to remove the ambiguity?
42
42 Solution 5
The convention is to associate the ELSE clause with the nearest IF statement.
Give ELSE higher precedence than THEN.
This resolves the shift/reduce conflict and forces yacc to shift on the previous example.
    %token IF THEN ELSE EXP BS
    %nonassoc THEN
    %nonassoc ELSE
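A hand-written recursive descent parser gets the same "nearest IF" behavior by always consuming an ELSE as soon as one is available, which corresponds to preferring shift over reduce. The sketch below works on a list of token names (IF, EXP, THEN, ELSE, BS) and returns nested tuples; the tree encoding and the name parse_stmt are illustrative.

```python
def parse_stmt(tokens, pos=0):
    """Parse stmt -> BS | IF EXP THEN stmt [ELSE stmt]; return (tree, next_pos)."""
    if pos < len(tokens) and tokens[pos] == "BS":
        return "BS", pos + 1
    if tokens[pos:pos + 3] == ["IF", "EXP", "THEN"]:
        then_part, pos = parse_stmt(tokens, pos + 3)
        # Prefer "shift": an ELSE seen here attaches to this, the nearest, IF.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("IF", then_part, else_part), pos
        return ("IF", then_part), pos
    raise SyntaxError(f"unexpected input at token {pos}")
```

For IF EXP THEN IF EXP THEN BS ELSE BS the inner call consumes the ELSE, so the result is an outer IF without an else branch wrapping an inner IF that has one.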
43
43 Shift/Reduce Conflict Removed
44
44 Exercise 6
Can you come up with an unambiguous grammar for if statements that always associates the else with the closest if?
45
45 Solution 6
Separate if statements into matched (with an ELSE clause and recursively matched stmts) and unmatched.
This forces the statement between THEN and ELSE to be matched, so an ELSE can only pair with the nearest unmatched IF.
    stmt: matched { printf("stmt -> matched \n "); }
        | unmatched { printf("stmt -> unmatched \n "); }
        ;
    matched: BS { printf("matched -> BS \n"); }
        | IF EXP THEN matched ELSE matched
            { printf("matched -> IF EXP THEN matched ELSE matched \n"); }
        ;
    unmatched: IF EXP THEN stmt
            { printf("unmatched -> IF EXP THEN stmt \n"); }
        | IF EXP THEN matched ELSE unmatched
            { printf("unmatched -> IF EXP THEN matched ELSE unmatched \n"); }
        ;
46
46 Unambiguous Parse Tree
47
47 No Shift/Reduce Conflict
48
48 Exercise 6
Can you change the syntax for if statements to remove the ambiguity?
Hint: try using syntax to mark the beginning and end of the statements inside the if statement.
49
49 Solution 6
This is the best solution, since the pairing of each IF statement with its ELSE clause is visually clear.
You do not have to remember unnatural precedence rules.
Such a language choice helps prevent logic bugs.
    stmt: ifstmt { printf("stmt -> ifstmt\n"); }
        | BS { printf("stmt -> BS\n"); }
        ;
    ifstmt: IF EXP THEN '{' stmt '}'
            { printf("ifstmt -> IF EXP THEN { stmt} \n"); }
        | IF EXP THEN '{' stmt '}' ELSE '{' stmt '}'
            { printf("ifstmt -> IF EXP THEN { stmt } ELSE { stmt }\n"); }
        ;