1 Programming Languages (CS 550) Lecture 1 Summary Grammars and Parsing Jeremy R. Johnson.



2 Theme
Context-free grammars provide a convenient formalism for describing the syntax of programming languages. Moreover, a parser (a recognizer of valid strings in the grammar) can be constructed automatically from a context-free grammar; typically a few additional restrictions are imposed to make the parser easier to construct and more efficient. In this lecture we review grammars as a means of describing syntax and show how to construct a parser from a grammar, either by hand or with automated tools such as bison.

3 Outline
- Motivating Example
- Regular Expressions and Scanning
- Context Free Grammars
- Derivations and Parse Trees
- Ambiguous Grammars
- Parsing
- Recursive Descent Parsing
- Shift Reduce Parsing
- Parser Generators
- Syntax Directed Translation and Attribute Grammars

4 Motivating Example
- Write a function, L = ReadList(), that reads an arbitrarily nested list and constructs a recursive data structure L to represent it
  - (a1,…,an), where each ai is an integer or, recursively, a list
- Assume the input is a stream of tokens, e.g. ‘(‘, integer, ‘,’, ‘)’, and the variable Token contains the current token
- Assume the functions
  - GetToken() – advance to the next token
  - Match(token) – if token = Token then GetToken() else error
  - M = Comp(e,L) – construct list M by inserting element e at the front of L, e.g. Comp(1,(2,3)) = (1,2,3)
  - M = Reverse(L) – M = the reverse of the list L

5 Solution
L = ReadList() {
  Match(‘(‘);
  L = NULL;
  while Token ≠ ‘)’ do
    /* read element */
    if Token == NUMBER then
      x = Token.value; Match(NUMBER);
    else if Token == ‘(‘ then
      x = ReadList();
    else
      error();
    endif;
    L = Comp(x,L);   /* elements are collected in reverse order */
    if Token ≠ ‘)’ then Match(‘,’); endif;
  enddo;
  Match(‘)’);
  return Reverse(L);
}
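The ReadList pseudocode can be sketched in runnable form. The Python version below is a minimal sketch, not the lecture's code: the `tokenize` helper, the `ListReader` class, and the use of `None` as an end-of-input marker are assumptions made for illustration, and a Python list built with `append` stands in for the Comp/Reverse pair.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[(),])")

def tokenize(text):
    """Hypothetical scanner: split text into '(' ',' ')' and integer tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        lexeme = m.group(1)
        tokens.append(int(lexeme) if lexeme.isdigit() else lexeme)
        pos = m.end()
    return tokens

class ListReader:
    def __init__(self, text):
        self.tokens = tokenize(text) + [None]  # None marks end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        """The current token, playing the role of the variable Token."""
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def read_list(self):
        self.match("(")
        items = []
        while self.token != ")":
            if isinstance(self.token, int):      # NUMBER
                x = self.token
                self.pos += 1
            elif self.token == "(":              # nested list
                x = self.read_list()
            else:
                raise SyntaxError(f"unexpected token {self.token!r}")
            items.append(x)                      # append replaces Comp + Reverse
            if self.token != ")":
                self.match(",")
        self.match(")")
        return items

print(ListReader("(1,(2,3),4)").read_list())
```

One thing the sketch makes easy to check: like the pseudocode loop it mirrors, it accepts a trailing comma such as `(1,)`, which the list grammar on the next slide does not generate.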

6 List Grammar
<list> → ( <sequence> ) | ( )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER

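7 Derivation and Parse Tree
<list> → ( <sequence> )
       → ( <listelement> , <sequence> )
       → ( NUMBER , <sequence> ) = (1, <sequence> )
       → (1, <listelement> , <sequence> )
       → (1, NUMBER , <sequence> ) = (1, 2, <sequence> )
       → (1, 2, <listelement> )
       → (1, 2, NUMBER) = (1,2,3)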

8 Derivation and Parse Tree
[Figure: parse tree for the list (1,2,3)]

9 Parsing and Scanning
- Recognizing valid programming language syntax is split into two stages
  - scanning – group the input character stream into tokens
  - parsing – group tokens into programming language structures
- Tokens are described by regular expressions
- Programming language structures are described by context free grammars
- Separating scanning from parsing simplifies both description and recognition, and makes maintenance easier

10 Regular Expressions
- Alphabet = Σ
- A language over Σ is a subset of the strings in Σ*
- Regular expressions describe certain types of languages
  - ∅ is a regular expression
  - ε, denoting {ε}, is a regular expression
  - For each a in Σ, a, denoting {a}, is a regular expression
  - If r and s are regular expressions denoting languages R and S respectively, then (r + s), (rs), and (r*) are regular expressions
- E.g. 00, (0+1)*, (0+1)*00(0+1)*, 00*11*22*, (1+10)*
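The example expressions can be tried out with Python's `re` module, which writes union as `|` where the slide writes `+`; concatenation and `*` carry over unchanged. The helper names `to_python_regex` and `in_language` below are assumptions for illustration. `re.fullmatch` is used because membership in a language means the whole string matches, not just a prefix.

```python
import re

def to_python_regex(textbook):
    """Translate the slide's union operator '+' into Python's '|'."""
    return textbook.replace("+", "|")

def in_language(textbook, s):
    """True if string s belongs to the language denoted by the expression."""
    return re.fullmatch(to_python_regex(textbook), s) is not None

# (0+1)*00(0+1)* : binary strings containing two consecutive zeros
print(in_language("(0+1)*00(0+1)*", "10010"))   # has "00"
print(in_language("(0+1)*00(0+1)*", "1010"))    # no "00"

# (1+10)* : every 0 is immediately preceded by a 1
print(in_language("(1+10)*", "1101"))
print(in_language("(1+10)*", "100"))
```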

11 Grammar
- Non-terminal symbols
- Terminal symbols
- Start symbol
- Productions (rules)
- Context-Free Grammars (a rule cannot depend on context)
- Regular grammar

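12 Example
<if_stmt> → if <expr> then <stmt> | if <expr> then <stmt> else <stmt>
<ident_list> → identifier | identifier, <ident_list>
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<var> → A | B | C
<expr> → <var> + <var> | <var> - <var> | <var>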

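13 Expression Grammars
<assign> → <id> = <expr>
<id> → A | B | C
First expression grammar:
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
Second expression grammar:
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>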

14 Exercise 1
- Show a derivation and the corresponding parse tree, using the first expression grammar, for the string
  A = B*(A+C)
- Show that the second expression grammar is ambiguous by showing two distinct parse trees for the string
  A = B+C*A

15 Parse Tree
[Figure: parse tree for A = B * (A + C)]

16 Ambiguous Grammar
[Figure: two distinct parse trees for A = B + C * A, one with + at the root of the expression and one with * at the root]
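Ambiguity can also be checked mechanically by counting parse trees. The sketch below assumes the ambiguous grammar is expr → expr + expr | expr * expr | ( expr ) | id with id ∈ {A, B, C}, and counts the trees for the right-hand side of the assignment only; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of tokens under the assumed ambiguous grammar
    expr -> expr + expr | expr * expr | ( expr ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of distinct trees deriving tokens[i:j] from expr."""
        n = 0
        if j - i == 1 and tokens[i] in ("A", "B", "C"):
            n += 1                                   # expr -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # expr -> ( expr )
        for k in range(i + 1, j - 1):                # operator at the tree root
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)   # expr -> expr op expr
        return n

    return count(0, len(tokens))

print(count_parse_trees("B+C*A"))     # more than one tree means ambiguous
print(count_parse_trees("B*(A+C)"))   # parentheses leave a single tree
```

A count greater than one for any string is enough to show the grammar is ambiguous; a count of one for a particular string says nothing about other strings.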

17 Unambiguous Expression Grammar
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>

18 Exercise 2  Show the derivation and parse tree using the unambiguous expression grammar for  A = B+C*A  Convince yourself that this grammar is unambiguous (ideally give a proof)

19 Solution 2
[Figure: parse tree for A = B + C * A under the unambiguous expression grammar]

20 Sketch of Proof
- Induction on the length of the input string
- Base case: length = 1
- Otherwise, 3 cases to consider
  - ( expr1 ): induct on expr1
  - expr1 + term1 (+ rightmost): induct on expr1 and term1
  - term1 * factor1 (no +, * rightmost): induct on term1 and factor1

21 Recursive Descent Parsing
- Turn nonterminals into mutually recursive procedures corresponding to the production rules.
- Each procedure attempts to match the sequence of terminals and nonterminals on the right-hand side of a rule.
- Determine which rule to apply by looking at the next token.
- This is called predictive parsing.
- Not all CFGs can be parsed this way.

22 List Grammar
<list> → ( <sequence> ) | ( )
<sequence> → <listelement> , <sequence> | <listelement>
<listelement> → <list> | NUMBER

23 Recursive Descent Parser
list() {
  match(‘(‘);
  if token ≠ ‘)’ then
    seq();
  endif;
  match(‘)’);
}

24 Recursive Descent Parser
seq() {
  elt();
  if token = ‘,’ then
    match(‘,’);
    seq();
  endif;
}

25 Recursive Descent Parser
elt() {
  if token = ‘(‘ then
    list();
  else
    match(NUMBER);
  endif;
}
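The three procedures list(), seq(), and elt() translate almost line for line into Python. The sketch below is a recognizer only (it accepts or rejects, building nothing). It assumes the scanner has already produced a list of token names, and the class name `ListRecognizer`, the `accepts` wrapper, and the `"$"` end marker are illustrative choices rather than part of the lecture.

```python
class ListRecognizer:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def list_(self):                 # <list> -> ( <sequence> ) | ( )
        self.match("(")
        if self.token != ")":
            self.seq()
        self.match(")")

    def seq(self):                   # <sequence> -> <listelement> , <sequence> | <listelement>
        self.elt()
        if self.token == ",":
            self.match(",")
            self.seq()

    def elt(self):                   # <listelement> -> <list> | NUMBER
        if self.token == "(":
            self.list_()
        else:
            self.match("NUMBER")

def accepts(tokens):
    """Return True if the token sequence is a valid list."""
    parser = ListRecognizer(tokens)
    try:
        parser.list_()
        parser.match("$")            # nothing may follow the closing parenthesis
        return True
    except SyntaxError:
        return False

print(accepts(["(", "NUMBER", ",", "(", ")", ")"]))
print(accepts(["(", "NUMBER", ",", ")"]))
```

Each decision is made by inspecting only the current token, which is what makes the parser predictive.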

26 Exercise 3
- Removing left recursion
  - Rules of the form S → S α [left recursive] cause an infinite loop in a recursive descent parser
  - Left recursion can be systematically removed:
    A → A α | β   becomes   A → β A',  A' → α A' | ε
- Remove left recursion from the unambiguous expression grammar

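27 Solution 3
- Remove left recursion from the unambiguous expression grammar
  <expr> → <expr> + <term> | <term>
  <term> → <term> * <factor> | <factor>
- Gets transformed into
  <expr> → <term> <expr'>
  <expr'> → + <term> <expr'> | ε
  <term> → <factor> <term'>
  <term'> → * <factor> <term'> | ε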
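Once left recursion is removed, the expression grammar can be parsed by recursive descent. The sketch below assumes the transformed grammar expr → term expr', expr' → + term expr' | ε, term → factor term', term' → * factor term' | ε, factor → ( expr ) | id. Passing the tree built so far into the primed procedures is a design choice of this sketch (not something the slides show) that keeps + and * left-associative even though the rewritten rules are right-recursive.

```python
def parse_expr(tokens):
    """Parse tokens such as "B+C*A" into nested tuples (op, left, right)."""
    tokens = list(tokens) + ["$"]          # "$" marks end of input (assumption)
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                            # expr -> term expr'
        return expr_rest(term())

    def expr_rest(left):                   # expr' -> + term expr' | epsilon
        if peek() == "+":
            advance()
            return expr_rest(("+", left, term()))
        return left

    def term():                            # term -> factor term'
        return term_rest(factor())

    def term_rest(left):                   # term' -> * factor term' | epsilon
        if peek() == "*":
            advance()
            return term_rest(("*", left, factor()))
        return left

    def factor():                          # factor -> ( expr ) | id
        tok = advance()
        if tok == "(":
            tree = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok in ("A", "B", "C"):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expr("B+C*A"))
print(parse_expr("A+B+C"))
```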

28 EBNF List Grammar
- Zero or more repetitions: { }
- Optional: [ ]
<list> → ( <sequence> ) | ( )
<sequence> → <listelement> { , <listelement> }
<listelement> → <list> | NUMBER

29 Recursive Descent EBNF Parser
list() {
  match(‘(‘);
  if token ≠ ‘)’ then
    elt();
    while token = ‘,’ do   /* { , <listelement> } */
      match(‘,’);
      elt();
    enddo;
  endif;
  match(‘)’);
}

30 Parser and Scanner Generators
- Tools exist (e.g. yacc/bison [1] for C/C++, PLY for Python, CUP for Java) to automatically construct a parser from a restricted set of context free grammars (LALR(1) grammars for yacc/bison and the derivatives CUP and PLY)
- These tools use table driven bottom up parsing techniques (commonly shift/reduce parsing)
- Similar tools (e.g. lex/flex for C/C++, JFlex for Java) exist, based on the theory of finite automata, to automatically construct scanners from regular expressions
[1] bison is the GNU version of yacc

31 Yacc (bison) Example
%{
#include <stdio.h>               /* printf */
int yylex(void);                 /* supplied by the lex/flex scanner */
void yyerror(const char *s);     /* supplied by -ly unless you define your own */
%}
%token NUMBER /* needed to communicate with scanner */
%%
list: '(' sequence ')' { printf("L -> ( seq )\n"); }
    | '(' ')' { printf("L -> () \n "); }
    ;
sequence: listelement ',' sequence { printf("seq -> LE,seq\n"); }
    | listelement { printf("seq -> LE\n"); }
    ;
listelement: NUMBER { printf("LE -> %d\n",$1); }
    | list { printf("LE -> L\n"); }
    ;
%%
/* since no code here, default main constructed that simply calls parser. */

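32 Lex (flex) Example
%{
#include <stdlib.h>      /* atoi */
#include "list.tab.h"    /* token codes generated by bison -d */
extern int yylval;
%}
%%
[0-9]+   { yylval = atoi(yytext); return NUMBER; }
[ \t\n]  ;               /* skip whitespace */
"("      return yytext[0];
")"      return yytext[0];
","      return yytext[0];
"$"      return 0;       /* treat $ as end of input */
%%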

33 Building bison/flex Parser
- Tools available on tux
- You can download them for free
  - Available as part of many Linux distributions (if not installed, get the appropriate package)
  - Can be used through Cygwin under Windows
- Build instructions
  - bison -d list.y => list.tab.c and list.tab.h
  - flex list.l => lex.yy.c
  - gcc list.tab.c lex.yy.c -ly -lfl => a.out or a.exe

34 Executing Parser
The program expects the user to enter a string followed by ctrl-D, indicating end of file, or to redirect input from a file.
E.g. with valid input:
$ ./a.exe
(1,2,3)
LE -> 1
LE -> 2
LE -> 3
seq -> LE
seq -> LE,seq
L -> ( seq )
E.g. input with syntax error:
$ ./a.exe
(1,2,3(
LE -> 1
LE -> 2
LE -> 3
seq -> LE
seq -> LE,seq
syntax error

35 Recursive Descent Reader
List list() {
  L = NULL;
  match(‘(‘);
  if token ≠ ‘)’ then
    L = seq();
  endif;
  match(‘)’);
  return L;
}

36 Recursive Descent Reader
List seq() {
  x = elt();
  if token = ‘,’ then
    match(‘,’);
    M = seq();
    L = Comp(x,M);
  else
    L = Comp(x,NULL);
  endif;
  return L;
}

37 Recursive Descent Reader
Element elt() {
  if token = ‘(‘ then
    x = list();
  else
    match(NUMBER);
    x = NUMBER.val;
  endif;
  return x;
}

38 Attribute Grammars
- Associate attributes with symbols
- Associate attribute computation rules with productions
- Fill in values as the input is parsed (decorate the parse tree)
- Synthesized vs. inherited attributes

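39 Example Attribute Grammar
<list> → ( <sequence> ) | ( )
  - ( <sequence> ): list.val = sequence.val
  - ( ): list.val = NULL
<sequence> → <listelement> , <sequence> | <listelement>
  - <listelement> , <sequence>: seq0.val = Comp(listelement.val,seq1.val)
  - <listelement>: seq0.val = Comp(listelement.val,NULL)
<listelement> → <list> | NUMBER
  - <list>: listelement.val = list.val
  - NUMBER: listelement.val = NUMBER.val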
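In a recursive descent parser, a synthesized attribute such as val is naturally the return value of the procedure for its nonterminal. The sketch below evaluates the attribute rules for the list grammar in that style. It is an illustration under assumptions: Python lists stand in for the lecture's list type, `comp` mirrors Comp, and the `read_list` wrapper with its inline tokenizer is hypothetical.

```python
import re

def comp(e, lst):
    """Comp(e, L): a new list with element e in front of L."""
    return [e] + lst

def read_list(text):
    # Simplified scanner: characters other than digits, parentheses,
    # and commas are skipped rather than reported.
    tokens = [int(t) if t.isdigit() else t
              for t in re.findall(r"\d+|[(),]", text)]
    tokens.append("$")                       # end-of-input marker (assumption)
    pos = 0

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def list_():                             # returns list.val
        match("(")
        val = []                             # ( )            : list.val = NULL
        if tokens[pos] != ")":
            val = seq()                      # ( sequence )   : list.val = sequence.val
        match(")")
        return val

    def seq():                               # returns seq0.val
        x = elt()
        if tokens[pos] == ",":
            match(",")
            return comp(x, seq())            # Comp(listelement.val, seq1.val)
        return comp(x, [])                   # Comp(listelement.val, NULL)

    def elt():                               # returns listelement.val
        nonlocal pos
        if tokens[pos] == "(":
            return list_()                   # listelement.val = list.val
        if isinstance(tokens[pos], int):
            val = tokens[pos]                # listelement.val = NUMBER.val
            pos += 1
            return val
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")

    val = list_()
    match("$")
    return val

print(read_list("(1,2,3)"))
print(read_list("(1,(2,3),())"))
```

Because every attribute here flows from children to parent, no inherited attributes (extra procedure parameters) are needed.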

40 Decorated Parse Tree
[Figure: parse tree for (1,2,3) decorated with val attributes: the NUMBER leaves carry Val = 1, Val = 2, Val = 3; the nested sequences carry Val = (3), Val = (2,3), Val = (1,2,3)]

41 Yacc Example with Attributes
/* This grammar is ambiguous and will cause shift/reduce conflicts */
%{
#include <stdio.h>               /* printf */
int yylex(void);
void yyerror(const char *s);
%}
%token NUMBER
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); };
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;
%%

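42 Shift Reduce Parsing
- Bottom up parsing
- LR(1), LALR(1)
- Conflicts & ambiguities
- Trace for 1+2*3, where | separates the stack (left) from the remaining input (right):
  |1+2*3
  1|+2*3                                  [shift]
  expression|+2*3                         [reduce]
  expression+|2*3                         [shift]
  expression+2|*3                         [shift]
  expression+expression|*3                [reduce]
  expression+expression|*3                [shift/reduce conflict]
  expression+expression*|3                [shift]
  expression+expression*3|                [shift]
  expression+expression*expression|       [reduce]
  expression+expression|                  [reduce]
  expression|                             [reduce & accept]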

43 Yacc Example (precedence rules)
/* precedence rules added to resolve conflicts and remove ambiguity */
%{
#include <stdio.h>               /* printf */
int yylex(void);
void yyerror(const char *s);
%}
%token NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
statement_list: statement '\n'
    | statement_list statement '\n'
    ;
statement: expression { printf("= %d\n", $1); };
expression: expression '+' expression { $$ = $1 + $3; }
    | expression '-' expression { $$ = $1 - $3; }
    | expression '*' expression { $$ = $1 * $3; }
    | expression '/' expression { if ($3 == 0) yyerror("division by zero"); else $$ = $1 / $3; }
    | '-' expression %prec UMINUS { $$ = -$2; }
    | '(' expression ')' { $$ = $2; }
    | NUMBER { $$ = $1; }
    ;
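The effect of the %left declarations on a shift/reduce conflict can be imitated by hand. When the stack holds expression op expression and the lookahead is another operator, yacc reduces if the operator on the stack has higher precedence, or equal precedence with left associativity, and shifts otherwise. The Python sketch below applies just that rule. It is a simplification, not bison's table-driven LALR(1) algorithm: it handles only binary +, -, and * on integers, omits parentheses, division, and unary minus, and skips the NUMBER → expression reductions. All names are hypothetical.

```python
import operator

# Higher number = higher precedence, mirroring
#   %left '-' '+'
#   %left '*'
PRECEDENCE = {"+": 1, "-": 1, "*": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def shift_reduce_eval(tokens):
    """Evaluate e.g. [1, '+', 2, '*', 3]; return (value, list of actions)."""
    tokens = list(tokens)
    well_formed = len(tokens) % 2 == 1 and all(
        isinstance(t, int) if i % 2 == 0 else t in PRECEDENCE
        for i, t in enumerate(tokens))
    if not well_formed:
        raise SyntaxError("expected NUMBER (op NUMBER)*")

    stack, actions = [], []
    for lookahead in tokens + ["$"]:          # "$" marks end of input
        # Reduce while the operator on the stack wins against the lookahead:
        # higher precedence, or equal precedence (left associativity).
        while len(stack) >= 3 and (
                lookahead == "$" or
                (lookahead in PRECEDENCE and
                 PRECEDENCE[stack[-2]] >= PRECEDENCE[lookahead])):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(OPS[op](left, right))
            actions.append("reduce")
        if lookahead != "$":
            stack.append(lookahead)
            actions.append("shift")
    return stack[0], actions

value, actions = shift_reduce_eval([1, "+", 2, "*", 3])
print(value, actions)
```

For 1+2*3 the conflict arises with 1 + 2 on the stack and * as lookahead; since * outranks +, the sketch shifts, so the multiplication is reduced first.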

44 Exercise 4
- Show that the following grammar is ambiguous.
  <stmt> → <ifstmt> | <bs>
  <ifstmt> → IF <exp> THEN <stmt>
           | IF <exp> THEN <stmt> ELSE <stmt>
- This is called the “dangling else” problem
- See if.y for a yacc/bison version of this grammar, in which <exp> and <bs> are replaced by the tokens EXP and BS
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN stmt { printf("ifstmt -> IF EXP THEN stmt\n"); }
    | IF EXP THEN stmt ELSE stmt { printf("ifstmt -> IF EXP THEN stmt ELSE stmt\n"); }
    ;

45 First Parse Tree

46 Second Parse Tree

47 Shift/Reduce Conflict

48 Output from bison
$ bison -d if.y
if.y: conflicts: 1 shift/reduce

49 Exercise 5  Can you use yacc's precedence rules to remove the ambiguity?

50 Solution 5
- Convention is to associate the ELSE clause with the nearest if statement.
- Force ELSE to have higher precedence than THEN. The rule IF EXP THEN stmt takes the precedence of its last terminal, THEN, so when ELSE is the lookahead yacc compares the two and shifts.
- This removes the shift/reduce conflict and forces yacc to shift on the previous example
%token IF THEN ELSE EXP BS
%nonassoc THEN
%nonassoc ELSE
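Preferring shift has a direct analogue in a hand-written parser: after parsing the statement that follows THEN, consume an ELSE immediately if one is present. The sketch below does this for token sequences over IF, EXP, THEN, ELSE, and BS; the function name `parse_stmt`, the tuple-based trees, and the `"$"` end marker are assumptions for illustration.

```python
def parse_stmt(tokens):
    """Parse IF/THEN/ELSE token names into ("if", then_part[, else_part]) tuples."""
    tokens = list(tokens) + ["$"]             # "$" marks end of input (assumption)
    pos = 0

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}, found {tokens[pos]!r}")
        pos += 1

    def stmt():
        nonlocal pos
        if tokens[pos] == "BS":               # basic statement
            pos += 1
            return "BS"
        expect("IF")
        expect("EXP")
        expect("THEN")
        then_part = stmt()
        if tokens[pos] == "ELSE":             # take ELSE right away: the "shift" choice
            pos += 1
            return ("if", then_part, stmt())
        return ("if", then_part)

    tree = stmt()
    expect("$")
    return tree

print(parse_stmt("IF EXP THEN IF EXP THEN BS ELSE BS".split()))
```

The innermost pending if always gets the first chance at an ELSE, so the ELSE ends up attached to the nearest if, matching the convention stated on the slide.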

51 Shift/Reduce Conflict Removed

52 Exercise 6
- Can you come up with an unambiguous grammar for if statements that always associates the else with the closest if?

53 Solution 6
- Separate if statements into matched (with an ELSE clause and recursively matched stmts) and unmatched
- The statement between THEN and ELSE must be matched, which forces any unmatched if statement to the end
stmt: matched { printf("stmt -> matched \n "); }
    | unmatched { printf("stmt -> unmatched \n "); }
    ;
matched: BS { printf("matched -> BS \n"); }
    | IF EXP THEN matched ELSE matched { printf("matched -> IF EXP THEN matched ELSE matched \n"); }
    ;
unmatched: IF EXP THEN stmt { printf("unmatched -> IF EXP THEN stmt \n"); }
    | IF EXP THEN matched ELSE unmatched { printf("unmatched -> IF EXP THEN matched ELSE unmatched \n"); }
    ;
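The difference between the two if-statement grammars can be checked by counting parse trees for the dangling-else string. The sketch below is a small, generic tree counter written for this purpose, not a tool from the lecture. It assumes grammars are given as Python dictionaries and contain no ε-rules and no cycles of unit productions, which holds for both grammars here; `count_trees`, `AMBIGUOUS`, and `MATCHED` are hypothetical names.

```python
from functools import lru_cache

def count_trees(grammar, start, tokens):
    """Count parse trees of tokens from start.
    grammar maps each nonterminal to a list of right-hand sides;
    any symbol that is not a key of grammar is treated as a terminal."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        if symbol not in grammar:                        # terminal symbol
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(sequence(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        # Every symbol covers at least one token (no epsilon rules assumed).
        return sum(derive(head, i, k) * sequence(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return derive(start, 0, len(tokens))

AMBIGUOUS = {
    "stmt": [["ifstmt"], ["BS"]],
    "ifstmt": [["IF", "EXP", "THEN", "stmt"],
               ["IF", "EXP", "THEN", "stmt", "ELSE", "stmt"]],
}
MATCHED = {
    "stmt": [["matched"], ["unmatched"]],
    "matched": [["BS"],
                ["IF", "EXP", "THEN", "matched", "ELSE", "matched"]],
    "unmatched": [["IF", "EXP", "THEN", "stmt"],
                  ["IF", "EXP", "THEN", "matched", "ELSE", "unmatched"]],
}

dangling = "IF EXP THEN IF EXP THEN BS ELSE BS".split()
print(count_trees(AMBIGUOUS, "stmt", dangling))
print(count_trees(MATCHED, "stmt", dangling))
```

A single tree for this one string does not prove the matched/unmatched grammar unambiguous in general; it only confirms that the dangling-else case is resolved.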

54 Unambiguous Parse Tree

55 No Shift/Reduce Conflict

56 Exercise 7
- Can you change the syntax for if statements to remove the ambiguity? Hint: try to use syntax to denote the beginning and end of the statements in the if statement.

57 Solution 7
- This is the best solution, since the pairing of each IF statement with its ELSE clause is visually clear. You do not have to remember unnatural precedence rules.
- Such a language choice helps prevent logic bugs
stmt: ifstmt { printf("stmt -> ifstmt\n"); }
    | BS { printf("stmt -> BS\n"); }
    ;
ifstmt: IF EXP THEN '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt} \n"); }
    | IF EXP THEN '{' stmt '}' ELSE '{' stmt '}' { printf("ifstmt -> IF EXP THEN { stmt } ELSE { stmt }\n"); }
    ;