241-437 Compilers: topDown/5 1 Compiler Structures Objective – –look at top-down (LL) parsing using recursive descent and tables – –consider a recursive.

Presentation transcript:

Compilers: topDown/5 1 Compiler Structures Objective – –look at top-down (LL) parsing using recursive descent and tables – –consider a recursive descent parser for the Expressions language , Semester 1, Top-down Parsing

Compilers: topDown/5 2 Overview 1. Parsing with a Syntax Analyzer 2. Creating a Recursive Descent Parser 3. The Expressions Language Parser 4. LL(1) Parse Tables 5. Making a Grammar LL(1) 6.Error Recovery in LL Parsing

Compilers: topDown/5 3 In this lecture Source Program Target Lang. Prog. Semantic Analyzer Syntax Analyzer Lexical Analyzer Front End Code Optimizer Target Code Generator Back End Int. Code Generator Intermediate Code but concentrating on top-down parsing

Compilers: topDown/ Parsing with a Syntax Analyzer Lexical Analyzer (using chars) Syntax Analyzer (using tokens) Source Program 3. Token, token value 1. Get next token lexical errors syntax errors 2. Get chars to make a token parse tree

Compilers: topDown/ Top Down (LL) Parsing
Input: begin simplestmt ; simplestmt ; end
Grammar:
B => begin SS end
SS => S ; SS
SS => ε
S => simplestmt
S => begin SS end
[Figure: parse tree for the input, rooted at B and grown downwards through SS and S nodes.]

Compilers: topDown/ LL Parsing Definition An LL parser is a top-down parser for a context-free grammar. It parses input from Left to right, and constructs a Leftmost derivation of the input.

Compilers: topDown/5 7 A Leftmost Derivation In a leftmost derivation, the leftmost non-terminal is chosen to be expanded. – –this builds the parse tree top-down, left-to-right Example grammar: L => ( L ) L L => ε

Compilers: topDown/5 8 Leftmost Derivation for (())()
L
⇒ ( L ) L             // using L => ( L ) L
⇒ ( ( L ) L ) L       // using L => ( L ) L
⇒ ( ( ) L ) L         // using L => ε
⇒ ( ( ) ) L           // using L => ε
⇒ ( ( ) ) ( L ) L     // using L => ( L ) L
⇒ ( ( ) ) ( ) L       // using L => ε
⇒ ( ( ) ) ( )         // using L => ε
The final line is the input, (())().
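
The derivation above can be followed mechanically: when the current character is '(' the parser predicts L => ( L ) L, otherwise it predicts L => ε. The sketch below is only an illustration of that idea; the names `balanced`, `cur`, and `ok` are made up for this example, and the end of the string plays the role of the end-of-input marker.

```c
/* Recognizer for the grammar  L => ( L ) L | ε.
   Illustrative sketch: names are hypothetical, tokens are single characters. */
static const char *cur;   /* current input position (the "current token") */
static int ok;            /* cleared on the first syntax error */

static void L(void)
{
    if (*cur == '(') {            /* predict L => ( L ) L */
        cur++;                    /* match '(' */
        L();
        if (*cur == ')') cur++;   /* match ')' */
        else ok = 0;
        if (ok) L();
    }
    /* any other token: predict L => ε and consume nothing */
}

/* Returns 1 if s can be derived from L, 0 otherwise. */
int balanced(const char *s)
{
    cur = s;
    ok = 1;
    L();
    return ok && *cur == '\0';
}
```

Each call to `L()` that sees '(' corresponds to one use of L => ( L ) L in the derivation, and each call that returns immediately corresponds to one use of L => ε.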

Compilers: topDown/ LL(1) and LL(k) An LL(1) parser uses only the current token to decide which production to use next. An LL(k) parser uses k tokens of input to decide which production to use – –this makes the grammar easier to write – –in theory LL(k) grammars can describe more languages than LL(1) grammars, but most programming language syntax can be rewritten as LL(1) – –harder to implement efficiently

Compilers: topDown/ Two LL Implementation Approaches Recursive Descent parsing – –all the compiler code is generated (automatically) from the grammar Table Driven parsing – –a table is generated (automatically) from the grammar – –the table is 'plugged' into an existing compiler

Compilers: topDown/ Creating a Recursive Descent Parser Each non-terminal (e.g. A) is translated into a parsing function (e.g. A()). The A() function is generated from all the productions for A: – –A => B, A => a C, etc.

Compilers: topDown/ Basic Translation Rules I'll start by assuming a production body doesn't use *, [], or ε. – –I'll add to the translation rules later to deal with these extra features S => Body becomes void S() { translate<Body> }

Compilers: topDown/5 13 If Body is B1 B2... Bn then it becomes: translate<B1>; translate<B2>; : translate<Bn>;

Compilers: topDown/5 14 If Body is B1 | B2... | Bn then it becomes: if (currToken in FIRST_SEQ(B1)) translate<B1>; else if (currToken in FIRST_SEQ(B2)) translate<B2>; : else if (currToken in FIRST_SEQ(Bn)) translate<Bn>; else error();

Compilers: topDown/5 15 currToken is the current token, which is obtained from the lexical analyzer: Token currToken; // global void nextToken(void) { currToken = scanner(); }

Compilers: topDown/5 16 The first token is read when the parser first starts. main() also calls the function representing the start symbol: int main(void) { nextToken(); S(); // S is the grammar's start symbol : // other code return 0; }

Compilers: topDown/5 17 error() reports that the current token cannot be matched against any production: int lineNum; // global
void error() { printf("\nSyntax error at \'%s\' on line %d\n", tokSyms[currToken], lineNum); exit(1); }
Here tokSyms[] maps a token to its printable name, as in the exprParse0.c code later in these slides.

Compilers: topDown/5 18 In a body, if B is a non-terminal, it is translated into the function call: B(); In a body, if b is a terminal, it is translated into a match() call: match(b);

Compilers: topDown/5 19 match() checks that the current token is what is expected (e.g. b), and reads in the next one for future testing: void match(Token expected) { if(currToken == expected) currToken = scanner(); else error(); }

Compilers: topDown/5 20 Special '|' Body case. If Body is a1 B1 | a2 B2... | an Bn // ai's are terminals
then it becomes: if (currToken == a1) { match(a1); translate<B1>; } else if (currToken == a2) { match(a2); translate<B2>; } : else if (currToken == an) { match(an); translate<Bn>; } else error(); a1, a2,..., an must be different

Compilers: topDown/5 21 2.2. Example Translation
void S() {   // S => a B | b C
  if (currToken == a) { match(a); B(); }
  else if (currToken == b) { match(b); C(); }
  else error();
}
void B() {   // B => b b C
  match(b); match(b); C();
}
void C() {   // C => c c
  match(c); match(c);
}
And main(), nextToken(), match(), and error().

Compilers: topDown/5 22 Parsing "abbcc" S a B b b C c Function calls: main() --> S() --> match(a); B() --> match(b); match(b); C() --> match(c); match(c) abbcc input
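
The call sequence above can be reproduced with a small self-contained version of the parser. This is a sketch under some assumptions that are not in the slides: tokens are single characters, the end of the string stands for end of input, and error() records a failure in a flag (`failed`) instead of exiting, so that the hypothetical wrapper `parse()` can return a result.

```c
/* Recursive descent parser for
     S => a B | b C      B => b b C      C => c c
   Sketch only: tokens are single characters, '\0' marks end of input. */
static const char *input;  /* rest of the input */
static char currToken;     /* current token */
static int failed;         /* set by error() instead of calling exit() */

static void nextToken(void) { currToken = *input; if (*input) input++; }
static void error(void)     { failed = 1; }

static void match(char expected)
{
    if (currToken == expected) nextToken();
    else error();
}

static void C(void) { match('c'); match('c'); }          /* C => c c   */
static void B(void) { match('b'); match('b'); C(); }     /* B => b b C */

static void S(void)                                      /* S => a B | b C */
{
    if (currToken == 'a')      { match('a'); B(); }
    else if (currToken == 'b') { match('b'); C(); }
    else error();
}

/* Returns 1 if text is a sentence of the grammar, 0 otherwise. */
int parse(const char *text)
{
    input = text;
    failed = 0;
    nextToken();          /* read the first token, as main() does in the slides */
    S();
    return !failed && currToken == '\0';
}
```

Calling `parse("abbcc")` follows the same path as the trace: S() matches a, B() matches b twice, and C() matches c twice.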

Compilers: topDown/ When can we use Recursive Descent? A fast/efficient recursive descent parser can be generated for a LL(1) grammar. So we must first check if the grammar is LL(1). – –the check will generate information that can be used in constructing the parser – –e.g. FIRST_SEQ

Compilers: topDown/5 24 Dealing with "if" A tricky part of LL(1) is making sure that branches can be coded –each branch must start differently so it's easy (and also fast) to decide which branch to use based only on the current input token (currToken value) continued

Compilers: topDown/5 25 e.g. –A --> a B1 A --> b B2 –is okay since the two branches start differently (a and b) –A --> a B1 A --> a B2 –not okay since both branches start the same way (a), so currToken cannot choose between them continued

Compilers: topDown/5 26 In non-mathematical words, a grammar is LL(1) if the choice between productions can be made by looking only at the start of the production bodies and the current input token (currToken).

Compilers: topDown/5 27 Is a Grammar LL(1)? In maths: for every non-terminal in the language (e.g. A, B, C), generate the PREDICT set for all the productions: PREDICT( A => α1 ), PREDICT( A => α2 ), PREDICT( A => α3 ); PREDICT( B => β1 ), PREDICT( B => β2 ); PREDICT( C => γ1 ), ... continued

Compilers: topDown/5 28 Take the intersection of all pairs of sets for A: PREDICT( A => α1 ) ∩ PREDICT( A => α2 ), PREDICT( A => α1 ) ∩ PREDICT( A => α3 ), PREDICT( A => α2 ) ∩ PREDICT( A => α3 ) –the intersection of every pair must be empty (disjoint) continued

Compilers: topDown/5 29 Repeat for all the sets for B, C, etc.: –B --> β1, B --> β2 –C --> γ1, C --> γ2, C --> γ3 If every pair of PREDICT sets for the same non-terminal is disjoint then the grammar is LL(1). continued

Compilers: topDown/5 30 If there's only one PREDICT set for a non- terminal (e.g. D --> d1), then it's automatically disjoint.

Compilers: topDown/5 31 Calculating PREDICT
PREDICT(A => α) = (FIRST_SEQ(α) – {ε}) ∪ FOLLOW(A)    if ε is in FIRST_SEQ(α)
PREDICT(A => α) = FIRST_SEQ(α)                         if ε is not in FIRST_SEQ(α)
FIRST_SEQ() and FOLLOW() are the set functions I described in chapter 4.
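
The PREDICT rule and the pairwise disjointness test are easy to code if each set of terminals is stored as a bit mask. The following is a minimal sketch, assuming a small fixed set of terminals; the type `TokSet`, the `T_...` constants, and both function names are invented for this illustration, and ε is tracked as an extra bit.

```c
/* Token sets as bit masks: one bit per terminal, plus one bit for ε.
   All names here are hypothetical, chosen for the illustration. */
typedef unsigned int TokSet;

enum { T_PLUS = 1, T_STAR = 2, T_LPAR = 4, T_RPAR = 8,
       T_ID = 16, T_EOF = 32, T_EPS = 64 };

/* PREDICT(A => α), given FIRST_SEQ(α) and FOLLOW(A). */
TokSet predict(TokSet first_seq, TokSet follow)
{
    if (first_seq & T_EPS)                          /* ε in FIRST_SEQ(α) */
        return (first_seq & ~(TokSet)T_EPS) | follow;
    return first_seq;                               /* ε not in FIRST_SEQ(α) */
}

/* Returns 1 if every pair of the n sets is disjoint, else 0.
   A set that overlaps the union of the earlier sets must overlap one of them. */
int all_disjoint(const TokSet *sets, int n)
{
    TokSet seen = 0;
    for (int i = 0; i < n; i++) {
        if (seen & sets[i]) return 0;
        seen |= sets[i];
    }
    return 1;
}
```

For a production whose body is ε, `predict(T_EPS, follow)` simply returns the FOLLOW set, which is the case the first line of the rule describes.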

Compilers: topDown/5 32 Short Example 1 S => a S | a ProductionPredict – –S => a S {a} – –S => a {a} PREDICT(S) = {a} ∩ {a } == {a} – –not disjoint – –the grammar is not LL(1)

Compilers: topDown/5 33 Short Example 2 S => a S | b ProductionPredict – –S => a S {a} – –S => b {b} PREDICT(S) = {a} ∩ {b } == {} – –disjoint – –the grammar is LL(1)

Compilers: topDown/5 34 Larger Example Is this grammar LL(1)?
E => T E1
E1 => + T E1 | ε
T => F T1
T1 => * F T1 | ε
F => id | '(' E ')'
FIRST(F) = {(,id} FIRST(T) = {(,id} FIRST(E) = {(,id} FIRST(T1) = {*,ε} FIRST(E1) = {+,ε}
FOLLOW(E) = {$,)} FOLLOW(E1) = {$,)} FOLLOW(T) = {+,$,)} FOLLOW(T1) = {+,$,)} FOLLOW(F) = {*,+,$,)}

Compilers: topDown/5 35
Production          Predict
E => T E1           = FIRST(T) = {(,id}
E1 => + T E1        {+}
E1 => ε             = FOLLOW(E1) = {$,)}
T => F T1           = FIRST(F) = {(,id}
T1 => * F T1        {*}
T1 => ε             = FOLLOW(T1) = {+,$,)}
F => id             {id}
F => ( E )          {(}
(The FIRST and FOLLOW sets are the ones listed on slide 34.)

Compilers: topDown/5 36 Are the PREDICT sets disjoint for all the non-terminals? – –PREDICT(E): {(,id} yes – –PREDICT(E1): {+} ∩ {$,)}yes – –PREDICT(T): {(,id}yes – –PREDICT(T1): {*} ∩ {+,$,)}yes – –PREDICT(F): {id} ∩ {(}yes All disjoint, so the grammar is LL(1).

Compilers: topDown/ Extended Translation Rules These extra rules allow a production body to use *, [], or ε. S => Body becomes void S() { translate<Body> } same as before

Compilers: topDown/5 38 If Body is B1 | B2... | Bn | ε then it becomes: if (currToken in FIRST_SEQ(B1)) translate<B1>; else if (currToken in FIRST_SEQ(B2)) translate<B2>; : else if (currToken in FIRST_SEQ(Bn)) translate<Bn>; else error(); – –the ε part is optional; include the final else error(); only if there's no ε part in the grammar

Compilers: topDown/5 39 If Body is [ B1 B2... Bn ] then it becomes: if (currToken in FIRST_SEQ(B1)) { translate<B1>; translate<B2>; : translate<Bn>; } – –[ B1 B2... Bn ] is the same as ( B1 B2... Bn ) | ε (rule []-1)

Compilers: topDown/5 40 A variant [] translation. If the body is [ B1 B2... Bn ] C then it can become: if (currToken not in FIRST_SEQ(C)) { translate<B1>; translate<B2>; : translate<Bn>; } translate<C>; (rule []-2) The test against FIRST_SEQ(C) may be simpler code than the test against FIRST_SEQ(B1)

Compilers: topDown/5 41 Another variant [] translation. If the grammar rule is A => [ B1 B2... Bn ] then it becomes: void A() { if (currToken not in FOLLOW(A)) { translate<B1>; translate<B2>; : translate<Bn>; } } (rule []-3) The test against FOLLOW(A) may be simpler code than the test against FIRST_SEQ(B1)

Compilers: topDown/5 42 If Body is ( B1 B2... Bn )* then it becomes: while (currToken in FIRST_SEQ(B1)) { translate<B1>; translate<B2>; : translate<Bn>; } (rule *-1)

Compilers: topDown/5 43 A variant * translation. If the body is ( B1 B2... Bn )* C then it becomes: while (currToken not in FIRST_SEQ(C)) { translate<B1>; translate<B2>; : translate<Bn>; } translate<C>; (rule *-2) The test against FIRST_SEQ(C) may be simpler code than the test against FIRST_SEQ(B1)

Compilers: topDown/5 44 Another variant * translation. If the grammar rule is A => ( B1 B2... Bn )* then it becomes: void A() { while (currToken not in FOLLOW(A)) { translate<B1>; translate<B2>; : translate<Bn>; } } (rule *-3) The test against FOLLOW(A) may be simpler code than the test against FIRST_SEQ(B1)

Compilers: topDown/5 45 match() is slightly changed to deal with the end of input symbol, $: void match(Token expected) { if(currToken == expected) { if (currToken != $) currToken = scanner(); } else error(); }

Compilers: topDown/5 46 Translation Example 1 The LL(1) Grammar: E => T E1 E1 => [ '+' T E1 ] T => F T1 T1 => [ '*' F T1 ] F => id | '(' E ')' This is the same grammar as on slides 34-36, so we know it's LL(1).

Compilers: topDown/5 47 Generated Parser void E()// E => T E1 { T(); E1(); } void E1()// E1 => ['+' T E1 ] { if (currToken == '+') { match('+'); T(); E1(); } } use rule []-1 This is C code for "currToken in FIRST_SEQ(+)"

Compilers: topDown/5 48 void T()// T => F T1 { F(); T1(); } void T1()// T1 => ['*' F T1 ] { if (currToken == '*') { match('*'); F(); T1(); } } rule []-1 This is C code for "currToken in FIRST_SEQ(*)"

Compilers: topDown/5 49 void F()// F => id | '(' E ')' { if (currToken == ID) match(ID); else if (currToken == '(') { match('('); E(); match(')'); } else error(); }

Compilers: topDown/5 50 Parsing "a + b * c" (input: a+b*c). The parse tree, shown with indentation:
E
  T
    F
      id (a)
    T1
      ε
  E1
    +
    T
      F
        id (b)
      T1
        *
        F
          id (c)
        T1
          ε
    E1
      ε

Compilers: topDown/5 51 Optimizations It's possible to combine grammar rules and/or parse functions, in order to simplify the compiler. For example, we can combine: – –E and E1 – –T and T1

Compilers: topDown/5 52 Translation Example 2 The previous LL(1) grammar can be expressed using *: E => T ( '+' T )* T => F ( '*' F )* F => id | '(' E ')' same as before

Compilers: topDown/5 53 Generated Parser void E()// E => T ('+' T)* { T(); while (currToken == '+') { match('+'); T(); } } void T()// T => F ('*' F)* { F(); while (currToken == '*') { match('*'); F(); } } rule *-1

Compilers: topDown/5 54 void F()// F => id | '(' E ')' { if (currToken == ID) match(ID); else if (currToken == '(') { match('('); E(); match(')'); } else error(); } same as before

Compilers: topDown/5 55 Parsing "a + b * c" Again. The parse tree, shown with indentation:
E
  T
    F
      id (a)
  +                 done inside the E() loop
  T
    F
      id (b)
    *               done inside the T() loop
    F
      id (c)
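
The loop-based parser for E => T ( '+' T )* can be tried out with a tiny stand-in scanner. The sketch below makes several assumptions that are not part of the slides: an identifier is any run of letters and is reported as the token 'i', blanks are skipped, errors set a flag (`failed`) rather than exiting, and `parse_expr()` is a hypothetical wrapper added so the result can be checked.

```c
#include <ctype.h>

/* Parser for  E => T ('+' T)*   T => F ('*' F)*   F => id | '(' E ')'
   Sketch only: 'i' stands for any identifier, '\0' for end of input. */
static const char *src;
static char currToken;
static int failed;

static void nextToken(void)
{
    while (*src == ' ') src++;                     /* skip blanks */
    if (isalpha((unsigned char)*src)) {            /* identifier: run of letters */
        while (isalpha((unsigned char)*src)) src++;
        currToken = 'i';
    } else {
        currToken = *src;
        if (*src) src++;
    }
}

static void match(char expected)
{
    if (currToken == expected) { if (currToken != '\0') nextToken(); }
    else failed = 1;
}

static void E(void);

static void F(void)                                /* F => id | '(' E ')' */
{
    if (currToken == 'i') match('i');
    else if (currToken == '(') { match('('); E(); match(')'); }
    else failed = 1;
}

static void T(void)                                /* rule *-1 */
{
    F();
    while (!failed && currToken == '*') { match('*'); F(); }
}

static void E(void)                                /* rule *-1 */
{
    T();
    while (!failed && currToken == '+') { match('+'); T(); }
}

/* Returns 1 if text is a well-formed expression, 0 otherwise. */
int parse_expr(const char *text)
{
    src = text;
    failed = 0;
    nextToken();
    E();
    return !failed && currToken == '\0';
}
```

The `!failed` test in each loop is an addition for this sketch: because errors do not exit here, it stops the loops from carrying on after a mistake has been found.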

Compilers: topDown/ The Expressions Language Parser Is this grammar LL(1)? Stats => ( [ Stat ] \n )* Stat => let ID = Expr | Expr Expr => Term ( (+ | - ) Term )* Term => Fact ( (* | / ) Fact ) * Fact => '(' Expr ')' | Int | ID

Compilers: topDown/ FIRST and FOLLOW Sets
First(Stats) = {let, (, Int, Id, \n, ε} First(Stat) = {let, (, Int, Id} First(Expr) = {(, Int, Id} First(Term) = {(, Int, Id} First(Fact) = {(, Int, Id}
Follow(Stats) = {$} Follow(Stat) = {\n} Follow(Expr) = {\n, )} Follow(Term) = {+, -, \n, )} Follow(Fact) = {*, /, +, -, \n, )}
The ) entries come from Fact => '(' Expr ')', where ) follows Expr.

Compilers: topDown/ PREDICT Sets
Production                              Predict                  Disjoint
Stats => ( [ Stat ] \n )*               {let,(,Int,Id,\n,$}      Yes
Stat => let ID = Expr                   {let}                    Yes
Stat => Expr                            {(,Int,Id}
Expr => Term ( (+ | - ) Term )*         {(,Int,Id}               Yes
Term => Fact ( (* | / ) Fact ) *        {(,Int,Id}               Yes
Fact => '(' Expr ')'                    {(}                      Yes
Fact => Int                             {Int}
Fact => Id                              {Id}

Compilers: topDown/ exprParse0.c exprParse0.c is a recursive descent parser generated from the expressions grammar. It reads in an expressions program file. Its output is a print-out of parse function calls.

Compilers: topDown/5 60 An Expressions Program (test1.txt) let x = ( (x*y)/2) // comments // y let x = 5 let y = x /0 // comments

Compilers: topDown/5 61 Usage > gcc -Wall -o exprParse0 exprParse0.c >./exprParse0 < test1.txt 1: stats< 2: stat >'+' term >>> 3: stat >>> 4: stat >'+' term 5: 6: stat >>> 7: stat '/' fact >>> 8: 9: 10: >'eof'

Compilers: topDown/5 62 exprParse0.c Callgraph lexical parser (like exprTokens.c) generated from the grammar

Compilers: topDown/5 63 Standard Token Functions // globals (first used in exprToken.c) Token currToken; char tokString[MAX_IDLEN]; int tokStrLen = 0; int currTokValue; int lineNum = 1; // no. of lines read in void nextToken(void) { currToken = scanner(); } continued

Compilers: topDown/5 64 void match(Token expected) { if(currToken == expected){ printToken(); // produces the parser's output if(currToken != SCANEOF) currToken = scanner(); } else printf("Expected %s, found %s on line %d\n", tokSyms[expected], tokSyms[currToken],lineNum); } // end of match() continued

Compilers: topDown/5 65 void printToken(void) { if (currToken == ID) printf("%s(%s) ", tokSyms[currToken], tokString); // show token string else if (currToken == INT) printf("%s(%d) ", tokSyms[currToken], currTokValue); // show value else if (currToken == NEWLINE) printf("%s%2d: ", tokSyms[currToken], lineNum); // print newline token else printf("'%s' ", tokSyms[currToken]); // other tokens } // end of printToken()

Compilers: topDown/5 66 Syntax Error Reporting void syntax_error(Token tok) { printf("\nSyntax error at \'%s\' on line %d\n", tokSyms[tok], lineNum); exit(1); }

Compilers: topDown/5 67 main() int main(void) { printf("%2d: ", lineNum); nextToken(); statements(); match(SCANEOF); printf("\n\n"); return 0; } function for start symbol check that program is finished at eof

Compilers: topDown/5 68 Parsing Functions void statements(void) // Stats => ( [ Stat ] '\n' )* { printf("stats<"); while (currToken != SCANEOF) { if (currToken != NEWLINE) statement(); match(NEWLINE); } printf(">"); } // end of statements() rule *-3 rule []-2

Compilers: topDown/5 69 void statement(void) // Stat => ( 'let' ID '=' Expr ) | Expr { printf("stat<"); if (currToken == LET) { match(LET); match(ID); match(ASSIGNOP); expression(); } else if ((currToken == LPAREN) || (currToken == INT) || (currToken == ID)) expression(); else error(); printf(">"); } // end of statement() Complicated, but it can be optimized with some 'tricks'

Compilers: topDown/5 70 void expression(void) // Expr => Term ( ( '+' | '-' ) Term )* { printf("expr<"); term(); while((currToken == PLUSOP) || (currToken == MINUSOP)) { if (currToken == PLUSOP) match(PLUSOP); else if (currToken == MINUSOP) match(MINUSOP); else error(); term(); } printf(">"); } // end of expression() rule *-1 Version 1

Compilers: topDown/5 71 void expression(void) // Expr => Term ( ( '+' | '-' ) Term )* { printf("expr<"); term(); while((currToken == PLUSOP) || (currToken == MINUSOP)) { match(currToken); term(); } printf(">"); } // end of expression() Version 2: simplified | code Shorter, but also harder to understand!

Compilers: topDown/5 72 void term(void) // Term => Fact ( ('*' | '/' ) Fact )* { printf("term<"); factor(); while((currToken == MULTOP) || (currToken == DIVOP)) { if (currToken == MULTOP) match(MULTOP); else if (currToken == DIVOP) match(DIVOP); else error(); factor(); } printf(">"); } // end of term() rule *-1 Version 1

Compilers: topDown/5 73 void term(void) // Term => Fact ( ('*' | '/' ) Fact )* { printf("term<"); factor(); while((currToken == MULTOP) || (currToken == DIVOP)) { match(currToken); factor(); } printf(">"); } // end of term() Version 2: simplified | code Shorter, but also harder to understand!

Compilers: topDown/5 74 void factor(void) // Fact => '(' Expr ')' | INT | ID { printf("fact<"); if(currToken == LPAREN) { match(LPAREN); expression(); match(RPAREN); } else if(currToken == INT) match(INT); else if (currToken == ID) match(ID); else syntax_error(currToken); printf(">"); } // end of factor()

Compilers: topDown/ LL(1) Parse Tables The format of a parse table: – –T[non-term][term] – –the rows are the non-terminals A and the columns are the terminals b – –the entry T[A][b] holds the production A => α when b ∈ PREDICT(A => α)

Compilers: topDown/5 76 Other Data Structures Sequence of input tokens (ending with $). A parse stack to hold nonterminals and terminals that are being processed. $ E push pop

Compilers: topDown/5 77 The Parsing Algorithm
push($); push(start_symbol);
currToken = scanner();
do
  X = pop(stack);
  if (X is a terminal or $) {          // like match()
    if (X == currToken) currToken = scanner();
    else error();
  }
  else                                 // X is a non-terminal
    if (T[X][currToken] == X => Y1 Y2 ... Ym)
      push(Ym); ... push(Y1);
    else error();
while (X != $);

Compilers: topDown/ Table Parsing Example Use the LL(1) grammar: E => T E1 E1 => '+' T E1 | ε T => F T1 T1 => '*' F T1 | ε F => id | '(' E ')'

Compilers: topDown/5 79 Parse Table Generation
Production          Predict
1: E => T E1        {(,id}
2: E1 => + T E1     {+}
3: E1 => ε          {$,)}
4: T => F T1        {(,id}
5: T1 => * F T1     {*}
6: T1 => ε          {+,$,)}
7: F => id          {id}
8: F => ( E )       {(}

NT/T    +    *    (    )    ID   $
E                 1         1
E1      2              3         3
T                 4         4
T1      6    5         6         6
F                 8         7

Compilers: topDown/5 80 Parsing "a + b * c $"
Stack           Input       Action
$E              a+b*c$      E => T E1
$E1 T           "           T => F T1
$E1 T1 F        "           F => id
$E1 T1 id       "           match
$E1 T1          +b*c$       T1 => ε
$E1             "           E1 => + T E1
$E1 T +         "           match
$E1 T           b*c$        T => F T1
$E1 T1 F        "           F => id
$E1 T1 id       "           match
$E1 T1          *c$         T1 => * F T1
$E1 T1 F *      "           match
$E1 T1 F        c$          F => id
$E1 T1 id       "           match
$E1 T1          $           T1 => ε
$E1             "           E1 => ε
$               "           Success!
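
The table, the stack, and the parsing loop fit together as in the sketch below. It is an illustration rather than the course's own code: the enum names, the `body` and `table` arrays, the stand-in `scan()` function (identifiers are runs of letters), and `ll1_parse()` are all assumptions made for this example. The production numbers match slide 79, with 0 used for an empty (error) table entry.

```c
#include <assert.h>
#include <ctype.h>

enum { PLUS, STAR, LPAR, RPAR, ID, END, NTERM };   /* terminals (columns); END is $ */
enum { E = 100, E1, T, T1, F };                    /* non-terminals (rows) */

/* Production bodies, numbered as on slide 79; -1 ends each body. */
static const int body[9][4] = {
    { -1 },                    /* 0: unused            */
    { T, E1, -1 },             /* 1: E  => T E1        */
    { PLUS, T, E1, -1 },       /* 2: E1 => + T E1      */
    { -1 },                    /* 3: E1 => ε           */
    { F, T1, -1 },             /* 4: T  => F T1        */
    { STAR, F, T1, -1 },       /* 5: T1 => * F T1      */
    { -1 },                    /* 6: T1 => ε           */
    { ID, -1 },                /* 7: F  => id          */
    { LPAR, E, RPAR, -1 },     /* 8: F  => ( E )       */
};

/* table[non-terminal - E][terminal]; 0 means "error". */
static const int table[5][NTERM] = {
    /*          +  *  (  )  id $ */
    /* E  */  { 0, 0, 1, 0, 1, 0 },
    /* E1 */  { 2, 0, 0, 3, 0, 3 },
    /* T  */  { 0, 0, 4, 0, 4, 0 },
    /* T1 */  { 6, 5, 0, 6, 0, 6 },
    /* F  */  { 0, 0, 8, 0, 7, 0 },
};

/* Stand-in scanner: returns the next token, or -1 for an unknown character. */
static int scan(const char **p)
{
    while (**p == ' ') (*p)++;
    char c = **p;
    if (c == '\0') return END;
    if (isalpha((unsigned char)c)) {
        while (isalpha((unsigned char)**p)) (*p)++;
        return ID;
    }
    (*p)++;
    switch (c) {
    case '+': return PLUS;
    case '*': return STAR;
    case '(': return LPAR;
    case ')': return RPAR;
    default:  return -1;
    }
}

/* Returns 1 if text is accepted, 0 on a syntax or lexical error. */
int ll1_parse(const char *text)
{
    int stack[256], sp = 0;
    stack[sp++] = END;                  /* push($) */
    stack[sp++] = E;                    /* push(start symbol) */
    int tok = scan(&text);

    while (sp > 0) {
        int X = stack[--sp];
        if (tok < 0) return 0;
        if (X < E) {                    /* terminal or $: like match() */
            if (X != tok) return 0;
            if (X == END) return 1;     /* $ matched: success */
            tok = scan(&text);
        } else {                        /* non-terminal: consult the table */
            int prod = table[X - E][tok];
            if (prod == 0) return 0;
            int n = 0;
            while (body[prod][n] != -1) n++;
            assert(sp + n <= 256);      /* fixed-size stack in this sketch */
            while (n > 0) stack[sp++] = body[prod][--n];   /* push Ym ... Y1 */
        }
    }
    return 0;
}
```

Unlike the recursive descent version, nothing here depends on the grammar except the two arrays, which is the sense in which the table is 'plugged' into an existing parser.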

Compilers: topDown/ Making a Grammar LL(1) Not all context free grammars are LL(1). We can tell if a grammar is not LL(1) by looking at its PREDICT sets – –for a LL(1) grammar, the PREDICT sets for a non-terminal will be disjoint

Compilers: topDown/5 82 Example
Production        Predict
E => E + T        = FIRST(E) = {(,id}
E => T            = FIRST(T) = {(,id}
T => T * F        = FIRST(T) = {(,id}
T => F            = FIRST(F) = {(,id}
F => id           = {id}
F => ( E )        = {(}
FIRST(F) = {(,id} FIRST(T) = {(,id} FIRST(E) = {(,id} FOLLOW(E) = {$,),+} FOLLOW(T) = {+,$,),*} FOLLOW(F) = {+,$,),*}
E and T are problems since their PREDICT sets are not disjoint.

Compilers: topDown/5 83 Example of Disjoint Problem Input "5 + b" There are two productions to choose from: E => E + T E => T Which should be chosen by looking only at the current token "5"?

Compilers: topDown/ From non-LL(1) to LL(1) There are two main techniques for converting a non-LL(1) grammar to LL(1). – –but they don't work for every grammar 1. Left Factoring – –e.g. used on A => B a C D | B a C E 2. Transforming left recursion to right recursion – –e.g. used on E => E + T | T

Compilers: topDown/ Left Factoring S => a B | a C – –to see the problem try choosing a production to parse "a" in "andrew" Change S to: S => a S1 S1 => B | C – –now there is no difficult choice

Compilers: topDown/5 86 In general: A => α β1 | α β2 | … | α βn becomes A => α A1 A1 => β1 | β2 | … | βn

Compilers: topDown/ Why is Left Recursion a Problem? Grammar: A => A b A => b The input is "bbbb". Using only the current token, "b", which production should be used?

Compilers: topDown/5 88 Remove Left Recursion A => A α1 | A α2 | … | β1 | β2 | … becomes A => β1 A1 | β2 A1 | … A1 => α1 A1 | α2 A1 | … | ε The left recursion is changed to right recursion in the new A1 rule.

Compilers: topDown/5 89 Example Translation The left recursive grammar: A => A b | b becomes A => b A1 A1 => b A1 | ε Try parsing the input string "bbbb" using only the current token "b".
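
With the right-recursive form, the current token is enough to choose a production at every step. The sketch below shows this for A => b A1 and A1 => b A1 | ε; the names `parse_bs`, `p`, and `failed` are made up for the illustration, and tokens are single characters.

```c
/* Parser for  A => b A1    A1 => b A1 | ε
   Sketch only: the input is a string, '\0' marks end of input. */
static const char *p;
static int failed;

static void A1(void)
{
    if (*p == 'b') { p++; A1(); }   /* current token b selects A1 => b A1 */
    /* any other token selects A1 => ε */
}

static void A(void)
{
    if (*p == 'b') { p++; A1(); }   /* A => b A1 */
    else failed = 1;
}

/* Returns 1 if text is one or more b's, 0 otherwise. */
int parse_bs(const char *text)
{
    p = text;
    failed = 0;
    A();
    return !failed && *p == '\0';
}
```

The original rule A => A b would translate into a function A() that calls itself before consuming any input, so it would never terminate.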

Compilers: topDown/5 90 Fixing the E Grammar The following E grammar is not LL(1): E => E + T | T T => T * F | F F => id | ( E ) Try parsing "5 + b" continued

Compilers: topDown/5 91 Eliminate left recursion in E and T: E => T E1 E1 => + T E1 | ε T => F T1 T1 => * F T1 | ε F => id | ( E ) This version of the E grammar is LL(1), and we've been using it for most of our examples.

Compilers: topDown/ Non-Immediate Left Recursion Ex: A1 => A2 a | b A2 => A1 c | A2 d Convert to immediate left recursion – –replace A1 in A2 productions by A1's definition: A1 => A2 a | b A2 => A2 a c | b c | A2 d Now eliminate left recursion in A2: A1 => A2 a | b A2 => b c A3 A3 => a c A3 | d A3 | ε [Figure: A1 and A2]

Compilers: topDown/5 93 Example A => B c | d B => C f | B f C => A e | g Replace C in B's production by C's defn: B => A e f | g f | B f Replace A in B's production by A's defn: B => B c e f | d e f | g f | B f A C B

Compilers: topDown/5 94 Now grammar is: A => B c | d B => B c e f | d e f | g f | B f C => A e | g Get rid of left recursion in B: A => B c | d B => d e f B1 | g f B1 B1 => c e f B1 | f B1 | ε C => A e | g If A is the start symbol, then the C production is never called, so can be deleted.

Compilers: topDown/ Error Recovery in LL Parsing Simple answer: – –when there's an error, print a message and exit Better error recovery: – –1. insert the expected token and continue (this approach can cause non-termination) – –2. keep deleting tokens until the parser gets a token in the FOLLOW set for the production that went wrong (see example on next slide)

Compilers: topDown/5 96 void E() { if (currToken in FIRST(T)) { // error checking T(); E1(); // FIRST(T) == {(,ID} } else { // error reporting and recovery printf("Expecting one of FIRST(T)"); while (currToken not in FOLLOW(E)) // FOLLOW(E) == {),$} currToken = scanner(); // skip input } } // end of E() Example: E→T E1 from slide 29

Compilers: topDown/5 97 void E() { if ((currToken == LPAREN) || (currToken == ID)) { T(); E1(); } else { printf("Expecting ( or id"); while ( (currToken != RPAREN) && (currToken != SCANEOF)) currToken = scanner(); } } // end of E() C Code
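
The token-skipping loop in E() can be pulled out into a helper that works for any FOLLOW set. The sketch below assumes single-character tokens held in a string, with the synchronizing set passed as a string of characters; the name `skip_to_sync` and its parameters are invented for this illustration.

```c
#include <string.h>

/* Skip input from position pos until a token in the sync set is found,
   or the end of input is reached. Returns the new position.
   Sketch only: tokens are single characters, sync lists the FOLLOW set. */
size_t skip_to_sync(const char *input, size_t pos, const char *sync)
{
    while (input[pos] != '\0' && strchr(sync, input[pos]) == NULL)
        pos++;                      /* delete (skip) one token */
    return pos;
}
```

For E, the sync set would be ")" since FOLLOW(E) == {),$} and the end of the string already stands for $. The parser can then carry on from the returned position instead of exiting at the first error.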