CMPUT 680 - Compiler Design and Optimization. CMPUT 680 - Winter 2006. Topic 2: Parsing and Lexical Analysis. José Nelson Amaral

CMPUT 680 - Compiler Design and Optimization 1  CMPUT 680 - Winter 2006. Topic 2: Parsing and Lexical Analysis. José Nelson Amaral

CMPUT 680 - Compiler Design and Optimization 2  Reading List. Appel, Chapters 2, 3, 4, and 5. Aho, Sethi, Ullman, Chapters 2, 3, 4, and 5.

CMPUT 680 - Compiler Design and Optimization 3  Some Important Basic Definitions. lexical: of or relating to the morphemes of a language. morpheme: a meaningful linguistic unit that cannot be divided into smaller meaningful parts. lexical analysis: the task concerned with breaking an input into its smallest meaningful units, called tokens.

CMPUT 680 - Compiler Design and Optimization 4  Some Important Basic Definitions. syntax: the way in which words are put together to form phrases, clauses, or sentences; the rules governing the formation of statements in a programming language. syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax. parsing: to break a sentence down into its component parts of speech, with an explanation of the form, function, and syntactical relationship of each part.

CMPUT 680 - Compiler Design and Optimization 5  Some Important Basic Definitions. parsing = lexical analysis + syntax analysis. semantic analysis: the task concerned with calculating the program's meaning.

CMPUT 680 - Compiler Design and Optimization 6  Regular Expressions. Symbol a: a regular expression formed by the single symbol a. Alternation M | N: a regular expression formed by M or N. Concatenation M N: a regular expression formed by M followed by N. Epsilon ε: the empty string. Repetition M*: a regular expression formed by zero or more repetitions of M.
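
The slides contain no runnable code here, but as a side illustration the same three operators can be exercised with the POSIX regex library. The pattern "^(a|b)*abb$" below combines repetition, alternation, and concatenation; the program and its test strings are assumptions made only for this sketch.

/* Illustration (not from the slides): matching (a|b)*abb with POSIX regex. */
#include <regex.h>
#include <stdio.h>

int main(void)
{
    regex_t re;
    const char *tests[] = { "abb", "aabb", "babb", "ab", "abba" };

    /* REG_EXTENDED selects the extended (egrep-style) syntax. */
    if (regcomp(&re, "^(a|b)*abb$", REG_EXTENDED | REG_NOSUB) != 0)
        return 1;

    for (int i = 0; i < 5; i++)
        printf("%-5s %s\n", tests[i],
               regexec(&re, tests[i], 0, NULL, 0) == 0 ? "accepted" : "rejected");

    regfree(&re);
    return 0;
}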

CMPUT 680 - Compiler Design and Optimization 7  Building a Recognizer for a Language. General approach: 1. Build a deterministic finite automaton (DFA) from the regular expression E. 2. Execute the DFA to determine whether an input string belongs to L(E). Note: the DFA construction is done automatically by a tool such as lex.
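
As a minimal sketch of step 2 (not taken from the slides), the C program below executes a hand-coded DFA over an input string. The transition table is the standard four-state DFA for (a|b)*abb, the language accepted by the NFA on the following slide; in practice the table would be generated by a tool such as lex rather than written by hand.

/* Sketch: executing a table-driven DFA for (a|b)*abb. */
#include <stdio.h>

enum { NSTATES = 4 };

/* next[state][c]: column 0 is input 'a', column 1 is input 'b'. */
static const int next[NSTATES][2] = {
    /* state 0 */ { 1, 0 },
    /* state 1 */ { 1, 2 },
    /* state 2 */ { 1, 3 },
    /* state 3 */ { 1, 0 },
};

static int accepts(const char *s)
{
    int state = 0;
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b')
            return 0;                     /* symbol outside the alphabet */
        state = next[state][*s == 'b'];
    }
    return state == 3;                    /* state 3 is the accepting state */
}

int main(void)
{
    printf("%d %d %d\n", accepts("abb"), accepts("aababb"), accepts("ba"));
    return 0;
}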

CMPUT 680 - Compiler Design and Optimization 8  Finite Automata. A nondeterministic finite automaton A = (S, Σ, s0, F, move) consists of: 1. A set of states S. 2. A set of input symbols Σ (the input symbol alphabet). 3. A state s0 distinguished as the start state. 4. A set of states F distinguished as the accepting states. 5. A transition function move that maps state-symbol pairs into sets of states. In a Deterministic Finite Automaton (DFA), the function move maps each state-symbol pair into a unique state.

CMPUT 680 - Compiler Design and Optimization 9  Finite Automata. [Diagrams of two automata with states 0-3: a Deterministic Finite Automaton (DFA) and a Nondeterministic Finite Automaton (NFA).] What languages are accepted by these automata? The DFA accepts b*abb and the NFA accepts (a|b)*abb. (Aho, Sethi, Ullman, p. 114)

CMPUT 680 - Compiler Design and Optimization 10  Another NFA. [Diagram: from the start state, ε-transitions lead into two branches, one looping on a and one looping on b.] An ε-transition is taken without consuming any character from the input. What does the NFA above accept? aa* | bb*. (Aho, Sethi, Ullman, p. 116)

CMPUT 680 - Compiler Design and Optimization 11  Constructing NFA. How do we define an NFA that accepts a regular expression? It is very simple. Remember that a regular expression is formed by the use of alternation, concatenation, and repetition. Thus all we need to do is know how to build the NFA for a single symbol, and how to compose NFAs.

CMPUT 680 - Compiler Design and Optimization 12  Composing NFAs with Alternation. The NFA for a symbol a is: [start state i, an edge labeled a, accepting state f]. Given two NFAs N(s) and N(t), the NFA N(s|t) is: [a new start state i with ε-transitions into N(s) and N(t), and ε-transitions from the accepting states of N(s) and N(t) into a new accepting state f]. (Aho, Sethi, Ullman, p. 122)

CMPUT 680 - Compiler Design and Optimization 13  Composing NFAs with Concatenation. Given two NFAs N(s) and N(t), the NFA N(st) is: [N(s) followed by N(t), with the accepting state of N(s) merged into the start state of N(t); i is the start state and f the accepting state]. (Aho, Sethi, Ullman, p. 123)

CMPUT 680 - Compiler Design and Optimization 14  Composing NFAs with Repetition. The NFA for N(s*) is: [a new start state i and a new accepting state f; ε-transitions run from i to f, from i into N(s), from the accepting state of N(s) back to its start state, and from the accepting state of N(s) to f]. (Aho, Sethi, Ullman, p. 123)

CMPUT 680 - Compiler Design and Optimization 15  Properties of the NFA. Following these construction rules, we obtain an NFA N(r) with these properties: N(r) has at most twice as many states as the number of symbols and operators in r; N(r) has exactly one start state and one accepting state; each state of N(r) has at most one outgoing transition on a symbol of the alphabet Σ, or at most two outgoing ε-transitions. (Aho, Sethi, Ullman, p. 124)

CMPUT 680 - Compiler Design and Optimization 16  How to Parse a Regular Expression? Given a regular expression, how do we generate an automaton to recognize tokens? Using the three simple rules presented, it is easy to generate an NFA to recognize a regular expression. Create an NFA and convert it to a DFA. Given a DFA, we can generate an automaton that recognizes the longest substring of the input that is a valid token.

CMPUT 680 - Compiler Design and Optimization 17  Regular expression notation: an example (Appel, p. 20). a: an ordinary character stands for itself. ε: the empty string. "": another way to write the empty string. M | N: alternation, choosing from M or N. M N: concatenation, an M followed by an N. M*: repetition (zero or more times). M+: repetition (one or more times). M?: optional, zero or one occurrence of M. [a-zA-Z]: character set alternation. . (dot): stands for any single character except newline. "a.+*": quotation, a string in quotes stands for itself literally.

CMPUT 680 - Compiler Design and Optimization 18  Regular expressions for some tokens (Appel, p. 20):
if                                   {return IF;}
[a-z][a-z0-9]*                       {return ID;}
[0-9]+                               {return NUM;}
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)  {return REAL;}
("--"[a-z]*"\n")|(" "|"\n"|"\t")+    {/* do nothing */}
.                                    {error();}

CMPUT 680 - Compiler Design and Optimization 19  Building Finite Automata for Lexical Tokens (Appel, p. 21). For if {return IF;}: the NFA for the symbol i is [start state 1, edge i, state 2]; the NFA for the symbol f is [start state 1, edge f, state 2]; the NFA for the regular expression if is [start state 1, edge i to state 2, edge f to accepting state 3, which reports IF].

CMPUT 680 - Compiler Design and Optimization 20  Building Finite Automata for Lexical Tokens (Appel, p. 21). For [a-z][a-z0-9]* {return ID;}: [start state 1, edge a-z to accepting state 2, which reports ID and loops on a-z and 0-9].

CMPUT 680 - Compiler Design and Optimization 21  Building Finite Automata for Lexical Tokens (Appel, p. 21). For [0-9]+ {return NUM;}: [from the start state, an edge on 0-9 leads to an accepting state that reports NUM and loops on 0-9].

CMPUT 680 - Compiler Design and Optimization 22  Building Finite Automata for Lexical Tokens (Appel, p. 21). For ([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+) {return REAL;}: [automaton whose accepting states report REAL, reached after a digit string that contains a decimal point].

CMPUT 680 - Compiler Design and Optimization 23  Building Finite Automata for Lexical Tokens (Appel, p. 21). For ("--"[a-z]*"\n")|(" "|"\n"|"\t")+ {/* do nothing */}: [automaton with two branches: one matches "--" followed by letters up to a newline (a comment), the other matches one or more blanks, newlines, or tabs (white space)].

CMPUT 680 - Compiler Design and Optimization 24  Building Finite Automata for Lexical Tokens (Appel, p. 21). [Combined picture of the separate automata for IF, ID, NUM, REAL, white space/comments, and error.]

CMPUT 680 - Compiler Design and Optimization 25  Conversion of NFA into DFA (Appel, p. 27). [Combined NFA for the tokens IF, ID, NUM, and error, with states numbered 1 to 15; from the start state 1, ε-transitions lead into the branch for each token.] What states can be reached from state 1 without consuming a character?

CMPUT 680 - Compiler Design and Optimization 26  Conversion of NFA into DFA (Appel, p. 27). What states can be reached from state 1 without consuming a character? {1, 4, 9, 14} form the ε-closure of state 1.

CMPUT 680 - Compiler Design and Optimization 27  Conversion of NFA into DFA (Appel, p. 27). What are all the state closures in this NFA? closure(1) = {1,4,9,14}, closure(5) = {5,6,8}, closure(7) = {7,8}, closure(8) = {6,8}, closure(10) = {10,11,13}, closure(12) = {12,13}, closure(13) = {11,13}.

CMPUT 680 - Compiler Design and Optimization 28  Conversion of NFA into DFA. Given a set of NFA states T, the ε-closure(T) is the set of states that are reachable through ε-transitions from any state s ∈ T. Given a set of NFA states T, move(T, a) is the set of states that are reachable on input a from any state s ∈ T. (Aho, Sethi, Ullman, p. 118)
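
A small sketch of how ε-closure(T) can be computed by iterating until no new state is added, representing a set of NFA states as a bit mask. Only the ε-edges out of state 1 of the combined NFA on these slides (1 to 4, 9, and 14, read off from closure(1) = {1,4,9,14}) are filled in; a real implementation would build the full ε-edge table from the regular expressions, so treat the example data as an assumption.

/* Sketch of the eps-closure computation used by the subset construction. */
#include <stdio.h>

#define MAXSTATE 16

/* eps[s] is the bit mask of states reachable from s by one eps-transition. */
static unsigned eps[MAXSTATE + 1];

static unsigned eps_closure(unsigned T)
{
    unsigned closure = T, old;
    do {                                   /* iterate until no new state is added */
        old = closure;
        for (int s = 1; s <= MAXSTATE; s++)
            if (closure & (1u << s))
                closure |= eps[s];
    } while (closure != old);
    return closure;
}

int main(void)
{
    eps[1] = (1u << 4) | (1u << 9) | (1u << 14);   /* 1 -eps-> 4, 9, 14 */

    unsigned c = eps_closure(1u << 1);             /* eps-closure({1}) */
    printf("eps-closure({1}) =");
    for (int s = 1; s <= MAXSTATE; s++)
        if (c & (1u << s))
            printf(" %d", s);
    printf("\n");                                   /* prints 1 4 9 14 */
    return 0;
}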

CMPUT 680 - Compiler Design and Optimization 29  Problem Statement for Conversion of NFA into DFA. Given an NFA, find the DFA with the minimum number of states that has the same behavior as the NFA for all inputs. If the initial state of the NFA is s0, then the set of states of the DFA, Dstates, is initialized with a state representing ε-closure(s0). (Aho, Sethi, Ullman, p. 118)

CMPUT 680 - Compiler Design and Optimization 30  Conversion of NFA into DFA (Appel, p. 27). Dstates = { {1,4,9,14} }. Now we need to compute: move({1,4,9,14}, a-h) = ?

CMPUT 680 - Compiler Design and Optimization 31  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, a-h) = {5,15}. ε-closure({5,15}) = ?

CMPUT 680 - Compiler Design and Optimization 32  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, a-h) = {5,15}. ε-closure({5,15}) = {5,6,8,15}, which is added to Dstates as a new DFA state reached from {1,4,9,14} on a-h.

CMPUT 680 - Compiler Design and Optimization 33  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, i) = ?

CMPUT 680 - Compiler Design and Optimization 34  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, i) = {2,5,15}. ε-closure({2,5,15}) = ?

CMPUT 680 - Compiler Design and Optimization 35  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, i) = {2,5,15}. ε-closure({2,5,15}) = {2,5,6,8,15}, a new DFA state reached on i.

CMPUT 680 - Compiler Design and Optimization 36  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, j-z) = ?

CMPUT 680 - Compiler Design and Optimization 37  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, j-z) = {5,15}. ε-closure({5,15}) = ?

CMPUT 680 - Compiler Design and Optimization 38  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, j-z) = {5,15}. ε-closure({5,15}) = {5,6,8,15}, the same DFA state that was reached on a-h.

CMPUT 680 - Compiler Design and Optimization 39  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, 0-9) = {10,15}. ε-closure({10,15}) = {10,11,13,15}, a new DFA state reached on 0-9.

CMPUT 680 - Compiler Design and Optimization 40  Conversion of NFA into DFA (Appel, p. 27). move({1,4,9,14}, other) = {15}. ε-closure({15}) = {15}, a new DFA state reached on any other character.

CMPUT 680 - Compiler Design and Optimization 41  Conversion of NFA into DFA (Appel, p. 27). The analysis for {1,4,9,14} is complete. We mark it and pick another unmarked state of the DFA to analyze.

CMPUT 680 - Compiler Design and Optimization 42  The corresponding DFA. [From the start state: i leads to a state that accepts ID, from which f leads to a state that accepts IF, while other letters or digits lead to the ID state; a-h and j-z lead directly to the ID state, which loops on a-z and 0-9; 0-9 leads to the NUM state; any other character leads to the error state.] (Appel, p. 29) See p. 118 of Aho, Sethi, Ullman and p. 29 of Appel.
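
A hand-coded sketch of a scanner that behaves like this DFA, applying the longest-match rule: keep consuming characters while a transition exists and return the token of the last accepting state seen. The token names IF, ID, NUM, and error follow the slides; the control flow below is an approximation of the lost figure rather than a transcription of it.

/* Sketch: maximal-munch scanning with the DFA above. */
#include <ctype.h>
#include <stdio.h>

enum token { TOK_EOF, TOK_IF, TOK_ID, TOK_NUM, TOK_ERROR };

static const char *input;          /* current position in the source text */

static enum token next_token(void)
{
    while (*input == ' ' || *input == '\n' || *input == '\t')
        input++;                   /* skip white space */
    if (*input == '\0')
        return TOK_EOF;

    if (isdigit((unsigned char)*input)) {          /* NUM state */
        while (isdigit((unsigned char)*input))
            input++;
        return TOK_NUM;
    }
    if (islower((unsigned char)*input)) {          /* ID (and possibly IF) */
        const char *start = input;
        while (islower((unsigned char)*input) || isdigit((unsigned char)*input))
            input++;
        if (input - start == 2 && start[0] == 'i' && start[1] == 'f')
            return TOK_IF;         /* the IF accepting state wins for "if" */
        return TOK_ID;
    }
    input++;                        /* any other character: error state */
    return TOK_ERROR;
}

int main(void)
{
    input = "if x1 42 ?";
    for (enum token t; (t = next_token()) != TOK_EOF; )
        printf("%d ", (int)t);     /* prints 1 2 3 4 (IF ID NUM ERROR) */
    printf("\n");
    return 0;
}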

CMPUT 680 - Compiler Design and Optimization 43  Lexical Analyzer and Parser (Aho, Sethi, Ullman, p. 160). [Diagram: the source program feeds the lexical analyzer (get next char / next char); the syntax analyzer requests tokens from the lexical analyzer (get next token / next token); both consult the symbol table, which contains a record for each identifier.] token: the smallest meaningful sequence of characters of interest in the source program.

CMPUT 680 - Compiler Design and Optimization 44  Definition of Context-Free Grammars. A context-free grammar G = (T, N, S, P) consists of: 1. T, a set of terminals (scanner tokens). 2. N, a set of nonterminals (syntactic variables generated by productions). 3. S, a designated start nonterminal. 4. P, a set of productions. Each production has the form A ::= α, where A is a nonterminal and α is a sentential form, i.e., a string of zero or more grammar symbols (terminals/nonterminals).

CMPUT 680 - Compiler Design and Optimization 45  Syntax Analysis. Problem statement: to find a derivation sequence in a grammar G for the input token stream (or say that none exists).

CMPUT 680 - Compiler Design and Optimization 46  Parse Trees. A parse tree is a graphical representation of a derivation sequence of a sentential form. Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

CMPUT 680 - Compiler Design and Optimization 47  Derivation. Given the following grammar: E → E + E | E * E | ( E ) | - E | id. Is the string -(id + id) a sentence in this grammar? Yes, because there is the following derivation: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id), where ⇒ reads "derives in one step". (Aho, Sethi, Ullman, p. 168)

CMPUT 680 - Compiler Design and Optimization 48  Derivation. E → E + E | E * E | ( E ) | - E | id. Let's examine this derivation: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id). [Parse trees for each step: the root E first grows the children - and E, then - ( E ), then the inner E grows E + E, and finally each E becomes id.] This is a top-down derivation because we start building the parse tree at the top. (Aho, Sethi, Ullman, p. 170)

CMPUT 680 - Compiler Design and Optimization 49  Another Derivation Example. E → E + E | E * E | ( E ) | - E | id. Find a derivation for the expression: id + id * id. [Two different parse trees can be built: one with + at the root and * in a subtree, the other with * at the root and + in a subtree.] Which derivation tree is correct? (Aho, Sethi, Ullman, p. 171)

CMPUT 680 - Compiler Design and Optimization 50  Another Derivation Example. E → E + E | E * E | ( E ) | - E | id. Find a derivation for the expression: id + id * id. According to the grammar, both parse trees are correct. A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar. (Aho, Sethi, Ullman, p. 171)

CMPUT 680 - Compiler Design and Optimization 51  Left Recursion. Consider the grammar: E → E + T | T; T → T * F | F; F → ( E ) | id. A top-down parser might loop forever when parsing an expression using this grammar: E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒ ... (Aho, Sethi, Ullman, p. 176)

CMPUT 680 - Compiler Design and Optimization 52  Left Recursion. Consider the grammar: E → E + T | T; T → T * F | F; F → ( E ) | id. A grammar that has at least one production of the form A → Aα is a left-recursive grammar. Top-down parsers do not work with left-recursive grammars. Left recursion can often be eliminated by rewriting the grammar. (Aho, Sethi, Ullman, p. 176)

CMPUT 680 - Compiler Design and Optimization 53  Left Recursion. This left-recursive grammar: E → E + T | T; T → T * F | F; F → ( E ) | id can be rewritten to eliminate the immediate left recursion: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id. (Aho, Sethi, Ullman, p. 176)
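
A recursive-descent parser falls directly out of the rewritten grammar, with one function per nonterminal; the ε-alternatives of E' and T' simply return without consuming input. The sketch below is an illustration rather than code from the slides; for brevity the "lexer" walks a character string, and the single character 'i' stands for the token id (an encoding assumed only for this example).

/* Sketch: recursive-descent parser for the non-left-recursive grammar. */
#include <stdio.h>
#include <stdlib.h>

static const char *tok;                 /* current input position */

static void error(void) { printf("reject\n"); exit(1); }
static void match(char c) { if (*tok == c) tok++; else error(); }

static void E(void);
static void Eprime(void);
static void T(void);
static void Tprime(void);
static void F(void);

static void E(void)      { T(); Eprime(); }                   /* E  -> T E'          */
static void Eprime(void) { if (*tok == '+') { match('+'); T(); Eprime(); } }
                                                               /* E' -> + T E' | eps  */
static void T(void)      { F(); Tprime(); }                    /* T  -> F T'          */
static void Tprime(void) { if (*tok == '*') { match('*'); F(); Tprime(); } }
                                                               /* T' -> * F T' | eps  */
static void F(void)                                            /* F  -> ( E ) | id    */
{
    if (*tok == '(') { match('('); E(); match(')'); }
    else if (*tok == 'i') match('i');
    else error();
}

int main(void)
{
    tok = "i+i*(i+i)";                  /* id + id * ( id + id ) */
    E();
    if (*tok == '\0') printf("accept\n"); else printf("reject\n");
    return 0;
}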

CMPUT 680 - Compiler Design and Optimization 54  Predictive Parsing. Consider the grammar: stmt → if expr then stmt else stmt | while expr do stmt | begin stmt_list end. A parser for this grammar can be written with the following simple structure: switch (gettoken()) { case IF: /* parse if expr then stmt else stmt */ break; case WHILE: /* parse while expr do stmt */ break; case BEGIN: /* parse begin stmt_list end */ break; default: reject the input; } Based only on the first token, the parser knows which rule to use to derive a statement. Therefore this is called a predictive parser. (Aho, Sethi, Ullman, p. 183)

CMPUT 680 - Compiler Design and Optimization 55  Left Factoring. The following grammar: stmt → if expr then stmt else stmt | if expr then stmt cannot be parsed by a predictive parser that looks one element ahead. But the grammar can be rewritten: stmt → if expr then stmt stmt'; stmt' → else stmt | ε, where ε is the empty string. Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring. (Aho, Sethi, Ullman, p. 178)

CMPUT 680 - Compiler Design and Optimization 56  A Predictive Parser. Grammar: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id. Parsing table: [the LL(1) table M[nonterminal, token] for this grammar; its construction is shown later]. (Aho, Sethi, Ullman, p. 188)

CMPUT 680 - Compiler Design and Optimization 57  A Predictive Parser. INPUT: id + id * id $. STACK (top on the left): E $. The predictive parsing program looks up M[E, id] = E → T E', outputs that production, and replaces E on the stack; the stack becomes T E' $.

CMPUT 680 - Compiler Design and Optimization 58  A Predictive Parser. INPUT: id + id * id $. STACK: T E' $. M[T, id] = T → F T' is output and the stack becomes F T' E' $. (Aho, Sethi, Ullman, p. 186)

CMPUT 680 - Compiler Design and Optimization 59  A Predictive Parser. INPUT: id + id * id $. STACK: F T' E' $. M[F, id] = F → id is output and the stack becomes id T' E' $. (Aho, Sethi, Ullman, p. 188)

CMPUT 680 - Compiler Design and Optimization 60  A Predictive Parser. Action when Top(Stack) = input ≠ $: pop the stack and advance the input. The stack becomes T' E' $ and the remaining input is + id * id $. (Aho, Sethi, Ullman, p. 188)

CMPUT 680 - Compiler Design and Optimization 61  A Predictive Parser. INPUT: + id * id $. STACK: T' E' $. M[T', +] = T' → ε is output and T' is popped, leaving E' $. (Aho, Sethi, Ullman, p. 188)

CMPUT 680 - Compiler Design and Optimization 62  A Predictive Parser. The predictive parser proceeds in this fashion, emitting the following productions: E' → + T E', T → F T', F → id, T' → * F T', F → id, T' → ε, E' → ε. When Top(Stack) = input = $, the parser halts and accepts the input string. (Aho, Sethi, Ullman, p. 188)

CMPUT 680 - Compiler Design and Optimization 63  LL(k) Parser. This parser scans the input from Left to right and constructs a Leftmost derivation. It looks 1 symbol ahead to choose its next action. Therefore, it is known as an LL(1) parser. An LL(k) parser looks k symbols ahead to decide its action.
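
The stack-driven algorithm traced above can be sketched in a few lines of C. The parsing table for the expression grammar is encoded directly in the table() function (its entries agree with the construction rules shown on the following slides); the single-character token encoding, with 'i' for id and '$' as the end marker, is an assumption made only for this example.

/* Sketch: table-driven LL(1) parsing for the expression grammar. */
#include <stdio.h>
#include <string.h>

/* Symbols: E e(=E') T t(=T') F are nonterminals; i + * ( ) $ are terminals.
 * A production is returned as its right-hand side; "" is the eps production
 * and NULL means a table error. */
static const char *table(char nt, char a)
{
    switch (nt) {
    case 'E': return (a == 'i' || a == '(') ? "Te" : NULL;          /* E  -> T E'   */
    case 'e': if (a == '+') return "+Te";                           /* E' -> + T E' */
              return (a == ')' || a == '$') ? "" : NULL;            /* E' -> eps    */
    case 'T': return (a == 'i' || a == '(') ? "Ft" : NULL;          /* T  -> F T'   */
    case 't': if (a == '*') return "*Ft";                           /* T' -> * F T' */
              return (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': if (a == 'i') return "i";                             /* F -> id      */
              return (a == '(') ? "(E)" : NULL;                     /* F -> ( E )   */
    }
    return NULL;
}

static int is_nonterminal(char c) { return strchr("EeTtF", c) != NULL; }

static int parse(const char *input)
{
    char stack[128] = "$E";            /* $ at the bottom, start symbol E on top */
    int top = 1;                       /* index of the top of the stack          */

    while (top >= 0) {
        char X = stack[top], a = *input;
        if (X == '$' && a == '$') return 1;              /* accept            */
        if (!is_nonterminal(X)) {                        /* terminal on top   */
            if (X != a) return 0;
            top--; input++;                              /* match and advance */
        } else {
            const char *rhs = table(X, a);
            if (!rhs) return 0;                          /* table error       */
            top--;                                       /* pop X             */
            for (int i = (int)strlen(rhs) - 1; i >= 0; i--)
                stack[++top] = rhs[i];                   /* push rhs reversed */
        }
    }
    return 0;
}

int main(void)
{
    printf("%d %d\n", parse("i+i*i$"), parse("i+*i$"));  /* prints 1 0 */
    return 0;
}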

CMPUT 680 - Compiler Design and Optimization 64  The Parsing Table. Given this grammar: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id. PARSING TABLE: [as on the earlier slide]. How is this parsing table built?

CMPUT 680 - Compiler Design and Optimization 65  FIRST and FOLLOW. We need to build a FIRST set and a FOLLOW set for each symbol in the grammar. FIRST(α) is the set of terminal symbols that can begin any string derived from α. FOLLOW(α) is the set of terminal symbols that can follow α: t ∈ FOLLOW(α) if and only if there exists a derivation containing αt. The elements of FIRST and FOLLOW are terminal symbols. (Aho, Sethi, Ullman, p. 189)

CMPUT 680 - Compiler Design and Optimization 66  Rules to Create FIRST (Aho, Sethi, Ullman, p. 189). GRAMMAR: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id. FIRST rules: 1. If X is a terminal, FIRST(X) = {X}. 2. If X → ε, then ε ∈ FIRST(X). 3. If X → Y1 Y2 ... Yk and Y1 ... Yi-1 ⇒* ε and a ∈ FIRST(Yi), then a ∈ FIRST(X). SETS: FIRST(id) = {id}, FIRST(+) = {+}, FIRST(*) = {*}, FIRST(() = {(}, FIRST()) = {)}; FIRST(F) = { (, id }; FIRST(T) = FIRST(F) = { (, id }; FIRST(E) = FIRST(T) = { (, id }; FIRST(E') = { +, ε }; FIRST(T') = { *, ε }.
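
These rules can be applied mechanically by iterating to a fixed point. The sketch below computes FIRST for the grammar above; the character encodings (e and t for E' and T', 'i' for id, '#' for ε) are assumptions made for this example, not notation from the slides.

/* Sketch: fixed-point computation of FIRST sets for the expression grammar. */
#include <stdio.h>
#include <string.h>

static const char *prods[] = {         /* A -> rhs, written as "A=rhs" */
    "E=Te", "e=+Te", "e=#", "T=Ft", "t=*Ft", "t=#", "F=(E)", "F=i"
};
static const char nts[] = "EeTtF";
static const char terms[] = "+*()i#";  /* '#' stands for eps */

static unsigned first[256];            /* FIRST set per symbol, as a bit mask */

static unsigned bit(char c) { return 1u << (unsigned)(strchr(terms, c) - terms); }

int main(void)
{
    int nprods = (int)(sizeof prods / sizeof prods[0]);

    for (const char *t = terms; *t; t++)
        first[(unsigned char)*t] = bit(*t);           /* rule 1: FIRST(a) = {a} */

    int changed = 1;
    while (changed) {                                  /* iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < nprods; p++) {
            char A = prods[p][0];
            const char *rhs = prods[p] + 2;
            unsigned add = 0;
            int all_eps = 1;
            for (const char *s = rhs; *s && all_eps; s++) {   /* rule 3 */
                add |= first[(unsigned char)*s] & ~bit('#');
                all_eps = (first[(unsigned char)*s] & bit('#')) != 0;
            }
            if (all_eps)                                       /* rule 2 */
                add |= bit('#');
            if ((first[(unsigned char)A] | add) != first[(unsigned char)A]) {
                first[(unsigned char)A] |= add;
                changed = 1;
            }
        }
    }

    for (const char *A = nts; *A; A++) {
        printf("FIRST(%c) = {", *A);
        for (const char *t = terms; *t; t++)
            if (first[(unsigned char)*A] & bit(*t))
                printf(" %c", *t);
        printf(" }\n");
    }
    return 0;
}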

CMPUT 680 - Compiler Design and Optimization 67-71  Rules to Create FOLLOW (Aho, Sethi, Ullman, p. 189). GRAMMAR: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id. A and B are nonterminals; α and β are strings of grammar symbols. FOLLOW rules: 1. If S is the start symbol, then $ ∈ FOLLOW(S). 2. If A → αBβ and a ∈ FIRST(β) and a ≠ ε, then a ∈ FOLLOW(B). 3. If A → αB and a ∈ FOLLOW(A), then a ∈ FOLLOW(B). 3a. If A → αBβ and β ⇒* ε and a ∈ FOLLOW(A), then a ∈ FOLLOW(B). Applying these rules to the grammar, using the FIRST sets computed on the previous slide, gives: FOLLOW(E) = { ), $ }; FOLLOW(E') = { ), $ }; FOLLOW(T) = { +, ), $ }; FOLLOW(T') = { +, ), $ }; FOLLOW(F) = { +, *, ), $ }.

CMPUT 680 - Compiler Design and Optimization 72-79  Rules to Build the Parsing Table (Aho, Sethi, Ullman, p. 190). GRAMMAR: E → T E'; E' → + T E' | ε; T → F T'; T' → * F T' | ε; F → ( E ) | id, with the FIRST and FOLLOW sets above. For each production A → α: 1. For each terminal a ∈ FIRST(α), add A → α to M[A, a]. 2. If ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A). 3. If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]. Applying these rules production by production fills the parsing table: M[E, id] = M[E, (] = E → T E'; M[E', +] = E' → + T E'; M[E', )] = M[E', $] = E' → ε; M[T, id] = M[T, (] = T → F T'; M[T', *] = T' → * F T'; M[T', +] = M[T', )] = M[T', $] = T' → ε; M[F, id] = F → id; M[F, (] = F → ( E ).

CMPUT 680 - Compiler Design and Optimization 80  Bottom-Up and Top-Down Parsers. Top-down parsers: start constructing the parse tree at the top (root) of the tree and move down towards the leaves. They are easy to implement by hand, but work with restricted grammars. Example: predictive parsers. Bottom-up parsers: build the nodes at the bottom of the parse tree first. They are suitable for automatic parser generation and handle a larger class of grammars. Examples: shift-reduce parsers (or LR(k) parsers). (Aho, Sethi, Ullman, p. 195)

CMPUT 680 - Compiler Design and Optimization 81  Bottom-Up Parser. A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree. The reduction steps trace a rightmost derivation in reverse. Consider the grammar: S → a A B e; A → A b c | b; B → d. We want to parse the input string abbcde. (Aho, Sethi, Ullman, p. 195)

CMPUT 680 - Compiler Design and Optimization 82-91  Bottom-Up Parser Example (Aho, Sethi, Ullman, p. 195). Grammar: S → a A B e; A → A b c | b; B → d. Input: a b b c d e $. The parser repeatedly finds a handle in the current sentential form and reduces it, building the parse tree from the leaves upward: a b b c d e (reduce the first b to A using A → b) ⇒ a A b c d e (reduce A b c to A using A → A b c) ⇒ a A d e (reduce d to B using B → d) ⇒ a A B e (reduce a A B e to S using S → a A B e) ⇒ S. Note that in the sentential form a A b c d e the second b is deliberately not reduced to A; a parser that did reduce it would get stuck and would then have to backtrack. [Each slide shows the parse tree fragments grown so far above the remaining input.] This parser is known as an LR parser because it scans the input from Left to right and constructs a Rightmost derivation in reverse order.

CMPUT 680 - Compiler Design and Optimization 92  Bottom-Up Parser Example. Scanning the productions to find matches with handles in the input string, together with backtracking, makes the method used in the previous example very inefficient. Can we do better?

CMPUT 680 - Compiler Design and Optimization 93  LR Parser Example. [Diagram: the input and a stack feed the LR parsing program, which consults the action and goto tables and produces the output.] (Aho, Sethi, Ullman, p. 217)

CMPUT 680 - Compiler Design and Optimization 94  LR Parser Example. The following grammar: (1) E → E + T; (2) E → T; (3) T → T * F; (4) T → F; (5) F → ( E ); (6) F → id can be parsed with this action and goto table: [the SLR action/goto table for the grammar; see Aho, Sethi, Ullman, p. 219].

CMPUT 680 - Compiler Design and Optimization 95-116  LR Parser Example (Aho, Sethi, Ullman, p. 220). GRAMMAR: (1) E → E + T; (2) E → T; (3) T → T * F; (4) T → F; (5) F → ( E ); (6) F → id. INPUT: id * id + id $. The slides step through the parse one move at a time; the configurations are (stack shown bottom to top, then the remaining input, then the move):
0 | id * id + id $ | shift 5
0 id 5 | * id + id $ | reduce by F → id, goto[0, F] = 3
0 F 3 | * id + id $ | reduce by T → F, goto[0, T] = 2
0 T 2 | * id + id $ | shift 7
0 T 2 * 7 | id + id $ | shift 5
0 T 2 * 7 id 5 | + id $ | reduce by F → id, goto[7, F] = 10
0 T 2 * 7 F 10 | + id $ | reduce by T → T * F, goto[0, T] = 2
0 T 2 | + id $ | reduce by E → T, goto[0, E] = 1
0 E 1 | + id $ | shift 6
0 E 1 + 6 | id $ | shift 5
0 E 1 + 6 id 5 | $ | reduce by F → id, goto[6, F] = 3
0 E 1 + 6 F 3 | $ | reduce by T → F, goto[6, T] = 9
0 E 1 + 6 T 9 | $ | reduce by E → E + T, goto[0, E] = 1
0 E 1 | $ | accept
[Each slide also shows the parse tree for the reductions performed so far being assembled bottom-up in the output.]

CMPUT 680 - Compiler Design and Optimization 117  Constructing Parsing Tables. All LR parsers use the same parsing program that we demonstrated in the previous slides. What differentiates the LR parsers are the action and the goto tables: Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement. Canonical LR: succeeds for the most grammars, but is the hardest to implement; it splits states when necessary to prevent reductions that would get the parser stuck. Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR. (See Aho, Sethi, Ullman for the construction of each kind of table.) (Aho, Sethi, Ullman, p. 221)
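
The common parsing program itself is small; the tables are what vary. The sketch below shows the driver loop in C. Since the action/goto table did not survive in this transcript, the entries are taken from the standard SLR table for this grammar in Aho, Sethi, Ullman (their Fig. 4.31), so treat them as reproduced from the book rather than from the slides; tokens are single characters, with 'i' standing for id and '$' as the end marker.

/* Sketch: shift-reduce (LR) driver loop with a hard-coded SLR table. */
#include <stdio.h>

#define ACCEPT 99
enum { T_ID, T_PLUS, T_STAR, T_LP, T_RP, T_END };   /* terminal indices */

/* action[s][a]: >0 shift to that state, <0 reduce by production -n,
 * ACCEPT accept, 0 error. */
static const int action[12][6] = {
    { 5, 0, 0, 4, 0, 0 },      /* 0  */
    { 0, 6, 0, 0, 0, ACCEPT }, /* 1  */
    { 0,-2, 7, 0,-2,-2 },      /* 2  */
    { 0,-4,-4, 0,-4,-4 },      /* 3  */
    { 5, 0, 0, 4, 0, 0 },      /* 4  */
    { 0,-6,-6, 0,-6,-6 },      /* 5  */
    { 5, 0, 0, 4, 0, 0 },      /* 6  */
    { 5, 0, 0, 4, 0, 0 },      /* 7  */
    { 0, 6, 0, 0,11, 0 },      /* 8  */
    { 0,-1, 7, 0,-1,-1 },      /* 9  */
    { 0,-3,-3, 0,-3,-3 },      /* 10 */
    { 0,-5,-5, 0,-5,-5 },      /* 11 */
};
/* go[s][A]: state pushed after reducing to nonterminal A (E = 0, T = 1, F = 2). */
static const int go[12][3] = {
    {1,2,3}, {0,0,0}, {0,0,0}, {0,0,0}, {8,2,3}, {0,0,0},
    {0,9,3}, {0,0,10}, {0,0,0}, {0,0,0}, {0,0,0}, {0,0,0},
};
static const int  rhs_len[7] = { 0, 3, 1, 3, 1, 3, 1 };   /* lengths of productions 1-6 */
static const int  lhs[7]     = { 0, 0, 0, 1, 1, 2, 2 };   /* E E T T F F */
static const char *prod_str[7] = { "", "E->E+T", "E->T", "T->T*F", "T->F", "F->(E)", "F->id" };

static int term_index(char c)
{
    switch (c) {
    case 'i': return T_ID;  case '+': return T_PLUS; case '*': return T_STAR;
    case '(': return T_LP;  case ')': return T_RP;   default:  return T_END;
    }
}

int main(void)
{
    const char *input = "i*i+i$";      /* id * id + id, as in the trace above */
    int stack[128] = { 0 };            /* state stack, state 0 at the bottom  */
    int top = 0;

    for (;;) {
        int a = term_index(*input);
        int act = action[stack[top]][a];
        if (act == ACCEPT)      { printf("accept\n"); return 0; }
        else if (act > 0)       { stack[++top] = act; input++; }     /* shift  */
        else if (act < 0) {                                          /* reduce */
            int p = -act;
            top -= rhs_len[p];                          /* pop |rhs| states */
            stack[top + 1] = go[stack[top]][lhs[p]];    /* push goto state  */
            top++;
            printf("reduce by %s\n", prod_str[p]);
        } else                  { printf("error\n"); return 1; }
    }
}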

CMPUT 680 - Compiler Design and Optimization 118  Using Lex. [Diagram: the Lex source program lex.l is fed to the Lex compiler, which produces lex.yy.c; lex.yy.c is compiled by the C compiler into a.out; a.out reads the input stream and produces a sequence of tokens.] (Aho, Sethi, Ullman, p. 258)

CMPUT 680 - Compiler Design and Optimization 119  Parsing Action Conflicts. If the grammar specified is ambiguous, yacc will report parsing action conflicts. These conflicts can be reduce/reduce conflicts or shift/reduce conflicts. Yacc has rules to resolve such conflicts automatically (see Aho, Sethi, Ullman), but the resulting parser might not have the behavior intended by the grammar writer. Whenever you see a conflict report, rerun yacc with the -v flag, examine the y.output file, and rewrite your grammar to eliminate the conflicts. (Aho, Sethi, Ullman, p. 262)

CMPUT 680 - Compiler Design and Optimization 120  Three-Address Statements. A popular form of intermediate code used in optimizing compilers is three-address statements (or variations, such as quadruples). Source statement: x = a + b * c + d. Three-address statements with temporaries t1 and t2: t1 = b * c; t2 = a + t1; x = t2 + d. (Aho, Sethi, Ullman, p. 466)
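
A minimal sketch of how such three-address statements (quadruples) might be represented and emitted for this example; the struct layout and field names are illustrative assumptions, not a fixed intermediate-representation definition.

/* Sketch: quadruples for x = a + b * c + d. */
#include <stdio.h>

struct quad {
    char op;                     /* operator: '+' or '*' in this example  */
    const char *arg1, *arg2;     /* operands                              */
    const char *result;          /* destination temporary or variable     */
};

int main(void)
{
    /* x = a + b * c + d becomes: t1 = b * c; t2 = a + t1; x = t2 + d */
    struct quad code[] = {
        { '*', "b",  "c",  "t1" },
        { '+', "a",  "t1", "t2" },
        { '+', "t2", "d",  "x"  },
    };

    for (int i = 0; i < 3; i++)
        printf("%s = %s %c %s\n", code[i].result,
               code[i].arg1, code[i].op, code[i].arg2);
    return 0;
}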

CMPUT 680 - Compiler Design and Optimization 121  Intermediate Code Generation. Reading List: Aho, Sethi, Ullman, Chapters 8.1-8.3 and 8.7.

CMPUT 680 - Compiler Design and Optimization 122  Front End of a Compiler. [Diagram: the lexical analyzer (scanner), syntax analyzer (parser), and semantic analyzer produce an abstract syntax tree with attributes, reporting error messages along the way; the intermediate-code generator then produces non-optimized intermediate code. Together these phases form the front end.]

CMPUT 680 - Compiler Design and Optimization 123  Component-Based Approach to Building Compilers. [Diagram: source programs in Language-1 and Language-2 pass through their own front ends, each producing non-optimized intermediate code; a shared intermediate-code optimizer produces optimized intermediate code; separate code generators then produce Target-1 and Target-2 machine code.]

CMPUT 680 - Compiler Design and Optimization 124  Advantages of Using an Intermediate Language. 1. Retargeting: build a compiler for a new machine by attaching a new code generator to an existing front end. 2. Optimization: reuse intermediate-code optimizers in compilers for different languages and different machines. Note: the terms "intermediate code", "intermediate language", and "intermediate representation" are all used interchangeably.

The Phases of a Compiler. Source: position := initial + rate * 60. After the lexical analyzer: id1 := id2 + id3 * 60. After the syntax analyzer: a parse tree for id1 := id2 + (id3 * 60). After the semantic analyzer: the same tree with inttoreal applied to 60. After the intermediate code generator: temp1 := inttoreal(60); temp2 := id3 * temp1; temp3 := id2 + temp2; id1 := temp3. After the code optimizer: temp1 := id3 * 60.0; id1 := id2 + temp1. After the code generator: MOVF id3, R2; MULF #60.0, R2; MOVF id2, R1; ADDF R2, R1; MOVF R1, id1.