CSE P501 – Compiler Construction Top-Down Parsing Predictive Parsing LL(k) Recursive Descent Grammar Grooming Left recursion Left factoring Next Spring.

Presentation transcript:

CSE P501 – Compiler Construction
Top-Down Parsing
  Predictive Parsing
  LL(k)
  Recursive Descent
Grammar Grooming
  Left recursion
  Left factoring
Spring 2014, Jim Hogg - UW - CSE - P501

Recap: LR/Bottom-Up/Shift-Reduce Parse
Grammar:
  S → a A B e
  A → A b c | b
  B → d
[Figure: parse tree for the input abbcde, built up in stages by reducing to A, A, B and finally S]
Build tree from leaves upwards
Shift next token, or reduce handle
Accept: no more tokens & root == S
LR(k), SLR(k), LALR(k)

Top-Down Parsing: Part-Way Done
[Figure: partially expanded parse tree rooted at Prog]
Grammar:
  Prog → Stm ; Prog | Stm
  Stm → AsStm | IfStm
  AsStm → Var = Exp
  IfStm → if Exp then AsStm
  VorC → Var | Const
  Exp → VorC | VorC + VorC | VorC < VorC
  Var → [a-z]
  Const → [0-9]

Top-Down Parsing: Done
[Figure: completed parse tree for the program a = 1 ; if a + 1 then b = 2]
Grammar:
  Prog → Stm ; Prog | Stm
  Stm → AsStm | IfStm
  AsStm → Var = Exp
  IfStm → if Exp then AsStm
  VorC → Var | Const
  Exp → VorC | VorC + VorC | VorC < VorC
  Var → [a-z]
  Const → [0-9]

Recap: Topdown, Leftmost Derivation
Identical to previous slide, but using text instead of pictures
  Prog => Stm ; Prog
       => AsStm ; Prog
       => Var = Exp ; Prog
       => a = Exp ; Prog
       => a = VorC ; Prog
       => a = Const ; Prog
       => a = 1 ; Prog
       => a = 1 ; Stm
       => a = 1 ; IfStm
       => a = 1 ; if Exp then AsStm
       => a = 1 ; if VorC + VorC then AsStm
       => a = 1 ; if Var + VorC then AsStm
       => a = 1 ; if a + VorC then AsStm
       => a = 1 ; if a + Const then AsStm
       => a = 1 ; if a + 1 then AsStm
       => a = 1 ; if a + 1 then Var = Exp
       => a = 1 ; if a + 1 then b = Exp
       => a = 1 ; if a + 1 then b = VorC
       => a = 1 ; if a + 1 then b = Const
       => a = 1 ; if a + 1 then b = 2
Grammar:
  Prog → Stm ; Prog | Stm
  Stm → AsStm | IfStm
  AsStm → Var = Exp
  IfStm → if Exp then AsStm
  VorC → Var | Const
  Exp → VorC | VorC + VorC
  Var → [a-z]
  Const → [0-9]

Left, Left, Left, Right, Left...
At each step, we chose the 'right' rules by which to extend the parse tree, in order to reach the given program. How? By "foretelling the future".
Eg: on one occasion we chose Stm → AsStm; on another occasion, we chose Stm → IfStm
But we need some algorithm that we can implement, rather than a "foretell the future" function.
Choices:
  Brute force: we can build a top-down parse by exploring all possible sentences of the given grammar: simply backtrack if we get stuck, and explore a different set of productions.
  Like escaping the Minotaur's Maze by exhaustive enumeration of paths: possible in principle, but time-consuming

Top-Down Parsing
Begin at root with start symbol of grammar
Repeatedly pick leftmost non-terminal and expand
  Why leftmost? Because we haven't yet seen tokens that derive from later non-terminals
Success when expanded tree matches input
LL(k): scan source Left-to-Right; always expand Leftmost non-terminal in emerging tree; lookahead up to k tokens
In practice, k = 1 works fine
Much easier to understand than LR
[Figure: parse tree rooted at S, with the already-matched input w and the leftmost non-terminal A about to be expanded, alongside the derivation Prog => Stm ; Prog => AsStm ; Prog => Var = Exp ; Prog => a = Exp ; Prog => a = VorC ; Prog => a = Const ; Prog]

Top-Down Parsing, in Greek
Situation: part-way through a derivation
  S =>* wAα =>* wxy    [w, x, y ∈ T*, A ∈ N, α ∈ (T ∪ N)*]
Basic Step: pick some production A → β1 β2 … βn that will expand A to (ultimately) match the input
Back-tracking is expensive
So want choice to be deterministic
Usually called "predictive" parsing
Key: S = Start Symbol, N = Non-Terminals, T = Terminals

Predictive Parsing
Suppose we are located at some non-terminal A, and there are two or more possible productions: A → α | β
Want to make the correct choice by looking at just the next input token
If we can do this, we can build a predictive parser that can perform a top-down parse: right first time; no backtracking
And it's possible for many real languages/grammars
Counter Example: PL/1 did not reserve keywords, so this was legal:
  IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

Predictive Parsing : Example
Grammar:
  stm → id = exp ;
      | return exp ;
      | if ( exp ) stm
      | while ( exp ) stm
If the next few tokens in input are: IF LPAREN ID:x …
then obviously! choose: stm → if ( exp ) stm

LL(1) Property
If we math-up the requirement for a predictive, top-down parser, we get:
  LL(1) grammar: ∀ A ∈ N such that A → α | β, FIRST(α) ∩ FIRST(β) = Ø
If a grammar is LL(1), we can build a predictive parser for it that uses 1-symbol lookahead
Generalize to LL(k)...
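
The disjointness condition can be checked mechanically once the FIRST set of each alternative is known. The following is a minimal sketch in Java, assuming the FIRST sets are supplied by hand as space-separated token names; the class Ll1Check and its method are hypothetical names for illustration, not part of the course code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: checks the FIRST-set condition for the
// alternatives of a single non-terminal.
class Ll1Check {
    // Each argument is the FIRST set of one alternative, written as
    // space-separated token names (an assumption of this sketch).
    static boolean pairwiseDisjoint(String... firstSetsOfAlternatives) {
        Set<String> seen = new HashSet<>();
        for (String alternative : firstSetsOfAlternatives) {
            Set<String> first =
                new HashSet<>(Arrays.asList(alternative.trim().split("\\s+")));
            for (String token : first) {
                // A token already claimed by an earlier alternative
                // means one-token lookahead cannot choose between them.
                if (!seen.add(token)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For the four alternatives of stm (starting with id, return, if and while), pairwiseDisjoint("id", "return", "if", "while") holds, while two alternatives that both start with name fail the check.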

LL(k) Parsers
An LL(k) parser
  Scans the input Left to right
  Constructs a Leftmost derivation
  Looking ahead at most k symbols
LL(1) works for many real language grammars
LL(k) for k>1 is rare

Table-Driven LL(k) Parsers
As with LR(k), can build a table-driven parser from the grammar
Example
  1. S → ( S ) S
  2. S → [ S ] S
  3. S → ε

                 Lookahead Token
  NonTerminal    (    )    [    ]    $
  S              1    3    2    3    3

Eg: with S on stack, and lookahead = [ choose production number 2
As with generated LR parser, this is hard to understand and debug. But table is so small for LL(1), we can write simple code instead
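
To make the table lookup concrete, here is a minimal sketch of a table-driven LL(1) recognizer for the bracket grammar, written in Java. It assumes single-character tokens and uses '$' as the end marker; the class BracketParser and its helper names are hypothetical, chosen for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: table-driven LL(1) recognizer for
//   1. S -> ( S ) S    2. S -> [ S ] S    3. S -> epsilon
class BracketParser {
    // The parse table row for S: maps a lookahead to a production number.
    static int table(char lookahead) {
        switch (lookahead) {
            case '(': return 1;
            case '[': return 2;
            default:  return 3;   // ')', ']' and '$' select S -> epsilon
        }
    }

    static boolean accepts(String input) {
        String tokens = input + "$";          // '$' marks end of input
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');                      // start symbol on top
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char look = tokens.charAt(pos);
            if (top == 'S') {
                switch (table(look)) {        // expand S using the table
                    case 1: pushRhs(stack, "(S)S"); break;
                    case 2: pushRhs(stack, "[S]S"); break;
                    default: break;           // epsilon: push nothing
                }
            } else if (top == look) {
                pos++;                        // terminal matches: consume it
            } else {
                return false;                 // mismatch: syntax error
            }
        }
        return pos == tokens.length();
    }

    // Push a right-hand side so its leftmost symbol ends up on top.
    private static void pushRhs(Deque<Character> stack, String rhs) {
        for (int i = rhs.length() - 1; i >= 0; i--) {
            stack.push(rhs.charAt(i));
        }
    }
}
```

The stack holds the part of the derivation still to be matched, so the parser never needs to back-track: each step either expands S by the production the table selects or matches one terminal.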

FIRST Sets : Example
FIRST(α) = set of tokens (terminals) that can appear first in a derivation of α

Grammar:
  Goal → Exp
  Exp → Term Exp'
  Exp' → + Term Exp' | - Term Exp' | ε
  Term → Factor Term'
  Term' → × Factor Term' | ÷ Factor Term' | ε
  Factor → ( Exp ) | num | name

FIRST sets:
  Symbol    FIRST
  Exp       ( name num
  Exp'      + - ε
  Term      ( name num
  Term'     × ÷ ε
  Factor    ( name num
  For each terminal, eof and ε, the FIRST set is the symbol itself

First Sets : Algorithm

    foreach α in {T, eof, ε} do FIRST(α) = {α} enddo
    foreach A in N do FIRST(A) = { } enddo
    while (FIRST sets are still changing) do
        foreach (A → β1 β2 ... βn in P) do
            rhs = FIRST(β1) - {ε}
            i = 1
            while ε in FIRST(βi) && i <= n-1 do
                rhs = rhs ∪ (FIRST(βi+1) - {ε})
                i++
            enddo
            if i == n && ε in FIRST(βn) then rhs = rhs ∪ {ε}
            FIRST(A) = FIRST(A) ∪ rhs
        enddo
    enddo

Key: N = NonTerminals (LHS of productions), T = Terminals (~tokens), eof = end-of-file, ε = epsilon
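
The fixed-point computation can be sketched in Java as follows. This is an illustrative sketch only: the class FirstSets, the string "eps" standing for ε, the array encoding of productions, and the use of * and / in place of the multiply and divide symbols are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    static final String EPS = "eps";   // stands for epsilon in this sketch

    // Each production is {lhs, rhs1, rhs2, ...}; epsilon is {lhs, EPS}.
    static Map<String, Set<String>> compute(List<String[]> productions) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String[] p : productions) {
            first.put(p[0], new TreeSet<>());        // FIRST(A) = {} for A in N
        }
        for (String[] p : productions) {
            for (int i = 1; i < p.length; i++) {
                if (!first.containsKey(p[i])) {      // terminal or EPS
                    Set<String> self = new TreeSet<>();
                    self.add(p[i]);                  // FIRST(a) = {a}
                    first.put(p[i], self);
                }
            }
        }
        boolean changed = true;
        while (changed) {                            // iterate to a fixed point
            changed = false;
            for (String[] p : productions) {
                int n = p.length - 1;                // number of RHS symbols
                Set<String> rhs = new TreeSet<>(first.get(p[1]));
                rhs.remove(EPS);
                int i = 1;
                while (first.get(p[i]).contains(EPS) && i <= n - 1) {
                    Set<String> next = new TreeSet<>(first.get(p[i + 1]));
                    next.remove(EPS);
                    rhs.addAll(next);                // symbol i can vanish
                    i++;
                }
                if (i == n && first.get(p[n]).contains(EPS)) {
                    rhs.add(EPS);                    // whole RHS can vanish
                }
                if (first.get(p[0]).addAll(rhs)) {
                    changed = true;
                }
            }
        }
        return first;
    }

    // The expression grammar from the FIRST sets example.
    static List<String[]> exprGrammar() {
        List<String[]> g = new ArrayList<>();
        g.add(new String[] {"Goal", "Exp"});
        g.add(new String[] {"Exp", "Term", "Exp'"});
        g.add(new String[] {"Exp'", "+", "Term", "Exp'"});
        g.add(new String[] {"Exp'", "-", "Term", "Exp'"});
        g.add(new String[] {"Exp'", EPS});
        g.add(new String[] {"Term", "Factor", "Term'"});
        g.add(new String[] {"Term'", "*", "Factor", "Term'"});
        g.add(new String[] {"Term'", "/", "Factor", "Term'"});
        g.add(new String[] {"Term'", EPS});
        g.add(new String[] {"Factor", "(", "Exp", ")"});
        g.add(new String[] {"Factor", "num"});
        g.add(new String[] {"Factor", "name"});
        return g;
    }
}
```

Calling FirstSets.compute(FirstSets.exprGrammar()) gives Exp, Term and Factor the set containing (, name and num, while Exp' and Term' contain their two operators plus eps.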

LL vs LR
Tools can generate parsers for LL(1) and for LR(1)
LL(1) decides based on single non-terminal + 1-token lookahead
LR(1) decides based on entire left context (contents of the stack) + 1-token lookahead
LR(1) is more powerful than LL(1), ie, it handles a larger set of grammars and languages
If you use a tool-generated parser, might as well use LR
But there are some very good LL parser tools (ANTLR, JavaCC) that might win for other reasons (good docs; IDE; good diagnostics; etc)

Recursive-Descent Parsers
Easy to implement by hand
Key idea: write a method corresponding to each NonTerminal in the grammar
Each of these methods is responsible for matching its NonTerminal with the next part of the input

Recursive-Descent Recognizer - 1
  stm → id = exp ;
      | return exp ;
      | if ( exp ) stm
      | while ( exp ) stm

    void parseStm() {
        // The kind of the current token selects the production
        switch (this.token.kind) {
            case ID:     parseAssignStm(); break;
            case RETURN: parseReturnStm(); break;
            case IF:     parseIfStm();     break;
            case WHILE:  parseWhileStm();  break;
            default:     errorMessage("unexpected token ", this.token.kind);
        }
    }

Recursive-Descent Recognizer - 2
  stm → id = exp ; | return exp ; | if ( exp ) stm | while ( exp ) stm

    void parseAssignStm() {
        getNextToken();     // skip id
        mustbe(EQ);
        parseExp();         // parse 'exp'
        mustbe(SEMI);
    }

    // Consume the current token if it is the expected one; otherwise report an error
    void mustbe(TOKEN t) {
        if (this.token.kind == t.kind) {
            getNextToken();
        } else {
            errorMessage("expecting ", t.kind);
        }
    }

Recursive-Descent Recognizer - 3
  stm → id = exp ; | return exp ; | if ( exp ) stm | while ( exp ) stm

    void parseIfStm() {
        getNextToken();     // skip IF
        mustbe(LPAREN);
        parseExp();
        mustbe(RPAREN);
        parseStm();
    }

    void parseReturnStm() {
        getNextToken();     // skip RETURN
        parseExp();
        mustbe(SEMI);
    }

Recursive-Descent Recognizer - 4
  stm → id = exp ; | return exp ; | if ( exp ) stm | while ( exp ) stm

    void parseWhileStm() {
        getNextToken();     // skip WHILE
        mustbe(LPAREN);
        parseExp();         // parse 'exp'
        mustbe(RPAREN);
        parseStm();
    }
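
The recognizer methods can be assembled into one small runnable class. The sketch below is an illustration under stated assumptions, not the course's implementation: the class StmRecognizer is a hypothetical name, the input is a whitespace-separated list of token kind names rather than real source text, exp is simplified to a single ID or NUM, and errors are raised as exceptions instead of calling errorMessage.

```java
import java.util.ArrayList;
import java.util.List;

class StmRecognizer {
    enum Kind { ID, NUM, EQ, SEMI, RETURN, IF, WHILE, LPAREN, RPAREN, EOF }

    private final List<Kind> tokens = new ArrayList<>();
    private int pos = 0;

    // Assumption: input is a whitespace-separated list of kind names.
    StmRecognizer(String source) {
        for (String name : source.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                tokens.add(Kind.valueOf(name));
            }
        }
    }

    private Kind kind() {
        return pos < tokens.size() ? tokens.get(pos) : Kind.EOF;
    }

    private void getNextToken() {
        pos++;
    }

    private void mustbe(Kind expected) {
        if (kind() != expected) {
            throw new IllegalStateException("expecting " + expected + ", found " + kind());
        }
        getNextToken();
    }

    // stm -> id = exp ; | return exp ; | if ( exp ) stm | while ( exp ) stm
    void parseStm() {
        switch (kind()) {
            case ID:
                getNextToken(); mustbe(Kind.EQ); parseExp(); mustbe(Kind.SEMI);
                break;
            case RETURN:
                getNextToken(); parseExp(); mustbe(Kind.SEMI);
                break;
            case IF:
            case WHILE:     // both have the shape: keyword ( exp ) stm
                getNextToken(); mustbe(Kind.LPAREN); parseExp();
                mustbe(Kind.RPAREN); parseStm();
                break;
            default:
                throw new IllegalStateException("expecting a statement, found " + kind());
        }
    }

    // Simplifying assumption: exp is a single identifier or number.
    private void parseExp() {
        if (kind() == Kind.ID || kind() == Kind.NUM) {
            getNextToken();
        } else {
            throw new IllegalStateException("expecting an expression, found " + kind());
        }
    }

    // Accept only if one statement matches and no tokens are left over.
    static boolean accepts(String source) {
        StmRecognizer r = new StmRecognizer(source);
        try {
            r.parseStm();
            return r.kind() == Kind.EOF;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, StmRecognizer.accepts("IF LPAREN ID RPAREN ID EQ NUM SEMI") succeeds, while dropping the final SEMI makes mustbe fail and the input is rejected.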

Recursive-Descent Recognizer - 5
Recursive-Descent Parser is easy!
Pattern of method calls traces the parse tree
The example is only a recognizer: it accepts or rejects a program. Need to add more, such as:
  Build AST
  Generate semantic checks (eg: def-before-use)
  Generate (naïve) code on-the-fly

Invariant for Parse Functions
Parser methods must agree on where they are in the input stream of tokens
Useful invariants:
  On entry to each parse method, current token begins that parse method's NonTerminal
    Eg: parseIfStm is entered with this.token.kind == IF
  On exit from each parse method, current token is the token after that parse method's NonTerminal
    Eg: parseIfStm ends with this.token as first token of next NonTerminal

Possible Problems
Left recursion
  Eg: E → E + T | …
Common prefix on RHS of productions
  Eg: Factor → name | name ( arglist )
Either one (left recursion, common prefix) defeats prediction from a single token of lookahead, forcing the parser to back-track

Left Recursion
  exp → exp + term | term

    void parseExp() {
        parseExp();
        mustbe(PLUS);
        parseTerm();
    }

Why is this a problem for LL parsing? parseExp calls itself before consuming any token: infinite recursion!

Left Recursion : Non-Solution
Replace with a right-recursive rule:
  Instead of: expr → expr + term
  Use?        expr → term + expr
Why isn't this the right solution?
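
One way to see the issue is associativity: the right-recursive rule groups operands to the right. Addition hides the difference, so the sketch below substitutes subtraction, which is an assumption made purely for illustration. The class Associativity and its methods are hypothetical names; they evaluate a list of terms under each grouping.

```java
class Associativity {
    // Mirrors the right-recursive rule  expr -> term - expr,
    // which groups to the right: a - (b - c)
    static int rightRecursive(int[] terms, int i) {
        if (i == terms.length - 1) {
            return terms[i];
        }
        return terms[i] - rightRecursive(terms, i + 1);
    }

    // Mirrors the left-recursive rule  expr -> expr - term,
    // which groups to the left: (a - b) - c
    static int leftAssociative(int[] terms) {
        int value = terms[0];
        for (int i = 1; i < terms.length; i++) {
            value -= terms[i];
        }
        return value;
    }
}
```

For the terms 10, 4, 3 the left grouping gives (10 - 4) - 3 = 3, while the right grouping gives 10 - (4 - 3) = 9, so simply flipping the recursion changes the meaning of the program.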

Left Recursion : Solution
Rewrite using right recursion and a new non-terminal
  Instead of: exp → exp + term
  Use:        exp → term exp'
              exp' → + term exp' | ε
Why does this work?
  exp => term exp'
      => term + term exp'
      => term + term + term exp'
      => term + term + term ε
Bending notation, equivalent to: exp → term {+ term}*
Properties
  No infinite recursion; maintains left associativity

Code for Exp & Term
  exp → term { + term }*
  term → factor { × factor }*
  factor → int | id | ( exp )

    void parseExp() {
        parseTerm();          // on return, current token already follows the term
        while (this.token.kind == PLUS) {
            getNextToken();   // skip '+'
            parseTerm();
        }
    }

    void parseTerm() {
        parseFactor();        // on return, current token already follows the factor
        while (this.token.kind == TIMES) {
            getNextToken();   // skip '×'
            parseFactor();
        }
    }

Code for Factor
  exp → term { + term }*
  term → factor { × factor }*
  factor → int | id | ( exp )

    void parseFactor() {
        switch (this.token.kind) {
            case ILIT:            // this.token.value
                getNextToken();
                break;
            case ID:              // this.token.lexeme
                getNextToken();
                break;
            case LPAREN:
                getNextToken();   // skip '('
                parseExp();
                mustbe(RPAREN);   // check for ')'
                break;
            default:
                errorMessage("unexpected token ", this.token.kind);
        }
    }
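
Extending the recognizer so that each parse method returns a value shows how the loops preserve precedence and left-to-right grouping. The following Java sketch is a simplified illustration: the class ExprEvaluator is a hypothetical name, tokens are single characters with multi-digit integers, * stands for the multiplication symbol, and the id alternative of factor is left out because there are no variable values to look up.

```java
class ExprEvaluator {
    private final String src;
    private int pos = 0;

    ExprEvaluator(String source) {
        this.src = source.replace(" ", "");
    }

    // Current character, or '$' once the input is exhausted.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '$';
    }

    private void mustbe(char expected) {
        if (peek() != expected) {
            throw new IllegalStateException("expecting " + expected + " at position " + pos);
        }
        pos++;
    }

    // exp -> term { + term }*
    int parseExp() {
        int value = parseTerm();
        while (peek() == '+') {
            pos++;                    // skip '+'
            value += parseTerm();     // accumulate left to right
        }
        return value;
    }

    // term -> factor { * factor }*
    int parseTerm() {
        int value = parseFactor();
        while (peek() == '*') {
            pos++;                    // skip '*'
            value *= parseFactor();
        }
        return value;
    }

    // factor -> int | ( exp )
    int parseFactor() {
        if (peek() == '(') {
            pos++;                    // skip '('
            int value = parseExp();
            mustbe(')');
            return value;
        }
        if (!Character.isDigit(peek())) {
            throw new IllegalStateException("expecting a factor at position " + pos);
        }
        int value = 0;
        while (Character.isDigit(peek())) {
            value = value * 10 + (peek() - '0');
            pos++;
        }
        return value;
    }

    // Evaluate a whole expression; leftover input is an error.
    static int eval(String source) {
        ExprEvaluator e = new ExprEvaluator(source);
        int value = e.parseExp();
        if (e.pos != e.src.length()) {
            throw new IllegalStateException("unexpected input at position " + e.pos);
        }
        return value;
    }
}
```

Because parseTerm is called from inside parseExp, multiplication binds tighter than addition: ExprEvaluator.eval("2+3*4") yields 14 and ExprEvaluator.eval("(2+3)*4") yields 20.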

What About Indirect Left Recursion?
A grammar might have a derivation that leads to an indirect left recursion
  A => β1 =>* βn => A γ
There are systematic ways to factor such grammars
Eg: see Dragon Book

Left Factoring
If two rules for a non-terminal have RHS that begin with the same symbol, we can't predict which one to use
Solution: Factor-out common prefix into a separate production

Left Factoring Example
Original grammar
  stm → if ( exp ) stm
      | if ( exp ) stm else stm
Factored grammar
  stm → if ( exp ) stm ifTail
  ifTail → else stm | ε

Parsing if Statements
Easy to code up the "else matches closest if" rule directly
  if ( exp ) stm [ else stm ]

    void parseIfStm() {
        getNextToken();     // skip IF
        mustbe(LPAREN);     // '('
        parseExp();
        mustbe(RPAREN);     // ')'
        parseStm();
        // An ELSE here belongs to this, the closest, IF
        if (this.token.kind == ELSE) {
            getNextToken();
            parseStm();
        }
    }

Another Lookahead Problem
Old languages like FORTRAN and BASIC use ( ) for array subscripts, rather than [ ]
A FORTRAN grammar includes:
  factor → id ( subscripts ) | id ( arguments ) | …
When parser sees ID LPAREN, how to decide between array access and function call?

How to handle ( ) ambiguity
Use the type of id to decide
  id previously declared array or method
  Lookup in Symbol Table
  Requires declare-before-use if we want to parse in 1 pass
Use a covering grammar
  factor → id ( commaSeparatedList ) | …
  and fix later when more info becomes available
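
Both approaches can be combined in a small sketch: parse with the covering rule, then classify the node from the symbol table when the declaration is already known. Everything named here (ParenResolver, SymbolKind, NodeKind, classify) is hypothetical and only illustrates the idea; a real compiler's symbol table would carry much more information.

```java
import java.util.HashMap;
import java.util.Map;

class ParenResolver {
    enum SymbolKind { ARRAY, FUNCTION }
    enum NodeKind { ARRAY_ACCESS, FUNCTION_CALL, UNRESOLVED }

    // Minimal stand-in for a symbol table: name -> declared kind.
    private final Map<String, SymbolKind> symbols = new HashMap<>();

    void declare(String name, SymbolKind kind) {
        symbols.put(name, kind);
    }

    // Called after the parser has matched  id ( commaSeparatedList ).
    NodeKind classify(String id) {
        SymbolKind kind = symbols.get(id);
        if (kind == null) {
            // Not declared yet: keep the covering form and fix it up
            // later, when more information becomes available.
            return NodeKind.UNRESOLVED;
        }
        return kind == SymbolKind.ARRAY
            ? NodeKind.ARRAY_ACCESS
            : NodeKind.FUNCTION_CALL;
    }
}
```

With declare-before-use, classify never returns UNRESOLVED and the decision can be made in a single pass; without it, unresolved nodes are revisited once all declarations have been seen.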

Top-Down Parsing : The End
Works with a smaller set of grammars (LL(1)) than bottom-up (LR(1)), but covers most sensible programming language constructs
Recursive descent is often the method of choice in real compilers

Parsing : All Done, for P501
That's it!
On to the rest of the compiler

Next
Topics: Intermediate Reps, Semantic Analysis, Symbol Tables
Reading: Cooper & Torczon chapter 5