Bernd Fischer RW713: Compiler and Software Language Engineering.

Top-Down Parsing

Top-down parsing searches for the (leftmost) derivation.
Context-free grammars derive individual words by recursively applying productions:
– start with ω = S
– pick an occurrence of a non-terminal A in ω (top-down parsing picks the leftmost one)
– pick a production A → α in P (but which one?)
– replace A by α in ω
– repeat until ω ∈ T*, then check whether ω = x (the input word)
Open question: how to do this efficiently?

Top-down parsing searches for the (leftmost) derivation using a stack.
Use a parse stack s to represent the derivation. Initialize s = S, then repeat:
– if x = ε (no input left): if s = ε then accept else reject
– if tos ∈ T: if x_i = tos then pop and skip x_i, else reject
– if tos ∈ N: pick a production tos → α in P; pop; push(α) symbol by symbol, in reverse order
The parser stack can be explicit or implicit.

Top-down parsing searches for the (leftmost) derivation using a stack.
Pop-Quiz: Consider the following grammar:
stmt → ( stmt tail | if ( expr ) stmt else stmt | print expr
tail → ; stmt tail | )
expr → ID
Show the evolution of the parse stack for the input
(if(x)print y else (print x; print y))

Top-down parsing searches for the (leftmost) derivation using a stack.
How do you pick the right production?
– based on already read input?
– based on remaining input?
– based on stack?
– all of it? subset of it?
Answer: based on a prefix of the remaining input and on the top of the stack.

Table-driven top-down parser
Input: (if(x)print y else (print x; print y))
Grammar:
stmt → ( stmt tail | if ( expr ) stmt else stmt | print expr
tail → ; stmt tail | )
expr → ID
[Figure: a generic table interpreter reads the input tape (previously read tokens, current token, still unread tokens), consults the parsing table, maintains the parse stack (tos = top of stack), and outputs an AST or a syntax error.]
The slide sequence shows successive parse stacks (bottom to top) while the input is consumed:
– tail stmt else expr print
– tail stmt else expr
– tail stmt else ID
– tail stmt else
– tail stmt
– finally, the stmt on top is expanded with stmt → ( stmt tail, which leaves ( on top of the stack
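The table interpreter described above can be sketched in a few lines of Java. This is only an illustration under assumptions: the input is taken to be already tokenized (every identifier appears as the token ID), the parsing table is written out by hand for the stmt/tail/expr grammar, and the names TableDrivenParser, TABLE, INPUT and parse are made up for this sketch. Each stack snapshot is recorded top of stack first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class TableDrivenParser {
    // Hand-written parsing table: non-terminal -> (lookahead token -> right-hand side).
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "stmt", Map.of(
            "(", List.of("(", "stmt", "tail"),
            "if", List.of("if", "(", "expr", ")", "stmt", "else", "stmt"),
            "print", List.of("print", "expr")),
        "tail", Map.of(
            ";", List.of(";", "stmt", "tail"),
            ")", List.of(")")),
        "expr", Map.of(
            "ID", List.of("ID")));

    // Token sequence for (if(x)print y else (print x; print y)); lexing is assumed done.
    static final List<String> INPUT = List.of("(", "if", "(", "ID", ")", "print", "ID",
        "else", "(", "print", "ID", ";", "print", "ID", ")", ")");

    /** Returns true iff tokens derive from stmt; appends one stack snapshot per step to trace. */
    static boolean parse(List<String> tokens, List<String> trace) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("stmt");
        int pos = 0;
        while (!stack.isEmpty()) {
            trace.add(String.join(" ", stack));            // top of stack comes first
            String tos = stack.pop();
            String tok = pos < tokens.size() ? tokens.get(pos) : "EOF";
            if (TABLE.containsKey(tos)) {                  // non-terminal: expand via the table
                List<String> rhs = TABLE.get(tos).get(tok);
                if (rhs == null) return false;             // empty table cell: syntax error
                for (int i = rhs.size() - 1; i >= 0; i--) {
                    stack.push(rhs.get(i));                // push in reverse order
                }
            } else if (tos.equals(tok)) {                  // terminal: match and skip
                pos++;
            } else {
                return false;
            }
        }
        return pos == tokens.size();                       // accept only if all input was read
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        boolean ok = parse(INPUT, trace);
        trace.forEach(System.out::println);
        System.out.println(ok ? "accept" : "reject");
    }
}
```

Running main prints the evolution of the parse stack for the example input; one of the snapshots is "print expr else stmt tail", i.e. the stack from the slides read top first.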

LL-Parsing

LL(k) grammars allow a deterministic top-down parser.
Definition: A context-free grammar G = (N, T, P, S) is LL(k) for k ∈ Nat if for any two leftmost derivations
S ⇒ₗ* uAα ⇒ₗ uβα ⇒ₗ* ux and
S ⇒ₗ* uAα ⇒ₗ uγα ⇒ₗ* uy
the following holds: if prefix_k(x) = prefix_k(y), then β = γ.
In other words, the next k tokens determine the rule.
A language L is LL(k) if there exists an LL(k) grammar G such that L(G) = L.
Note that LL(k) is defined in terms of derivations, not rules!

Not all context-free grammars are LL(k).
Consider the following left-recursive grammar G:
S → E
E → E + T | T
T → T * F | F
F → ( E ) | id
The parser would need to “look over E” to see whether a + follows (in which case the first alternative must be chosen).
Fix by (immediate) left-recursion elimination: replace each rule A → Aα | β (where β ≠ Aγ) by two new rules A → βA’ and A’ → αA’ | ε (where A’ is a fresh non-terminal).
– L(G) = L(G’), but the transformation changes the parse trees
– generalizes to multiple alternatives
– an algorithm for the indirect case exists as well
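The rewriting rule for immediate left recursion, generalized to several alternatives, might look as follows in Java. This is a sketch rather than a full grammar transformer: it handles one non-terminal at a time, represents each alternative as a list of symbols (the empty list stands for ε), and assumes that the name A' is not already used in the grammar. The names LeftRecursion and eliminate are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeftRecursion {
    /**
     * Rewrites A -> A a1 | ... | A am | b1 | ... | bn into
     * A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | epsilon.
     * Returns the new rules keyed by non-terminal; the empty list stands for epsilon.
     */
    static Map<String, List<List<String>>> eliminate(String a, List<List<String>> alternatives) {
        String fresh = a + "'";                          // assumed to be an unused name
        List<List<String>> base = new ArrayList<>();     // new alternatives of A
        List<List<String>> loop = new ArrayList<>();     // alternatives of A'
        for (List<String> alt : alternatives) {
            if (!alt.isEmpty() && alt.get(0).equals(a)) {
                // A -> A alpha becomes A' -> alpha A'
                List<String> rest = new ArrayList<>(alt.subList(1, alt.size()));
                rest.add(fresh);
                loop.add(rest);
            } else {
                // A -> beta becomes A -> beta A'
                List<String> beta = new ArrayList<>(alt);
                beta.add(fresh);
                base.add(beta);
            }
        }
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (loop.isEmpty()) {                            // no left recursion: keep the rule
            result.put(a, alternatives);
            return result;
        }
        loop.add(List.of());                             // A' -> epsilon
        result.put(a, base);
        result.put(fresh, loop);
        return result;
    }

    public static void main(String[] args) {
        // E -> E + T | T
        System.out.println(eliminate("E", List.of(List.of("E", "+", "T"), List.of("T"))));
    }
}
```

For E → E + T | T, main prints {E=[[T, E']], E'=[[+, T, E'], []]}, that is E → T E' and E' → + T E' | ε.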

Not all context-free grammars are LL(k).
Pop-Quiz: apply left-recursion elimination to G:
S → E
E → E + T | E - T | T
T → T * F | F
F → ( E ) | id
Solution G’:
S → E
E → T E’
E’ → + T E’ | - T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
Pop-Quiz: use G and G’ to construct parse trees for the input string “a-b-c”.

Not all context-free grammars are LL(k).
Consider the following grammar G:
stmt → if expr then stmt end ;
     | if expr then stmt else stmt end ;
The parser would need to look over an arbitrarily long common prefix to find the distinguishing token.
Fix by left factoring: replace each rule A → αβ | αγ by two new rules A → αA’ and A’ → β | γ (where A’ is a fresh non-terminal).
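Left factoring can be sketched in the same style. The version below makes a simplifying assumption: it factors out only the longest prefix shared by all alternatives of one non-terminal, whereas a complete implementation would group alternatives by their common prefixes and repeat. The names LeftFactoring, commonPrefixLength and factor are illustrative, and the fresh name A' is assumed to be unused.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeftFactoring {
    /** Length of the longest prefix shared by all alternatives. */
    static int commonPrefixLength(List<List<String>> alts) {
        if (alts.isEmpty()) return 0;
        int n = 0;
        while (true) {
            for (List<String> alt : alts) {
                // stop at the first position where an alternative ends or differs
                if (n >= alt.size() || !alt.get(n).equals(alts.get(0).get(n))) return n;
            }
            n++;
        }
    }

    /** Rewrites A -> alpha b1 | ... | alpha bn into A -> alpha A' and A' -> b1 | ... | bn. */
    static Map<String, List<List<String>>> factor(String a, List<List<String>> alts) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        int n = alts.size() < 2 ? 0 : commonPrefixLength(alts);
        if (n == 0) {                                    // nothing to factor
            result.put(a, alts);
            return result;
        }
        String fresh = a + "'";                          // assumed to be an unused name
        List<String> prefix = new ArrayList<>(alts.get(0).subList(0, n));
        prefix.add(fresh);                               // A -> alpha A'
        List<List<String>> rests = new ArrayList<>();
        for (List<String> alt : alts) {
            rests.add(new ArrayList<>(alt.subList(n, alt.size())));   // A' -> b_i
        }
        result.put(a, List.of(prefix));
        result.put(fresh, rests);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factor("stmt", List.of(
            List.of("if", "expr", "then", "stmt", "end", ";"),
            List.of("if", "expr", "then", "stmt", "else", "stmt", "end", ";"))));
    }
}
```

For the if-then-else grammar this yields stmt → if expr then stmt stmt' and stmt' → end ; | else stmt end ;, so a single token (end or else) now selects the alternative.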

Developing an LL(k) check on rules (k=1)
A grammar is trivially LL(1) if all alternatives of each non-terminal start with a different token:
stmt → ( stmt tail | if ( expr ) stmt else stmt | print expr
tail → ; stmt tail | )
expr → ID

Developing an LL(k) check on rules (k=1)
But what happens if there are non-terminals?
stmt → ( stmt tail | if ( expr ) stmt else stmt | call
tail → ; stmt tail | )
expr → ID
call → print expr | open ID | close ID
We need to check the first token of all possible right-hand sides for call. Here these are print, open and close, which are disjoint from the first tokens of the other alternatives of stmt, so this is ok.

Computing FIRST(γ)
For a grammar without ε-productions, this is straightforward:
– FIRST(s) = { s } for s ∈ T
– FIRST(A) = ⋃ FIRST(β) over all productions A → β
– FIRST(Aα) = FIRST(A) and FIRST(αA) = FIRST(α) (for non-empty α)
ε-productions make things more complicated. E.g., for
S → A x
A → z A | ε
we get FIRST(A) = { z } but FIRST(S) = { x, z }.

Another problem with ε
Grammar:
FDef → Fun ( Arg ) : Type ;
Arg → ID : Type ; Arg
Arg → ε
Type → integer
Type → char
Type → boolean
Fun → function ID
Input: function ID ( ) : char ;
FIRST is no longer sufficient!
– Derivation: FDef ⇒ Fun ( Arg ) : Type ; ⇒ function ID ( Arg ) : Type ;
– Which production for Arg? FIRST(ID : Type ; Arg) = {ID} and FIRST(ε) = {}, but the next token is “)”.
– Report an error in the input? No: Arg is nullable and “)” can follow it.

Ingredients for RD parsing:
nullable(X):
– true if non-terminal X can produce ε
FIRST(γ):
– set of terminal symbols that can begin any string produced by γ
FOLLOW(X):
– set of terminal symbols t that can immediately follow X; i.e., there exists a derivation from the start symbol to a string α X t β

Computing nullable(X)
for all symbols X do nullable(X) := false;
repeat
  change := false;
  for every production X → s_1 … s_k do
    if nullable(X) = false and s_1 … s_k are all nullable   (true if k = 0!)
    then { nullable(X) := true; change := true; }
until change = false;
Example grammar:
Z → d        Y → ε    X → Y
Z → X Y Z    Y → c    X → a
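The fixed-point iteration for nullable translates almost line by line into Java. In this sketch a production is encoded as a string array whose first element is the left-hand side (an encoding chosen here for brevity, not taken from the slides), and the class name Nullable and the constant GRAMMAR are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class Nullable {
    // Each production is {lhs, s1, ..., sk}; {"Y"} encodes Y -> epsilon (k = 0).
    static final String[][] GRAMMAR = {
        {"Z", "d"}, {"Y"}, {"X", "Y"}, {"Z", "X", "Y", "Z"}, {"Y", "c"}, {"X", "a"}};

    /** Returns the set of nullable non-terminals. */
    static Set<String> nullable(String[][] productions) {
        Set<String> nullable = new TreeSet<>();          // initially nothing is nullable
        boolean change;
        do {
            change = false;
            for (String[] p : productions) {
                List<String> rhs = Arrays.asList(p).subList(1, p.length);
                // containsAll is trivially true for an empty right-hand side (k = 0);
                // terminals are never in the set, so they block nullability
                if (!nullable.contains(p[0]) && nullable.containsAll(rhs)) {
                    nullable.add(p[0]);
                    change = true;
                }
            }
        } while (change);
        return nullable;
    }

    public static void main(String[] args) {
        System.out.println(nullable(GRAMMAR));
    }
}
```

For the example grammar main prints [X, Y]: Y is nullable because of Y → ε, X because of X → Y, and Z is not nullable because both of its productions contain a non-nullable symbol.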

Extending nullable to strings
Very straightforward:
– nullable(s) = false for s ∈ T
– nullable(ε) = true
– nullable(s_1 s_2 … s_k) = nullable(s_1) ∧ nullable(s_2) ∧ … ∧ nullable(s_k)

Computing FIRST
for all non-terminals X do FIRST(X) := {};
for all terminals t do FIRST(t) := {t};
repeat
  for every production X → s_1 … s_k with k ≥ 1 do {
    FIRST(X) := FIRST(X) ∪ FIRST(s_1);
    for i := 1 to k-1 do
      if nullable(s_i) then FIRST(X) := FIRST(X) ∪ FIRST(s_{i+1});
      else exit;
  }
until no more changes;
Example grammar:
Z → d        Y → ε    X → Y
Z → X Y Z    Y → c    X → a
Result: FIRST(Y) = {c}, FIRST(X) = {a,c}, FIRST(Z) = {a,c,d}
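A possible Java rendering of the FIRST computation is shown below. To keep the sketch short, the nullable set {X, Y} of the example grammar is supplied as a constant instead of being recomputed, and productions are encoded as string arrays whose first element is the left-hand side; both choices, as well as the names FirstSets and first, are assumptions of this sketch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FirstSets {
    // Each production is {lhs, s1, ..., sk}; {"Y"} encodes Y -> epsilon.
    static final String[][] GRAMMAR = {
        {"Z", "d"}, {"Y"}, {"X", "Y"}, {"Z", "X", "Y", "Z"}, {"Y", "c"}, {"X", "a"}};
    static final Set<String> NULLABLE = Set.of("X", "Y");   // given, not recomputed here

    /** Returns FIRST for every grammar symbol (terminals map to themselves). */
    static Map<String, Set<String>> first(String[][] productions, Set<String> nullable) {
        Map<String, Set<String>> first = new TreeMap<>();
        for (String[] p : productions) {
            first.put(p[0], new TreeSet<>());                // FIRST(X) := {} for non-terminals
        }
        for (String[] p : productions) {
            for (int i = 1; i < p.length; i++) {
                // every symbol without an entry so far is a terminal: FIRST(t) := {t}
                first.computeIfAbsent(p[i], t -> new TreeSet<>(Set.of(t)));
            }
        }
        boolean changed;
        do {
            changed = false;
            for (String[] p : productions) {
                // add FIRST(s_1), then FIRST(s_{i+1}) as long as s_1 ... s_i are nullable
                for (int i = 1; i < p.length; i++) {
                    changed |= first.get(p[0]).addAll(first.get(p[i]));
                    if (!nullable.contains(p[i])) break;
                }
            }
        } while (changed);
        return first;
    }

    public static void main(String[] args) {
        System.out.println(first(GRAMMAR, NULLABLE));
    }
}
```

The loop over i never runs for an ε-production, so the k = 0 case needs no special handling in this formulation.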

Extending FIRST(γ) to strings
FIRST(Xγ) = FIRST(X) if not nullable(X)
FIRST(Xγ) = FIRST(X) ∪ FIRST(γ) if nullable(X)
Example:
FIRST(XYZ) = {a,c} ∪ FIRST(YZ)
           = {a,c} ∪ {c} ∪ FIRST(Z)
           = {a,c} ∪ {c} ∪ {a,c,d}
           = {a,c,d}
Example grammar:
Z → d        Y → ε    X → Y
Z → X Y Z    Y → c    X → a
FIRST: Y: {c}, X: {a,c}, Z: {a,c,d}

Computing FOLLOW
for all non-terminals X do FOLLOW(X) := {};
repeat
  for every non-terminal X do
    for each production of the form A → α X β do {
      [where α and β are possibly empty strings of terminals and non-terminals]
      FOLLOW(X) := FOLLOW(X) ∪ FIRST(β);
      if nullable(β) then FOLLOW(X) := FOLLOW(X) ∪ FOLLOW(A);
    }
until no more changes;
Example grammar:
Z → d        Y → ε    X → Y
Z → X Y Z    Y → c    X → a
FIRST: Y: {c}, X: {a,c}, Z: {a,c,d}
Result: FOLLOW(Y) = {a,c,d}, FOLLOW(X) = {a,c,d}, FOLLOW(Z) = {}
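The FOLLOW iteration can be sketched in Java along the same lines. Here the nullable set and the FIRST sets of the example grammar are hard-coded as inputs (taken from the values listed on the slides), FIRST(β) is accumulated symbol by symbol rather than through a separate helper, and no end-of-input marker is added, which matches the slides' result FOLLOW(Z) = {}. The names FollowSets and follow are illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FollowSets {
    // Each production is {lhs, s1, ..., sk}; {"Y"} encodes Y -> epsilon.
    static final String[][] GRAMMAR = {
        {"Z", "d"}, {"Y"}, {"X", "Y"}, {"Z", "X", "Y", "Z"}, {"Y", "c"}, {"X", "a"}};
    // Inputs assumed to be known already (values from the example).
    static final Set<String> NULLABLE = Set.of("X", "Y");
    static final Map<String, Set<String>> FIRST = Map.of(
        "X", Set.of("a", "c"), "Y", Set.of("c"), "Z", Set.of("a", "c", "d"),
        "a", Set.of("a"), "c", Set.of("c"), "d", Set.of("d"));

    /** Returns FOLLOW for every non-terminal of the grammar. */
    static Map<String, Set<String>> follow(String[][] productions) {
        Map<String, Set<String>> follow = new TreeMap<>();
        for (String[] p : productions) {
            follow.put(p[0], new TreeSet<>());               // FOLLOW(X) := {}
        }
        boolean changed;
        do {
            changed = false;
            for (String[] p : productions) {                 // production A -> alpha X beta
                for (int i = 1; i < p.length; i++) {
                    if (!follow.containsKey(p[i])) continue; // terminals have no FOLLOW set
                    boolean betaNullable = true;             // beta = p[i+1 .. k]
                    for (int j = i + 1; j < p.length && betaNullable; j++) {
                        // FIRST(beta): add FIRST of each symbol while the prefix is nullable
                        changed |= follow.get(p[i]).addAll(FIRST.get(p[j]));
                        betaNullable = NULLABLE.contains(p[j]);
                    }
                    if (betaNullable) {                      // X can end A: inherit FOLLOW(A)
                        changed |= follow.get(p[i]).addAll(follow.get(p[0]));
                    }
                }
            }
        } while (changed);
        return follow;
    }

    public static void main(String[] args) {
        System.out.println(follow(GRAMMAR));
    }
}
```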

Constructing the parser
Make table of applicable productions:
– Rows: non-terminals X
– Columns: terminal symbols c (= next input token)
– Production X → γ is applicable iff c ∈ FIRST(γ), or nullable(γ) and c ∈ FOLLOW(X)
If more than one production is applicable for some pair (X, c), the grammar is not LL(1).
[Figure: parse tree rooted at S in which X is expanded to γ, above the input tokens t_1 … t_k with next token c]

Example
Grammar:
Z → d        Y → ε    X → Y
Z → X Y Z    Y → c    X → a
FIRST: Y: {c}, X: {a,c}, Z: {a,c,d}
FOLLOW: Y: {a,c,d}, X: {a,c,d}, Z: {}
Predictive parsing table:
      a               c               d
X     X → a, X → Y    X → Y           X → Y
Y     Y → ε           Y → ε, Y → c    Y → ε
Z     Z → XYZ         Z → XYZ         Z → XYZ, Z → d
Conflicts in the cells (X, a), (Y, c) and (Z, d) => not LL(1)
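Table construction and the LL(1) check can be sketched as follows. The nullable, FIRST and FOLLOW results for the example grammar are hard-coded as inputs (values from the slides), cells are keyed by a string such as "X,a", and the names Ll1Table, lookaheads and table are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class Ll1Table {
    // Each production is {lhs, s1, ..., sk}; {"Y"} encodes Y -> epsilon.
    static final String[][] GRAMMAR = {
        {"Z", "d"}, {"Y"}, {"X", "Y"}, {"Z", "X", "Y", "Z"}, {"Y", "c"}, {"X", "a"}};
    // Inputs assumed to be known already (values from the example).
    static final Set<String> NULLABLE = Set.of("X", "Y");
    static final Map<String, Set<String>> FIRST = Map.of(
        "X", Set.of("a", "c"), "Y", Set.of("c"), "Z", Set.of("a", "c", "d"),
        "a", Set.of("a"), "c", Set.of("c"), "d", Set.of("d"));
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "X", Set.of("a", "c", "d"), "Y", Set.of("a", "c", "d"), "Z", Set.of());

    /** Terminals c for which production p is applicable. */
    static Set<String> lookaheads(String[] p) {
        Set<String> result = new TreeSet<>();
        boolean rhsNullable = true;
        for (int i = 1; i < p.length && rhsNullable; i++) {
            result.addAll(FIRST.get(p[i]));                  // c in FIRST(rhs)
            rhsNullable = NULLABLE.contains(p[i]);
        }
        if (rhsNullable) result.addAll(FOLLOW.get(p[0]));    // nullable rhs: c in FOLLOW(lhs)
        return result;
    }

    /** Maps each non-empty cell "X,c" to the productions applicable there. */
    static Map<String, List<String>> table() {
        Map<String, List<String>> table = new TreeMap<>();
        for (String[] p : GRAMMAR) {
            String rhs = p.length == 1 ? "epsilon"
                : String.join(" ", Arrays.copyOfRange(p, 1, p.length));
            for (String c : lookaheads(p)) {
                table.computeIfAbsent(p[0] + "," + c, k -> new ArrayList<>())
                     .add(p[0] + " -> " + rhs);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // A cell with more than one production is a conflict: the grammar is not LL(1).
        table().forEach((cell, prods) ->
            System.out.println(cell + ": " + prods + (prods.size() > 1 ? "  <- conflict" : "")));
    }
}
```

For the example grammar the cells X,a, Y,c and Z,d each receive two productions, which is exactly the conflict pattern in the table above.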

Panic Mode Error Recovery

Recursive Descent Parsing

Recursive descent parsing implements the productions directly as methods.
For every non-terminal:
– One procedure, with a switch on next token
– One case per production
Grammar:
S → if E then S else S
S → begin S L
S → print E
L → end
L → ; S L
E → num = num
void S() {
  switch (tok) {
    case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
    case BEGIN: eat(BEGIN); S(); L(); break;
    case PRINT: eat(PRINT); E(); break;
    default:    error();
  }
}
void eat(int ex) {
  // shorthand from the slide: a real parser would fetch the next token from the lexer here
  if (tok == ex) tok = System.in.read();
  else { System.out.print("expected: "); System.out.print(ex); ... }
}
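A self-contained variant of this recursive descent parser might look as follows. It is a sketch under assumptions: tokens are plain strings supplied in a list (no lexer), errors are reported by throwing an exception, and the procedures for L and E, which the slide does not show, are filled in following the same one-case-per-production pattern. The names RecursiveDescent and accepts are illustrative.

```java
import java.util.List;

class RecursiveDescent {
    private final List<String> tokens;
    private int pos = 0;

    RecursiveDescent(List<String> tokens) { this.tokens = tokens; }

    /** Current token, or "EOF" once the input is exhausted. */
    private String tok() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void eat(String expected) {
        if (!tok().equals(expected)) {
            throw new IllegalStateException("expected: " + expected + ", found: " + tok());
        }
        pos++;
    }

    // S -> if E then S else S | begin S L | print E
    void S() {
        switch (tok()) {
            case "if":    eat("if"); E(); eat("then"); S(); eat("else"); S(); break;
            case "begin": eat("begin"); S(); L(); break;
            case "print": eat("print"); E(); break;
            default: throw new IllegalStateException("unexpected token: " + tok());
        }
    }

    // L -> end | ; S L
    void L() {
        switch (tok()) {
            case "end": eat("end"); break;
            case ";":   eat(";"); S(); L(); break;
            default: throw new IllegalStateException("unexpected token: " + tok());
        }
    }

    // E -> num = num
    void E() { eat("num"); eat("="); eat("num"); }

    /** True iff the whole token list derives from S. */
    static boolean accepts(List<String> tokens) {
        RecursiveDescent parser = new RecursiveDescent(tokens);
        try {
            parser.S();
            return parser.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("begin", "print", "num", "=", "num",
            ";", "print", "num", "=", "num", "end")));
    }
}
```

The call stack of the Java methods plays the role of the parse stack here, which is the implicit variant of the stack mentioned earlier.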

Further Reading
Terence Parr, Kathleen Fisher: LL(*): the foundation of the ANTLR parser generator. PLDI 2011: ⇒ describes theory behind ANTLR