Syntax Analysis (chapter 4) SLR, LR1, LALR: Improving the parser From the previous examples: => LR0 parsing is rather weak. Cannot handle many languages.

Presentation transcript:

SLR, LR1, LALR: Improving the parser
From the previous examples: LR0 parsing is rather weak; it cannot handle many languages. Why? And how can we improve it?
Example (recap):
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
This grammar has an LR0 shift-reduce conflict in the state containing:
    T ::= i •
    T ::= i • [ E ]
Q: How could we decide between the two actions?

SLR: an improved LR0 parser
LR0 parsing is rather weak; it cannot handle many languages. Why? Because it uses a lookahead of 0 tokens to determine the next action, i.e. the parser-action decision is based only on the parser state.
Example (recap):
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
Conflict state:
    T ::= i •
    T ::= i • [ E ]
Example: Which tokens can follow a T?
SLR parsing: use the LR0 GOTO table, but only reduce to a non-terminal if the next input symbol can follow that non-terminal.
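The question of which tokens can follow a T can be answered mechanically by computing FOLLOW sets. The Java sketch below does this for the example grammar. The class name SlrFollow and its helper methods are illustrative assumptions, not part of the slides, and the FIRST computation relies on this grammar having no empty productions.

```java
import java.util.*;

final class SlrFollow {
    // Example grammar from the slides; each row is {lhs, rhs symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S", "E", "$"},
        {"E", "T"},
        {"E", "E", "+", "T"},
        {"T", "i"},
        {"T", "(", "E", ")"},
        {"T", "i", "[", "E", "]"},
    };
    static final Set<String> NONTERMINALS = new HashSet<>(Arrays.asList("S", "E", "T"));

    // FIRST sets by fixpoint iteration. Assumption: no empty productions,
    // so only the first right-hand-side symbol of each production matters.
    static Map<String, Set<String>> first() {
        Map<String, Set<String>> first = new HashMap<>();
        for (String n : NONTERMINALS) first.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                String x = p[1];
                Set<String> add = NONTERMINALS.contains(x) ? first.get(x) : Collections.singleton(x);
                changed |= first.get(p[0]).addAll(new ArrayList<>(add));
            }
        }
        return first;
    }

    // FOLLOW sets by fixpoint iteration over every non-terminal occurrence.
    static Map<String, Set<String>> follow() {
        Map<String, Set<String>> first = first();
        Map<String, Set<String>> follow = new HashMap<>();
        for (String n : NONTERMINALS) follow.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                for (int k = 1; k < p.length; k++) {
                    if (!NONTERMINALS.contains(p[k])) continue;
                    Set<String> add;
                    if (k + 1 < p.length) {
                        // Something follows p[k] in this production: use its FIRST set.
                        String next = p[k + 1];
                        add = NONTERMINALS.contains(next) ? first.get(next) : Collections.singleton(next);
                    } else {
                        // p[k] ends the production: whatever follows the lhs follows p[k].
                        add = follow.get(p[0]);
                    }
                    changed |= follow.get(p[k]).addAll(new ArrayList<>(add));
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        System.out.println("FOLLOW(T) = " + follow().get("T"));
    }
}
```

For this grammar FOLLOW(T) contains $, ), + and ], but not [. So in the state with T ::= i • and T ::= i • [ E ], an SLR parser shifts on [ and reduces on the other four tokens.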

LR1: LR items with lookahead information
SLR parsing is better than LR0, but still rather weak; it still cannot handle many languages. Why? The lookahead decision is based on what can follow a particular non-terminal anywhere in the grammar, and does not take the context into account.
LR1 parsing: similar to LR0, but based on LR1 items. An LR1 item looks like this:
    N ::= α • β {lookahead set}
In the next slides we'll construct a small part of the LR1 items of our simple example expression language as we discuss a parsing example.

Notation annoyances
LR1 items have three different concrete syntaxes:
    N ::= α • β { x y z }
    N ::= α • β , x y z
    N ::= α • β , x
    N ::= α • β , y
    N ::= α • β , z      (one item per lookahead token)
Why? Historical accident / different views.
What to do? Live with it!

LR1: Example
Recap: the grammar
    S ::= E $
    E ::= T | E + T
    T ::= i | ( E ) | i [ E ]
Q: What is the starting set s0 of the handle-matching DFA?
Start item and epsilon closure:
    S ::= • E $ {$}
    E ::= • T {$}
    E ::= • E + T {$}
    T ::= • i {$}
    T ::= • ( E ) {$}
    T ::= • i [ E ] {$}
(These are the lookaheads after a first pass. Closing over E ::= • E + T also adds + to the lookaheads of the E and T items, which gives the {$ +} sets used in the parsing example.)

Epsilon Closure for LR1 Items
Algorithm: For each item of the form
    M ::= α • N β {δ}
find each grammar production of the form
    N ::= γ
and add a new item
    N ::= • γ {starters[β {δ}]}
Repeat this until no more new items can be added.

{starters[β {δ}]} ???
starters[β {δ}] is the union, over all elements s in δ, of starters[β s].
Example:
    starters[β {x y z}] = starters[β x] ∪ starters[β y] ∪ starters[β z]
Note that these lookahead sets will always have at least one terminal in them. Why?
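The closure rule with its starters lookaheads can be sketched in Java as follows. This is a minimal illustration for the example expression grammar only: the class name Lr1Closure, the encoding of an item as a (production index, dot position) pair, and the helper names are all assumptions made for this sketch. It also assumes the grammar has no empty productions, so starters of a non-empty β is just the starters of its first symbol.

```java
import java.util.*;

final class Lr1Closure {
    // Example grammar; each row is {lhs, rhs symbols...}.
    static final String[][] PRODUCTIONS = {
        {"S", "E", "$"},
        {"E", "T"},
        {"E", "E", "+", "T"},
        {"T", "i"},
        {"T", "(", "E", ")"},
        {"T", "i", "[", "E", "]"},
    };
    static final Set<String> NONTERMINALS = new HashSet<>(Arrays.asList("S", "E", "T"));

    // starters (FIRST) of each non-terminal; valid because nothing derives the empty string.
    static Map<String, Set<String>> first() {
        Map<String, Set<String>> first = new HashMap<>();
        for (String n : NONTERMINALS) first.put(n, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                Set<String> add = NONTERMINALS.contains(p[1]) ? first.get(p[1]) : Collections.singleton(p[1]);
                changed |= first.get(p[0]).addAll(new ArrayList<>(add));
            }
        }
        return first;
    }

    // An item core is [production index, dot]; the dot sits before p[dot] (p[0] is the lhs).
    // The map value is the item's lookahead set.
    static Map<List<Integer>, Set<String>> closure(Map<List<Integer>, Set<String>> kernel) {
        Map<String, Set<String>> first = first();
        Map<List<Integer>, Set<String>> items = new LinkedHashMap<>();
        for (Map.Entry<List<Integer>, Set<String>> e : kernel.entrySet())
            items.put(e.getKey(), new TreeSet<>(e.getValue()));
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot because the loop body may add new items.
            for (List<Integer> core : new ArrayList<>(items.keySet())) {
                String[] p = PRODUCTIONS[core.get(0)];
                int dot = core.get(1);
                if (dot >= p.length || !NONTERMINALS.contains(p[dot])) continue;
                Set<String> look;
                if (dot + 1 < p.length) {
                    // beta is non-empty: its starters do not depend on the item's lookahead.
                    String next = p[dot + 1];
                    look = NONTERMINALS.contains(next) ? first.get(next) : Collections.singleton(next);
                } else {
                    // beta is empty: the item's own lookahead set carries over.
                    look = new TreeSet<>(items.get(core));
                }
                for (int q = 0; q < PRODUCTIONS.length; q++) {
                    if (!PRODUCTIONS[q][0].equals(p[dot])) continue;
                    changed |= items.computeIfAbsent(Arrays.asList(q, 1), k -> new TreeSet<>()).addAll(look);
                }
            }
        }
        return items;
    }

    // Start state: closure of S ::= . E $ with lookahead {$}.
    static Map<List<Integer>, Set<String>> s0() {
        Map<List<Integer>, Set<String>> kernel = new LinkedHashMap<>();
        kernel.put(Arrays.asList(0, 1), Collections.singleton("$"));
        return closure(kernel);
    }

    // Render an item in the "N ::= alpha . beta [lookaheads]" style.
    static String show(List<Integer> core, Set<String> look) {
        String[] p = PRODUCTIONS[core.get(0)];
        StringBuilder sb = new StringBuilder(p[0]).append(" ::=");
        for (int k = 1; k <= p.length; k++) {
            if (k == core.get(1)) sb.append(" .");
            if (k < p.length) sb.append(' ').append(p[k]);
        }
        return sb.append(' ').append(look).toString();
    }

    public static void main(String[] args) {
        for (Map.Entry<List<Integer>, Set<String>> e : s0().entrySet())
            System.out.println(show(e.getKey(), e.getValue()));
    }
}
```

Running the closure on the single start item yields six items: the S item keeps {$}, while every E and T item ends up with {$ +}, because the item E ::= • E + T contributes + as a starter of what follows E.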

LR1: Parsing Example
Stack: s0        Remaining input: i [ i ] $
s0:
    S ::= • E $ {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
Q: What state do we arrive in after shifting an i token?

LR1: Example
Stack: s0 i s1        Remaining input: [ i ] $
s1:
    T ::= i • {$ +}
    T ::= i • [ E ] {$ +}
Q: Shift or reduce? (Is there a shift-reduce conflict here?)

LR1: Example
Stack: s0 i s1 [ s2        Remaining input: i ] $
s1:
    T ::= i • {$ +}
    T ::= i • [ E ] {$ +}
s2:
    T ::= i [ • E ] {$ +}

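LR1: Example
Stack: s0 i s1 [ s2 i s3        Remaining input: ] $
s2:
    T ::= i [ • E ] {$ +}
    E ::= • T {]}
    E ::= • E + T {]}
    T ::= • i {]}
    T ::= • ( E ) {]}
    T ::= • i [ E ] {]}
s3:
    T ::= i • {]}
    T ::= i • [ E ] {]}
(Strictly, closing over E ::= • E + T also adds + to the lookaheads of the E and T items in s2 and its successors, giving {] +}. The slides show only ], which is all this example input needs.)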

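LR1: Example
Stack: s0 i s1 [ s2 T s4        Remaining input: ] $
(The inner i has been reduced by T ::= i, so T is now the parent of that i in the parse tree.)
s2:
    T ::= i [ • E ] {$ +}
    E ::= • T {]}
    E ::= • E + T {]}
    T ::= • i {]}
    T ::= • ( E ) {]}
    T ::= • i [ E ] {]}
s4:
    E ::= T • {]}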

LR1: Example
Stack: s0 i s1 [ s2 E s5        Remaining input: ] $
(T has been reduced by E ::= T; the parse tree so far is E over T over i.)
s2:
    T ::= i [ • E ] {$ +}
    E ::= • T {]}
    E ::= • E + T {]}
    T ::= • i {]}
    T ::= • ( E ) {]}
    T ::= • i [ E ] {]}
s5:
    T ::= i [ E • ] {$ +}
    E ::= E • + T {]}

LR1: Example
Stack: s0 i s1 [ s2 E s5 ] s6        Remaining input: $
s5:
    T ::= i [ E • ] {$ +}
    E ::= E • + T {]}
s6:
    T ::= i [ E ] • {$ +}

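LR1: Example
Stack: s0 T s7        Remaining input: $
(i [ E ] has been reduced by T ::= i [ E ]; the parse tree so far is T over i [ E ], with the inner E over T over i.)
s0:
    S ::= • E $ {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
s7:
    E ::= T • {$ +}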

LR1: Example
Stack: s0 E s8        Remaining input: $
(T has been reduced by E ::= T.)
s0:
    S ::= • E $ {$}
    E ::= • T {$ +}
    E ::= • E + T {$ +}
    T ::= • i {$ +}
    T ::= • ( E ) {$ +}
    T ::= • i [ E ] {$ +}
s8:
    S ::= E • $ {$}
    E ::= E • + T {$ +}

LR1: Example
Stack: s0 S        The input i [ i ] $ has been fully parsed.
Resulting parse tree, in bracketed form:
    S( E( T( i [ E( T( i ) ) ] ) ) $ )

LR1 and LALR
LR1 parsers are very powerful: it can be proven theoretically that they are the most powerful bottom-up parsers possible with one lookahead token. But they have very big parsing tables; for a normal programming language the order of magnitude is megabytes. SLR and LR0 tables are only on the order of 100 KB, but those parsers are not very strong.
LALR comes to the rescue! The LALR algorithm is based on LR1 but reduces the number of states of the automaton to the same number as LR0 and SLR, and so needs less memory.

LR1 and LALR
Example grammar:
    S ::= A | xB
    A ::= aAb | B
    B ::= x
The LR0 automaton: picture handed out in class.
The LR1 automaton: picture handed out in class.
The LALR1 automaton: construct this yourself.

LR1 and LALR
The LR1 automaton: picture handed out in class.
Note that the automaton has several states which look very similar: the states are identical except for the lookahead sets.
Definition: The core LR0 set of an LR1 item set is the set of LR0 items obtained by removing the lookaheads of the LR1 items.
Example:
    LR1 items:
        T ::= i [ E • ] {$ +}
        E ::= E • + T {]}
    Core LR0 items:
        T ::= i [ E • ]
        E ::= E • + T

LR1 and LALR
An LALR automaton can be obtained from an LR1 automaton by "merging" all states which have the same core items into a single state; the lookahead set of each merged item is the union of the lookahead sets it had in the original states.
    => The LALR automaton has precisely the same number of states as the LR0 automaton!
It is possible that this introduces conflicts, but in practice it almost never does. LALR is now the most widely used algorithm.
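The merging step can be sketched in a few lines of Java. This is only an illustration under stated assumptions: a state is modelled as a map from the text of an LR0 core item to its lookahead set, the dot is written as an ASCII period, and the names LalrMerge, merge and state are hypothetical. Redirecting the automaton's transitions to the merged states, which a real generator must also do, is left out.

```java
import java.util.*;

final class LalrMerge {
    // Merge LR1 states that share the same core, taking the union of the
    // lookahead sets item by item. States keep first-seen order.
    static List<Map<String, Set<String>>> merge(List<Map<String, Set<String>>> lr1States) {
        Map<Set<String>, Map<String, Set<String>>> byCore = new LinkedHashMap<>();
        for (Map<String, Set<String>> state : lr1States) {
            // The core of a state is its set of items with lookaheads removed.
            Map<String, Set<String>> merged =
                byCore.computeIfAbsent(new TreeSet<>(state.keySet()), k -> new TreeMap<>());
            for (Map.Entry<String, Set<String>> item : state.entrySet())
                merged.computeIfAbsent(item.getKey(), k -> new TreeSet<>()).addAll(item.getValue());
        }
        return new ArrayList<>(byCore.values());
    }

    // Hypothetical helper: builds a state whose items all share one
    // space-separated lookahead list.
    static Map<String, Set<String>> state(String lookaheads, String... cores) {
        Map<String, Set<String>> s = new TreeMap<>();
        for (String core : cores)
            s.put(core, new TreeSet<>(Arrays.asList(lookaheads.split(" "))));
        return s;
    }

    public static void main(String[] args) {
        List<Map<String, Set<String>>> lr1 = Arrays.asList(
            state("$ +", "T ::= i .", "T ::= i . [ E ]"),  // s1 from the parsing example
            state("]", "T ::= i .", "T ::= i . [ E ]"),    // s3 from the parsing example
            state("]", "E ::= T ."));                      // s4 from the parsing example
        for (Map<String, Set<String>> s : merge(lr1)) System.out.println(s);
    }
}
```

With the three states shown, s1 and s3 have the same core and collapse into one state whose items carry the lookaheads $, + and ], while s4 stays on its own, so three LR1 states become two.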

Parser Conflict Resolution
Most programming language grammars are LR1. But in practice one still encounters grammars which have parsing conflicts; a common cause is an ambiguous grammar. Ambiguous grammars always have parsing conflicts (because they are ambiguous, this is unavoidable).
In practice, parser generators still generate a parser for such grammars, using a "resolution rule" to resolve parsing conflicts deterministically.
    => The resolution rule may or may not do what you want or expect.
    => You will get a warning message. If you know what you are doing, this can be ignored. Otherwise, try to solve the conflict by disambiguating the grammar.

Parser Conflict Resolution
Example (from the Mini-Triangle grammar):
    single-Command ::= if Expression then single-Command
                     | if Expression then single-Command else single-Command
The input
    if a then if b then c1 else c2
has two possible parse trees. This parse tree?
    if a then ( if b then c1 else c2 )
Or this one?
    if a then ( if b then c1 ) else c2

Parser Conflict Resolution
Example: the "dangling-else" problem (from the Mini-Triangle grammar)
    single-Command ::= if Expression then single-Command
                     | if Expression then single-Command else single-Command
LR1 items (in some state of the parser):
    sC ::= if E then sC • {… else …}
    sC ::= if E then sC • else sC {…}
Resolution rule: shift has priority over reduce.
Q: Does this resolution rule solve the conflict? What is its effect on the parse tree?
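One way to see the effect of preferring shift is to imitate it by hand. The Java sketch below is not an LR parser: it is a small recursive-descent recognizer, swapped in for illustration, that consumes an else as soon as one is available, which corresponds to choosing shift over reduce in the conflict state. The class name DanglingElse, the whitespace-separated token format, and the use of single identifiers for expressions and basic commands are all assumptions made for this sketch.

```java
import java.util.*;

final class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    private DanglingElse(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Next token without consuming it; "$" marks end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "$";
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String token) {
        String got = next();
        if (!got.equals(token))
            throw new IllegalStateException("expected " + token + " but got " + got);
    }

    // single-Command ::= if Expression then single-Command [ else single-Command ]
    //                  | identifier            (stand-in for any other command)
    // Returns the parse as a fully bracketed string.
    private String command() {
        if (!peek().equals("if")) return next();
        expect("if");
        String condition = next();
        expect("then");
        String thenPart = command();
        if (peek().equals("else")) {
            // "Shift": take the else now, for the innermost if that is still open.
            expect("else");
            String elsePart = command();
            return "(if " + condition + " then " + thenPart + " else " + elsePart + ")";
        }
        // "Reduce": no else available, so this if has no else branch.
        return "(if " + condition + " then " + thenPart + ")";
    }

    static String parse(String input) {
        return new DanglingElse(input).command();
    }

    public static void main(String[] args) {
        System.out.println(parse("if a then if b then c1 else c2"));
    }
}
```

For the input if a then if b then c1 else c2 this prints (if a then (if b then c1 else c2)): the else attaches to the nearest if, which is the tree that the shift-over-reduce rule selects.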

Parser Conflict Resolution
There is usually also a resolution rule for reduce-reduce conflicts; for example, the rule which appears first in the grammar description has priority.
Reduce-reduce conflicts usually mean there is a real problem with your grammar.
    => You need to fix it! Don't rely on the resolution rule!

JavaCUP: A LALR generator for Java
The syntactic analyzer is built from two generated parts:
    Definition of tokens (regular expressions) -> JFlex -> Java file: Scanner class (recognizes tokens)
    Grammar (BNF-like specification) -> JavaCUP -> Java file: Parser class (uses the Scanner to get tokens; parses the stream of tokens)

Example: Mini Scheme Parser
An implementation of a simplistic Scheme parser with CUP and JFlex:
    1) AST node representation
    2) JFlex scanner
    3) CUP parser

Example: Mini Scheme Parser
1) Mini Scheme AST node representation
    public abstract class Sexpr {
        public static SAtom nil = new SAtom("()");
        public SPair cons(Sexpr cdr) { return new SPair(... }
    }
    public class SAtom extends Sexpr {
        private String lexeme;
        public SAtom(String s) { lexeme = s; }
        ...
    }
    public class SPair extends Sexpr {
        private Sexpr car, cdr;
        ...
    }

Example: Mini Scheme Parser
2) JFlex scanner
    ... blah blah ...
    Ident = {ALPHA}({ALPHA}|{DIGIT}|_)*
    %%
    "(" { return token(sym.LPAREN); }
    ")" { return token(sym.RPAREN); }
    "'" { return token(sym.QUOTE); }
    "." { return token(sym.PERIOD); }
    "+" { return token(sym.PLUS); }
    ... blah blah ...
    {DIGIT}+ { return token(sym.NUMBER); }
    {Ident} { return token(sym.IDENTIFIER); }
    ... blah blah ...

Example: Mini Scheme Parser
3) CUP parser
    /* Simplified Scheme parser for CUP.
     * Copyright (C) 2000
     * Norman C. Hutchinson
     *
     * Modifications
     * Kris De Volder
     * => somewhat more object oriented representation
     *    for Sexprs
     */
    parser code {:
        ... declarations to be added in generated parser ...
    :};
    ... parser definitions ...

Example: Mini Scheme Parser
    ...
    parser code {:
        // This code is inserted in generated parser
        Scanner lexer;
        public parser(Scanner l) { this(); lexer = l; }
        ... blah blah ...
    :};
    scan with {: return lexer.next_token(); :};
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    scan with {: return lexer.next_token(); :};
    terminal Token IDENTIFIER;
    terminal Token MULT, EQ, LPAREN, RPAREN;
    terminal Token PLUS, MINUS, DIV;
    terminal Token LT, GT, LTEQ, GTEQ;
    terminal Token NOTEQ;
    terminal Token QUOTE;
    terminal Token NUMBER;
    terminal Token PERIOD;
    non terminal goal;
    non terminal Sexpr sexpr, sexprlist;
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    non terminal Sexpr sexpr, sexprlist;
    start with goal;
    goal ::=
        | goal sexpr:s {: System.out.println(s.toString()); :}
        ;
    sexpr ::= NUMBER:i {: RESULT = new SAtom(i.text); :}
        | IDENTIFIER:i {: RESULT = new SAtom(i.text); :}
        | MULT:i {: RESULT = new SAtom(i.text); :}
    ... parser definitions continued next page ...

Example: Mini Scheme Parser
    sexpr ::= NUMBER:i {: RESULT = new SAtom(i.text); :}
        ... blah blah ...
        | QUOTE sexpr:s {: RESULT = new SAtom("quote").cons(s.cons(Sexpr.nil)); :}
        | LPAREN sexpr:s sexprlist:sl RPAREN {: RESULT = s.cons(sl); :}
        | LPAREN RPAREN {: RESULT = Sexpr.nil; :}
        | LPAREN sexpr:left PERIOD sexpr:right RPAREN {: RESULT = left.cons(right); :}
        ;
    sexprlist ::= /*epsilon*/ {: RESULT = Sexpr.nil; :}
        | sexpr:l sexprlist:r {: RESULT = l.cons(r); :}
        ;
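To see what the semantic actions build, here is a self-contained Java sketch of the AST classes. The slides elide most of their bodies, so everything beyond the names Sexpr, SAtom, SPair, nil and cons is an assumption: in particular the SPair(car, cdr) constructor, the dotted-pair toString format, and the SexprDemo helper are hypothetical and may differ from the original code.

```java
abstract class Sexpr {
    static final SAtom nil = new SAtom("()");

    // Assumption: x.cons(y) builds the pair (x . y), as the grammar actions suggest.
    SPair cons(Sexpr cdr) {
        return new SPair(this, cdr);
    }
}

final class SAtom extends Sexpr {
    private final String lexeme;

    SAtom(String s) {
        lexeme = s;
    }

    @Override
    public String toString() {
        return lexeme;
    }
}

final class SPair extends Sexpr {
    private final Sexpr car, cdr;

    // Hypothetical constructor; the slides do not show it.
    SPair(Sexpr car, Sexpr cdr) {
        this.car = car;
        this.cdr = cdr;
    }

    // Hypothetical rendering: always print the fully dotted form.
    @Override
    public String toString() {
        return "(" + car + " . " + cdr + ")";
    }
}

final class SexprDemo {
    // Mirrors the action of the rule: sexpr ::= QUOTE sexpr:s
    static Sexpr quote(Sexpr s) {
        return new SAtom("quote").cons(s.cons(Sexpr.nil));
    }

    // Mirrors the right-recursive sexprlist rule: builds a nil-terminated chain.
    static Sexpr list(Sexpr... elems) {
        Sexpr result = Sexpr.nil;
        for (int k = elems.length - 1; k >= 0; k--) result = elems[k].cons(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quote(new SAtom("x")));
        System.out.println(list(new SAtom("a"), new SAtom("b")));
    }
}
```

Under these assumptions, the quoted atom 'x becomes (quote . (x . ())) and the list (a b) becomes (a . (b . ())), showing that the right-recursive sexprlist rule produces an ordinary nil-terminated chain of pairs.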

Example: Mini Scheme Parser
Driver class
    public static void main(String argv[]) {
        try {
            // try to scan and parse input files
            for (int i = 0; i < argv.length; i++) {
                Scanner s;
                parser p;
                ...
                s = new Scanner(...get input stream...);
                p = new parser(s);
                ...
                p.parse();
            }
        } catch ... all kinds of nasty exceptions ...
    }