Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.

Slides:



Advertisements
Similar presentations
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Advertisements

Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Compiler Principles Fall Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University.
Chapter 3 Syntax Analysis
Abstract Syntax Mooly Sagiv html:// 1.
1 JavaCUP JavaCUP (Construct Useful Parser) is a parser generator Produce a parser written in java, itself is also written in Java; There are many parser.
CSE 5317/4305 L4: Parsing #21 Parsing #2 Leonidas Fegaras.
Mooly Sagiv and Roman Manevich School of Computer Science
Cse321, Programming Languages and Compilers 1 6/12/2015 Lecture #10, Feb. 14, 2007 Modified sets of item construction Rules for building LR parse tables.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Chapter 5: Bottom-Up Parsing (Shift-Reduce). 2 - attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working.
Recap Mooly Sagiv. Outline Subjects Studied Questions & Answers.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
ML-YACC David Walker COS 320. Outline Last Week –Introduction to Lexing, CFGs, and Parsing Today: –More parsing: automatic parser generation via ML-Yacc.
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Context-Free Grammars Lecture 7
Compiler Construction Parsing Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom Up Parsing.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Compiler Construction Parsing II Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.
Bottom-up parsing Goal of parser : build a derivation
Parser construction tools: YACC
1 Abstract Syntax Tree--motivation The parse tree –contains too much detail e.g. unnecessary terminals such as parentheses –depends heavily on the structure.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Syntax and Semantics Structure of programming languages.
LEX and YACC work as a team
LR Parsing Compiler Baojian Hua
Automated Parser Generation (via CUP)CUP 1. High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser.
Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
Syntax and Semantics Structure of programming languages.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial) 1.
Chapter 5: Bottom-Up Parsing (Shift-Reduce)
Prof. Necula CS 164 Lecture 8-91 Bottom-Up Parsing LR Parsing. Parser Generators. Lecture 6.
Compilation /15a Lecture 5 1 * Syntax Analysis: Bottom-Up parsing Noam Rinetzky.
CS 614: Theory and Construction of Compilers Lecture 9 Fall 2002 Department of Computer Science University of Alabama Joel Jones.
Compiler Principles Fall Compiler Principles Lecture 6: Parsing part 5 Roman Manevich Ben-Gurion University.
CS 614: Theory and Construction of Compilers Lecture 7 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Compiler Principles Winter Compiler Principles Syntax Analysis (Parsing) – Part 3 Mayer Goldberg and Roman Manevich Ben-Gurion University.
Mooly Sagiv and Roman Manevich School of Computer Science
4. Bottom-up Parsing Chih-Hung Wang
Syntax Analysis (chapter 4) SLR, LR1, LALR: Improving the parser From the previous examples: => LR0 parsing is rather weak. Cannot handle many languages.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Bottom Up Parsing CS 671 January 31, CS 671 – Spring Where Are We? Finished Top-Down Parsing Starting Bottom-Up Parsing Lexical Analysis.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 6: LR grammars and automatic parser generators.
Lecture 5: LR Parsing CS 540 George Mason University.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
Conflicts in Simple LR parsers A SLR Parser does not use any lookahead The SLR parsing method fails if knowing the stack’s top state and next input token.
YACC Primer CS 671 January 29, CS 671 – Spring Yacc Yet Another Compiler Compiler Automatically constructs an LALR(1) parsing table from.
CC410: System Programming Dr. Manal Helal – Fall 2014 – Lecture 12–Compilers.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Announcements/Reading
CUP: An LALR Parser Generator for Java
Programming Languages Translator
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
Table-driven parsing Parsing performed by a finite state machine.
Chapter 4 Syntax Analysis.
Fall Compiler Principles Lecture 4: Parsing part 3
Bottom-Up Syntax Analysis
Fall Compiler Principles Lecture 4: Parsing part 3
Parsing #2 Leonidas Fegaras.
Lecture 4: Syntax Analysis: Botom-Up Parsing Noam Rinetzky
Parsing #2 Leonidas Fegaras.
Fall Compiler Principles Lecture 4: Parsing part 3
5. Bottom-Up Parsing Chih-Hung Wang
Presentation transcript:

Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University

Tentative syllabus Front End Scanning Top-down Parsing (LL) Bottom-up Parsing (LR) Intermediate Representation Operational Semantics Lowering Optimizations Dataflow Analysis Loop Optimizations Code Generation Register Allocation Instruction Selection 2

Previously 3 LR(0) parsing – Running the parser – Constructing transition diagram – Constructing parser table – Detecting conflicts SLR(0) – Eliminating SHIFT-REDUCE via FOLLOW sets

Agenda 4 LR(1) LALR(1) Automatic LR parser generation Handling ambiguities

Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L 5

S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5 6

shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve conflict 7 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

Inputs requiring shift/reduce For the input id the rightmost derivation S’ => S => R => L => id requires reducing in q2 For the input id = id S’ => S => L = R => L = L => L = id => id = id requires shifting 8 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

LR(1) grammars In SLR: a reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N – Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of FOLLOW computed per item 9

LR(1) item N  α  β, t Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β and after reducing N we expect to see the token t 10

LR(1) items LR(1) item is a pair – LR(0) item – Lookahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example – The production L  id yields the following LR(1) items 11 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items

Computing Closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 12

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 13

Back to the conflict Is there a conflict now? 14 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice – Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 15

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 16

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 17

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (R → L ∙, $) q12 q10 R id 18

Left/Right- recursion At home: create a simple grammar with left- recursion and one with right-recursion Construct corresponding LR(0) parser – Any conflicts? Run on simple input and observe behavior – Attempt to generalize observation for long inputs 19

Example: non-LR(1) grammar 20 (1) S  Y b c $ (2) S  Z b d $ (3) Y  a (4) Z  a S  ∙ Y b c, $ S  ∙ Y b c, $ Y  ∙ a, b Z  ∙ a, b Y  a ∙, b Z  a ∙, b a reduce-reduce conflict on lookahead ‘b’

21 Automated parser generation (via CUP)CUP

High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser AST LANG.cup LANG.lex Parser.java sym.java Lexer.java (Token.java) 22

Expression calculator expr  expr + expr | expr - expr | expr * expr | expr / expr | - expr | ( expr ) | number Goals of expression calculator parser: Is a valid expression? What is the meaning (value) of this expression? 23

Syntax analysis with CUP CUPjavac Parser spec.javaParser AST CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer Can dump automaton and table tokens 24

CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists – Terminals go to sym.java – Types of AST nodes Precedence declarations The grammar – Semantic actions to construct AST 25

26 Parsing ambiguous grammars

Expression Calculator – 1 st Attempt terminal Integer NUMBER; terminal PLUS, MINUS, MULT, DIV; terminal LPAREN, RPAREN; non terminal Integer expr; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr | LPAREN expr RPAREN | NUMBER ; Symbol type explained later 27

Ambiguities a + b * c a bc * + a bc + * a + b + c a bc + + a bc

Ambiguities as conflicts for LR(1) a + b + c a bc + + a bc a + b * c a bc * + a bc + *

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Expression Calculator – 2 nd Attempt Increasing precedence Contextual precedence 30

Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence – By default all terminals have lowest precedence – User can assign his own precedence – CUP assigns each production a precedence Precedence of rightmost terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left / right help resolve conflicts – left means reduce – right means shift More information on precedence declarations in CUP’s manualprecedence declarations 31

Resolving ambiguity (associativity) a + b + c a bc + + a bc + + precedence left PLUS 32

Resolving ambiguity (op. precedence) a + b * c a bc * + a bc + * precedence left PLUS precedence left MULT 33

Resolving ambiguity (contextual) - a * b a b * - precedence left MULT MINUS expr %prec UMINUS a - b * 34

Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Rule has precedence of UMINUS UMINUS never returned by scanner (used only to define precedence) 35

More CUP directives precedence nonassoc NEQ – Non-associative operators: == != etc. – 1<2<3 identified as an error (semantic error?) start non-terminal – Specifies start non-terminal other than first non-terminal – Can change to test parts of grammar Getting internal representation – Command line options: -dump_grammar -dump_states -dump_tables -dump 36

import java_cup.runtime.*; % %cup %eofval{ return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Parser gets terminals from the scanner Scanner integration Generated from token declarations in.cup file 37

Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists – Define building-blocks of the grammar Precedence declarations – May help resolve conflicts The grammar – May introduce conflicts that have to be resolved 38

39 Abstract syntax tree construction

Assigning meaning So far, only validation Add Java code implementing semantic actions expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; 40

Symbol labels used to name variables RESULT names the left-hand side symbol non terminal Integer expr; expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 {: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :} ; Assigning meaning 41

Abstract Syntax Trees More useful representation of syntax tree – Less clutter – Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information – Type information – Computed values Technically – a class hierarchy of abstract syntax tree nodes 42

Parse tree vs. AST + expr 12+3 ()()

AST hierarchy example 44 int_constplusminustimesdivide expr

AST construction AST Nodes constructed during parsing – Stored in push-down stack Bottom-up parser – Grammar rules annotated with actions for AST construction – When node is constructed all children available (already constructed) – Node (RESULT) pushed on stack 45

1 + (2) + (3) expr + (expr) + (3) + expr 12+3 expr + (3) expr ()() expr + (expr) expr expr + (2) + (3) int_const val = 1 plus e1e2 int_const val = 2 int_const val = 3 plus e1e2 expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :} AST construction 46

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr_list ::= expr_list expr_part | expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Example of lists 47 Executed when e is shifted

Next lecture: Operational Semantics