Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.

Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University

Tentative syllabus Front End Scanning Top-down Parsing (LL) Bottom-up Parsing (LR) Intermediate Representation Operational Semantics Lowering Optimizations Dataflow Analysis Loop Optimizations Code Generation Register Allocation Instruction Selection 2

Previously 3 LR(0) parsing – Running the parser – Constructing transition diagram – Constructing parser table – Detecting conflicts SLR(0) – Eliminating SHIFT-REDUCE via FOLLOW sets

Agenda 4 LR(1) LALR(1) Automatic LR parser generation Handling ambiguities

Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L 5

S’ →  S S →  L = R S →  R L →  * R L →  id R →  L S’ → S  S → L  = R R → L  S → R  L → *  R R →  L L →  * R L →  id L → id  S → L =  R R →  L L →  * R L →  id L → * R  R → L  S → L = R  S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5 6

shift/reduce conflict S → L  = R vs. R → L  FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve conflict 7 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

Inputs requiring shift/reduce For the input id the rightmost derivation S’ => S => R => L => id requires reducing in q2 For the input id = id S’ => S => L = R => L = L => L = id => id = id requires shifting 8 S → L  = R R → L  S → L =  R R →  L L →  * R L →  id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L

LR(1) grammars In SLR: a reduce item N  α  is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N – Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of FOLLOW computed per item 9

LR(1) item N  α  β, t Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β and after reducing N we expect to see the token t 10

LR(1) items LR(1) item is a pair – LR(0) item – Lookahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example – The production L  id yields the following LR(1) items 11 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items

Computing Closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b  FIRST(βc) – Add [B → ● δ, b] to S 12

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 13

Back to the conflict Is there a conflict now? 14 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2

LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice – Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 15

(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (R → L ∙, $) q12 q10 R id 18

Left/Right- recursion At home: create a simple grammar with left- recursion and one with right-recursion Construct corresponding LR(0) parser – Any conflicts? Run on simple input and observe behavior – Attempt to generalize observation for long inputs 19

Example: non-LR(1) grammar 20 (1) S  Y b c $ (2) S  Z b d $ (3) Y  a (4) Z  a S  ∙ Y b c, $ S  ∙ Y b c, $ Y  ∙ a, b Z  ∙ a, b Y  a ∙, b Z  a ∙, b a reduce-reduce conflict on lookahead ‘b’

21 Automated parser generation (via CUP)CUP

High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser AST LANG.cup LANG.lex Parser.java sym.java Lexer.java (Token.java) 22

Syntax analysis with CUP CUPjavac Parser spec.javaParser AST CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer Can dump automaton and table tokens 24

CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists – Terminals go to sym.java – Types of AST nodes Precedence declarations The grammar – Semantic actions to construct AST 25

26 Parsing ambiguous grammars

Ambiguities a + b * c a bc * + a bc + * a + b + c a bc + + a bc + + 28

Ambiguities as conflicts for LR(1) a + b + c a bc + + a bc + + 29 a + b * c a bc * + a bc + *

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Expression Calculator – 2 nd Attempt Increasing precedence Contextual precedence 30

Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence – By default all terminals have lowest precedence – User can assign his own precedence – CUP assigns each production a precedence Precedence of rightmost terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left / right help resolve conflicts – left means reduce – right means shift More information on precedence declarations in CUP’s manualprecedence declarations 31

Resolving ambiguity (associativity) a + b + c a bc + + a bc + + precedence left PLUS 32

Resolving ambiguity (op. precedence) a + b * c a bc * + a bc + * precedence left PLUS precedence left MULT 33

Resolving ambiguity (contextual) - a * b a b * - precedence left MULT MINUS expr %prec UMINUS a - b * 34

Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Rule has precedence of UMINUS UMINUS never returned by scanner (used only to define precedence) 35

More CUP directives precedence nonassoc NEQ – Non-associative operators: == != etc. – 1<2<3 identified as an error (semantic error?) start non-terminal – Specifies start non-terminal other than first non-terminal – Can change to test parts of grammar Getting internal representation – Command line options: -dump_grammar -dump_states -dump_tables -dump 36

import java_cup.runtime.*; % %cup %eofval{ return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Parser gets terminals from the scanner Scanner integration Generated from token declarations in.cup file 37

Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists – Define building-blocks of the grammar Precedence declarations – May help resolve conflicts The grammar – May introduce conflicts that have to be resolved 38

39 Abstract syntax tree construction

Symbol labels used to name variables RESULT names the left-hand side symbol non terminal Integer expr; expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 {: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :} ; Assigning meaning 41

Abstract Syntax Trees More useful representation of syntax tree – Less clutter – Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information – Type information – Computed values Technically – a class hierarchy of abstract syntax tree nodes 42

Parse tree vs. AST + expr 12+3 ()() 12 + 3 + 43

AST hierarchy example 44 int_constplusminustimesdivide expr

AST construction AST Nodes constructed during parsing – Stored in push-down stack Bottom-up parser – Grammar rules annotated with actions for AST construction – When node is constructed all children available (already constructed) – Node (RESULT) pushed on stack 45

1 + (2) + (3) expr + (expr) + (3) + expr 12+3 expr + (3) expr ()() expr + (expr) expr expr + (2) + (3) int_const val = 1 plus e1e2 int_const val = 2 int_const val = 3 plus e1e2 expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :} AST construction 46

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr_list ::= expr_list expr_part | expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Example of lists 47 Executed when e is shifted

Next lecture: Operational Semantics

Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.

Similar presentations

Presentation on theme: "Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.

Similar presentations

Presentation on theme: "Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University."— Presentation transcript:

Similar presentations

About project

Feedback