Download presentation
Presentation is loading. Please wait.
Published byJessica Washington Modified over 8 years ago
1
Compiler Principles Fall 2015-2016 Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University
2
Tentative syllabus Front End Scanning Top-down Parsing (LL) Bottom-up Parsing (LR) Intermediate Representation Operational Semantics Lowering Optimizations Dataflow Analysis Loop Optimizations Code Generation Register Allocation Instruction Selection 2
3
Previously 3 LR(0) parsing – Running the parser – Constructing transition diagram – Constructing parser table – Detecting conflicts SLR(0) – Eliminating SHIFT-REDUCE via FOLLOW sets
4
Agenda 4 LR(1) LALR(1) Automatic LR parser generation Handling ambiguities
5
Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L 5
6
S’ → S S → L = R S → R L → * R L → id R → L S’ → S S → L = R R → L S → R L → * R R → L L → * R L → id L → id S → L = R R → L L → * R L → id L → * R R → L S → L = R S L R id * = R * R L * L q0 q4 q7 q1 q3 q9 q6 q8 q2 q5 6
7
shift/reduce conflict S → L = R vs. R → L FOLLOW(R) contains = – S ⇒ L = R ⇒ * R = R SLR cannot resolve conflict 7 S → L = R R → L S → L = R R → L L → * R L → id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
8
Inputs requiring shift/reduce For the input id the rightmost derivation S’ => S => R => L => id requires reducing in q2 For the input id = id S’ => S => L = R => L = L => L = id => id = id requires shifting 8 S → L = R R → L S → L = R R → L L → * R L → id = q6 q2 (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L
9
LR(1) grammars In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N) But FOLLOW(N) merges lookahead for all alternatives for N – Insensitive to the context of a given production LR(1) keeps lookahead with each LR item Idea: a more refined notion of FOLLOW computed per item 9
10
LR(1) item N α β, t Already matched To be matched Input Hypothesis about αβ being a possible handle, so far we’ve matched α, expecting to see β and after reducing N we expect to see the token t 10
11
LR(1) items LR(1) item is a pair – LR(0) item – Lookahead token Meaning – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example – The production L id yields the following LR(1) items 11 [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $] (0) S’ → S (1) S → L = R (2) S → R (3) L → * R (4) L → id (5) R → L [L → ● id] [L → id ●] LR(0) items LR(1) items
12
Computing Closure for LR(1) For every [A → α ● Bβ, c] in S – for every production B→δ and every token b in the grammar such that b FIRST(βc) – Add [B → ● δ, b] to S 12
13
(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 13
14
Back to the conflict Is there a conflict now? 14 (S → L ∙ = R, $) (R → L ∙, $) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) = q6 q2
15
LALR(1) LR(1) tables have huge number of entries Often don’t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice – Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 15
16
(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 16
17
(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (R → L ∙, $) (L → * R ∙, $) q11 q12 q10 R q13 id 17
18
(S’ → ∙ S, $) (S → ∙ L = R, $) (S → ∙ R, $) (L → ∙ * R, = ) (L → ∙ id, = ) (R → ∙ L, $ ) (L → ∙ id, $ ) (L → ∙ * R, $ ) (S’ → S ∙, $) (S → L ∙ = R, $) (R → L ∙, $) (S → R ∙, $) (L → * ∙ R, =) (R → ∙ L, =) (L → ∙ * R, =) (L → ∙ id, =) (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → id ∙, $) (L → id ∙, =) (S → L = ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (L → * R ∙, =) (L → * R ∙, $) (R → L ∙, =) (R → L ∙, $) (S → L = R ∙, $) S L R id * = R * R L * L q0 q4 q5 q7 q6 q9 q3 q1 q2 q8 (L → * ∙ R, $) (R → ∙ L, $) (L → ∙ * R, $) (L → ∙ id, $) (R → L ∙, $) q12 q10 R id 18
19
Left/Right- recursion At home: create a simple grammar with left- recursion and one with right-recursion Construct corresponding LR(0) parser – Any conflicts? Run on simple input and observe behavior – Attempt to generalize observation for long inputs 19
20
Example: non-LR(1) grammar 20 (1) S Y b c $ (2) S Z b d $ (3) Y a (4) Z a S ∙ Y b c, $ S ∙ Y b c, $ Y ∙ a, b Z ∙ a, b Y a ∙, b Z a ∙, b a reduce-reduce conflict on lookahead ‘b’
21
21 Automated parser generation (via CUP)CUP
22
High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser AST LANG.cup LANG.lex Parser.java sym.java Lexer.java (Token.java) 22
23
Expression calculator expr expr + expr | expr - expr | expr * expr | expr / expr | - expr | ( expr ) | number Goals of expression calculator parser: Is 2+3+4+5 a valid expression? What is the meaning (value) of this expression? 23
24
Syntax analysis with CUP CUPjavac Parser spec.javaParser AST CUP – parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer Can dump automaton and table tokens 24
25
CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists – Terminals go to sym.java – Types of AST nodes Precedence declarations The grammar – Semantic actions to construct AST 25
26
26 Parsing ambiguous grammars
27
Expression Calculator – 1 st Attempt terminal Integer NUMBER; terminal PLUS, MINUS, MULT, DIV; terminal LPAREN, RPAREN; non terminal Integer expr; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr | LPAREN expr RPAREN | NUMBER ; Symbol type explained later 27
28
Ambiguities a + b * c a bc * + a bc + * a + b + c a bc + + a bc + + 28
29
Ambiguities as conflicts for LR(1) a + b + c a bc + + a bc + + 29 a + b * c a bc * + a bc + *
30
terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Expression Calculator – 2 nd Attempt Increasing precedence Contextual precedence 30
31
Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence – By default all terminals have lowest precedence – User can assign his own precedence – CUP assigns each production a precedence Precedence of rightmost terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left / right help resolve conflicts – left means reduce – right means shift More information on precedence declarations in CUP’s manualprecedence declarations 31
32
Resolving ambiguity (associativity) a + b + c a bc + + a bc + + precedence left PLUS 32
33
Resolving ambiguity (op. precedence) a + b * c a bc * + a bc + * precedence left PLUS precedence left MULT 33
34
Resolving ambiguity (contextual) - a * b a b * - precedence left MULT MINUS expr %prec UMINUS a - b * 34
35
Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Rule has precedence of UMINUS UMINUS never returned by scanner (used only to define precedence) 35
36
More CUP directives precedence nonassoc NEQ – Non-associative operators: == != etc. – 1<2<3 identified as an error (semantic error?) start non-terminal – Specifies start non-terminal other than first non-terminal – Can change to test parts of grammar Getting internal representation – Command line options: -dump_grammar -dump_states -dump_tables -dump 36
37
import java_cup.runtime.*; % %cup %eofval{ return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ % ”+” { return new Symbol(sym.PLUS); } ”-” { return new Symbol(sym.MINUS); } ”*” { return new Symbol(sym.MULT); } ”/” { return new Symbol(sym.DIV); } ”(” { return new Symbol(sym.LPAREN); } ”)” { return new Symbol(sym.RPAREN); } {NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } \n { }. { } Parser gets terminals from the scanner Scanner integration Generated from token declarations in.cup file 37
38
Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists – Define building-blocks of the grammar Precedence declarations – May help resolve conflicts The grammar – May introduce conflicts that have to be resolved 38
39
39 Abstract syntax tree construction
40
Assigning meaning So far, only validation Add Java code implementing semantic actions expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; 40
41
Symbol labels used to name variables RESULT names the left-hand side symbol non terminal Integer expr; expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | MINUS expr:e1 {: RESULT = new Integer(0 - e1.intValue(); :} %prec UMINUS | LPAREN expr:e1 RPAREN {: RESULT = e1; :} | NUMBER:n {: RESULT = n; :} ; Assigning meaning 41
42
Abstract Syntax Trees More useful representation of syntax tree – Less clutter – Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information – Type information – Computed values Technically – a class hierarchy of abstract syntax tree nodes 42
43
Parse tree vs. AST + expr 12+3 ()() 12 + 3 + 43
44
AST hierarchy example 44 int_constplusminustimesdivide expr
45
AST construction AST Nodes constructed during parsing – Stored in push-down stack Bottom-up parser – Grammar rules annotated with actions for AST construction – When node is constructed all children available (already constructed) – Node (RESULT) pushed on stack 45
46
1 + (2) + (3) expr + (expr) + (3) + expr 12+3 expr + (3) expr ()() expr + (expr) expr expr + (2) + (3) int_const val = 1 plus e1e2 int_const val = 2 int_const val = 3 plus e1e2 expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} | LPAREN expr:e RPAREN {: RESULT = e; :} | INT_CONST:i {: RESULT = new int_const(…, i); :} AST construction 46
47
terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr_list ::= expr_list expr_part | expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr | expr MINUS expr | expr MULT expr | expr DIV expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ; Example of lists 47 Executed when e is shifted
48
Next lecture: Operational Semantics
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.