Published by Randolf Chandler. Modified over 8 years ago.
1
Compilation 0368-3133 (Semester A, 2013/14). Lecture 4: Syntax Analysis (Top-Down Parsing). Modern Compiler Design: Chapter 2.2. Noam Rinetzky. Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran Yahav
2
Admin: Next week: Trubowicz 101 (Law school). Mobiles...
3
What is a Compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” (Wikipedia)
4
Conceptual Structure of a Compiler. [Figure: Source text (txt) → Compiler → Executable code (exe). Inside the compiler: Frontend (Lexical Analysis → Syntax Analysis / Parsing → Semantic Analysis) → Intermediate Representation (IR), the semantic representation → Backend (Code Generation). Lexical analysis produces words; syntax analysis produces sentences.]
5
From scanning to parsing. [Figure: program text ((23 + 7) * x) → Lexical Analyzer → token stream LP LP Num OP Num RP OP Id RP → Parser (Grammar: E → ... | Id; Id → ‘a’ | ... | ‘z’) → either valid, producing the Abstract Syntax Tree Op(*) with children Op(+) (over Num(23) and Num(7)) and Id(b), or a syntax error.]
6
Context Free Grammars: G = (V, T, P, S). V: non-terminals (syntactic variables). T: terminals (tokens). P: derivation rules, each of the form V → (T ∪ V)*. S: start symbol.
7
CFG terminology. Symbols: Terminals (tokens): ; := ( ) id num print. Non-terminals: S E L. Start non-terminal: S (convention: the non-terminal appearing in the first derivation rule). Grammar productions (rules) N → μ: S → S ; S. S → id := E. E → id. E → num. E → E + E.
8
CFG terminology. Derivation: a sequence of replacements of non-terminals using the derivation rules. Language: the set of strings of terminals derivable from the start symbol. Sentential form: the result of a partial derivation, in which there may be non-terminals.
9
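Parse Tree. Input: x := z; y := x + z. Derivation: S ⇒ S ; S ⇒ id := E ; S ⇒ id := id ; S ⇒ id := id ; id := E ⇒ id := id ; id := E + E ⇒ id := id ; id := E + id ⇒ id := id ; id := id + id. [Figure: the corresponding parse tree. The root is S → S ; S, each child S is id := E, the first E is id, and the second E is E + E with both operands id.]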
10
Leftmost/rightmost Derivation. Leftmost derivation: always expand the leftmost non-terminal. Rightmost derivation: always expand the rightmost non-terminal. Either order lets us describe a derivation by listing only the sequence of rules, because we always know which non-terminal each rule is applied to.
11
Leftmost Derivation. Input: x := z; y := x + z. Grammar: S → S ; S | id := E. E → id | E + E | E * E | ( E ). Derivation, with the rule applied at each step: S ⇒ S ; S (S → S;S) ⇒ id := E ; S (S → id := E) ⇒ id := id ; S (E → id) ⇒ id := id ; id := E (S → id := E) ⇒ id := id ; id := E + E (E → E + E) ⇒ id := id ; id := id + E (E → id) ⇒ id := id ; id := id + id (E → id).
12
Broad kinds of parsers. Parsers for arbitrary grammars: Earley’s method, CYK method; usually not used in practice (though this might change). Top-Down parsers: construct the parse tree in a top-down manner; find the leftmost derivation. Linear Bottom-Up parsers: construct the parse tree in a bottom-up manner; find the rightmost derivation in reverse order.
13
Intuition: Top-Down Parsing. Begin with the start symbol. “Guess” the productions. Check if the parse tree yields the user's program.
14
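Intuition: Top-Down parsing. Unambiguous grammar: E → E * T | T. T → T + F | F. F → id | num | ( E ). Recall: non-standard precedence (+ binds tighter than * in this grammar). [Figure: parse tree over an input built from the tokens 1, 2, 3, + and *, with three F nodes, three T nodes and two E nodes.]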
15
Intuition: Top-Down parsing (animation, slides 15 to 23). [Figure: for the same grammar and input, the parse tree is built top-down from the root E, one production per slide, until all three F leaves are in place. The first step applies E → E * T, annotated: “We need this rule to get the *”.]
24
Challenges in top-down parsing. Top-down parsing begins with virtually no information: it begins with just the start symbol, which matches every program. How can we know which productions to apply?
25
Recursive Descent. Blind exhaustive search: goes over all possible production rules, reads and parses prefixes of the input, and backtracks if a guess is wrong. Implementation: uses (possibly recursive) functions for every production rule; backtracking rewinds the input.
26
Recursive descent. [Figure: token stream LP LP Num OP Num RP OP Id RP for ((23 + 7) * x), with "current" pointing at the next unread token.]
bool A() { // A → A_1 | … | A_n
    pos = recordCurrentPosition();
    for (i = 1; i ≤ n; i++) {
        if (A_i()) return true;
        rewindCurrent(pos); // backtrack: restore the input position
    }
    return false;
}
bool A_i() { // A_i = X_1 X_2 … X_k
    for (j = 1; j ≤ k; j++) {
        if (X_j is a terminal) {
            if (X_j == current) match(current);
            else return false;
        } else if (!X_j()) return false;
    }
    return true;
}
27
Example. Grammar: E → T | T + E. T → int | int * T | ( E ). Input: ( 5 ). Token stream: LP int RP.
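The backtracking scheme can be tried on this example grammar. The following is a minimal Python sketch, not the course's implementation: the names `Parser`, `accepts`, `GRAMMAR` and `REORDERED` are made up for illustration, and tokens are plain strings ("(" and ")" stand for LP and RP).

```python
# Each nonterminal maps to its alternatives, tried in the order listed.
GRAMMAR = {
    "E": [["T"], ["T", "+", "E"]],
    "T": [["int"], ["int", "*", "T"], ["(", "E", ")"]],
}

# Same language, but longer alternatives are tried first (an assumption of
# this sketch, not something the slide prescribes).
REORDERED = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["int", "*", "T"], ["int"], ["(", "E", ")"]],
}


class Parser:
    def __init__(self, grammar, tokens):
        self.grammar = grammar
        self.tokens = tokens
        self.pos = 0  # index of the current (next unread) token

    def parse_nonterminal(self, name):
        start = self.pos  # record the current position
        for alternative in self.grammar[name]:
            if self.parse_alternative(alternative):
                return True
            self.pos = start  # backtrack: rewind the input
        return False

    def parse_alternative(self, symbols):
        for symbol in symbols:
            if symbol in self.grammar:  # nonterminal: recurse
                if not self.parse_nonterminal(symbol):
                    return False
            elif self.pos < len(self.tokens) and self.tokens[self.pos] == symbol:
                self.pos += 1  # terminal: match and advance
            else:
                return False
        return True


def accepts(grammar, tokens, start="E"):
    parser = Parser(grammar, tokens)
    return parser.parse_nonterminal(start) and parser.pos == len(tokens)


print(accepts(GRAMMAR, ["(", "int", ")"]))        # the slide's input ( 5 )
print(accepts(GRAMMAR, ["int", "+", "int"]))      # first success of E → T stops early
print(accepts(REORDERED, ["int", "+", "int"]))
```

In this sketch a nonterminal commits to its first successful alternative, so with the alternatives in the slide's order E → T succeeds on `int` and leaves `+ int` unread. Listing the longer alternatives first avoids that here, which hints at why a parser that picks the right production up front is attractive.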
28
Problem: Left Recursion. Grammar: E → E - term | term. int E() { return E() && match(token('-')) && term(); } What happens with this procedure? E() calls itself before consuming any input, so it recurses forever. Recursive descent parsers cannot handle left-recursive grammars (p. 127).
29
Left recursion removal. G1: N → Nα | β. G2: N → βN’. N’ → αN’ | ε. L(G1) = { β, βα, βαα, βααα, … } and L(G2) is the same language. For our 3rd example (p. 130): E → E - term | term becomes E → term TE. TE → - term TE | ε. Can be done algorithmically. Problem: the grammar becomes mangled beyond recognition.
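The G1 to G2 rewrite is mechanical. Below is a small Python sketch of it for immediate left recursion only (indirect left recursion is not handled); the function name and the representation of alternatives as tuples are assumptions made for illustration, and the fresh nonterminal is called N' rather than TE.

```python
EPSILON = ()  # the empty right-hand side


def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite N → Nα | β as N → β N', N' → α N' | ε.

    `alternatives` is a list of tuples of symbols. Assumes every α is
    non-empty (no rule N → N).
    """
    # The α parts: what follows the leading N in left-recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The β parts: alternatives that do not start with N.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(alternatives)}  # nothing to do
    tail = nonterminal + "'"
    return {
        nonterminal: [beta + (tail,) for beta in betas],
        tail: [alpha + (tail,) for alpha in alphas] + [EPSILON],
    }


print(remove_immediate_left_recursion("E", [("E", "-", "term"), ("term",)]))
```

For the slide's example this yields E → term E' and E' → - term E' | ε, the same shape as the E / TE grammar above.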
30
Challenges in top-down parsing. Top-down parsing begins with virtually no information: it begins with just the start symbol, which matches every program. How can we know which productions to apply? Wanted: Top-Down parsing without backtracking.
31
Predictive parsing. Given a grammar G and a word w, derive w using G: apply a production to the leftmost nonterminal, and pick the production rule based on the next input token. General grammar: there may be more than one option for choosing the next production based on a token. Restricted grammars (LL): we know exactly which single rule to apply, based on the nonterminal and the next k tokens (lookahead).
32
Boolean expressions example. Grammar: E → LIT | ( E OP E ) | not E. LIT → true | false. OP → and | or | xor. Input: not ( not true or false )
33
Boolean expressions example. Input: not ( not true or false ). Grammar: E → LIT | ( E OP E ) | not E. LIT → true | false. OP → and | or | xor. Derivation: E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ). The production to apply is known from the next token. [Figure: the parse tree, with root E → not E, then E → ( E OP E ), whose operands are not LIT (true) and LIT (false) joined by or.]
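Because each alternative of E starts with a different token, a parser for this grammar never needs to backtrack. The following Python sketch illustrates that; the function name `parse_boolean` and the tuple shape of the returned tree are choices made here for illustration.

```python
LITERALS = {"true", "false"}
OPERATORS = {"and", "or", "xor"}


def parse_boolean(tokens):
    """Predictive parser for E → LIT | ( E OP E ) | not E; returns a tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance(expected=None):
        nonlocal pos
        token = peek()
        if expected is not None and token != expected:
            raise ValueError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def parse_e():
        token = peek()  # one token of lookahead selects the production
        if token in LITERALS:  # E → LIT
            return ("LIT", advance())
        if token == "(":  # E → ( E OP E )
            advance("(")
            left = parse_e()
            if peek() not in OPERATORS:
                raise ValueError(f"expected an operator, found {peek()!r}")
            operator = advance()
            right = parse_e()
            advance(")")
            return (operator, left, right)
        if token == "not":  # E → not E
            advance("not")
            return ("not", parse_e())
        raise ValueError(f"unexpected token {token!r}")

    tree = parse_e()
    if peek() != "$":
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse_boolean("not ( not true or false )".split()))
```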
34
Problem: productions with common prefix. term → ID | ID [ expr ]. Cannot tell which rule to use based on the lookahead (ID).
35
Solution: left factoring. Rewrite the grammar to be in LL(1). Intuition: just like factoring x*y + x*z into x*(y+z). term → ID | ID [ expr ] becomes term → ID after_ID. after_ID → [ expr ] | ε.
36
Left factoring: another example. S → if E then S else S | if E then S | T becomes S → if E then S S’ | T. S’ → else S | ε.
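One pass of left factoring can be sketched in Python as follows. This is an illustrative sketch under simplifying assumptions: alternatives are grouped by their first symbol only, the pass is not repeated on the new nonterminals, and the names `left_factor` and `common_prefix` are made up here.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared by the start of all alternatives."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(nonterminal, alternatives):
    """One pass of left factoring over a list of tuple alternatives."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)  # group by first symbol
    result = {nonterminal: []}
    fresh = 0
    for first, group in groups.items():
        if len(group) == 1 or first == ():
            result[nonterminal].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        fresh += 1
        new_name = nonterminal + "'" * fresh  # hypothetical fresh nonterminal
        result[nonterminal].append(prefix + (new_name,))
        # () stands for ε: the alternative that was exactly the prefix.
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result


print(left_factor("S", [
    ("if", "E", "then", "S", "else", "S"),
    ("if", "E", "then", "S"),
    ("T",),
]))
```

For the if/else grammar this produces S → if E then S S' | T and S' → else S | ε, matching the slide.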
37
LL(k) Parsers. A predictive parser can be generated automatically, does not use recursion, and is efficient. In contrast, a recursive descent parser is constructed manually, is recursive, and is expensive.
38
LL(k) parsing via pushdown automata and prediction table. The pushdown automaton uses a prediction stack, the input token stream, and a transition table (nonterminals x tokens -> production alternative). The entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when the current input starts with t.
39
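Example transition table. Rules: (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor. Rows are nonterminals, columns are input tokens, and each entry says which rule should be used (an empty entry is a syntax error):
    | ( | ) | not | true | false | and | or | xor | $
E   | 2 |   | 3   | 1    | 1     |     |    |     |
LIT |   |   |     | 4    | 5     |     |    |     |
OP  |   |   |     |      |       | 6   | 7  | 8   |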
40
Non-Recursive Predictive Parser. [Figure: a predictive parsing program reads the input (a + b $), maintains a stack (X Y Z $), consults the parsing table, and produces output.]
41
LL(k) parsing via pushdown automata and prediction table. Two possible moves. Prediction: when the top of the prediction stack is a nonterminal N, pop N and look up table[N,t]; if table[N,t] is not empty, push table[N,t] on the prediction stack, otherwise report a syntax error. Match: when the top of the prediction stack is a terminal T, it must be equal to the next input token t; if t == T, pop T and consume t; if t ≠ T, report a syntax error. Parsing terminates when the prediction stack is empty: if the input is empty at that point, success; otherwise, syntax error.
42
Running parser example. Grammar: A → aAb | c. Input: aacbb$. Prediction table:
  | a       | b | c
A | A → aAb |   | A → c
Trace:
Input suffix | Stack content | Move
aacbb$ | A$    | predict(A,a) = A → aAb
aacbb$ | aAb$  | match(a,a)
acbb$  | Ab$   | predict(A,a) = A → aAb
acbb$  | aAbb$ | match(a,a)
cbb$   | Abb$  | predict(A,c) = A → c
cbb$   | cbb$  | match(c,c)
bb$    | bb$   | match(b,b)
b$     | b$    | match(b,b)
$      | $     | match($,$), success
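The predict and match moves can be reproduced with a short table-driven loop. This Python sketch hard-codes the table for A → aAb | c; the names `ll1_parse`, `TABLE` and `NONTERMINALS` are illustrative, and the end marker $ is pushed on the stack so that the final move is match($,$) as in the trace.

```python
END = "$"
NONTERMINALS = {"A"}
# Prediction table for A → aAb | c: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}


def ll1_parse(tokens, table, nonterminals, start):
    """Return the list of moves, or raise ValueError on a syntax error."""
    stack = [END, start]  # the top of the stack is the end of the list
    remaining = list(tokens) + [END]
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        lookahead = remaining[pos]
        if top in nonterminals:  # prediction move
            rhs = table.get((top, lookahead))
            if rhs is None:
                raise ValueError(f"no rule for ({top}, {lookahead})")
            moves.append(f"predict({top},{lookahead}) = {top} -> {''.join(rhs)}")
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:  # match move
            moves.append(f"match({top},{lookahead})")
            pos += 1
        else:
            raise ValueError(f"expected {top}, found {lookahead}")
    return moves


for move in ll1_parse("aacbb", TABLE, NONTERMINALS, "A"):
    print(move)
```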
43
FIRST sets. FIRST(α) = { t | α ⇒* t β } ∪ { ε | α ⇒* ε }. That is, FIRST(α) is the set of all terminals that can appear first in some derivation from α, plus ε if ε can be derived from α. Example: FIRST( LIT ) = { true, false }. FIRST( ( E OP E ) ) = { ‘(‘ }. FIRST( not E ) = { not }.
44
Computing FIRST sets. FIRST(t) = { t } for every terminal t. ε ∈ FIRST(X) if X → ε, or if X → A_1 … A_k and ε ∈ FIRST(A_i) for all i = 1…k. FIRST(α) ⊆ FIRST(X) if X → A_1 … A_k α and ε ∈ FIRST(A_i) for all i = 1…k.
45
FIRST sets computation example. Grammar: STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR → TERM | zero? TERM | not EXPR | ++ id | -- id. TERM → id | constant. We compute FIRST(TERM), FIRST(EXPR) and FIRST(STMT).
46
1. Initialization (same grammar). FIRST(TERM) = { id, constant }. FIRST(EXPR) = { zero?, not, ++, -- }. FIRST(STMT) = { if, while }.
47
2. Iterate 1 (same grammar). FIRST(TERM) = { id, constant }. FIRST(EXPR) = { zero?, not, ++, -- }. FIRST(STMT) = { if, while, zero?, not, ++, -- }.
48
2. Iterate 2 (same grammar; EXPR → TERM contributes id). FIRST(TERM) = { id, constant }. FIRST(EXPR) = { zero?, not, ++, --, id, constant }. FIRST(STMT) = { if, while, zero?, not, ++, -- }.
49
2. Iterate 3, fixed point (same grammar). FIRST(TERM) = { id, constant }. FIRST(EXPR) = { zero?, not, ++, --, id, constant }. FIRST(STMT) = { if, while, zero?, not, ++, --, id, constant }.
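The fixed-point iteration can be written as a loop that keeps applying the FIRST constraints until no set grows. This Python sketch runs it on the STMT / EXPR / TERM grammar; the dictionary encoding of the grammar and the names `compute_first` and `first_of_sequence` are assumptions of the sketch, and the order in which sets grow may differ from the slides.

```python
EPSILON = "ε"

GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM"], ["zero?", "TERM"], ["not", "EXPR"],
             ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each nonterminal."""
    result = set()
    for symbol in symbols:
        # A symbol without an entry in `first` is a terminal: FIRST(t) = {t}.
        symbol_first = first.get(symbol, {symbol})
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)  # every symbol (or none at all) can derive ε
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for nonterminal, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_sequence(alternative, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


for nonterminal, tokens in compute_first(GRAMMAR).items():
    print(nonterminal, sorted(tokens))
```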
50
FOLLOW sets. FOLLOW(X) = { t | S ⇒* α X t β }. FOLLOW(X) is the set of tokens that can immediately follow X in some sentential form. It is NOT related to what can be derived from X. Intuition, for a rule X → A B: FIRST(B) – { ε } ⊆ FOLLOW(A). FOLLOW(X) ⊆ FOLLOW(B). If B ⇒* ε then FOLLOW(X) ⊆ FOLLOW(A). (p. 189)
51
FOLLOW sets: Constraints. $ ∈ FOLLOW(S). For each rule A → α X β: FIRST(β) – { ε } ⊆ FOLLOW(X). For each rule A → α X β with ε ∈ FIRST(β): FOLLOW(A) ⊆ FOLLOW(X).
52
Example: FOLLOW sets. Grammar: E → T X. X → + E | ε. T → ( E ) | int Y. Y → * T | ε.
Terminal | FOLLOW
+, (, *  | int, (
)        | +, ), $
int      | *, ), +, $
Non-terminal | FOLLOW
E | ), $
T | +, ), $
X | $, )
Y | +, ), $
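The FOLLOW constraints can be solved with the same kind of fixed-point loop used for FIRST. Here is a Python sketch for the E / X / T / Y grammar; the encoding (an empty list for ε) and the function names are illustrative assumptions, and only the FOLLOW sets of nonterminals are computed.

```python
EPSILON = "ε"
END = "$"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],  # [] is the ε alternative
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}


def first_of_sequence(symbols, first):
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST(t) = {t}
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_sequence(alternative, first)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)  # $ ∈ FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alternative in alternatives:
                for i, symbol in enumerate(alternative):
                    if symbol not in grammar:
                        continue  # only nonterminals get FOLLOW sets here
                    rest_first = first_of_sequence(alternative[i + 1:], first)
                    new = rest_first - {EPSILON}  # FIRST(β) – {ε} ⊆ FOLLOW(X)
                    if EPSILON in rest_first:
                        new |= follow[lhs]  # FOLLOW(A) ⊆ FOLLOW(X)
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


for nonterminal, tokens in compute_follow(GRAMMAR, "E").items():
    print(nonterminal, sorted(tokens))
```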
53
Prediction Table. For each rule A → α: T[A,t] = α if t ∈ FIRST(α). T[A,t] = α if ε ∈ FIRST(α) and t ∈ FOLLOW(A) (t can also be $). If T is not well defined (some entry would hold two alternatives), the grammar is not LL(1).
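The two table rules translate directly into code. The Python sketch below takes FIRST and FOLLOW sets as given inputs (here, hand-entered sets for the grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε) and raises an error when an entry would receive two alternatives; `build_table` and the data layout are illustrative assumptions.

```python
EPSILON = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],  # [] is the ε alternative
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# FIRST and FOLLOW of the nonterminals, entered by hand for this grammar.
FIRST = {"E": {"(", "int"}, "X": {"+", EPSILON},
         "T": {"(", "int"}, "Y": {"*", EPSILON}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}


def first_of_sequence(symbols, first):
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: FIRST(t) = {t}
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)
    return result


def build_table(grammar, first, follow):
    table = {}
    for lhs, alternatives in grammar.items():
        for alternative in alternatives:
            alt_first = first_of_sequence(alternative, first)
            lookaheads = alt_first - {EPSILON}  # t ∈ FIRST(α)
            if EPSILON in alt_first:
                lookaheads |= follow[lhs]  # ε ∈ FIRST(α) and t ∈ FOLLOW(A)
            for token in lookaheads:
                if (lhs, token) in table:
                    raise ValueError(f"not LL(1): conflict at T[{lhs},{token}]")
                table[(lhs, token)] = alternative
    return table


table = build_table(GRAMMAR, FIRST, FOLLOW)
print(table[("X", "+")], table[("X", "$")], table[("T", "int")])
```

Feeding the same function a grammar with a FIRST/FOLLOW conflict makes it raise, which is the "T is not well defined" case.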
54
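Problem: Non LL Grammars. Grammar: S → A a b. A → a | ε. bool S() { return A() && match(token('a')) && match(token('b')); } bool A() { return match(token('a')) || true; } What happens for input “ab”? A() consumes the a, so S() then fails to match a against b, although ab is in the language (via A → ε). What happens if you flip the order of alternatives and try “aab”? A() now succeeds without consuming anything, S() matches the first a, and then fails to match b against the second a.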
55
Problem: Non LL Grammars. Grammar: S → A a b. A → a | ε. FIRST(S) = { a }, FOLLOW(S) = { $ }. FIRST(A) = { a, ε }, FOLLOW(A) = { a }. FIRST/FOLLOW conflict: a is in both FIRST(A) and FOLLOW(A).
56
Solution: substitution. Grammar: S → A a b. A → a | ε. Substitute A in S: S → a a b | a b. Left factoring: S → a after_A. after_A → a b | b.
57
LL(k) grammars. A grammar is LL(k) if its sentences can be derived via a top-down derivation that scans the input from left to right (L), produces the leftmost derivation (L), uses a lookahead of k tokens (k), and for which the prediction table T is well defined. A language is said to be LL(k) when it has an LL(k) grammar.
58
LL(k) grammars. A grammar is not LL(k) if it is ambiguous, left recursive, not left factored, …
59
Earley Parsing. Invented by Earley [PhD. 1968]. Handles arbitrary CFGs. Can handle ambiguous grammars. Complexity O(N³), where N = |input|. Uses dynamic programming, which compactly encodes ambiguity.
60
Dynamic programming. Break a problem P into subproblems P_1 … P_k. Solve P by combining solutions for P_1 … P_k. Memoize (store) solutions to subproblems instead of recomputing them. Example: Bellman-Ford shortest path algorithm: Sol(x,y,i) = minimum of Sol(x,y,i-1) and Sol(t,y,i-1) + weight(x,t) over all edges (x,t).
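The Bellman-Ford recurrence can be evaluated round by round, keeping only the previous round's solutions. In this Python sketch Sol(x, y, i) is read as the length of the shortest path from x to y that uses at most i edges; the function name, the edge dictionary and the sample graph are made up for illustration, and the graph is assumed to have no negative cycles.

```python
import math


def shortest_paths_to(target, vertices, weight):
    """Shortest distance from every vertex to `target`.

    `weight` maps directed edges (x, t) to their weights.
    """
    # Round 0: only the target reaches itself, with zero edges.
    sol = {vertex: math.inf for vertex in vertices}
    sol[target] = 0
    for _ in range(len(vertices) - 1):
        previous = dict(sol)  # stored solutions of the smaller subproblems
        for (x, t), w in weight.items():
            # Sol(x,y,i) = min(Sol(x,y,i-1), Sol(t,y,i-1) + weight(x,t))
            sol[x] = min(sol[x], previous[t] + w)
    return sol


edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
print(shortest_paths_to("c", ["a", "b", "c"], edges))
```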
61
Earley Parsing. A dynamic programming implementation of a recursive descent parser. S[N+1]: a sequence of sets of “Earley states”, where N = |INPUT|. An Earley state (item) s is a sentential form plus auxiliary info. S[i]: all parse trees that can be produced (by a RDP) after reading the first i tokens. S[i+1] is built using S[0] … S[i].
62
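Earley States. s = [constituent, back]. constituent: a dotted rule for A → αβ, where the dot marks how much of the rule has been recognized: A → •αβ is a predicted constituent, A → α•β is an in-progress constituent, A → αβ• is a completed constituent. back: the previous Earley state in the derivation.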
63
Earley Parser
Input = x[1…N]
S[0] = { [S' → •S, 0] }; S[1] = … = S[N] = {}
for i = 0 ... N do
  until S[i] does not change do
    foreach s ∈ S[i]
      if s = [A → …•a…, b] and a = x[i+1] then S[i+1] = S[i+1] ∪ { [A → …a•…, b] } // scan
      if s = [A → …•X…, b] and X → α then S[i] = S[i] ∪ { [X → •α, i] } // predict
      if s = [A → …•, b] and [X → …•A…, k] ∈ S[b] then S[i] = S[i] ∪ { [X → …A•…, k] } // complete
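The scan, predict and complete steps fit in a compact recognizer. The Python sketch below is illustrative only: it assumes a grammar without ε-productions (these need extra care in the complete step), seeds S[0] with the start symbol's alternatives instead of an augmented start rule, represents a state as a tuple (lhs, rhs, dot, back), and only answers yes or no rather than building parse trees.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` is derivable from `start`.

    `grammar` maps each nonterminal to a list of tuple right-hand sides.
    Assumes there are no ε-productions.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]  # chart[i] plays the role of S[i]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:  # repeat until S[i] does not change
            lhs, rhs, dot, back = worklist.pop()
            new_states = []
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:  # predict
                    new_states = [(symbol, alt, 0, i) for alt in grammar[symbol]]
                elif i < n and tokens[i] == symbol:  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, back))
            else:  # complete: advance every state that was waiting for lhs
                new_states = [(l, r, d + 1, b)
                              for (l, r, d, b) in chart[back]
                              if d < len(r) and r[d] == lhs]
            for state in new_states:
                if state not in chart[i]:
                    chart[i].add(state)
                    worklist.append(state)
    return any(lhs == start and dot == len(rhs) and back == 0
               for (lhs, rhs, dot, back) in chart[n])


# An ambiguous, left-recursive grammar that recursive descent cannot handle.
AMBIGUOUS = {"E": [("E", "+", "E"), ("id",)]}
print(earley_recognize(AMBIGUOUS, "E", ["id", "+", "id", "+", "id"]))
print(earley_recognize(AMBIGUOUS, "E", ["id", "+"]))
```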