Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:

Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran Yahav

Admin Next week: Trubowicz 101 (Law school) Mobiles... 2

What is a Compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 3

Conceptual Structure of a Compiler Executable code exe Source text txt Semantic Representation Backend Compiler Frontend Lexical Analysis Syntax Analysis Parsing Semantic Analysis Intermediate Representation (IR) Code Generation wordssentences 4

From scanning to parsing ((23 + 7) * x) )x*)7+23(( RPIdOPRPNumOPNumLP Lexical Analyzer program text token stream Parser Grammar: E ... | Id Id  ‘a’ |... | ‘z’ Op(*) Id(b) Num(23)Num(7) Op(+) Abstract Syntax Tree valid syntax error 5

Context Free Grammars V – non terminals (syntactic variables) T – terminals (tokens) P – derivation rules –Each rule of the form V  (T  V)* S – start symbol G = (V,T,P,S) 6

CFG terminology Symbols: Terminals (tokens): ; := ( ) id num print Non-terminals: S E L Start non-terminal: S Convention: the non-terminal appearing in the first derivation rule Grammar productions (rules) N  μ S  S ; S S  id := E E  id E  num E  E + E 7

CFG terminology Derivation - a sequence of replacements of non-terminals using the derivation rules Language - the set of strings of terminals derivable from the start symbol Sentential form - the result of a partial derivation in which there may be nonterminals 8

Parse Tree S SS ; id := ES ; id := idS ; id := E ; id := idid := E + E ; id := idid := E + id ; id := idid := id + id ; x:= z;y := x + z S S S S ; ; S S id := E E id := E E E E + + E E id 9

Leftmost/rightmost Derivation Leftmost derivation –always expand leftmost non-terminal Rightmost derivation –Always expand rightmost non-terminal Allows us to describe derivation by listing the sequence of rules –always know what a rule is applied to 10

Leftmost Derivation x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) S SS ; id := ES ; id := idS ; id := E ; id := id id := E + E ; id := id id := id + E ; id := id id := id + id ; S  S;S S  id := E E  id S  id := E E  E + E E  id x := z ; y := x + z 11

Broad kinds of parsers Parsers for arbitrary grammars –Earley’s method, CYK method –Usually, not used in practice (though might change) Top-Down parsers –Construct parse tree in a top-down matter –Find the leftmost derivation Linear Bottom-Up parsers –Construct parse tree in a bottom-up manner –Find the rightmost derivation in a reverse order 12

Intuition: Top-Down Parsing Begin with start symbol “Guess” the productions Check if parse tree yields user's program 13

Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) +*321 FFF T T E T E 14 Recall: Non standard precedence …

We need this rule to get the * +*321 E 15 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 E T E 16 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 T E T E 17 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 F T T E T E 18 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 FF T T E T E 19 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 FF T T E E 20 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T

+*321 FF T T E T E 21 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 FFF T T E T E 22 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

+*321 FFF T T E T E 23 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

Challenges in top-down parsing Top-down parsing begins with virtually no information –Begins with just the start symbol, which matches every program How can we know which productions to apply? 24

Blind exhaustive search –Goes over all possible production rules –Read & parse prefixes of input –Backtracks if guesses wrong Implementation –Uses (possibly recursive) functions for every production rule –Backtracks  “rewind” input 25 Recursive Descent

Recursive descent bool A(){ // A  A 1 | … | A n pos = recordCurrentPosition(); for (i = 1; i ≤ n; i++){ if (A i ()) return true; rewindCurrent(pos); } return false; } bool A i (){ // A i = X 1 X 2 …X k for (j=1; j ≤ k; j++) if (X j is a terminal) if (X j == current) match(current); else return false; else if (! X j ()) return false; return true; } 26 )x*)7+23(( RPIdOPRPNumOPNumLP token stream current

Example Grammar –E  T | T + E –T  int | int * T | ( E ) Input: ( 5 ) Token stream: LP int RP 27

int E() { return E() && match(token(‘-’)) && term(); } E  E - term | term  What happens with this procedure?  Recursive descent parsers cannot handle left-recursive grammars p. 127 28 Problem: Left Recursion

Left recursion removal L(G 1 ) = β, βα, βαα, βααα, … L(G 2 ) = same N  Nα | β N  βN’ N’  αN’ |  G1G1 G2G2 E  E - term | term E  term TE | term TE  - term TE |   For our 3 rd example: p. 130 Can be done algorithmically. Problem: grammar becomes mangled beyond recognition 29

Challenges in top-down parsing Top-down parsing begins with virtually no information –Begins with just the start symbol, which matches every program How can we know which productions to apply? Wanted: Top-Down parsing without backtracking 30

Predictive parsing Given a grammar G and a word w derive w using G –Apply production to leftmost nonterminal –Pick production rule based on next input token General grammar –More than one option for choosing the next production based on a token Restricted grammars (LL) –Know exactly which single rule to apply based on Non terminal Next (k) tokens (lookahead) 31

Boolean expressions example not ( not true or false ) E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E E (EOPE) notLITorLIT truefalse production to apply known from next token E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor 33

Cannot tell which rule to use based on lookahead (ID) term  ID | ID [ expr ] Problem: productions with common prefix 34

Solution: left factoring Rewrite the grammar to be in LL(1) Intuition: just like factoring x*y + x*z into x*(y+z) term  ID | ID [ expr ] term  ID after_ID After_ID  [ expr ] |  35

S  if E then S else S | if E then S | T S  if E then S S’ | T S’  else S |  Left factoring – another example 36

LL(k) Parsers Predictive parser –Can be generated automatically –Does not use recursion –Efficient In contrast, recursive descent –Manual construction –Recursive –Expensive 37

Pushdown automaton uses –Prediction stack –Input token stream –Transition table nonterminals x tokens -> production alternative Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t LL(k) parsing via pushdown automata and prediction table 38

()nottruefalseandorxor$ E2311 LIT45 OP678 (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Nonterminals Input tokens Which rule should be used Example transition table 39 Table

Non-Recursive Predictive Parser Predictive Parsing program Parsing Table X Y Z $ Stack $b+a Output 40

LL(k) parsing via pushdown automata and prediction table Two possible moves –Prediction When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error –Match When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T) syntax error Parsing terminates when prediction stack is empty –If input is empty at that point, success. Otherwise, syntax error 41

abc AA  aAbA  c A  aAb | c aacbb$ Input suffixStack contentMove aacbb$A$predict(A,a) = A  aAb aacbb$aAb$match(a,a) acbb$Ab$predict(A,a) = A  aAb acbb$aAbb$match(a,a) cbb$Abb$predict(A,c) = A  c cbb$ match(c,c) bb$ match(b,b) b$ match(b,b) $$match($,$) – success Running parser example 42

FIRST sets FIRST(α) = { t | α  * t β} ∪ {X  * ℇ } –FIRST(α) = all terminals that α can appear as first in some derivation for α + ℇ if can be derived from X Example: –FIRST( LIT ) = { true, false } –FIRST( ( E OP E ) ) = { ‘(‘ } –FIRST( not E ) = { not } 43

Computing FIRST sets FIRST (t) = { t } // “t” non terminal ℇ∈ FIRST(X) if –X  ℇ or –X  A 1.. A k and ℇ∈ FIRST(A i ) i=1…k FIRST(α) ⊆ FIRST(X) if –X  A 1.. A k α and ℇ∈ FIRST(A i ) i=1…k 44

FOLLOW sets FOLLOW(X) = { t | S  * α X t β} FOLLOW(X) = set of tokens that can immediately follow X in some sentential form NOT related to what can be derived from X Intuition: X  A B –FIRST(B) ⊆ FOLLOW(A) –FOLLOW(B) ⊆ FOLLOW(X) –If B  *  then FOLLOW(A) ⊆ FOLLOW(X) p. 189 50

FOLLOW sets: Constraints $ ∈ FOLLOW(S) FIRST(β) – { ℇ } ⊆ FOLLOW(X) –For each A  α X β FOLLOW(A) ⊆ FOLLOW(X) –For each A  α X β and ℇ ∈ FIRST(β) 51

Example: FOLLOW sets E  TX X  + E | ℇ T  (E) | int Y Y  * T | ℇ 52 Terminal+(*)int FOLLOWint, ( _, ), $*, ), +, $ Non. Term. ETXY FOLLOW), $+, ), $$, )_, ), $

Prediction Table A  α T[A,t] = α if t ∈ FIRST(α) T[A,t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A) –t can also be $ T is not well defined  the grammar is not LL(1) 53

Problem: Non LL Grammars bool S() { return A() && match(token(‘a’)) && match(token(‘b’)); } bool A() { return match(token(‘a’)) || true; } S  A a b A  a |   What happens for input “ab”?  What happens if you flip order of alternatives and try “aab”? 54

FIRST(S) = { a }FOLLOW(S) = { $ } FIRST(A) = { a  }FOLLOW(A) = { a } FIRST/FOLLOW conflict S  A a b A  a |  55 Problem: Non LL Grammars

Solution: substitution S  A a b A  a |  S  a a b | a b Substitute A in S S  a after_A after_A  a b | b Left factoring 56

LL(k) grammars A grammar is LL(k) if it can be derived via: –Top-down derivation –Scanning the input from left to right (L) –Producing the leftmost derivation (L) –With lookahead of k tokens (k) –T is well defined A language is said to be LL(k) when it has an LL(k) grammar 57

LL(k) grammars A grammar is not LL(k) if it is –Ambiguous –Left recursive –Not left factored –… 58

Earley Parsing Invented by Earley [PhD. 1968] Handles arbitrary CFG Can handle ambiguous grammars Complexity O(N 3 ) when N = |input| Uses dynamic programming –Compactly encodes ambiguity 59

Dynamic programming Break a problem P into subproblems P 1 …P k –Solve P by combining solutions for P 1 …P k –Memo-ize (store) solutions to subproblems instead of re-computation Bellman-Ford shortest path algorithm –Sol(x,y,i) = minimum of Sol(x,y,i-1) Sol(t,y,i-1) + weight(x,t) for edges (x,t) 60

Earley Parsing Dynamic programming implementation of a recursive descent parser –S[N+1] Sequence of sets of “Earley states” N = |INPUT| Earley state (item) s is a sentential form + aux info –S[i] All parse tree that can be produced (by a RDP) after reading the first i tokens S[i+1] built using S[0] … S[i] 61

Earley States s = –constituent (dotted rule) for A  αβ A  αβ predicated constituents A  αβ in-progress constituents A  αβ completed constituents –back previous Early state in derivation 62

Earley Parser Input = x[1…N] S[0] = ; S[1] = … S[N] = {} for i = 0... N do until S[i] does not change do foreach s ∈ S[i] if s = and a=x[i+1] then S[i+1] = S[i+1] ∪ { } // scan if s = and X  α then S[i] = S[i] ∪ { } // predict if s = and ∈ S[b] then S[i] = S[i] ∪ { } // complete 63

Example 64

Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:

Similar presentations

Presentation on theme: "Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:

Similar presentations

Presentation on theme: "Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:"— Presentation transcript:

Similar presentations

About project

Feedback