LL(k) Parsing Compiler Baojian Hua

1 LL(k) Parsing Compiler Baojian Hua bjhua@ustc.edu.cn

2 Front End source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

3 Parsing The parser translates the source program into abstract syntax trees. Token sequence: produced by the lexer. Abstract syntax trees: the compiler's internal data structures for programs. The parser also checks the (syntactic) validity of programs, so it must take the program syntax into account.

4 Conceptually token sequence -> parser -> abstract syntax tree; the parser is driven by the language syntax

5 Syntax: Context-free Grammar Context-free grammars are (often) given by BNF expressions (Backus-Naur Form); read dragon-book sec 2.2. More powerful than regular expressions (RE) in theory. Good for defining language syntax.

6 Context-free Grammar (CFG) A CFG consists of 4 components: a set of terminals (tokens): T; a set of nonterminals: N; a set of production rules: P, each of the form s -> t1 t2 … tn with s ∈ N and t1, …, tn ∈ (T ∪ N); a unique start nonterminal: S

7 Example // SLP as in Tiger book chap. 1 (simplified): N = {S, E} T = {SEMICOLON, ID, IF, ASSIGN, …} S = S S -> S SEMICOLON S | ID ASSIGN E | PRINT LPAREN E RPAREN E -> ID | NUM | E PLUS E | E TIMES E

8 Derivation A derivation: starts with the unique start nonterminal S; repeatedly replaces a nonterminal s in the current string by the body of one of s's production rules; stops when the string consists of terminals only. The final string is called a sentence (program).

9 Example S -> S ; S | id := E | print (E) E -> … x := 5; print (x) derive me S -> … (a choice)

10 Example x := 5; print (x) derive me S -> S ; S -> x := E ; S -> x := 5 ; S -> x := 5 ; print (E) -> x := 5 ; print (x) S -> S ; S | id := E | print (E) E -> id | num | E + E | E * E

11 Another Try to Derive the same Program x := 5; print (x) derive me S -> x := E -> x := 5 -> // stuck! :-( S -> S ; S | id := E | print (E) E -> …

12 Derivation For the same string, there may exist many different derivations: left-most derivation, right-most derivation. Parsing is the problem of taking a string of terminals and figuring out whether it can be derived from a CFG; if it cannot, the parser reports an error (error detection).

13 Parse Trees Derivation can also be represented as trees useful to understand AST (discussed later) Idea: each internal node is labeled with a nonterminal each leaf node is labeled with a terminal each use of a rule in a derivation explains how to generate children in the parse tree from the parent

14 Example S -> S ; S | … x := 5; print (x) derive me Parse tree (bracket form): S[ S[ x := E[5] ] ; S[ print ( E[x] ) ] ]

15 Parse Tree has Meanings: post-order traversal S -> S ; S | … x := 5; print (x) derive me Parse tree (bracket form): S[ S[ x := E[5] ] ; S[ print ( E[x] ) ] ]

16 Ambiguous Grammars A grammar is ambiguous if the same sequence of tokens can give rise to two or more different parse trees

17 Example E -> num | id | E + E | E * E 3+4*5 derive me Derivation 1: E -> E + E -> 3 + E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 Derivation 2: E -> E * E -> E + E * E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5

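18 Example E -> num | id | E + E | E * E Derivation 1: E -> E + E -> 3 + E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 Derivation 2: E -> E * E -> E + E * E -> 3 + E * E -> 3 + 4 * E -> 3 + 4 * 5 Parse tree 1 (bracket form): E[ E[3] + E[ E[4] * E[5] ] ] Parse tree 2 (bracket form): E[ E[ E[3] + E[4] ] * E[5] ]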

19 Ambiguous Grammars Problem: compilers make use of parse trees to interpret the meaning of parsed programs, and different parse trees have different meanings, e.g.: 4 + 5 * 6 is not (4 + 5) * 6. Languages with ambiguous grammars are DISASTROUS; the meaning of programs isn't well-defined! You can't tell what your program might do! Solution: rewrite the grammar into an equivalent unambiguous form
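
The point about different trees carrying different meanings can be made concrete with a small sketch. It assumes, purely for illustration, that the two parse trees of 3+4*5 are written as nested tuples `(operator, left, right)` (a simplified operator tree, not the full parse tree from the slides); the name `evaluate` is hypothetical.

```python
def evaluate(tree):
    """Evaluate an operator tree by post-order traversal (children before parent)."""
    if isinstance(tree, int):          # leaf: a number
        return tree
    op, left, right = tree
    left_value = evaluate(left)        # visit the left subtree first
    right_value = evaluate(right)      # then the right subtree
    return left_value + right_value if op == "+" else left_value * right_value


# The two trees that the ambiguous grammar E -> num | id | E + E | E * E
# allows for the token sequence 3 + 4 * 5.
tree_plus_at_root = ("+", 3, ("*", 4, 5))    # reads as 3 + (4 * 5)
tree_times_at_root = ("*", ("+", 3, 4), 5)   # reads as (3 + 4) * 5

print(evaluate(tree_plus_at_root))    # 23
print(evaluate(tree_times_at_root))   # 35
```

The same token sequence yields two different values, which is why the grammar has to be rewritten before it can define the language.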

20 Eliminating ambiguity In programming language syntax, ambiguity often arises from missing operator precedence or associativity: * has higher precedence than +; both + and * are left-associative. Rewrite the grammar to take this into account

21 Example Left (original): E -> num | id | E + E | E * E Right (rewritten): E -> E + T | T, T -> T * F | F, F -> num | id Q: is the grammar on the right ambiguous? Why or why not?

22 Parser A program that checks whether a program is derivable from a given grammar. Parsing is expensive in general, yet it must be fast: think of compiling a kernel of 2000k lines; even for small application code, speed may be a concern. Theorists have developed specialized kinds of grammars that can be parsed efficiently: LL(k) and LR(k)

23 Recursive Descent Parsing

24 Predictive parsing A.K.A.: recursive descent parsing, top-down parsing. Simple to code by hand; efficient; can parse a large set of grammars; your Tiger compiler will use this. Key idea: one (recursive) function for each nonterminal, with one clause for each production rule of that nonterminal

25 Connecting with the lexer
/* step #1: represent tokens */
enum token { ID, IF, NUM, ASSIGN, SEMICOLON, LPAREN, RPAREN, … };
/* step #2: connect with the lexer */
enum token current_token; /* external var */
void eat (enum token t) {
  if (current_token == t)   /* comparison, not assignment */
    current_token = Lex_nextToken ();
  else
    error ("want ", t, " but got ", current_token);
}

26 Grammar: stm -> stm ; stm | id := exp | print (exp)   exp -> ID | NUM | exp + exp | exp * exp
/* step #1: cook a lexer, including tokens */
enum token current_token = lex ();
/* step #2: build the parser */
void parse_stm () {
  switch (current_token) {
    case ID: eat (ID); eat (ASSIGN); parse_exp (); break;
    case PRINT: eat (PRINT); eat (LPAREN); parse_exp (); eat (RPAREN); break;
    default: error ("want ID, PRINT");
  }
}
void parse_exp () {
  switch (current_token) {
    case ID: ???    // several rules of exp can start with ID: choosing one needs backtracking!!
    case NUM: ???
  }
}

27 How to handle precedence?
// statements: stm ; stm ; stm ; stm
void parse_stm_all () {
  parse_stm ();
  while (current_token == SEMICOLON) { eat (SEMICOLON); parse_stm (); }
}
// expressions: 2+3*4+5*7
void parse_exp_plus () {
  parse_exp_times ();
  while (current_token == PLUS) { eat (PLUS); parse_exp_times (); }
}
Generally: if there are n levels of precedence, one may write n parsing functions.
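
The "one function per precedence level" idea can be sketched as runnable code. This is a minimal illustration, not the course's Tiger parser: it assumes tokens are plain strings, covers only numbers, `+` and `*`, and computes a value instead of building a tree. The names `Parser`, `parse_atom` and `evaluate` are hypothetical.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def eat(self, expected):
        # Consume the current token if it is the expected one, else report an error.
        if self.current != expected:
            raise SyntaxError(f"want {expected!r} but got {self.current!r}")
        self.pos += 1

    def parse_exp_plus(self):
        # Lowest precedence level: a sequence of products separated by "+".
        value = self.parse_exp_times()
        while self.current == "+":
            self.eat("+")
            value += self.parse_exp_times()
        return value

    def parse_exp_times(self):
        # Higher precedence level: a sequence of numbers separated by "*".
        value = self.parse_atom()
        while self.current == "*":
            self.eat("*")
            value *= self.parse_atom()
        return value

    def parse_atom(self):
        token = self.current
        if not token.isdigit():
            raise SyntaxError(f"want a number but got {token!r}")
        self.eat(token)
        return int(token)


def evaluate(tokens):
    parser = Parser(tokens)
    value = parser.parse_exp_plus()
    parser.eat("$")                           # the whole input must be consumed
    return value


print(evaluate("2 + 3 * 4 + 5 * 7".split()))  # 49
```

Because `parse_exp_plus` only ever calls `parse_exp_times` for its operands, each product is fully parsed before any addition happens, and the loops make both operators left-associative.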

28 Moral The key point in predictive parsing is to determine which production rule to use (which recursive function to call): we must know the "start" symbols of each rule, and the "start" symbols of different rules must not overlap, e.g.: exp -> NUM | ID. This motivates the idea of first and follow sets

29 First & Follow

30 Moral S -> w1 | w2 | … | wn For nonterminal S and current input token t: if wk starts with t, then choose wk; or, if wk derives the empty string and the string following S starts with t, choose wk. The first-symbol sets of the wi (1<=i<=n) must not overlap, to avoid backtracking

31 Nullable, First and Follow sets To use predictive parsing, we must compute: Nullable: the nonterminals that can derive the empty string; First(ω): the set of terminals that can begin any string derivable from ω; Follow(X): the set of terminals that can immediately follow any string derivable from nonterminal X. Read tiger sec 3.2. Fixpoint algorithms

32 Nullable, First and Follow sets Which of the symbols X, Y and Z can derive the empty string? What terminals may the strings derived from X, Y and Z begin with? What terminals may follow X, Y and Z? Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a

33 Nullable X can derive the empty string iff: base case: X -> ε; inductive case: X -> Y1 … Yn, where Y1, …, Yn are n nonterminals that can all derive the empty string

34 Computing Nullable
/* Nullable: a set of nonterminals */
Nullable <- {};
while (Nullable still changes)
  for (each production X -> α)
    switch (α)
      case ε: Nullable ∪= {X}; break;
      case Y1 … Yn: if (Y1 ∈ Nullable && … && Yn ∈ Nullable) Nullable ∪= {X}; break;
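
The fixpoint loop above can be sketched in runnable form. The representation is an assumption made for this sketch: a grammar is a list of `(head, body)` pairs, bodies are tuples of symbol names, and the empty tuple stands for an ε-production. `GRAMMAR` and `compute_nullable` are hypothetical names.

```python
# Running example of the slides: Z -> d | X Y Z, Y -> c | ε, X -> Y | a
GRAMMAR = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ("c",)),
    ("Y", ()),               # Y -> ε
    ("X", ("Y",)),
    ("X", ("a",)),
]


def compute_nullable(grammar):
    """Iterate until no new nullable nonterminal is found (a fixpoint)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            # An empty body is trivially nullable; a terminal is never in
            # `nullable`, so any body containing one fails the test.
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


print(sorted(compute_nullable(GRAMMAR)))  # ['X', 'Y']
```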

35 Example: Nullables Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable by round (0, 1, 2): round 0: {}

36 Example: Nullables Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable by round (0, 1, 2): round 0: {}; round 1: {Y, X}

37 Example: Nullables Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable by round (0, 1, 2): round 0: {}; round 1: {Y, X}; round 2: {Y, X} (no change, so the loop stops)

38 First(X) Set of terminals that strings derived from X can begin with: X => a … Rules: base case: X -> a: First(X) ∪= {a}; inductive case: X -> Y1 Y2 … Yn: First(X) ∪= First(Y1); if Y1 ∈ Nullable, First(X) ∪= First(Y2); if Y1, Y2 ∈ Nullable, First(X) ∪= First(Y3); …

39 Computing First
// Suppose the Nullable set has been computed
foreach (nonterminal X) First(X) <- {};
while (some First set still changes)
  for (each production X -> α)
    switch (α)
      case a: First(X) ∪= {a}; break;
      case Y1 … Yn:
        First(X) ∪= First(Y1);
        if (Y1 ∉ Nullable) break;
        First(X) ∪= First(Y2);
        …; // similar as above
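
A runnable sketch of the First computation, under the same assumptions as the pseudocode: Nullable is already known (here it is simply written down as `NULLABLE`), productions are `(head, body)` pairs with tuple bodies, and any symbol that is not a nonterminal counts as a terminal. All names are hypothetical.

```python
# Running example of the slides: Z -> d | X Y Z, Y -> c | ε, X -> Y | a
GRAMMAR = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ("c",)),
    ("Y", ()),               # Y -> ε
    ("X", ("Y",)),
    ("X", ("a",)),
]
NONTERMINALS = {"X", "Y", "Z"}
NULLABLE = {"X", "Y"}        # assumed to be computed beforehand


def compute_first(grammar, nonterminals, nullable):
    first = {x: set() for x in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for symbol in body:
                # A terminal contributes itself; a nonterminal its current First set.
                new = first[symbol] if symbol in nonterminals else {symbol}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
                if symbol not in nullable:
                    break    # later symbols cannot start the string
    return first


FIRST = compute_first(GRAMMAR, NONTERMINALS, NULLABLE)
print({x: sorted(FIRST[x]) for x in sorted(FIRST)})
# {'X': ['a', 'c'], 'Y': ['c'], 'Z': ['a', 'c', 'd']}
```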

40 Example: First Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
Round      0    1        2            3
First(Z)   {}
First(Y)   {}
First(X)   {}

41 Example: First Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
Round      0    1        2            3
First(Z)   {}   {d}
First(Y)   {}   {c}
First(X)   {}   {c, a}

42 Example: First Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
Round      0    1        2            3
First(Z)   {}   {d}      {d, c, a}
First(Y)   {}   {c}
First(X)   {}   {c, a}

44 Parsing with First Productions with the First set of each body: Z -> d {d} | X Y Z {a, c, d}; Y -> c {c} | ε {}; X -> Y {c} | a {a}. First(Z) = {d, c, a}, First(Y) = {c}, First(X) = {c, a}, Nullable = {X, Y}. Now consider this string: d. Suppose we choose the production Z -> X Y Z. Then we get stuck at X: the choices are X -> Y and X -> a, but neither can accept d! What's the problem?

45 Follow(X) Set of terminals that may follow X: S => … X a … Rules: base case: Follow(X) = {}; inductive case: for Y -> ω1 X ω2: Follow(X) ∪= First(ω2); if ω2 is nullable, Follow(X) ∪= Follow(Y)

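46 Computing Follow(X)
foreach (nonterminal X) Follow(X) <- {};
Follow(S) ∪= {$}; // S: start nonterminal, $: end of input (this gives Follow(Z) = {$} in the example)
while (some Follow set still changes) {
  for (each production Y -> ω1 X ω2) {
    Follow(X) ∪= First(ω2);
    if (ω2 is nullable) Follow(X) ∪= Follow(Y);
  }
}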
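
A runnable sketch of the Follow computation for the running example. It assumes that Nullable and First are already available (they are written down here as `NULLABLE` and `FIRST`, with the values from the slides), and it seeds the start symbol's Follow set with the end-of-input marker `$`, which is an assumption consistent with Follow(Z) = {$} in the example. The helper `first_of_string` and the other names are hypothetical.

```python
# Running example of the slides: Z -> d | X Y Z, Y -> c | ε, X -> Y | a
GRAMMAR = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ("c",)),
    ("Y", ()),               # Y -> ε
    ("X", ("Y",)),
    ("X", ("a",)),
]
NULLABLE = {"X", "Y"}
FIRST = {"Z": {"d", "c", "a"}, "Y": {"c"}, "X": {"c", "a"}}


def first_of_string(symbols):
    """Return (First set of a symbol string, whether the whole string is nullable)."""
    result = set()
    for s in symbols:
        result |= FIRST.get(s, {s})      # a terminal contributes itself
        if s not in NULLABLE:
            return result, False
    return result, True


def compute_follow(grammar, start):
    follow = {x: set() for x in FIRST}
    follow[start].add("$")               # assumption: "$" marks end of input
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, symbol in enumerate(body):
                if symbol not in FIRST:  # only nonterminals have Follow sets
                    continue
                new, rest_nullable = first_of_string(body[i + 1:])
                if rest_nullable:
                    new = new | follow[head]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "Z")
print({x: sorted(FOLLOW[x]) for x in sorted(FOLLOW)})
# {'X': ['a', 'c', 'd'], 'Y': ['a', 'c', 'd'], 'Z': ['$']}
```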

47 Example: Follow Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
     First        Follow: round 0
Z    {d, c, a}    {}
Y    {c}          {}
X    {c, a}       {}

48 Example: Follow Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
     First        Follow: round 0    round 1
Z    {d, c, a}    {}                 {$}
Y    {c}          {}                 {d, c, a}
X    {c, a}       {}                 {d, c, a}

50 Predictive Parsing Table With Nullable, First() and Follow(), we can make a parsing table P(N, T): rows are the nonterminals N1, N2, N3, …; columns are the terminals t1, t2, t3, t4, …, $ (EOF); each entry contains a set of productions (ri, rk, rj, … in the picture)

51 Predictive Parsing Table For each rule X -> ω: for each a ∈ First(ω), add X -> ω to P(X, a); if ω is nullable, add X -> ω to P(X, b) for each b ∈ Follow(X). All other entries are "error". (Table layout: rows N1, N2, N3, …; columns t1, t2, t3, t4, …, $ (EOF); entries are rules such as r1, rk, ri.)

52 Example: Predictive Parsing Table Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
First(X) = {c, a}, Follow(X) = {c, d, a}; First(Y) = {c}, Follow(Y) = {c, d, a}; First(Z) = {d, c, a}, Follow(Z) = {$}
     a                 c                 d
Z    Z -> X Y Z        Z -> X Y Z        Z -> d, Z -> X Y Z
Y    Y -> ε            Y -> ε, Y -> c    Y -> ε
X    X -> a, X -> Y    X -> Y            X -> Y

54 LL(1) A context-free grammar is called LL(1) if it can be parsed this way: Left-to-right parsing, Leftmost derivation, 1 token of lookahead. This means that in the predictive parsing table there is at most one production in every entry
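
The table construction and the LL(1) condition can be sketched together. The sketch assumes the Nullable, First and Follow sets of the running example are already known (written down below with the values from the slides), and it adds a production to the Follow columns only when its body is nullable. `build_table`, `first_of_string`, `TABLE` and `CONFLICTS` are hypothetical names.

```python
# Running example of the slides: Z -> d | X Y Z, Y -> c | ε, X -> Y | a
GRAMMAR = [
    ("Z", ("d",)),
    ("Z", ("X", "Y", "Z")),
    ("Y", ("c",)),
    ("Y", ()),               # Y -> ε
    ("X", ("Y",)),
    ("X", ("a",)),
]
NULLABLE = {"X", "Y"}
FIRST = {"Z": {"d", "c", "a"}, "Y": {"c"}, "X": {"c", "a"}}
FOLLOW = {"Z": {"$"}, "Y": {"d", "c", "a"}, "X": {"d", "c", "a"}}


def first_of_string(symbols):
    """Return (First set of a symbol string, whether the whole string is nullable)."""
    result = set()
    for s in symbols:
        result |= FIRST.get(s, {s})      # a terminal contributes itself
        if s not in NULLABLE:
            return result, False
    return result, True


def build_table(grammar):
    """Map (nonterminal, lookahead terminal) to the list of applicable productions."""
    table = {}
    for head, body in grammar:
        lookahead, body_nullable = first_of_string(body)
        if body_nullable:
            lookahead |= FOLLOW[head]
        for terminal in lookahead:
            table.setdefault((head, terminal), []).append((head, body))
    return table


TABLE = build_table(GRAMMAR)
# The grammar is LL(1) exactly when no entry holds more than one production.
CONFLICTS = sorted(cell for cell, rules in TABLE.items() if len(rules) > 1)
print(CONFLICTS)  # [('X', 'a'), ('Y', 'c'), ('Z', 'd')]
```

Three entries hold two productions each, so under these assumptions the running example grammar is not LL(1).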

55 Speeding up set Construction All these sets (Nullable, First, Follow) can be computed simultaneously; see Tiger book algorithm 3.13. Order the computation: what's the optimal order in which to compute these sets?

56 Example: Speeding up set Construction Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y}
Round      0    1    2    3
First(Z)   {}
First(Y)   {}
First(X)   {}
Q1: What's a reasonable order here? Q2: How to set this order?

57 Directed Graph Model Grammar: Z -> d | X Y Z, Y -> c | ε, X -> Y | a. Nullable = {X, Y} Q1: What's a reasonable order here? Q2: How to set this order? Graph nodes with their First sets: Y {c}, X {c, a}, Z {d, c, a}. Order: Y X Z

58 Reverse Quasi-Topological Sort Quasi-topologically sort the directed graph. Quasi: a true topological sort of a general directed graph (which may contain cycles) is impossible; the ordering is also known as reverse depth-first ordering. Reverse: information (here: First) flows from successors backward to predecessors. Refer to your favorite algorithm book
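
One way to obtain such an order is a depth-first search that emits a node only after all of its successors. The sketch below is an illustration under assumptions, not the slides' algorithm: the dependency graph is written by hand for the running example, with an edge from a nonterminal to every nonterminal whose First set it may need (Z needs X, Y and itself; X needs Y), and the names `DEPENDS_ON` and `dfs_postorder` are hypothetical.

```python
# Assumed dependency graph for First sets of Z -> d | X Y Z, Y -> c | ε, X -> Y | a
DEPENDS_ON = {
    "Z": ["X", "Y", "Z"],    # Z -> X Y Z with X and Y nullable (self-loop on Z)
    "X": ["Y"],              # X -> Y
    "Y": [],
}


def dfs_postorder(graph, roots):
    """Visit successors first, so each node appears after the nodes it depends on.

    Cycles (such as the self-loop on Z) are tolerated by marking nodes as seen
    before descending, which is why the result is only a quasi-topological order.
    """
    seen, order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for successor in graph[node]:
            visit(successor)
        order.append(node)

    for root in roots:
        visit(root)
    return order


print(dfs_postorder(DEPENDS_ON, ["Z"]))  # ['Y', 'X', 'Z']
```

Processing the nonterminals in this order lets each First set be computed after the sets it depends on, so the fixpoint loop needs fewer rounds.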

59 Problem LL(1) can only be used with grammars in which all production rules for a nonterminal start with different terminals. Unfortunately, many grammars don't have this perfect property

60 Example Left: exp -> NUM | ID | exp + exp | exp * exp Right: exp -> exp + term | term, term -> term * factor | factor, factor -> NUM | ID Q: is the grammar on the right LL(1)? Why or why not?

61 Solutions Left-recursion elimination Left-factoring Read: tiger section 3.2
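
As an illustration of the first technique, the textbook transformation for immediate left recursion rewrites A -> A α | β into A -> β A' and A' -> α A' | ε. The sketch below handles only that immediate case for a single nonterminal; the function name, the tuple representation of bodies (empty tuple for ε) and the primed name for the fresh nonterminal are all assumptions made for this example.

```python
def eliminate_left_recursion(head, alternatives):
    """Rewrite  A -> A α | β  as  A -> β A',  A' -> α A' | ε  (immediate case only)."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: list(alternatives)}     # nothing to do
    fresh = head + "'"                        # hypothetical name for the new nonterminal
    return {
        head: [beta + (fresh,) for beta in others],
        fresh: [alpha + (fresh,) for alpha in recursive] + [()],   # () stands for ε
    }


# exp -> exp + term | term   becomes   exp -> term exp',  exp' -> + term exp' | ε
print(eliminate_left_recursion("exp", [("exp", "+", "term"), ("term",)]))
# {'exp': [('term', "exp'")], "exp'": [('+', 'term', "exp'"), ()]}
```

After the rewrite no alternative of `exp` starts with `exp` itself, so a recursive descent function for it no longer calls itself without consuming input.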

62 Example for SLP Original: stm -> stm ; stm | id := exp | print (exp) Rewritten: stm -> id := exp A | print(exp) A, A -> ; stm A | ε Q1: is the rewritten grammar LL(1)? Q2: are these two grammars equivalent?

63 LL(k) LL(1) can be further generalized to LL(k): Left-to-right parsing Leftmost derivation k token lookahead Q: table size? other problems with this approach?

64 Summary Context-free grammars are a math tool for specifying language syntax, among other things. Writing parsers for general grammars is hard and costly, hence the restricted classes LL(k) and LR(k). LL(1) grammars can be implemented efficiently with table-driven algorithms (again!)

