Compiler Principles Fall 2014-2015 Compiler Principles Lecture 2: Parsing part 1 Roman Manevich Ben-Gurion University 1.


1 Compiler Principles Fall 2014-2015 Compiler Principles Lecture 2: Parsing part 1 Roman Manevich Ben-Gurion University 1

2 Previously: lexical analysis. High-level process: list of regular expressions (one per lexeme) → NFA+ε → DFA → minimization → code implementing maximal munch with a tie-breaking policy: Token nextToken() { … }. A scanner generator (e.g., JFlex) automatically generates the scanner code

3 Books. Compilers: Principles, Techniques, and Tools (Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman); Advanced Compiler Design and Implementation (Steven Muchnick); Modern Compiler Design (D. Grune, H. Bal, C. Jacobs, K. Langendoen); Modern Compiler Implementation in Java (Andrew W. Appel)

4 Tentative syllabus. Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR), Attribute Grammars. Intermediate Representation: Lowering. Optimizations: Local Optimizations, Dataflow Analysis, Loop Optimizations. Code Generation: Register Allocation, Instruction Selection. Mid-term exam

5 Agenda 5 Understand role of syntax analysis Context-free grammars refresher Top-down parsing

6 The bigger picture. Compilers include different kinds of program analyses; each further constrains the set of legal programs. – Lexical constraints: program consists of legal tokens. – Syntax constraints: program is included in a given context-free language. – Semantic constraints: program is included in a given attribute grammar (type checking, legal inheritance graph, variables initialized before use). – "Logical" constraints (Verifying Compiler grand challenge): memory safety (null dereference, array-out-of-bounds access, data races) and functional correctness (program meets specification)

7 Syntax analysis overview 7

8 Role of syntax analysis. Recover structure from a stream of tokens: parse tree / abstract syntax tree. Error reporting (and recovery). Other possible tasks: – Syntax-directed translation (one-pass compilers) – Create symbol table – Create a pretty-printed version of the program, e.g., the Auto Formatting function in Eclipse. Pipeline: High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Executable Code

9 From tokens to abstract syntax trees. Program text: 59 + (1257 * xPosition). The lexical analyzer (regular expressions, finite automata) either reports a lexical error or, for valid input, emits the token stream num + ( num * id ). The parser (context-free grammars, push-down automata) either reports a syntax error or, for valid input, produces the abstract syntax tree, with + at the root and num and * as its children. Grammar: E → id | num | E + E | E * E | ( E )

10 Context-free grammars refresher 10

11 Example grammar. S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E. Here S is shorthand for Statement, E is shorthand for Expression, and L is shorthand for List (of expressions)

12 CFG terminology. Symbols: terminals (tokens): ; := ( ) , + id num print. Non-terminals: S E L. Start non-terminal: S (convention: the non-terminal appearing in the first derivation rule). Grammar productions (rules) have the form N → α: S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E

13 More definitions. Sentential form: a sequence of symbols, terminals (tokens) and non-terminals. Sentence: a sequence of terminals (tokens). Derivation step: given a sentential form αNβ and a rule N → µ, a step is the transition αNβ => αµβ. Derivation sequence: a sequence of derivation steps ρ1 => … => ρk such that each ρi => ρi+1 is the result of applying one production and ρk is a sentence

14 Language of a CFG. A word ω is in L(G) (a valid program) if there exists a corresponding derivation sequence: – Start with the start symbol – Repeatedly replace one of the non-terminals by a right-hand side of one of its productions – Stop when the sentential form contains only terminals. That is, ω is in L(G) if S =>* ω. Two canonical orders of expansion: rightmost derivation and leftmost derivation

15 Leftmost derivation. Program: a := 56 ; b := 7 + 3. Tokens: id := num ; id := num + num. Derivation: S => S ; S => id := E ; S => id := num ; S => id := num ; id := E => id := num ; id := E + E => id := num ; id := num + E => id := num ; id := num + num. Grammar: S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E
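
The leftmost derivation on this slide can be replayed mechanically. The following is a minimal Python sketch, not part of the lecture: the names GRAMMAR, leftmost_step, derive and CHOICES are made up for illustration, and the grammar is encoded as a dictionary from each non-terminal to its right-hand sides.

```python
# Example grammar from the slides: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["S", ";", "S"], ["id", ":=", "E"], ["print", "(", "L", ")"]],
    "E": [["id"], ["num"], ["E", "+", "E"]],
    "L": [["E"], ["L", ",", "E"]],
}


def leftmost_step(form, rhs):
    """Replace the leftmost non-terminal of a sentential form by rhs."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if rhs not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


def derive(start, choices):
    """Return every sentential form of the leftmost derivation."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


# Right-hand sides chosen at each step of the derivation on the slide.
CHOICES = [
    ["S", ";", "S"],
    ["id", ":=", "E"],
    ["num"],
    ["id", ":=", "E"],
    ["E", "+", "E"],
    ["num"],
    ["num"],
]

forms = derive("S", CHOICES)
for form in forms:
    print(" ".join(form))
```

Because the non-terminal to expand is always the leftmost one, the list of chosen right-hand sides alone is enough to describe the whole derivation.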

16 Rightmost derivation. Program: a := 56 ; b := 7 + 3. Tokens: id := num ; id := num + num. Derivation: S => S ; S => S ; id := E => S ; id := E + E => S ; id := E + num => S ; id := num + num => id := E ; id := num + num => id := num ; id := num + num. Grammar: S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E

17 Canonical derivations. Leftmost/rightmost derivations may not be unique, but they allow describing a derivation by just the sequence of production rules taken, since the non-terminal to expand at each step is already known

18 Parse trees. Tree nodes are symbols, with children ordered left-to-right. Each internal node is a non-terminal and its children correspond to one of its productions N → µ1 … µk. The root is the start non-terminal. Leaves are tokens. Yield of a parse tree: left-to-right walk over the leaves. [Figure: node N with children µ1 … µk]

19 Parse tree exercise. Draw the parse tree for the input id := num ; id := num + num. Grammar: S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E

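20 Parse tree exercise (solution). Input: id := num ; id := num + num. The parse tree is an order-independent representation of the derivation: the root S has children S ; S, the left S derives id := E with E → num, and the right S derives id := E with E → E + E, where both inner E nodes derive num. Equivalently, add parentheses labeled by non-terminal names: ( S ( S a := ( E 56 ) E ) S ; ( S b := ( E ( E 7 ) E + ( E 3 ) E ) E ) S ) S. Grammar: S → S ; S, S → id := E, S → print ( L ), E → id, E → num, E → E + E, L → E, L → L , E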

21 Capabilities and limitations of CFGs. CFGs naturally express: – Hierarchical structure: a program is a list of classes, a class is a list of definitions, … – Alternatives: a definition is either a field definition or a method definition – Beginning-end type of constraints: balanced parentheses, S → ( S ) S | ε. CFGs cannot express: – Correlations between unbounded strings (identifiers) – For example, that variables are declared before use: ω S ω. These are handled by semantic analysis (attribute grammars). p. 173

22 Bad grammars 22 By Oren neu dag (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

23 Badly-formed grammars. A non-terminal N is reachable if S =>* αNβ. A non-terminal N is generating if N =>* ω for some sentence ω. A grammar G is badly-formed if it contains either unreachable non-terminals or non-generating non-terminals: – G1 = { S → x, N → y } (N is unreachable) – G2 = { S → x | N, N → a N b N } (N is non-generating). Theorem: for every grammar G there exists an equivalent well-formed grammar G' (that is, L(G) = L(G')). Proof: exercise. From now on, we will only handle well-formed grammars
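
Both properties on this slide can be computed by simple fixed-point iterations. The sketch below is an illustration only: the function names generating and reachable and the dictionary encoding of G1 and G2 are assumptions, and any symbol that is not a key of the dictionary is treated as a terminal.

```python
def generating(grammar):
    """Non-terminals that derive at least one string of terminals."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in gen:
                continue
            for rhs in alternatives:
                # Every symbol is a terminal or an already-generating non-terminal.
                if all(s not in grammar or s in gen for s in rhs):
                    gen.add(lhs)
                    changed = True
                    break
    return gen


def reachable(grammar, start):
    """Non-terminals that occur in some sentential form derived from start."""
    seen = {start}
    work = [start]
    while work:
        lhs = work.pop()
        for rhs in grammar[lhs]:
            for s in rhs:
                if s in grammar and s not in seen:
                    seen.add(s)
                    work.append(s)
    return seen


# The two badly-formed grammars from the slide.
G1 = {"S": [["x"]], "N": [["y"]]}
G2 = {"S": [["x"], ["N"]], "N": [["a", "N", "b", "N"]]}

print(reachable(G1, "S"), generating(G1))
print(reachable(G2, "S"), generating(G2))
```

In G1 the non-terminal N is generating but never reached from S; in G2 it is reached but can never be rewritten into terminals only.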

24 Ambiguity in Context-free grammars 24

25 Sometimes there are two parse trees. Arithmetic expressions: E → id | num | E + E | E * E | ( E ). Input: 1 + 2 + 3. Leftmost derivation: E => E + E => num + E => num + E + E => num + num + E => num + num + num, whose parse tree groups the input as 1 + (2 + 3). Rightmost derivation: E => E + E => E + num => E + E + num => E + num + num => num + num + num, whose parse tree groups the input as (1 + 2) + 3

26 Is ambiguity a problem for compilers? Arithmetic expressions: E → id | num | E + E | E * E | ( E ). Input: 1 + 2 + 3. Leftmost derivation: E => E + E => num + E => num + E + E => num + num + E => num + num + num, with parse tree 1 + (2 + 3). Rightmost derivation: E => E + E => E + num => E + E + num => E + num + num => num + num + num, with parse tree (1 + 2) + 3. Both trees evaluate to 6, so whether the ambiguity matters depends on the semantics

27 Problematic ambiguity example. Arithmetic expressions: E → id | num | E + E | E * E | ( E ). Input: 1 + 2 * 3. Leftmost derivation: E => E + E => num + E => num + E * E => num + num * E => num + num * num, with parse tree 1 + (2 * 3) = 7. This is what we usually want: * has precedence over +. Rightmost derivation: E => E * E => E * num => E + E * num => E + num * num => num + num * num, with parse tree (1 + 2) * 3 = 9

28 Ambiguous grammars. A grammar is ambiguous if there exists a word for which there are (equivalently): – Two different leftmost derivations – Two different rightmost derivations – Two different parse trees. Ambiguity is a property of grammars, not languages. Some languages are inherently ambiguous: no unambiguous grammar exists for them. There is no algorithm that decides whether an arbitrary grammar is ambiguous
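
Although ambiguity of an arbitrary grammar cannot be decided, the number of parse trees of one specific word can be counted. The sketch below does this for the arithmetic grammar E → id | num | E + E | E * E | ( E ) with a memoized recursion over token spans, in the spirit of tabular parsing. The function name count_parse_trees is made up for illustration, and the recursion is hard-wired to this one grammar.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens for E -> id | num | E + E | E * E | ( E )."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] in ("num", "id"):
            total += 1                                   # E -> num | id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):                  # E -> E + E | E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("num + num * num".split()))
print(count_parse_trees("( num + num ) * num".split()))
```

A count greater than one for any word witnesses that the grammar is ambiguous; explicit parentheses leave only one tree.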

29 Drawbacks of ambiguous grammars Ambiguous semantics Parsing complexity May affect other phases Solutions? 29

30 Drawbacks of ambiguous grammars Ambiguous semantics Parsing complexity May affect other phases Solutions – Allow only non-ambiguous grammars – Transform grammar into non-ambiguous – Handle as part of parsing method Using special form of “precedence” Wait for bottom-up parsing lecture 30

31 Transforming ambiguous grammars to non-ambiguous by layering. Ambiguous grammar: E → E + E, E → E * E, E → id, E → num, E → ( E ). Unambiguous grammar: Layer 1: E → E + T, E → T. Layer 2: T → T * F, T → F. Layer 3: F → id, F → num, F → ( E ). Each layer takes care of one way of composing sub-strings to form a string: 1: by +, 2: by *, 3: atoms. Let's derive 1 + 2 * 3

32 Transformed grammar: * precedes +. Ambiguous grammar: E → E + E, E → E * E, E → id, E → num, E → ( E ). Unambiguous grammar: E → E + T, E → T, T → T * F, T → F, F → id, F → num, F → ( E ). Derivation: E => E + T => T + T => F + T => 1 + T => 1 + T * F => 1 + F * F => 1 + 2 * F => 1 + 2 * 3. The parse tree groups the input as 1 + (2 * 3)

33 Transformed grammar: + precedes *. Ambiguous grammar: E → E + E, E → E * E, E → id, E → num, E → ( E ). Unambiguous grammar: E → E * T, E → T, T → T + F, T → F, F → id, F → num, F → ( E ). Derivation: E => E * T => T * T => T + F * T => F + F * T => 1 + F * T => 1 + 2 * T => 1 + 2 * F => 1 + 2 * 3. The parse tree groups the input as (1 + 2) * 3
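
The effect of the two layerings can be checked by evaluating 1 + 2 * 3 under each of them. The Python sketch below is an illustration under stated assumptions: the function evaluate and its parameters low and high are hypothetical names, tokens are integers and operator strings, and the left-recursive rules E → E low T and T → T high F are implemented with loops rather than recursion (left recursion is a problem for recursive descent, as discussed later in the lecture).

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tokens, low, high):
    """Evaluate with E -> E low T | T, T -> T high F | F, F -> num | ( E )."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_f():                       # layer 3: atoms
        tok = advance()
        if tok == "(":
            value = parse_e()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        if isinstance(tok, int):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def parse_t():                       # layer 2: the tighter operator
        value = parse_f()
        while peek() == high:
            advance()
            value = OPS[high](value, parse_f())
        return value

    def parse_e():                       # layer 1: the looser operator
        value = parse_t()
        while peek() == low:
            advance()
            value = OPS[low](value, parse_t())
        return value

    result = parse_e()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result


tokens = [1, "+", 2, "*", 3]
print(evaluate(tokens, low="+", high="*"))   # * precedes +
print(evaluate(tokens, low="*", high="+"))   # + precedes *
```

The operator handled in the deeper layer binds tighter, so swapping the layers swaps the grouping of the same token sequence.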

34 Another example for layering. Ambiguous grammar: P → ε | P P | ( P ). [Figure: two different parse trees for the same string of balanced parentheses]

35 Another example for layering. Ambiguous grammar: P → ε | P P | ( P ). Unambiguous grammar: S → P S | ε (takes care of "concatenation"), P → ( S ) (takes care of nesting). [Figure: the single parse tree of a string of balanced parentheses under the unambiguous grammar]

36 "dangling-else" example. Ambiguous grammar: S → if E then S | if E then S else S | other. Input: if E1 then if E2 then S1 else S2. It has two parse trees: if E1 then (if E2 then S1 else S2), and if E1 then (if E2 then S1) else S2. The first is what we usually want: match each else to the closest unmatched then. p. 174

37 "dangling-else" example. Ambiguous grammar: S → if E then S | if E then S else S | other. Input: if E1 then if E2 then S1 else S2, with the two parse trees if E1 then (if E2 then S1 else S2) and if E1 then (if E2 then S1) else S2. Unambiguous grammar: S → M | U, M → if E then M else M | other (matched statements), U → if E then S | if E then M else U (unmatched statements). p. 174

38 Parsing strategies 38

39 Broad kinds of parsers. Parsers for arbitrary grammars: – Cocke-Younger-Kasami ['65] method, O(n^3) – Earley's method (implemented by NLTK) – Not commonly used by compilers. Parsers for restricted classes of grammars: – Top-down (with/without backtracking) – Bottom-up

40 Top-down parsing. Constructs the parse tree in a top-down manner. Preorder tree traversal. Finds the leftmost derivation. Predictive: for every non-terminal and k tokens of lookahead, predict the next production: LL(k). Challenge: beginning with the start symbol, try to guess the productions to apply to end up at the user's program. By Fidelio (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0-2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

41-50 Top-down parsing example. Unambiguous grammar: E → E * T, E → T, T → T + F, T → F, F → id, F → num, F → ( E ). Input: 1 + 2 * 3. We need the rule E → E * T at the root to match the * in the input. [Animation over ten slides: the parse tree grows from the root E downward, one production at a time, until its leaves are the input tokens]

51 Bottom-up parsing Construct parse tree in a bottom-up manner Find the rightmost derivation in a reverse order For every potential right hand side and k-tokens decide when a production is found LR(k) Postorder tree traversal Challenge: beginning with the user's program, try to apply productions in reverse to convert the program back into the start symbol 51

52-60 Bottom-up parsing example. Unambiguous grammar: E → E * T, E → T, T → T + F, T → F, F → id, F → num, F → ( E ). Input: 1 + 2 * 3. [Animation over nine slides: the parse tree grows from the input tokens upward, reducing them to F and T nodes one production at a time, until the root E is reached]

61 Top-down parsing via recursive descent 61 By Vahram Mekhitarian (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

62 Challenges in top-down parsing. Top-down parsing begins with virtually no information: – It begins with just the start symbol, which matches every program. How can we know which productions to apply? In general, we can't: – There are some grammars for which the best we can do is guess and backtrack if we're wrong. If we have to guess, how do we do it? – Parsing as a search algorithm – Too expensive in theory (exponential worst-case time) and in practice

63 Predictive parsing Given a grammar G and a word ω attempt to derive ω using G Idea – Apply production to leftmost nonterminal – Pick production rule based on next input token General grammar – More than one option for choosing the next production based on a token Restricted grammars (LL) – Know exactly which single rule to apply – May require some lookahead to decide 63

64 Boolean expressions example. Grammar: E → LIT | ( E OP E ) | not E, LIT → true | false, OP → and | or | xor. Input: not ( not true or false ). Derivation: E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ). At every step the production to apply is known from the next token. [Figure: parse tree of the input]

65 Recursive descent parsing Define a function for every nonterminal Every function works as follows – Find applicable production rule – Terminal function checks match with next input token (if no match reports error) – Nonterminal function calls (recursively) other functions If there are several applicable productions for a nonterminal, use lookahead 65

66 Matching tokens. Grammar: E → LIT | ( E OP E ) | not E, LIT → true | false, OP → and | or | xor. Variable current holds the current input token.
match(token t) {
  if (current == t)
    current = next_token();
  else
    error;
}

67 Functions for nonterminals. Grammar: E → LIT | ( E OP E ) | not E, LIT → true | false, OP → and | or | xor
E() {
  if (current ∈ {TRUE, FALSE}) {      // E → LIT
    LIT();
  } else if (current == LPAREN) {     // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {        // E → not E
    match(NOT); E();
  } else
    error;
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

68 Implementation via recursion. Grammar: E → LIT | ( E OP E ) | not E, LIT → true | false, OP → and | or | xor
E() {
  if (current ∈ {TRUE, FALSE}) {
    LIT();
  } else if (current == LPAREN) {
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {
    match(NOT); E();
  } else
    error;
}
LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}
OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
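
For readers who want to run the scheme, here is a minimal Python sketch of the same recursive descent recognizer. It is an illustration, not the lecture's code: the names BoolParser, ParseError and accepts are made up, tokens are obtained by splitting the input on whitespace, and an "EOF" marker is appended to detect trailing input.

```python
class ParseError(Exception):
    pass


class BoolParser:
    """Recognizer for E -> LIT | ( E OP E ) | not E."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # end marker (an assumption)
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):      # E -> LIT
            self.LIT()
        elif self.current == "(":                  # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":                # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected token {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError("expected a literal")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError("expected an operator")


def accepts(text):
    parser = BoolParser(text.split())
    try:
        parser.E()
        parser.match("EOF")
        return True
    except ParseError:
        return False


print(accepts("not ( not true or false )"))
print(accepts("( true and )"))
```

One token of lookahead (the current property) is enough here because the three alternatives of E start with disjoint sets of tokens.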

69 Adding semantic actions Can add an action to perform on each production rule Can build the parse tree – Every function returns an object of type Node – Every Node maintains a list of children – Function calls can add new children 69

70 Building the parse tree. Here match returns a Node for the matched token.
Node E() {
  result = new Node();
  result.name = "E";
  if (current ∈ {TRUE, FALSE}) {      // E → LIT
    result.addChild(LIT());
  } else if (current == LPAREN) {     // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) {        // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else
    error;
  return result;
}
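
A runnable Python sketch of the tree-building variant follows. The names Node, TreeBuilder and parse are assumptions made for illustration; as on the slide, every parsing function returns a node and match returns a leaf node for the consumed token. Errors are reported with Python's built-in SyntaxError.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Yield of the tree: left-to-right walk over the leaves."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


class TreeBuilder:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]   # end marker (an assumption)
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1
        return Node(expected)                  # leaf for the matched token

    def E(self):
        result = Node("E")
        if self.current in ("true", "false"):  # E -> LIT
            result.children.append(self.LIT())
        elif self.current == "(":              # E -> ( E OP E )
            result.children.append(self.match("("))
            result.children.append(self.E())
            result.children.append(self.OP())
            result.children.append(self.E())
            result.children.append(self.match(")"))
        elif self.current == "not":            # E -> not E
            result.children.append(self.match("not"))
            result.children.append(self.E())
        else:
            raise SyntaxError(f"unexpected token {self.current!r}")
        return result

    def LIT(self):
        if self.current not in ("true", "false"):
            raise SyntaxError("expected a literal")
        return Node("LIT", [self.match(self.current)])

    def OP(self):
        if self.current not in ("and", "or", "xor"):
            raise SyntaxError("expected an operator")
        return Node("OP", [self.match(self.current)])


def parse(text):
    builder = TreeBuilder(text.split())
    tree = builder.E()
    builder.match("EOF")
    return tree


tree = parse("not ( not true or false )")
print(tree.leaves())
```

The yield of the resulting tree is the original token sequence, which gives a quick sanity check on the construction.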

71 Recursive descent. How do you pick the right A-production? Generally: try them all and use backtracking. In our case: use lookahead.
void A() {
  choose an A-production, A → X1 X2 … Xk;
  for (i = 1; i ≤ k; i++) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi == current terminal)
      advance input;
    else
      report error;
  }
}

72 Technical challenges with recursive descent 72

73 Recursive descent: problem 1. Grammar: term → ID | indexed_elem, indexed_elem → ID [ expr ]. With lookahead 1, the function for indexed_elem will never be tried. What happens for input of the form ID [ expr ]?

74 Recursive descent: problem 2. Grammar: S → A a b, A → a | ε
int S() {
  return A() && match(token('a')) && match(token('b'));
}
int A() {
  return match(token('a')) || 1;
}
What happens for input "ab"? What happens if you flip the order of alternatives and try "aab"?
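
The two questions on this slide can be answered by running the procedures. The Python sketch below is an illustration: naive_S and naive_A mirror the slide's code and commit to the first alternative of A that succeeds, with a hypothetical a_first flag to flip the order of alternatives; backtracking_S and backtracking_A are an added contrast that tries every alternative of A.

```python
def naive_A(s, pos, a_first=True):
    """A -> a | ε (or ε | a when a_first is False), never revisiting the choice."""
    if a_first and pos < len(s) and s[pos] == "a":
        return pos + 1          # took A -> a
    return pos                  # took A -> ε, which always succeeds


def naive_S(s, a_first=True):
    """S -> A a b without backtracking."""
    pos = naive_A(s, 0, a_first)
    for expected in "ab":
        if pos < len(s) and s[pos] == expected:
            pos += 1
        else:
            return False
    return pos == len(s)


def backtracking_A(s, pos):
    """Yield every position at which some alternative of A can end."""
    yield pos                                   # A -> ε
    if pos < len(s) and s[pos] == "a":
        yield pos + 1                           # A -> a


def backtracking_S(s):
    """S -> A a b, trying each alternative of A in turn."""
    return any(s[pos:] == "ab" for pos in backtracking_A(s, 0))


for word in ("ab", "aab"):
    print(word, naive_S(word, a_first=True), naive_S(word, a_first=False),
          backtracking_S(word))
```

The language of the grammar is {ab, aab}, yet each fixed order of alternatives rejects one of the two words: the committed choice for A is made before the rest of the input has been seen.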

75 Recursive descent: problem 3. Grammar: E → E - term | term
int E() {
  return E() && match(token('-')) && term();
}
What happens with this procedure? Recursive descent parsers cannot handle left-recursive grammars. p. 127

76 Indirect left recursion. Grammar: E → F - term | term, F → E
int E() {
  return F() && match(token('-')) && term();
}
int F() {
  return E();
}
A grammar is left-recursive if it allows a derivation sequence of the form S =>* Nα =>+ Nβα. Example: E => F - term => E - term
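
Direct and indirect left recursion can be detected before writing the parser. The sketch below is a simplified illustration: the function name left_recursive is made up, and it assumes the grammar has no ε-productions, so only the first symbol of each right-hand side can start a derivation. With ε-productions, nullable prefixes would have to be considered as well.

```python
def left_recursive(grammar):
    """Non-terminals N with N =>+ N β, assuming no ε-productions."""
    # first_nt[N]: non-terminals that appear first in some right-hand side of N.
    first_nt = {
        lhs: {rhs[0] for rhs in alternatives if rhs and rhs[0] in grammar}
        for lhs, alternatives in grammar.items()
    }
    result = set()
    for lhs in grammar:
        seen = set()
        work = list(first_nt[lhs])
        while work:                      # transitive closure of first_nt
            n = work.pop()
            if n in seen:
                continue
            seen.add(n)
            work.extend(first_nt[n])
        if lhs in seen:
            result.add(lhs)
    return result


DIRECT = {"E": [["E", "-", "term"], ["term"]]}
INDIRECT = {"E": [["F", "-", "term"], ["term"]], "F": [["E"]]}
RIGHT = {"E": [["T", "+", "E"], ["T"]], "T": [["num"]]}

print(left_recursive(DIRECT))
print(left_recursive(INDIRECT))
print(left_recursive(RIGHT))
```

Here term and num are treated as terminals because they are not keys of the grammar dictionary.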

77 Next lecture: more on top-down parsing 77

