Fall Compiler Principles Lecture 2: LL parsing


1 Fall 2017-2018 Compiler Principles Lecture 2: LL parsing
Roman Manevich Ben-Gurion University of the Negev

2 Books
Compilers: Principles, Techniques, and Tools. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
Modern Compiler Implementation in Java. Andrew W. Appel
Modern Compiler Design. D. Grune, H. Bal, C. Jacobs, K. Langendoen
Advanced Compiler Design and Implementation. Steven Muchnick

3 Tentative syllabus
Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR)
Intermediate Representation: Operational Semantics, Lowering
Optimizations: Dataflow Analysis, Loop Optimizations
Code Generation: Register Allocation, Instruction Selection
mid-term exam

4 Context-free grammars
S → E $
E → T
E → E + T
T → id
T → ( E )
[Figure labels: start nonterminal, nonterminal, terminal, production / rule]

5 Context-free languages
Sentential forms Derivations (leftmost, rightmost) Language = all derivable words Derivation tree (also called parse tree) Language = all yields of derivation trees Ambiguous grammars

6 Agenda Understand role of syntax analysis Parsing strategies
LL parsing Building a predictor table via FIRST/FOLLOW/NULLABLE sets Handling conflicts

7 Role of syntax analysis
[Figure: compiler pipeline. High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table etc. → Inter. Rep. (IR) → Code Generation → Executable Code]
Recover structure from stream of tokens
  Parse tree / abstract syntax tree
Error reporting (recovery)
Other possible tasks
  Syntax directed translation (one pass compilers)
  Create symbol table
  Create pretty-printed version of the program, e.g., Auto Formatting function in IDE

8 From tokens to abstract syntax trees
[Figure: program text "59 + (1257 * xPosition)" → Lexical Analyzer (regular expressions, finite automata) → lexical error, or valid token stream "num + ( num * id )" → Parser (context-free grammars, push-down automata) → syntax error, or valid Abstract Syntax Tree]
Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

9 Marking “end-of-file”
Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S’ and a new production rule S’ → S $
$ is not part of the set of tokens
It is a special End-Of-File (EOF) token
To parse α with G’ we change it into α $
Simplifies parsing grammars with null productions
Also simplifies parsing LR grammars
Blank space character ˽

10 Another convention
We will assume that all productions have been consecutively numbered
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

11 Parsing strategies

12 Broad kinds of parsers
Parsers for arbitrary grammars
  Cocke-Younger-Kasami [’65] method: O(n^3)
  Earley’s method (implemented by NLTK): O(n^3) but lower for restricted classes
  Not commonly used by compilers
Parsers for restricted classes of grammars
  Top-Down
    Predictive – LL parsing
    Backtracking – recursive descent / combinators
  Bottom-Up – LR parsing

13 Top-down parsing
Constructs parse tree in a top-down manner
Find leftmost derivation
Predictive: for every non-terminal and k tokens of lookahead, predict the next production: LL(k)
Challenge: beginning with the start symbol, try to guess the productions to apply to end up at the user's program
[Image credit: Fidelio (own work), GFDL or CC-BY-SA, via Wikimedia Commons]

14 Predictive parsing

15 Exercise: show leftmost derivation
(1) E → LIT
(2)   | ( E OP E )
(3)   | not E
(4) LIT → true
(5)   | false
(6) OP → and
(7)   | or
(8)   | xor
Target word: not ( not true or false )
E ⇒ not E
  ⇒ not ( E OP E )
  ⇒ not ( not E OP E )
  ⇒ not ( not LIT OP E )
  ⇒ not ( not true OP E )
  ⇒ not ( not true or E )
  ⇒ not ( not true or LIT )
  ⇒ not ( not true or false )
How did we decide which production of ‘E’ to take?

16 Predictive parsing
Given a grammar G, attempt to derive a word ω
Idea
  Scan input from left to right
  Apply production to leftmost nonterminal
  Pick production rule based on next (1) input token
Problem: there may be more than one production for the next token
Solution: restrict grammars to LL(1)
  Parser correctly predicts which production to apply
  If grammar is not in LL(1) the parser construction algorithm will detect it

17 LL(1) parsing via pushdown automata
[Figure: LL(1) parser as a pushdown automaton. Input stream: a + b $. Stack of symbols (current sentential form): X Y Z $. The parsing program consults the prediction table (nonterminal × token → production) and outputs a derivation tree or an error.]

18 LL(1) parsing algorithm
Initialize stack to S $
while true
  Prediction: when top of stack is a nonterminal N
    Pop N, look up Table[N,t] where t is the next input token
    If Table[N,t] is not empty, push the right-hand side of Table[N,t] on the stack (leftmost symbol on top)
    else return syntax error
  Match: when top of stack is a terminal t
    If t = next input token, pop t and increment input index
    else return syntax error
  End: when stack is empty
    If input is empty return success
    else return syntax error
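The loop above can be sketched in Python. This is a minimal illustration, not the course's reference implementation: the function name `ll1_parse`, the dictionary encoding of the prediction table, and the choice to return a list of moves are all assumptions made for the example. The table is the one for the grammar S → aSb | c used in the running example.

```python
# Hypothetical encoding: (nonterminal, lookahead token) -> right-hand side.
TABLE = {
    ("S", "a"): ("a", "S", "b"),   # S -> a S b
    ("S", "c"): ("c",),            # S -> c
}

def ll1_parse(tokens, table, start, nonterminals):
    """Table-driven LL(1) recognizer.

    Returns the list of predict/match moves, or raises SyntaxError.
    Assumes the end marker "$" does not occur inside `tokens`.
    """
    tokens = list(tokens) + ["$"]       # mark end of input
    stack = ["$", start]                # top of stack is the end of the list
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top in nonterminals:         # Prediction
            rhs = table.get((top, tok))
            if rhs is None:
                raise SyntaxError(f"no prediction for ({top}, {tok})")
            moves.append(f"predict({top},{tok})")
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == tok:                # Match
            moves.append(f"match({top},{tok})")
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {tok}")
    return moves                        # stack empty and "$" consumed

for move in ll1_parse("aacbb", TABLE, "S", {"S"}):
    print(move)
```

Pushing the right-hand side in reverse keeps the stack equal to the unmatched part of the current sentential form, which is what makes the derivation leftmost.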

19 Example prediction table
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
Table entries determine which production to take
Rows: nonterminals. Columns: input tokens.
      (     )     not   true  false and   or    xor   $
E     2           3     1     1
LIT                     4     5
OP                                  6     7     8

20 Running parser example
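Input: aacbb$
Grammar: S → aSb | c
Input suffix   Stack content   Move
aacbb$         S$              predict(S,a) = S → aSb
aacbb$         aSb$            match(a,a)
acbb$          Sb$             predict(S,a) = S → aSb
acbb$          aSbb$           match(a,a)
cbb$           Sbb$            predict(S,c) = S → c
cbb$           cbb$            match(c,c)
bb$            bb$             match(b,b)
b$             b$              match(b,b)
$              $               match($,$) – success
Prediction table:
      a          b     c
S     S → aSb          S → c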

21 Illegal input example
Input: abcbb$
Grammar: S → aSb | c
Input suffix   Stack content   Move
abcbb$         S$              predict(S,a) = S → aSb
abcbb$         aSb$            match(a,a)
bcbb$          Sb$             predict(S,b) = ERROR
Prediction table:
      a          b     c
S     S → aSb          S → c

22 Building the prediction table
Let G be a grammar
Compute FIRST/NULLABLE/FOLLOW
Check for conflicts
  No conflicts => G is an LL(1) grammar
  Conflicts exist => G is not an LL(1) grammar
  Attempt to transform G into an equivalent LL(1) grammar G’

23 First sets

24 FIRST sets
Definition: For a nonterminal A, FIRST(A) is the set of terminals that can start a sentence derived from A
Formally: FIRST(A) = {t | A ⇒* t ω}
Definition: For a sentential form α, FIRST(α) is the set of terminals that can start a sentence derived from α
Formally: FIRST(α) = {t | α ⇒* t ω}

25 FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
FIRST(E) = …?
FIRST(LIT) = …?
FIRST(OP) = …?

26 FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
FIRST(E) = FIRST(LIT) ∪ FIRST(( E OP E )) ∪ FIRST(not E)
FIRST(LIT) = { true, false }
FIRST(OP) = { and, or, xor }
A set of recursive equations
How do we solve them?

27 Computing FIRST sets
Assume no null productions (A → ε)
Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
Repeat the following until no changes occur:
  for each nonterminal A
    for each production A → α1 | … | αk
      FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk)
This is known as a fixed-point algorithm
We will see such iterative methods later in the course and learn to reason about them

28 Exercise: compute FIRST
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
FIRST(STMT) = FIRST(if EXPR then STMT) ∪ FIRST(while EXPR do STMT) ∪ FIRST(EXPR)
FIRST(EXPR) = FIRST(TERM -> id) ∪ FIRST(zero? TERM) ∪ FIRST(not EXPR) ∪ FIRST(++ id) ∪ FIRST(-- id)
FIRST(TERM) = FIRST(id) ∪ FIRST(constant)
[Table rows: TERM, EXPR, STMT]

29 Exercise: compute FIRST
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
FIRST(STMT) = {if, while} ∪ FIRST(EXPR)
FIRST(EXPR) = {zero?, not, ++, --} ∪ FIRST(TERM)
FIRST(TERM) = {id, constant}
[Table rows: TERM, EXPR, STMT]

30 1. Initialization
[Table of FIRST sets. Rows: TERM, EXPR, STMT. Columns: id, constant, zero?, not, ++, --, if, while. Equations and grammar as on the previous slide.]

31 2. Iterate 1
[Same table after the first iteration]

32 2. Iterate 2
[Same table after the second iteration]

33 2. Iterate 3 – fixed-point
[Same table after the third iteration]

34 Reasoning about the algorithm
Assume no null productions (A → ε)
Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
Iterate to fixpoint:
  for each nonterminal A
    for each production A → α1 | … | αk
      FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk)
Is the algorithm correct?
Does it terminate? (complexity)

35 Reasoning about the algorithm
Termination: Correctness:

36 LL(1) Parsing of grammars without epsilon productions

37 Using FIRST sets
Assume G has no epsilon productions and for every non-terminal X and every pair of productions X → α and X → β we have that FIRST(α) ∩ FIRST(β) = {}
No intersection between FIRST sets => can always pick a single rule

38 Using FIRST sets In our Boolean expressions example
FIRST( LIT ) = { true, false }
FIRST( ( E OP E ) ) = { ‘(’ }
FIRST( not E ) = { not }
If the FIRST sets intersect, may need longer lookahead
LL(k) = class of grammars in which the production rule can be determined using a lookahead of k tokens
LL(1) is an important and useful class

39 Exercise: LL(1) prediction table
Terminals: id . num $
(1) S → E $
(2) E → A B
(3) E → B
(4) A → id B
(5) B → . id A
(6) B → num
FIRST(S) =
FIRST(E) =
FIRST(A) =
FIRST(B) =
[Table to fill in. Rows: S, E, A, B. Columns: id, ., num, $]

40 Extending LL(1) Parsing for epsilon productions

41 FIRST, FOLLOW, NULLABLE sets
For each non-terminal X
FIRST(X) = set of terminals that can start a sentence derived from X
  FIRST(X) = {t | X ⇒* t ω}
NULLABLE(X) if X ⇒* ε
FOLLOW(X) = set of terminals that can follow X in some derivation
  FOLLOW(X) = {t | S ⇒* α X t β}

42 Computing the NULLABLE set
Lemma: NULLABLE(α1 … αk) = NULLABLE(α1) ∧ … ∧ NULLABLE(αk)
If X → α1 | … | αk then we have the following equation:
  NULLABLE(X) = NULLABLE(α1) ∨ … ∨ NULLABLE(αk)
Initially NULLABLE(X) = false
Iterate to fixpoint:
  for each production Y → γ1 … γk
    if NULLABLE(γ1 … γk) then NULLABLE(Y) = true

43 Exercise: compute NULLABLE
S → A a b
A → a | ε
B → A B | C
C → b | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(a) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
NULLABLE(B) = (NULLABLE(A) ∧ NULLABLE(B)) ∨ NULLABLE(C)
NULLABLE(C) = NULLABLE(b) ∨ NULLABLE(ε)

44 FIRST with epsilon productions
How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
FIRST(α1 … αk) = ?

45 FIRST with epsilon productions
How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
FIRST(α1 … αk) =
  if not NULLABLE(α1) then FIRST(α1)
  else FIRST(α1) ∪ FIRST(α2 … αk)

46 Exercise: compute FIRST
S → A c b
A → a | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(c) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
FIRST(S) = FIRST(A) ∪ FIRST(cb)
FIRST(A) = FIRST(a) ∪ FIRST(ε)
FIRST(S) = FIRST(A) ∪ {c}
FIRST(A) = {a}
What should we predict for input “acb”?
What should we predict for input “cb”?

47 FOLLOW sets
FOLLOW(X) = set of terminals that can follow X in some derivation
FOLLOW(X) = {t | S ⇒* α X t β}

48 FOLLOW sets
p. 189
if X → α Y β then FOLLOW(Y) ?
if NULLABLE(β) or β=ε then FOLLOW(Y) ?

49 FOLLOW sets
p. 189
if X → α Y β then FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) or β=ε then FOLLOW(Y) ?

50 FOLLOW sets
p. 189
if X → α Y β then FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) or β=ε then FOLLOW(Y) ⊇ FOLLOW(X)

51 FOLLOW sets
p. 189
if X → α Y β then FOLLOW(Y) ⊇ FIRST(β)
if NULLABLE(β) (or β=ε) then FOLLOW(Y) ⊇ FOLLOW(X)
Allows predicting nullable productions: X → α where NULLABLE(α), when the lookahead token is in FOLLOW(X)
S → A c b
A → a | ε
What should we predict for input “cb”?
What should we predict for input “acb”?
| c

52 Filling the prediction table
Table[N, t] = N → α if t ∈ FIRST(α), or NULLABLE(α) and t ∈ FOLLOW(N)

53 LL(1) conflicts

54 Conflicts
FIRST-FIRST conflict: X → α and X → β, if FIRST(α) ∩ FIRST(β) ≠ {}
FIRST-FOLLOW conflict: X → α, NULLABLE(α), if FIRST(α) ∩ FOLLOW(X) ≠ {}

55 LL(1) grammars A grammar is in the class LL(1) when its LL(1) prediction table contains no conflicts A language is said to be LL(1) when it has an LL(1) grammar

56 LL(k) grammars

57 LL(k) grammars Generalizes LL(1) for k lookahead tokens
Need to generalize FIRST and FOLLOW for k lookahead tokens

58 Agenda Understand role of syntax analysis Parsing strategies
LL parsing Building a predictor table via FIRST/FOLLOW/NULLABLE sets Handling conflicts

59 Handling conflicts

60 Problem 1: FIRST-FIRST conflict
term → ID | indexed_elem
indexed_elem → ID [ expr ]
FIRST(indexed_elem) = { ID }
How can we transform the grammar into an equivalent grammar that does not have this conflict?

61 Solution: left factoring
Rewrite the grammar to be in LL(1)
term → ID | indexed_elem
indexed_elem → ID [ expr ]
becomes
term → ID after_ID
after_ID → [ expr ] | ε
New grammar is more complex – has epsilon production
Intuition: just like factoring in algebra: x*y + x*z into x*(y+z)

62 Exercise: apply left factoring
S → if E then S else S | if E then S | T

63 Exercise: apply left factoring
S → if E then S else S | if E then S | T
becomes
S → if E then S S’ | T
S’ → else S | ε
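A single round of left factoring can be sketched as below. This is an illustrative simplification, not a complete algorithm: it only groups alternatives by their first symbol, factors out the longest common prefix of each group, and does not repeat the process on the new rules. The function names and the primed naming of fresh nonterminals are assumptions of the example.

```python
def common_prefix(sequences):
    """Longest common prefix of several symbol sequences."""
    prefix = []
    for column in zip(*sequences):      # stops at the shortest sequence
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(name, alternatives):
    """One round of left factoring for the alternatives of nonterminal `name`."""
    groups = {}
    for alt in alternatives:            # group alternatives by first symbol
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {name: []}
    count = 0
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            result[name].extend(alts)   # nothing to factor in this group
            continue
        prefix = common_prefix(alts)
        count += 1
        fresh = name + "'" * count      # hypothetical naming: S', S'', ...
        result[name].append(prefix + [fresh])
        result[fresh] = [alt[len(prefix):] for alt in alts]  # [] is epsilon
    return result

factored = left_factor("S", [
    ["if", "E", "then", "S", "else", "S"],
    ["if", "E", "then", "S"],
    ["T"],
])
for lhs, rhss in factored.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in rhss))
```

Factoring removes the FIRST-FIRST conflict on `if`, but this particular result is the classic dangling-else grammar, so a conflict on `else` between S’ → else S and S’ → ε can still remain.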

64 Problem 2: FIRST-FOLLOW conflict
S → A a b
A → a | ε
FIRST(S) = { a }
FOLLOW(S) = { }
FIRST(A) = { a }
FOLLOW(A) = { a }
How can we transform the grammar into an equivalent grammar that does not have this conflict?

65 Solution: substitution
S → A a b
A → a | ε
Substitute A in S:
S → a a b | a b

66 Solution: substitution
S → A a b
A → a | ε
Substitute A in S:
S → a a b | a b
Left factoring:
S → a after_A
after_A → a b | b

67 Problem 3: FIRST-FIRST conflict
E → E - term | term
Left recursion cannot be handled with a bounded lookahead
How can we transform the grammar into an equivalent grammar that does not have this conflict?

68 Solution: left recursion removal
p. 130
G1: N → Nα | β
G2: N → βN’
    N’ → αN’ | ε
L(G1) = β, βα, βαα, βααα, …
L(G2) = same
Can be done algorithmically.
Problem 1: grammar becomes mangled beyond recognition
Problem 2: grammar may not be LL(1)
For our 3rd example:
E → E - term | term
becomes
E → term TE
TE → - term TE | ε
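The transformation N → Nα | β into N → βN’, N’ → αN’ | ε can be sketched for immediate left recursion as follows. The function name, the list encoding (an empty list for ε), and the primed name for the fresh nonterminal (the slide calls it TE) are assumptions of this example; indirect left recursion is not handled.

```python
def remove_left_recursion(name, alternatives):
    """Rewrite N -> N a1 | ... | b1 | ... as N -> b N', N' -> a N' | epsilon."""
    # The alphas: what follows the recursive occurrence of N.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    # The betas: alternatives that do not start with N.
    betas = [alt for alt in alternatives if not alt or alt[0] != name]
    if not alphas:
        return {name: alternatives}     # no immediate left recursion
    fresh = name + "'"                  # hypothetical name for the new nonterminal
    return {
        name: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],   # [] is epsilon
    }

rewritten = remove_left_recursion("E", [["E", "-", "term"], ["term"]])
for lhs, rhss in rewritten.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in rhss))
```

The rewritten grammar generates the same words, but its parse trees lean to the right, so the left associativity of `-` has to be restored when building the AST.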

69 Recap Given a grammar Compute for each non-terminal
NULLABLE FIRST using NULLABLE FOLLOW using FIRST and NULLABLE Compute FIRST for each sentential form appearing on right-hand side of a production Check for conflicts If exist, attempt to remove conflicts by rewriting grammar

70 Next lecture: bottom-up parsing

