Welcome to a journey to Compilers
CS419 Lecture 15: Syntax Analysis: Top-Down Parsing (Prerequisites) (Cont'd)
Dr. Hussien Sharaf, Dr. Mohammad Nassef
Department of Computer Science, Faculty of Computers and Information (FCI), Cairo University

Designing a top-down parser
Steps:
1. Elimination of ambiguity
2. Elimination of left recursion
3. Left factoring
4. Drawing the transition diagram (optional)
5. Applying the FIRST/FOLLOW operators
6. Building the parsing table
7. Parsing the given statements

Transition diagrams
Transition diagrams can describe recursive parsers, just as they can describe lexical analyzers, but the diagrams are slightly different.
Construction:
1. Eliminate left recursion from grammar G.
2. Left factor grammar G.
3. For each non-terminal A:
   - Create an initial state and a final (return) state.
   - For each production A -> X1 X2 … Xn, create a path from the initial state to the final state with edges labeled X1, X2, …, Xn.

Example transition diagrams
An expression grammar with left recursion and ambiguity removed:
  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id
Example: parse the string "id + id * id".
Corresponding transition diagrams: [figure not reproduced]

Using transition diagrams
- Begin in the start state of the diagram for the start symbol.
- In state s with an edge labeled by terminal a leading to state t: if the next input symbol is a, move to state t and advance the input pointer.
- For an edge to state t labeled with non-terminal A: jump to the transition diagram for A and, when it finishes, return to state t.
- For an edge to state t labeled ε: move immediately to t without consuming input.

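The rules above map closely onto a recursive-descent parser: each diagram becomes one function, a non-terminal edge becomes a call, and a terminal edge consumes one token. The following is a minimal sketch for the expression grammar from the example slide, not a definitive implementation; the function name parse, the helper names, and the token-list input format are assumptions made for illustration.

```python
def parse(tokens):
    """Return True if tokens (e.g. ["id", "+", "id"]) form a valid expression."""
    toks = list(tokens) + ["$"]   # "$" marks end of input (assumed convention)
    pos = 0

    def peek():
        return toks[pos]

    def match(terminal):
        # Terminal edge: consume the token only if it matches the edge label.
        nonlocal pos
        if toks[pos] != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {toks[pos]!r}")
        pos += 1

    def E():                      # E -> T E'
        T()
        E_prime()

    def E_prime():                # E' -> + T E' | ε
        if peek() == "+":
            match("+")
            T()
            E_prime()
        # otherwise follow the ε edge: return without consuming input

    def T():                      # T -> F T'
        F()
        T_prime()

    def T_prime():                # T' -> * F T' | ε
        if peek() == "*":
            match("*")
            F()
            T_prime()

    def F():                      # F -> ( E ) | id
        if peek() == "(":
            match("(")
            E()
            match(")")
        else:
            match("id")

    try:
        E()                       # start in the diagram of the start symbol
        match("$")                # all input must be consumed
        return pos == len(toks)
    except SyntaxError:
        return False
```

For example, parse(["id", "+", "id", "*", "id"]) follows the diagrams for E, T, F, T' and E' in turn and accepts, while parse(["id", "+"]) is rejected because F finds no operand after the +.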
Procedure
1. Make a transition diagram (similar to a DFA/NFA) for every rule of the grammar.
2. Optimize the diagrams by reducing the number of states, yielding the final transition diagram.
3. To parse a string, simulate it on the transition diagram.
4. If the diagram reaches an accepting (final) state after all the input has been consumed, the string is parsed successfully.

Predictive parsing
Recall the main idea of top-down parsing:
- Start at the root and grow towards the leaves.
- Pick a production and try to match the input.
- May need to backtrack.
Can we avoid the backtracking?
Given A -> α | β, the parser should be able to choose between α and β. How?
What if we do some "preprocessing" to answer the question: given a non-terminal A and look-ahead t, which production of A (if any) is guaranteed to start with t?

Predictive parsing
Armed with FIRST and FOLLOW, we can build a parser where no backtracking is required!

Example grammar for FIRST/FOLLOW
Original grammar:
  E -> E + E
  E -> E * E
  E -> ( E )
  E -> id
This grammar is left-recursive, ambiguous, and requires left factoring. It needs to be modified before we build a predictive parser for it.
Remove ambiguity:
  E -> E + T | T
  T -> T * F | F
  F -> ( E )
  F -> id
Remove left recursion:
  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E )
  F  -> id

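The left-recursion step can be sketched in code. The function below handles only immediate left recursion for a single non-terminal, using the standard rewrite A -> A α | β into A -> β A', A' -> α A' | ε. The function name, the list-of-lists grammar representation, and the use of an empty list for ε are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | ε.

    productions is a list of right-hand sides, each a list of symbols.
    The empty list [] stands for ε (an assumed convention).
    """
    # Right-hand sides that start with A itself, with the leading A removed.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # All remaining right-hand sides (the "beta" alternatives).
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}      # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

Applied to E -> E + T | T, written as eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]), this yields E -> T E' and E' -> + T E' | ε, matching the grammar on the slide.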
Computing FIRST
Compute FIRST(X) as follows:
- If X is a terminal, then FIRST(X) = {X}.
- If X -> ε is a production, then add ε to FIRST(X).
- If X is a non-terminal and X -> Y1 Y2 ... Yn is a production, add FIRST(Y1) (except ε) to FIRST(X).
- If X is a non-terminal and X -> Y1 Y2 ... Yn is a production, add FIRST(Yi) (except ε) to FIRST(X) if all the preceding Yj's contain ε in their FIRST sets. If every Yi contains ε in its FIRST set, add ε to FIRST(X).
Focus on the L.H.S. of productions.

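These rules can be applied repeatedly until no FIRST set changes. The sketch below does this for the expression grammar; the dictionary representation, the names GRAMMAR, first_of_sequence and compute_first, and the use of [] for an ε production are assumptions made for illustration.

```python
EPS = "ε"

# Expression grammar after removing ambiguity and left recursion.
# Each right-hand side is a list of symbols; [] stands for ε.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_sequence(symbols, first):
    """FIRST of a string Y1 Y2 ... Yn, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        # A symbol without an entry in `first` is a terminal: FIRST(a) = {a}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # Yi cannot vanish, so later symbols are hidden
    result.add(EPS)                # every Yi can derive ε (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                 # iterate until a fixed point is reached
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Calling compute_first(GRAMMAR) gives FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+, ε} and FIRST(T') = {*, ε}.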
FIRST Example
CS416 Compiler Design, Fall 2003
Grammar:
  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id
FIRST sets of the non-terminals:
  FIRST(F)  = { (, id }
  FIRST(T') = { *, ε }
  FIRST(T)  = FIRST(F) = { (, id }
  FIRST(E') = { +, ε }
  FIRST(E)  = FIRST(T) = { (, id }
FIRST sets of the right-hand sides:
  FIRST(TE')   = { (, id }
  FIRST(+TE')  = { + }
  FIRST(ε)     = { ε }
  FIRST(FT')   = { (, id }
  FIRST(*FT')  = { * }
  FIRST((E))   = { ( }
  FIRST(id)    = { id }

Computing FOLLOW
Compute FOLLOW as follows:
- FOLLOW(S) contains EOF (or $), where S is the start symbol.
- For productions A -> αBβ, everything in FIRST(β) except ε goes into FOLLOW(B).
- For productions A -> αB, or A -> αBβ where FIRST(β) contains ε, FOLLOW(B) contains everything that is in FOLLOW(A).
Focus on the R.H.S. of productions.

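The three FOLLOW rules can also be run to a fixed point. The sketch below takes the FIRST sets of the expression grammar as given constants so that it stays self-contained; the names GRAMMAR, FIRST, first_of_sequence and compute_follow, and the use of [] for ε, are assumptions made for illustration.

```python
EPS, EOF = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the non-terminals, taken as given.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_sequence(symbols):
    """FIRST of a string of grammar symbols (β in the rules above)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue            # FOLLOW is defined for non-terminals only
                    beta_first = first_of_sequence(rhs[i + 1:])
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:
                        new |= follow[a]            # rule 3 (β empty or nullable)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

Calling compute_follow(GRAMMAR, "E") gives FOLLOW(E) = FOLLOW(E') = {$, )}, FOLLOW(T) = FOLLOW(T') = {+, ), $} and FOLLOW(F) = {+, *, ), $}.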
FOLLOW Example
Grammar:
  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id
FOLLOW sets:
  FOLLOW(E)  = { $, ) }
  FOLLOW(E') = FOLLOW(E) = { $, ) }
  FOLLOW(T)  = (FIRST(E') - {ε}) ∪ FOLLOW(E') = { +, ), $ }
  FOLLOW(T') = FOLLOW(T) = { +, ), $ }
  FOLLOW(F)  = (FIRST(T') - {ε}) ∪ FOLLOW(T') = { +, *, ), $ }

Predictive parsing (with a table)
For each production A -> α do:
- For each terminal a ∈ FIRST(α), add A -> α to entry M[A, a].
- If ε ∈ FIRST(α), add A -> α to entry M[A, b] for each terminal b ∈ FOLLOW(A).
- If ε ∈ FIRST(α) and EOF ∈ FOLLOW(A), add A -> α to M[A, EOF].
Use the table and a stack to simulate recursion.

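The table-filling rules can be sketched as follows. The FIRST and FOLLOW sets of the expression grammar are taken as given constants so that the block is self-contained; the names GRAMMAR, FIRST, FOLLOW, first_of_sequence and build_table, and the choice to store each cell as a list of right-hand sides (so that conflicts stay visible), are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", "$", ")"}, "T'": {"+", "$", ")"},
    "F": {"*", "+", "$", ")"},
}

def first_of_sequence(symbols, first):
    """FIRST(α) for a right-hand side α."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar, first, follow):
    """Map (non-terminal, terminal) to the list of productions placed in that cell."""
    table = {}
    for a, productions in grammar.items():
        for rhs in productions:
            rhs_first = first_of_sequence(rhs, first)
            targets = rhs_first - {EPS}          # a ∈ FIRST(α)
            if EPS in rhs_first:
                targets |= follow[a]             # b ∈ FOLLOW(A), including $
            for terminal in targets:
                table.setdefault((a, terminal), []).append(rhs)
    return table

def is_ll1(table):
    """A grammar is LL(1) when no cell holds more than one production."""
    return all(len(cell) == 1 for cell in table.values())
```

Storing a list per cell means a non-LL(1) grammar shows up as a cell with two or more entries instead of one entry silently overwriting another.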
LL(1) Parsing table
Grammar:
  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { $, ) }
FOLLOW(T) = FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { *, +, $, ) }
Parsing table (empty cells are error entries):
     | id        | +           | *           | (         | )        | $
  E  | E -> TE'  |             |             | E -> TE'  |          |
  E' |           | E' -> +TE'  |             |           | E' -> ε  | E' -> ε
  T  | T -> FT'  |             |             | T -> FT'  |          |
  T' |           | T' -> ε     | T' -> *FT'  |           | T' -> ε  | T' -> ε
  F  | F -> id   |             |             | F -> (E)  |          |
Is this grammar LL(1)? Yes, because each cell holds at most one production.

LL(1) Parser: Example
Parsing the input id+id with the LL(1) parsing table of the expression grammar:
  stack      | input   | output
  $E         | id+id$  | E -> TE'
  $E'T       | id+id$  | T -> FT'
  $E'T'F     | id+id$  | F -> id
  $E'T'id    | id+id$  | (match id)
  $E'T'      | +id$    | T' -> ε
  $E'        | +id$    | E' -> +TE'
  $E'T+      | +id$    | (match +)
  $E'T       | id$     | T -> FT'

LL(1) Parser: Example (Cont'd)
  stack      | input   | output
  $E'T       | id$     | T -> FT'
  $E'T'F     | id$     | F -> id
  $E'T'id    | id$     | (match id)
  $E'T'      | $       | T' -> ε
  $E'        | $       | E' -> ε
  $          | $       | accept