1 CMPSC 160 Translation of Programming Languages
Fall 2002
Slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon
Lecture-Module #7: Parsing

2 Announcements
Programming assignment 2 will be on the class webpage, due in two weeks, October 31, Thursday
  – In this assignment you will work in teams of two. Please find a partner.
  – Start the project early; don’t leave it to the last weekend!
Homework 2 will be due
Read chapter 4
Midterm will be in two weeks, November 5, Tuesday
  – in class
  – closed books, closed notes

3 Parsing Techniques
Top-down parsers (LL(1), recursive descent)
  – Start at the root of the parse tree from the start symbol and grow toward the leaves (similar to a derivation)
  – Pick a production and try to match the input
  – A bad “pick” may require backtracking
  – Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
  – Start at the leaves and grow toward the root
  – We can think of the process as reducing the input string to the start symbol
  – At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left side of that production
  – Bottom-up parsers handle a large class of grammars

4 Eliminating Immediate Left Recursion
To remove left recursion, we can transform the grammar.
Consider a grammar fragment of the form
    A → A α | β
where α and β are strings of terminal and nonterminal symbols and neither α nor β starts with A.
We can rewrite this as
    A → β R
    R → α R | ε
where R is a new nonterminal.
This accepts the same language, but uses only right recursion.
[Figure: parse-tree diagrams over A, α, β, and R]

5 Left-Recursive and Right-Recursive Expression Grammar
Left-recursive grammar:
    1   S      → Expr
    2   Expr   → Expr + Term
    3          | Expr - Term
    4          | Term
    5   Term   → Term * Factor
    6          | Term / Factor
    7          | Factor
    8   Factor → num
    9          | id
Right-recursive grammar:
    1   S      → Expr
    2   Expr   → Term Expr’
    3   Expr’  → + Term Expr’
    4          | - Term Expr’
    5          | ε
    6   Term   → Factor Term’
    7   Term’  → * Factor Term’
    8          | / Factor Term’
    9          | ε
    10  Factor → num
    11         | id

6 Predictive Parsing
Basic idea
  – Given A → α | β, the parser should be able to choose between α and β
FIRST sets
  – For a string of grammar symbols α, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α
  – That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ
The LL(1) property
  – If A → α and A → β both appear in the grammar, we would like FIRST(α) ∩ FIRST(β) = ∅
  – This would allow the parser to make a correct choice with a lookahead of exactly one symbol!
(Pursuing this idea leads to LL(1) parser generators...)

7 Recursive Descent Parsing
Recursive-descent parsing
  – A top-down parsing method
  – The term descent refers to the direction in which the parse tree is traversed (or built)
Use a set of mutually recursive procedures (one procedure for each nonterminal symbol)
  – Start the parsing process by calling the procedure that corresponds to the start symbol
  – Each production becomes one clause in the procedure
We consider a special type of recursive-descent parsing called predictive parsing
  – Use a lookahead symbol to decide which production to use

8 Recursive Descent Parsing: Expression Grammar
    int PLUS=1, MINUS=2,...
    int lookahead;

    /* Consume the expected token or report a syntax error. */
    void match(int token) {
      if (lookahead==token) lookahead=getNextToken();
      else error();
    }

    void main() {
      lookahead=getNextToken();
      S();
      match(EOF);
    }
    void S() { Expr(); }
    void Expr() { Term(); ExprPrime(); }
    void ExprPrime() {
      switch(lookahead) {
        case PLUS : match(PLUS); Term(); ExprPrime(); break;
        case MINUS : match(MINUS); Term(); ExprPrime(); break;
        default: return;   /* Expr’ → ε */
      }
    }
    void Term() { Factor(); TermPrime(); }
    void TermPrime() {
      switch(lookahead) {
        case TIMES: match(TIMES); Factor(); TermPrime(); break;
        case DIV: match(DIV); Factor(); TermPrime(); break;
        default: return;   /* Term’ → ε */
      }
    }
    void Factor() {
      switch(lookahead) {
        case ID : match(ID); break;
        case NUMBER: match(NUMBER); break;
        default: error();
      }
    }

9 Recursive Descent Parsing: Another Grammar
    1   S → if E then S else S
    2     | begin S L
    3     | print E
    4   L → end
    5     | ; S L
    6   E → num = num

    void main() {
      lookahead=getNextToken();
      S();
      match(EOF);
    }
    void S() {
      switch(lookahead) {
        case IF: match(IF); E(); match(THEN); S(); match(ELSE); S(); break;
        case BEGIN: match(BEGIN); S(); L(); break;
        case PRINT: match(PRINT); E(); break;
        default: error();
      }
    }
    void E() { match(NUM); match(EQ); match(NUM); }
    void L() {
      switch(lookahead) {
        case END: match(END); break;
        case SEMI: match(SEMI); S(); L(); break;
        default: error();
      }
    }

10 Example Execution
For input: if 2=2 then print 5=5 else print 1=1
main: call S();
  S1: find the production for (S, IF): S → if E then S else S
  S1: match(IF);
  S1: call E();
    E1: find the production for (E, NUM): E → num = num
    E1: match(NUM); match(EQ); match(NUM);
    E1: return from E1 to S1
  S1: match(THEN);
  S1: call S();
    S2: find the production for (S, PRINT): S → print E
    S2: match(PRINT);
    S2: call E();
      E2: find the production for (E, NUM): E → num = num
      E2: match(NUM); match(EQ); match(NUM);
      E2: return from E2 to S2
    S2: return from S2 to S1
  S1: match(ELSE);
  S1: call S();
    S3: find the production for (S, PRINT): S → print E
    S3: match(PRINT);
    S3: call E();
      E3: find the production for (E, NUM): E → num = num
      E3: match(NUM); match(EQ); match(NUM);
      E3: return from E3 to S3
    S3: return from S3 to S1
  S1: return from S1 to main
main: match(EOF); return success;

11 Another Approach: Stack-Based Table-Driven Parsing
The parsing table
A two-dimensional array M[A, a] gives a production
  – A: a nonterminal symbol
  – a: a terminal symbol
What does it mean?
  – If the top of the stack is A and the lookahead symbol is a, then we apply the production M[A, a]

     IF                          BEGIN           PRINT         END       SEMI         NUM
S    S → if E then S else S      S → begin S L   S → print E
L                                                              L → end   L → ; S L
E                                                                                     E → num = num

12 Table-driven Parsers
A table-driven parser looks like
[Figure: block diagram with the labels source code, Scanner, Table-driven Parser, Stack, IR, grammar, Parser Generator, Parsing Table]
Parsing tables can be built automatically!

13 Table-Driven Predictive Parsing Algorithm
Push the end-of-file symbol ($) and the start symbol onto the stack
Consider the symbol X on the top of the stack and the lookahead symbol a
  – If X = a = $, announce a successful parse and halt
  – If X = a ≠ $, pop X off the stack and advance the input pointer to the next input symbol
  – If X is a nonterminal, look at the production M[X, a]
      If there is no such production (M[X, a] = error), then call an error routine
      If M[X, a] is a production X → Y1 Y2 ... Yk, then pop X and push Yk, Yk-1, ..., Y1 onto the stack with Y1 on top
  – If none of the cases above applies, then call an error routine

14 Table-Driven Predictive Parsing Algorithm
    Push($);   // $ is the end-of-file symbol
    Push(S);   // S is the start symbol of the grammar
    lookahead = get_next_token();
    repeat
      X = top_of_stack();
      if (X is a terminal or X == $) then
        if (X == lookahead) then
          pop(X);
          lookahead = get_next_token();
        else error();
      else   // X is a non-terminal
        if ( M[X, lookahead] == X → Y1 Y2 ... Yk ) then
          pop(X);
          push(Yk); push(Yk-1); ... push(Y1);
        else error();
    until (X = $)

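The algorithm above can be sketched in C for the if/begin/print grammar. This is only an illustration under assumptions that are not in the slides: the input is a pre-tokenized array ending in END_INPUT (standing in for $), the table M is written out by hand as the hypothetical helper table_lookup, and the names Sym, Prod, prods, and parse are made up for this sketch.

```c
#include <assert.h>

/* Terminals first, then nonterminals; END_INPUT plays the role of $. */
enum Sym { IF, THEN, ELSE, BEGIN, PRINT, END, SEMI, NUM, EQ, END_INPUT,
           S, L, E };
#define FIRST_NONTERMINAL S     /* symbols below S are terminals */
#define MAXRHS 6
#define MAXSTACK 256

typedef struct { int len; enum Sym rhs[MAXRHS]; } Prod;

/* Productions numbered 1..6 as on the grammar slide (index 0 unused). */
static const Prod prods[] = {
    {0, {IF}},
    {6, {IF, E, THEN, S, ELSE, S}},   /* 1: S -> if E then S else S */
    {3, {BEGIN, S, L}},               /* 2: S -> begin S L */
    {2, {PRINT, E}},                  /* 3: S -> print E */
    {1, {END}},                       /* 4: L -> end */
    {3, {SEMI, S, L}},                /* 5: L -> ; S L */
    {3, {NUM, EQ, NUM}},              /* 6: E -> num = num */
};

/* M[A, a]: production number to apply, or 0 for an error entry. */
static int table_lookup(enum Sym A, enum Sym a) {
    switch (A) {
    case S: return a == IF ? 1 : a == BEGIN ? 2 : a == PRINT ? 3 : 0;
    case L: return a == END ? 4 : a == SEMI ? 5 : 0;
    case E: return a == NUM ? 6 : 0;
    default: return 0;
    }
}

/* Returns 1 if the token sequence (terminated by END_INPUT) is accepted,
   0 on a syntax error. */
int parse(const enum Sym *input) {
    enum Sym stack[MAXSTACK];
    int sp = 0, pos = 0;
    stack[sp++] = END_INPUT;          /* Push($) */
    stack[sp++] = S;                  /* Push(S) */
    for (;;) {
        enum Sym X = stack[sp - 1];   /* top of stack */
        enum Sym a = input[pos];      /* lookahead */
        if (X < FIRST_NONTERMINAL) {  /* X is a terminal or $ */
            if (X != a) return 0;
            if (X == END_INPUT) return 1;   /* X = a = $ */
            sp--;                     /* pop X, advance the input */
            pos++;
        } else {                      /* X is a nonterminal */
            int p = table_lookup(X, a);
            if (p == 0) return 0;
            sp--;                     /* pop X */
            assert(sp + prods[p].len <= MAXSTACK);
            for (int i = prods[p].len - 1; i >= 0; i--)
                stack[sp++] = prods[p].rhs[i];   /* push Yk ... Y1 */
        }
    }
}
```

Calling parse on the tokens IF NUM EQ NUM THEN PRINT NUM EQ NUM ELSE PRINT NUM EQ NUM END_INPUT goes through the same stack contents as the table-driven trace slide. Pushing the right side in reverse keeps Y1 on top, so the leftmost symbol is expanded first.
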
15 Recursive Descent Parser
On: if 2=2 then print 5=5 else print 1=1
main: call S();
  S1: find the production for (S, IF): S → if E then S else S
  S1: match(IF);
  S1: call E();
    E1: find the production for (E, NUM): E → num = num
    E1: match(NUM); match(EQ); match(NUM);
    E1: return from E1 to S1
  S1: match(THEN);
  S1: call S();
    S2: find the production for (S, PRINT): S → print E
    S2: match(PRINT);
    S2: call E();
      E2: find the production for (E, NUM): E → num = num
      E2: match(NUM); match(EQ); match(NUM);
      E2: return from E2 to S2
    S2: return from S2 to S1
  S1: match(ELSE);
  S1: call S();
    S3: find the production for (S, PRINT): S → print E
    S3: match(PRINT);
    S3: call E();
      E3: find the production for (E, NUM): E → num = num
      E3: match(NUM); match(EQ); match(NUM);
      E3: return from E3 to S3
    S3: return from S3 to S1
  S1: return from S1 to main
main: match(EOF); return success;

16 Table Driven Parser
On: if 2=2 then print 5=5 else print 1=1$
Stack                       lookahead   Parse-table lookup
$S                          IF          M[S,IF]: S → if E then S else S
$S,ELSE,S,THEN,E,IF         IF
$S,ELSE,S,THEN,E            NUM         M[E,NUM]: E → num = num
$S,ELSE,S,THEN,NUM,EQ,NUM   NUM
$S,ELSE,S,THEN,NUM,EQ       EQ
$S,ELSE,S,THEN,NUM          NUM
$S,ELSE,S,THEN              THEN
$S,ELSE,S                   PRINT       M[S,PRINT]: S → print E
$S,ELSE,E,PRINT             PRINT
$S,ELSE,E                   NUM         M[E,NUM]: E → num = num
$S,ELSE,NUM,EQ,NUM          NUM
$S,ELSE,NUM,EQ              EQ
$S,ELSE,NUM                 NUM
$S,ELSE                     ELSE
$S                          PRINT       M[S,PRINT]: S → print E
$E,PRINT                    PRINT
$E                          NUM         M[E,NUM]: E → num = num
$NUM,EQ,NUM                 NUM
$NUM,EQ                     EQ
$NUM                        NUM
$                           $           report success!

17 How to Build Parse Tables? FIRST Sets
For a string of grammar symbols α, define FIRST(α) as
  – The set of tokens that appear as the first symbol in some string that derives from α
  – If α ⇒* ε, then ε is in FIRST(α)
To construct FIRST(X) for a grammar symbol X, apply the following rules until no more symbols can be added to FIRST(X)
  – If X is a terminal, FIRST(X) is {X}
  – If X → ε is a production, then ε is in FIRST(X)
  – If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put every symbol in FIRST(Y1) other than ε into FIRST(X)
  – If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put terminal a in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for all 1 ≤ j < i
  – If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put ε in FIRST(X) if ε is in FIRST(Yi) for all 1 ≤ i ≤ k

18 Computing FIRST Sets for Strings of Symbols
To construct the FIRST set for any string of grammar symbols X1 X2 ... Xn (given the FIRST sets for symbols X1, X2, ..., Xn), apply the following rules.
FIRST(X1 X2 ... Xn) contains:
  – Any symbol in FIRST(X1) other than ε
  – Any symbol in FIRST(Xi) other than ε, if ε is in FIRST(Xj) for all 1 ≤ j < i
  – ε, if ε is in FIRST(Xi) for all 1 ≤ i ≤ n

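The FIRST rules can be sketched as a fixed-point iteration in C over the right-recursive expression grammar. Everything about the representation is an assumption made for this sketch: sets are bitmasks, EPS_BIT is a made-up marker for ε, and the names Prod, prods, first, first_of_string, and compute_first are hypothetical.

```c
#include <assert.h>

enum { PLUS, MINUS, TIMES, DIV, NUM, ID, NTERMINALS,   /* terminals */
       S = NTERMINALS, EXPR, EXPR_P, TERM, TERM_P, FACTOR, NSYMS };
#define BIT(t)  (1u << (t))
#define EPS_BIT (1u << NSYMS)   /* set when epsilon belongs to the set */

typedef struct { int lhs; int len; int rhs[3]; } Prod;

/* The right-recursive expression grammar; len 0 is an epsilon production. */
static const Prod prods[] = {
    {S,      1, {EXPR}},
    {EXPR,   2, {TERM, EXPR_P}},
    {EXPR_P, 3, {PLUS, TERM, EXPR_P}},
    {EXPR_P, 3, {MINUS, TERM, EXPR_P}},
    {EXPR_P, 0, {0}},
    {TERM,   2, {FACTOR, TERM_P}},
    {TERM_P, 3, {TIMES, FACTOR, TERM_P}},
    {TERM_P, 3, {DIV, FACTOR, TERM_P}},
    {TERM_P, 0, {0}},
    {FACTOR, 1, {NUM}},
    {FACTOR, 1, {ID}},
};
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

unsigned first[NSYMS];   /* first[X] is FIRST(X) as a bitmask */

/* FIRST(X1 X2 ... Xn) from the current per-symbol sets. */
unsigned first_of_string(const int *rhs, int len) {
    unsigned set = 0;
    for (int i = 0; i < len; i++) {
        set |= first[rhs[i]] & ~EPS_BIT;
        if (!(first[rhs[i]] & EPS_BIT))
            return set;          /* Xi cannot derive epsilon: stop here */
    }
    return set | EPS_BIT;        /* every Xi derives epsilon, or n == 0 */
}

/* Apply the rules until no set grows any further. */
void compute_first(void) {
    assert(NSYMS < 32);          /* all sets must fit in one unsigned */
    for (int x = 0; x < NSYMS; x++)
        first[x] = x < NTERMINALS ? BIT(x) : 0;   /* FIRST(a) = {a} */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            unsigned add = first_of_string(prods[p].rhs, prods[p].len);
            if ((first[prods[p].lhs] | add) != first[prods[p].lhs]) {
                first[prods[p].lhs] |= add;
                changed = 1;
            }
        }
    }
}
```

After compute_first() returns, first[EXPR_P] holds the bits for +, - and ε, and first[S] holds num and id. The loop repeats because a set computed early (for example FIRST(Term)) can only be completed once the sets it depends on (FIRST(Factor)) have been filled in.
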
19 FIRST Sets
    1   S      → Expr
    2   Expr   → Term Expr’
    3   Expr’  → + Term Expr’
    4          | - Term Expr’
    5          | ε
    6   Term   → Factor Term’
    7   Term’  → * Factor Term’
    8          | / Factor Term’
    9          | ε
    10  Factor → num
    11         | id

Symbol   FIRST
S        {num, id}
Expr     {num, id}
Expr’    {ε, +, -}
Term     {num, id}
Term’    {ε, *, /}
Factor   {num, id}
num      {num}
id       {id}
+        {+}
-        {-}
*        {*}
/        {/}

20 How to Build Parse Tables? FOLLOW Sets
For a nonterminal symbol A, define FOLLOW(A) as:
  – The set of terminal symbols that can appear immediately to the right of A in some sentential form
To construct FOLLOW(A) for a nonterminal symbol A, apply the following rules until no more symbols can be added to FOLLOW(A)
  – Place $ in FOLLOW(S) ($ is the end-of-file symbol, S is the start symbol)
  – If there is a production A → α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B)
  – If there is a production A → α B, then everything in FOLLOW(A) is placed in FOLLOW(B)
  – If there is a production A → α B β, and ε is in FIRST(β), then everything in FOLLOW(A) is placed in FOLLOW(B)

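The FOLLOW rules can be sketched in C in the same fixed-point style. To keep the sketch short, the FIRST sets are hard-coded from the FIRST Sets slide rather than computed. The bitmask representation, the END_INPUT token standing in for $, and the names first_of, follow, and compute_follow are assumptions made for this illustration.

```c
#include <assert.h>

enum { PLUS, MINUS, TIMES, DIV, NUM, ID, END_INPUT, NTERMINALS,
       S = NTERMINALS, EXPR, EXPR_P, TERM, TERM_P, FACTOR, NSYMS };
#define BIT(t)  (1u << (t))
#define EPS_BIT (1u << NSYMS)   /* set when epsilon belongs to the set */

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod prods[] = {
    {S,      1, {EXPR}},
    {EXPR,   2, {TERM, EXPR_P}},
    {EXPR_P, 3, {PLUS, TERM, EXPR_P}},
    {EXPR_P, 3, {MINUS, TERM, EXPR_P}},
    {EXPR_P, 0, {0}},
    {TERM,   2, {FACTOR, TERM_P}},
    {TERM_P, 3, {TIMES, FACTOR, TERM_P}},
    {TERM_P, 3, {DIV, FACTOR, TERM_P}},
    {TERM_P, 0, {0}},
    {FACTOR, 1, {NUM}},
    {FACTOR, 1, {ID}},
};
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

/* FIRST sets copied from the FIRST Sets slide (not computed here). */
static unsigned first_of(int sym) {
    switch (sym) {
    case S: case EXPR: case TERM: case FACTOR: return BIT(NUM) | BIT(ID);
    case EXPR_P: return BIT(PLUS) | BIT(MINUS) | EPS_BIT;
    case TERM_P: return BIT(TIMES) | BIT(DIV) | EPS_BIT;
    default:     return BIT(sym);          /* terminal: FIRST(a) = {a} */
    }
}

unsigned follow[NSYMS];   /* only the nonterminal entries are meaningful */

void compute_follow(void) {
    assert(NSYMS < 32);                    /* sets must fit in one unsigned */
    for (int x = 0; x < NSYMS; x++) follow[x] = 0;
    follow[S] = BIT(END_INPUT);            /* place $ in FOLLOW(S) */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            /* Scan A -> alpha B beta right to left; beta_first is FIRST(beta). */
            unsigned beta_first = EPS_BIT; /* beta is empty to begin with */
            for (int i = prods[p].len - 1; i >= 0; i--) {
                int B = prods[p].rhs[i];
                if (B >= NTERMINALS) {     /* B is a nonterminal */
                    unsigned add = beta_first & ~EPS_BIT;
                    if (beta_first & EPS_BIT)      /* beta can vanish */
                        add |= follow[prods[p].lhs];
                    if ((follow[B] | add) != follow[B]) {
                        follow[B] |= add;
                        changed = 1;
                    }
                }
                /* Extend beta to the left so that it now starts with B. */
                unsigned fb = first_of(B);
                beta_first = (fb & EPS_BIT) ? ((fb & ~EPS_BIT) | beta_first) : fb;
            }
        }
    }
}
```

After compute_follow() returns, follow[TERM] holds $, + and -, and follow[FACTOR] additionally holds * and /. Treating an empty β as a set containing only ε lets the second and third FOLLOW rules share one code path.
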
21 FOLLOW Sets
    1   S      → Expr
    2   Expr   → Term Expr’
    3   Expr’  → + Term Expr’
    4          | - Term Expr’
    5          | ε
    6   Term   → Factor Term’
    7   Term’  → * Factor Term’
    8          | / Factor Term’
    9          | ε
    10  Factor → num
    11         | id

Symbol   FOLLOW
S        { $ }
Expr     { $ }
Term     { $, +, - }
Factor   { $, +, -, *, / }

22 LL(1) Parse Table Construction
For all productions A → α, perform the following steps:
  – For each terminal symbol a in FIRST(α), add A → α to M[A, a]
  – If ε is in FIRST(α), then add A → α to M[A, b] for each terminal symbol b in FOLLOW(A), and add A → α to M[A, $] if $ is in FOLLOW(A)
Set all the undefined entries in M to error

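The construction steps can be sketched in C for the expression grammar. This is an illustration under stated assumptions: FIRST of each right side and FOLLOW of each left side are hard-coded from the FIRST and FOLLOW slides, with FOLLOW(Expr’) = {$} and FOLLOW(Term’) = {$, +, -} obtained by applying the FOLLOW rules by hand. Because $ is treated as an ordinary column (END_INPUT), the separate M[A, $] step falls out of the same loop. The names ProdInfo, prods, M, and build_table are hypothetical.

```c
#include <assert.h>

enum { ID, NUM, PLUS, MINUS, TIMES, DIV, END_INPUT, NTERMINALS };
enum { S, EXPR, EXPR_P, TERM, TERM_P, FACTOR, NNONTERMS };
#define BIT(t)  (1u << (t))
#define EPS_BIT (1u << NTERMINALS)   /* set when epsilon belongs to the set */

#define FIRST_OPERAND  (BIT(NUM) | BIT(ID))
#define FOLLOW_EXPR    (BIT(END_INPUT))
#define FOLLOW_TERM    (BIT(END_INPUT) | BIT(PLUS) | BIT(MINUS))
#define FOLLOW_FACTOR  (FOLLOW_TERM | BIT(TIMES) | BIT(DIV))

/* Per production A -> alpha: A, FIRST(alpha), FOLLOW(A). */
typedef struct { int lhs; unsigned first_rhs; unsigned follow_lhs; } ProdInfo;

/* Index = production number on the grammar slide (index 0 unused). */
static const ProdInfo prods[] = {
    {0, 0, 0},
    {S,      FIRST_OPERAND, BIT(END_INPUT)},  /*  1: S -> Expr */
    {EXPR,   FIRST_OPERAND, FOLLOW_EXPR},     /*  2: Expr -> Term Expr' */
    {EXPR_P, BIT(PLUS),     FOLLOW_EXPR},     /*  3: Expr' -> + Term Expr' */
    {EXPR_P, BIT(MINUS),    FOLLOW_EXPR},     /*  4: Expr' -> - Term Expr' */
    {EXPR_P, EPS_BIT,       FOLLOW_EXPR},     /*  5: Expr' -> epsilon */
    {TERM,   FIRST_OPERAND, FOLLOW_TERM},     /*  6: Term -> Factor Term' */
    {TERM_P, BIT(TIMES),    FOLLOW_TERM},     /*  7: Term' -> * Factor Term' */
    {TERM_P, BIT(DIV),      FOLLOW_TERM},     /*  8: Term' -> / Factor Term' */
    {TERM_P, EPS_BIT,       FOLLOW_TERM},     /*  9: Term' -> epsilon */
    {FACTOR, BIT(NUM),      FOLLOW_FACTOR},   /* 10: Factor -> num */
    {FACTOR, BIT(ID),       FOLLOW_FACTOR},   /* 11: Factor -> id */
};
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

int M[NNONTERMS][NTERMINALS];   /* production number; 0 means error */

/* Fills M and returns how many entries received a second production. */
int build_table(void) {
    assert(NTERMINALS < 32);    /* sets must fit in one unsigned */
    int conflicts = 0;
    for (int A = 0; A < NNONTERMS; A++)
        for (int a = 0; a < NTERMINALS; a++)
            M[A][a] = 0;        /* undefined entries stay as error */
    for (int p = 1; p < NPRODS; p++) {
        unsigned targets = prods[p].first_rhs & ~EPS_BIT;
        if (prods[p].first_rhs & EPS_BIT)
            targets |= prods[p].follow_lhs;   /* alpha may derive epsilon */
        for (int a = 0; a < NTERMINALS; a++) {
            if (!(targets & BIT(a))) continue;
            if (M[prods[p].lhs][a] != 0) conflicts++;
            M[prods[p].lhs][a] = p;
        }
    }
    return conflicts;
}
```

For this grammar build_table() returns 0, and for example M[TERM_P][PLUS] ends up as production 9 (Term’ → ε) because + is in FOLLOW(Term’). A nonzero return value would mean some entry has multiple productions, so the grammar would not be LL(1).
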
23 Grammar:
    1   S      → Expr
    2   Expr   → Term Expr’
    3   Expr’  → + Term Expr’
    4          | - Term Expr’
    5          | ε
    6   Term   → Factor Term’
    7   Term’  → * Factor Term’
    8          | / Factor Term’
    9          | ε
    10  Factor → num
    11         | id

LL(1) Parse table (E, T, F abbreviate Expr, Term, Factor):
     id            num           +             -             *             /             $
S    S → E         S → E
E    E → T E’      E → T E’
E’                               E’ → + T E’   E’ → - T E’                               E’ → ε
T    T → F T’      T → F T’
T’                               T’ → ε        T’ → ε        T’ → * F T’   T’ → / F T’   T’ → ε
F    F → id        F → num

24 LL(1) grammars
Left-to-right scan of the input, Leftmost derivation, 1-token lookahead
Two alternative definitions of LL(1) grammars:
1. A grammar G is LL(1) if there are no multiple entries in its LL(1) parse table
2. A grammar G is LL(1) if, for each set of its productions A → α1 | α2 | ... | αn:
  – FIRST(α1), FIRST(α2), ..., FIRST(αn) are all pairwise disjoint
  – If αi ⇒* ε, then FIRST(αj) ∩ FOLLOW(A) = ∅ for all 1 ≤ j ≤ n, j ≠ i

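The second definition can be sketched as a small C check over the alternatives of one nonterminal. The bitmask representation, the EPS_BIT marker for ε, the token numbering, and the function name alternatives_are_ll1 are all assumptions made for this illustration.

```c
#include <assert.h>

enum { PLUS, MINUS, TIMES, DIV, NUM, ID, END_INPUT };
#define BIT(t)  (1u << (t))
#define EPS_BIT (1u << 31)   /* set when epsilon belongs to a FIRST set */

/* first[i] is FIRST(alpha_i) for the i-th alternative of a nonterminal A,
   follow is FOLLOW(A). Returns 1 if both conditions of definition 2 hold
   for these alternatives, 0 otherwise. */
int alternatives_are_ll1(const unsigned *first, int n, unsigned follow) {
    assert(!(follow & EPS_BIT));   /* FOLLOW sets never contain epsilon */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j) continue;
            /* FIRST sets must be pairwise disjoint (epsilon counts too). */
            if (first[i] & first[j]) return 0;
            /* If alpha_i derives epsilon, FIRST(alpha_j) must avoid FOLLOW(A). */
            if ((first[i] & EPS_BIT) && (first[j] & follow)) return 0;
        }
    }
    return 1;
}
```

With the sets from the earlier slides, the three alternatives of Expr’ ({+}, {-}, {ε} with FOLLOW(Expr’) = {$}) pass the check, while the three alternatives of the left-recursive Expr all have FIRST = {num, id} and fail it. A grammar is LL(1) under this definition when the check passes for every nonterminal.
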