156 LR(k) Parsing We learned that the typical strategy of LL(k) parsing is that if a nonterminal symbol A appears at the stack top, the parser chooses.

Slides:

Advertisements

Similar presentations

Parsing V: Bottom-up Parsing

Advertisements

 CS /11/12 Matthew Rodgers.  What are LL and LR parsers?  What grammars do they parse?  What is the difference between LL and LR?  Why do.

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.

LR-Grammars LR(0), LR(1), and LR(K).

142 Parsing start (a, Z 0 /aZ 0 ) ( a, a/aa ) (b, a/  ) ( , Z 0 /Z 0 ) For a given CFG G, parsing a string w is to check if w  L(G) and, if it is,

Pushdown Automata Part II: PDAs and CFG Chapter 12.

CS5371 Theory of Computation

Predictive Parsing l Find derivation for an input string, l Build a abstract syntax tree (AST) –a representation of the parsed program l Build a symbol.

6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)

Top-Down Parsing.

CS Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.

56 Rumination on the Formal Definition of DPDA In the definition of DPDA, there are some parts that do not agree with our intuition. Let M = (Q, , ,

COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.

January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.

Prof. Fateman CS 164 Lecture 91 Bottom-Up Parsing Lecture 9.

CS 330 Programming Languages 09 / 23 / 2008 Instructor: Michael Eckmann.

Normal forms for Context-Free Grammars

LR(1) Languages An Introduction Professor Yihjia Tsai Tamkang University.

COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.

CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.

1 Bottom-up parsing Goal of parser : build a derivation –top-down parser : build a derivation by working from the start symbol towards the input. builds.

Bottom-up parsing Goal of parser : build a derivation

CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University

Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.

Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.

Pushdown Automata.

Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.

Pushdown Automata (PDAs)

Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.

Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.

Syntactic Analysis Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.

1 Bottom-Up Parsing  “Shift-Reduce” Parsing  Reduce a string to the start symbol of the grammar.  At every step a particular substring is matched (in.

1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 11 Midterm Exam 2 -Context-Free Languages Mälardalen University 2005.

Chapter 7 Pushdown Automata

1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.

PZ03A Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ03A - Pushdown automata Programming Language Design.

Top-Down Parsing.

Context-Free Languages

Parsing and Code Generation Set 24. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program,

Chapter 19 LL(k) Grammars. 2 LL(k) Parsers n Can be developed using PDAs for parsing CFGs by converting the machines directly into program statements.

GRAMMARS & PARSING. Parser Construction Most of the work involved in constructing a parser is carried out automatically by a program, referred to as a.

CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.

CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.

CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.

Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.

CS6800 Advance Theory of Computation Spring 2016 Nasser Alsaedi

1 Key to Homework #3 (Models of Computation, Spring, 2001) Finite state control with a 4-way read/write head Figure (a)Figure (b) (over)

Lecture 17: Theory of Automata:2014 Context Free Grammars.

Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.

Programming Languages Translator

Bottom-up parsing Goal of parser : build a derivation

Parsing IV Bottom-up Parsing

Table-driven parsing Parsing performed by a finite state machine.

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Pushdown Automata.

PZ03A - Pushdown automata

Context-Free Languages

Compiler Design 7. Top-Down Table-Driven Parsing

Midterm (Models of Computation, Fall, 2000)

Bottom-Up Parsing “Shift-Reduce” Parsing

Key to Homework #8 1. For each of the following context-free grammars (a) and (b) below, construct an LL(k) parser with minimum k according to the following.

ReCap Chomsky Normal Form, Theorem regarding CNF, examples of converting CFG to be in CNF, Example of an FA corresponding to Regular CFG, Left most and.

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section

Presentation transcript:

156 LR(k) Parsing We learned that the typical strategy of LL(k) parsing is that if a nonterminal symbol A appears at the stack top, the parser chooses a production rule for A, say A , depending on the k look-ahead contents, if needed, and pops the nonterminal A at the stack top by . In LR(k) parsing, the parser does the opposite, i.e., when the machine sees (indeed reads from inside out)  at the stack top portion, chooses a production rule that generates , say A , depending on the k look-ahead contents, if needed, and replaces  with A (see the figure on the following slide). Notice that  0 in general. Hence, LR(k) parser have additional capability of looking down the stack of some constant depth. The depth is no greater than the maximum length of the right side of production rules. If such  does not appear at the stack, the parser shifts in (i.e., reads and pushes) the next input symbol and again examine if stack top portion can be reduced by applying a production rule. The parser iteratively applies this strategy until the whole input string is processed and the stack contains only the start symbol S, which is the final accepting configuration. If an input string x has been parsed successfully, the sequence of production rules applied to reduce the stack turns out to be exactly the reverse order of the rules applied for the rightmost derivation for x.

Z 0 q1q A   ? G RR A   | ... q1q A Z 0 A   ! G A   | ...

158 For the basic strategy of LR(k) parsing consider the following simple grammar. S  A | B A  abc B  abd We can easily construct an LL(3) parser for this grammar. With LR(k) parsing strategy, we can build LR(0) parser for the grammar, i.e., the parser does not need to look ahead. For input string abc, the parser parses the input as follows. The parser shifts in the input onto the stack until it sees abc (read the stack inside out) at the stack top portion. This stack top portion is replaced (popped out and then pushed) with A. String abd can also be parsed similarly. (q 0, abc, Z 0 ) ...  ( q 1, , cbaZ 0 )  ( q 1, , AZ 0 )  ( q 1, , SZ 0 ) As in LL(k) parsing, the parser may encounter an uncertainty in reducing the stack top portion. If the grammar is LR(k), the parser can resolve the ambiguity by looking the input string ahead no more than k cells. We shall see more such examples.

159 Notice that the language of this grammar is {aaaddd}  {aaadddc i | i > 0}. We first examine how string aaabbbccc can be parsed according to LR(k) strategy. The rightmost derivation for this string is shown below. We will show how our parser parses this string by applying the production rules in reverse order of the rules applied for the rightmost derivation, i.e., (3)(4)(6)(5)(5)(1). Example 1. For the following CFG we will construct an LR(k) parser with minimum k. S  ADC  ADCc  ADCcc  ADccc  Adddccc  aaadddccc (1) (5) (5) (6) (4) (3) S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4)

160 Initially there is no string in the stack that can be reduced by applying a production rule. The parser, reading the input symbols, shifts them into the stack until it sees a stack top portion aaa that can possibly be produced by the last step of the rightmost derivation, which is rule (3). So the parser does the following. (q 0, aaadddccc, Z 0 )  ( q 1, aadddccc, aZ 0 )  (q 1, adddccc, aaZ 0 )  (q 1, dddcc, aaaZ 0 )  ? S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4) Can we let the parser simply replace the stack top portion aaa by A? No, because if the input were aaaddd, the parser will fail to parse the string, since A is not used to derive it. For this input it is too early to apply a rule. So the parser needs some information from the input to resolve this problem.

161 Notice that if the input were aaabbb, the parser should have shifted the whole input string into the stack and reduce it by S, which results in a successful parsing as follows. (Recall that the parser reads the stack top portion inside out.) (2) (q 0, aaaddd, Z 0 )  ( q 1, aaddd, aZ 0 ) ....  (q 1, , dddaaaZ 0 )  (q 1, , SZ 0 ) However, for input string aaadddccc, the parser should not shift in the d’s, because it will lose the chance for applying rule (3) to reduce aaa by A at the stack top. To choose the right step, the parser needs to look ahead at least 4 cells to see if there appears c after 3 d’s. For the given input the parser looks ahead and sees dddc, and decides that it is right time to reduce (i.e., replace) the stack top portion aaa with A applying rule (3) as follows. S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4) (3) (q 0, aaadddccc, Z 0 ) ..  ( q 1, aadddccc, aaZ 0 )  (q 1, dddccc, aaaZ 0 )  (q 1, dddccc, AZ 0 )  ?

162 Now the parser sees no stack top portion that is reducible until three d’s are shifted in. This string ddd is reduced by applying rule (4) without looking ahead, because this is the only case. When the next c is shifted in onto the stack, it is easily identified that rule (6) must be applied, because the leftmost c is produced by this rule. Thus, the c at the stack top is reduced by C. Notice that the parser does not need to looking ahead for this reduction. S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4) Now, there is ADC in the stack top portion which could be reducible by S. However, since there are more c’s in the input, it is too early to apply this rule. To resolve this problem the parser needs 1 look-ahead, and shift in the remaining c’s one by one and reduce the stack top portion Cc with C applying rule (5) until all c’s in the input tape are shifted in and reduced. (4) (q 1, dddccc, AZ 0 ) ....  (q 1, ccc, dddAZ 0 )  (q 1, ccc, DAZ 0 )  (6) (q 1, cc, cDAZ 0 )  (q 1, cc, CDAZ 0 )  ?

163 (q 1, cc, CDAZ 0 )  (q 1, c, cCDAZ 0 )  (q 1, c, CDAZ 0 )  (q 1, , cCDAZ 0 )  (q 1, , CDAZ 0 )  (q 1, , SZ 0 ) (5) (1) Only when the last c has been shifted in, and the Cc at the stack top reduced by C, the parser should reduce the ADC by S, and can enter in the successful final configuration (q 1, , SZ 0 ). Clearly, the sequence of production rules applied by this LR(4) parser (3)(4)(6)(5)(5)(1) is exactly the reverse sequence of the rules applied for the rightmost derivation of the input string. This parser is formally defined with, so called, the reduction table as shown on the next slide. S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4)

164 dddc dddB BBBB cxxx xxxx aaa ddd aaaddd c Cc ADC 4 look-ahead Stack top portion A D C S C Shift-in x: don’t care B : blanks S Shift-in Reduction Table S  ADC  aaaddd A  aaa D  ddd C  Cc  c (1) (2)(3)(5) (6)(4)

165 Example 2. Construct an LR(k) parser for the following grammar with minimum k. S  EAF  EBF E  aE  a A  aaab B  aaac F  aaad (1) (2) (3) (4) (5) (6) (7) It is easy to see that this grammar generates the following language. {a i xaaad  i  1, x  { aaab, aaac} } Notice that above grammar is not LL(k) grammar for any constant k, because in the language specification, string x which provides the information for choosing one of rules (1) and (2) can be at arbitrarily far to the right from the first input symbol. We can construct an LR(4) parser for this grammar. As we did for the previous examples, we will first analyze how we can construct such parser with a typical string of the grammar. Consider string aaaaaabaaad which is generated by the following rightmost derivation. S  EAF  EAaaad  Eaaabaaad  aEaaabaaad  aaEaaabaad  aaaaaabaad (1) (7) (5) (3) (3) (4)

166 Since every rightmost derivation of the grammar ends by applying rule (4) E  a, the initial objective of the parser to bring in the a generated by rule (4) from the input tape to the stack top. Then the parser can simply reduces it by E applying rule (4). Notice that for every string in the language {a i xaaad  i  1, x  { aaab, aaac}}, the part a i is produced by E  aE  a, with the last a generated by rule E  a that is followed by either aaab or aaac. Hence, if the parser shifts in a’s from the input until it sees ahead either aaab or aaac, then the a on top of the stack is the one produced by E  a. Thus, the parser does the following based on this observation. (q 0, aaaaaabaaad, Z 0 )  (q 1, aaaaabaaad, aZ 0 )  ….  (q 1, aaabaaad, aaaZ 0 )  (q 1, aaabaad, EaaZ 0 )  ? (4) S  EAF  EBF E  aE  a A  aaab B  aaac F  aaad (1) (2) (3) (4) (5) (6) (7)

167 The stack top can be reduced further by applying E  aE twice. Notice that the a’s in the stack are generated by this rule. (q 1, aaabaaad, EaaZ 0 )  (q 1, aaabaaad, EaZ 0 )  (q 1, aaabaaad, EZ 0 )  ? (3) Since the stack top cannot be reduced further, input symbols are shifted in until the next reduction is possible. Notice that throughout these last moves, the parser needs no look ahead. (q 1, aabaad, aEZ 0 )  …  (q 1, aaad, baaaEZ 0 )  (q 1, aaad, AEZ 0 )  (q 1, aad, aAEZ 0 ) ... (q 1, , daaaAEZ 0 )  (q 1, , FAEZ 0 )  (q 1, , SZ 0 ) (5) (7) (1) S  EAF  EBF E  aE  a A  aaab B  aaac F  aaad (1) (2) (3) (4) (5) (6) (7)

168 The sequence of rules applied by the LR(4) is (7)(6)(5)(4)(3)(2)(1), which is the sequence applied for the rightmost derivation of the input string. Based on this analysis we can construct the following reduction table of the parser. S  EAF  EBF E  aE  a A  aaab B  aaac F  aaad (1) (2) (3) (4) (5) (6) (7) aaab aaac aaaa xxxx 4 look-ahead a aE aaab aaac aaad E E E A B F Stack top EAF EBF S S Shift-in Reduction Table

169 Our LR(3) parser will parse this string applying the sequence of production rules in the reverse order of the rightmost derivation, i.e., (4)(3)(3)(2)(1)(1). S  bS | Accc A  bAc | bc (1) (2) (3) (4) We construct an LR(3) parser for this grammar. The language of this grammar is {b i b n c n ccc | i  0, n  1 }. Clearly, every derivation of this grammar terminates by rule A  bc which generates bc at the center of b n c n. This bc can be shifted in from the tape by simply reading the input and pushing it on the stack until bc appears at the stack top. This is the target of the first reduction. Lets study how an LR(3) parser can parse string bbbbbbccccc of the language. The rightmost derivation of this string is S  bS  bbS  bbAccc  bbbAcccc  bbbbAccccc  bbbbbcccccc (1) (1) (2) (3) (3) (4) Example 3. Construct LR(k) parser with minimum k for the following grammar.

170 String bAc is reduced by A, the next c is shifted in, bAc is reduced by A again, and so on, until there remains last three c’s. The parser shifts the input in until it sees bc (again read inside out) in the stack as follows. (q 0, bbbbbcccccc, Z 0 )  …..  (q 1, ccccc, cbbbbbZ 0 )  String bc is reduced by A applying production A  bc, and then the next input symbol is shifted in because no further reduction is possible. (q 1, ccccc, cbbbbbZ 0 )  (q 1, ccccc, AbbbbZ 0 )  (q 1, cccc, cAbbbbZ 0 )  (4) (3) (q 1, cccc, AbbbZ 0 )  (q 1, ccc, cAbbbZ 0 )  (q 1, ccc, AbbZ 0 )  (3) S  bS | Accc A  bAc | bc (1) (2) (3) (4)

171 Now, bS on the top of the stack can be reduced by S for the production S  bS until only S remains in the stack, which is the accepting configuration. The reduction table of this LR(3) parser is shown on the next slide. (1) (q 1, , SbbZ 0 )  (q 1, , SbZ 0 )  (q 1, , SZ 0 )(q 1, cc, cAbbZ 0 ) .  (q 1, , cccAbbZ 0 )  (q 1, , SbbZ 0 )  (2) Since the last three c’s are generated by S  Accc, after shifting in the next c into the stack, if there remains only two c’s in the input tape, the parser should stop reducing bAc by A. (For this, it needs 3-look-ahead.) The remaining two c’s in the tape should be shifted in onto the stack, and then Accc is reduced by S applying the production (2) S  Accc as follows. S  bS | Accc A  bAc | bc (1) (2) (3) (4)

172 Example 4. Consider the following two CFG’s which generate the same language {a i x | i  1, x  {b,c}}. G 1 : S  Ab|Bc A  Aa|a B  Ba|a G 2 : S  Ab|Bc A  aA|a B  aB|a It is easy to see that there is no LR(k) parser for G 1 for any constant k, while G 2 is an LR(1) grammar. S  bS | Accc A  bAc | bc (1) (2) (3) (4) 3 look-ahead ccc ccB xxx bc bAc Accc bS A AShift-in S S Stack top Reduction Table

173 Definition (LR(k) grammar). For CFG G = (V T, V N, P, S), let  x and  y be two sentential forms of G, for some ,   V* and x, y  V T *, which are derivable by some rightmost derivations. Grammar G is an LR(k) grammar, if for a constant k, it satisfies the following conditions. (i) (k) x = (k) y and (ii) if A  is the last production used to derive  x (i.e.,  Ax   x), then A  must also be used to derive  y from  Ay for the rightmost derivation. Definition. An extended PDA is a PDA which can read some constant number of symbols from the top of the stack and rewrite them by other symbol. An LR(k) parser is an extended DPDA which can look the input ahead k symbols, for some constant k. LR(k) Grammars and Deterministic Bottom-up Left-to-right Parsing (Formal Definition)

174 Ruminating on LL(k) and LR(k) Parsing Since parsers are constructed based on grammars, it does not make sense to say LL(k) languages or LR(k) languages. Recall that a CGF G is ambiguous if there is a string x  L(G) for which two parse trees exist. Obviously, given such x it it impossible for a parser to know which derivation is used to generate x. However, this fact does not implies that we cannot build a parser which generates a sequence of production rules that produces x. Consider the following simple ambiguous CFG whose language is {a}. S  A|B A  a B  a It is trivial to construct an LL(0) parser (or an LR(0) parser) which simply choose either S  A or S  B. However, it is impossible for the parser to decide which production rule should be used to derive a. (Pascal’s if-statement is such case.)

175 We can easily prove that LL(k) parsers need no more than two states. There are LR(k) grammars for which more than two states are needed. In general, LR(k) parsers need some finite number of states depending on the grammar. The following example shows why. S  aaAd | baBd | caCd A  aAd | a B  aBd | a C  aCd | a This grammar generates the language {xa i ad i | x  {a,b,c}, i  1}. Notice that an LR(k) parser for this grammar should push xa i a onto the stack before it begins to reduce the stack by applying one of the productions A  a, B  a and C  a depending on whether x is a, b, or c, which is at the bottom of the stack. It can be located at arbitrarily far down at the bottom of the stack. It is too deep to look down. Reading x, it must be stored in the memory. This implies that the parser needs two more states for this grammar (for example q 1, q 2 and q 3, respectively, for the cases of x = a, x = b, and x = c). Ruminating on LL(k) and LR(k) Parsing(cont’ed)