1 Section 12.3 Context-Free Parsing

We know (via a theorem) that the context-free languages are exactly those languages that are accepted by PDAs. When a context-free language can be recognized by a deterministic final-state PDA, it is called a deterministic context-free language.

An LL(k) grammar has the property that a parser can be constructed to scan an input string from left to right and build a leftmost derivation by examining the next k input symbols to determine the unique production for each derivation step. If a language has an LL(k) grammar, it is called an LL(k) language. LL(k) languages are deterministic context-free languages, but there are deterministic context-free languages that are not LL(k). (See text for an example on page 789.)

Example. Consider the language {aⁿb | n ∈ ℕ}.

(1) It has the LL(1) grammar S → aS | b. A parser can examine one input letter to decide whether to use S → aS or S → b for the next derivation step.

(2) It has the LL(2) grammar S → aaS | ab | b. A parser can examine two input letters to determine whether to use S → aaS or S → ab for the next derivation step. Notice that this grammar is not LL(1).

(3) Quiz. Find an LL(3) grammar for the language that is not LL(2). Solution. S → aaaS | aab | ab | b.
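The one-symbol lookahead decision in (1) can be sketched as a recursive-descent recognizer in Python (a sketch only; the function names are mine, not from the text):

```python
def parse_anb(s: str) -> bool:
    """Recursive-descent recognizer for S -> aS | b (the language a^n b).

    A single lookahead symbol (s[i]) uniquely selects the production.
    """
    i = 0

    def S() -> bool:
        nonlocal i
        if i < len(s) and s[i] == "a":   # lookahead a: use S -> aS
            i += 1
            return S()
        if i < len(s) and s[i] == "b":   # lookahead b: use S -> b
            i += 1
            return True
        return False                     # neither production applies

    return S() and i == len(s)           # the whole input must be consumed
```

Because the two productions start with different terminals, the choice at each step is deterministic and no backtracking is ever needed.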

2 Example/Quiz. Why is the following grammar for {aⁿbⁿ⁺ᵏ | n, k ∈ ℕ} an LL(1) grammar?

S → AB
A → aAb | Λ
B → bB | Λ

Answer: Any derivation starts with S → AB. The next derivation step uses one of the productions A → aAb or A → Λ, depending on whether the scanned input letter is a or not. The argument is the same for the B-productions.

Example. The following grammar for {aⁿbⁿ⁺ᵏ | n, k ∈ ℕ} is not LL(k) for any k.

S → aSb | T
T → bT | Λ

It is not LL(1): Let the input be ab. The first letter a tells us to start with S → aSb. The letter a in aSb matches the first input letter, so we look ahead to the second input letter b. This tells us we must use S → T to get S ⇒ aSb ⇒ aTb. The lookahead remains at b, so we can't determine whether it is the last b of the input string. So we don't know whether to choose T → bT or T → Λ for the next step. Thus the grammar is not LL(1).

It is not LL(2): Let the input be aabb. The first two input letters aa tell us to start with S → aSb. The letter a in aSb matches the first input letter, so we look ahead to the next two-letter substring ab. We must use S → aSb to get S ⇒ aSb ⇒ aaSbb. Now the lookahead becomes bb, so we must use S → T to get S ⇒ aSb ⇒ aaSbb ⇒ aaTbb. The lookahead remains at bb, so we can't determine whether these are the last two b's of the input string. So we don't know whether to choose T → bT or T → Λ for the next step. Thus the grammar is not LL(2).

Quiz (on your time): Find a general argument to show the grammar is not LL(k).

3 Example. The language {aⁿ⁺ᵏbⁿ | k, n ∈ ℕ} is not deterministic context-free. So it has no LL(k) grammar for any k.

Proof: Any PDA for the language must keep a count of the a's with the stack so that when the b's come along the stack can be popped with each b. But there might still be a's on the stack (e.g., when k > 0), so there must be a nondeterministic state transition to a final state from the popping state. That is, we need two instructions like (i, b, a, pop, i) and (i, Λ, a, pop, final).

Note: The previous two examples show that {aᵐbⁿ | m ≤ n} is LL(1) but {aᵐbⁿ | m ≥ n} is not LL(k) for any k.

Grammar Transformations

Left factoring: Sometimes we can "left-factor" an LL(k) grammar to obtain an equivalent LL(n) grammar where n < k.

Example. The grammar S → aaS | ab | b is LL(2) but not LL(1). But we can factor out the common prefix a from the productions S → aaS | ab to obtain S → aT and T → aS | b. This gives the new grammar:

S → aT | b
T → aS | b.

Quiz: Find an LL(k) grammar, with k as small as possible, that is equivalent to the following grammar.

S → abS | abcT | ab
T → cT | c.

Solution: The given grammar is LL(3), and it can be left-factored to become an LL(1) grammar as follows:

S → abR
R → S | cT | Λ
T → cU
U → T | Λ
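The left-factoring step can be sketched as a small Python routine that groups a nonterminal's alternatives by their first symbol and factors out their longest common prefix (a sketch only; the function name, the string encoding of right-hand sides, and the use of "" for Λ are my own choices):

```python
from itertools import groupby
from os.path import commonprefix   # character-wise longest common prefix


def left_factor(nonterminal: str, alts: list[str], new_name: str) -> dict:
    """One left-factoring step.

    alts: right-hand sides as strings of symbols, e.g. ["aaS", "ab", "b"].
    Returns a dict mapping each nonterminal to its new alternatives,
    where "" encodes the empty string Λ.
    """
    grammar = {nonterminal: []}
    for _, group in groupby(sorted(alts), key=lambda rhs: rhs[:1]):
        group = list(group)
        if len(group) == 1:                  # no shared prefix to factor
            grammar[nonterminal].append(group[0])
            continue
        prefix = commonprefix(group)         # e.g. "a" for ["aaS", "ab"]
        grammar[nonterminal].append(prefix + new_name)
        grammar[new_name] = [alt[len(prefix):] for alt in group]
    return grammar
```

Applied to the slide's example S → aaS | ab | b, it produces S → aT | b with T → aS | b, and applied to the quiz grammar S → abS | abcT | ab it produces S → abR with R → S | cT | Λ, matching the solutions above.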

4 Removing left recursion: A left-recursive grammar is one which has a derivation of the form A ⇒⁺ Ax for some nonterminal A and sentential form x. Left-recursive grammars are not LL(k) for any k.

Example. The language {baⁿ | n ∈ ℕ} has a grammar S → Sa | b, which is left-recursive. It is not LL(k) for any k. Consider the following cases:

LL(1) case: If the input string is ba, the lookahead is b. So we don't know whether there are any a's to the right of b. Therefore we don't know which production to start the derivation.

LL(2) case: If the input string is baa, the lookahead is ba. So the derivation starts with S → Sa. But the a in Sa denotes the a at the right end of the derived string. So the input string could be ba or baa. Therefore we don't know which production to choose next.

LL(k) case: If the input string is ba…a with k a's, the lookahead is ba…a with k – 1 a's. So the derivation after k – 1 steps is S ⇒ Sa ⇒ Saa ⇒ … ⇒ Sa…a. Now, the input string could be ba…a (length k) or ba…a (length k + 1). Therefore we don't know which production to choose next.

Sometimes we can obtain an LL(k) grammar by removing left recursion.

Algorithm idea for direct left recursion. Transform

A → Aw | Au | Av | a | b

into

A → aB | bB
B → wB | uB | vB | Λ

Example/Quiz. Remove left recursion from S → Sa | b.
Solution. S → bT and T → aT | Λ. It is LL(1).
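The transformation above can be sketched in Python (the function name and string-based grammar encoding are my own; "" encodes Λ):

```python
def remove_left_recursion(a: str, alts: list[str], new_name: str) -> dict:
    """Remove direct left recursion from nonterminal a's productions.

    alts: right-hand sides as symbol strings; alternatives beginning with
    a itself are the left-recursive ones.  Transforms
        A -> Aw | Au | a | b   into   A -> aB | bB,  B -> wB | uB | Λ.
    """
    recursive = [alt[len(a):] for alt in alts if alt.startswith(a)]
    others = [alt for alt in alts if not alt.startswith(a)]
    if not recursive:
        return {a: alts}   # nothing to do
    return {
        a: [alt + new_name for alt in others],
        new_name: [tail + new_name for tail in recursive] + [""],
    }
```

On the slide's example it turns S → Sa | b into S → bT with T → aT | Λ, as in the solution above.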

5 Example/Quiz. Remove left recursion from S → Saa | aab | aac.
Solution: S → aabT | aacT and T → aaT | Λ. It is LL(3).

Example (removing indirect left recursion). Consider the following grammar.

S → Ab | a
A → Sa | b.

The grammar is left-recursive because of the indirect left recursion S ⇒ Ab ⇒ Sab. To remove the indirect left recursion, replace A in S → Ab by the right side of A → Sa | b to obtain S → Sab | bb. The grammar becomes S → Sab | bb | a. Remove the left recursion:

S → bbT | aT
T → abT | Λ.

Quiz: Remove left recursion from the grammar

S → Ab | a
A → SAa | b.

Solution: Replace A in S → Ab | a by the right side of A → SAa | b to obtain

S → SAab | bb | a
A → SAa | b.

Now remove the direct left recursion:

S → bbT | aT
T → AabT | Λ
A → SAa | b.

Quiz: Rewrite the first solution above (S → aabT | aacT, T → aaT | Λ) as an LL(1) grammar.
Solution: S → aaU, U → bT | cT, T → aaT | Λ.

6 Top-Down Parsing of LL Languages

LL(k) grammars have top-down parsing algorithms because a leftmost derivation can be constructed by starting at the start symbol and proceeding to the desired string.

Example/Quiz. Consider the following LL(1) grammar.

S → aSC | b
C → cC | d.

The string aabcdd has the following leftmost derivation, where each step is uniquely determined by the current lookahead symbol.

S ⇒ aSC ⇒ aaSCC ⇒ aabCC ⇒ aabcCC ⇒ aabcdC ⇒ aabcdd.

Recursive Descent LL(1) Parsing

A procedure is associated with each nonterminal. We'll use the following procedure for LL(1) grammars to match a symbol with the lookahead symbol.

match(x): if lookahead = x then lookahead := next input symbol else error fi.

Example. For the preceding grammar here are the two recursive descent procedures:

S: if lookahead = a then match(a); S; C else match(b) fi.
C: if lookahead = c then match(c); C else match(d) fi.

Quiz. Write recursive descent procedures for the following LL(1) grammar.

S → aaM
M → bT | cT
T → aaT | Λ.

Solution:
S: match(a); match(a); M.
M: if lookahead = b then match(b); T else match(c); T fi.
T: if lookahead = a then match(a); match(a); T fi.
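The pseudocode procedures translate directly into Python (a sketch; the helper names follow the slide's match/lookahead scheme, and "$" is an assumed end-of-input marker):

```python
def parse(inp: str) -> bool:
    """Recursive-descent parser for S -> aSC | b and C -> cC | d."""
    pos = 0

    def lookahead() -> str:
        return inp[pos] if pos < len(inp) else "$"   # "$" marks end of input

    def match(x: str) -> None:
        nonlocal pos
        if lookahead() != x:
            raise ValueError(f"expected {x!r}, saw {lookahead()!r}")
        pos += 1                      # lookahead := next input symbol

    def S() -> None:
        if lookahead() == "a":
            match("a"); S(); C()      # S -> aSC
        else:
            match("b")                # S -> b

    def C() -> None:
        if lookahead() == "c":
            match("c"); C()           # C -> cC
        else:
            match("d")                # C -> d

    try:
        S()
        return lookahead() == "$"     # accept only if all input was consumed
    except ValueError:
        return False
```

Running it on the slide's string aabcdd succeeds, mirroring the leftmost derivation step by step.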

7 Table-Driven LL(1) Parsing

We'll give an example showing how to use a table to parse a string. The details on how to construct such a table are a subject in compiler texts.

Example. Consider the following LL(1) grammar and its corresponding parse table.

Grammar: S → aSb | Λ.

Table (one column per lookahead symbol):

        a          b        $
S       S → aSb    S → Λ    S → Λ

Parse of the string aabb, where p(x) abbreviates "push x" and the top of the stack is on the right:

Stack        Input        Action
$ S          a a b b $    pop, p(b), p(S), p(a)
$ b S a      a a b b $    pop, consume
$ b S        a b b $      pop, p(b), p(S), p(a)
$ b b S a    a b b $      pop, consume
$ b b S      b b $        pop
$ b b        b b $        pop, consume
$ b          b $          pop, consume
$            $            accept.
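The stack-driven loop behind this trace can be sketched in Python (the grammar and table are from the slide; the string encoding, with "" for Λ and "$" as the end/bottom marker, is my own):

```python
def ll1_parse(inp: str) -> bool:
    """Table-driven LL(1) parser for S -> aSb | Λ."""
    table = {
        ("S", "a"): "aSb",   # lookahead a: expand S -> aSb
        ("S", "b"): "",      # lookahead b: expand S -> Λ
        ("S", "$"): "",      # lookahead $: expand S -> Λ
    }
    stack = ["$", "S"]       # bottom marker, then the start symbol on top
    inp += "$"
    i = 0
    while True:
        top = stack.pop()
        look = inp[i]
        if top == "$":              # stack exhausted:
            return look == "$"      #   accept iff the input is exhausted too
        if top == look:             # terminal on top: consume input symbol
            i += 1
        elif (top, look) in table:  # nonterminal: push right side, reversed
            stack.extend(reversed(table[(top, look)]))
        else:
            return False            # no table entry: reject
```

The `reversed` push keeps the leftmost symbol of the production on top of the stack, which is exactly the p(b), p(S), p(a) order shown in the trace above.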

8 LL(k) Facts and Notes

In 1969 Kurki-Suonio showed that the LL(k) languages form an infinite hierarchy: for each k there is an LL(k + 1) language that is not LL(k).

Example. The language defined by the following grammar is LL(k + 1) but has no LL(k) grammar, where a…a stands for a k-length string of a's.

S → aSA | Λ
A → a…abS | c.

Example/Quiz. Why is the following grammar LL(2) but not LL(1)?

S → aSA | Λ
A → abS | c.

Answer: Consider the string aab. A derivation must start with S → aSA. Now the lookahead is at the second a in aab, but there are two choices to pick from: S → aSA and S → Λ. So the grammar is not LL(1). But two lookahead letters resolve the choice: if the lookahead is ab, the a must begin A → abS, so S → Λ is chosen; if it is aa or ac, then S → aSA is chosen for the next step.

The Picture (a nested hierarchy of language classes, each with a separating example):

Context-free ⊃ Deterministic context-free ⊃ LL(k) ⊃ Regular
- Context-free but not deterministic: palindromes over {a, b}
- Deterministic but not LL(k): {aⁿ | n ∈ ℕ} ∪ {aⁿbⁿ | n ∈ ℕ} (Text, page 789)
- LL(k) but not regular: {aⁿbⁿ | n ∈ ℕ}
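The two-symbol lookahead decision can be sketched as a recursive-descent recognizer (a sketch; the decision rule at S is my reading of the answer above, and "$" is an assumed end marker):

```python
def parse_ll2(inp: str) -> bool:
    """Recognizer for S -> aSA | Λ and A -> abS | c with 2-symbol lookahead.

    At S, lookahead ab means the a begins A -> abS, so S -> Λ is chosen;
    otherwise a leading a selects S -> aSA.
    """
    pos = 0

    def look(k: int = 0) -> str:
        return inp[pos + k] if pos + k < len(inp) else "$"

    def match(x: str) -> None:
        nonlocal pos
        if look() != x:
            raise ValueError(f"expected {x!r}")
        pos += 1

    def S() -> None:
        if look() == "a" and look(1) != "b":   # second symbol resolves the choice
            match("a"); S(); A()               # S -> aSA
        # otherwise S -> Λ: consume nothing

    def A() -> None:
        if look() == "a":
            match("a"); match("b"); S()        # A -> abS
        else:
            match("c")                         # A -> c

    try:
        S()
        return look() == "$"
    except ValueError:
        return False
```

For the string aab from the answer above, the parser expands S → aSA, then sees lookahead ab and correctly picks S → Λ before matching A → abS.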