Chap 5 LL(1) Parsing

Presentation transcript:

YANGYANG 1 Chap 5 LL(1) Parsing LL(1): left-to-right scanning, leftmost derivation, 1-token lookahead. Parser generator: parsing becomes the easiest! Modifying parsers is also convenient.

YANGYANG 2 Chap 5 LL(1) Parsing Given the productions A → α1, A → α2, ..., A → αn. During a (leftmost) derivation, ...A... ⇒ ...α1... or ...α2... or ...αn... Which route should we choose? (Trial and error is not a good idea.) » Use the lookahead symbols.

YANGYANG 3 Chap 5 LL(1) Parsing Consider the situation: we are about to expand a nonterminal A and there are several productions whose LHS is A: A → α1, A → α2, ..., A → αn. We choose one of the productions based on the lookahead token. Which one should we choose? Consider First(α1), First(α2), ..., First(αn), and if αi ⇒* ε, then consider also Follow(A).

YANGYANG 4 Chap 5 LL(1) Parsing Define predict(A → α) = (First(α) − {ε}) ∪ (if ε ∈ First(α) then Follow(A) else ∅). If the lookahead token a ∈ predict(A → α), then we use the production A → α to expand A. What if a ∈ predict(A → α1) and a ∈ predict(A → α2)? What if a ∉ predict(A → α) for all productions A → α whose LHS is A?
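
A minimal Python sketch of the predict() definition above. The grammar (an expression grammar with nonterminals E, A, T, B, P), the end marker $, and every function name are assumptions chosen for illustration; they are not taken from Fig. 5-2 or 5-3.

```python
EPS = "ε"
END = "$"

# Hypothetical example grammar: each entry is (LHS, RHS); [] is the empty string.
GRAMMAR = [
    ("E", ["T", "A"]),
    ("A", ["+", "T", "A"]),
    ("A", []),
    ("T", ["P", "B"]),
    ("B", ["*", "P", "B"]),
    ("B", []),
    ("P", ["ID"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
START = "E"


def first_of(symbols, first):
    """First set of a symbol string, given the First sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the string is empty)
    return result


def compute_first():
    """Iterate to a fixed point over all productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first):
    """Follow sets, also computed by fixed-point iteration."""
    follow = {n: set() for n in NONTERMINALS}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of(rhs[i + 1:], first)
                new = rest - {EPS}
                if EPS in rest:
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def predict(lhs, rhs, first, follow):
    """predict(A -> alpha) as defined on the slide."""
    f = first_of(rhs, first)
    return (f - {EPS}) | (follow[lhs] if EPS in f else set())


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
PREDICT = {(lhs, tuple(rhs)): predict(lhs, rhs, FIRST, FOLLOW)
           for lhs, rhs in GRAMMAR}
```

For the empty productions the predict set comes entirely from Follow: for example, `PREDICT[("B", ())]` is Follow(B).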

YANGYANG 5 Chap 5 LL(1) Parsing Property of LL(1) grammars: If a grammar is LL(1), then for any two productions A → α and A → β, First(α Follow(A)) ∩ First(β Follow(A)) = ∅.

YANGYANG 6 Chap 5 LL(1) Parsing Given the FIRST and FOLLOW sets in Fig. 5-2 and 5-3, calculate the predict set for each production.

YANGYANG 7 Chap 5 LL(1) Parsing §5.2 LL(1) Parse Table The predict() function may be represented as an LL(1) parse table T: Vn × Vt → P ∪ {error}. Rows are nonterminals (A, B, ...), columns are terminals (a, b, ...), and each entry is a production (such as production 3) or error. T[A, a] = A → α if a ∈ predict(A → α); T[A, a] = error otherwise. A grammar is LL(1) iff every entry in the parse table contains a unique production or the error flag.
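
The table construction can be sketched in Python as follows. The predict sets are typed in by hand for a hypothetical expression grammar, and build_ll1_table and is_ll1 are illustrative names, not part of any tool named in the slides. A missing entry stands for error.

```python
def build_ll1_table(predict_sets):
    """Map (nonterminal, token) to the list of production numbers predicted there.

    predict_sets maps a production number to (LHS, set of lookahead tokens).
    """
    table = {}
    for number, (lhs, lookaheads) in predict_sets.items():
        for token in lookaheads:
            table.setdefault((lhs, token), []).append(number)
    return table


def is_ll1(table):
    """LL(1) iff no entry holds more than one production."""
    return all(len(entry) == 1 for entry in table.values())


# Hypothetical predict sets for: E -> T A, A -> + T A | ε,
# T -> P B, B -> * P B | ε, P -> ID ($ is the end marker).
EXPR_PREDICT = {
    1: ("E", {"ID"}),
    2: ("A", {"+"}),
    3: ("A", {"$"}),
    4: ("T", {"ID"}),
    5: ("B", {"*"}),
    6: ("B", {"+", "$"}),
    7: ("P", {"ID"}),
}

# Two productions for S that are both predicted by the token 'if'.
CLASH_PREDICT = {
    1: ("S", {"if"}),
    2: ("S", {"if"}),
}

EXPR_TABLE = build_ll1_table(EXPR_PREDICT)
CLASH_TABLE = build_ll1_table(CLASH_PREDICT)
```

Here `is_ll1(EXPR_TABLE)` holds, while `CLASH_TABLE[("S", "if")]` lists both productions, which is exactly the non-unique entry the definition rules out.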

YANGYANG 8 Chap 5 LL(1) Parsing Figure 5.5 The LL(1) table for Micro

YANGYANG 9 Chap 5 LL(1) Parsing 5.3 LL(1) parsers Similar to scanners, there are two kinds of parsers: 1. built-in: recursive descent 2. table-driven

YANGYANG 10 Chap 5 LL(1) Parsing 1. built-in
stmt() {
    token = next_token();
    switch (token) {
    case ID: /* production 5: <stmt> --> ID := <exp> ; */
        match(ID); match(ASSIGN); exp(); match(SEMICOLON);
        break;
    case READ: /* production 6 */ ...
    case WRITE: /* production 7 */ ...
    default:
        syntax_error(....);
    }
}
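
A runnable Python sketch of the same built-in (recursive descent) style. Only the ID case from the slide is implemented; the Parser class, the token names as strings, and the simplified rule used for exp are assumptions made for illustration, since the slide does not show them.

```python
class ParseError(Exception):
    """Raised where the slide calls syntax_error()."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    def next_token(self):
        # Peek at the lookahead token without consuming it.
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume the lookahead only if it is the expected token.
        if self.next_token() != expected:
            raise ParseError(
                f"expected {expected}, found {self.next_token()}")
        self.pos += 1

    def stmt(self):
        token = self.next_token()
        if token == "ID":
            # stmt -> ID := exp ;
            self.match("ID")
            self.match("ASSIGN")
            self.exp()
            self.match("SEMICOLON")
        else:
            # READ and WRITE are elided on the slide, so they are not handled.
            raise ParseError(f"unexpected token {token}")

    def exp(self):
        # Assumed, simplified rule: exp -> ID { PLUS ID }
        self.match("ID")
        while self.next_token() == "PLUS":
            self.match("PLUS")
            self.match("ID")


parser = Parser(["ID", "ASSIGN", "ID", "PLUS", "ID", "SEMICOLON"])
parser.stmt()
```

Each nonterminal becomes one method, and the lookahead token selects the branch, so the call stack of the host language plays the role of the parse stack.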

YANGYANG 11 Chap 5 LL(1) Parsing It is obvious that these recursive descent parsing procedures can be generated automatically from the grammar: grammar → LL(1) table → parser generator → recursive descent parser. However, it is difficult for the parser generator to integrate the semantic routines into the (generated) recursive descent parser automatically.

YANGYANG 12 Chap 5 LL(1) Parsing 2. table-driven parser
(+) generic driver: only the LL(1) table needs to be changed when the grammar is modified.
(+) non-recursive (faster): the parser maintains a stack itself. No recursive calls.

YANGYANG 13 Chap 5 LL(1) Parsing
lldriver() {
    push(START_SYMBOL);
    a := next_token();
    while stack is not empty do {
        X := symbol on stack top;
        if (X is a nonterminal && T[X, a] == X → Y1 ... Ym) {
            pop(1); push Ym, Ym-1, ..., Y1;
        } else if (X == a) {
            pop(1); a := next_token();
        } else if (X is an action symbol) {
            pop(1); call the corresponding semantic routine;
        } else
            syntax_error();
    }
}
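
The driver loop above, sketched in runnable Python. The table is written by hand for a hypothetical expression grammar, the action symbol #id and the names ll_driver and ParseError are assumptions for illustration, and a missing table entry stands for error.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls syntax_error()."""


NONTERMINALS = {"E", "A", "T", "B", "P"}

# Hypothetical LL(1) table: (nonterminal, lookahead) -> right-hand side.
# "#id" is an action symbol placed after ID in the production for P.
TABLE = {
    ("E", "ID"): ["T", "A"],
    ("A", "+"): ["+", "T", "A"],
    ("A", "$"): [],
    ("T", "ID"): ["P", "B"],
    ("B", "*"): ["*", "P", "B"],
    ("B", "+"): [],
    ("B", "$"): [],
    ("P", "ID"): ["ID", "#id"],
}


def ll_driver(tokens, actions):
    """Parse tokens with an explicit stack; run actions[x]() for action symbols."""
    stream = list(tokens) + ["$"]
    pos = 0
    stack = ["E"]
    while stack:
        x = stack.pop()
        a = stream[pos]
        if x in NONTERMINALS:
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise ParseError(f"no entry for T[{x}, {a}]")
            # Push Ym ... Y1 so that Y1 ends up on top.
            stack.extend(reversed(rhs))
        elif x in actions:
            actions[x]()
        elif x == a:
            pos += 1
        else:
            raise ParseError(f"expected {x}, found {a}")
    if stream[pos] != "$":
        raise ParseError("input left over")


seen = []
ll_driver(["ID", "+", "ID", "*", "ID"], {"#id": lambda: seen.append("id")})
```

Changing the grammar only changes TABLE (and NONTERMINALS); the loop itself stays the same, which is the point of the generic driver.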

YANGYANG 14 Chap 5 LL(1) Parsing Ex. begin A := B A; end $ Initially a = begin and X is the symbol on top of the parse stack. Trace the action of the parser on this example.

YANGYANG 15 Chap 5 LL(1) Parsing 5.5 Action symbols Action symbols may be processed by the parser in a similar way. 1. in recursive descent parsers Ex. gen_action("ID := <exp> #assign ;") will generate the following code: match(ID); match(ASSIGN); exp(); assign(); match(SEMICOLON); Parameters are transmitted through a semantic stack. The semantic stack is a stack of semantic records. The parser stack is a stack of grammar (and action) symbols.

YANGYANG 16 Chap 5 LL(1) Parsing 2. in LL(1) driver Action symbols are pushed onto the parse stack in the same way as grammar symbols. When an action symbol is on stack top, the driver calls the corresponding semantic routine. See previous slide for lldriver. Parameters are also transmitted through the semantic stack.

YANGYANG 17 Chap 5 LL(1) Parsing §5.6 Making grammars LL(1) Not all grammars are LL(1). However, some non-LL(1) grammars can be made LL(1) by simple modifications. When is a grammar not LL(1)? When there is an entry in the parse table that contains more than one production. Ex. an entry in column ID that lists two productions, one of them production 5. This is called a conflict, which means we do not know which production to use when the nonterminal of that row is on stack top and ID is the next input token.

YANGYANG 18 Chap 5 LL(1) Parsing Conflicts are classified into two categories: 1. common prefix 2. left recursion
1. Common prefix Ex.
    <stmt> → if <exp> then <stmt>
    <stmt> → if <exp> then <stmt> else <stmt>
Consider when <stmt> is on stack top and 'if' is the next input token. We cannot choose which production to use at this time. In general, if we have two productions A → α and A → β with First(α) ∩ First(β) ≠ ∅, then we have a conflict.

YANGYANG 19 Chap 5 LL(1) Parsing Solution: factor out the common prefix Ex.
    <stmt> → if <exp> then <stmt> <tail>
    <tail> → else <stmt>
    <tail> → ε
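
Factoring out a common prefix can be sketched as a small Python function. The function name left_factor, the symbol names stmt, exp and tail, and the list-of-lists grammar encoding are assumptions for illustration; the sketch only handles a prefix shared by all the alternatives it is given.

```python
def left_factor(nonterminal, alternatives, new_name):
    """Factor the longest prefix common to all alternatives.

    Each alternative is a list of symbols; [] is the empty string ε.
    Returns a dict mapping each nonterminal to its new alternatives.
    """
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix:
        return {nonterminal: alternatives}
    tails = [alt[len(prefix):] for alt in alternatives]
    return {
        nonterminal: [prefix + [new_name]],
        new_name: tails,
    }


factored = left_factor(
    "stmt",
    [
        ["if", "exp", "then", "stmt"],
        ["if", "exp", "then", "stmt", "else", "stmt"],
    ],
    "tail",
)
```

After factoring, the choice between the two original productions is postponed until the shared prefix has been parsed, at which point the lookahead decides between the alternatives of the new nonterminal.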

YANGYANG 20 Chap 5 LL(1) Parsing 2. left recursion: productions of the form A → A α. Grammars with left-recursive productions are not LL(1) because we may have A ⇒ A α ⇒ A α α ⇒ ... with the same lookahead at every step.

YANGYANG 21 Chap 5 LL(1) Parsing Problem: left recursion A → A α, A → β, A → γ. Intuition: all the strings derivable from A have the form β, βα, βαα, ... or γ, γα, γαα, ... Solution: replace the productions. We may use the following productions instead: A → β T, A → γ T, T → α T, T → ε.

YANGYANG 22 Chap 5 LL(1) Parsing Ex. Given the left-recursive grammar:
    E → E + T
    E → T
    T → T * P
    T → P
    P → ID
After eliminating left recursion, we get
    E → T A
    A → ε
    A → + T A
    T → P B
    B → ε
    B → * P B
    P → ID
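
The transformation used in this example can be sketched in Python. The function name and the list-of-lists encoding are assumptions for illustration, and the sketch only removes immediate left recursion (productions whose right-hand side starts with their own left-hand side).

```python
def eliminate_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite A -> A α | β as A -> β T, T -> ε | α T.

    Each alternative is a list of symbols; [] is the empty string ε.
    """
    recursive = [alt[1:] for alt in alternatives
                 if alt and alt[0] == nonterminal]
    others = [alt for alt in alternatives
              if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_name] for beta in others],
        new_name: [[]] + [alpha + [new_name] for alpha in recursive],
    }


e_rules = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]], "A")
t_rules = eliminate_left_recursion("T", [["T", "*", "P"], ["P"]], "B")
```

Applied to E and T of the left-recursive grammar, this yields E → T A, A → ε, A → + T A and T → P B, B → ε, B → * P B.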

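YANGYANG 23 Chap 5 LL(1) Parsing 3. more general solution Ex.
    <stmt> → <label> <unlabeled stmt>
    <label> → ID :
    <label> → ε
    <unlabeled stmt> → ID := <exp> ;
We cannot decide which production to use when <label> is on the stack top and ID is the next token: the lookahead ID may start either a label or the assignment itself.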

YANGYANG 24 Chap 5 LL(1) Parsing Solution: use the following productions (which essentially look ahead 2 tokens)
    <stmt> → ID <suffix>
    <suffix> → : <unlabeled stmt>
    <suffix> → := <exp> ;
    <unlabeled stmt> → ID := <exp> ;
Try two examples: A: B := C ; and B := C ;

YANGYANG 25 Chap 5 LL(1) Parsing 4. For more difficult cases, we use semantic routines to help parsing. Ex. In Ada, we may declare arrays as A: array(I .. J, BOOLEAN). A straightforward grammar (for an array bound) is
    <array bound> → <exp> .. <exp>
    <array bound> → ID
    <exp> → …
and ID ∈ First(<exp>). This grammar is not LL(1) because we cannot make a decision when <array bound> is on stack top and ID is the next token.

YANGYANG 26 Chap 5 LL(1) Parsing Solution:
    <array bound> → <exp> <bound tail>
    <bound tail> → .. <exp>
    <bound tail> → ε
Every context-free grammar (that does not generate ε) can be transformed into Greibach Normal Form, in which every production has the form A → a α, where a is a terminal. So given a grammar G, we can do G → GNF → no common prefix, no left recursion, but still NOT LL(1)! Ex. S → a A a, S → b A b a, A → b, A → ε. Consider the case where A is on stack top and b is the next token.

YANGYANG 27 Chap 5 LL(1) Parsing §5.7 The dangling-else problem Consider if a then if b then x := 1 else x := 2. Two possibilities: the else pairs with the inner if (x := 2 runs when a is true and b is false), or the else pairs with the outer if (x := 2 runs when a is false). The problem is which 'if' the 'else' belongs to. In essence, we are trying to find an LL(1) grammar for the set { [^i ]^j | i ≥ j ≥ 0 }. But is it possible?

YANGYANG 28 Chap 5 LL(1) Parsing 1st attempt: G1
    S → [ S C
    S → ε
    C → ]
    C → ε
This grammar is ambiguous. Consider [ [ ]: it has two parse trees, one in which the C of the inner S derives ] and one in which the C of the outer S derives ].

YANGYANG 29 Chap 5 LL(1) Parsing 2nd attempt: we can make ] be associated with the nearest unpaired [ as follows:
    S → [ S
    S → T
    T → [ T ]
    T → ε
This grammar is not ambiguous. Consider [ [ ]: S ⇒ [ S ⇒ [ T ⇒ [ [ T ] ⇒ [ [ ]. However, this grammar is not LL(1), either. Consider the case when S is on stack top and [ is the next input token: [ ∈ First([ S) and [ ∈ First(T). This grammar can be parsed with a bottom-up parser, but not a top-down parser.

YANGYANG 30 Chap 5 LL(1) Parsing Solution: conflicts + special rules
    1. G → S ;
    2. S → if S E
    3. S → other
    4. E → else S
    5. E → ε
The parse table:
         if     else    other   ;
    G    1              1
    S    2              3
    E           4,5             5
The entry T[E, else] = 4,5 is a conflict. We can enforce that T[E, else] = 4th rule. This essentially forces 'else' to be matched with the nearest unpaired 'if'.
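
A small Python sketch of this resolution: the table below is the slide's table with T[E, else] forced to rule 4, and the driver records which production is used at each step. The function name parse and the token spelling are assumptions for illustration.

```python
PRODUCTIONS = {
    1: ("G", ["S", ";"]),
    2: ("S", ["if", "S", "E"]),
    3: ("S", ["other"]),
    4: ("E", ["else", "S"]),
    5: ("E", []),
}
NONTERMINALS = {"G", "S", "E"}

TABLE = {
    ("G", "if"): 1, ("G", "other"): 1,
    ("S", "if"): 2, ("S", "other"): 3,
    ("E", "else"): 4,  # conflict 4,5 resolved in favour of rule 4
    ("E", ";"): 5,
}


def parse(tokens):
    """Return the production numbers used, in leftmost-derivation order."""
    tokens = list(tokens)
    pos = 0
    stack = ["G"]
    used = []
    while stack:
        x = stack.pop()
        a = tokens[pos] if pos < len(tokens) else "$"
        if x in NONTERMINALS:
            rule = TABLE.get((x, a))
            if rule is None:
                raise ValueError(f"no entry for T[{x}, {a}]")
            used.append(rule)
            stack.extend(reversed(PRODUCTIONS[rule][1]))
        elif x == a:
            pos += 1
        else:
            raise ValueError(f"expected {x}, found {a}")
    if pos != len(tokens):
        raise ValueError("input left over")
    return used


trace = parse("if if other else other ;".split())
```

In the resulting trace, rule 4 is applied for the E of the inner if and rule 5 for the E of the outer if, so the else is attached to the nearest unpaired if.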

YANGYANG 31 Chap 5 LL(1) Parsing Alternative solution: change the language. Add 'end if' at the end of every 'if'.
    S → if S E
    S → other
    E → else S end if
    E → end if

YANGYANG 32 Chap 5 LL(1) Parsing §5.9 Properties of LL(1) parsers: A correct leftmost parse is guaranteed. All LL(1) grammars are unambiguous. Parsing takes linear time and linear space.

YANGYANG 33 Chap 5 LL(1) Parsing § llgen: see page 776 of the book. Output from llgen: *define decrtn 1 ifprocess 2

YANGYANG 34 Chap 5 LL(1) Parsing § LL(k) parsing Recall that a grammar is LL(1) only if for any two productions A → α and A → β, First(α Follow(A)) ∩ First(β Follow(A)) = ∅. To generalize, we write: for any two productions A → α and A → β, First_k(α Follow_k(A)) ∩ First_k(β Follow_k(A)) = ∅ if G is strong LL(k). The word 'strong' indicates that this condition is stronger than what LL(k) parsing actually requires.

YANGYANG 35 Chap 5 LL(1) Parsing Consider
    G → S $
    S → a A a
    S → b A b a
    A → b
    A → ε
This grammar is not LL(1): when A is on stack top and b is the next token, we cannot choose between A → b and A → ε. Does it help if we can look ahead two tokens? NO! If the next two tokens are bb, then we should choose A → b. If the next two tokens are ba, then we cannot make a choice.

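YANGYANG 36 Chap 5 LL(1) Parsing Case 1: input is aba. The stack goes from G to S $ to a A a $; with lookahead ab we match a, leaving A a $ on the stack and ba as the lookahead. At this point, we should choose A → b. Case 2: input is bba. The stack goes from G to S $ to b A b a $; with lookahead bb we match b, leaving A b a $ on the stack and ba as the lookahead. At this point, we should choose A → ε.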

YANGYANG 37 Chap 5 LL(1) Parsing So the problem is not the limited number of lookahead tokens. The problem is in the 'context'.

YANGYANG 38 Chap 5 LL(1) Parsing Therefore, the grammar is not strong LL(1). Actually, we can verify that the grammar is not strong LL(k) for all k ≥ 1 by verifying that First_k(ba$) ⊆ First_k(b Follow_k(A)) ∩ First_k(Follow_k(A)) for all k ≥ 1.
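
This check can be worked through mechanically. The Python sketch below assumes the grammar G → S $, S → a A a, S → b A b a, A → b, A → ε, for which the strings that can follow A are a$ and ba$; the function names are hypothetical, and strings of single-character tokens stand in for token sequences.

```python
def first_k(strings, k):
    """First_k of a set of terminal strings: keep the first k symbols of each."""
    return {s[:k] for s in strings}


# Complete terminal strings that can follow A:
# "a$" (from S -> a A a) and "ba$" (from S -> b A b a).
FOLLOW_A = {"a$", "ba$"}


def predict_k(rhs, k):
    """First_k(rhs Follow_k(A)) for a right-hand side made of terminals only."""
    return first_k({rhs + follow for follow in FOLLOW_A}, k)


def strong_llk_conflict(k):
    """Lookahead strings that predict both A -> b and A -> ε."""
    return predict_k("b", k) & predict_k("", k)


conflicts = {k: strong_llk_conflict(k) for k in range(1, 6)}
```

For k = 1 the shared lookahead is b, for k = 2 it is ba, and for every k ≥ 3 it is ba$, so the intersection is never empty and no amount of context-free lookahead separates the two productions.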

YANGYANG 39 Chap 5 LL(1) Parsing However, it is possible to parse the language of the grammar under the following conditions: 1. look ahead two tokens 2. from left to right 3. using the left context. We call such grammars LL(2), rather than strong LL(2). Note that LL(2) ⊃ strong LL(2), whereas LL(1) = strong LL(1).