CSI 3120, Syntactic analysis, page 1: Syntactic Analysis and Parsing. Based on A. V. Aho, R. Sethi and J. D. Ullman, Compilers: Principles, Techniques and Tools

Compilers
A compiler is an application that reads a program written in a source language and translates it into a target language. A compiler operates in phases; each phase transforms the source program from one representation to another:
Source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
The part of the compiler on which we focus here is the Syntax Analyzer, or Parser.

Parsing
Parsing determines whether a string of tokens can be generated by a grammar, and builds a parse tree for it. Most parsing methods fall into one of two classes: top-down and bottom-up. In top-down parsing, construction starts at the root and proceeds down towards the leaves. In bottom-up parsing, construction starts at the leaves and proceeds up towards the root. Efficient top-down parsers are easy to build by hand. Bottom-up parsing, however, can handle a larger class of grammars. Bottom-up parsers are not as easy to build by hand, but there are tools that generate them directly from a grammar.

Part I: Top-Down Parsing
Points:
- Basic ideas in top-down parsing
- Predictive parsers
- Left-recursive grammars
- Left-factoring a grammar
- Constructing a predictive parser
- LL(1) grammars

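Basic Idea in Top-Down Parsing
Top-down parsing is an attempt to find a leftmost derivation for an input string.
Example: find a derivation for w = c a d with the grammar
S → c A d
A → a b | a
The parser starts with the tree S and expands it to c A d; c matches the first input symbol. It then tries A → a b: a matches, but b does not match d, so the parser backtracks and tries A → a instead. Now a matches, d matches, and the parse succeeds.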
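A minimal Python sketch of a backtracking top-down parser for this grammar, offered as an illustration only. The dict encoding GRAMMAR and the helper names parse, parse_sequence and accepts are our own assumptions, not part of the original material. Generators provide the backtracking: when a later symbol fails, the caller simply asks for the next way to match the earlier one. This tries every alternative, which is more general than the single backtrack in the example.

```python
# Hypothetical encoding of the slide's grammar: each non-terminal maps to its
# alternatives (lists of symbols); any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, tokens, pos):
    """Yield every input position reachable after deriving `symbol` from tokens[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must match the current token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:  # try the alternatives left to right
        yield from parse_sequence(alternative, tokens, pos)


def parse_sequence(symbols, tokens, pos):
    """Yield end positions after matching the symbols in order."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):  # each way to match the first symbol
        yield from parse_sequence(symbols[1:], tokens, mid)


def accepts(tokens):
    """True if the whole token list can be derived from S."""
    return any(end == len(tokens) for end in parse("S", tokens, 0))


print(accepts(["c", "a", "d"]))  # A -> a b fails on d, backtrack to A -> a
print(accepts(["c", "b", "d"]))
```

Because parse expands the leftmost symbol before consuming any input, this sketch would recurse without end on a left-recursive rule such as A → A α.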

Predictive Parsers: Generalities
In many cases, by writing a grammar carefully (eliminating left recursion and left-factoring it), we can obtain a grammar that can be parsed by recursive descent with no backtracking. Such parsers are called predictive parsers.

Left-Recursive Grammars (1)
A grammar is left-recursive if it has a non-terminal A such that A ⇒+ A α for some string α. A top-down parser can loop forever when it faces a left-recursive rule, so such rules must be eliminated. For example, the left-recursive rule
A → A α | β
can be replaced by
A → β R
R → α R | ε
where R is a new non-terminal and ε is the empty string. The new grammar is right-recursive.

Left-Recursive Grammars (2)
Here is the general procedure for removing direct left recursion that occurs in the rules for one non-terminal A. Group the A-rules like this:
A → A α1 | … | A αm | β1 | β2 | … | βn
where no βi begins with A. Replace the original A-rules with
A → β1 A' | β2 A' | … | βn A'
A' → α1 A' | α2 A' | … | αm A' | ε
This procedure does not eliminate indirect left recursion such as
A → B a
B → A b
where A ⇒ B a ⇒ A b a. [There is another procedure for that case (we will skip it).] Direct or indirect left recursion is a problem for all top-down parsers. It is not a problem for bottom-up parsing algorithms.
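The procedure above can be sketched in Python for the rules of a single non-terminal. This is an illustration under our own assumptions: right-hand sides are tuples of symbols, the empty tuple () stands for ε, and the new non-terminal is named by appending a prime (a hypothetical convention).

```python
def eliminate_direct_left_recursion(a, alternatives):
    """Remove direct left recursion from the rules A -> rhs_1 | ... | rhs_k.

    `alternatives` is a list of tuples of symbols; () stands for epsilon.
    Returns the new rules as a dict {non-terminal: [right-hand sides]}.
    """
    a_prime = a + "'"  # assumed naming convention for the new non-terminal
    # A -> A alpha_i contributes alpha_i; every other alternative is a beta_j.
    alphas = [rhs[1:] for rhs in alternatives if rhs[:1] == (a,)]
    betas = [rhs for rhs in alternatives if rhs[:1] != (a,)]
    if not alphas:  # no direct left recursion: keep the rules unchanged
        return {a: list(alternatives)}
    return {
        # A  -> beta_1 A' | ... | beta_n A'
        a: [beta + (a_prime,) for beta in betas],
        # A' -> alpha_1 A' | ... | alpha_m A' | epsilon
        a_prime: [alpha + (a_prime,) for alpha in alphas] + [()],
    }


print(eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Applied to E → E + T | T, it yields E → T E' and E' → + T E' | ε, the same rewriting used in the expression-grammar example.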

Left-Recursive Grammars (3)
Here is an example of a (directly) left-recursive grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
It can be rewritten as the following grammar, which is not left-recursive:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
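To show why the rewritten grammar suits a predictive parser, here is a minimal recursive-descent sketch in Python, with one method per non-terminal. Each method picks its production by looking only at the current token, so no backtracking is needed. The class and method names are our own; for illustration the parser records the productions it applies, in leftmost-derivation order.

```python
class Parser:
    """Recursive-descent parser for E -> T E', E' -> + T E' | eps,
    T -> F T', T' -> * F T' | eps, F -> ( E ) | id."""

    def __init__(self, tokens):
        self.tokens = tokens + ["$"]  # $ marks the end of the input
        self.pos = 0
        self.output = []  # productions applied, in leftmost-derivation order

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.output.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":
            self.output.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:  # epsilon: leave the token for the caller to check
            self.output.append("E' -> eps")

    def T(self):
        self.output.append("T -> F T'")
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":
            self.output.append("T' -> * F T'")
            self.match("*")
            self.F()
            self.T_prime()
        else:  # epsilon
            self.output.append("T' -> eps")

    def F(self):
        if self.peek() == "(":
            self.output.append("F -> ( E )")
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.output.append("F -> id")
            self.match("id")


def parse(tokens):
    """Parse a token list; return the productions used or raise SyntaxError."""
    p = Parser(tokens)
    p.E()
    p.match("$")  # the whole input must be consumed
    return p.output


print(parse(["id", "+", "id", "*", "id"]))
```

Choosing the ε-production whenever no other alternative matches, and letting a caller report the error, is a common simplification; a stricter parser would also check the token against Follow of the non-terminal.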

Left-Factoring a Grammar (1)
Left recursion is not the only property that hinders top-down parsing. Another difficulty is that the parser cannot always choose the correct right-hand side on the basis of the next input token. The idea is to consider only the first token generated by the leftmost non-terminal in the current derivation. To make this work, we may also need to left-factor grammars, including formerly left-recursive grammars such as the one produced in the preceding example.

Left-Factoring a Grammar (2)
The procedure for left-factoring a grammar: for each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α is not ε, the A-productions have the form
A → α β1 | α β2 | … | α βn | γ
(γ denotes all alternatives that do not begin with α). Replace them with
A → α A' | γ
A' → β1 | β2 | … | βn
where A' is a new non-terminal. Repeat until no two alternatives of a non-terminal share a common prefix.

Left-Factoring a Grammar (3)
Here is an example of a well-known grammar that needs left-factoring:
S → if E then S | if E then S else S | a
E → b
Left-factored, this grammar becomes:
S → if E then S S' | a
S' → else S | ε
E → b
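One step of the left-factoring procedure can be sketched in Python as follows. This is an illustration under our own assumptions: right-hand sides are tuples of symbols, () stands for ε, the helper names are hypothetical, and the fresh non-terminal is named with a prime. Repeated application would handle nested common prefixes.

```python
def common_prefix(x, y):
    """Longest common prefix of two tuples of symbols."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor_once(a, alternatives):
    """Factor out the longest prefix shared by two or more A-alternatives.

    Returns {A: [alpha A', gammas...], A': [betas...]}, or None when no two
    alternatives share a non-empty prefix. () stands for epsilon.
    """
    alpha = ()
    for i, x in enumerate(alternatives):  # longest prefix over all pairs
        for y in alternatives[i + 1:]:
            p = common_prefix(x, y)
            if len(p) > len(alpha):
                alpha = p
    if not alpha:
        return None
    a_prime = a + "'"  # assumed name for the fresh non-terminal
    k = len(alpha)
    betas = [rhs[k:] for rhs in alternatives if rhs[:k] == alpha]
    gammas = [rhs for rhs in alternatives if rhs[:k] != alpha]
    return {a: [alpha + (a_prime,)] + gammas, a_prime: betas}


print(left_factor_once("S", [("if", "E", "then", "S"),
                             ("if", "E", "then", "S", "else", "S"),
                             ("a",)]))
```

On the if-then-else grammar this produces S → if E then S S' | a and S' → ε | else S, matching the slide up to the order of the alternatives.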

Predictive Parsers: Details
The key problem during predictive parsing is to determine the production to be applied to a non-terminal. This is done using a parsing table. A parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is either a terminal or the symbol $, which denotes the end of the input string. Other data for a predictive parser:
- The input buffer contains the string to be parsed, followed by $.
- The stack contains a sequence of grammar symbols; initially it is $S (the end-of-input marker with the grammar's start symbol S on top of it).

Predictive Parsers: Informal Procedure
The predictive parser considers X, the symbol on top of the stack, and a, the current input symbol. It uses the parsing table M.
- If X = a = $, stop with success.
- If X = a ≠ $, pop X off the stack and advance the input pointer to the next symbol.
- If X is a non-terminal, check M[X, a]. If the entry is a production X → Y1 Y2 … Yk, pop X and push the right-hand side one symbol at a time, Yk first, so that Y1 ends up on top. If the entry is blank, stop with failure.
- If X is a terminal other than a, stop with failure.

Predictive Parsers: an Example
Parsing Table
       | id        | +            | *            | (         | )       | $
E      | E → T E'  |              |              | E → T E'  |         |
E'     |           | E' → + T E'  |              |           | E' → ε  | E' → ε
T      | T → F T'  |              |              | T → F T'  |         |
T'     |           | T' → ε       | T' → * F T'  |           | T' → ε  | T' → ε
F      | F → id    |              |              | F → ( E ) |         |

Parsing Trace for the input id + id * id
Stack     | Input      | Output
$E        | id+id*id$  |
$E'T      | id+id*id$  | E → T E'
$E'T'F    | id+id*id$  | T → F T'
$E'T'id   | id+id*id$  | F → id
$E'T'     | +id*id$    |
$E'       | +id*id$    | T' → ε
$E'T+     | +id*id$    | E' → + T E'
$E'T      | id*id$     |
$E'T'F    | id*id$     | T → F T'
$E'T'id   | id*id$     | F → id
$E'T'     | *id$       |
$E'T'F*   | *id$       | T' → * F T'
$E'T'F    | id$        |
$E'T'id   | id$        | F → id
$E'T'     | $          |
$E'       | $          | T' → ε
$         | $          | E' → ε
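The informal procedure, driven by this parsing table, can be sketched in Python. The dictionary encoding of M and the function name predictive_parse are our own assumptions; an empty list stands for an ε right-hand side, and a missing entry plays the role of a blank (error) cell.

```python
# M[non-terminal][lookahead] -> right-hand side; [] stands for epsilon.
TABLE = {
    "E": {"id": ["T", "E'"], "(": ["T", "E'"]},
    "E'": {"+": ["+", "T", "E'"], ")": [], "$": []},
    "T": {"id": ["F", "T'"], "(": ["F", "T'"]},
    "T'": {"+": [], "*": ["*", "F", "T'"], ")": [], "$": []},
    "F": {"id": ["id"], "(": ["(", "E", ")"]},
}


def predictive_parse(tokens, table=TABLE, start="E"):
    """Table-driven predictive parser; returns the productions output."""
    tokens = tokens + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == a == "$":  # stop with success
            return output
        if x not in table:  # x is a terminal (or $): it must match a
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()
            pos += 1
        else:
            rhs = table[x].get(a)
            if rhs is None:  # blank entry: stop with failure
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk first so Y1 ends up on top
            output.append(f"{x} -> {' '.join(rhs) or 'eps'}")


for step in predictive_parse(["id", "+", "id", "*", "id"]):
    print(step)
```

The printed productions correspond to the Output column of the trace above.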

Constructing a Parsing Table (1): First and Follow
First(α) is the set of terminals that begin the strings derived from α; if α ⇒* ε, then ε is also in First(α). Follow(A) is the set of terminals that can appear immediately to the right of A in some sentential form. First and Follow are used in the construction of the parsing table. To compute First(X) for a grammar symbol X:
- If X is a terminal, First(X) = {X}.
- If X → ε is a production, add ε to First(X).
- If X is a non-terminal and X → Y1 Y2 … Yk is a production, place z in First(X) if z is in First(Yi) for some i and ε is in all of First(Y1), …, First(Yi-1). If ε is in all of First(Y1), …, First(Yk), add ε to First(X).
Apply these rules until nothing more can be added to any First set.

Constructing a Parsing Table (2): First and Follow
To compute Follow(A) for all non-terminals A, apply these rules until nothing more can be added to any Follow set:
- Place $ in Follow(S), where S is the start symbol and $ is the end-of-input marker.
- If there is a production A → α B β, everything in First(β) except ε is placed in Follow(B).
- If there is a production A → α B, or a production A → α B β where First(β) contains ε, everything in Follow(A) is placed in Follow(B).

Constructing a Parsing Table (3): First and Follow, an Example
Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}
Follow(E) = Follow(E') = {), $}
Follow(T) = Follow(T') = {+, ), $}
Follow(F) = {+, *, ), $}
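The First and Follow rules can be sketched in Python as fixed-point iterations that keep applying the rules until no set grows. The grammar encoding, the marker EPS for ε and the function names are our own assumptions; running it on the example grammar reproduces the sets above.

```python
EPS = "eps"  # assumed marker for the empty string inside First sets

# The example grammar; [] is an epsilon right-hand side.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_string(symbols, first):
    """First(Y1 Y2 ... Yk) from the First sets computed so far."""
    result = set()
    for y in symbols:
        fy = first[y] if y in first else {y}  # a terminal t has First(t) = {t}
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    result.add(EPS)  # every Yi can derive epsilon (or the string is empty)
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:  # repeat until no First set grows
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add("$")  # $ is in Follow of the start symbol
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:  # only non-terminals have Follow sets
                        continue
                    rest = first_of_string(rhs[i + 1:], first)  # First(beta)
                    new = rest - {EPS}
                    if EPS in rest:  # beta is empty or derives epsilon
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, START, FIRST)
print(FIRST)
print(FOLLOW)
```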

Constructing a Parsing Table (4)
An algorithm for constructing a predictive parsing table for a grammar:
1. For each production A → α, do steps 2 and 3.
2. For each terminal t in First(α), add A → α to M[A, t].
3. If ε is in First(α), add A → α to M[A, t] for each terminal t in Follow(A). If ε is in First(α) and $ is in Follow(A), add A → α to M[A, $].
4. Mark each undefined entry of M as an error.

LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be LL(1). No ambiguous grammar and no left-recursive grammar can be LL(1). A grammar is LL(1) if and only if, for any two distinct productions A → α and A → β, the following conditions hold:
- there is no terminal t for which both α and β derive strings beginning with t;
- at most one of α and β can derive the empty string ε;
- if β can (directly or indirectly) derive ε, then α does not derive any string beginning with a terminal in Follow(A).
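The table-construction algorithm (steps 1-4 of the previous slide) and the "no multiply-defined entries" test for LL(1) can be sketched together in Python. This sketch takes First and Follow sets as given inputs: for the expression grammar we copy them from the example slide; for the left-factored if-then-else grammar (abbreviated here with i = if, t = then, e = else) we computed them by hand. All names are hypothetical.

```python
EPS = "eps"  # assumed marker for the empty string


def first_of_string(symbols, first):
    """First of a symbol string, given First sets of the non-terminals."""
    result = set()
    for y in symbols:
        fy = first.get(y, {y})  # a terminal t has First(t) = {t}
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    result.add(EPS)
    return result


def build_ll1_table(grammar, first, follow):
    """Return (table, conflicts); table[(A, t)] lists right-hand sides.

    A missing key is an error entry; a key with two or more right-hand
    sides is multiply defined, so the grammar is not LL(1).
    """
    table = {}
    for a, alternatives in grammar.items():  # step 1: every production A -> alpha
        for rhs in alternatives:
            f = first_of_string(rhs, first)
            targets = f - {EPS}  # step 2
            if EPS in f:  # step 3 (Follow(A) already contains $ when relevant)
                targets |= follow[a]
            for t in targets:
                table.setdefault((a, t), []).append(rhs)
    conflicts = {key: rhss for key, rhss in table.items() if len(rhss) > 1}
    return table, conflicts


EXPR = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []], "T": [["F", "T'"]],
        "T'": [["*", "F", "T'"], []], "F": [["(", "E", ")"], ["id"]]}
EXPR_FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
              "E'": {"+", EPS}, "T'": {"*", EPS}}
EXPR_FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
               "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

IF = {"S": [["i", "E", "t", "S", "S'"], ["a"]], "S'": [["e", "S"], []], "E": [["b"]]}
IF_FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
IF_FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

print(build_ll1_table(EXPR, EXPR_FIRST, EXPR_FOLLOW)[1])
print(build_ll1_table(IF, IF_FIRST, IF_FOLLOW)[1])
```

The expression grammar yields no conflicts, so it is LL(1). The if-then-else grammar gets two productions in M[S', else], which reflects the fact that this grammar is ambiguous even after left-factoring.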

Part II: Bottom-Up Parsing
One of several methods of bottom-up syntactic analysis is shift-reduce parsing. It has several different forms. Operator-precedence parsing is one such form; another, much more general, is LR parsing. In this presentation, we look at LR parsing. It has three varieties:
- Simple LR parsing (SLR) is an efficient but restricted version.
- Canonical LR parsing is the most powerful, but also the most expensive version.
- LALR is intermediate in cost and power.
Our focus will be on SLR parsing.

LR Parsing: Advantages
Warning: advertisement
- LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.
- LR parsing is the most general non-backtracking shift-reduce method known, yet it is as efficient as other shift-reduce algorithms.
- The class of grammars that can be parsed by an LR parser is a proper superset of the class that can be parsed by a predictive parser.
- An LR parser can detect a syntactic error as soon as possible during a left-to-right scan of the input.

LR Parsing: Downside (easily prevented)
It is a lot of work to construct an LR parser by hand for a typical programming-language grammar. But there are specialized tools that build LR parsers automatically. With such tools, one writes a precise context-free grammar; from it, a parser generator automatically produces a parser for the underlying language. An example of such a tool is Yacc (Yet Another Compiler-Compiler).

LR Parsing Algorithms: Details (1)
An LR parser consists of an input, an output, a stack, a driver program and a parsing table. The driver program is the same for all languages parsed; only the parsing tables differ. The stack stores a sequence of the form
s0 X1 s1 X2 s2 … Xm sm
where sm is on top of the stack and s0 is at the bottom. Each sk is a state symbol and each Xi is a grammar symbol. Together, the state on top of the stack and the current input symbol determine the shift-reduce parsing decision.

LR Parsing Algorithms: Details (2)
The parsing table, indexed by states and grammar symbols, has two parts, which define a parsing action function and a goto function. The LR parsing program determines sm, the state on top of the stack, and ai, the current input token. The entry action[sm, ai] can have one of four values:
- Shift
- Reduce
- Accept
- Error

LR Parsing Algorithms: Details (3)
- action[sm, ai] = Shift s (s is a state): the parser pushes ai and s on the stack and advances to the next input token.
- action[sm, ai] = Reduce A → β: A replaces the top portion of the stack that "covers" β. Precisely, if β has r symbols, the parser pops 2r entries (r grammar symbols and r states). Let s be the state now on top of the stack, right below where A goes. The value goto[s, A] is a state: the parser pushes A and then that state.
- action[sm, ai] = Accept: parsing succeeds.
- action[sm, ai] = Error: the parser has found a syntax error.

LR Parsing Example: The Grammar
i. E → E + T
ii. E → T
iii. T → T * F
iv. T → F
v. F → ( E )
vi. F → id
The numbers i-vi appear in the Reduce actions of the table.

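LR-Parser Example: The Parsing Table
Columns id to $ form the Action part; columns E, T, F form the Goto part. S n = Shift to state n; R x = Reduce by production x; Acc = Accept; a blank entry means Error.
State | id  | +     | *     | (   | )     | $     | E | T | F
0     | S 5 |       |       | S 4 |       |       | 1 | 2 | 3
1     |     | S 6   |       |     |       | Acc   |   |   |
2     |     | R ii  | S 7   |     | R ii  | R ii  |   |   |
3     |     | R iv  | R iv  |     | R iv  | R iv  |   |   |
4     | S 5 |       |       | S 4 |       |       | 8 | 2 | 3
5     |     | R vi  | R vi  |     | R vi  | R vi  |   |   |
6     | S 5 |       |       | S 4 |       |       |   | 9 | 3
7     | S 5 |       |       | S 4 |       |       |   |   | 10
8     |     | S 6   |       |     | S 11  |       |   |   |
9     |     | R i   | S 7   |     | R i   | R i   |   |   |
10    |     | R iii | R iii |     | R iii | R iii |   |   |
11    |     | R v   | R v   |     | R v   | R v   |   |   |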

LR-Parser Example: a Trace
Step | Stack              | Input          | Action
(1)  | 0                  | id * id + id $ | Shift
(2)  | 0 id 5             | * id + id $    | Reduce F → id
(3)  | 0 F 3              | * id + id $    | Reduce T → F
(4)  | 0 T 2              | * id + id $    | Shift
(5)  | 0 T 2 * 7          | id + id $      | Shift
(6)  | 0 T 2 * 7 id 5     | + id $         | Reduce F → id
(7)  | 0 T 2 * 7 F 10     | + id $         | Reduce T → T * F
(8)  | 0 T 2              | + id $         | Reduce E → T
(9)  | 0 E 1              | + id $         | Shift
(10) | 0 E 1 + 6          | id $           | Shift
(11) | 0 E 1 + 6 id 5     | $              | Reduce F → id
(12) | 0 E 1 + 6 F 3      | $              | Reduce T → F
(13) | 0 E 1 + 6 T 9      | $              | Reduce E → E + T
(14) | 0 E 1              | $              | Accept
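The LR driver program described earlier can be sketched in Python, with the example parsing table (productions i-vi) encoded as dictionaries. The encoding and the function name lr_parse are our own assumptions. Running it on id * id + id reproduces the Action column of the trace.

```python
# Productions i-vi: name -> (left-hand side, right-hand side).
PRODUCTIONS = {
    "i": ("E", ["E", "+", "T"]),
    "ii": ("E", ["T"]),
    "iii": ("T", ["T", "*", "F"]),
    "iv": ("T", ["F"]),
    "v": ("F", ["(", "E", ")"]),
    "vi": ("F", ["id"]),
}

# Action part: (state, terminal) -> entry; a missing key means Error.
ACTION = {(1, "+"): ("shift", 6), (1, "$"): ("accept",),
          (8, "+"): ("shift", 6), (8, ")"): ("shift", 11)}
for s in (0, 4, 6, 7):
    ACTION[s, "id"] = ("shift", 5)
    ACTION[s, "("] = ("shift", 4)
for s in (2, 9):
    ACTION[s, "*"] = ("shift", 7)
for s, prod in ((2, "ii"), (9, "i")):
    for t in ("+", ")", "$"):
        ACTION[s, t] = ("reduce", prod)
for s, prod in ((3, "iv"), (5, "vi"), (10, "iii"), (11, "v")):
    for t in ("+", "*", ")", "$"):
        ACTION[s, t] = ("reduce", prod)

# Goto part: (state, non-terminal) -> state.
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Run the table-driven LR parser; return the list of actions taken."""
    tokens = tokens + ["$"]
    stack = [0]  # s0 X1 s1 ... Xm sm, with sm at the end of the list
    pos = 0
    actions = []
    while True:
        state, a = stack[-1], tokens[pos]
        entry = ACTION.get((state, a))
        if entry is None:  # blank entry: Error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if entry[0] == "shift":
            stack += [a, entry[1]]  # push the token, then the new state
            pos += 1
            actions.append("shift")
        elif entry[0] == "reduce":
            lhs, rhs = PRODUCTIONS[entry[1]]
            del stack[len(stack) - 2 * len(rhs):]  # pop |rhs| symbols and states
            stack += [lhs, GOTO[stack[-1], lhs]]  # push A, then goto[s, A]
            actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
        else:
            actions.append("accept")
            return actions


for step in lr_parse(["id", "*", "id", "+", "id"]):
    print(step)
```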

SLR Parsing: Definition
An LR(0) item of a grammar G is a production of G with a dot inserted somewhere in its right-hand side. Example: from A → X Y Z we get four items:
A → . X Y Z
A → X . Y Z
A → X Y . Z
A → X Y Z .
The production A → ε generates only the single item A → . (the dot and nothing else). Intuitively, the dot in an item shows how much of the production we have already seen at a given moment in the parsing process.
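As a small illustration, the items of a production can be listed by placing the dot at each of the len(rhs) + 1 possible positions. The function name lr0_items and the string rendering are our own choices.

```python
def lr0_items(lhs, rhs):
    """List the LR(0) items of the production lhs -> rhs (rhs: list of symbols)."""
    return [f"{lhs} -> {' '.join(rhs[:i] + ['.'] + rhs[i:])}"
            for i in range(len(rhs) + 1)]


print(lr0_items("A", ["X", "Y", "Z"]))
print(lr0_items("A", []))  # A -> epsilon has the single item "A -> ."
```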

SLR Parsing
To create an SLR parsing table, we define three new elements:
- an augmentation of the initial grammar G: we add the production S' → S, where S is the start symbol of G and S' is a new start symbol. This new starting production tells the parser when it should stop and accept the input;
- the closure operation;
- the goto function.

SLR Parsing: The Closure Operation
Let W be a set of items. We construct closure(W) by applying two rules:
1. Every item in W is added to closure(W).
2. If A → α . B β is in closure(W) and B → γ is a production, then add B → . γ to closure(W), if it is not already there.
We apply rule 2 until no more new items can be added to closure(W).

SLR Parsing: The Closure Operation – Example
The augmented grammar (productions 1-6 are the original grammar):
0. E' → E
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id
Let W initially be { E' → . E }. Then
closure(W) = { E' → . E, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }
where every item after E' → . E was added by rule 2.
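The closure operation can be sketched in Python with a worklist. Representing an item as a tuple (lhs, rhs, dot position) and the names GRAMMAR, closure and show are our own assumptions for this illustration; the grammar is the augmented grammar above.

```python
# Augmented grammar as (lhs, rhs) pairs; rhs is a tuple of symbols.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return closure(W) for a set W of LR(0) items (lhs, rhs, dot)."""
    result = set(items)  # rule 1: every item of W is in closure(W)
    work = list(items)
    while work:  # rule 2, repeated until nothing new is added
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:  # A -> alpha . B beta
            for head, body in GRAMMAR:
                item = (head, body, 0)  # B -> . gamma
                if head == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def show(item):
    """Render an item in the slides' notation, e.g. "E -> E . + T"."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in sorted(closure({("E'", ("E",), 0)})):
    print(show(item))
```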

SLR Parsing: The Goto Operation
goto(W, X), where W is a set of items and X is a grammar symbol, is defined as the closure of the set of all items A → α X . β such that A → α . X β is in W.
Example: if W is {E' → E ., E → E . + T}, then goto(W, +) contains
E → E + . T
T → . T * F
T → . F
F → . ( E )
F → . id

SLR Parsing: Sets-of-Items Construction
procedure items(G')
  C = {closure({[S' → . S]})}
  repeat
    for each set of items W in C and each grammar symbol X
        such that goto(W, X) is not empty and not in C do
      add goto(W, X) to C
  until no more sets of items can be added to C
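The goto operation and procedure items(G') can be sketched in Python as follows. Items are tuples (lhs, rhs, dot position) and sets of items are frozensets; these representations and all function names are our own assumptions. The sketch builds the 12 sets of the canonical LR(0) collection for the expression grammar, although the numbering of the sets may differ from W0-W11 (only the initial set is guaranteed to come first).

```python
# Augmented grammar G' as (lhs, rhs) pairs.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """closure(W) for a set of LR(0) items (lhs, rhs, dot)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for head, body in GRAMMAR:
                if head == rhs[dot] and (head, body, 0) not in result:
                    result.add((head, body, 0))
                    work.append((head, body, 0))
    return frozenset(result)


def goto(items, x):
    """Closure of all A -> alpha X . beta such that A -> alpha . X beta is in items."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == x}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    """procedure items(G'): repeat until no new set of items can be added."""
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    c = [closure({("E'", ("E",), 0)})]
    added = True
    while added:
        added = False
        for w in list(c):  # iterate over a snapshot of C
            for x in symbols:
                g = goto(w, x)
                if g and g not in c:
                    c.append(g)
                    added = True
    return c


print(len(canonical_collection()))
```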

Example: The Canonical LR(0) Collection for Grammar G'
W0: E' → . E, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id
W1: E' → E ., E → E . + T
W2: E → T ., T → T . * F
W3: T → F .
W4: F → ( . E ), E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id
W5: F → id .
W6: E → E + . T, T → . T * F, T → . F, F → . ( E ), F → . id
W7: T → T * . F, F → . ( E ), F → . id
W8: F → ( E . ), E → E . + T
W9: E → E + T ., T → T . * F
W10: T → T * F .
W11: F → ( E ) .

Constructing an SLR Parsing Table
1. Construct C = {W0, W1, …, Wn}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Wi. The parsing actions for state i are determined as follows:
   - If Wi contains A → α . t β and goto(Wi, t) = Wj, then action[i, t] = "Shift j". Here, t must be a terminal.
   - If Wi contains A → α ., then action[i, t] = "Reduce A → α" for all t in Follow(A). Here A may not be S'.
   - If Wi contains S' → S ., then action[i, $] = "Accept".
If any conflicting actions are generated by the above rules, we say that the grammar is not SLR(1); the algorithm then fails to produce a parser.

Constructing an SLR Parsing Table (continued)
3. The goto transitions for state i are constructed for all non-terminals A as follows: if goto(Wi, A) = Wj, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are set to "Error".
5. The initial state of the parser is the one constructed from the set of items containing S' → . S.
This requires practice...
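Putting the pieces together, here is a Python sketch of SLR table construction (rules 1-5) for the expression grammar. It is an illustration under stated assumptions: items are (lhs, rhs, dot) tuples, all names are hypothetical, and the Follow sets of the original grammar (Follow(E) = {+, ), $}, Follow(T) = Follow(F) = {+, *, ), $}) are supplied by hand rather than computed. A conflicting entry raises an error, signalling a grammar that is not SLR(1).

```python
# Augmented grammar G' as (lhs, rhs) pairs; E' -> E is the added production.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
START_ITEM = ("E'", ("E",), 0)  # S' -> . S
# Hand-computed Follow sets of the original grammar (an input to this sketch).
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for head, body in GRAMMAR:
                if head == rhs[dot] and (head, body, 0) not in result:
                    result.add((head, body, 0))
                    work.append((head, body, 0))
    return frozenset(result)


def goto(items, x):
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    c = [closure({START_ITEM})]
    for w in c:  # c grows during iteration, so every new set is processed too
        for x in symbols:
            g = goto(w, x)
            if g and g not in c:
                c.append(g)
    return c


def slr_table():
    states = canonical_collection()  # rule 1
    index = {w: i for i, w in enumerate(states)}
    action, goto_table = {}, {}

    def set_action(i, t, value):
        if action.get((i, t), value) != value:
            raise ValueError(f"conflict in state {i} on {t!r}: not SLR(1)")
        action[i, t] = value

    for i, w in enumerate(states):  # rule 2
        for lhs, rhs, dot in w:
            if dot < len(rhs):
                x = rhs[dot]
                if x not in NONTERMINALS:  # A -> alpha . t beta: shift
                    set_action(i, x, ("shift", index[goto(w, x)]))
            elif lhs == "E'":  # S' -> S . : accept on $
                set_action(i, "$", ("accept",))
            else:  # A -> alpha . : reduce on every t in Follow(A)
                for t in FOLLOW[lhs]:
                    set_action(i, t, ("reduce", lhs, rhs))
        for a in NONTERMINALS:  # rule 3: goto transitions
            g = goto(w, a)
            if g:
                goto_table[i, a] = index[g]
    return states, action, goto_table  # rule 4: absent keys mean Error


states, action, goto_table = slr_table()
print(len(states), len(action), len(goto_table))
```

State 0 is built from the set containing S' → . S, as rule 5 requires. The state numbers may differ from the hand-built table, but the number and kinds of entries match it.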