CH4.1 CSE244 Sections 4.1-4.4: Syntax Analysis Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road.

CH4.2 CSE244 Syntax Analysis - Parsing
An overview of parsing: functions & responsibilities
Context-free grammars: concepts & terminology
Writing and designing grammars
Resolving grammar problems / difficulties
Top-down parsing: recursive descent & predictive LL
Bottom-up parsing: LR & LALR
Concluding remarks / looking ahead

CH4.3 CSE244 An Overview of Parsing
Why are grammars important for formally describing languages?
1. They are precise, easy-to-understand representations.
2. Compiler-writing tools can take a grammar and generate a compiler.
3. They allow a language to be evolved (new statements, changes to statements, etc.). Languages are not static, but are constantly upgraded to add new features or fix "old" ones (ADA → ADA9x; C++ adds templates, exceptions, ...).
How do grammars relate to the parsing process?

CH4.4 CSE244 Parsing During Compilation
[Diagram: source program → lexical analyzer → token / get next token → parser → parse tree → rest of front end → intermediate representation; the symbol table and error handling are shared by all phases; regular expressions drive the lexical analyzer.]
The parser uses a grammar to check the structure of tokens, produces a parse tree, and performs syntactic error detection and recovery: recognize correct syntax, report errors. Also technically part of parsing: augmenting info on tokens in the source, type checking, semantic analysis.

CH4.5 CSE244 Parsing Responsibilities
Syntax error identification / handling. Recall the typical error types:
Lexical: misspellings
Syntactic: omission, wrong order of tokens
Semantic: incompatible types
Logical: infinite loop / recursive call
The majority of error processing occurs during syntax analysis. NOTE: not all errors are identifiable!! Which ones?

CH4.6 CSE244 Key Issues – Error Processing
Detecting errors
Finding the position at which they occur
Clear / accurate presentation
Recovering (passing over) to continue and find later errors
Not impacting compilation of "correct" programs

CH4.7 CSE244 What are some Typical Errors ?

#include <stdio.h>
int f1(int v) {
  int i,j=0;
  for (i=1;i<5;i++) { j=v+f2(i) }
  return j;
}
int f2(int u) {
  int j;
  j=u+f1(u*u)
  return j;
}
int main() {
  int i,j=0;
  for (i=1;i<10;i++) {
    j=j+i*I;
    printf("%d\n",i);
    printf("%d\n",f1(j));
  return 0;
}

Which are "easy" to recover from? Which are "hard"?
As reported by MS VC++:
'f2' undefined; assuming extern returning int
syntax error : missing ';' before '}'
syntax error : missing ';' before 'return'
fatal error : unexpected end of file found

CH4.8 CSE244 Error Recovery Strategies
Panic Mode – discard tokens until a "synchronizing" token is found (end, ";", "}", etc.)
-- Decision of the designer
-- Problems: skipping input may miss a declaration (causing more errors) and may miss errors in the skipped material
-- Advantages: simple; suited to one error per statement
Phrase Level – local correction on the input
-- e.g. "," → ";" ; delete ","; insert ";"
-- Also a decision of the designer
-- Not suited to all situations
-- Used in conjunction with panic mode to allow less input to be skipped

CH4.9 CSE244 Error Recovery Strategies – (2)
Error Productions:
-- Augment the grammar with error rules
-- The augmented grammar is used for parser construction / generation
-- Example: add a rule for := in C assignment statements; report the error but continue the compile
-- Self correction + diagnostic messages
Global Correction:
-- Adding / deleting / replacing symbols is chancy – may make many changes!
-- Algorithms are available to minimize changes, but they are costly – a key issue

CH4.10 CSE244 Motivating Grammars
Regular expressions: basis of lexical analysis; represent regular languages.
Context-free grammars: basis of parsing; represent language constructs; characterize context-free languages.
Reg. Lang. ⊂ CFLs
EXAMPLE: a^n b^n, n ≥ 1 : Is it regular?

CH4.11 CSE244 Context Free Grammars : Concepts & Terminology
Definition: A Context Free Grammar (CFG) is described by (T, NT, S, PR), where:
T: terminals / tokens of the language
NT: non-terminals, denoting sets of strings generated by the grammar & in the language
S: start symbol, S ∈ NT, which defines all strings of the language
PR: production rules indicating how T and NT are combined to generate valid strings of the language; PR: NT → (T | NT)*
Like a regular expression / DFA / NFA, a context-free grammar is a mathematical model.

CH4.12 CSE244 Context Free Grammars : A First Look
assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id
term → real
term → integer
operator → +
operator → -
What do the "blue" symbols represent?
Derivation: a sequence of grammar rule applications and substitutions that transform a starting non-terminal into a sequence of terminals / tokens. Simply stated: grammars / production rules allow us to "rewrite" and "identify" correct syntax.

CH4.13 CSE244 Derivation
Let's derive: id := id + real - integer ;

                                                using production:
assign_stmt
⇒ id := expr ;                                  assign_stmt → id := expr ;
⇒ id := expr operator term ;                    expr → expr operator term
⇒ id := expr operator term operator term ;      expr → expr operator term
⇒ id := term operator term operator term ;      expr → term
⇒ id := id operator term operator term ;        term → id
⇒ id := id + term operator term ;               operator → +
⇒ id := id + real operator term ;               term → real
⇒ id := id + real - term ;                      operator → -
⇒ id := id + real - integer ;                   term → integer

CH4.14 CSE244 Example Grammar
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | / | ↑
Black: NT; Blue: T; expr: S; 9 production rules.
To simplify / standardize notation, we offer a synopsis of terminology.

CH4.15 CSE244 Example Grammar - Terminology
Terminals: a, b, c, +, -, punc, 0, 1, …, 9, blue strings
Non-terminals: A, B, C, S, black strings
T or NT: X, Y, Z
Strings of terminals: u, v, …, z in T*
Strings of T / NT: α, β, γ in (T ∪ NT)*
Alternatives of production rules: A → α1; A → α2; …; A → αk can be written A → α1 | α2 | … | αk
The first NT on the LHS of the 1st production rule is designated as the start symbol!
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑

CH4.16 CSE244 Grammar Concepts
A step in a derivation is zero or one action that replaces an NT with the RHS of a production rule.
EXAMPLE: E ⇒ -E (the ⇒ means "derives" in one step), using the production rule E → -E.
EXAMPLE: E ⇒ E A E ⇒ E * E ⇒ E * ( E )
DEFINITIONS:
⇒   derives in one step
⇒+  derives in one or more steps
⇒*  derives in zero or more steps
EXAMPLES: αAβ ⇒ αγβ if A → γ is a production rule;
α1 ⇒ α2 ⇒ … ⇒ αn implies α1 ⇒+ αn;
α ⇒* α for all α;
if α ⇒* β and β ⇒* γ then α ⇒* γ.

CH4.17 CSE244 How does this relate to Languages?
Let G be a CFG with start symbol S. Then the set of strings W (with no non-terminals) such that S ⇒+ W is the language generated by G, denoted L(G). So W ∈ L(G) iff S ⇒+ W; such a W is a sentence of G.
When S ⇒* α (and α may contain NTs), α is called a sentential form of G.
EXAMPLE: id * id is a sentence. Here's the derivation:
E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id, so E ⇒* id * id.
The intermediate strings E A E, E * E, id * E are sentential forms.

CH4.18 CSE244 Other Derivation Concepts
Leftmost: replace the leftmost non-terminal symbol: E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id
Rightmost: replace the rightmost non-terminal symbol: E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id
Important notes: if αAβ ⇒lm αγβ, what's true about α? If αAβ ⇒rm αγβ, what's true about β?
Derivations: the actions taken to parse input can be represented pictorially in a parse tree.

CH4.19 CSE244 Examples of LM / RM Derivations
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
A leftmost derivation of: id + id * id
A rightmost derivation of: id + id * id

CH4.20 CSE244 Derivations & Parse Tree
[Parse trees built step by step, one per derivation step:]
E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id

CH4.21 CSE244 Parse Trees and Derivations
Consider the expression grammar: E → E+E | E*E | (E) | -E | id
Leftmost derivation of id + id * id:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ …
[Parse trees grow top-down, left to right, as each production is applied.]

CH4.22 CSE244 Parse Tree & Derivations - continued
… ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[Completed parse tree: + at the root, with the * subtree as its right child.]

CH4.23 CSE244 Alternative Parse Tree & Derivation
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[Parse tree: * at the root, with the + subtree as its left child.]
WHAT'S THE ISSUE HERE? Two distinct leftmost derivations of the same string!

CH4.24 CSE244 Resolving Grammar Problems/Difficulties
Regular expressions: basis of lexical analysis. Regular expressions generate/represent regular languages, the smallest, most well-defined class of languages.
Context-free grammars: basis of parsing. CFGs represent context-free languages; CFLs contain more powerful languages.
Reg. Lang. ⊂ CFLs. a^n b^n is a CFL that's not regular.

CH4.25 CSE244 Resolving Problems/Difficulties – (2)
Since Reg. Lang. ⊂ Context-Free Lang., it is possible to go from a regular expression to a CFG via an NFA.
Recall: (a | b)*abb
[NFA: start state 0 with self-loops on a and b; 0 –a→ 1 –b→ 2 –b→ 3, state 3 accepting.]

CH4.26 CSE244 Resolving Problems/Difficulties – (3)
Construct a CFG from the NFA as follows:
1. Each state i has a non-terminal Ai: A0, A1, A2, A3
2. If there is a transition i –a→ j, then Ai → aAj : A0 → aA0, A0 → aA1
3. If there is a transition i –b→ j, then Ai → bAj : A0 → bA0, A1 → bA2, A2 → bA3
4. If i is an accepting state, Ai → ε : A3 → ε
5. If i is the starting state, Ai is the start symbol: A0
T = {a, b}, NT = {A0, A1, A2, A3}, S = A0
PR = { A0 → aA0 | aA1 | bA0 ; A1 → bA2 ; A2 → bA3 ; A3 → ε }
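The right-linear grammar above can be checked mechanically: a minimal sketch (not from the slides) that searches derivations of the grammar and confirms they agree with the regular expression (a|b)*abb. The representation of states as non-terminal names A0–A3 is an assumption for illustration.

```python
from collections import deque

# Right-linear CFG obtained from the NFA for (a|b)*abb (slide CH4.26).
GRAMMAR = {
    "A0": ["aA0", "aA1", "bA0"],
    "A1": ["bA2"],
    "A2": ["bA3"],
    "A3": [""],          # epsilon production (accepting state)
}

def derives(target, start="A0"):
    """Return True iff the grammar derives `target` (a string over {a,b})."""
    seen, queue = set(), deque([start])
    while queue:
        form = queue.popleft()
        # In a right-linear grammar at most one NT appears, at the right end;
        # our NT names all end in a digit, terminals are 'a'/'b'.
        if form and form[-1] in "0123":
            prefix, nt = form[:-2], form[-2:]
            if not target.startswith(prefix):
                continue          # prune: terminal prefix already mismatches
            for rhs in GRAMMAR[nt]:
                new = prefix + rhs
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
        elif form == target:
            return True
    return False
```

Deriving abaabb, for example, follows exactly the NFA's accepting path: A0 ⇒ aA0 ⇒ abA0 ⇒ abaA0 ⇒ abaaA1 ⇒ abaabA2 ⇒ abaabbA3 ⇒ abaabb.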

CH4.27 CSE244 How Does This CFG Derive Strings ?
[NFA for (a|b)*abb] vs. the productions:
A0 → aA0, A0 → aA1, A0 → bA0, A1 → bA2, A2 → bA3, A3 → ε
How is abaabb derived in each?

CH4.28 CSE244 Regular Expressions vs. CFGs
Regular expressions for lexical syntax:
1. CFGs are overkill; lexical rules are quite simple and straightforward
2. REs are concise / easy to understand
3. A more efficient lexical analyzer can be constructed from REs
4. REs for lexical analysis and CFGs for parsing promote modularity, low coupling & high cohesion
CFGs: match tokens "(" ")", begin / end, if-then-else, whiles, proc/func calls, …
Intended for structural associations between tokens! Are tokens in the correct order?

CH4.29 CSE244 Resolving Grammar Difficulties : Motivation
The structure of a grammar affects the compiler design (recall "syntax-directed" translation). Different parsing approaches have different needs (top-down vs. bottom-up), so redesigning a grammar may assist in producing better parsing methods.
Grammar problems: ambiguity, ε-moves, cycles, left recursion, left factoring.

CH4.30 CSE244 Resolving Problems: Ambiguous Grammars
Consider the following grammar segment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)
What's the problem here? Let's consider a simple parse tree:
[Parse tree: if expr E1 then stmt S1 else stmt S2, with a nested if on E2 inside.]
An else must match the previous then. The structure indicates the parse subtree for the expression.

CH4.31 CSE244 Example : What Happens with this string?
if E1 then if E2 then S1 else S2
How is this parsed?
if E1 then [if E2 then S1 else S2]   vs.   if E1 then [if E2 then S1] else S2
What's the issue here?

CH4.32 CSE244 Parse Trees for Example
Form 1: [else attached to the inner if: if E1 then (if E2 then S1 else S2)]
Form 2: [else attached to the outer if: if E1 then (if E2 then S1) else S2]
What's the issue here?

CH4.33 CSE244 Removing Ambiguity
Take the original grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)
Or, written more simply:
S → i E t S | i E t S e S | s
E → a
The problem string: i a t i a t s e s

CH4.34 CSE244 Revise to remove ambiguity:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt
In the simplified notation:
S → M | U
M → i E t M e M | s
U → i E t S | i E t M e U
E → a
Try the above on: i a t i a t s e s

CH4.35 CSE244 Resolving Difficulties : Left Recursion
A left-recursive grammar has rules that support the derivation A ⇒+ Aα, for some α.
Top-down parsing can't reconcile this type of grammar, since it could consistently make a choice that wouldn't allow termination:
A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ … etc.
Transform the left-recursive grammar A → Aα | β into:
A → βA'
A' → αA' | ε

CH4.36 CSE244 Why is Left Recursion a Problem ?
Consider: E → E + T | T ; T → T * F | F ; F → ( E ) | id
Derive: id + id + id
E ⇒ E + T ⇒ E + T + T ⇒ T + T + T ⇒ …
How does this build strings? What does each string have to start with?
How can left recursion be removed?

CH4.37 CSE244 Resolving Difficulties : Left Recursion (2)
Informal discussion: take all productions for A and order them as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A. Now apply the concept of the previous slide:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
For our example:
E → E + T | T            becomes   E → TE'      E' → +TE' | ε
T → T * F | F            becomes   T → FT'      T' → *FT' | ε
F → ( E ) | id           stays     F → ( E ) | id

CH4.38 CSE244 Resolving Difficulties : Left Recursion (3)
Problem: if left recursion is two or more levels deep, this isn't enough:
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda
Algorithm:
Input: grammar G with ordered non-terminals A1, …, An
Output: an equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1 = start NT, A2, …, An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Ajγ by the productions
       Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are all current Aj productions;
     end
     eliminate the immediate left recursion among the Ai productions
   end
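The algorithm above can be sketched directly. This is a minimal illustration, not from the slides: productions are lists of symbols, [] stands for ε, and primed names (A' = A + "'") are an assumed naming convention. (Formally the algorithm requires a grammar without ε-productions or cycles; it still happens to handle the slide's example.)

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion. grammar: {NT: [RHS symbol lists]}; [] is epsilon."""
    g = {nt: [list(rhs) for rhs in prods] for nt, prods in grammar.items()}
    for i, Ai in enumerate(order):
        # Step 2 inner loop: substitute Aj-leading productions for j < i.
        for Aj in order[:i]:
            new = []
            for rhs in g[Ai]:
                if rhs[:1] == [Aj]:
                    new += [delta + rhs[1:] for delta in g[Aj]]
                else:
                    new.append(rhs)
            g[Ai] = new
        # Eliminate immediate left recursion among the Ai productions:
        # Ai -> Ai a1 | ... | Ai am | b1 | ... | bn
        alphas = [rhs[1:] for rhs in g[Ai] if rhs[:1] == [Ai]]
        betas  = [rhs for rhs in g[Ai] if rhs[:1] != [Ai]]
        if alphas:
            Ap = Ai + "'"
            g[Ai] = [b + [Ap] for b in betas]
            g[Ap] = [a + [Ap] for a in alphas] + [[]]   # trailing epsilon rule
    return g
```

Run on the next slide's grammar (A1 → A2a | b | ε ; A2 → A2c | A1d) this reproduces the worked result: A2 → bdA2' | dA2' and A2' → cA2' | adA2' | ε.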

CH4.39 CSE244 Using the Algorithm
Apply the algorithm to:
A1 → A2a | b | ε
A2 → A2c | A1d
i = 1: for A1 there is no left recursion.
i = 2: for j = 1 to 1 do: take productions A2 → A1γ and replace with A2 → δ1γ | δ2γ | … | δkγ, where A1 → δ1 | δ2 | … | δk are the A1 productions. In our case A2 → A1d becomes A2 → A2ad | bd | d.
What's left:
A1 → A2a | b | ε
A2 → A2c | A2ad | bd | d
Are we done?

CH4.40 CSE244 Using the Algorithm (2)
No! We must still remove the immediate left recursion in A2:
A1 → A2a | b | ε
A2 → A2c | A2ad | bd | d
Recall: A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn becomes
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Apply this to the above case. What do you get?

CH4.41 CSE244 Removing Difficulties : ε-Moves
Transformation: in order to remove A → ε, find all rules of the form B → uAv and add the rule B → uv to the grammar G. Why does this work?
Examples:
E → TE'    E' → +TE' | ε    T → FT'    T' → *FT' | ε    F → ( E ) | id
A1 → A2a | b
A2 → bdA2' | dA2'
A2' → cA2' | adA2' | ε

CH4.42 CSE244 Removing Difficulties : Cycles
How would cycles be removed? Make sure every production adds some terminal(s) (except a single ε-production in the start NT).
E.g. S → SS | ( S ) | ε has a cycle: S ⇒ SS ⇒ S (since S ⇒ ε).
Transform to: S → S ( S ) | ( S ) | ε
Transformation: substitute A → uBv by the productions A → uδ1v | … | uδnv, assuming that all the productions for B are B → δ1 | … | δn.

CH4.43 CSE244 Removing Difficulties : Left Factoring
Problem: uncertain which of two rules to choose:
stmt → if expr then stmt else stmt
     | if expr then stmt
When do you know which one is valid? What's the general form of stmt?
A → αβ1 | αβ2, where α = if expr then stmt, β1 = else stmt, β2 = ε
Transform to:
A → αA'
A' → β1 | β2
EXAMPLE:
stmt → if expr then stmt rest
rest → else stmt | ε
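One round of the transformation above can be sketched as follows. This is an illustrative simplification (not from the slides): it groups a non-terminal's alternatives by their first symbol, pulls out the longest common prefix of each group, and assumes at most one group needs a new primed name per call.

```python
from collections import defaultdict

def left_factor(nt, alts):
    """One round of left factoring for non-terminal `nt`.

    `alts` is a list of alternatives, each a list of symbols; [] is epsilon.
    Returns {non-terminal: alternatives}, introducing nt + "'" for suffixes.
    """
    groups = defaultdict(list)
    for rhs in alts:
        groups[rhs[0] if rhs else None].append(rhs)
    out = {nt: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            out[nt].extend(group)          # nothing shared: keep as-is
            continue
        # Longest common prefix of the alternatives in this group (the alpha).
        prefix = group[0]
        for rhs in group[1:]:
            n = 0
            while n < min(len(prefix), len(rhs)) and prefix[n] == rhs[n]:
                n += 1
            prefix = prefix[:n]
        new_nt = nt + "'"
        out[nt].append(prefix + [new_nt])                # A -> alpha A'
        out[new_nt] = [rhs[len(prefix):] for rhs in group]  # A' -> beta1 | beta2
    return out
```

Applied to the dangling-else rules it yields exactly the slide's result: stmt → if expr then stmt stmt', stmt' → else stmt | ε (the slide calls stmt' "rest").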

CH4.44 CSE244 Resolving Grammar Problems
Note: not all aspects of a programming language can be represented by context-free grammars / languages. Examples:
1. Declaring an ID before its use
2. Valid typing within expressions
3. Parameters in a definition vs. in a call
These features are called context-sensitive and define yet another language class, CSL.
Reg. Lang. ⊂ CFLs ⊂ CSLs

CH4.45 CSE244 Context-Sensitive Languages - Examples
L1 = { wcw | w ∈ (a | b)* } : declare before use
L2 = { a^n b^m c^n d^m | n ≥ 1, m ≥ 1 } : a^n b^m formal parameters, c^n d^m actual parameters
What does it mean when L is context-sensitive?

CH4.46 CSE244 How do you show a Language is a CFL?
L3 = { w c w^R | w ∈ (a | b)* }
L4 = { a^n b^m c^m d^n | n ≥ 1, m ≥ 1 }
L5 = { a^n b^n c^m d^m | n ≥ 1, m ≥ 1 }
L6 = { a^n b^n | n ≥ 1 }

CH4.47 CSE244 Solutions
L3 = { w c w^R | w ∈ (a | b)* } :          S → aSa | bSb | c
L4 = { a^n b^m c^m d^n | n ≥ 1, m ≥ 1 } :  S → aSd | aAd ; A → bAc | bc
L5 = { a^n b^n c^m d^m | n ≥ 1, m ≥ 1 } :  S → XY ; X → aXb | ab ; Y → cYd | cd
L6 = { a^n b^n | n ≥ 1 } :                 S → aSb | ab

CH4.48 CSE244 Example of a CS Grammar
L2 = { a^n b^m c^n d^m | n ≥ 1, m ≥ 1 }
S → aAcH
A → aAc | B
B → bBD | bD
Dc → cD
DDH → DHd
cDH → cd

CH4.49 CSE244 Top-Down Parsing
Identify a leftmost derivation for an input string. Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop a parse tree in a left-to-right fashion that is consistent with scanning the input.
A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c - accept!)
Topics: recursive-descent parsing concepts; predictive parsing; recursive / brute-force technique; non-recursive / table-driven; error recovery; implementation.

CH4.50 CSE244 Top-Down Parsing – From Grammar to Parser, take I

CH4.51 CSE244 Recursive Descent Parsing
A general category of top-down parsing: choose a production rule based on the input symbol; may require backtracking to correct a wrong choice.
Example: S → cAd ; A → ab | a ; input: cad
Expand S → cAd and match c; try A → ab: a matches, but b does not match d – backtrack; try A → a: a matches; then d matches – accept.
Problem: backtracking.
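The backtracking on this tiny grammar can be sketched directly. A minimal illustration (not from the slides): each non-terminal is a function returning the input position after a successful match, or None, and A tries its alternatives in order.

```python
def parse_S(s, i):
    """S -> c A d. Returns the index after a successful match, or None."""
    if i < len(s) and s[i] == "c":
        j = parse_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == "d":
            return j + 1
    return None

def parse_A(s, i):
    """A -> ab | a, tried in order; falling through to 'a' is the backtrack."""
    if s[i:i+2] == "ab":
        return i + 2
    if s[i:i+1] == "a":
        return i + 1
    return None

def accepts(s):
    """True iff the whole string derives from S."""
    return parse_S(s, 0) == len(s)
```

On input cad, parse_A first tries ab, fails, and retreats to a, after which d matches – exactly the trace on the slide.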

CH4.52 CSE244 Top-Down Parsing – From Grammar to Parser, take II

CH4.53 CSE244 Predictive Parsing
Backtracking is bad! To eliminate backtracking, what must we ensure about the grammar?
- no left recursion
- left factoring applied (frequently)
When the grammar satisfies these conditions, the current input symbol in conjunction with the current non-terminal uniquely determines the production that needs to be applied.
Utilize transition diagrams. For each non-terminal of the grammar do the following:
1. Create an initial and a final state
2. If A → X1X2…Xn is a production, add a path with edges X1, X2, …, Xn
Once the transition diagrams have been developed, apply a straightforward technique to algorithmicize them with procedures and possible recursion.

CH4.54 CSE244 Transition Diagrams
Unlike their lexical equivalents, each edge represents a token. Transition implies: if it is a token, match the input; else call the procedure for the non-terminal.
Recall the earlier grammar and its associated transition diagrams:
E → TE'    E' → +TE' | ε    T → FT'    T' → *FT' | ε    F → ( E ) | id
[Diagrams: E: 0 –T→ 1 –E'→ 2; E': 3 –+→ 4 –T→ 5 –E'→ 6, plus an ε-edge 3 → 6; T: 7 –F→ 8 –T'→ 9; T': 10 –*→ 11 –F→ 12 –T'→ 13, plus an ε-edge 10 → 13; F: 14 –(→ 15 –E→ 16 –)→ 17, and 14 –id→ 17.]
How are transition diagrams used? Are ε-moves a problem? Can we simplify transition diagrams? Why is simplification critical?

CH4.55 CSE244 How are Transition Diagrams Used ?

main()  { TD_E(); }
TD_E()  { TD_T(); TD_E’(); }
TD_E’() { token = get_token();
          if token = ‘+’ then { TD_T(); TD_E’(); }
          else unget() and terminate }
TD_T()  { TD_F(); TD_T’(); }
TD_T’() { token = get_token();
          if token = ‘*’ then { TD_F(); TD_T’(); }
          else unget() and terminate }
TD_F()  { token = get_token();
          if token = ‘(’ then { TD_E(); match(‘)’); }
          else if token.value <> id then { error + EXIT }
          else ... }

What happened to ε-moves? They become "else unget() and terminate".
NOTE: not all error conditions have been represented.
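These procedures can be made concrete. A minimal runnable sketch (an assumption-laden translation, not the slide's code): tokens are plain strings such as "id", "+", "*", "(", ")", the lexer is replaced by a token list, and the ε-moves become "do nothing" branches.

```python
class Parser:
    """Predictive recursive descent for:
       E -> T E' ; E' -> + T E' | eps ; T -> F T' ; T' -> * F T' | eps
       F -> ( E ) | id
    """
    def __init__(self, tokens):
        self.toks = tokens + ["$"]   # $ terminates the input
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.T(); self.Ep()

    def Ep(self):                    # E'
        if self.peek() == "+":
            self.match("+"); self.T(); self.Ep()
        # else: epsilon move -- consume nothing

    def T(self):
        self.F(); self.Tp()

    def Tp(self):                    # T'
        if self.peek() == "*":
            self.match("*"); self.F(); self.Tp()

    def F(self):
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

def recognize(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.peek() == "$"
    except SyntaxError:
        return False
```

Note that no backtracking occurs: each procedure looks at one token of lookahead and commits, which is exactly what the no-left-recursion, left-factored grammar buys us.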

CH4.56 CSE244 How can Transition Diagrams be Simplified ?
[E': start –+→ · –T→ · –E'→ end, with an ε-edge from start to end.]

CH4.57 CSE244 How can Transition Diagrams be Simplified ? (2)
[Step: the tail edge labeled E' is replaced by an ε-jump back to E''s start state.]

CH4.58 CSE244 How can Transition Diagrams be Simplified ? (3)
[Step: the ε-jump is eliminated, leaving a loop: start –+→ · –T→ start, with an ε-edge from start to end.]

CH4.59 CSE244 How can Transition Diagrams be Simplified ? (4)
[Step: substitute the simplified E' diagram into E: 0 –T→ 1 –E'→ 2.]

CH4.60 CSE244 How can Transition Diagrams be Simplified ? (5)
[Result: E becomes 0 –T→ 1 with a loop 1 –+→ · –T→ 1, and an ε-edge from 1 to the accepting state.]

CH4.61 CSE244 Additional Transition Diagram Simplifications
Similar steps apply for T and T'. Simplified transition diagrams:
[T: 7 –F→ state with a loop –*→ · –F→ back; F: 14 –(→ 15 –E→ 16 –)→ 17, and 14 –id→ 17.]
Why is simplification important? How does the code change?

CH4.62 CSE244 Top-Down Parsing – From Grammar to Parser, take III

CH4.63 CSE244 Motivating Table-Driven Parsing
1. Scan the input left to right
2. Find a leftmost derivation
Grammar: E → TE' ; E' → +TE' | ε ; T → id
Input: id + id $   ($ is the terminator)
Derivation: E ⇒ …
Processing stack:

CH4.64 CSE244 Non-Recursive / Table Driven
[Model: input buffer (string + terminator $), a stack of NT + T symbols of the CFG (with $ as the empty-stack symbol), the predictive parsing program, and the parsing table M[A,a] telling the parser what action to take based on stack / input.]
General parser behavior: X = top of stack, a = current input symbol.
1. When X = a = $: halt, accept, success.
2. When X = a ≠ $: POP X off the stack, advance the input, go to 1.
3. When X is a non-terminal, examine M[X,a]:
   if it is an error, call the recovery routine;
   if M[X,a] = {X → UVW}, POP X, then PUSH W, V, U (so U is on top).
   Do NOT expend any input.

CH4.65 CSE244 Algorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;   /* ip = input pointer */
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then pop X from the stack and advance ip
    else error()
  else /* X is a non-terminal */
    if M[X,a] = X → Y1Y2…Yk then begin
      pop X from the stack;
      push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
      output the production X → Y1Y2…Yk
      /* may also execute other code based on the production used */
    end
    else error()
until X = $ /* stack is empty */

CH4.66 CSE244 Example
Our well-worn example!
E → TE' ; E' → +TE' | ε ; T → FT' ; T' → *FT' | ε ; F → ( E ) | id
Table M:

NT |  id        +          *          (         )        $
E  |  E→TE'                           E→TE'
E' |            E'→+TE'                         E'→ε     E'→ε
T  |  T→FT'                           T→FT'
T' |            T'→ε      T'→*FT'               T'→ε     T'→ε
F  |  F→id                            F→(E)
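The driver algorithm and this table fit in a few lines of code. A minimal sketch (not from the slides) with the table hard-coded as a dict; it returns the list of productions output, which is the leftmost derivation:

```python
# Non-terminals; every other symbol (including $) is treated as a terminal.
NONTERMS = {"E", "E'", "T", "T'", "F"}

# Parsing table M[X, a] -> right-hand side ([] is the epsilon production).
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def parse(tokens):
    """Table-driven LL(1) parse; returns the productions used, in order."""
    stack = ["$", "E"]
    toks = tokens + ["$"]
    i, output = 0, []
    while stack:
        X = stack.pop()
        a = toks[i]
        if X not in NONTERMS:              # terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            i += 1                         # match: advance the input
        else:
            if (X, a) not in TABLE:
                raise SyntaxError(f"no entry for M[{X}, {a!r}]")
            rhs = TABLE[(X, a)]
            output.append(f"{X} -> {' '.join(rhs) or 'eps'}")
            stack.extend(reversed(rhs))    # push Yk ... Y1, Y1 on top
    return output
```

On id + id * id the returned production list is exactly the OUTPUT column of the trace two slides below.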

CH4.67 CSE244 Trace of Example (STACK | INPUT | OUTPUT)

CH4.68 CSE244 Trace of Example

STACK        INPUT            OUTPUT
$E           id + id * id$    E → TE'
$E'T         id + id * id$    T → FT'
$E'T'F       id + id * id$    F → id
$E'T'id      id + id * id$    (match id, expend input)
$E'T'        + id * id$       T' → ε
$E'          + id * id$       E' → +TE'
$E'T+        + id * id$       (match +)
$E'T         id * id$         T → FT'
$E'T'F       id * id$         F → id
$E'T'id      id * id$         (match id)
$E'T'        * id$            T' → *FT'
$E'T'F*      * id$            (match *)
$E'T'F       id$              F → id
$E'T'id      id$              (match id)
$E'T'        $                T' → ε
$E'          $                E' → ε
$            $                accept

CH4.69 CSE244 Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E' ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E' ⇒ id + id * id E' ⇒ id + id * id

CH4.70 CSE244 What's the Missing Puzzle Piece ?
Constructing the parsing table M!
1st: calculate First & Follow for the grammar
2nd: apply the construction algorithm for the parsing table (we'll see this shortly)
Basic tools:
First: let α be a string of grammar symbols. First(α) is the set that includes every terminal that appears leftmost in α or in any string originating from α. NOTE: if α ⇒* ε, then ε is in First(α).
Follow: let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β). NOTE: if S ⇒* αA, then $ is in Follow(A).

CH4.71 CSE244 Motivation Behind First & Follow
First is used to help find the appropriate production to apply, given the top-of-the-stack non-terminal and the current input symbol.
Example: if A → α and a is in First(α), then when a = input, replace A with α in the stack. (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)
Follow is used when First has a conflict, to resolve choices, or when First gives no suggestion. When α ⇒* ε (or α = ε), what follows A dictates the next choice to be made.
Example: if A → α, α ⇒* ε (i.e., First(α) contains ε), and b is in Follow(A), then when b is the input character we expand A with α, which will eventually expand to ε, after which b follows.

CH4.72 CSE244 An example.
Grammar: S → aBCd ; B → CB | ε | Sa ; C → b
STACK: $S    INPUT: abbd$    OUTPUT:

CH4.73 CSE244 Computing First(X) : All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal and X → Y1Y2…Yk is a production rule:
   place First(Y1) in First(X);
   if Y1 ⇒* ε, place First(Y2) in First(X);
   if Y2 ⇒* ε, place First(Y3) in First(X);
   …
   if Yk-1 ⇒* ε, place First(Yk) in First(X)
   NOTE: as soon as some Yi does not derive ε, stop.
Repeat the above steps until no more elements are added to any First() set.
Checking "Yj ⇒* ε ?" essentially amounts to checking whether ε belongs to First(Yj).
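The "repeat until no more elements are added" wording is a fixed-point computation, which can be sketched directly. A minimal illustration (not from the slides): a grammar is a dict from non-terminal to alternatives, [] denotes an ε right-hand side, and the string "eps" stands in for ε.

```python
EPS = "eps"

def compute_first(grammar):
    """Fixed-point computation of First() for every non-terminal.

    grammar: {NT: [RHS symbol lists]}; any symbol not in the dict is a
    terminal; [] denotes an epsilon production.
    """
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        return {symbol} if symbol not in grammar else first[symbol]

    changed = True
    while changed:                      # iterate until a fixed point
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                nullable = True
                for sym in rhs:
                    first[nt] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        nullable = False
                        break           # stop as soon as Yi cannot derive eps
                if nullable:            # all of Y1..Yk derive eps (or rhs = [])
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first
```

Running it on the expression grammar reproduces the sets worked out on slide CH4.78: First(E) = First(T) = First(F) = { (, id }, First(E') = { +, ε }, First(T') = { *, ε }.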

CH4.74 CSE244 Computing First(X) : All Grammar Symbols - continued
Informally, suppose we want to compute First(X1X2…Xn):
First(X1X2…Xn) = First(X1)
  "+" First(X2) if ε is in First(X1)
  "+" First(X3) if ε is in First(X2)
  "+" …
  "+" First(Xn) if ε is in First(Xn-1)
Note 1: only add ε to First(X1X2…Xn) if ε is in First(Xi) for all i.
Note 2: for First(X1), if X1 → Z1Z2…Zm, then we need to compute First(Z1Z2…Zm)!

CH4.75 CSE244 Example 1
Given the production rules:
S → i E t SS' | a
S' → eS | ε
E → b

CH4.76 CSE244 Example 1
Given the production rules:
S → i E t SS' | a
S' → eS | ε
E → b
Verify that:
First(S) = { i, a }
First(S') = { e, ε }
First(E) = { b }

CH4.77 CSE244 Example 2
Computing First for:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id

CH4.78 CSE244 Example 2
Computing First for: E → TE' ; E' → +TE' | ε ; T → FT' ; T' → *FT' | ε ; F → ( E ) | id
First(E) = First(TE') = First(T); First(E') is not included since T does not derive ε.
First(T) = First(FT') = First(F); First(T') is not included since F does not derive ε.
First(F) = First((E)) "+" First(id) = { (, id }
Overall: First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }

CH4.79 CSE244 Computing Follow(A) : All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals the end of input.
2. If there is a production A → αBβ, then everything in First(β) except ε is in Follow(B).
3. If A → αB is a production, or A → αBβ with β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B). (Whatever followed A must follow B, since nothing follows B in the production rule.)
We'll calculate Follow for two grammars.
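These three rules are also a fixed-point computation. A minimal self-contained sketch (not from the slides; it repeats a compact First computation so it stands alone, with "eps" standing in for ε and "$" for the end marker):

```python
EPS = "eps"

def compute_first(grammar):
    # Fixed-point First(); repeated here so this sketch is self-contained.
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                nullable = True
                for sym in rhs:
                    f = {sym} if sym not in grammar else first[sym]
                    first[nt] |= f - {EPS}
                    if EPS not in f:
                        nullable = False
                        break
                if nullable:
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first

def first_of_string(symbols, grammar, first):
    # First() of a sentential form X1 X2 ... Xm; {eps} if the form is empty.
    out = set()
    for sym in symbols:
        f = {sym} if sym not in grammar else first[sym]
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                          # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue                    # terminals have no Follow
                    tail = first_of_string(rhs[i+1:], grammar, first)
                    before = len(follow[B])
                    follow[B] |= tail - {EPS}       # rule 2
                    if EPS in tail:                 # rule 3 (covers A -> aB)
                        follow[B] |= follow[A]
                    changed |= len(follow[B]) != before
    return follow
```

On the expression grammar this reproduces the Follow column tabulated on slide CH4.84.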

CH4.80 CSE244 The Algorithm for Follow – pseudocode
1. Initialize Follow(X) to the empty set for all non-terminals X. Place $ in Follow(S), where S is the start NT.
2. Repeat the following step until no modifications are made to any Follow set:
   For any production X → X1X2…Xm:
     For j = 1 to m, if Xj is a non-terminal then:
       Follow(Xj) = Follow(Xj) ∪ (First(Xj+1…Xm) - {ε});
       if First(Xj+1…Xm) contains ε or Xj+1…Xm = ε,
       then Follow(Xj) = Follow(Xj) ∪ Follow(X);

CH4.81 CSE244 Computing Follow: 1st Example
S → i E t SS' | a
S' → eS | ε
E → b
Recall: First(S) = { i, a }, First(S') = { e, ε }, First(E) = { b }

CH4.82 CSE244 Computing Follow: 1st Example
S → i E t SS' | a
S' → eS | ε
E → b
Recall: First(S) = { i, a }, First(S') = { e, ε }, First(E) = { b }
Follow(S): contains $, since S is the start symbol. Since S → i E t S S', put in First(S') – but not ε – i.e., { e }. Since S' → eS ends in S, everything in Follow(S') is also in Follow(S).
Follow(S'): S' ends the body of S → i E t S S', so everything in Follow(S) is in Follow(S').
So: Follow(S) = { e, $ }; Follow(S') = Follow(S). HOW? The two containments above run in both directions, so the sets are equal. Follow(E) = { t }, since t follows E in S → i E t S S'.

CH4.83 CSE244 Example 2 Compute Follow for:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id

CH4.84 CSE244 Example 2 Compute Follow for:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id

Symbol   First     Follow
E        ( id      ) $
E'       + ε       ) $
T        ( id      + ) $
T'       * ε       + ) $
F        ( id      + * ) $

CH4.85 CSE244 Constructing Parsing Table
Algorithm: the table has one row per non-terminal and one column per terminal (including $)
1. Repeat steps 2 & 3 for each rule A → α
2. Terminal a in First(α)? Add A → α to M[A, a]
3.1 ε in First(α)? Add A → α to M[A, b] for all terminals b in Follow(A)
3.2 ε in First(α) and $ in Follow(A)? Add A → α to M[A, $]
4. All undefined entries are errors
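The four steps above can be sketched as follows (an assumed Python encoding, not the slides' code: ε-productions have empty bodies and 'eps' stands for ε). It is applied to the expression grammar, using the First/Follow sets computed earlier:

```python
EPS = 'eps'  # assumed encoding for ε

grammar = [('E', ['T', "E'"]), ("E'", ['+', 'T', "E'"]), ("E'", []),
           ('T', ['F', "T'"]), ("T'", ['*', 'F', "T'"]), ("T'", []),
           ('F', ['(', 'E', ')']), ('F', ['id'])]
first = {'E': {'(', 'id'}, "E'": {'+', EPS}, 'T': {'(', 'id'},
         "T'": {'*', EPS}, 'F': {'(', 'id'}}
follow = {'E': {')', '$'}, "E'": {')', '$'}, 'T': {'+', ')', '$'},
          "T'": {'+', ')', '$'}, 'F': {'+', '*', ')', '$'}}

def first_of(symbols):
    out = set()
    for X in symbols:
        f = first.get(X, {X})   # a terminal is its own First set
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

M = {}
for A, alpha in grammar:                     # step 1: each rule A → α
    fa = first_of(alpha)
    for a in fa - {EPS}:                     # step 2: terminals in First(α)
        M.setdefault((A, a), []).append((A, alpha))
    if EPS in fa:                            # steps 3.1 / 3.2: ε in First(α);
        for b in follow[A]:                  # Follow(A) already includes $ if applicable
            M.setdefault((A, b), []).append((A, alpha))

conflicts = {cell for cell, rules in M.items() if len(rules) > 1}
print(M[('E', 'id')])     # [('E', ['T', "E'"])]
print(sorted(conflicts))  # [] : no multiply-defined entries, the grammar is LL(1)
```

Cells absent from M are the "undefined entries are errors" of step 4.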

CH4.86 CSE244 Constructing Parsing Table – Example 1
S → i E t SS' | a
S' → eS | ε
E → b
First(S) = { i, a }, First(S') = { e, ε }, First(E) = { b }
Follow(S) = { e, $ }, Follow(S') = { e, $ }, Follow(E) = { t }

CH4.87 CSE244 Constructing Parsing Table – Example 1
S → i E t SS' | a, with First(i E t SS') = { i } and First(a) = { a }
S' → eS | ε, with First(eS) = { e }, First(ε) = { ε }, and Follow(S') = { e, $ }
E → b, with First(b) = { b }
Resulting table (rows: non-terminals; columns: input symbols a, b, e, i, t, $):
M[S, a] = S → a; M[S, i] = S → iEtSS'
M[S', e] = { S' → eS, S' → ε } (multiply defined!); M[S', $] = S' → ε
M[E, b] = E → b
All other entries are errors.

CH4.88 CSE244 Constructing Parsing Table – Example 2
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
First(E) = First(T) = First(F) = { (, id }, First(E') = { +, ε }, First(T') = { *, ε }
Follow(E) = Follow(E') = { ), $ }, Follow(F) = { *, +, ), $ }, Follow(T) = Follow(T') = { +, ), $ }

CH4.89 CSE244 Constructing Parsing Table – Example 2
E → TE', E' → + TE' | ε, T → FT', T' → * FT' | ε, F → ( E ) | id
First(E) = First(T) = First(F) = { (, id }, First(E') = { +, ε }, First(T') = { *, ε }
Follow(E) = Follow(E') = { ), $ }, Follow(F) = { *, +, ), $ }, Follow(T) = Follow(T') = { +, ), $ }
Expression example:
E → TE': First(TE') = First(T) = { (, id }, so M[E, (] : E → TE' and M[E, id] : E → TE' (by rule 2)
E' → +TE': First(+TE') = { + }, so M[E', +] : E' → +TE' (by rule 2)
E' → ε: ε in First(ε), so M[E', )] : E' → ε (3.1, due to Follow(E')) and M[E', $] : E' → ε (3.2)
T' → ε: ε in First(ε), so M[T', +] : T' → ε (3.1), M[T', )] : T' → ε (3.1), and M[T', $] : T' → ε (3.2)

CH4.90 CSE244 LL(1) Grammars
L: scan input from Left to right
L: construct a Leftmost derivation
1: use “1” input symbol as lookahead, in conjunction with the stack, to decide on the parsing action
LL(1) grammars: those with no multiply-defined entries in the parsing table.
Properties of LL(1) grammars: the grammar cannot be ambiguous or left recursive.
A grammar is LL(1) if and only if, whenever A → α | β:
1. α and β do not derive strings starting with the same terminal a
2. At most one of α and β can derive ε
3. If β derives ε, then α does not derive any string beginning with a terminal in Follow(A) (and symmetrically)
Note: it may not be possible to manipulate a grammar into an LL(1) grammar
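The pairwise conditions can be checked mechanically. A hypothetical sketch (assuming sets encoded with 'eps' for ε; the Follow-based check is the table-cell view of the same requirement, since an ε-alternative claims the Follow(A) columns):

```python
EPS = 'eps'  # assumed encoding for ε

def ll1_pair(first_a, first_b, follow_A):
    """Check the LL(1) conditions for a pair of alternatives A → α | β,
    given First(α), First(β), and Follow(A)."""
    # 1. α and β must not start with the same terminal
    if (first_a - {EPS}) & (first_b - {EPS}):
        return False
    # 2. at most one of α, β derives ε
    if EPS in first_a and EPS in first_b:
        return False
    # if one alternative derives ε, the other must not start with a
    # terminal in Follow(A), or a table cell would be claimed twice
    if EPS in first_b and (first_a & follow_A):
        return False
    if EPS in first_a and (first_b & follow_A):
        return False
    return True

# S' → eS | ε from the dangling-else grammar: e is in Follow(S')
print(ll1_pair({'e'}, {EPS}, {'e', '$'}))   # False: not LL(1)
# E' → +TE' | ε from the expression grammar: + is not in Follow(E')
print(ll1_pair({'+'}, {EPS}, {')', '$'}))   # True
```

The False case is exactly the multiply-defined entry M[S', e] seen in the Example 1 table.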

CH4.91 CSE244 Error Recovery
[Diagram: predictive parsing program, with input a + b $, stack $ Z Y X, parsing table M[A, a], and output]
When do errors occur? Recall the predictive parser's actions:
1. If X is a terminal and it doesn't match the input.
2. If M[X, input] is empty: no allowable actions.
Consider two recovery techniques:
A. Panic mode
B. Phrase-level recovery

CH4.92 CSE244 Panic-Mode Recovery
- Assume a non-terminal is on top of the stack.
- Idea: skip symbols in the input until a token in a selected set of synchronizing tokens is found.
- The choice of synchronizing set is important. Some ideas:
  - Define the synchronizing set of A to be Follow(A): skip input until a token in Follow(A) appears, then pop A from the stack and resume parsing.
  - Add the symbols of First(A) to the synchronizing set: skip input and, once a token in First(A) is found, resume parsing from A.
  - Productions that lead to ε, if available, might be used.
- If a terminal appears on top of the stack and does not match the input: pop it and continue parsing, issuing an error message saying that the terminal was inserted.

CH4.93 CSE244 Panic-Mode Recovery, II
General approach: modify the empty cells of the parsing table.
1. If M[A, a] = empty and a belongs to Follow(A), then set M[A, a] = “synch”.
Error-recovery strategy: if A = top-of-stack and a = current input,
1. If A is a non-terminal and M[A, a] = empty, then skip a in the input.
2. If A is a non-terminal and M[A, a] = synch, then pop A.
3. If A is a terminal and A ≠ a, then pop A (essentially inserting the expected terminal).
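A minimal sketch of this three-case strategy (assumed Python encodings, not the slides' code), driving the expression grammar's table extended with its synch entries. Running it on the erroneous input + id * + id reproduces the two recovery actions worked through on slide CH4.95:

```python
# M[A, a] -> production body; 'synch' marks the Follow-set cells;
# absent keys are the blank (skip) cells; [] is an ε-production body.
table = {
    ('E', 'id'): ['T', "E'"], ('E', '('): ['T', "E'"],
    ('E', ')'): 'synch', ('E', '$'): 'synch',
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"], ('T', '('): ['F', "T'"],
    ('T', '+'): 'synch', ('T', ')'): 'synch', ('T', '$'): 'synch',
    ("T'", '+'): [], ("T'", '*'): ['*', 'F', "T'"],
    ("T'", ')'): [], ("T'", '$'): [],
    ('F', 'id'): ['id'], ('F', '('): ['(', 'E', ')'],
    ('F', '+'): 'synch', ('F', '*'): 'synch',
    ('F', ')'): 'synch', ('F', '$'): 'synch',
}
nonterminals = {'E', "E'", 'T', "T'", 'F'}

def parse(tokens):
    """Table-driven parse with panic-mode recovery; returns the error log."""
    tokens = tokens + ['$']
    stack, i, errors = ['$', 'E'], 0, []
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X not in nonterminals:                       # terminal on top
            if X == a:
                stack.pop(); i += 1                     # match
            else:
                errors.append(f'inserted {X}')          # case 3: pop = insert
                stack.pop()
        elif (X, a) not in table:
            errors.append(f'skipped {a}'); i += 1       # case 1: blank cell
        elif table[(X, a)] == 'synch':
            errors.append(f'popped {X}'); stack.pop()   # case 2: synch cell
        else:
            stack.pop()
            stack.extend(reversed(table[(X, a)]))       # expand production
    return errors

print(parse(['+', 'id', '*', '+', 'id']))  # ['skipped +', 'popped F']
print(parse(['id', '+', 'id']))            # [] : no errors
```

The two log entries correspond to the "Misplaced +" and "Missing term" messages discussed later.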

CH4.94 CSE244 Revised Parsing Table / Example

Non-terminal   id       +          *          (        )        $
E              E→TE'                          E→TE'    synch    synch
E'                      E'→+TE'                        E'→ε     E'→ε
T              T→FT'    synch                 T→FT'    synch    synch
T'                      T'→ε      T'→*FT'              T'→ε     T'→ε
F              F→id     synch     synch       F→(E)    synch    synch

The "synch" entries come from the Follow sets: pop the non-terminal on top of the stack. Blank entries: skip the input symbol.

CH4.95 CSE244 Revised Parsing Table / Example (2)

Stack      Input          Remark
$E         + id * + id $  error, skip +; possible message: “Misplaced +, I am skipping it”
$E         id * + id $
$E'T       id * + id $
$E'T'F     id * + id $
$E'T'id    id * + id $    match id
$E'T'      * + id $
$E'T'F*    * + id $       match *
$E'T'F     + id $         error, M[F, +] = synch, so F has been popped; possible message: “Missing term”
$E'T'      + id $
$E'        + id $
$E'T+      + id $         match +
$E'T       id $
$E'T'F     id $
$E'T'id    id $           match id
$E'T'      $
$E'        $
$          $              accept

CH4.96 CSE244 Writing Error Messages
- Keep input counter(s).
- Recall: every non-terminal symbolizes an abstract language construct.
- Examples of error messages for our usual grammar:
  - E means "expression". Top-of-stack is E, input is +: “Error at location i: expressions cannot start with ‘+’”, or “Error at location i: invalid expression”. Similarly for E and *.
  - E' means "expression ending". Top-of-stack is E', input is * or id: “Error: expression starting at j is badly formed at location i”.
  - Requires: every time you pop an E, remember the location.

CH4.97 CSE244 Writing Error Messages, II
- Messages for synch errors:
  - Top-of-stack is F, input is +: “Error at location i: expected summation/multiplication term missing”
  - Top-of-stack is E, input is ): “Error at location i: expected expression missing”

CH4.98 CSE244 Writing Error Messages, III
- When the top of the stack is a terminal that does not match:
  - E.g., top-of-stack is id and the input is +: “Error at location i: identifier expected”
  - Top-of-stack is ) and the input is a terminal other than ):
    - Every time you match a ‘(’, push its location onto a “left parenthesis” stack (this can also be done with the symbol stack).
    - When the mismatch is discovered, look at the left-parenthesis stack to recover the location of the parenthesis.
    - “Error at location i: left parenthesis at location m has no closing right parenthesis”
    - E.g., consider ( id * + ( id id ) $

CH4.99 CSE244 Incorporating Error Messages into the Table
- Empty parsing-table entries can now be filled with the appropriate error-reporting techniques.

CH4.100 CSE244 Phrase-Level Recovery
Fill the blank entries of the parsing table with error-handling routines that not only report errors but may also change, insert, or delete symbols in the stack and/or input stream, and issue error messages.
Problems:
- Modifying the stack has to be done with care, so as not to create the possibility of derivations that aren't in the language.
- Infinite loops must be avoided.
Essentially extends panic mode to provide more complete error handling.

CH4.101 CSE244 How Would You Implement a TD Parser?
- Stack: easy to handle; write an ADT to manipulate its contents.
- Input stream: responsibility of the lexical analyzer.
- Key issue: how is the parsing table implemented?
One approach: assign unique IDs to the entries of the revised parsing table (slide CH4.94):
- all rules have unique IDs
- ditto for the synch actions
- also for the blanks, which handle errors

CH4.102 CSE244 Revised Parsing Table:
1. E → TE'
2. E' → +TE'
3. E' → ε
4. T → FT'
5. T' → *FT'
6. T' → ε
7. F → (E)
8. F → id
9 – 17: synch actions
18 – 25: error handlers

CH4.103 CSE244 Revised Parsing Table: (2)
Each ID (or set of IDs) corresponds to a procedure that:
- uses the stack ADT
- gets tokens
- prints error messages
- prints diagnostic messages
- handles errors

CH4.104 CSE244 How Is the Parser Constructed?
One large CASE statement:
  state = M[top(s), current_token]
  switch (state) {
    case 1: proc_E_TE'(); break;
    ...
    case 8: proc_F_id(); break;
    case 9: proc_sync_9(); break;
    ...
    case 17: proc_sync_17(); break;
    case 18:
    ...
    case 25: /* procs to handle errors */
  }
Some error handlers may be similar: combine them and dispatch in another switch. Some synch actions may be the same.
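The CASE statement above could equally be realized as a dispatch table. A minimal Python sketch; the proc_* handler bodies are hypothetical, since the slides name these procedures but do not define them:

```python
# Hypothetical handlers keyed by the rule IDs of slide CH4.102.
def proc_E_TEp(stack):      # case 1, rule E → TE': pop E, push T E' (reversed)
    stack.pop()
    stack.extend(["E'", 'T'])

def proc_F_id(stack):       # case 8, rule F → id: pop F, push id
    stack.pop()
    stack.append('id')

def proc_sync_9(stack):     # a synch action: pop the non-terminal and report
    stack.pop()

handlers = {1: proc_E_TEp, 8: proc_F_id, 9: proc_sync_9}

stack = ['$', 'E']
# state = M[top(s), current_token] would select the ID; here we pick 1 directly:
handlers[1](stack)
print(stack)                # ['$', "E'", 'T']
```

Grouping similar error handlers then amounts to mapping several IDs to the same function.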

CH4.105 CSE244 Final Comments – Top-Down Parsing
So far, we've examined grammars, language theory, and their relationship to parsing.
Key concepts:
- Rewriting a grammar into an acceptable form
- Examined top-down parsing:
  - Brute force: transition diagrams & recursion
  - Elegant: table-driven
- We've identified its shortcomings: not all grammars can be made LL(1)!
Bottom-up parsing: future topic.