Chapter 4: Syntax Analysis

1 Chapter 4: Syntax Analysis
Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT (860) Dr. Robert LaBarre United Technologies Research Center 411 Silver Lane E. Hartford, CT 06018

2 Syntax Analysis - Parsing
An overview of parsing : Functions & Responsibilities Context Free Grammars Concepts & Terminology Writing and Designing Grammars Resolving Grammar Problems / Difficulties Top-Down Parsing Recursive Descent & Predictive LL Bottom-Up Parsing LR & LALR Concluding Remarks/Looking Ahead

3 How do grammars relate to parsing process ?
An Overview of Parsing
Why are grammars that formally describe languages important?
- Precise, easy-to-understand representations
- Compiler-writing tools can take a grammar and generate a compiler
- They allow a language to be evolved (new statements, changes to statements, etc.)
Languages are not static; they are constantly upgraded to add new features or fix "old" ones
- ADA → ADA9x; C++ adds templates, exceptions, …
How do grammars relate to the parsing process?

4 Parsing During Compilation
[Figure: parsing during compilation – the lexical analyzer (driven by regular expressions) turns the source program into tokens; the parser calls "get next token", builds a parse tree, and passes an intermediate representation to the rest of the front end; all phases share the symbol table and report errors.]
The parser:
- uses a grammar to check the structure of tokens
- produces a parse tree
- handles syntactic errors and recovery: recognize correct syntax, report errors
Parts of this phase are also technically part of parsing: augmenting info on tokens in the source, type checking, semantic analysis.

5 Parsing Responsibilities
Syntax Error Identification / Handling Recall typical error types: Lexical : Misspellings Syntactic : Omission, wrong order of tokens Semantic : Incompatible types Logical : Infinite loop / recursive call Majority of error processing occurs during syntax analysis NOTE: Not all errors are identifiable !! Which ones?

6 Key Issues – Error Processing
Detecting errors Finding position at which they occur Clear / accurate presentation Recover (pass over) to continue and find later errors Don’t impact compilation of “correct” programs

7 What are some Typical Errors ?
program prmax(input, output);
var x, y: integer;
function max(I:integer; j:integer) : integer;
{ return maximum of integers i and j }
begin
  if i > j then max := i
  else max := j
end;
readln (x,y);
writeln (max(x,y))
end.
Which are "easy" to recover from? Which are "hard"?

8 Error Recovery Strategies
Panic Mode – Discard tokens until a "synchronizing" token is found ( end, ";", "}", etc. )
-- Decision of the designer
-- Problems: skipping input may miss a declaration (causing more errors) and may miss errors in the skipped material
-- Advantages: simple; suited to one error per statement
Phrase Level – Local correction on input
-- e.g., "," vs. ";" – delete a "," – insert a ";"
-- Also the designer's decision
-- Not suited to all situations
-- Used in conjunction with panic mode to allow less input to be skipped

9 Error Recovery Strategies – (2)
Error Productions:
-- Augment the grammar with rules that recognize common errors
-- The augmented grammar is used for parser construction / generation
-- e.g., add a rule for := in C assignment statements: report the error but continue the compile
-- Self correction + diagnostic messages
-- What other rules could be added? (supported by yacc)
Global Correction:
-- Adding / deleting / replacing symbols is chancy – may make many changes!
-- Algorithms are available to minimize changes, but they are costly – a key issue
What do you think of each approach? What approach do compilers you've used take?

10 Motivating Grammars
Regular Expressions
- Basis of lexical analysis
- Represent regular languages
Context Free Grammars
- Basis of parsing
- Represent language constructs
- Characterize context free languages
[Figure: Reg. Lang. ⊂ CFLs]
EXAMPLE: aⁿbⁿ, n ≥ 1 : Is it regular?

11 Context Free Grammars : Concepts & Terminology
Definition: A Context Free Grammar, CFG, is described by (T, NT, S, PR), where:
T: Terminals / tokens of the language
NT: Non-terminals, denoting sets of strings generatable by the grammar & in the language
S: Start symbol, S ∈ NT, which defines all strings of the language
PR: Production rules, indicating how T and NT are combined to generate valid strings of the language. PR: NT → (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model!

12 Context Free Grammars : A First Look
assign_stmt → id := expr ;
expr → term operator term
term → id
term → real
term → integer
operator → +
operator → -
What do "blue" symbols represent?
Derivation: A sequence of grammar rule applications and substitutions that transforms a starting non-terminal into a collection of terminals / tokens.
Simply stated: grammars / production rules allow us to "rewrite" and "identify" correct syntax.

13 How is Grammar Used ? Given the rules on the previous slide, suppose
id := real + int; is the input. Is it syntactically correct? How do we know?
expr is represented as: expr → term operator term
Is this accurate / complete?
expr → expr operator term
expr → term
How does this affect the derivations which are possible?

14 Derivation and Parse Tree
Let’s derive: id := id + real – integer ; What’s its parse tree ?

15 Example Grammar
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Black : NT   Blue : T   expr : S   9 production rules
To simplify / standardize notation, we offer a synopsis of terminology.

16 Example Grammar - Terminology
Terminals: a, b, c, +, -, punc, 0, 1, …, 9; blue strings
Non-terminals: A, B, C, S; black strings
T or NT: X, Y, Z
Strings of terminals: u, v, …, z in T*
Strings of T / NT: α, β, γ in (T ∪ NT)*
Alternatives of production rules: A → α1; A → α2; …; A → αk  ≡  A → α1 | α2 | … | αk
The first NT on the LHS of the 1st production rule is designated as the start symbol!
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑

17 Grammar Concepts
A step in a derivation is zero or one action that replaces an NT with the RHS of a production rule.
EXAMPLE: E ⇒ -E (the ⇒ means "derives" in one step), using the production rule E → -E
EXAMPLE: E ⇒ E A E ⇒ E * E ⇒ E * ( E )
DEFINITION:
⇒ : derives in one step
⇒+ : derives in one or more steps
⇒* : derives in zero or more steps
EXAMPLES:
αAβ ⇒ αγβ if A → γ is a production rule
α1 ⇒ α2 ⇒ … ⇒ αn implies α1 ⇒+ αn
α ⇒* α for all α
If α ⇒* β and β ⇒* γ then α ⇒* γ

18 How does this relate to Languages?
Let G be a CFG with start symbol S. Then S ⇒+ W (where W has no non-terminals) represents the language generated by G, denoted L(G). So W ∈ L(G) iff S ⇒+ W.
W is called a sentence of G.
When S ⇒* α (and α may contain NTs), α is called a sentential form of G.
EXAMPLE: id * id is a sentence.
Here's the derivation: E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id, so E ⇒* id * id.
The intermediate strings are sentential forms.

19 Other Derivation Concepts
Leftmost: Replace the leftmost non-terminal symbol
E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id
Rightmost: Replace the rightmost non-terminal symbol
E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id
Important Notes:
If S ⇒*lm αAβ, what's true about α?
If S ⇒*rm αAβ, what's true about β?
Derivations: the actions taken to parse input can be represented pictorially in a parse tree.

20 Examples of LM / RM Derivations
E  E A E | ( E ) | -E | id A  + | - | * | / |  A leftmost derivation of : id + id * id A rightmost derivation of : id + id * id

21 Derivations & Parse Tree
E  E A E E A *  E * E E A id *  id * E E A id *  id * id

22 Parse Trees and Derivations
Consider the expression grammar:
E → E+E | E*E | (E) | -E | id
A leftmost derivation of id + id * id :
E ⇒ E + E ⇒ id + E ⇒ id + E * E
[Figure: the parse tree built alongside each step.]

23 Parse Tree & Derivations - continued
Continuing the leftmost derivation:
id + E * E ⇒ id + id * E ⇒ id + id * id
[Figure: the completed parse tree, with the * below the +.]

24 Alternative Parse Tree & Derivation
E  E * E  E + E * E  id + E * E  id + id * E  id + id * id E * + id WHAT’S THE ISSUE HERE ?

25 Resolving Grammar Problems/Difficulties
Regular Expressions : Basis of Lexical Analysis
- Reg. expr. generate / represent regular languages
- Reg. languages: the smallest, most well defined class of languages
Context Free Grammars : Basis of Parsing
- CFGs represent context free languages
- CFLs contain more powerful languages
[Figure: Reg. Lang. ⊂ CFLs]
aⁿbⁿ – a CFL that's not regular! Try to write a DFA/NFA for it!

26 Resolving Problems/Difficulties – (2)
Since Reg. Lang. ⊂ Context Free Lang., it is possible to go from a reg. expr. to a CFG via an NFA.
Recall: (a | b)*abb
[Figure: NFA with states 0–3; state 0 loops on a and b, with 0 →a→ 1 →b→ 2 →b→ 3, where 3 is accepting.]

27 Resolving Problems/Difficulties – (3)
Construct the CFG as follows:
- Each state i has a non-terminal Ai : A0, A1, A2, A3
- If state i goes to state j on input a, then Ai → aAj
- If state i goes to state j on ε, then Ai → Aj
- If i is an accepting state, Ai → ε : A3 → ε
- If i is the starting state, Ai is the start symbol : A0
From the transitions: A0 → aA0, A0 → aA1, A0 → bA0, A1 → bA2, A2 → bA3
T = {a, b}, NT = {A0, A1, A2, A3}, S = A0
PR = { A0 → aA0 | aA1 | bA0 ; A1 → bA2 ; A2 → bA3 ; A3 → ε }
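The state-to-non-terminal construction above can be sketched in a few lines. This is a hypothetical helper (the names `nfa_to_cfg`, `nfa`, `accepting` are mine, not the slides'), assuming the NFA is given as a transition map; ε is written as the empty string.

```python
# Build a right-linear CFG from an NFA, following the slide's rules.
# NFA for (a|b)*abb with states 0..3; (state, symbol) -> set of next states.
nfa = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
accepting = {3}
start = 0

def nfa_to_cfg(nfa, accepting, start):
    prods = {}  # non-terminal "Ai" -> list of right-hand sides
    for (i, sym), targets in nfa.items():
        for j in targets:
            # state i --sym--> state j  becomes  Ai -> sym Aj
            prods.setdefault(f"A{i}", []).append(f"{sym}A{j}")
    for i in accepting:
        prods.setdefault(f"A{i}", []).append("")  # accepting: Ai -> epsilon
    return f"A{start}", prods

S, prods = nfa_to_cfg(nfa, accepting, start)
```

Running this reproduces exactly the PR set on the slide, with A0 as the start symbol.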

28 How Does This CFG Derive Strings ?
[Figure: the NFA for (a | b)*abb] vs. the grammar:
A0 → aA0, A0 → aA1
A0 → bA0, A1 → bA2
A2 → bA3, A3 → ε
How is abaabb derived in each?

29 Regular Expressions vs. CFGs
Regular expressions for lexical syntax:
- CFGs are overkill; lexical rules are quite simple and straightforward
- REs are concise / easy to understand
- A more efficient lexical analyzer can be constructed
- REs for lexical analysis and CFGs for parsing promote modularity, low coupling & high cohesion. Why?
CFGs for parsing:
- Match tokens: "(" ")", begin / end, if-then-else, whiles, proc / func calls, …
- Intended for structural associations between tokens!
- Are tokens in correct order?

30 Resolving Grammar Difficulties : Motivation
Humans write / develop grammars
Different parsing approaches have different needs: Top-Down vs. Bottom-Up
Grammar problems: ambiguity, ε-moves, cycles, left recursion, left factoring
For some problems we remove the "errors"; for others we rewrite / redesign the grammar.

31 Resolving Problems: Ambiguous Grammars
Consider the following grammar segment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other   (any other statement)
What's the problem here? Let's consider a simple parse tree:
[Figure: a parse tree for if expr then stmt else stmt with subtrees E1, S1, S2.]
An else must match the previous then; the structure indicates the parse subtree for the expression.

32 Example : What Happens with this string?
if E1 then if E2 then S1 else S2
How is this parsed?
if E1 then ( if E2 then S1 else S2 )   vs.   if E1 then ( if E2 then S1 ) else S2
What's the issue here?

33 Parse Trees for Example
Form 1: [Parse tree in which the else S2 attaches to the inner if E2, i.e., if E1 then (if E2 then S1 else S2)]
Form 2: [Parse tree in which the else S2 attaches to the outer if E1, i.e., if E1 then (if E2 then S1) else S2]
What's the issue here?

34 Removing Ambiguity Take Original Grammar: Revise to remove ambiguity:
stmt  if expr then stmt | if expr then stmt else stmt | other (any other statement) Revise to remove ambiguity: stmt  matched_stmt | unmatched_stmt matched_stmt  if expr then matched_stmt else matched_stmt | other unmatched_stmt  if expr then stmt | if expr then matched_stmt else unmatched_stmt How does this grammar work ?

35 Resolving Difficulties : Left Recursion
A left recursive grammar has rules that support the derivation A ⇒+ Aα, for some α.
Top-down parsing can't reconcile this type of grammar, since it could consistently make a choice that wouldn't allow termination:
A ⇒ Aα ⇒ Aαα ⇒ Aααα … etc.
Take the left recursive grammar A → Aα | β and transform it to:
A → βA'
A' → αA' | ε

36 Why is Left Recursion a Problem ?
Consider:
E → E + T | T
T → T * F | F
F → ( E ) | id
Derive id + id + id :
E ⇒ E + T ⇒ E + T + T ⇒ T + T + T
What does E → E + T | T generate? How does it build strings? What does each string have to start with?
How can left recursion be removed?

37 Resolving Difficulties : Left Recursion (2)
Informal Discussion: Take all productions for A and order them as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A. Now apply the concept of the previous slide:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
For our example:
E → E + T | T   becomes   E → TE' ;  E' → +TE' | ε
T → T * F | F   becomes   T → FT' ;  T' → *FT' | ε
F → ( E ) | id  (unchanged)

38 Resolving Difficulties : Left Recursion (3)
Problem: If the left recursion is two or more levels deep, this isn't enough.
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa ⇒ Sda
Algorithm:
Input: Grammar G with no cycles or ε-productions
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1, A2, …, An
2. for i := 1 to n do begin
     for j := 1 to i – 1 do begin
       replace each production of the form Ai → Ajγ
       by the productions Ai → δ1γ | δ2γ | … | δkγ,
       where Aj → δ1 | δ2 | … | δk are all the current Aj productions;
     end
     eliminate the immediate left recursion among the Ai productions
   end
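The innermost step of the algorithm – eliminating *immediate* left recursion via A → Aα | β becomes A → βA' ; A' → αA' | ε – can be sketched as follows. A minimal illustration assuming single-character grammar symbols; the function name is mine, not the slides', and the general algorithm above also needs the substitution step for indirect recursion.

```python
def remove_immediate_left_recursion(nt, rhss):
    """Split nt's productions into left-recursive (A -> A alpha) and the
    rest (A -> beta), then rebuild as A -> beta A' ; A' -> alpha A' | eps.
    Symbols are single characters here; "" denotes epsilon."""
    alphas = [r[1:] for r in rhss if r.startswith(nt)]   # the alpha parts
    betas = [r for r in rhss if not r.startswith(nt)]    # the beta parts
    if not alphas:
        return {nt: rhss}          # no immediate left recursion to remove
    new = nt + "'"
    return {
        nt: [b + new for b in betas],          # A  -> beta A'
        new: [a + new for a in alphas] + [""], # A' -> alpha A' | eps
    }

# E -> E+T | T  becomes  E -> TE' ;  E' -> +TE' | eps
result = remove_immediate_left_recursion("E", ["E+T", "T"])
```

Applied to the slide's expression grammar, this yields exactly the E → TE', E' → +TE' | ε pair shown later.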

39 Using the Algorithm Apply the algorithm to: A1  A2a | b
A2  A2c | A1d |  i = 1 For A1 there is no left recursion i = 2 for j=1 to 1 do Take productions: A2  A1 and replace with A2  1  | 2  | … | k | where A1 1 | 2 | … | k are A1 productions in our case A2  A1d becomes A2  A2ad | bd What’s left: A1 A2a | b A2  A2 c | A2 ad | bd |  Are we done ?

40 Using the Algorithm (2) No ! We must still remove A2 left recursion !
A1 A2a | b A2  A2 c | A2 ad | bd |  Recall: A  A1 | A2 | … | Am | 1 | 2 | … | n A  1A’ | 2A’ | … | nA’ A’  1A’ | 2A’ | … | m A’ |  Apply to above case. What do you get ?

41 Removing Difficulties : -Moves
Very simple: A → ε and B → uAv implies we add B → uv to the grammar G. Why does this work?
Examples:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → ( E ) | id
A1 → A2a | b
A2 → bdA2' | A2'
A2' → cA2' | adA2' | ε

42 Removing Difficulties : Cycles
How would cycles be removed?
S → SS | ( S ) | ε
Has: S ⇒ SS ⇒ S  (since S ⇒ ε)

43 Removing Difficulties : Left Factoring
Problem: uncertain which of 2 rules to choose:
stmt → if expr then stmt else stmt
     | if expr then stmt
When do you know which one is valid? What's the general form of stmt?
A → αβ1 | αβ2
α : if expr then stmt
β1 : else stmt
β2 : ε
Transform to:
A → αA'
A' → β1 | β2
EXAMPLE:
stmt → if expr then stmt rest
rest → else stmt | ε
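One left-factoring step – pulling the common prefix α out of A → αβ1 | αβ2 – can be sketched like this. A hedged illustration with single-character symbols (names are mine, not the slides'), using Python's `os.path.commonprefix` to find α.

```python
import os

def left_factor(nt, rhss):
    """One step of left factoring: pull the longest common prefix alpha out
    of all alternatives, giving A -> alpha A' ; A' -> beta1 | beta2 | ...
    Single-character symbols; "" denotes epsilon. A sketch, not the full
    repeated algorithm."""
    alpha = os.path.commonprefix(rhss)
    if not alpha:
        return {nt: rhss}          # nothing to factor
    new = nt + "'"
    return {
        nt: [alpha + new],                     # A  -> alpha A'
        new: [r[len(alpha):] for r in rhss],   # A' -> beta1 | beta2 | ...
    }

# Encoding the dangling-else rules with one char per symbol:
# S -> iEtS | iEtSeS  becomes  S -> iEtS S' ;  S' -> eps | eS
result = left_factor("S", ["iEtS", "iEtSeS"])
```

This mirrors the stmt / rest transformation on the slide: the shared "if expr then stmt" prefix is factored out, and the new non-terminal chooses between "else stmt" and ε.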

44 Resolving Grammar Problems : Another Look
Forcing matching within the grammar – if-then-else: an else must go with the most recent if
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Leads to:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

45 Resolving Grammar Problems : Another Look (2)
Left Factoring – know the correct choice in advance!
where_queries → where are declarations of symbol
              | where does "search_option" appear
(rules given in an assignment)
becomes
where_queries → where rest_where
rest_where → are declarations of symbol
           | does "search_option" appear

46 Resolving Grammar Problems : Another Look (3)
Left Recursion – avoid an infinite loop when choosing rules
A → Aα | β   becomes   A → βA' ;  A' → αA' | ε
EXAMPLE:
E → E + T | T   becomes   E → TE' ;  E' → +TE' | ε
T → T * F | F   becomes   T → FT' ;  T' → *FT' | ε
F → ( E ) | id
If we use the original production rules on input id + id (with leftmost derivation):
E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒ … – termination is not assured
If we use the transformed production rules on input id + id (with leftmost derivation):
E ⇒ TE' ⇒* id E' ⇒ id + TE' ⇒* id + id E' ⇒ id + id

47 Resolving Grammar Problems : Another Look (4)
Removing -Moves E  TE’ E’  + TE’ |  E  T E  TE’ E’  + TE’ E’  + T Is there still a problem ?

48 Resolving Grammar Problems
Note: Not all aspects of a programming language can be represented by context free grammars / languages. Examples:
1. Declaring an ID before its use
2. Valid typing within expressions
3. Parameters in a definition vs. in a call
These features are called context-sensitive and define yet another language class, CSL.
[Figure: Reg. Lang. ⊂ CFLs ⊂ CSLs]

49 Context-Sensitive Languages - Examples
L1 = { wcw | w is in (a | b)* } : declare before use
L2 = { aⁿbᵐcⁿdᵐ | n ≥ 1, m ≥ 1 }
aⁿbᵐ : formal parameters ;  cⁿdᵐ : actual parameters
What does it mean when L is context-sensitive?

50 How do you show a Language is a CFL?
L3 = { w c wᴿ | w is in (a | b)* }
L4 = { aⁿbᵐcᵐdⁿ | n ≥ 1, m ≥ 1 }
L5 = { aⁿbⁿcᵐdᵐ | n ≥ 1, m ≥ 1 }
L6 = { aⁿbⁿ | n ≥ 1 }

51 Solutions
L3 = { w c wᴿ | w is in (a | b)* } :  S → a S a | b S b | c
L4 = { aⁿbᵐcᵐdⁿ | n ≥ 1, m ≥ 1 } :  S → a S d | a A d ;  A → b A c | bc
L5 = { aⁿbⁿcᵐdᵐ | n ≥ 1, m ≥ 1 } :  S → XY ;  X → a X b | ab ;  Y → c Y d | cd
L6 = { aⁿbⁿ | n ≥ 1 } :  S → a S b | ab
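As a quick sanity check that S → aSb | ab generates exactly aⁿbⁿ, one can mechanically apply S → aSb (n − 1) times and finish with S → ab. A small sketch (the helper name `gen` is mine):

```python
def gen(n):
    """Derive the string a^n b^n from S -> aSb | ab, for n >= 1."""
    s = "ab"                # final step: S -> ab
    for _ in range(n - 1):  # n-1 applications of S -> aSb
        s = "a" + s + "b"
    return s
```

Each application of S → aSb adds one a on the left and one b on the right, which is precisely the pairing a DFA/NFA cannot count.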

52 Top-Down Parsing Identify a leftmost derivation for an input string
Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion that is consistent with scanning the input.
A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c – accept!)
Topics:
- Recursive-descent parsing concepts
- Predictive parsing
  - Recursive / brute force technique
  - Non-recursive / table driven
- Error recovery
- Implementation

53 Recursive Descent Parsing Concepts
General category of top-down parsing
Choose a production rule based on the input symbol
May require backtracking to correct a wrong choice
Example: S → c A d ;  A → ab | a ;  input: cad
[Figure: the parser builds S with children c, A, d; it first tries A → ab, which fails to match the input d, so it backtracks and tries A → a, which succeeds.]
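The backtracking behaviour sketched above can be made concrete. This is a minimal illustration (not the slides' code) of S → cAd ; A → ab | a, where each non-terminal's procedure yields every input position it can reach, so a failed match of d triggers a retry of A's other alternative:

```python
def parse(inp):
    """Backtracking recursive descent for S -> cAd ; A -> ab | a.
    Returns True iff the whole input is derived from S."""
    def A(pos):
        # Try A -> ab first; if the caller fails later, it falls
        # back to the next alternative A -> a (the backtrack).
        if inp[pos:pos + 2] == "ab":
            yield pos + 2
        if inp[pos:pos + 1] == "a":
            yield pos + 1

    def S(pos):
        if inp[pos:pos + 1] == "c":          # match the leading c
            for p in A(pos + 1):             # each way A can consume input
                if inp[p:p + 1] == "d":      # then match the trailing d
                    yield p + 1

    return any(p == len(inp) for p in S(0))
```

On input cad, A first consumes nothing via "ab" (no match), then consumes "a"; the d then matches, exactly the retry shown in the slide's figure.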

54 Predictive Parsing : Recursive
To eliminate backtracking, what must we do / be sure of for the grammar?
- no left recursion
- apply left factoring
- remove ε-moves
When the grammar satisfies the above conditions, we can use the current input symbol, in conjunction with the non-terminal to be expanded, to uniquely determine the next action.
Utilize transition diagrams. For each non-terminal of the grammar, do the following:
1. Create an initial and final state
2. If A → X1X2…Xn is a production, add a path with edges X1, X2, …, Xn
Once the transition diagrams have been developed, apply a straightforward technique to turn the diagrams into procedures, with possible recursion.

55 Transition Diagrams
Unlike their lexical equivalents, each edge represents a token. A transition implies: if token, choose edge, call the corresponding procedure.
Recall our earlier grammar and its associated transition diagrams:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → ( E ) | id
[Figures: one diagram per non-terminal – E: 1 –T→ –E'→ 2 ; E': 3 –+→ 4 –T→ 5 –E'→ 6, plus an ε edge to the final state; T: 7 –F→ –T'→ 9 ; T': 10 –*→ 11 –F→ 12 –T'→ 13, plus an ε edge; F: 14 –(→ 15 –E→ 16 –)→ 17, or an id edge.]
How are transition diagrams used? Are ε-moves a problem? Can we simplify transition diagrams? Why is simplification critical?

56 How are Transition Diagrams Used ?
main()  { TD_E(); }

TD_E()  { TD_T(); TD_E’(); }

TD_E’() { token = get_token();
          if token = ‘+’ then { TD_T(); TD_E’(); } }

TD_T()  { TD_F(); TD_T’(); }

TD_T’() { token = get_token();
          if token = ‘*’ then { TD_F(); TD_T’(); } }

TD_F()  { token = get_token();
          if token = ‘(’ then { TD_E(); match(‘)’); }
          else if token.value <> id then { error + EXIT }
          ... }

What happened to ε-moves?
NOTE: not all error conditions have been represented.
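The TD_* pseudocode above maps to one procedure per non-terminal with no backtracking, once the grammar is left-factored and free of left recursion. A runnable sketch for the same expression grammar (names and tokenisation are my own assumptions; tokens are pre-split strings, with "id" standing for any identifier token):

```python
def parse(tokens):
    """Predictive recursive descent for E -> TE' ; E' -> +TE' | eps ;
    T -> FT' ; T' -> *FT' | eps ; F -> (E) | id.
    Raises SyntaxError on bad input, returns True on success."""
    toks = tokens + ["$"]        # $ terminates the input
    pos = 0

    def peek():
        return toks[pos]

    def match(t):
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f"expected {t}, saw {toks[pos]}")
        pos += 1

    def E():                     # E -> TE'
        T(); E_prime()

    def E_prime():               # E' -> +TE' | eps
        if peek() == "+":
            match("+"); T(); E_prime()
        # otherwise take E' -> eps: consume nothing

    def T():                     # T -> FT'
        F(); T_prime()

    def T_prime():               # T' -> *FT' | eps
        if peek() == "*":
            match("*"); F(); T_prime()

    def F():                     # F -> (E) | id
        if peek() == "(":
            match("("); E(); match(")")
        else:
            match("id")

    E()
    match("$")                   # the whole input must be consumed
    return True
```

The ε alternatives simply consume nothing, answering the slide's "what happened to ε-moves?" question: each procedure falls through when its lookahead doesn't select the non-ε alternative.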

57 How can Transition Diagrams be Simplified ?
[Figure: the transition diagram for E' – 3 –+→ 4 –T→ 5 –E'→ 6, plus an ε edge from 3 to the final state 6.]

58 How can Transition Diagrams be Simplified ? (2)
[Figure: simplification step – the trailing E' edge (5 → 6) is replaced by an ε jump back to E''s own start state 3.]

59 How can Transition Diagrams be Simplified ? (3)
[Figure: simplification step – the redundant state is collapsed, leaving a "+ T" loop on state 3 with an ε exit to final state 6.]

60 How can Transition Diagrams be Simplified ? (4)
[Figure: the diagram for E (1 –T→ –E'→ 2) shown alongside the simplified E'.]

61 How can Transition Diagrams be Simplified ? (5)
[Figure: substituting the simplified E' diagram into E's diagram.]

62 How can Transition Diagrams be Simplified ? (6)
[Figure: the final simplified diagram for E – a T edge followed by a "+ T" loop into the final state.] How?

63 Additional Transition Diagram Simplifications
Similar steps apply for T and T'.
Simplified transition diagrams:
[Figure: T – an F edge followed by a "* F" loop into the final state; F – unchanged: 14 –(→ 15 –E→ 16 –)→ 17, or an id edge.]
Why is simplification important? How does the code change?

64 Motivating Table-Driven Parsing
1. Left-to-right scan of the input
2. Find a leftmost derivation
Grammar:
E → TE'
E' → +TE' | ε
T → id
Input: id + id $   ($ is the terminator)
Derivation: E ⇒ …
Processing stack:

65 Non-Recursive / Table Driven
[Figure: the predictive parsing program reads the input (string + terminator $), consults parsing table M[A,a] – which says what actions the parser should take based on stack / input – and manipulates a stack of NT + T symbols of the CFG, with $ as the empty-stack symbol, emitting output.]
General parser behavior, with X the top of the stack and a the current input symbol:
1. When X = a = $ : halt, accept, success
2. When X = a ≠ $ : POP X off the stack, advance the input, go to 1
3. When X is a non-terminal, examine M[X, a]:
   if it is an error → call a recovery routine
   if M[X, a] = {X → UVW} : POP X, PUSH W, V, U – do NOT expend any input

66 Algorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then pop X from the stack and advance ip
    else error()
  else /* X is a non-terminal */
    if M[X, a] = X → Y1Y2…Yk then begin
      pop X from the stack;
      push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
      output the production X → Y1Y2…Yk
      /* may also execute other code based on the production used */
    end
    else error()
until X = $ /* stack is empty */
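The loop above can be sketched directly in code. A minimal illustration (the names `M`, `NONTERMS`, and `parse` are mine), with the table for the expression grammar filled in as data and "" entries representing ε productions:

```python
# Parsing table M for E -> TE' ; E' -> +TE' | eps ; T -> FT' ;
# T' -> *FT' | eps ; F -> (E) | id. Missing (A, a) pairs are errors.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Table-driven predictive parse; returns the productions used."""
    stack = ["$", "E"]           # $ below the start symbol
    inp = tokens + ["$"]
    out, i = [], 0
    while stack:
        X, a = stack.pop(), inp[i]
        if X not in NONTERMS:    # terminal or $
            if X != a:
                raise SyntaxError(f"expected {X}, saw {a}")
            i += 1               # match: expend one input symbol
        else:
            if (X, a) not in M:
                raise SyntaxError(f"no rule in M[{X},{a}]")
            rhs = M[(X, a)]
            stack.extend(reversed(rhs))  # push RHS, leftmost symbol on top
            out.append((X, rhs))         # output the production used
    return out

trace = parse(["id", "+", "id", "*", "id"])
```

The returned list of productions is exactly the leftmost derivation traced on the following slides.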

67 Example
Our well-worn example! Table M:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → ( E ) | id

Non-      INPUT SYMBOL
terminal  id       +         *         (        )       $
E         E→TE'                        E→TE'
E'                 E'→+TE'                      E'→ε    E'→ε
T         T→FT'                        T→FT'
T'                 T'→ε      T'→*FT'            T'→ε    T'→ε
F         F→id                         F→(E)

68 Trace of Example
STACK      INPUT            OUTPUT
$E         id + id * id$
$E'T       id + id * id$    E → TE'
$E'T'F     id + id * id$    T → FT'
$E'T'id    id + id * id$    F → id
$E'T'      + id * id$       (expend input)
$E'        + id * id$       T' → ε
$E'T+      + id * id$       E' → +TE'
$E'T       id * id$         (expend input)
$E'T'F     id * id$         T → FT'
$E'T'id    id * id$         F → id
$E'T'      * id$            (expend input)
$E'T'F*    * id$            T' → *FT'
$E'T'F     id$              (expend input)
$E'T'id    id$              F → id
$E'T'      $                (expend input)
$E'        $                T' → ε
$          $                E' → ε  – accept

69 Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E' ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E' ⇒ id + id * id E' ⇒ id + id * id

70 What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M!
1st: Calculate First & Follow for the grammar
2nd: Apply the construction algorithm for the parsing table (we'll see this shortly)
Conceptual perspective:
First: Let α be a string of grammar symbols. First(α) is the set of terminals that can appear first in α in any possible derivation. NOTE: If α ⇒* ε, then ε is in First(α).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β). NOTE: If S ⇒* αA, then $ is in Follow(A).

71 Computing First(X) : All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal and X → Y1Y2…Yk is a production rule:
   place First(Y1) in First(X);
   if Y1 ⇒* ε, place First(Y2) in First(X);
   if Y1Y2 ⇒* ε, place First(Y3) in First(X);
   …
   if Y1…Yk-1 ⇒* ε, place First(Yk) in First(X).
   NOTE: as soon as some Yi does not derive ε, stop.
Repeat 1, 2, and 3 above until no First set changes.

72 Computing First(X) : All Grammar Symbols - continued
Informally, suppose we want to compute First(X1 X2 … Xn):
First(X1 X2 … Xn) = First(X1)
  "+" First(X2) if ε is in First(X1)
  "+" First(X3) if ε is in First(X2)
  …
  "+" First(Xn) if ε is in First(Xn-1)
Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i.
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!
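The First rules above can be run to a fixed point. A sketch (structure and names are my own) for the expression grammar, with ε written as the empty string "":

```python
# Grammar maps each non-terminal to a list of RHSs (lists of symbols);
# an empty RHS is an epsilon production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # Rule 1: a terminal's First set is itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:               # iterate rules 2 and 3 to a fixed point
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[nt])
                all_eps = True
                for sym in rhs:  # rule 3: scan Y1 Y2 ... while eps persists
                    first[nt] |= first_of(sym) - {""}
                    if "" not in first_of(sym):
                        all_eps = False
                        break
                if all_eps:      # rule 2, and the all-Yi-derive-eps case
                    first[nt].add("")
                changed |= len(first[nt]) != before
    return first

FIRST = first_sets(GRAMMAR)
```

For this grammar the fixed point reproduces the sets computed on the next slide: First(E) = First(T) = First(F) = { (, id }, First(E') = { +, ε }, First(T') = { *, ε }.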

73 Example
Computing First for:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → ( E ) | id
First(E) = First(TE') = First(T)  (First(E') is not added, since T does not derive ε)
First(T) = First(FT') = First(F)  (First(T') is not added, since F does not derive ε)
First(F) = First((E)) "+" First(id) = { ( , id }
Overall:
First(E) = First(T) = First(F) = { ( , id }
First(E') = { + , ε }
First(T') = { * , ε }

74 Example 2
Given the production rules:
S → i E t SS' | a
S' → eS | ε
E → b
Verify that:
First(S) = { i, a }
First(S') = { e, ε }
First(E) = { b }

75 Computing Follow(A) : All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals the end of input
2. If there is a production A → αBβ, then everything in First(β) except ε is in Follow(B)
3. If A → αB is a production, or A → αBβ with β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B)
   (Whatever followed A must follow B, since nothing follows B from the production rule)
We'll calculate Follow for two grammars.

76 Example
Compute Follow for:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → ( E ) | id
Follow(E): contains $ since E is the start symbol. Also, since F → (E), First( ) ) is in Follow(E). Thus Follow(E) = { ) , $ }
Follow(E'): E → TE' implies Follow(E) is in Follow(E'), so Follow(E') = { ) , $ }
Follow(T): E → TE' implies put in First(E') – {ε}; since E' ⇒* ε, also put in Follow(E). Since E' → +TE', put in First(E') – {ε}, and since E' ⇒* ε, put in Follow(E'). Thus Follow(T) = { + , ) , $ }
Follow(T'), Follow(F): you do these!

77 Computing Follow : 2nd Example
Recall: S  i E t SS’ | a S’  eS |  E  b First(S) = { i, a } First(S’) = { e,  } First(E) = { b } Follow(S) – Contains $, since S is start symbol Since S  i E t SS’ , put in First(S’) – not  Since S’  , Put in Follow(S) Since S’  eS, put in Follow(S’) So…. Follow(S) = { e, $ } Follow(S’) = Follow(S) HOW? Follow(E) = { t } *

78 First & Follow – One More Look
Consider the following derivation:
E ⇒ TE' ⇒ FT'E' ⇒ ( E ) T'E' ⇒ ( TE' ) T'E' ⇒ ( FT'E' ) T'E' ⇒ ( id T'E' ) T'E' ⇒ ( id E' ) T'E' ⇒ ( id ) T'E' ⇒ ( id ) * FT'E' ⇒ ( id ) * id T'E' ⇒ ( id ) * id E' ⇒ ( id ) * id + TE' ⇒ ( id ) * id + FT'E' ⇒ ( id ) * id + id T'E' ⇒ ( id ) * id + id E' ⇒ ( id ) * id + id$

79 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
First(E) = { ( , id }

80 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
First(T) = { ( , id }

81 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
First(T') = { * , ε }  (note the T' → ε step in the derivation)

82 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
First(E') = { + , ε }  (note the E' → ε step in the derivation)

83 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
First(F) = { ( , id }  – you do First(F)!

84 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal?
Still needs your First(F).

85 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Follow(E) = { ) , $ }

86 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Follow(T) = { + , ) , $ }  – the "+" in Follow(T) comes from First(E')

87 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Follow(T') = { + , ) , $ }  (note the T' → ε step in the derivation)

88 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Follow(E') = { ) , $ }  (note the E' → ε step in the derivation)

89 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Follow(F) = { + , * , ) , $ }  – you do Follow(F)!

90 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's Follow for each non-terminal?
Still needs your Follow(F).

91 First & Follow – One More Look
Consider the derivation (repeated from slide 78). What's First for each non-terminal? What's Follow for each non-terminal?
Still needs your First(F) and Follow(F).

92 First & Follow – One More Look
Consider the derivation of ( id ) * id + id$ once more. What are the implications for the M-table?
1. E → TE' and ( is in First(E)  →  M[E, (]
2. T → FT' and ( is in First(T)  →  M[T, (]
3. F → (E) and ( is in First(F)  →  M[F, (]
4. E' → ε and ) is in Follow(E')  →  M[E', )]
5. Since $ is in Follow(T'), T' → ε  →  M[T', $]
6. Since $ is in Follow(E'), E' → ε  →  M[E', $]

93 Motivation Behind First & Follow
First is used to indicate the relationship between non-terminals (on the stack) and input symbols (in the input stream).
Example: If A → α, and a is in First(α), then when a = input, replace A with α. (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)
Follow is used when First has a conflict, to resolve choices. When A → ε or A ⇒* ε, what follows A dictates the next choice to be made.
Example: If A → α, α ⇒* ε, and b is in Follow(A), then when b is the input character we expand A with α, which will eventually derive ε, which b follows!

94 Constructing Parsing Table
Algorithm:
1. Repeat steps 2 & 3 for each rule A → α
2. If terminal a is in First(α), add A → α to M[A, a]
3.1 If ε is in First(α), add A → α to M[A, b] for all terminals b in Follow(A)
3.2 If ε is in First(α) and $ is in Follow(A), add A → α to M[A, $]
4. All undefined entries are errors.
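The construction algorithm above can be sketched as follows, with First and Follow supplied as data (a hypothetical helper, names mine; a multiply-defined M entry would show up as a list with more than one RHS, which is exactly the LL(1) test on the next slides):

```python
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(alpha):
    out = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})
        out |= f - {""}
        if "" not in f:
            return out
    out.add("")
    return out

def build_table(grammar):
    M = {}
    for A, rhss in grammar.items():
        for rhs in rhss:
            f = first_of_string(rhs)
            for a in f - {""}:           # step 2: terminals in First(alpha)
                M.setdefault((A, a), []).append(rhs)
            if "" in f:                  # steps 3.1 / 3.2: eps in First(alpha)
                for b in FOLLOW[A]:      # Follow already contains $ where due
                    M.setdefault((A, b), []).append(rhs)
    return M

M = build_table(GRAMMAR)
```

Every entry for this grammar ends up with a single production, confirming it is LL(1).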

95 Constructing Parsing Table - Example
E  TE’ E’  + TE’ |  T  FT’ T’  * FT’ |  F  ( E ) | id First(E,F,T) = { (, id } First(E’) = { +,  } First(T’) = { *,  } Follow(E,E’) = { ), $} Follow(F) = { *, +, ),  } Follow(T,T’) = { +, ),  } Expression Example: E  TE’ : First(TE’) = First(T) = { (, id } M[E, ( ] : E  TE’ M[E, id ] : E  TE’ (by rule 2) E’  +TE’ : First(+TE’) = + : M[E’, +] : E’  +TE’ (by rule 3) E’   :  in First( ) T’   :  in First( ) by rule 2 M[E’, )] : E’   (3.1) M[T’, +] : T’   (3.1) M[E’, $] : E’   (3.2) M[T’, )] : T’   (3.1) (Due to Follow(E’) M[T’, $] : T’   (3.2)

96 Constructing Parsing Table – Example 2
S  i E t SS’ | a S’  eS |  E  b First(S) = { i, a } First(S’) = { e,  } First(E) = { b } Follow(S) = { e, $ } Follow(S’) = { e, $ } Follow(E) = { t } S  i E t SS’ S  a E  b First(i E t SS’)={i} First(a) = {a} First(b) = {b} S’  eS S   First(eS) = {e} First() = {} Follow(S’) = { e, $ } Non-terminal INPUT SYMBOL a b e i t $ S S a S iEtSS’ S’ S’  S’ eS S  E E b

97 LL(1) Grammars
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
1 : Use "1" input symbol as lookahead in conjunction with the stack to decide on the parsing action
LL(1) grammars have no multiply-defined entries in the parsing table.
Properties of LL(1) grammars:
The grammar can't be ambiguous or left recursive.
A grammar is LL(1) when, for each pair of alternatives A → α | β:
1. α & β do not derive strings starting with the same terminal a
2. At most one of α and β can derive ε
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in Follow(A)
Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar
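The pairwise conditions above can be sketched as a simple predicate over precomputed First sets (an illustrative helper of my own, with "" standing for ε):

```python
def ll1_pair_ok(first_alpha, first_beta):
    """Check the LL(1) conditions for one pair of alternatives A -> alpha | beta,
    given First(alpha) and First(beta) as sets ("" stands for epsilon)."""
    # Condition 1: no terminal may begin strings derived from both alternatives.
    no_common_terminal = not ((first_alpha - {""}) & (first_beta - {""}))
    # Condition 2: at most one alternative may derive epsilon.
    not_both_nullable = not ("" in first_alpha and "" in first_beta)
    return no_common_terminal and not_both_nullable
```

For example, E' → +TE' | ε passes (First sets {+, ε} and {ε} share no terminal and only one is nullable), while the dangling-else alternatives S' → eS | ε fail once Follow(S') is taken into account, since e is both in First(eS) and in Follow(S').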

98 Predictive Parsing Program
Error Recovery: When Do Errors Occur?
Recall the predictive parser's structure: an input buffer ending in $ (e.g., a + b $), a stack with the current symbol X on top, the predictive parsing program, the parsing table M[A, a], and the output.
An error occurs:
- if X is a terminal and it doesn't match the input
- if M[X, input] is empty – no allowable actions
Consider two recovery techniques:
A. Panic Mode
B. Phrase-Level Recovery

99 Panic Mode Recovery
Augment the parsing table with actions that attempt to realign / synchronize the token stream with the expected input.
Suppose non-terminal A on top of the stack doesn't mesh with the current input symbol:
1. Use Follow(A) to remove input tokens – sync (discard)
2. Use First(A) to determine when to restart parsing
3. Incorporate higher-level language concepts (begin/end, while, repeat/until) to sync actions, so we don't skip tokens unnecessarily.
Other actions:
4. When A → ε exists, use it to manipulate the stack and postpone error detection
5. Treat a non-matching terminal on top of the stack as a token that is inserted into the input (i.e., pop it and continue)

100 Revised Parsing Table / Example
Non-terminal | id       | +          | *           | (         | )      | $
E            | E → TE'  |            |             | E → TE'   | synch  | synch
E'           |          | E' → +TE'  |             |           | E' → ε | E' → ε
T            | T → FT'  | synch      |             | T → FT'   | synch  | synch
T'           |          | T' → ε     | T' → *FT'   |           | T' → ε | T' → ε
F            | F → id   | synch      | synch       | F → ( E ) | synch  | synch

Blank entry: skip the input symbol. synch entry (placed using the Follow sets): pop the stack entry – terminal or non-terminal.

101 Revised Parsing Table / Example(2)
STACK    | INPUT         | REMARK
$E       | ) id * + id$  | error, skip )
$E       | id * + id$    | id is in First(E)
$E'T     | id * + id$    |
$E'T'F   | id * + id$    |
$E'T'id  | id * + id$    |
$E'T'    | * + id$       |
$E'T'F*  | * + id$       |
$E'T'F   | + id$         | error, M[F, +] = synch
$E'T'    | + id$         | F has been popped
$E'      | + id$         |
$E'T+    | + id$         |
$E'T     | id$           |
$E'T'F   | id$           |
$E'T'id  | id$           |
$E'T'    | $             |
$E'      | $             |
$        | $             |
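A table-driven parser with this kind of panic-mode recovery can be sketched as below. This is my own hedged reconstruction, not the slides' code; in particular, the rule "fall back to skipping input when popping on synch would empty the stack" is an assumption I added so that the very first error on ) is handled by skipping, as in the trace above.

```python
# Expression-grammar parse table: (nonterminal, lookahead) -> production body.
PRODUCTIONS = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
# synch entries, placed at (A, a) for a in Follow(A).
SYNCH = {("E", ")"), ("E", "$"), ("T", "+"), ("T", ")"), ("T", "$"),
         ("F", "+"), ("F", "*"), ("F", ")"), ("F", "$")}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    tokens = tokens + ["$"]
    stack, i, errors = ["$", "E"], 0, []
    while stack[-1] != "$":
        top, look = stack[-1], tokens[i]
        if top not in NONTERMS:                    # terminal on top of stack
            if top == look:
                stack.pop(); i += 1                # match
            else:
                errors.append(f"mismatch: expected {top}, got {look}")
                stack.pop()                        # pretend the terminal was seen
        elif (top, look) in PRODUCTIONS:           # normal expansion
            stack.pop()
            stack.extend(reversed(PRODUCTIONS[(top, look)]))
        elif (top, look) in SYNCH and len(stack) > 2:
            errors.append(f"synch: pop {top} on {look}")
            stack.pop()                            # pop nonterminal, resume
        else:                                      # blank entry: skip input
            if look == "$":                        # can't skip the end marker
                errors.append(f"pop {top} at end of input")
                stack.pop()
            else:
                errors.append(f"skip {look}")
                i += 1
    return errors
```

On the erroneous input ) id * + id this sketch reports exactly two errors (skip ), then the synch pop of F on +) and still consumes the rest of the input, mirroring the trace above; a correct input such as id + id parses with no errors.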

102 Phrase-Level Recovery
Fill in the blank entries of the parsing table with pointers to error-handling routines.
These routines: modify the stack and / or input stream; issue an error message.
Problems:
- Modifying the stack has to be done with care, so as not to create the possibility of derivations that aren't in the language
- Infinite loops must be avoided
Can be used in conjunction with panic mode for more complete error handling.

103 How Would You Implement TD Parser
Stack – easy to handle. Write an ADT to manipulate its contents.
Input stream – responsibility of the lexical analyzer.
Key issue – how is the parsing table implemented?
One approach: assign unique IDs to every entry of the table shown earlier:
- all rules have unique IDs
- also for the blanks, which handle errors
- ditto for the synch actions

104 Revised Parsing Table:
Non-terminal | id | +  | *  | (  | )  | $
E            | 1  | 18 | 19 | 1  | 9  | 10
E'           | 20 | 2  | 21 | 22 | 3  | 3
T            | 4  | 11 | 23 | 4  | 12 | 13
T'           | 24 | 6  | 5  | 25 | 6  | 6
F            | 8  | 14 | 15 | 7  | 16 | 17

1 : E → TE'   2 : E' → +TE'   3 : E' → ε   4 : T → FT'
5 : T' → *FT'   6 : T' → ε   7 : F → ( E )   8 : F → id
9 – 17 : synch actions
18 – 25 : error handlers

105 Revised Parsing Table: (2)
Each # (or set of #s) corresponds to a procedure that:
- uses the Stack ADT
- gets tokens
- prints error messages
- prints diagnostic messages
- handles errors

106 How is Parser Constructed ?
One large CASE statement:

state = M[ top(s), current_token ]
switch (state) {
  case 1 : proc_E_TE'( ) ; break ;
  ...
  case 8 : proc_F_id( ) ; break ;
  case 9 : proc_sync_9( ) ; break ;
  ...
  case 17 : proc_sync_17( ) ; break ;
  case 18 :
  ...
  case 25 : /* procs to handle errors */ break ;
}

Notes: fall-through cases can be combined, or put in another switch; some sync actions may be the same; some error handlers may be similar.
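The dispatch idea behind the CASE statement can be sketched as a map from table IDs to handler procedures (the procedure names are illustrative, not from the slides, and the error handler simply reports rather than recovers):

```python
def proc_E_TE(stack):
    """ID 1: apply E -> T E' (pop E, push the body reversed)."""
    stack.pop()
    stack.extend(["E'", "T"])

def proc_F_id(stack):
    """ID 8: apply F -> id."""
    stack.pop()
    stack.append("id")

def proc_sync(stack):
    """IDs 9-17: synch action - pop the stack entry and resume."""
    stack.pop()

def proc_error(stack):
    """IDs 18-25: error handler (recovery elided in this sketch)."""
    raise SyntaxError("no action for this (non-terminal, token) pair")

# One handler per table ID; similar IDs share a procedure,
# echoing the "some sync actions may be the same" note above.
HANDLERS = {1: proc_E_TE, 8: proc_F_id,
            **{i: proc_sync for i in range(9, 18)},
            **{i: proc_error for i in range(18, 26)}}
```

A driver loop then becomes `HANDLERS[M[top(s), current_token]](stack)`, replacing the switch with a table lookup.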

107 Final Comments – Top-Down Parsing
So far, we've examined grammars and language theory and their relationship to parsing.
Key concept: rewriting a grammar into an acceptable form.
Examined top-down parsing:
- Brute force: transition diagrams & recursion
- Elegant: table driven
We've identified its shortcoming: not all grammars can be made LL(1)!
Bottom-up parsing – future

