Midterm Content – Excerpted from PPTs

Chapter 3: Lexical Analysis and Flex
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155, Storrs, CT (860)
Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

Rules for Specifying Regular Expressions:
1. ε is a regular expression denoting {ε}
2. If a is in Σ, a is a regular expression that denotes {a}
3. Let r and s be regular expressions with languages L(r) and L(s). Then
   (a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
   (b) (r)(s) is a regular expression denoting L(r) L(s)
   (c) (r)* is a regular expression denoting (L(r))*
   (d) (r) is a regular expression denoting L(r)
All are left-associative. Precedence, from highest to lowest: *, concatenation, |.

Algebraic Properties of Regular Expressions
AXIOM                          DESCRIPTION
r | s = s | r                  | is commutative
r | (s | t) = (r | s) | t      | is associative
(r s) t = r (s t)              concatenation is associative
r (s | t) = r s | r t          concatenation distributes over |
(s | t) r = s r | t r
εr = r                         ε is the identity element for concatenation
rε = r
r* = (r | ε)*                  relation between * and ε
r** = r*                       * is idempotent

Towards Token Definition
Regular Definitions: Associate names with regular expressions.
For example, PASCAL IDs:
letter → A | B | C | … | Z | a | b | … | z
digit → 0 | 1 | 2 | … | 9
id → letter ( letter | digit )*
Shorthand notation:
"+" : one or more, where r* = r+ | ε and r+ = r r*
"?" : zero or one
[range] : set range of characters (replaces "|"), e.g. [A-Z] = A | B | C | … | Z
Using shorthand, PASCAL IDs:
id → [A-Za-z][A-Za-z0-9]*
We'll use both techniques.
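As a quick check of the shorthand definition, the pattern can be tried with Python's standard `re` module. This is only a sketch: the names `ID_PATTERN` and `is_identifier` are made up for the illustration, and real PASCAL scanners also have to exclude reserved words.

```python
import re

# Shorthand form of the slide's definition: id -> [A-Za-z][A-Za-z0-9]*
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(text: str) -> bool:
    """Return True when the whole string matches the id pattern."""
    return ID_PATTERN.fullmatch(text) is not None

print(is_identifier("count2"))  # a letter followed by letters/digits
print(is_identifier("2count"))  # starts with a digit, so not an id
```

`fullmatch` is used rather than `match` so that trailing characters outside the pattern make the test fail.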

Epsilon-Transitions
Given the regular expression (a (b*c)) | (a (b | c+)?), find a transition-diagram NFA that recognizes it.
Solution?

NFA to DFA Conversion – Main Idea
Look at the states reachable without consuming any input, and aggregate them into macro states.

Final Result
A macro state is final if at least one of the NFA states it contains is final.
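The macro-state idea can be sketched in a few lines of Python. Everything named here is an assumption of the sketch, not part of the slides: the NFA is a dictionary from (state, label) to a set of states, the empty string `EPS` labels ε-moves, and the small NFA for a*b at the bottom is a made-up example.

```python
from collections import deque

EPS = ""  # label used here for epsilon-moves (an assumption of this sketch)

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    """Return (dfa_transitions, dfa_start, dfa_finals) built from macro states."""
    dfa_start = eps_closure(nfa, {start})
    dfa, seen, work = {}, {dfa_start}, deque([dfa_start])
    while work:
        macro = work.popleft()
        for sym in alphabet:
            moved = {t for s in macro for t in nfa.get((s, sym), ())}
            if not moved:
                continue  # no transition defined on this symbol
            target = eps_closure(nfa, moved)
            dfa[(macro, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    # a macro state is final if it contains at least one final NFA state
    dfa_finals = {m for m in seen if m & set(finals)}
    return dfa, dfa_start, dfa_finals

# Hypothetical NFA for a*b: 0 -eps-> 1, 1 -a-> 1, 1 -b-> 2 (state 2 is final)
NFA = {(0, EPS): {1}, (1, "a"): {1}, (1, "b"): {2}}
DFA, START, FINALS = subset_construction(NFA, 0, {2}, "ab")
```

Macro states are stored as `frozenset` values so they can serve as dictionary keys; missing transitions are simply left out of the DFA table.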

Regular Expression to NFA Construction
We now focus on transforming a regular expression to an NFA. This construction allows us to go from:
Regular expressions (which describe tokens)
to an NFA (to characterize the language)
to a DFA (which can be computerized)
The construction process is componentwise: it builds the NFA from components of the regular expression in a special order with particular techniques.
NOTE: Construction is syntax-directed translation, i.e., the syntax of the regular expression is the determining factor for NFA construction and structure.

Construct NFA Solutions
[Figures: NFAs for ε, a, b, and ab, then NFAs for ε | ab, a*, and (ε | ab)*. The diagrams are not recoverable from the transcript.]

Construction Algorithm : R.E. → NFA
Construction Process:
1st: Identify subexpressions of the regular expression: ε, alphabet symbols, r | s, rs, r*
2nd: Characterize "pieces" of NFA for each subexpression

Piecing Together NFAs
1. For ε in the regular expression, construct the NFA for L(ε): a start state i, a final state f, and one ε-move from i to f.
2. For a ∈ Σ in the regular expression, construct the NFA for L(a): a start state i, a final state f, and one move on a from i to f.

Piecing Together NFAs – continued(1)
3.(a) If s, t are regular expressions and N(s), N(t) their NFAs, then s|t has an NFA for L(s) ∪ L(t), where i and f are new start / final states, and ε-moves are introduced from i to the old start states of N(s) and N(t) as well as from all of their final states to f.

3.(b) If s, t are regular expressions and N(s), N(t) their NFAs, then st (concatenation) has an NFA for L(s) L(t), where i is the start state of N(s) (or new under the alternative) and f is the final state of N(t) (or new). Overlap maps the final states of N(s) to the start state of N(t). Alternative: join them with an ε-move instead of overlapping.

3.(c) If s is a regular expression and N(s) its NFA, then s* (Kleene star) has an NFA where:
i is a new start state and f is a new final state
ε-move i to f (to accept the null string)
ε-moves i to old start, old final(s) to f
ε-move old final to old start (WHY?)
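The pieces above can be sketched as small Python functions that glue NFA fragments together. This is an illustration under stated assumptions rather than the slides' own code: the class `Frag`, the helper names, and the empty-string label `EPS` are invented here, and `concat` uses the ε-link variant rather than overlapping states.

```python
import itertools

EPS = ""  # label for epsilon-moves (an assumption of this sketch)
_ids = itertools.count()

class Frag:
    """An NFA piece with one start state i, one final state f, and its edges."""
    def __init__(self, i, f, edges):
        self.i, self.f, self.edges = i, f, edges  # edges: (src, label, dst)

def symbol(a):      # rules 1 and 2: a is EPS or an alphabet symbol
    i, f = next(_ids), next(_ids)
    return Frag(i, f, [(i, a, f)])

def union(s, t):    # rule 3(a): new i and f, epsilon-moves in and out
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, s.i), (i, EPS, t.i), (s.f, EPS, f), (t.f, EPS, f)]
    return Frag(i, f, s.edges + t.edges + extra)

def concat(s, t):   # rule 3(b), epsilon-link variant
    return Frag(s.i, t.f, s.edges + t.edges + [(s.f, EPS, t.i)])

def star(s):        # rule 3(c): skip, enter, leave, and loop back
    i, f = next(_ids), next(_ids)
    extra = [(i, EPS, f), (i, EPS, s.i), (s.f, EPS, f), (s.f, EPS, s.i)]
    return Frag(i, f, s.edges + extra)

def accepts(frag, text):
    """Simulate the NFA by tracking the set of current states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in frag.edges:
                if src == q and label == EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({frag.i})
    for ch in text:
        step = {d for s, lab, d in frag.edges if s in current and lab == ch}
        current = closure(step)
    return frag.f in current

# NFA for (a | b)*abb, built bottom-up from its subexpressions
ab_star = star(union(symbol("a"), symbol("b")))
nfa = concat(concat(concat(ab_star, symbol("a")), symbol("b")), symbol("b"))
```

The back edge from the old final state to the old start state in `star` is what lets the fragment for s be traversed more than once.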

Detailed Example
See example 3.16 in the textbook for (a | b)*abb.
2nd example: (ab*c) | (a(b|c*))
Parse tree for this regular expression: [figure not recoverable; its nodes are the subexpressions r0 to r13 over the symbols a, b, c, *, |, ( )]
What is the NFA? Let's construct it!

Detailed Example – Construction
[NFA diagrams not recoverable.] The pieces are built bottom-up, organized by portion of the parse tree.
Left portion, for ab*c:
r1 : b*
r2 : c
r4 : r1 r2
r5 : r3 r4 (r3 : a)
Right portion, for a(b|c*):
r6 : c
r7 : b
r8 : c*
r9 : r7 | r8
r10 : ( r9 )
r11 : a
r12 : r11 r10

Detailed Example – Final Step
r13 : r5 | r12
[Figure: the complete NFA, with states numbered 1 to 17, not recoverable.]

Converting NFA to DFA – 1st Look
[NFA diagram with numbered states and transitions on a, b, c, and ε not recoverable.]
From state 0, where can we move without consuming any input?
This forms a new state: {0, 1, 2, 6, 8}
What transitions are defined for this new state?

Showing the Steps
[Diagram not recoverable.] The macro states reached are {0, 1, 2, 6, 8}, {3}, {1, 2, 5, 6, 7, 8}, and {1, 2, 4, 5, 6, 8}, connected by transitions on a, b, and c.

The Resulting DFA
[Diagram not recoverable.] The four macro states {0, 1, 2, 6, 8}, {3}, {1, 2, 5, 6, 7, 8}, and {1, 2, 4, 5, 6, 8} are renamed A, B, C, D.
Which states are FINAL states?
How do we handle alphabet symbols not defined for A, B, C, D?

Chapter 4: Syntax Analysis Part 1: Grammar Concepts

Context Free Grammars : Concepts & Terminology
Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable by the grammar & in the language
S: Start symbol, S ∈ NT, which defines all strings of the language
PR: Production rules to indicate how T and NT are combined to generate valid strings of the language.
    PR: NT → (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model!

Context Free Grammars : A First Look
assign_stmt → id := expr ;
expr → term operator term
term → id
term → real
term → integer
operator → +
operator → -
What do "BLUE" symbols represent? What do "BLACK" symbols represent?
Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-terminal into a collection of terminals / tokens.
Simply stated: Grammars / production rules allow us to "rewrite" and "identify" correct syntax.

Example Grammar
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Black : NT, Blue : T, expr : S, 9 production rules
To simplify / standardize notation, we offer a synopsis of terminology.

Leftmost and Rightmost Derivations
Leftmost: Replace the leftmost non-terminal symbol
E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id
Rightmost: Replace the rightmost non-terminal symbol
E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id
Important Notes: given A → δ,
If βAγ ⇒lm βδγ, what's true about β?
If βAγ ⇒rm βδγ, what's true about γ?
Derivations: Actions to parse input can be represented pictorially in a parse tree.

Examples of LM Derivations
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
A leftmost derivation of id + id * id:
E ⇒ E A E ⇒ E A E A E ⇒ id A E A E ⇒ id + E A E ⇒ id + id A E ⇒ id + id * E ⇒ id + id * id
Another leftmost derivation of id + id * id:
E ⇒ E A E ⇒ id A E ⇒ id + E ⇒ id + E A E ⇒ id + id A E ⇒ id + id * E ⇒ id + id * id

Examples of RM Derivations
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
A rightmost derivation of id + id * id:
E ⇒ E A E ⇒ E A E A E ⇒ E A E A id ⇒ E A E * id ⇒ E A id * id ⇒ E + id * id ⇒ id + id * id
Another rightmost derivation of id + id * id:
E ⇒ E A E ⇒ E A id ⇒ E * id ⇒ E A E * id ⇒ E A id * id ⇒ E + id * id ⇒ id + id * id

Resolving Grammar Problems/Difficulties
How are Regular Expressions and Context Free Grammars related?
What are possible grammar problems?
Ambiguous Grammars: Where does else match?
Left Recursion in Grammar: How do you know when to stop a derivation?
Epsilon Moves (ε-Moves): How do empty moves impact grammars?
Cycles: What if the grammar cycles back to an earlier rule?
Left Factoring: What if two grammar rules are too similar?

Resolving Grammar Difficulties: Motivation
Humans write / develop grammars.
Different parsing approaches have different needs: LL(k), Recursive, LR(k), LALR(k); Top-Down vs. Bottom-Up.
For 1 → remove "errors"
For 2 → put / redesign grammar
Grammar problems: ambiguity, left recursion, ε-moves, cycles, left factoring

Resolving Problems: Ambiguous Grammars
Consider the following grammar segment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)
What's the problem here?
Consider the program: if e1 then if e2 then s1 else s2
Else must match to previous then. Structure indicates parse subtree for expression.

Resulting Parse Tree
Easy case: else must match to previous then.
if e1 then s1 else if e2 then s2 else s3
[Parse tree figure not recoverable.]

Example : What Happens with this string?
if E1 then if E2 then S1 else S2
How is this parsed?
if E1 then (if E2 then S1 else S2)
vs.
if E1 then (if E2 then S1) else S2
What's the issue here?

Parse Trees for Example
if e1 then if e2 then s1 else s2
Form 1 and Form 2: [parse tree figures not recoverable]
What's the issue here?

Removing Ambiguity
Take the original grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)
Revise to remove ambiguity:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

How does this grammar work ?
matched_stmt:
Forces the else to match to the if
The then and else branches can also be matched_stmt
Else always matches the previous then regardless of the nesting
unmatched_stmt:
Allows a link to stmt (matched/unmatched)
Second option allows matched or unmatched

Resolving Difficulties : Left Recursion
A left recursive grammar has rules that support the derivation A ⇒+ Aα, for some α.
Top-down parsing can't reconcile this type of grammar, since it could consistently make a choice which wouldn't allow termination. Don't know when to choose!
A ⇒ Aα ⇒ Aαα ⇒ Aααα … etc., for A → Aα | β
Transform the left recursive grammar:
A → Aα | β
to the following:
A → βA'
A' → αA' | ε

Why is Left Recursion a Problem ?
Consider:
E → E + T | T
T → T * F | F
F → ( E ) | id
Input: id + id + id
E ⇒ E + T ⇒ T + T
E ⇒ E + T ⇒ E + T + T ⇒ T + T + T
E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒ …
How do you pick the right one? What's the problem here?

Informal Discussion:
Take all productions for A and order as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A.
Now apply concepts of previous slide:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
For our example, each rule has the form A → Aα1 | β1:
E → E + T | T
T → T * F | F

Left Recursion Summary
This only works for direct (obvious, can see) left recursion:
Take all productions for A and order as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A.
Now apply concepts of previous slide:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
For our example:
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
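The transformation for direct left recursion is mechanical enough to sketch in Python. The representation is an assumption of this sketch: a production is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending a prime to the old one.

```python
def remove_immediate_left_recursion(nt, productions):
    """Split A -> A a1 | ... | b1 | ... into A -> b A' and A' -> a A' | epsilon.

    Each production is a list of symbols; the empty list [] stands for epsilon.
    """
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # A -> A alpha
    betas = [p for p in productions if not p or p[0] != nt]     # A -> beta
    if not alphas:
        return {nt: productions}      # no direct left recursion: unchanged
    new_nt = nt + "'"                 # name of the new non-terminal A'
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

E_RULES = remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
T_RULES = remove_immediate_left_recursion("T", [["T", "*", "F"], ["F"]])
print(E_RULES)
print(T_RULES)
```

Applied to the E and T rules, this should reproduce the primed grammar listed above, with `[]` in place of ε.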

Problem: If left recursion is two-or-more levels deep, this isn't enough.
For the grammar below: Where is the direct left recursion? What can happen in a derivation of adcca?
S → Aa | b
A → Ac | Sd | ε
S ⇒ Aa        (S → Aa)
  ⇒ Aca       (A → Ac)
  ⇒ Acca      (A → Ac)
  ⇒ Sdcca     (A → Sd)
  ⇒ Aadcca    (S → Aa)
  ⇒ adcca     (A → ε)

Algorithm:
Input: Grammar G with no cycles or ε-productions
Output: An equivalent grammar with no left recursion
Arrange the non-terminals in some order A1, A2, …, An
for i := 1 to n do begin
    for j := 1 to i – 1 do begin
        replace each production of the form Ai → Ajγ
        by the productions Ai → δ1γ | δ2γ | … | δkγ
        where Aj → δ1 | δ2 | … | δk are all current Aj productions;
    end (j for loop)
    eliminate the immediate left recursion among Ai productions
end (i for loop)

First –Rename Non-Terminals and Set Values of two for Loops
Original Grammar:
S → Aa | b
A → Ac | Sd | ε
Renamed Grammar:
A1 → A2a | b
A2 → A2c | A1d | ε
General algorithm for loops (n is the number of NTs):
for i := 1 to n do begin
    for j := 1 to i – 1 do
Specific algorithm for loops:
for i := 1 to 2 do begin

Using the Algorithm
Apply the algorithm to:
A1 → A2a | b
A2 → A2c | A1d | ε
i = 1: the j loop is j = 1 to 0, so it doesn't execute.
Eliminate the immediate left recursion among A1 productions: for A1 there is no left recursion.
i = 2: for j = 1 to 1 do
Take productions A2 → A1γ and replace with A2 → δ1γ | δ2γ | … | δkγ, where A1 → δ1 | δ2 | … | δk are the A1 productions.
In our case, in A2 → A1d replace A1 using A1 → A2a | b, so A2 → A1d becomes A2 → A2ad | bd.
What's left:
A1 → A2a | b
A2 → A2c | A2ad | bd | ε
Are we done?

Using the Algorithm
No! We must now remove the A2 direct left recursion!
A1 → A2a | b
A2 → A2c | A2ad | bd | ε
Recall:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Apply to the above case. What do you get?
A1 → A2a | b
A2 → bd A2' | A2'
A2' → c A2' | ad A2' | ε
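The two nested loops can be sketched in Python and run on the renamed grammar. As before, the representation (lists of symbols, `[]` for ε, primed names for new non-terminals) and the function name are assumptions made for this sketch.

```python
def eliminate_left_recursion(grammar, order):
    """grammar maps each non-terminal to a list of productions (symbol lists)."""
    g = {nt: [list(p) for p in prods] for nt, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:   # Ai -> Aj gamma: substitute Aj's productions
                    expanded += [delta + p[1:] for delta in g[aj]]
                else:
                    expanded.append(p)
            g[ai] = expanded
        # eliminate the immediate left recursion among the Ai productions
        alphas = [p[1:] for p in g[ai] if p and p[0] == ai]
        if alphas:
            betas = [p for p in g[ai] if not p or p[0] != ai]
            new = ai + "'"
            g[ai] = [b + [new] for b in betas]
            g[new] = [a + [new] for a in alphas] + [[]]
    return g

GRAMMAR = {
    "A1": [["A2", "a"], ["b"]],
    "A2": [["A2", "c"], ["A1", "d"], []],   # [] is the epsilon production
}
RESULT = eliminate_left_recursion(GRAMMAR, ["A1", "A2"])
for nt, prods in RESULT.items():
    print(nt, "->", " | ".join(" ".join(p) or "ε" for p in prods))
```

The input grammar contains an ε-production even though the algorithm's stated precondition excludes them; for this particular grammar the substitution still works out, which matches the slide's result.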

Apply Algorithm to Revised Earlier Ex
E → E + T | T
T → T * F | F
F → E | ( E ) | id
Renamed:
A1 → A1 + A2 | A2
A2 → A2 * A3 | A3
A3 → A1 | ( A1 ) | id

i = 1: the j loop is j = 1 to 0, so it doesn't execute.
Eliminate the immediate left recursion among A1 productions:
A1 → A2 A1'
A1' → + A2 A1' | ε
Grammar is now:
A1 → A2 A1'
A1' → + A2 A1' | ε
A2 → A2 * A3 | A3
A3 → A1 | ( A1 ) | id

i = 2: the j loop is j = 1 to 1, so it executes one time.
Is there a production A2 → A1γ? No, so don't do anything.
Eliminate the immediate left recursion among A2 productions:
A2 → A3 A2'
A2' → * A3 A2' | ε
Grammar is now:
A1 → A2 A1'
A1' → + A2 A1' | ε
A2 → A3 A2'
A2' → * A3 A2' | ε
A3 → A1 | ( A1 ) | id

i = 3: the j loop is j = 1 to 2, so it executes two times.
j = 1: Is there a production A3 → A1γ? Yes! Replace A1 with A2 A1' in A3 → A1:
A3 → A2 A1' | ( A1 ) | id
Grammar is now:
A1 → A2 A1'
A1' → + A2 A1' | ε
A2 → A3 A2'
A2' → * A3 A2' | ε
A3 → A2 A1' | ( A1 ) | id

Removing Difficulties : ε-Moves, A First Look
Very simple: A → ε and B → uAv implies add B → uv to the grammar G.
Replace A with ε on all r.h.s. rules. Why does this work?
Example grammar:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
After removing the ε-productions:
E → TE' | T
E' → + TE' | + T
T → FT' | F
T' → * FT' | * F
F → ( E ) | id

In Class Left Recursion Example, Spring 92 Midterm
S → A A | ε
A → S B | x
B → A y | y
for i := 1 to 3 do begin (for S, A, and B)
    i = 1: Just look for direct left recursion of S;
           eliminate the immediate left recursion among S productions
    i = 2, j = 1: Is there a production A → Sγ? If so, replace S with all alternatives;
           eliminate the immediate left recursion among A productions
    i = 3, j = 1: Is there a production B → Sγ? If so, replace S with all alternatives
    i = 3, j = 2: Is there a production B → Aγ? If so, replace A with all alternatives;
           eliminate the immediate left recursion among B productions
end (i for loop)

Removing Difficulties : ε-Moves
Very simple: A → ε and B → uAv implies add B → uv to the grammar G.
A1 → A2 a | b
A2 → bd A2' | A2'
A2' → c A2' | bd A2' | ε
First do A2' → ε. Add:
A2 → bd | ε
A2' → c | bd
Revised grammar:
A1 → A2 a | b
A2 → bd A2' | A2' | bd | ε
A2' → c A2' | bd A2' | c | bd
Now do A2 → ε. Add:
A1 → a
Final grammar:
A1 → A2 a | b | a
A2 → bd A2' | A2' | bd
A2' → c A2' | bd A2' | c | bd

Removing Difficulties : ε-Moves, Fall 91 Midterm
S → ( L ) | E b | a
L → c L' | ( L ) L' | E b L' | a L'
L' → , S L' | ε
E → [ E ] E' | c L' c E' | ( L ) L' c E' | a L' c E'
E' → d E' | b L' c E' | ε

Do L' → ε:
S → ( L ) | E b | a
L → c L' | ( L ) L' | E b L' | a L' | c | ( L ) | E b | a
L' → , S L' | , S
E → [ E ] E' | c L' c E' | ( L ) L' c E' | a L' c E' | c c E' | ( L ) c E' | a c E'
E' → d E' | b L' c E' | b c E' | ε

Do E' → ε:
S → ( L ) | E b | a
L → c L' | ( L ) L' | E b L' | a L' | c | ( L ) | E b | a
L' → , S L' | , S
E → [ E ] E' | c L' c E' | ( L ) L' c E' | a L' c E' | c c E' | ( L ) c E' | a c E'
  | [ E ] | c L' c | ( L ) L' c | a L' c | c c | ( L ) c | a c
E' → d E' | b L' c E' | b c E' | d | b L' c | b c

Removing Difficulties : ε-Moves, Spring 92 Midterm
S → A A | ε
A → B A' | x A'
A' → A B A' | ε
B → x A' y B' | y B'
B' → A' y B' | ε

Do A' → ε:
S → A A | ε
A → B A' | x A' | B | x
A' → A B A' | A B
B → x A' y B' | y B' | x y B'
B' → A' y B' | y B' | ε

Do B' → ε:
S → A A | ε
A → B A' | x A' | B | x
A' → A B A' | A B
B → x A' y B' | y B' | x y B' | x A' y | y | x y
B' → A' y B' | y B' | A' y | y

Removing Difficulties : Left Factoring
Problem: Uncertain which of 2 rules to choose:
stmt → if expr then stmt else stmt
     | if expr then stmt
When do you know which one is valid? What do you notice about the two rules?

Removing Difficulties : Left Factoring
How can you fix the problem? Abstract out commonalities and rework the rule.
What's the general form of stmt?
A → αβ1 | αβ2
α : if expr then stmt
β1 : else stmt
β2 : ε
Transform to:
A → α A'
A' → β1 | β2
EXAMPLE:
stmt → if expr then stmt rest
rest → else stmt | ε
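One round of left factoring can be sketched as follows. This is an illustrative sketch only: productions are lists of symbols with `[]` for ε, alternatives are grouped by their first symbol, and the new non-terminal gets a primed name (the slide calls it `rest`).

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(nt, productions):
    """One round: A -> alpha b1 | alpha b2 becomes A -> alpha A', A' -> b1 | b2."""
    groups = {}
    for p in productions:
        groups.setdefault(p[0] if p else None, []).append(p)
    result = {nt: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt] += group                 # nothing shared with another rule
            continue
        alpha = common_prefix(group)
        new_nt = nt + "'" * len(result)         # A', then A'', ... (made-up naming)
        result[nt].append(alpha + [new_nt])
        result[new_nt] = [p[len(alpha):] for p in group]
    return result

STMT = [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
    ["other"],
]
FACTORED = left_factor("stmt", STMT)
print(FACTORED)
```

A full left-factoring pass would repeat this until no two alternatives of any non-terminal share a prefix; the sketch performs a single round, which is enough for the if-then-else rule.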

Chapter 4: Syntax Analysis Part 2: Top-Down Parsing

LL(1) Grammars for Top-Down Parsing
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
1 : Use "1" input symbol as lookahead in conjunction with the stack to decide on the parsing action
LL(1) grammars have no multiply-defined entries in the parsing table.
Properties of LL(1) grammars:
Grammar can't be ambiguous or left recursive.
Grammar is LL(1) when, for A → α | β:
1. α and β do not derive strings starting with the same terminal a
2. Either α or β can derive ε, but not both.
Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar.

Non-Recursive / Table Driven
[Figure: predictive parser model. Components: Input (string + terminator $), Stack (NT + T symbols of the CFG, with $ as the empty-stack symbol), Predictive Parsing Program, Parsing Table M[A,a] (what actions the parser should take based on stack / input), Output.]
General parser behavior (X : top of stack, a : current input):
1. When X = a = $: halt, accept, success.
2. When X = a ≠ $: POP X off the stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]:
   if it is an error, call the recovery routine;
   if M[X,a] = {X → UVW}, POP X and PUSH W, V, U (reverse order, so U ends up on top). DO NOT expend any input.

Algorithm for Non-Recursive Parsing
Set ip (the input pointer) to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X,a] = X → Y1Y2…Yk then begin
            pop X from stack;
            push Yk, Yk-1, … , Y1 onto stack, with Y1 on top;
            output the production X → Y1Y2…Yk /* may also execute other code based on the production used */
        end
        else error()
until X = $ /* stack is empty */
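The algorithm above can be sketched in Python with the parsing table written out by hand for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id. The dictionary layout, the use of an empty tuple for ε, and the choice to raise `SyntaxError` instead of calling a recovery routine are all assumptions of this sketch.

```python
EPS = ()  # an empty right-hand side stands for epsilon

# Parsing table M for the expression grammar, keyed by (non-terminal, input symbol)
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): EPS, ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack, ip, output = ["$", start], 0, []
    while stack:
        x, a = stack.pop(), tokens[ip]
        if x not in NON_TERMINALS:           # X is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            ip += 1                           # match: expend input
        elif (x, a) in TABLE:
            rhs = TABLE[(x, a)]
            stack.extend(reversed(rhs))       # push Yk ... Y1, so Y1 is on top
            output.append((x, rhs))
        else:
            raise SyntaxError(f"no entry M[{x}, {a}]")
    return output

for lhs, rhs in parse(["id", "+", "id", "*", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The printed productions are the OUTPUT column of a parse trace, in the order of a leftmost derivation.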

Example
Our well-worn example!
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id

Table M
Non-terminal   INPUT SYMBOL
               id          +             *             (           )          $
E              E → TE'                                 E → TE'
E'                         E' → +TE'                               E' → ε     E' → ε
T              T → FT'                                 T → FT'
T'                         T' → ε        T' → *FT'                 T' → ε     T' → ε
F              F → id                                  F → (E)

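Trace of Example
STACK        INPUT              OUTPUT
$E           id + id * id$
$E'T         id + id * id$      E → TE'
$E'T'F       id + id * id$      T → FT'
$E'T'id      id + id * id$      F → id
$E'T'        + id * id$         (expend input)
$E'          + id * id$         T' → ε
$E'T+        + id * id$         E' → +TE'
$E'T         id * id$           (expend input)
$E'T'F       id * id$           T → FT'
$E'T'id      id * id$           F → id
$E'T'        * id$              (expend input)
$E'T'F*      * id$              T' → *FT'
$E'T'F       id$                (expend input)
$E'T'id      id$                F → id
$E'T'        $                  (expend input)
$E'          $                  T' → ε
$            $                  E' → ε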

Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ id + FT'E' ⇒ id + id T'E' ⇒ id + id * FT'E' ⇒ id + id * id T'E' ⇒ id + id * id E' ⇒ id + id * id

What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M!
1st: Calculate First & Follow for the grammar
2nd: Apply the construction algorithm for the parsing table
Conceptual perspective:
First: Let α be a string of grammar symbols. First(α) is the set of terminals that can appear first in strings derived from α in any possible derivation.
NOTE: If α ⇒* ε, then ε is in First(α).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β).
NOTE: If S ⇒* αA, then $ is in Follow(A).

Conceptually – What is First?
Calculated for each non-terminal of the grammar: First(E), First(E'), First(T), First(T'), First(F)
What is the first terminal that the non-terminal turns into? Consider the grammar below:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
What is First(F)?  { ( , id }
What is First(T')? { * }
What is First(E')? { + }
What is First(T)?  { ( , id }
What is First(E)?  { ( , id }

Conceptually – What is First?
So far: First(F) = { ( , id }, First(E') = { + }, First(T') = { * }
What is unique about the E' and T' rules? E' → ε and T' → ε, so add epsilon to the First calculations.
Now, what is First(T)?
First(T) = First(FT') = First(F), "+" First(T') if ε is in First(F)
So finally, what is First(E)?
First(E) = First(TE') = First(T), "+" First(E') if ε is in First(T)
Result:
First(F) = { ( , id }
First(E') = { +, ε }
First(T') = { *, ε }
First(T) = { ( , id }
First(E) = { ( , id }

Computing First(X) : All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal, and X → Y1Y2…Yk is a production rule:
   Place First(Y1) in First(X)
   if Y1 ⇒* ε, place First(Y2) in First(X) (that is, if ε is in First(Y1))
   if Y2 ⇒* ε, place First(Y3) in First(X) (if ε is in First(Y1) and First(Y2))
   …
   if Yk-1 ⇒* ε, place First(Yk) in First(X) (if ε is in First(Y1) to First(Yk-1))
NOTE: As soon as some Yi does not derive ε, stop.
May repeat 1, 2, and 3 above for each Yj.

Computing First(X) : All Grammar Symbols - continued
Informally, suppose we want to compute
First(X1 X2 … Xn) = First(X1)
    "+" First(X2) if ε is in First(X1)
    "+" First(X3) if ε is in First(X2)
    …
    "+" First(Xn) if ε is in First(Xn-1)
Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!
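The rules for First can be sketched as a fixed-point computation in Python. The grammar encoding (lists of symbols, `[]` for an ε-production), the marker string `EPS`, and the function names are assumptions of this sketch; the grammar itself is the expression grammar used throughout these slides.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First(X1 X2 ... Xn): keep adding First(Xi) while earlier symbols derive epsilon."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})       # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result            # Xi cannot vanish: stop here
    result.add(EPS)                  # every Xi (or the empty string) derives epsilon
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                   # repeat until no First set grows
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"First({nt}) =", sorted(FIRST[nt]))
```

Iterating to a fixed point avoids having to order the non-terminals by hand, at the cost of a few extra passes over the grammar.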

Conceptually: What is First (E, T, …) in Derivation?
The leftmost derivation for the example is as follows:
INPUT: id + id * id $
E$ ⇒ TE'$ ⇒ FT'E'$ ⇒ id T'E'$ ⇒ id E'$ ⇒ id + TE'$ ⇒ id + FT'E'$ ⇒ id + id T'E'$ ⇒ id + id * FT'E'$ ⇒ id + id * id T'E'$ ⇒ id + id * id E'$ ⇒ id + id * id $
First(E,F,T) = { (, id }
First(E') = { +, ε }
First(T') = { *, ε }

Example
Computing First for:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
First(E) = First(TE') = First(T) "+" First(E')? Not First(E'), since T does not derive ε.
First(T) = First(FT') = First(F) "+" First(T')? Not First(T'), since F does not derive ε.
First(F) = First((E)) "+" First(id) = "(" and "id"
Overall:
First(E) = { ( , id } = First(F)
First(E') = { + , ε }
First(T') = { * , ε }
First(T) = First(F) = { ( , id }

Conceptually – What is Follow?
Again, calculated for each non-terminal of the grammar: Follow(E), Follow(E'), Follow(T), Follow(T'), Follow(F)
What is the first terminal that follows the non-terminal in a derivation?
Consider the grammar below and look at where each non-terminal appears on the right-hand side of each rule:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id

Conceptually – What is Follow?
Examine where each non-terminal appears on the right-hand side of each rule, using
First(E) = First(T) = First(F) = { ( , id }, First(E') = { + , ε }, First(T') = { * , ε }
What is Follow(E)? Rule: F → ( E ). So Follow(E) = { ) }
What is Follow(T)? Rules: E → TE' and E' → +TE'. So Follow(T) = { + } = First(E') w/o ε
What is Follow(F)? Rules: T → FT' and T' → *FT'. So Follow(F) = { * } = First(T') w/o ε

Conceptually – What is Follow? Are we Done Yet?
Some other things to consider:
Assume that the input stream has a final token "$". For the start symbol E, $ follows any possible input, so Follow(E) = { ), $ }
Consider Follow(T) in E → TE', where First(E') contains ε. Since E → TE' and ε is in First(E'), place Follow(E) into Follow(T). So Follow(T) = { +, ), $ }

Let's Explore in Detail
Consider the derivation for input id$:
E$ ⇒ TE'$ ⇒ FT'E'$ ⇒ id T'E'$ ⇒ id E'$ ⇒ id$
What is in Follow(T) if T' → ε and E' → ε? Whatever follows E!
In the rule, if E' → ε, E → TE' becomes E → T, and whatever follows E is what follows T. So what do we add in?
So far: Follow(E) = { ), $ }, Follow(F) = { * }, Follow(T) = { +, ) }

Computing Follow(A) : All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals end of input
2. If there is a production A → αBβ, then everything in First(β) is in Follow(B) except for ε.
3. If A → αB is a production, or A → αBβ and β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B)
   (Whatever followed A must follow B, since nothing follows B from the production rule)
We'll calculate Follow for two grammars.
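The three rules can be sketched in Python for the expression grammar. To keep the example short, the First sets are typed in by hand from the slides rather than recomputed; the encoding (lists of symbols, `[]` for ε, the strings `EPS` and `END`) and the function names are assumptions of this sketch.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# First sets as given on the slides
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    """First of a symbol string; an empty string yields {EPS}."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})             # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                 # rule 1
    changed = True
    while changed:                         # repeat until no Follow set grows
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                for k, b in enumerate(rhs):
                    if b not in grammar:
                        continue           # Follow is defined for non-terminals only
                    beta_first = first_of_string(rhs[k + 1:])
                    new = beta_first - {EPS}          # rule 2
                    if EPS in beta_first:             # rule 3: beta empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"Follow({nt}) =", sorted(FOLLOW[nt]))
```

Rule 3 is why the loop has to repeat: Follow(E') depends on Follow(E), and Follow(T) in turn depends on Follow(E').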

Conceptually: What is Follow in Derivation?
The leftmost derivation for the example is as follows:
INPUT: id + id * id $
E$ ⇒ TE'$ ⇒ FT'E'$ ⇒ id T'E'$ ⇒ id E'$ ⇒ id + TE'$ ⇒ id + FT'E'$ ⇒ id + id T'E'$ ⇒ id + id * FT'E'$ ⇒ id + id * id T'E'$ ⇒ id + id * id E'$ ⇒ id + id * id $
What is Follow(E)? What is Follow(F)? Look at First(T').
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
First(E,F,T) = { (, id }, First(E') = { +, ε }, First(T') = { *, ε }
So far: Follow(E) = { ), $ }, Follow(F) = { * }, Follow(T) = { +, ) }

Example
Compute Follow for:
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
First(E) = First(T) = First(F) = { ( , id }, First(E') = { + , ε }, First(T') = { * , ε }
Follow(E): contains $ since E is the start symbol. Also, since F → ( E ), First(")") is in Follow(E). Thus Follow(E) = { ) , $ }
Follow(E'): E → TE' implies Follow(E) is in Follow(E') (E' is at the end of the rule, so whatever follows E follows E'). Thus Follow(E') = { ) , $ }
Follow(T): E → TE' implies put in First(E'). Since E' ⇒* ε, put in Follow(E). Since E' → +TE', put in First(E'), and since E' ⇒* ε, put in Follow(E'). Thus Follow(T) = { +, ), $ }
Follow(T') and Follow(F): You do these!

Computing Follow : 2nd Example
Recall:
S → i E t SS' | a
S' → eS | ε
E → b
First(S) = { i, a }, First(S') = { e, ε }, First(E) = { b }
Follow(S): contains $, since S is the start symbol.
Since S → i E t SS', put in First(S') (but not ε).
Since S' ⇒* ε, put in Follow(S).
Since S' → eS, put in Follow(S').
So:
Follow(S) = { e, $ }
Follow(S') = Follow(S) (HOW?)
Follow(E) = { t }

First & Follow – Examine the Derivation
Consider the following derivation (input: ( id ) * id + id$):
E ⇒ TE' ⇒ FT'E' ⇒ ( E ) T'E'
  ⇒ ( TE' ) T'E' ⇒ ( FT'E' ) T'E' ⇒ ( id T'E' ) T'E'
  ⇒ ( id E' ) T'E' ⇒ ( id ) T'E' ⇒ ( id ) * FT'E'
  ⇒ ( id ) * id T'E' ⇒ ( id ) * id E' ⇒ ( id ) * id + TE'
  ⇒ ( id ) * id + FT'E' ⇒ ( id ) * id + id T'E' ⇒* ( id ) * id + id
The steps where T'E' disappears use T' → ε and E' → ε.
What's First for each non-terminal? (Still needs your First(F).)
What's Follow for each non-terminal? (Still needs your Follow(F).)

In Class Exercise One – First and Follow
A → B A'
A' → o B A'
A' → ε
B → D
B → ( C )
C → D C'
C' → a D C'
C' → ε
D → x
D → n x
First(A) = { x, n, ( }      Follow(A) = { $ }
First(A') = { o, ε }        Follow(A') = { $ }
First(B) = { x, n, ( }      Follow(B) = { $, o }
First(C) = { x, n }         Follow(C) = { ) }
First(C') = { a, ε }        Follow(C') = { ) }
First(D) = { x, n }         Follow(D) = { $, a, o, ) }

In Class Exercise Two – First and Follow
Original grammar:
S → ( L ) | E b | a
L → L , S | S | c
E → [ E ] | L c | E d
After removing left recursion:
S → ( L ) | E b | a
L → c L' | ( L ) L' | E b L' | a L'
L' → , S L' | ε
E → [ E ] E' | c L' c E' | ( L ) L' c E' | a L' c E'
E' → d E' | b L' c E' | ε
For the grammar after removing left recursion:
First(S) = { (, a, [, c }      Follow(S) = { $, ",", ), c }
First(L) = { (, a, [, c }      Follow(L) = { ) }
First(E) = { (, a, [, c }      Follow(E) = { b, ] }
First(E') = { b, d, ε }        Follow(E') = { b, ] }
First(L') = { ",", ε }         Follow(L') = { ), c }

In Class Exercise Three – First and Follow
S → A A | ε
A → B A' | z A'
A' → A B A' | ε
B → x A' y B' | y B'
B' → A' y B' | ε
First(S) = { x, y, z, ε }      Follow(S) = { $ }
First(A) = { x, y, z }         Follow(A) = { x, y, z, $ }
First(A') = { x, y, z, ε }     Follow(A') = { x, y, z, $ }
First(B) = { x, y }            Follow(B) = { x, y, z, $ }
First(B') = { x, y, z, ε }     Follow(B') = { x, y, z, $ }

Constructing Top-Down Parsing Table
Algorithm:
1. Repeat steps 2 & 3 for each rule A → α
2. Terminal a in First(α)? Add A → α to M[A, a]
3.1 ε in First(α)? Add A → α to M[A, b] for all terminals b in Follow(A).
3.2 ε in First(α) and $ in Follow(A)? Add A → α to M[A, $]
4. All undefined entries are errors.
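The table-construction steps can be sketched in Python for the expression grammar. The First and Follow sets are typed in by hand from the slides, and the encoding (lists of symbols, `[]` for ε, a list of productions per table cell so that multiply-defined entries would show up) is an assumption of this sketch.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"*", "+", ")", END},
}

def first_of_string(symbols):
    """First of a symbol string; an empty string yields {EPS}."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})             # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def build_table(grammar):
    """Map (non-terminal, terminal) to the list of productions placed there."""
    table = {}
    for a, productions in grammar.items():
        for rhs in productions:
            fs = first_of_string(rhs)
            targets = fs - {EPS}               # step 2
            if EPS in fs:
                targets |= FOLLOW[a]           # steps 3.1 and 3.2 ($ is kept in Follow)
            for terminal in targets:
                table.setdefault((a, terminal), []).append(rhs)
    return table                               # missing keys are error entries (step 4)

TABLE = build_table(GRAMMAR)
conflicts = {key: rules for key, rules in TABLE.items() if len(rules) > 1}
print(len(TABLE), "entries,", len(conflicts), "multiply-defined")
```

Keeping a list per cell is a deliberate choice: a grammar is LL(1) exactly when no cell ends up with more than one production, so `conflicts` doubles as an LL(1) check.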

Step 2: Each rule A → α, using First(α)
E → TE'
E' → + TE' | ε
T → FT'
T' → * FT' | ε
F → ( E ) | id
First(E,F,T) = { (, id }     First(E') = { +, ε }          First(T') = { *, ε }
Follow(E,E') = { ), $ }      Follow(F) = { *, +, ), $ }    Follow(T,T') = { +, ), $ }

By rule 2 (terminals in First(α)):
E → TE' : First(TE') = First(T) = { (, id }, so M[E, (] : E → TE' and M[E, id] : E → TE'
E' → +TE' : First(+TE') = { + }, so M[E', +] : E' → +TE'
The same reasoning places T → FT', T' → *FT', F → ( E ), and F → id.
By rule 3 (ε in First(α)):
E' → ε : ε in First(ε), so M[E', )] : E' → ε (3.1) and M[E', $] : E' → ε (3.2), due to Follow(E')
T' → ε : ε in First(ε), so M[T', +] : T' → ε (3.1), M[T', )] : T' → ε (3.1), and M[T', $] : T' → ε (3.2), due to Follow(T')

Table M
Non-terminal   INPUT SYMBOL
               id          +             *             (           )          $
E              E → TE'                                 E → TE'
E'                         E' → +TE'                               E' → ε     E' → ε
T              T → FT'                                 T → FT'
T'                         T' → ε        T' → *FT'                 T' → ε     T' → ε
F              F → id                                  F → (E)

Constructing Top-Down Parsing Table Example 2
S → i E t SS' | a
S' → eS | ε
E → b
First(S) = { i, a }      First(S') = { e, ε }      First(E) = { b }
Follow(S) = { e, $ }     Follow(S') = { e, $ }     Follow(E) = { t }
S → i E t SS' : First(i E t SS') = { i }
S → a : First(a) = { a }
E → b : First(b) = { b }
S' → eS : First(eS) = { e }
S' → ε : First(ε) = { ε }, so use Follow(S') = { e, $ }

Non-terminal   INPUT SYMBOL
               a         b         e                   i                t      $
S              S → a                                   S → iEtSS'
S'                                 S' → ε, S' → eS                             S' → ε
E                        E → b

First & Follow – Derivation to Parsing Table
Consider the following derivation. What are the implications for table M?
Input: ( id ) * id + id$
E ⇒ TE' ⇒ FT'E' ⇒ ( E ) T'E'
  ⇒ ( TE' ) T'E' ⇒ ( FT'E' ) T'E' ⇒ ( id T'E' ) T'E' ⇒ ( id E' ) T'E' ⇒ ( id ) T'E' ⇒ ( id ) * FT'E'
  ⇒ ( id ) * id T'E' ⇒ ( id ) * id E' ⇒ ( id ) * id + TE'
  ⇒ ( id ) * id + FT'E' ⇒ ( id ) * id + id T'E' ⇒* ( id ) * id + id
Implication                                  M-table entry
1. E → TE' and ( in First(E)                 M[E, (]
2. T → FT' and ( in First(T)                 M[T, (]
3. F → ( E ) and ( in First(F)               M[F, (]
4. E' → ε and ) in Follow(E')                M[E', )]
5. Since $ in Follow(T'), T' → ε             M[T', $]
6. Since $ in Follow(E'), E' → ε             M[E', $]

Detailed Example
Step 1: Compute First and Follow
S → E $
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | Id
Overall:
First(S) = { }
First(E) = { ( , id } = First(F)
First(E') = { + , ε }
First(T') = { * , ε }
First(T) = First(F) = { ( , id }
Follow(E) = Follow(E') = { ), $ }
Follow(T) = Follow(T') = { +, ), $ }
Follow(F) = { +, *, ), $ }

Detailed Example
Step 2: Build the parser table
Parser Table
NT     Input Symbols
       Id          +             *             (           )          $
S      S → E$                                  S → E$
E      E → TE'                                 E → TE'
E'                 E' → +TE'                               E' → ε     E' → ε
T      T → FT'                                 T → FT'
T'                 T' → ε        T' → *FT'                 T' → ε     T' → ε
F      F → Id                                  F → (E)
Step 3: Input: Id + Id * Id $
