Lecture 6 Grammar Modifications


1 Lecture 6 Grammar Modifications
CSCE 531 Compiler Construction
Lecture 6: Grammar Modifications
Topics
- Grammars for expressions and if-then-else
- Formal proofs of L(G)
- Top-down parsing
- Left factoring
- Removing left recursion
Readings:
Homework: 4.1, 4.2a, 4.6a, 4.11a
January 30, 2006

2 Overview
Last Time
- Should have mentioned DFA minimization
- Grammars, derivations, ambiguity (Lec05-Grammars: slides 1-27)
Today's Lecture
- Ambiguity in classic programming-language grammars: expressions, if-then-else
- Top-down parsing
References
- Sections
- Parse demos
- Chomsky Hierarchy: types of grammars and recognizers
Homework: 4.1, 4.2a, 4.6a, 4.11a

3 DFA Minimization (Algorithm 3.6 in text)
We will not cover this algorithm beyond this slide.
- Partition the states into F and Q - F (final and non-final states).
- Refine the partitioning as much as possible.
- Refinement: a string x = x1 x2 … xt distinguishes two states Si and Sk if, starting in each and following the path determined by x, one ends in an accepting state and the other ends in a non-accepting state.
[Figure: following x from Si leads to an accepting state Sa; following x from Sk leads to a non-accepting state Sna.]
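For the curious, here is a minimal Python sketch of the partition-refinement idea (Moore-style). The dictionary encoding of delta, the assumption that the DFA is complete, and all names are illustrative; this is not Algorithm 3.6 verbatim.

def minimize_dfa(states, alphabet, delta, finals):
    """Moore-style DFA minimization by partition refinement.
    delta: dict mapping (state, symbol) -> state; must be total (complete DFA)."""
    # Start with the partition {F, Q - F}.
    partition = [set(finals), set(states) - set(finals)]
    partition = [block for block in partition if block]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Two states stay together only if, on every symbol,
            # they move into the same block of the current partition.
            groups = {}
            for s in block:
                key = tuple(
                    next(k for k, b in enumerate(partition) if delta[(s, a)] in b)
                    for a in alphabet)
                groups.setdefault(key, set()).add(s)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition  # each block becomes one state of the minimal DFA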

4 LM Derivation of 5 * X + 3 * Y +17
E  E + T | E – T | T T  T * F | T / F | F F  id | num | ( E ) E  E+T  E+E+T  T+E+T T*F+E+T T*F+E+T F*F+E+T num*F+E+T  num*id+E+T  num*id+T+T  num*id+T*F+T  num*id+F*F+T  num*id+num*F+T  … Parse tree E

5 Notes on rewritten grammar
The rewritten grammar is more complex: more nonterminals and more productions, and it requires more steps in a derivation. But it eliminates the ambiguity, so the derivation is forced to make the right choices.

6 Ambiguous Grammar 2 If-else
Another classic ambiguity problem in programming languages is the if-else:
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | other stmts
Abbreviated: S → if E then S | if E then S else S | OS

7 Ambiguity
This sentential form has two leftmost derivations (two parse trees):
if Expr1 then if Expr2 then Stmt1 else Stmt2

8 Removing the ambiguity
To eliminate the ambiguity:
- We must rewrite the grammar to avoid generating the problem.
- We must associate each else with the innermost unmatched if, splitting the statement nonterminal into "with-else" (matched) and "no-else" (unmatched) forms, as on the next slide.

9 Removing the IF-ELSE Ambiguity
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | other stmts
becomes
Stmt          → MatchedStmt | UnmatchedStmt
MatchedStmt   → if Expr then MatchedStmt else MatchedStmt | OtherStatements
UnmatchedStmt → if Expr then Stmt
              | if Expr then MatchedStmt else UnmatchedStmt
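A small Python sketch of the practical consequence (the token names, the single-token Expr, and the tuple representation of trees are illustrative simplifications, not from the lecture): a hand-written parser that greedily consumes an else makes exactly the choice the Matched/Unmatched grammar forces, attaching the else to the innermost if.

def parse_stmt(tokens, pos=0):
    """Parse: Stmt -> if Expr then Stmt [else Stmt] | other.
    Greedily attaching 'else' matches it with the innermost 'if',
    the same tree the Matched/Unmatched grammar permits."""
    if tokens[pos] == 'if':
        pos += 1                          # 'if'
        pos += 1                          # Expr (a single token here, for brevity)
        pos += 1                          # 'then'
        then_part, pos = parse_stmt(tokens, pos)
        else_part = None
        if pos < len(tokens) and tokens[pos] == 'else':
            pos += 1                      # this 'else' binds to the innermost open 'if'
            else_part, pos = parse_stmt(tokens, pos)
        return ('if', then_part, else_part), pos
    return tokens[pos], pos + 1           # some other statement

tree, _ = parse_stmt(['if', 'E1', 'then', 'if', 'E2', 'then', 's1', 'else', 's2'])
# tree == ('if', ('if', 's1', 's2'), None): the else goes with the inner if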

10 Ambiguity
if Expr1 then if Expr2 then Stmt1 else Stmt2

11 Ambiguity that is more than Grammar
The examples of ambiguity we have looked at are solved by tweaking the CFG. Overloading can create deeper ambiguity, e.g., a = f(17):
- In some languages, f could be either a function call or a subscripted variable.
- Disambiguating this requires semantics, not just syntax: declarations and type information to say what "f" is.
- This requires an extra-grammatical solution; we must handle these cases with a different mechanism, stepping outside the grammar rather than using a more complex grammar.

12 Regular versus Context free Languages
A regular language is a set of strings that can be:
- recognized by a DFA,
- recognized by an NFA, or (equivalently)
- denoted by a regular expression.
Are there examples of non-regular languages?
A context-free language is one that is generated by a context-free grammar, for example:
S → 0S1 | ε    (which generates { 0^n 1^n | n ≥ 0 }, a non-regular language)

13 Formal verification of L(G)
Example 4.7: induction on the length of the derivation of a sentential form.
- Formulate the inductive hypothesis in terms of sentential forms.
- Basis step: derivations of length n = 1.
- Assume derivations of length n satisfy the inductive hypothesis; show that derivations of length n + 1 also satisfy it.
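As a concrete instance of this proof pattern, consider the grammar S → 0S1 | ε from slide 12 (not necessarily the grammar of Example 4.7) and show L(G) = { 0^n 1^n | n ≥ 0 }:
- Inductive hypothesis: every sentential form derivable from S has the shape 0^k S 1^k or 0^k 1^k.
- Basis (length-1 derivations): S ⇒ 0S1 gives 0^1 S 1^1; S ⇒ ε gives 0^0 1^0.
- Inductive step: a derivation of length n + 1 is a derivation of length n, which by hypothesis ends in 0^k S 1^k (it must still contain S to be extendable), followed by one more step: applying S → 0S1 yields 0^(k+1) S 1^(k+1), and applying S → ε yields 0^k 1^k.
- Hence every sentence of G has the form 0^n 1^n. Conversely, S ⇒* 0^n S 1^n ⇒ 0^n 1^n, so every such string is derivable, and L(G) = { 0^n 1^n | n ≥ 0 }.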

14 Regular Grammars (Linear Grammars)
A right-linear grammar is a restricted form of context-free grammar in which the productions have the special forms
N → T* N2
N → T*
where N and N2 (possibly the same) are nonterminals and T* is a string of tokens. In these productions, if there is a nonterminal on the right-hand side, it is the last symbol.
Linear grammars (right-linear and left-linear) are also called regular grammars. Why?

15 DFA  Right-linear Grammar
Consider a DFA M = (Q, Σ, δ, q0, F) (note the ordering of the components, and that the state set is Q). Construct a grammar G = (N, T, P, S) where:
- N = Q, i.e., each state corresponds to a nonterminal
- T = Σ
- For each transition δ(Si, a) = Sj, add a production Si → a Sj
- For each state S in F, add a production S → ε
- The start symbol is the nonterminal for q0
Then L(M) = L(G). How would we formally prove this?
Thus the regular languages are a subset of the context-free languages.
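A small Python sketch of this construction (the dictionary encoding of δ, the tuple representation of productions, and the example DFA for strings containing "ab" are illustrative choices, not from the text):

def dfa_to_right_linear(states, alphabet, delta, start, finals):
    """Build right-linear productions S_i -> a S_j for each move
    delta(S_i, a) = S_j, plus N -> epsilon for each accepting state."""
    productions = []
    for (s, a), t in delta.items():
        productions.append((s, [a, t]))      # S_i -> a S_j
    for s in finals:
        productions.append((s, []))          # S -> epsilon
    return {'nonterminals': set(states), 'terminals': set(alphabet),
            'productions': productions, 'start': start}

# Illustrative DFA for strings over {a, b} that contain the substring "ab":
delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q2',
         ('q2', 'a'): 'q2', ('q2', 'b'): 'q2'}
g = dfa_to_right_linear({'q0', 'q1', 'q2'}, {'a', 'b'}, delta, 'q0', {'q2'})
# e.g. ('q0', ['a', 'q1']) encodes the production q0 -> a q1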

16 Example DFA  Regular Grammar
For the DFA of Fig. 3.23, p. 117:
N0 → a N1 | b N0
N1 → a N1 | b N2
N2 → …
N3 → …

17 Chomsky Hierarchy
Noam Chomsky (linguist) defined formal levels of grammars:
- Regular grammars: N → T* N (linear)
- Context-free grammars: N → (N ∪ T)*
- Context-sensitive grammars: αNω → αβω (we may rewrite N to β, but only in the context α…ω)
- Unrestricted grammars: α → β, with α and β in (N ∪ T)*
The corresponding recognizers:
- DFA (regular)
- Pushdown automaton: a DFA augmented with a stack (context-free)
- Linear-bounded Turing machine (context-sensitive)
- Turing machine (unrestricted)

18 Non-Context Free Languages
Certain languages have no context-free grammar that generates them; they are not context-free languages. Examples:
- Σ = {a, b, c}, L = { wcw | w is in (a|b)* }
- { a^n b^n c^n | n > 0 }
They are, however, context sensitive. (Or are they? Not relevant for this course: we would eliminate any non-context-free construct from a programming language, at least for parsing.)
A context-sensitive grammar for { a^n b^n c^n }:
S → abc | aSBc
cB → Bc
bB → bb
Alternative characterization of context-sensitive productions: α → β with |α| ≤ |β|.
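For example, the context-sensitive grammar above derives aabbcc (the n = 2 case):
S ⇒ aSBc ⇒ aabcBc ⇒ aabBcc ⇒ aabbcc
using S → aSBc, then S → abc, then cB → Bc, then bB → bb.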

19 Parsing Techniques
Top-down parsers:
- Start at the root and try to generate the parse tree.
- Pick a production and try to match the input.
- If we make a bad choice, backtrack and try another choice.
- Grammars that allow backtrack-free (predictive) parsing sometimes exist.
Bottom-up parsers:
- Start at the leaves and grow toward the root.
- As input is consumed, encode possibilities in an internal state.
- Start in a state valid for legal first tokens.
- Bottom-up parsers handle a large class of grammars.

20 Top-down Parsing Algorithm
Add the start symbol as the root of the parse tree.
While the frontier of the parse tree ≠ input:
- Pick the leftmost nonterminal A in the frontier.
- Choose an A-production A → βi (out of A → β1 | β2 | … | βk) and expand the tree; the other choices are saved on a stack for backtracking.
- If a token is added to the frontier that does not match the input, backtrack and choose another production. (If we run out of choices, the parse fails.)
We now will look at modifications to grammars to facilitate top-down parsing.
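Before that, a minimal Python sketch of the backtracking loop above (the dict-of-lists grammar encoding is illustrative, and recursion stands in for the explicit stack of saved choices):

def parse(grammar, sentential, tokens):
    """Try to rewrite the leftmost nonterminal of `sentential` until it equals
    `tokens`, backtracking over productions whenever a terminal prefix mismatches.
    grammar: dict mapping nonterminal -> list of right-hand sides (symbol lists)."""
    i = 0
    # Match leading terminals of the frontier against the input.
    while i < len(sentential) and sentential[i] not in grammar:
        if i >= len(tokens) or sentential[i] != tokens[i]:
            return False                  # frontier disagrees with input: backtrack
        i += 1
    if i == len(sentential):
        return i == len(tokens)           # all terminals; must match the whole input
    A = sentential[i]                     # leftmost nonterminal
    for rhs in grammar[A]:                # try each A-production in turn
        if parse(grammar, sentential[:i] + rhs + sentential[i + 1:], tokens):
            return True
    return False                          # every choice failed

g = {'S': [['(', 'S', ')'], ['x']]}       # illustrative grammar
print(parse(g, ['S'], ['(', '(', 'x', ')', ')']))   # True

Note that on a left-recursive grammar this procedure can expand the leftmost nonterminal forever, which is one reason the left-recursion elimination later in the lecture matters.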

21 Reconsider Our Expression Grammar
First we number the productions for documentation:
1. E → E + T
2. E → E - T
3. E → T
4. T → T * F
5. T → T / F
6. T → F
7. F → id
8. F → num
9. F → ( E )
Example: 5 * X + 3 * Y + 17; token sequence: num * id + num * id + num

Prod | Sentential form    | Notes
  1  | E + T              | How did we choose this one?
  1  | E + T + T          | How / why?
  3  | T + T + T          | ?
  4  | T * F + T + T      | ?
  6  | F * F + T + T      |
  8  | num * F + T + T    |
  7  | num * id + T + T   |

22 How do we choose which production?
The choice should be guided by trying to match the input. For example, if the next input symbol is the token "if" and we are choosing between
S → if Expr then S else S
S → while Expr do S
what choice is best? Here the choice is obvious. But what if both candidate productions begin with "if"? (See the next slide.)

23 How do we choose which production? (continued)
But if the next input symbol is the token "if" and we are choosing between
S → if Expr then S else S
S → if Expr then S
what choice is best? Now the choice is not obvious!

24 Other Grammar Modifications to Guide Parser
Left Factoring
Stmt → if Expr then Stmt else Stmt | if Expr then Stmt
If the next tokens are "if" and "id" we have no basis to choose; in fact we would have to look ahead as far as the "else". So we factor:
Stmt → if Expr then Stmt Rest
Rest → else Stmt | ε

Left Recursion
A → Aα | β
Why "recursive"? A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ … ⇒ Aα^n ⇒ βα^n
What do we do? Rewrite as A → βA' and A' → αA' | ε, giving
A ⇒ βA' ⇒ βαA' ⇒ βααA' ⇒ … ⇒ βα^n A' ⇒ βα^n

25 General Left Factoring Algorithm
Input: a grammar G.
Output: an equivalent left-factored grammar.
Method: for each nonterminal A, find the longest prefix α common to two or more A-productions
A → αβ1 | αβ2 | … | αβm | ξ
where ξ represents the A-productions that do not start with the prefix α. Replace them with
A  → αA' | ξ
A' → β1 | β2 | … | βm
Repeat until no two alternatives for a nonterminal share a common prefix.
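A rough Python sketch of one factoring step, following the method above (the list-of-symbol-lists encoding, the helper common_prefix, and the name Rest in the example are illustrative choices; the step is repeated until no two alternatives of any nonterminal share a prefix):

def common_prefix(a, b):
    """Longest common prefix of two lists of grammar symbols."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(alts, new_nt):
    """One left-factoring step for the alternatives of a single nonterminal.
    alts: list of right-hand sides (lists of symbols).
    Returns (new alternatives for A, alternatives for new_nt), or (alts, None)
    if no two alternatives share a nonempty prefix."""
    best = []
    for i in range(len(alts)):
        for j in range(i + 1, len(alts)):
            p = common_prefix(alts[i], alts[j])
            if len(p) > len(best):
                best = p
    if not best:
        return alts, None
    factored = [rhs for rhs in alts if rhs[:len(best)] == best]
    rest     = [rhs for rhs in alts if rhs[:len(best)] != best]
    new_A       = [best + [new_nt]] + rest          # A -> alpha A' | xi
    new_nt_alts = [rhs[len(best):] for rhs in factored]  # A' -> beta_1 | ... ([] = epsilon)
    return new_A, new_nt_alts

stmt = [['if', 'Expr', 'then', 'Stmt', 'else', 'Stmt'],
        ['if', 'Expr', 'then', 'Stmt']]
print(left_factor(stmt, 'Rest'))
# ([['if', 'Expr', 'then', 'Stmt', 'Rest']],
#  [['else', 'Stmt'], []])   # [] stands for the epsilon production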

26 Left Factoring (from Engineering a Compiler by Keith D. Cooper and Linda Torczon)
A graphical explanation of the same idea:
A → αβ1 | αβ2 | αβ3
becomes
A → αZ
Z → β1 | β2 | β3
[Figure: the three alternatives share the prefix α; factoring moves the choice among the βi into the new nonterminal Z.]

27 Left Factoring (from Engineering a Compiler by Keith D. Cooper and Linda Torczon)
Graphically:
Factor → Identifier
       | Identifier [ ExprList ]
       | Identifier ( ExprList )
With one word of lookahead there is no basis for choice: every alternative begins with Identifier.
After factoring out Identifier, the remaining choice is among [ ExprList ], ( ExprList ), and nothing, and the next word determines the correct alternative.

28 Eliminating Left Recursion: Expr Grammar
General approach for immediate left recursion: replace
A → Aα | β
with
A  → βA'
A' → αA' | ε
So for the expression grammar, E → E + T | E - T | T is rewritten as
E  → T E'
E' → + T E' | - T E' | ε

29 Eliminating Left Recursion: Expr Grammar
Replace
T → T * F | T / F | F
with
T  → F T'
T' → * F T' | / F T' | ε
No replacement is needed for the F-productions, so the grammar becomes:
E  → T E'
E' → + T E' | - T E' | ε
T  → F T'
T' → * F T' | / F T' | ε
F  → id | num | ( E )
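With the left recursion gone, a backtrack-free recursive-descent parser follows almost directly: one procedure per nonterminal. A minimal Python sketch under simplifying assumptions (input is a list of token strings such as 'num', 'id', '+', '*', '(', ')'; the '-' and '/' alternatives are omitted for brevity; each procedure returns the index of the next unconsumed token):

def parse_E(toks, i):
    # E -> T E'
    return parse_Eprime(toks, parse_T(toks, i))

def parse_Eprime(toks, i):
    # E' -> + T E' | epsilon
    if i < len(toks) and toks[i] == '+':
        return parse_Eprime(toks, parse_T(toks, i + 1))
    return i                              # epsilon: consume nothing

def parse_T(toks, i):
    # T -> F T'
    return parse_Tprime(toks, parse_F(toks, i))

def parse_Tprime(toks, i):
    # T' -> * F T' | epsilon
    if i < len(toks) and toks[i] == '*':
        return parse_Tprime(toks, parse_F(toks, i + 1))
    return i

def parse_F(toks, i):
    # F -> num | id | ( E )
    if toks[i] == '(':
        i = parse_E(toks, i + 1)
        assert toks[i] == ')', "expected )"
        return i + 1
    assert toks[i] in ('num', 'id'), "expected num or id"
    return i + 1

toks = ['num', '*', 'id', '+', 'num']     # 5 * X + 3, already tokenized
assert parse_E(toks, 0) == len(toks)      # the whole input is consumed: parse succeeds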

30 Eliminating Immediate Left Recursion
In general, consider all the A-productions:
A → Aα1 | Aα2 | … | Aαn | β1 | β2 | … | βm
Replace them with:
A  → β1 A' | β2 A' | … | βm A'
A' → α1 A' | α2 A' | … | αn A' | ε
But not all left recursion is immediate: a grammar is left recursive whenever A ⇒+ Aβ for some nonterminal A. Consider:
S → Aa | Bb | c
A → Ca | aA | a
B → bB | b
C → Sc
Then S ⇒ Aa ⇒ Caa ⇒ Scaa, so S is left recursive even though no single production is.

31 Eliminating Left Recursion Algorithm
Algorithm 4.1: Eliminating Left Recursion
Input: a grammar with no cycles or ε-productions.
Output: an equivalent grammar with no left recursion.
Arrange the nonterminals in some order A1, A2, …, An.
for i = 1 to n do
    for j = 1 to i-1 do
        replace each production of the form Ai → Aj γ
        by the productions Ai → δ1 γ | δ2 γ | … | δk γ,
        where Aj → δ1 | δ2 | … | δk are the current Aj-productions
    end
    eliminate the immediate left recursion among the Ai-productions
end
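A compact Python sketch of Algorithm 4.1 (the dict-of-lists grammar encoding, the prime-suffixed names for new nonterminals, and the use of [] for ε are illustrative choices; as the algorithm requires, the input grammar is assumed to have no cycles or ε-productions):

def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of right-hand sides (lists of symbols).
    order: the chosen ordering A1, A2, ..., An of the nonterminals."""
    g = {A: [list(rhs) for rhs in alts] for A, alts in grammar.items()}
    for i, Ai in enumerate(order):
        # Substitute earlier nonterminals Aj (j < i) at the front of Ai's rules.
        for Aj in order[:i]:
            new_alts = []
            for rhs in g[Ai]:
                if rhs and rhs[0] == Aj:
                    new_alts += [delta + rhs[1:] for delta in g[Aj]]
                else:
                    new_alts.append(rhs)
            g[Ai] = new_alts
        # Remove immediate left recursion on Ai:
        # A -> A alpha | beta  becomes  A -> beta A', A' -> alpha A' | epsilon.
        recursive = [rhs[1:] for rhs in g[Ai] if rhs and rhs[0] == Ai]
        others    = [rhs for rhs in g[Ai] if not rhs or rhs[0] != Ai]
        if recursive:
            Ai2 = Ai + "'"
            g[Ai]  = [beta + [Ai2] for beta in others]
            g[Ai2] = [alpha + [Ai2] for alpha in recursive] + [[]]  # [] = epsilon
    return g

g = {'E': [['E', '+', 'T'], ['T']], 'T': [['T', '*', 'F'], ['F']],
     'F': [['id'], ['num'], ['(', 'E', ')']]}
print(eliminate_left_recursion(g, ['E', 'T', 'F']))
# E -> T E', E' -> + T E' | eps ; T -> F T', T' -> * F T' | eps ; F unchanged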

32 Eliminating Left Recursion
How does this algorithm work?
1. Impose an arbitrary order on the nonterminals.
2. The outer loop cycles through the nonterminals in that order.
3. The inner loop ensures that no production expanding Ai has a nonterminal Aj, with j < i, as the first symbol of its right-hand side.
4. The last step of each outer iteration converts any direct (immediate) recursion on Ai into right recursion, using the transformation shown earlier.
5. New nonterminals are added at the end of the order and have no left recursion.
Invariant at the start of the i-th outer iteration: for all k < i, no production expanding Ak has a nonterminal As, with s < k, as the first symbol of its right-hand side.

33 Example (from Engineering a Compiler by Keith D. Cooper and Linda Torczon)
Order of symbols: G, E, T
G → E
E → E + T
E → T
T → E ~ T
T → id

34 Example, continued (from Engineering a Compiler by Cooper and Torczon)
1. Ai = G: no earlier nonterminal to substitute and no immediate left recursion in the G-productions, so nothing changes:
G → E
E → E + T
E → T
T → E ~ T
T → id

35 Example, continued (from Engineering a Compiler by Cooper and Torczon)
2. Ai = E: eliminate the immediate left recursion in the E-productions:
G → E
E → T E'
E' → + T E'
E' → ε
T → E ~ T
T → id

36 Example, continued (from Engineering a Compiler by Cooper and Torczon)
3. Ai = T, As = E: substitute the E-productions into T → E ~ T:
G → E
E → T E'
E' → + T E'
E' → ε
T → T E' ~ T
T → id

37 Example, continued (from Engineering a Compiler by Cooper and Torczon)
4. Ai = T: eliminate the immediate left recursion in the T-productions:
G → E
E → T E'
E' → + T E'
E' → ε
T → id T'
T' → E' ~ T T'
T' → ε

38 Predictive Parsing: Basic Idea and FIRST Sets
Given A → α | β, the parser should be able to choose between α and β.
FIRST sets: for a right-hand side α, define FIRST(α) as the set of tokens that can appear as the first symbol in some string derived from α. That is, x ∈ FIRST(α) iff α ⇒* xγ for some γ.
If A → α and A → β both appear in the grammar and FIRST(α) ∩ FIRST(β) = ∅, this would appear to allow the parser to make a correct choice with a lookahead of exactly one symbol. (If there are no ε-productions, then it does.)
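A small Python sketch that computes FIRST for every nonterminal by iterating to a fixed point (the dict-of-lists encoding and the 'eps' marker for ε are our conventions, not the text's):

def first_sets(grammar):
    """grammar: dict nonterminal -> list of right-hand sides (lists of symbols).
    Returns dict nonterminal -> FIRST set; the marker 'eps' means the
    nonterminal can derive the empty string."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                before = len(first[A])
                nullable_so_far = True
                for X in rhs:
                    if X in grammar:                    # nonterminal
                        first[A] |= first[X] - {'eps'}
                        if 'eps' not in first[X]:
                            nullable_so_far = False
                            break
                    else:                               # terminal
                        first[A].add(X)
                        nullable_so_far = False
                        break
                if nullable_so_far:                     # every symbol can vanish
                    first[A].add('eps')
                if len(first[A]) != before:
                    changed = True
    return first

g = {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []],
     'T': [['F', "T'"]], "T'": [['*', 'F', "T'"], []],
     'F': [['id'], ['num'], ['(', 'E', ')']]}
print(first_sets(g))
# FIRST(E) = FIRST(T) = FIRST(F) = {id, num, (};  FIRST(E') = {+, eps};  FIRST(T') = {*, eps}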

