Compiler Construction, Chapter 4: Syntax Analysis


1 Compiler Construction, Chapter 4: Syntax Analysis
Topics to cover:
- Context-Free Grammars: Concepts and Notation
- Writing and rewriting a grammar
- Syntax Error Handling and Recovery

2 Introduction
- Why CFG
  - A CFG gives a precise syntactic specification of a programming language.
  - Efficient parsers can be generated automatically.
  - It enables automatic translator generation.
  - Language extension becomes easier.
- The role of the parser
  - Takes tokens from the scanner, parses them, and reports syntax errors.
  - Not just parsing: in a syntax-directed translator, the parser also conducts type checking, semantic analysis and IR generation.

3 Example of CFG
A C– program is made out of functions, a function out of declarations and blocks, a block out of statements, a statement out of expressions, etc.
[Grammar productions: the nonterminal names did not survive extraction. Surviving right-hand-side fragments: "| ε", "id ( ) { }", "void | int | float", "{ }".]

4 Notational Conventions
The following symbols are terminals:
- Lower-case letters early in the alphabet, such as a, b, c
- Operators (+, -, etc.) and punctuation symbols (parentheses, commas, etc.)
- Digits such as 0, 1, 2
- Boldface strings such as id or if

5 Notational Conventions
- Nonterminals
  - Upper-case letters such as A, B, C
  - The letter S, the start symbol
  - Lower-case italic names such as expr or stmt
- Grammar symbols (terminals or nonterminals)
  - Upper-case letters late in the alphabet, such as X, Y, Z
- Strings of terminals
  - Lower-case letters late in the alphabet, such as u, v, …, z
- Strings of grammar symbols
  - Lower-case Greek letters, such as α, β, γ

6 Example
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Using the notational shorthand:
E → E A E | ( E ) | - E | id
A → + | - | * | / | ↑
Nonterminals: E and A
Start symbol: E

7 Derivation
Given a string αAβ: if A → γ is a production, then we can replace A by γ, written as αAβ ⇒ αγβ.
- ⇒ means derives in one step
- ⇒+ means derives in one or more steps
- ⇒* means derives in zero or more steps
The language L(G) generated by G is the set of terminal strings w such that S ⇒+ w. The string w is called a sentence of G.
If S ⇒* α, where α may contain nonterminals, we say α is a sentential form of G.

8 Exercise
- What is a sentence of the language L defined by the C++ grammar G?
  Answer: a C++ program
- Is the following string a sentence or a sentential form?
  int parse( ) {}
  Answer: a sentential form

9 Derivation (cont.)
Consider the following grammar G0:
E → E + E | E * E | ( E ) | - E | id
The string -(id + id) is a sentence of G0 because there is a derivation
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Leftmost derivation: only the leftmost nonterminal is replaced at each step.
Rightmost derivation: only the rightmost nonterminal is replaced at each step.
Exercise:
- Is id-id a sentence of G0? No (G0 has no binary minus).
- Is -id+id a sentence? Yes.
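The leftmost derivation above can be replayed mechanically. The following is a minimal sketch, assuming a grammar is stored as a dictionary from nonterminal to a list of right-hand sides; the names `G0`, `leftmost_step` and `derive` are illustrative and do not come from the slides.

```python
# Grammar G0 from the slide: E -> E + E | E * E | ( E ) | - E | id
G0 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]}


def leftmost_step(form, rhs, grammar):
    """Perform one => step: replace the leftmost nonterminal in `form` by `rhs`."""
    for pos, sym in enumerate(form):
        if sym in grammar:
            if rhs not in grammar[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a production")
            return form[:pos] + rhs + form[pos + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(start, choices, grammar):
    """Return every sentential form of the leftmost derivation given by `choices`."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs, grammar))
    return forms


# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
steps = derive("E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]], G0)
for form in steps:
    print(" ".join(form))
```

Every list in `steps` except the last still contains E, so each is a sentential form; only the final one is a sentence.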

10 Parse Tree and Derivation
A parse tree can be viewed as a graphical representation of a derivation that ignores the replacement order.
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Parse tree (children indented under their parent):
E
  -
  E
    (
    E
      E
        id
      +
      E
        id
    )
- Interior node: a nonterminal
- Leaves: terminals
- Children of a node: the right-hand side of the production applied

11 CFG is more powerful than RE
- Every RE can be described by a CFG
  - Example: (a|b)*abb
    A → aA | bA | abb
- Converting an NFA into a CFG
  - For each state i of the NFA, create a nonterminal symbol Ai
  - If state i goes to state j on input a, add the production Ai → aAj
  - Add Ai → Aj if state i goes to state j on ε
  - Add Ai → ε if state i is an accepting state
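The NFA-to-CFG rules above can be sketched in a few lines. This is only an illustration: the transition-list encoding, the function name `nfa_to_cfg`, and the particular four-state NFA for (a|b)*abb are assumptions, not part of the slides.

```python
def nfa_to_cfg(transitions, accepting):
    """Build productions from an NFA.

    transitions: list of (i, symbol, j); symbol None stands for an epsilon move.
    accepting:   iterable of accepting state numbers.
    Returns a dict mapping nonterminal "Ai" to a list of right-hand sides,
    where the empty list [] represents epsilon.
    """
    productions = {}
    for i, symbol, j in transitions:
        # Ai -> a Aj for a move on a, Ai -> Aj for an epsilon move
        rhs = [f"A{j}"] if symbol is None else [symbol, f"A{j}"]
        productions.setdefault(f"A{i}", []).append(rhs)
    for i in accepting:
        # Ai -> epsilon for every accepting state
        productions.setdefault(f"A{i}", []).append([])
    return productions


# Assumed NFA for (a|b)*abb: state 0 loops on a and b, then a, b, b lead to state 3.
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
cfg = nfa_to_cfg(nfa, accepting=[3])
for head, bodies in cfg.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The start symbol of the resulting grammar is the nonterminal of the NFA's start state, here A0.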

12 Why do we need RE?
- REs are sufficiently powerful for lexical rules
- REs are more concise and easier to understand
- A more efficient lexical analyzer can be constructed from REs than from a CFG
- Separating the lexical from the nonlexical part has a few advantages, such as modularization and easier porting
- Exercise: what if we don't have token definitions?

13 Defects in CFG
- Useless nonterminals
  S → A | B
  A → a
  B → Bb
  C → c
  (B never derives a terminal string; C cannot be reached from S.)
- Ambiguity
- Top-down parsing issues
  - Left recursion
  - Left factoring
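Useless nonterminals like B and C in the grammar above can be found with two fixed-point passes. The sketch below assumes a dictionary-based grammar encoding; the function name `useless_nonterminals` and its return convention are illustrative choices.

```python
def useless_nonterminals(grammar, start):
    """Return (non_generating, unreachable) sets of nonterminals.

    grammar maps each nonterminal to a list of right-hand sides (lists of symbols).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    def usable(body, generating):
        # A body is usable if every symbol is a terminal or a generating nonterminal.
        return all(sym in generating or sym not in grammar for sym in body)

    # Pass 1: nonterminals that can derive some string of terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True

    # Pass 2: nonterminals reachable from the start symbol via usable productions.
    reachable, worklist = set(), [start]
    while worklist:
        head = worklist.pop()
        if head in reachable or head not in generating:
            continue
        reachable.add(head)
        for body in grammar[head]:
            if usable(body, generating):
                worklist.extend(sym for sym in body if sym in grammar)

    return set(grammar) - generating, generating - reachable


grammar = {"S": [["A"], ["B"]], "A": [["a"]], "B": [["B", "b"]], "C": [["c"]]}
non_generating, unreachable = useless_nonterminals(grammar, "S")
print(non_generating, unreachable)
```

The order of the passes matters: removing non-generating symbols first can make further symbols unreachable, which is why reachability is computed second.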

14 Ambiguity
A grammar is ambiguous if it produces more than one parse tree for some sentence.
- Example 1: A+B+C (is it (A+B)+C or A+(B+C)?)
  - Improper production: expr → expr + expr | id
- Example 2: A+B*C (is it (A+B)*C or A+(B*C)?)
  - Improper production: expr → expr + expr | expr * expr
- Example 3: if E1 then if E2 then S1 else S2 (which then does the else match?)
  - Improper production: stmt → if expr then stmt | if expr then stmt else stmt
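Ambiguity can be made concrete by counting parse trees. The sketch below assumes the improper grammar expr → expr + expr | expr * expr | id from examples 1 and 2 and counts the trees for a token list; the function name `count_parse_trees` is illustrative.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under expr -> expr + expr | expr * expr | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways expr derives tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


two = count_parse_trees(["id", "+", "id", "*", "id"])
five = count_parse_trees(["id", "+", "id", "+", "id", "+", "id"])
print(two, five)
```

Any result greater than 1 shows that the sentence has more than one parse tree, so the grammar is ambiguous.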

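15 Two parse trees of example 3
Each stmt node is written as stmt[ children ].
- Tree 1 (else matches the inner if): stmt[ if E1 then stmt[ if E2 then S1 else S2 ] ]
- Tree 2 (else matches the outer if): stmt[ if E1 then stmt[ if E2 then S1 ] else S2 ]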

16 Eliminating Ambiguity
- Operator associativity
  expr → expr + term | term
- Operator precedence
  expr → expr + term | term
  term → term * factor | factor
- Dangling else
  stmt → matched | unmatched
  matched → if expr then matched else matched | other
  unmatched → if expr then stmt | if expr then matched else unmatched
  (other stands for any statement that is not an if statement.)

17 Eliminating Left Recursion
- Immediate left recursion
  - Example: A → Aα | β
  - Transformation: given
    A → Aα1 | Aα2 | … | β1 | β2 | …
    where no βi begins with A, we replace the A productions by
    A → β1A' | β2A' | …
    A' → α1A' | α2A' | … | ε
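The transformation above translates almost directly into code. This is a minimal sketch, assuming right-hand sides are lists of symbols with the empty list standing for ε; the function name `eliminate_immediate` and the primed naming of the new nonterminal are illustrative.

```python
def eliminate_immediate(head, bodies):
    """Remove immediate left recursion from the productions of one nonterminal.

    bodies: list of right-hand sides (lists of symbols); [] represents epsilon.
    Returns a dict of the rewritten productions.
    Assumes there is no production head -> head (no cycle).
    """
    new = head + "'"
    # A -> A alpha  contributes alpha; everything else is a beta.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to do
    return {
        head: [beta + [new] for beta in betas],            # A  -> beta A'
        new: [alpha + [new] for alpha in alphas] + [[]],   # A' -> alpha A' | epsilon
    }


# expr -> expr + term | term
result = eliminate_immediate("expr", [["expr", "+", "term"], ["term"]])
print(result)
```

For the expression grammar this yields expr → term expr' and expr' → + term expr' | ε, which generates the same strings without left recursion.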

18 Indirect Left Recursion
- Example:
  S → Aa | b
  A → Ac | Sd | ε
- Transformation (assuming no cycles A ⇒+ A)
  1. Arrange the nonterminals in some order A1, A2, …, An
  2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
         replace each Ai → Ajγ by Ai → δ1γ | δ2γ | …
         where Aj → δ1 | δ2 | … are the current Aj productions
       end
       eliminate the immediate left recursion among the Ai productions
     end

19 In the above example,
S → Aa | b
A → Ac | Sd | ε
the production A → Sd is replaced by A → Aad | bd, giving
A → Ac | Aad | bd | ε
Eliminating the immediate left recursion among the A productions then yields
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

20 Algorithm 4.1 Eliminating Left Recursion
- This algorithm systematically eliminates left recursion from a grammar, including indirect left recursion.
- Precondition: the grammar has no cycles and no ε-productions.
  - A cycle means A ⇒+ A. Cycles are excluded to avoid producing productions of the form A → A during nonterminal replacement. For example, with A → BA and B → Ab | ε, replacing B by ε in A → BA gives A → A, and a cycle shows up.
  - ε-productions also make the algorithm more complex: A → BCD may effectively become A → CD, so handling only the leftmost nonterminal is not sufficient.

21 Indirect Left Recursion
A → Bb | a
B → Cc | b
C → Dd | c
D → Aa | d
A ⇒ Bb ⇒ Ccb ⇒ Ddcb ⇒ Aadcb
C ⇒ Dd ⇒ Aad ⇒ Bbad ⇒ Ccbad
We need to expose the immediate left recursions and then eliminate them, and some ordering is needed. Suppose we replace A → Bb by A → Ccb and then start with B: B ⇒ Cc ⇒ Ddc ⇒ Aadc ⇒ Ccbadc. This would never expose the immediate left recursion in this example.

22 Algorithm 4.1
for i := 1 to n do begin
  for j := 1 to i-1 do begin
    replace each production of the form Ai → Ajγ by the productions Ai → δ1γ | δ2γ | …
    where Aj → δ1 | δ2 | … are the current Aj productions
  end
  eliminate the immediate left recursion among the Ai productions
end
Key idea: for each nonterminal Ai, every production that begins with a lower-numbered nonterminal Aj (where j < i) is rewritten, so that afterwards the Ai productions begin only with terminals or higher-numbered nonterminals.
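As a sketch of how Algorithm 4.1 might look in code, the function below follows the two nested loops literally. The dictionary encoding, the name `eliminate_left_recursion`, and the primed names for new nonterminals are assumptions made for illustration. It is applied to the earlier example S → Aa | b, A → Ac | Sd | ε, which happens to work even though it contains an ε-production.

```python
def eliminate_left_recursion(grammar, order):
    """Sketch of Algorithm 4.1.

    grammar: dict mapping nonterminal -> list of right-hand sides (lists of symbols),
             with [] standing for epsilon.
    order:   the chosen ordering A1, ..., An of the nonterminals.
    """
    g = {head: [list(body) for body in bodies] for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj gamma by Ai -> delta gamma for each current Aj -> delta.
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [b[1:] for b in g[ai] if b and b[0] == ai]
        if alphas:
            betas = [b for b in g[ai] if not b or b[0] != ai]
            new = ai + "'"
            g[ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g


example = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
fixed = eliminate_left_recursion(example, ["S", "A"])
for head, bodies in fixed.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

The output matches the hand-worked result: S → Aa | b, A → bdA' | A', A' → cA' | adA' | ε.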

23 [Diagram of the ordered productions]
A1 → …
A2 → …
…
Ai-1 → Ai+k …
Ai → Ai-1 … | A2 …
…
An → …
After replacement, there will be no backward references.

24 Left Factoring
Consider the following grammar:
A → αβ1 | αβ2
It is not easy to determine whether to expand A to αβ1 or to αβ2. A transformation called left factoring can be applied. The grammar becomes:
A → αA'
A' → β1 | β2

25 Exercise
stmt → if expr then stmt | if expr then stmt else stmt
For the grammar form A → αβ1 | αβ2, what is α? β1? β2?
- α: if expr then stmt
- β1: ε
- β2: else stmt
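Left factoring can be sketched as a single pass that groups alternatives by their first symbol and pulls out the longest common prefix of each group. The function names `common_prefix` and `left_factor` are illustrative, and one pass is assumed to be enough; grammars whose remainders again share a prefix would need the pass repeated.

```python
def common_prefix(bodies):
    """Longest common prefix (as a list of symbols) of several right-hand sides."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, bodies):
    """One pass of left factoring over the productions of `head`."""
    groups = {}
    for body in bodies:
        # Group alternatives by first symbol; epsilon bodies get the key None.
        groups.setdefault(body[0] if body else None, []).append(body)
    result = {head: []}
    count = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        count += 1
        new = head + "'" * count
        result[head].append(prefix + [new])                        # A  -> alpha A'
        result[new] = [body[len(prefix):] for body in group]       # A' -> beta1 | beta2
    return result


stmt_bodies = [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
]
factored = left_factor("stmt", stmt_bodies)
print(factored)
```

For the exercise grammar this gives stmt → if expr then stmt stmt' and stmt' → ε | else stmt, in agreement with the answer α = if expr then stmt, β1 = ε, β2 = else stmt.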

26 Syntax Error Handling
- Different types of errors
  - Lexical
  - Syntactic
  - Semantic
  - Logical
- Error handling goals
  - Report errors clearly and accurately
  - Recover quickly
  - Be fast: add little overhead to the processing of correct programs

27 Error Handling Strategies
- Don't quit after detecting the first error.
- Avoid introducing "spurious" errors.
- Inhibit error messages that stem from errors uncovered too close together.
- Simple error repair will be sufficient, given the increasing emphasis on interactive computing and good programming environments.

28 Error Recovery Strategies
- Panic mode
  - Delete input tokens until one of a designated set of synchronizing tokens is found.
- Phrase level
  - Apply a local correction to repair punctuation errors.
- Error productions
  - Augment the grammar with error productions.
- Global correction
  - Find a globally least-cost correction to the input string; costly to implement.
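Panic-mode recovery is the easiest of these strategies to illustrate. The sketch below assumes a toy language whose only statement form is `id = num ;` and uses `;` and `}` as the synchronizing set. The statement form, the `SYNC` set and the function name `parse_statements` are all assumptions made for the example.

```python
SYNC = {";", "}"}  # assumed set of synchronizing tokens


def parse_statements(tokens):
    """Parse a sequence of statements of the assumed form  id = num ;

    Returns (number of well-formed statements, list of error messages).
    On an error the parser discards tokens up to and including the next
    synchronizing token, then resumes with the following statement.
    """
    expected = ["id", "=", "num", ";"]
    parsed, errors, pos = 0, [], 0
    while pos < len(tokens):
        ok = True
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
            else:
                ok = False
                break
        if ok:
            parsed += 1
            continue
        errors.append(f"syntax error near token {pos}")
        # Panic mode: skip input until a synchronizing token is found ...
        while pos < len(tokens) and tokens[pos] not in SYNC:
            pos += 1
        pos += 1  # ... and consume it (harmless if already past the end)
    return parsed, errors


source = ["id", "=", "num", ";",
          "id", "=", "=", "num", ";",   # malformed statement
          "id", "=", "num", ";"]
outcome = parse_statements(source)
print(outcome)
```

Only one error is reported for the malformed statement, and the statement after it is still parsed, which is the point of resynchronizing rather than stopping at the first error.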

