
Compiler Construction 1  Chapter 4: Syntax Analysis
Topics to cover:
- Context-Free Grammars: Concepts and Notation
- Writing and rewriting a grammar
- Syntax Error Handling and Recovery

Compiler Construction 2  Introduction
- Why CFG?
  - A CFG gives a precise syntactic specification of a programming language.
  - It enables automatic generation of efficient parsers.
  - It enables automatic generation of translators.
  - Language extension becomes easier.
- The role of the parser
  - Take tokens from the scanner, parse them, and report syntax errors.
  - Not just parsing: in a syntax-directed translator, the parser also drives type checking, semantic analysis, and IR generation.

Compiler Construction 3  Example of CFG
A C– program is made out of functions, a function out of declarations and blocks, a block out of statements, a statement out of expressions, etc.:
  ⟨program⟩  → ⟨function⟩ ⟨program⟩ | ε
  ⟨function⟩ → ⟨type⟩ id ( ⟨params⟩ ) { ⟨block⟩ }
  ⟨params⟩   → ⟨param-list⟩ | ε
  ⟨type⟩     → void | int | float
  …
  ⟨compound-stmt⟩ → { ⟨stmt-list⟩ }

Compiler Construction 4  Notational Conventions
The following symbols are terminals:
- Lower-case letters such as a, b, c
- Operator symbols (+, -, etc.) and punctuation symbols (parentheses, commas, etc.)
- Digits such as 0, 1, 2, etc.
- Boldface strings such as id or if

Compiler Construction 5  Notational Conventions
- Nonterminals:
  - Upper-case letters such as A, B, C
  - The letter S – the start symbol
  - Lower-case italic names such as expr or stmt
- Grammar symbols (terminals or nonterminals):
  - Upper-case letters late in the alphabet, such as X, Y, Z
- Strings of terminals:
  - Lower-case letters late in the alphabet, such as u, v, ..., z
- Strings of grammar symbols:
  - Lower-case Greek letters, such as α, β, γ

Compiler Construction 6  Example
  expr → expr op expr
  expr → ( expr )
  expr → - expr
  expr → id
  op   → + | - | * | / | ↑
Using the notational shorthand:
  E → E A E | ( E ) | - E | id
  A → + | - | * | / | ↑
Nonterminals: E and A
Start symbol: E

Compiler Construction 7  Derivation
Given a string αAβ, if A → γ is a production, then we can replace αAβ by αγβ, written αAβ ⇒ αγβ.
- ⇒ means "derives in one step"
- ⇒+ means "derives in one or more steps"
- ⇒* means "derives in zero or more steps"
The language L(G) generated by G is the set of terminal strings w such that S ⇒+ w. Such a string w is called a sentence of G.
If S ⇒* α, where α may contain nonterminals, we say α is a sentential form of G.

Compiler Construction 8  Exercise
- What is a sentence of the language L defined by the C++ grammar G?
  Answer: a C++ program.
- Is the following string a sentence or a sentential form?
    int parse( ) {}
  Answer: a sentential form.

Compiler Construction 9  Derivation (cont.)
Consider the following grammar G0:
  E → E + E | E * E | ( E ) | - E | id
The string -(id + id) is a sentence of G0 because there is a derivation:
  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
- Leftmost derivation: at each step only the leftmost nonterminal is replaced.
- Rightmost derivation: at each step only the rightmost nonterminal is replaced.
Exercise: Is id-id a sentence of G0? No. Is -id+id a sentence? Yes.
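The derivation above happens to be leftmost. For contrast, here is a rightmost derivation of the same sentence (an added illustration, not on the original slide), replacing the rightmost nonterminal at every step:
  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)
Both derivations correspond to the same parse tree, which is the point of the next slide.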

Compiler Construction 10  Parse Tree and Derivation
A parse tree can be viewed as a graphical representation of a derivation that ignores the order of replacements.
  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
  E
  ├── -
  └── E
      ├── (
      ├── E
      │   ├── E
      │   │   └── id
      │   ├── +
      │   └── E
      │       └── id
      └── )
- Interior nodes: nonterminals
- Leaves: terminals
- The children of a node correspond to the right-hand side of the production applied at that node.

Compiler Construction 11  CFG is more powerful than RE
- Every RE can be described by a CFG.
  - Example: (a|b)*abb
    A → aA | bA | abb
- Converting an NFA into a CFG:
  - For each state i of the NFA, create a nonterminal symbol Ai.
  - If state i goes to state j on input a, add the production Ai → aAj.
  - If state i goes to state j on ε, add Ai → Aj.
  - If state i is an accepting state, add Ai → ε.
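A small sketch of this NFA-to-CFG construction (the function and data-structure names are illustrative, not from the slides; Python is used only for demonstration):

  def nfa_to_cfg(transitions, epsilon_moves, accepting):
      """Build CFG productions from an NFA, one nonterminal A<i> per state i.

      transitions:   dict mapping (state, input_symbol) -> set of target states
      epsilon_moves: dict mapping state -> set of target states reached on epsilon
      accepting:     set of accepting states
      """
      productions = []                                        # list of (lhs, rhs) pairs
      for (i, a), targets in transitions.items():
          for j in targets:
              productions.append((f"A{i}", f"{a} A{j}"))      # Ai -> a Aj
      for i, targets in epsilon_moves.items():
          for j in targets:
              productions.append((f"A{i}", f"A{j}"))          # Ai -> Aj
      for i in accepting:
          productions.append((f"A{i}", "ε"))                  # Ai -> ε
      return productions

  # NFA recognizing (a|b)*abb: start state 0, accepting state 3
  trans = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
  for lhs, rhs in nfa_to_cfg(trans, {}, {3}):
      print(lhs, "->", rhs)

The resulting grammar with start symbol A0 generates exactly the strings accepted by the NFA.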

Compiler Construction 12  Why do we need RE?
- REs are sufficiently powerful for lexical rules.
- REs are more concise and easier to understand.
- A more efficient lexical analyzer can be constructed from an RE than from a CFG.
- Separating the lexical part from the non-lexical part has advantages such as modularization and easier porting.
- Exercise: what would happen if we had no token definitions?

Compiler Construction 13  Defects in CFG
- Useless nonterminals, e.g.
    S → A | B
    A → a
    B → Bb
    C → c
  (B never derives a string of terminals and C is unreachable from S, so both are useless.)
- Ambiguity
- Top-down parsing issues:
  - Left recursion
  - Left factoring

Compiler Construction 14  Ambiguity
A grammar is ambiguous if it produces more than one parse tree for some sentence.
- Example 1: A+B+C — is it (A+B)+C or A+(B+C)?
  Improper production: expr → expr + expr | id
- Example 2: A+B*C — is it (A+B)*C or A+(B*C)?
  Improper production: expr → expr + expr | expr * expr
- Example 3: if E1 then if E2 then S1 else S2 — which then does the else match with?
  Improper productions: stmt → if expr then stmt | if expr then stmt else stmt

Compiler Construction 15  Two parse trees of example 3
Parse tree 1 (the else attaches to the inner if):
  stmt
  ├── if
  ├── E1
  ├── then
  └── stmt
      ├── if
      ├── E2
      ├── then
      ├── S1
      ├── else
      └── S2
Parse tree 2 (the else attaches to the outer if):
  stmt
  ├── if
  ├── E1
  ├── then
  ├── stmt
  │   ├── if
  │   ├── E2
  │   ├── then
  │   └── S1
  ├── else
  └── S2

Compiler Construction 16  Eliminating Ambiguity
- Operator associativity:
    expr → expr + term | term
- Operator precedence:
    expr → expr + term | term
    term → term * factor | factor
- Dangling else:
    stmt      → matched | unmatched
    matched   → if expr then matched else matched
    unmatched → if expr then stmt
              | if expr then matched else unmatched
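As a worked check (added for illustration, and assuming the usual extra production factor → id), the precedence grammar above derives id + id * id in only one way, with the * grouped below the +:
  expr ⇒ expr + term
       ⇒ term + term
       ⇒ factor + term
       ⇒ id + term
       ⇒ id + term * factor
       ⇒ id + factor * factor
       ⇒ id + id * factor
       ⇒ id + id * id
Because term cannot derive a +, the sentence can only be grouped as id + (id * id), so the ambiguity of example 2 is gone.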

Compiler Construction 17  Eliminating Left Recursion
- Immediate left recursion
  - Example: A → Aα | β
- Transformation:
  Given A → Aα1 | Aα2 | … | β1 | β2 | …, where no βi begins with A, replace the A-productions by
    A  → β1A' | β2A' | …
    A' → α1A' | α2A' | … | ε
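A standard worked instance of this transformation (the classic expression grammar, added here for concreteness): with β = T and α = + T,
  E → E + T | T
becomes
  E  → T E'
  E' → + T E' | ε
The new grammar generates the same language but is no longer left-recursive, so a top-down parser can use it.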

Compiler Construction 18  Indirect Left Recursion
- Example:
    S → Aa | b
    A → Ac | Sd | ε
- Transformation (assuming no cycles A ⇒+ A):
  1. Arrange the nonterminals in some order A1, A2, …, An.
  2. for i := 1 to n do
       for j := 1 to i-1 do begin
         replace each production of the form Ai → Ajγ by Ai → δ1γ | δ2γ | … | δkγ,
         where Aj → δ1 | δ2 | … | δk are the current Aj-productions
       end;
       eliminate the immediate left recursion among the Ai-productions

Compiler Construction 19
In the above example,
  S → Aa | b
  A → Ac | Sd | ε
A → Sd is replaced by A → Aad | bd, giving A → Ac | Aad | bd | ε. Eliminating the immediate left recursion among the A-productions then yields:
  S  → Aa | b
  A  → bdA' | A'
  A' → cA' | adA' | ε

Compiler Construction 20  Algorithm 4.1: Eliminating Left Recursion
- This algorithm systematically eliminates left recursion from a grammar; in particular, it shows how to remove indirect left recursion.
- Precondition: the grammar has no cycles and no ε-productions.
  - A cycle means A ⇒+ A. The precondition avoids producing A → A during nonterminal replacement. For example, with A → BA and B → Ab | ε, replacing B by ε in A → BA yields A → A, and a cycle shows up.
  - ε-productions also make the algorithm more complex: A → BCD may effectively become A → CD when B derives ε, so handling only the leftmost nonterminal is not sufficient.

Compiler Construction 21  Indirect Left Recursion
  A → Bb | a
  B → Cc | b
  C → Dd | c
  D → Aa | d
  A ⇒ Bb ⇒ Ccb ⇒ Ddcb ⇒ Aadcb
  C ⇒ Dd ⇒ Aad ⇒ Bbad ⇒ Ccbad
We need to expose the immediate left recursions and then eliminate them, and some ordering is needed. Suppose we replace A → Bb by A → Ccb and then continue with B → Cc, giving B ⇒ Cc ⇒ Ddc ⇒ Aadc ⇒ Ccbadc, and so on: substituting in this order would never expose the immediate left recursion in this example.

Compiler Construction 22  Algorithm 4.1
  for i := 1 to n do begin
    for j := 1 to i-1 do begin
      replace each production of the form Ai → Ajγ by the productions
        Ai → δ1γ | δ2γ | … | δkγ,
      where Aj → δ1 | δ2 | … | δk are the current Aj-productions
    end;
    eliminate the immediate left recursion among the Ai-productions
  end
Key idea: for each nonterminal Ai, every reference to a lower-numbered nonterminal Aj (j < i) is replaced, so the Ai-productions end up referring only to higher-numbered nonterminals.
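A compact sketch of Algorithm 4.1 (names and data representation are my own; each right-hand side is a list of symbols and ε is the empty list):

  def eliminate_left_recursion(nonterminals, prods):
      """prods: dict nonterminal -> list of right-hand sides (each a list of symbols).
      Assumes the grammar has no cycles and no epsilon-productions (the precondition of Algorithm 4.1)."""
      for i, Ai in enumerate(nonterminals):
          # Step 1: substitute lower-numbered nonterminals Aj (j < i) appearing first in Ai's productions.
          for Aj in nonterminals[:i]:
              new_rhss = []
              for rhs in prods[Ai]:
                  if rhs and rhs[0] == Aj:
                      # Ai -> Aj gamma  becomes  Ai -> delta gamma for every Aj -> delta
                      new_rhss += [delta + rhs[1:] for delta in prods[Aj]]
                  else:
                      new_rhss.append(rhs)
              prods[Ai] = new_rhss
          # Step 2: eliminate immediate left recursion among the Ai-productions.
          alphas = [rhs[1:] for rhs in prods[Ai] if rhs and rhs[0] == Ai]
          betas  = [rhs for rhs in prods[Ai] if not (rhs and rhs[0] == Ai)]
          if alphas:
              Ai_new = Ai + "'"
              prods[Ai] = [beta + [Ai_new] for beta in betas]
              prods[Ai_new] = [alpha + [Ai_new] for alpha in alphas] + [[]]   # [] stands for epsilon
      return prods

  # Running example from slide 19, with the epsilon-production dropped to satisfy the precondition:
  g = {"S": [["A", "a"], ["b"]],
       "A": [["A", "c"], ["S", "d"]]}
  eliminate_left_recursion(["S", "A"], g)
  # Result: S -> A a | b,  A -> b d A',  A' -> c A' | a d A' | epsilon

The printed result matches the hand derivation on slide 19 (minus the dropped ε-production).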

Compiler Construction 23
Before the replacement step, an Ai-production may begin with a lower-numbered nonterminal:
  Ai → Ai-1 … | A2 …
After the replacement step, it begins only with a terminal or a higher-numbered nonterminal:
  Ai → Ai+k …
After replacement, there are no backward references.

Compiler Construction 24  Left Factoring
Consider the following grammar:
  A → αβ1 | αβ2
It is not easy to determine whether to expand A to αβ1 or to αβ2. A transformation called left factoring can be applied. The grammar becomes:
  A  → αA'
  A' → β1 | β2

Compiler Construction 25  Exercise
  stmt → if expr then stmt
       | if expr then stmt else stmt
Viewing this as the grammar form A → αβ1 | αβ2, what are α, β1, and β2?
  α:  if expr then stmt
  β1: ε
  β2: else stmt
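Completing the exercise with the left-factoring transformation from the previous slide (worked out here for illustration) gives:
  stmt  → if expr then stmt stmt'
  stmt' → else stmt | ε
A top-down parser can now consume "if expr then stmt" first and decide about the else only when it sees the next token.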

Compiler Construction 26  Syntax Error Handling
- Different types of errors:
  - Lexical
  - Syntactic
  - Semantic
  - Logical
- Error-handling goals:
  - Report errors clearly and accurately
  - Recover quickly
  - Add as little overhead as possible

Compiler Construction 27  Error Handling Strategies
- Don't quit after detecting the first error.
- Avoid introducing "spurious" errors.
- Inhibit error messages that stem from errors occurring too close together.
- Simple error repair is usually sufficient, given the increasing emphasis on interactive computing and good programming environments.

Compiler Construction 28  Error Recovery Strategies
- Panic mode
  - Delete input tokens until one of a designated set of synchronizing tokens is found (a sketch follows below).
- Phrase level
  - Perform local corrections, e.g. to repair punctuation errors.
- Error productions
  - Augment the grammar with error productions.
- Global correction
  - Find a globally least-cost correction to the input string; costly to implement.
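A minimal sketch of panic-mode recovery inside a hand-written recursive-descent parser (all names and the toy production are illustrative, not from the slides); on a syntax error the parser reports it and then discards tokens until a synchronizing token such as ';' or '}' appears:

  SYNC_TOKENS = {";", "}", "EOF"}          # designated synchronizing tokens (an assumed set)

  class Parser:
      def __init__(self, tokens):
          self.tokens = tokens + ["EOF"]
          self.pos = 0

      def peek(self):
          return self.tokens[self.pos]

      def error(self, expected):
          print(f"syntax error: expected {expected}, found {self.peek()}")
          # Panic mode: skip input tokens until a synchronizing token is found.
          while self.peek() not in SYNC_TOKENS:
              self.pos += 1

      def match(self, expected):
          if self.peek() == expected:
              self.pos += 1
          else:
              self.error(expected)

      def stmt(self):
          # stmt -> id = id ;   (a toy production, just to show where recovery hooks in)
          self.match("id")
          self.match("=")
          self.match("id")
          self.match(";")

After recovery the parser resumes at the synchronizing token, so it can go on to check the rest of the input instead of stopping at the first error.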