Chapter 4. Syntax Analysis (1)

2 Application of a production A → ω in a derivation step γi ⇒ γi+1.

3 Formal grammars (1/3)
Example: Let G1 have N = {A, B, C}, T = {a, b, c}, and the set of productions
Σ → A
A → aABC
A → abC
CB → BC
bB → bb
bC → bc
cC → cc
The reader should convince himself that the word a^k b^k c^k is in L(G1) for all k ≥ 1 and that only these words are in L(G1); for example, Σ ⇒ A ⇒ aABC ⇒ aabCBC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc. That is, L(G1) = { a^k b^k c^k | k ≥ 1 }.

4 Formal grammars (2/3)
Example: Grammar G2 is a modification of G1:
G2: Σ → A
    A → aABC
    A → abC
    CB → BC
    bB → bb
    bC → b
The reader may verify that L(G2) = { a^k b^k | k ≥ 1 }. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.

5 Formal grammars (3/3)
Example: A simpler grammar that generates { a^k b^k | k ≥ 1 } is the grammar G3:
G3: Σ → S
    S → aSb
    S → ab
A derivation of a^3 b^3 is Σ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb. The reader may verify that L(G3) = { a^k b^k | k ≥ 1 }.
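To connect the grammar to the recognition problem that parsing addresses, here is a minimal sketch (not from the slides) of a recognizer for L(G3); the function name and the plain-string input are illustrative choices.

# Recognizer for L(G3) = { a^k b^k | k >= 1 }, mirroring S -> ab and S -> aSb.
def derives_from_S(w):
    """Return True iff the string w can be derived from S in G3."""
    if w == "ab":                        # S -> ab
        return True
    if len(w) >= 4 and w[0] == "a" and w[-1] == "b":
        return derives_from_S(w[1:-1])   # S -> aSb: strip one 'a' and one 'b'
    return False

assert derives_from_S("aaabbb")          # the derivation of a^3 b^3 shown above
assert not derives_from_S("aabbb")       # unequal counts are rejected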

6 The four types of formal grammars

Type | Format of productions                                                     | Remarks
0    | φAψ → φωψ                                                                 | Unrestricted substitution rules (contracting permitted)
1    | φAψ → φωψ, ω ≠ λ (Σ → λ permitted)                                        | Context sensitive (noncontracting)
2    | A → ω, ω ≠ λ (Σ → λ permitted)                                            | Context free
3    | A → aB, A → a (right linear) or A → Ba, A → a (left linear) (Σ → λ permitted) | Regular

7 Context-Sensitive Grammars (Type 1) and Unrestricted Grammars (Type 0)
Definition: A context-sensitive grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form φAψ → φωψ, with ω ≠ λ. The grammar may also contain the production Σ → λ. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language. An unrestricted (type 0) grammar places no restriction on ω in its productions.

8 Context-Free Grammars (Type 2)
Definition: A context-free grammar G = (N, T, P, Σ) is a formal grammar in which all productions are of the form A → ω, where A ∈ N ∪ {Σ} and ω ∈ (N ∪ T)* - {λ}. The grammar may also contain the production Σ → λ. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.

9 Regular Grammars (Type 3) (1/2)
Definition: A production of the form A → aB or A → a, where A ∈ N ∪ {Σ}, B ∈ N, and a ∈ T, is called a right linear production. A production of the form A → Ba or A → a is a left linear production. A formal grammar is right linear if it contains only right linear productions, and left linear if it contains only left linear productions; in either case it may also contain the production Σ → λ. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.

10 Regular Grammars (Type 3) (2/2)
Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:
G1 (left linear):  Σ → B1 | 1,  A → B1 | 1,  B → A0
G2 (right linear): Σ → 1B | 1,  A → 1B | 1,  B → 0A
The reader may verify that L(G1) = (10)*1 = 1(01)* = L(G2).
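As a quick cross-check (not part of the slides), the two regular expressions above can be compared by brute force over all binary strings up to an arbitrary length bound:

# Verify that (10)*1 and 1(01)* accept exactly the same strings of length < 8.
import re
from itertools import product

for n in range(8):
    for s in map("".join, product("01", repeat=n)):
        assert bool(re.fullmatch(r"(10)*1", s)) == bool(re.fullmatch(r"1(01)*", s))
print("(10)*1 and 1(01)* agree on all binary strings of length < 8")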

11 Ambiguity (1/2)
Example: Consider the context-free grammar G:
Σ → S
S → SS
S → ab
The sentence ababab has two distinct leftmost derivations, for example
Σ ⇒ S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ ababab and
Σ ⇒ S ⇒ SS ⇒ abS ⇒ abSS ⇒ ababS ⇒ ababab,
and these derivations correspond to different tree diagrams. The grammar G is ambiguous with respect to the sentence ababab: if the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.

12 Ambiguity (2/2)  Definition: A context-free grammar is ambiguous if and only if it generates some sentence by two or more distinct leftmost derivations.

13 Fig Position of parser in compiler model.

14 Syntax Error Handling (1/2)
Probable errors:
– lexical, such as misspelling an identifier, keyword, or operator
– syntactic, such as an arithmetic expression with unbalanced parentheses
– semantic, such as an operator applied to an incompatible operand
– logical, such as an infinitely recursive call

15 Syntax Error Handling (2/2)
The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly and accurately.
– It should recover from each error quickly enough to be able to detect subsequent errors.
– It should not significantly slow down the processing of correct programs.

16 Error-Recovery Strategies
– panic mode
– phrase level
– error productions
– global correction

17 Example 4.2: The grammar with the following productions defines simple arithmetic expressions.
expr → expr op expr | ( expr ) | - expr | id
op → + | - | * | /

18 Notational Conventions (1/2)
1. These symbols are terminals:
   i) Lower-case letters early in the alphabet, such as a, b, c.
   ii) Operator symbols such as +, -, etc.
   iii) Punctuation symbols such as parentheses, comma, etc.
   iv) The digits 0, 1, ..., 9.
   v) Boldface strings such as id or if.
2. These symbols are nonterminals:
   i) Upper-case letters early in the alphabet, such as A, B, C.
   ii) The letter S, which, when it appears, is usually the start symbol.
   iii) Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.

19 Notational Conventions (2/2)
4. Lower-case letters late in the alphabet, chiefly u, v, ..., z, represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
6. If A → α1, A → α2, ..., A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | ... | αk. We call α1, α2, ..., αk the alternatives for A.
7. Unless otherwise stated, the left side of the first production is the start symbol.

20 Derivations
We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn. The symbol ⇒ means "derives in one step". Often we wish to say "derives in zero or more steps"; for this purpose we can use the symbol ⇒*. Thus:
1. α ⇒* α for any string α, and
2. if α ⇒* β and β ⇒ γ, then α ⇒* γ.

21 Fig Building the parse tree from derivation (4.4):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)   (Grammar 4.4)

22 Eliminating Ambiguity
The ambiguous grammar
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
can be rewritten as the unambiguous grammar
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt
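For illustration only (not from the slides): a hand-written recursive-descent parser typically resolves the same ambiguity the way the unambiguous grammar does, by letting the innermost if consume an immediately following else. The token names and tuple-based parse tree below are illustrative assumptions.

def parse_stmt(tokens, pos):
    """Parse one statement from tokens[pos:]; return (tree, next_position)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]                       # a one-token "expr", for brevity
        assert tokens[pos + 2] == "then"
        then_part, pos = parse_stmt(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1                      # "other": any single token

tree, _ = parse_stmt(["if", "E1", "then", "if", "E2", "then", "S1", "else", "S2"], 0)
print(tree)   # ('if', 'E1', ('if', 'E2', 'S1', 'S2')) -- the else binds to the inner if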

23 Elimination of Left Recursion
No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then, we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
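As an illustration (not from the slides), the transformation can be written out directly if a grammar's alternatives are represented as lists of symbols; this dictionary-based representation and the function name are assumptions made for the sketch.

def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
       A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | epsilon.
       Alternatives are lists of grammar symbols; [] stands for epsilon."""
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [nt]]   # left-recursive tails
    betas  = [alt      for alt in alternatives if alt[:1] != [nt]]  # the remaining alternatives
    if not alphas:                          # no immediate left recursion: nothing to do
        return {nt: alternatives}
    new_nt = nt + "'"
    return {
        nt:     [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],      # [] = epsilon
    }

# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}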

24 Left Factoring
In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become
A → αA'
A' → β1 | β2
Example: The language L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }.
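A matching sketch for left factoring a pair of alternatives, under the same illustrative representation (again not from the slides):

def left_factor(nt, alt1, alt2):
    """Rewrite A -> alpha beta1 | alpha beta2 as A -> alpha A' and A' -> beta1 | beta2."""
    i = 0
    while i < min(len(alt1), len(alt2)) and alt1[i] == alt2[i]:
        i += 1                                   # length of the common prefix alpha
    if i == 0:
        return {nt: [alt1, alt2]}                # no common prefix: nothing to factor
    new_nt = nt + "'"
    return {nt: [alt1[:i] + [new_nt]], new_nt: [alt1[i:], alt2[i:]]}  # [] = epsilon

# stmt -> if expr then stmt else stmt | if expr then stmt   becomes
# stmt -> if expr then stmt stmt'     and   stmt' -> else stmt | epsilon
print(left_factor("stmt",
                  ["if", "expr", "then", "stmt", "else", "stmt"],
                  ["if", "expr", "then", "stmt"]))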

25 Fig Steps in top-down parse. (a)(b)(c)

26 Fig Transition diagrams for grammar (4.11): one diagram for each of the nonterminals E, E', T, T', and F.
Grammar (4.11):
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
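Each transition diagram corresponds to one procedure of a recursive-descent parser, with the ε-alternative taken when the lookahead does not begin the other alternative. The following is a minimal sketch (not from the slides); the Parser class, its method names, and the use of a plain token list ending in "$" are illustrative assumptions.

class Parser:
    """Recursive-descent parser for grammar (4.11); input is a list of terminals ending in '$'."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal}, found {self.peek()}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T(); self.E_prime()

    def E_prime(self):           # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+"); self.T(); self.E_prime()

    def T(self):                 # T -> F T'
        self.F(); self.T_prime()

    def T_prime(self):           # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*"); self.F(); self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id", "$"])
p.E()
assert p.peek() == "$"           # id + id * id accepted: only the endmarker remains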

27 Fig Simplified transition diagrams: (a), (b), (c), (d).

28 Fig Simplified transition diagrams for arithmetic expressions.

29 Fig Model of a nonrecursive predictive parser.

30 Nonrecursive Predictive Parsing
With stack-top symbol X and current input symbol a, the parser acts as follows (a short code sketch after the parsing table below walks through these steps):
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error-recovery routine.

31 Fig Parsing table M for grammar (4.11).

Nonterminal | id       | +          | *          | (        | )       | $
E           | E → TE'  |            |            | E → TE'  |         |
E'          |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T           | T → FT'  |            |            | T → FT'  |         |
T'          |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F           | F → id   |            |            | F → (E)  |         |
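A sketch (not from the slides) of the nonrecursive predictive parsing loop from the previous slide, driven by the table M for grammar (4.11) above. The dictionary encoding of M, the TERMINALS set, and the token list are illustrative assumptions; productions are stored as lists of symbols, with [] standing for ε.

M = {
    ("E",  "id"): ["T", "E'"],      ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T",  "id"): ["F", "T'"],      ("T",  "("): ["F", "T'"],
    ("T'", "+"):  [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F",  "id"): ["id"],           ("F",  "("): ["(", "E", ")"],
}
TERMINALS = {"id", "+", "*", "(", ")", "$"}

def parse(tokens):
    stack, pos = ["$", "E"], 0            # start symbol E on top of the endmarker
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":         # step 1: successful completion
            return
        if X in TERMINALS:                # step 2: match a terminal
            if X != a:
                raise SyntaxError(f"expected {X}, found {a}")
            stack.pop(); pos += 1
        elif (X, a) in M:                 # step 3: expand a nonterminal
            prod = M[X, a]
            print(X, "->", " ".join(prod) or "epsilon")
            stack.pop()
            stack.extend(reversed(prod))  # push the right side with its first symbol on top
        else:
            raise SyntaxError(f"error entry M[{X}, {a}]")

parse(["id", "+", "id", "*", "id", "$"])  # prints the productions used, as in the next figure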

32 Fig Moves made by predictive parser on input id + id * id.

STACK       | INPUT          | OUTPUT
$E          | id + id * id$  |
$E' T       | id + id * id$  | E → TE'
$E' T' F    | id + id * id$  | T → FT'
$E' T' id   | id + id * id$  | F → id
$E' T'      | + id * id$     |
$E'         | + id * id$     | T' → ε
$E' T +     | + id * id$     | E' → +TE'
$E' T       | id * id$       |
$E' T' F    | id * id$       | T → FT'
$E' T' id   | id * id$       | F → id
$E' T'      | * id$          |
$E' T' F *  | * id$          | T' → *FT'
$E' T' F    | id$            |
$E' T' id   | id$            | F → id
$E' T'      | $              |
$E'         | $              | T' → ε
$           | $              | E' → ε

33 Fig Parsing table M for grammar (4.13).

Grammar (4.13): S → iEtS | iEtSeS | a, E → b. The table is built from its left-factored form S → iEtSS' | a, S' → eS | ε, E → b.

Nonterminal | a      | b      | e                | i            | t | $
S           | S → a  |        |                  | S → iEtSS'   |   |
S'          |        |        | S' → ε, S' → eS  |              |   | S' → ε
E           |        | E → b  |                  |              |   |

34 Fig Synchronizing tokens added to the parsing table for grammar (4.11) above.

Nonterminal | id       | +          | *          | (        | )       | $
E           | E → TE'  |            |            | E → TE'  | synch   | synch
E'          |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T           | T → FT'  | synch      |            | T → FT'  | synch   | synch
T'          |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F           | F → id   | synch      | synch      | F → (E)  | synch   | synch
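A hedged sketch (not from the slides) of panic-mode recovery on top of the loop above, using the common convention that a blank entry skips the offending input token, a synch entry pops the nonterminal, and an unmatched terminal on top of the stack is popped as if it had been matched. Note that the figure below is slightly smarter on its first move: it skips ")" and resumes with E because id is in FIRST(E), whereas this sketch would pop E via the synch entry M[E, )].

def parse_with_recovery(tokens, M, TERMINALS, start="E"):
    """Predictive parsing with panic-mode recovery; M may map (X, a) to 'synch'.
    Call it with the M and TERMINALS of the sketch above, after adding the synch
    entries shown in this figure, e.g. M[("E", ")")] = "synch"."""
    stack, pos = ["$", start], 0
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return                                  # done (possibly after reported errors)
        if X == "$":                                # stack exhausted, input remains
            print(f"error, skip {a}"); pos += 1
        elif X in TERMINALS:
            if X == a:
                stack.pop(); pos += 1
            else:                                   # missing terminal: pretend it was matched
                print(f"error, {X} inserted"); stack.pop()
        elif M.get((X, a)) == "synch":              # synchronizing token: give up on X
            print(f"error, M[{X}, {a}] = synch; {X} has been popped"); stack.pop()
        elif (X, a) in M:
            stack.pop(); stack.extend(reversed(M[X, a]))
        else:                                       # blank entry: skip the input token
            print(f"error, skip {a}"); pos += 1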

35 Fig Parsing and error recovery moves made by predictive parser.

STACK       | INPUT          | OUTPUT
$E          | ) id * + id$   | error, skip )
$E          | id * + id$     | id is in FIRST(E)
$E' T       | id * + id$     |
$E' T' F    | id * + id$     |
$E' T' id   | id * + id$     |
$E' T'      | * + id$        |
$E' T' F *  | * + id$        |
$E' T' F    | + id$          | error, M[F, +] = synch
$E' T'      | + id$          | F has been popped
$E'         | + id$          |
$E' T +     | + id$          |
$E' T       | id$            |
$E' T' F    | id$            |
$E' T' id   | id$            |
$E' T'      | $              |
$E'         | $              |
$           | $              |