1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2009.

Slides:



Advertisements
Similar presentations
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Advertisements

Lecture # 7 Chapter 4: Syntax Analysis. What is the job of Syntax Analysis? Syntax Analysis is also called Parsing or Hierarchical Analysis. A Parser.
Mooly Sagiv and Roman Manevich School of Computer Science
Top-Down Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Compiler Constreuction 1 Chapter 4 Syntax Analysis Topics to cover: Context-Free Grammars: Concepts and Notation Writing and rewriting a grammar Syntax.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Top-Down Parsing.
1 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice Grammars and Parsing.
Chapter 4 Chang Chi-Chung
Chapter 3 Chang Chi-Chung Parse tree intermediate representation The Role of the Parser Lexical Analyzer Parser Source Program Token Symbol.
– 1 – CSCE 531 Spring 2006 Lecture 7 Predictive Parsing Topics Review Top Down Parsing First Follow LL (1) Table construction Readings: 4.4 Homework: Program.
COP4020 Programming Languages Computing LL(1) parsing table Prof. Xin Yuan.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
1 October 2, October 2, 2015October 2, 2015October 2, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Top-Down Parsing - recursive descent - predictive parsing
Chapter 5 Top-Down Parsing.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Concordia University Department of Computer Science and Software Engineering Click to edit Master title style COMPILER DESIGN Review Joey Paquet,
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
Joey Paquet, Lecture 12 Review. Joey Paquet, Course Review Compiler architecture –Lexical analysis, syntactic analysis, semantic.
1 Compiler Construction Syntax Analysis Top-down parsing.
Lesson 5 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CSI 3120, Syntactic analysis, page 1 Syntactic Analysis and Parsing Based on A. V. Aho, R. Sethi and J. D. Ullman Compilers: Principles, Techniques and.
Joey Paquet, 2000, Lecture 5 Error Recovery Techniques in Top-Down Predictive Syntactic Analysis.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Pembangunan Kompilator.  The parse tree is created top to bottom.  Top-down parser  Recursive-Descent Parsing ▪ Backtracking is needed (If a choice.
UNIT - 2 -Compiled by: Namratha Nayak | Website for Students | VTU - Notes - Question Papers.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Top-Down Parsing The parse tree is created top to bottom. Top-down parser –Recursive-Descent Parsing.
1 Compiler Construction Syntax Analysis Top-down parsing.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Chapter 3 Chang Chi-Chung
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
Parser: CFG, BNF Backus-Naur Form is notational variant of Context Free Grammar. Invented to specify syntax of ALGOL in late 1950’s Uses ::= to indicate.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
Lecture #12 Parsing Types.
Syntax Analysis Part I Chapter 4
Introduction to Top Down Parser
Syntax Analysis Chapter 4.
Syntax Analysis Part II
Syntax Analysis Sections :.
Top-Down Parsing CS 671 January 29, 2008.
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Top-Down Parsing The parse tree is created top to bottom.
Chapter 4 Top Down Parser.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Predictive Parsing Program
Presentation transcript:

1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,

2 Position of a Parser in the Compiler Model Lexical Analyzer Parser and rest of front-end Source Program Token, tokenval Symbol Table Get next token Lexical errorSyntax error Semantic error Intermediate representation

3 The Parser A parser implements a C-F grammar The role of the parser is twofold: 1.To check syntax (= string recognizer) –And to report syntax errors accurately 2.To invoke semantic actions –For static semantics checking, e.g. type checking of expressions, functions, etc. –For syntax-directed translation of the source code to an intermediate representation

4 Syntax-Directed Translation One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods Possible IR output: –Abstract syntax trees (ASTs) –Control-flow graphs (CFGs) with triples, three-address code, or register transfer list notation –WHIRL (SGI Pro64 compiler) has 5 IR levels!

5 Error Handling A good compiler should assist in identifying and locating errors –Lexical errors: important, compiler can easily recover and continue –Syntax errors: most important for compiler, can almost always recover –Static semantic errors: important, can sometimes recover –Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required –Logical errors: hard or impossible to detect

6 Viable-Prefix Property The viable-prefix property of parsers allows early detection of syntax errors –Goal: detection of an error as soon as possible without further consuming unnecessary input –How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language … for (;) … … DO 10 I = 1;0 … Error is detected here Prefix

7 Error Recovery Strategies Panic mode –Discard input until a token in a set of designated synchronizing tokens is found Phrase-level recovery –Perform local correction on the input to repair the error Error productions –Augment grammar with productions for erroneous constructs Global correction –Choose a minimal sequence of changes to obtain a global least-cost correction

8 Grammars (Recap) Context-free grammar is a 4-tuple G = (N, T, P, S) where –T is a finite set of tokens (terminal symbols) –N is a finite set of nonterminals –P is a finite set of productions of the form    where   (N  T)* N (N  T)* and   (N  T)* –S  N is a designated start symbol

9 Notational Conventions Used Terminals a,b,c,…  T specific terminals: 0, 1, id, + Nonterminals A,B,C,…  N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z  (N  T) Strings of terminals u,v,w,x,y,z  T* Strings of grammar symbols , ,   (N  T)*

10 Derivations (Recap) The one-step derivation is defined by  A      where A   is a production in the grammar In addition, we define –  is leftmost  lm if  does not contain a nonterminal –  is rightmost  rm if  does not contain a nonterminal –Transitive closure  * (zero or more steps) –Positive closure  + (one or more steps) The language generated by G is defined by L(G) = {w  T* | S  + w}

11 Derivation (Example) Grammar G = ({E}, {+,*,(,),-,id}, P, E) with productions P =E  E + E E  E * E E  ( E ) E  - E E  id E  - E  - id E * EE * E E  + id * id + id E  rm E + E  rm E + id  rm id + id Example derivations: E  * id + id

12 Chomsky Hierarchy: Language Classification A grammar G is said to be –Regular if it is right linear where each production is of the form A  w Bor A  w or left linear where each production is of the form A  B wor A  w –Context free if each production is of the form A   where A  N and   (N  T)* –Context sensitive if each production is of the form  A      where A  N, , ,   (N  T)*, |  | > 0 –Unrestricted

13 Chomsky Hierarchy L (regular)  L (context free)  L (context sensitive)  L (unrestricted) Where L (T) = { L(G) | G is of type T } That is: the set of all languages generated by grammars G of type T L 1 = { a n b n | n  1 } is context free L 2 = { a n b n c n | n  1 } is context sensitive Every finite language is regular! (construct a FSA for strings in L(G)) Examples:

14 Parsing Universal (any C-F grammar) –Cocke-Younger-Kasimi –Earley Top-down (C-F grammar with restrictions) –Recursive descent (predictive parsing) –LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) –Operator precedence parsing –LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR

15 Top-Down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing Grammar: E  T + T T  ( E ) T  - E T  id Leftmost derivation: E  lm T + T  lm id + T  lm id + id EE T + T id E TT + E T + T

16 Productions of the form A  A  |  |  are left recursive When one of the productions in a grammar is left recursive then a predictive parser loops forever on certain inputs Left Recursion (Recap)

17 A General Systematic Left Recursion Elimination Method Input: Grammar G with no cycles or  -productions Arrange the nonterminals in some order A 1, A 2, …, A n for i = 1, …, n do for j = 1, …, i-1 do replace each A i  A j  with A i   1  |  2  | … |  k  where A j   1 |  2 | … |  k enddo eliminate the immediate left recursion in A i enddo

18 Immediate Left-Recursion Elimination Rewrite every left-recursive production A  A  |  |  | A  into a right-recursive production: A   A R |  A R A R   A R |  A R | 

19 Example Left Recursion Elim. A  B C | a B  C A | A b C  A B | C C | a Choose arrangement: A, B, C i = 1:nothing to do i = 2, j = 1:B  C A | A b  B  C A | B C b | a b  (imm) B  C A B R | a b B R B R  C b B R |  i = 3, j = 1:C  A B | C C | a  C  B C B | a B | C C | a i = 3, j = 2:C  B C B | a B | C C | a  C  C A B R C B | a b B R C B | a B | C C | a  (imm) C  a b B R C B C R | a B C R | a C R C R  A B R C B C R | C C R | 

20 Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A    1 |   2 | … |   n |  with A   A R |  A R   1 |  2 | … |  n

21 Predictive Parsing Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: –Recursive (recursive-descent parsing) –Non-recursive (table-driven parsing)

22 FIRST (Revisited) FIRST(  ) = { the set of terminals that begin all strings derived from  } FIRST(a) = {a}if a  T FIRST(  ) = {  } FIRST(A) =  A  FIRST(  )for A   P FIRST(X 1 X 2 …X k ) = if for all j = 1, …, i-1 :   FIRST(X j ) then add non-  in FIRST(X i ) to FIRST(X 1 X 2 …X k ) if for all j = 1, …, k :   FIRST(X j ) then add  to FIRST(X 1 X 2 …X k )

23 FOLLOW FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = for all (B   A  )  P do add FIRST(  )\{  } to FOLLOW(A) for all (B   A  )  P and   FIRST(  ) do add FOLLOW(B) to FOLLOW(A) for all (B   A)  P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add $ to FOLLOW(A)

24 LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A   1 |  2 | … |  n for nonterminal A the following holds: 1.FIRST(  i )  FIRST(  j ) =  for all i  j 2.if  i  *  then 2.a.  j  *  for all i  j 2.b.FIRST(  j )  FOLLOW(A) =  for all i  j

25 Non-LL(1) Examples GrammarNot LL(1) because: S  S a | a Left recursive S  a S | aFIRST(a S)  FIRST(a)   S  a R |  R  S |  For R: S  *  and   *  S  a R a R  S |  For R: FIRST(S)  FOLLOW(R)  

26 Recursive-Descent Parsing (Recap) Grammar must be LL(1) Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information

27 Using FIRST and FOLLOW in a Recursive-Descent Parser expr  term rest rest  + term rest | - term rest |  term  id procedure rest(); begin if lookahead in FIRST(+ term rest) then match(‘+’); term(); rest() else if lookahead in FIRST(- term rest) then match(‘-’); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ }

28 Non-Recursive Predictive Parsing: Table-Driven Parsing Given an LL(1) grammar G = (N, T, P, S) construct a table M[A,a] for A  N, a  T and use a driver program with a stack Predictive parsing program (driver) Parsing table M a+b$ X Y Z $ stack input output

29 Constructing an LL(1) Predictive Parsing Table for each production A   do for each a  FIRST(  ) do add A   to M[A,a] enddo if   FIRST(  ) then for each b  FOLLOW(A) do add A   to M[A,b] enddo endif enddo Mark each undefined entry in M error

30 Example Table E  T E R E R  + T E R |  T  F T R T R  * F T R |  F  ( E ) | id id+*()$ E E  T E R ERER ER  + T ERER  + T ER ER  ER   ER  ER   T T  F T R TRTR TR  TR   TR  * F TRTR  * F TR TR  TR   TR  TR   F F  idF  ( E ) A  A   FIRST(  ) FOLLOW(A) E  T E R ( id$ ) ER  + T ERER  + T ER + ER  ER   T  F T R ( id+ $ ) TR  * F TRTR  * F TR * TR  TR   F  ( E ) (* + $ ) F  id id* + $ )

31 LL(1) Grammars are Unambiguous Ambiguous grammar S  i E t S S R | a S R  e S |  E  b abeit$ S S  aS  aS  i E t S S R SRSR SR   SR  e SSR   SR  e S SR  SR   E E  bE  b A  A   FIRST(  ) FOLLOW(A) S  i E t S S R ie $ S  aS  a a SR  e SSR  e S e SR  SR   E  b bt Error: duplicate table entry

32 Predictive Parsing Program (Driver) push($) push(S) a := lookahead repeat X := pop() if X is a terminal or X = $ then match(X) // moves to next token and a := lookahead else if M[X,a] = X  Y 1 Y 2 …Y k then push(Y k, Y k-1, …, Y 2, Y 1 ) // such that Y 1 is on top … invoke actions and/or produce IR output … elseerror() endif until X = $

33 Example Table-Driven Parsing Stack $E $E R T $E R T R F $E R T R id $E R T R $E R $E R T+ $E R T $E R T R F $E R T R id $E R T R $E R T R F* $E R T R F $E R T R id $E R T R $E R $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E  T E R T  F T R F  id T R   E R  + T E R T  F T R F  id T R  * F T R F  id T R   E R  

34 Panic Mode Recovery id+*()$ E E  T E R synch ERER ER  + T ERER  + T ER ER  ER   ER  ER   T T  F T R synch T  F T R synch TRTR TR  TR   TR  * F TRTR  * F TR TR  TR   TR  TR   F F  id synch F  ( E ) synch FOLLOW(E) = { ) $ } FOLLOW(E R ) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(T R ) = { + ) $ } FOLLOW(F) = { + * ) $ } Add synchronizing actions to undefined entries based on FOLLOW synch:the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found Pro:Can be automated Cons:Error messages are needed

35 Phrase-Level Recovery id+*()$ E E  T E R synch ERER ER  + T ERER  + T ER ER  ER   ER  ER   T T  F T R synch T  F T R synch TRTR insert * TR  TR   TR  * F TRTR  * F TR TR  TR   TR  TR   F F  id synch F  ( E ) synch Change input stream by inserting missing tokens For example: id id is changed into id * id insert *: driver inserts missing * and retries the production Can then continue here Pro:Can be automated Cons:Recovery not always intuitive

36 Error Productions id+*()$ E E  T E R synch ERER ER  + T ERER  + T ER ER  ER   ER  ER   T T  F T R synch T  F T R synch TRTR T R  F T R TR  TR   TR  * F TRTR  * F TR TR  TR   TR  TR   F F  id synch F  ( E ) synch E  T E R E R  + T E R |  T  F T R T R  * F T R |  F  ( E ) | id Add “error production”: T R  F T R to ignore missing *, e.g.: id id Pro:Powerful recovery method Cons:Cannot be automated