Joey Paquet, 2000, 2002, 20081 Lecture 3 Syntactic Analysis Part I.

Slides:



Advertisements
Similar presentations
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Advertisements

Lecture # 11 Grammar Problems.
Mooly Sagiv and Roman Manevich School of Computer Science
Session 14 (DM62) / 15 (DM63) Recursive Descendent Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Top-Down Parsing.
COP4020 Programming Languages
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Concordia University Department of Computer Science and Software Engineering Click to edit Master title style COMPILER DESIGN Review Joey Paquet,
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Joey Paquet, Lecture 12 Review. Joey Paquet, Course Review Compiler architecture –Lexical analysis, syntactic analysis, semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Joey Paquet, 2000, Lecture 5 Error Recovery Techniques in Top-Down Predictive Syntactic Analysis.
4 4 (c) parsing. Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces.
11 Chapter 4 Grammars and Parsing Grammar Grammars, or more precisely, context-free grammars, are the formalism for describing the structure of.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CSE 425: Syntax II Context Free Grammars and BNF In context free grammars (CFGs), structures are independent of the other structures surrounding them Backus-Naur.
Concordia University Department of Computer Science and Software Engineering Click to edit Master title style COMPILER DESIGN Syntactic analysis: Part.
Compilers: syntax/4 1 Compiler Structures Objective – –describe general syntax analysis, grammars, parse trees, FIRST and FOLLOW sets ,
Chapter 3 Describing Syntax and Semantics
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
Parsing Top-Down.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
COMP 3438 – Part II-Lecture 5 Syntax Analysis II Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
Joey Paquet, 2000, 2002, 2008, 2012, Lecture 5 Error Recovery Techniques in Top-Down Predictive Syntactic Analysis.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
BNF A CFL Metalanguage Some Variations Particular View to SLK Copyright © 2015 – Curt Hill.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Introduction to Parsing
Chapter 3 – Describing Syntax
Programming Languages Translator
CS510 Compiler Lecture 4.
Compiler Construction
4 (c) parsing.
Top-Down Parsing CS 671 January 29, 2008.
Lecture 7: Introduction to Parsing (Syntax Analysis)
Compiler design Syntactic analysis: Part I
Compiler design Syntactic analysis – part II First and follow sets
Chapter 3 Syntactic Analysis I.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
COMPILER CONSTRUCTION
Presentation transcript:

Joey Paquet, 2000, 2002, Lecture 3 Syntactic Analysis Part I

Joey Paquet, 2000, 2002, Syntactic Analyzer Roles –Analyze the structure of the program and its component statements and expressions –Check for (and recover from) syntax errors –Control the front-end’s execution

Joey Paquet, 2000, 2002, Historic Historically based on formal natural language grammatical analysis (Chomsky [1950’s]) A generative grammar is used –builds sentences in a series of steps –starts from abstract concepts defined by a set of grammatical rules (often called productions) –refines the analysis down to actual words Analyzing (parsing) consists in reconstructing the way in which the sentences were constructed Valid sentences can be represented as a parse tree

Joey Paquet, 2000, 2002, Example sentence noun phrase articlenoun thedog verb phrase verbnoun phrase articlenoun gnawedthebone ::= ::= article noun ::= verb

Joey Paquet, 2000, 2002, Syntax and Semantics Syntax: defines how valid sentences are formed Semantics: defines the meaning of valid sentences Some grammatically correct sentences can have no meaning –The bone walked the dog It is impossible to automatically validate the full meaning of all English sentences In programming languages, semantics is about giving a meaning by translating programs into executables

Joey Paquet, 2000, 2002, Grammars A grammar is a quadruple (T,N,S,R) –T: a finite set of terminal symbols –N: a finite set of non-terminal symbols –S: a unique starting symbol (SN) –R: a finite set of productions  | (,(TN)) Context free grammars have productions of the form: –A | (AN)((TN))

Joey Paquet, 2000, 2002, Backus-Naur Form J.W. Backus: main designer of the first FORTRAN compiler P. Naur: main designer of the Algol-60 programming language –non-terminals are placed in angle brackets –the symbol ::= is used instead of an arrow –a vertical bar can be used to signify alternatives –curly braces are used to signify an indefinite number of repetitions –square brackets are used to signify optionality Widely used to define programming languages’ syntax

Joey Paquet, 2000, 2002, Example Grammar for expressions: G = (T,N,S,R), T = {id,+,,,/,(,)}, N = {E}, S = E, R = { E  E + E, E  E  E, E  E  E, E  E / E, E  ( E ), E  id}

Joey Paquet, 2000, 2002, Example Parse the sequence: (a+b)/(ab) The lexical analyzer tokenizes the sequence as: (id+id)/(idid) Construct a parse tree for the expression: –start symbol=root –non-terminal=internal node –terminal=leaf –production=subtree

Joey Paquet, 2000, 2002, Top-Down Parsing Starts at the root (starting symbol) Builds the tree downwards from: –the sequence of tokens in input (from left to right) –the rules in the grammar

Joey Paquet, 2000, 2002, Example E /EE 1- Using: E  E / E E /EE ()E()E 2- Using: E  ( E )  EE+EE E /EE ()E()E id 3- Using: E  E + E E  E  E E  id

Joey Paquet, 2000, 2002, Derivations The application of grammar rules towards the recognition of a grammatically valid sequence of terminals can be represented with a derivation Noted as a series of transformations: –{   [] | (,(TN))  ( R)} –where production  is used to transform  into .

Joey Paquet, 2000, 2002, Example  EE+EE E /EE ()E()E id E E / E [E  E / E]  E / ( E )[E  ( E )]  E / ( E  E ) [E  E  E]  E / (E  id ) [E  id]  E / ( id  id )[E  id]  ( E ) / ( id  id )[E  ( E )]  ( E + E ) / ( id  id ) [E  E + E]  ( E + id ) / ( id  id ) [E  id]  ( id + id ) / ( id  id ) [E  id] In this case, we say that E  (id+id)/(idid) The language generated by the grammar can be defined as: L(G) = { | S  }   G G

Joey Paquet, 2000, 2002, Left and Rightmost Derivations E E / E [E  E / E]  E / ( E )[E  ( E )]  E / ( E  E ) [E  E  E]  E / (E  id ) [E  id]  E / ( id  id )[E  id]  ( E ) / ( id  id )[E  ( E )]  ( E + E ) / ( id  id ) [E  E + E]  ( E + id ) / ( id  id ) [E  id]  ( id + id ) / ( id  id ) [E  id] E E / E [E  E / E]  ( E ) / E [E  ( E )]  ( E + E ) / E [E  E + E]  ( id + E ) / E [E  id]  ( id + id ) / E [E  id]  ( id + id ) / ( E )[E  ( E )]  ( id + id ) / ( E  E )[E  E  E]  ( id + id ) / ( id  E ) [E  id]  ( id + id ) / ( id  id ) [E  id] Rightmost Derivation Leftmost Derivation

Joey Paquet, 2000, 2002, Top-Down & Bottom-up Parsing A top-down parser builds a parse tree starting at the root down to the leafs –It builds leftmost derivations A bottom-up parser builds a parse tree starting from the leafs up to the root –They build rightmost derivations

Joey Paquet, 2000, 2002, Ambiguous Grammars Which of these trees is the right one for the expression “id + id * id” ? According to the grammar, both are right The language defined by this grammar is ambiguous. That is not acceptable in a compiler. id +EE E * E E *EE E+E E

Joey Paquet, 2000, 2002, Ambiguous Grammars Solutions: –incorporate operation precedence in the parser (complicates the compiler, rarely done) –rewrite the grammar to remove ambiguities E  E + E E  E  E E  E  E E  E / E E  ( E ) E  id id *FT E+T E F F E  E + T | E  T | T T  T  F | T / F | F F  ( E ) | id

Joey Paquet, 2000, 2002, Left Recursion The aim is to design a parser that has no arbitrary choices to make between rules (predictive parsing) The first rule that can apply is applied In this case, productions of the form AA will be applied forever Example: id + id + id E E+T E+T E+T E+T...

Joey Paquet, 2000, 2002, Non-immediate Left Recursions Non-immediate left recursions are sets of productions of the form: A  B | … B  A | … A B  A  B  A ...

Joey Paquet, 2000, 2002, Left Recursion This problem afflicts all top-down parsers Solution: apply a transformation to the grammar to remove the left recursions 1- Isolate each set of productions of the form: A  A 1 | A 2 | A 3 | …(left recursive) A   1 |  2 |  3 | …(non-left-recursive) 2- Introduce a new non-terminal A 3- Change all the non-recursive productions on A to: A   1 A |  2 A |  3 A | … 4- Remove the left-recursive production on A and substitute: A   |  1 A |  2 A |  3 A |...

Joey Paquet, 2000, 2002, Example E  E + T | E  T | T T  T  F | T / F | F F  ( E ) | id (i) E  E + T | E  T | T 1- E  E + T | E  T(A  A 1 | A 2 ) E  T (A   1 ) 2- E(A) 3- E  TE(A   1 A) 4- E   | +TE | TE (A   |  1 A |  2 A) E  TE E   | +TE | TE T  T  F | T / F | F F  ( E ) | id

Joey Paquet, 2000, 2002, Example E  TE E   | +TE | TE T  T  F | T / F | F F  ( E ) | id (ii) T  T  F | T / F | F 1- T  T  F | T / F(A  A 1 | A 2 ) T  F (A   1 ) 2- T(A) 3- T  FT(A   1 A) 4- T   | FT | /FT (A   |  1 A |  2 A) E  TE E   | +TE | TE T  FT T   | FT | /FT F  ( E ) | id

Joey Paquet, 2000, 2002, Ambiguity Sets of rules of the form: A    1 |   2 |   3 | … can be eliminated using a factorization technique 1- Isolate a set of productions of the form: A    1 |   2 |   3 | … 2- Introduce a new non-terminal A 3- Replace all the ambiguous set of productions on A by: A   A 4- Add a set of factorized productions on A : A   1 |  2 |  3 | …

Joey Paquet, 2000, 2002, Backtracking It is possible to write a parser that implements an ambiguous grammar In this case, when there is an arbitrary alternative, the parser explores them one after the other If an alternative does not result in a valid parse tree, the parser backtracks to the last arbitrary alternative and selects another right- hand-side The parse fails only when there are no more alternatives left This is often called a “brute-force” method

Joey Paquet, 2000, 2002, Example S  ee | bAc | bAe A  d | cA Seeking for : bcde S bcA d cA S beA d cA SbAe [S  bAe] bcAe[A  cA] bcde[A  d] OK SbAc [S  bAc] bcAc[A  cA] bcdc[A  d] ???

Joey Paquet, 2000, 2002, Backtracking Backtracking is tricky and inefficient to implement Normally, code is generated after rules are applied; backtracking involves retraction of the generated code! Parsing with backtracking is almost never used The solution is to eliminate the ambiguities

Joey Paquet, 2000, 2002, Predictive Parsing Restriction: the parser must always be able to determine which of the right-hand sides to follow, only with its knowledge of the next token in input. Top-down parsing without backtracking Deterministic parsing No backtracking is possible

Joey Paquet, 2000, 2002, Predictive Parsing Recursive descent predictive parser –a function is defined for each non-terminal symbol –its predictive nature allows it to choose the right right- hand-side –it recognizes terminal symbols and calls other functions to recognize non-terminal symbols in the chosen right hand side –the parse tree is actually constructed by the nest of function calls –very easy to implement –hard to maintain

Joey Paquet, 2000, 2002, Predictive Parsing Table-driven predictive parser –table tells the parser which right-hand-side to choose –the driver algorithm is standard to all parsers –only the table changes –easy to maintain –table is hard to build for most languages –will be covered in next lecture

Joey Paquet, 2000, 2002, FIRST and FOLLOW sets Predictive parsers need to know what right hand side to choose The only information we have is the next token in input If all the right hand sides begin with terminal symbols, the choice is straightforward If some right hand sides begin with non-terminals, the parser must know what token can begin the sequence generated by this non-terminal (i.e. the FIRST set) If a FIRST set contains , it must know what follows this non- terminal (i.e. the FOLLOW set) in order to chose the  production.

Joey Paquet, 2000, 2002, Example FIRST(E) = {0,1,(} FIRST(E’) = {+, } FIRST(T) = {0,1,(} FIRST(T’) = {*, } FIRST(F) = {0,1,(} FOLLOW(E)= {$,)} FOLLOW(E’)= {$,)} FOLLOW(T)= {+,$,)} FOLLOW(T’)= {+,$,)} FOLLOW(F)= {*,+,$,)} E  TE’ E’  +TE’ |  T  FT’ T’  *FT’ |  F  0 | 1 | (E)

Joey Paquet, 2000, 2002, E --> TE' E' --> +TE' | epsilon T --> FT' T' --> *FT' | epsilon F --> 0 | 1 | (E) FIRSTFOLLOW E0,1,($,) E’epsilon,+$,) T0,1,(+,$,) T’epsilon, *+,$,) F0,1,(*,+,$,) Example

Joey Paquet, 2000, 2002, Parse(){ lookahead = NextToken() skipErrors([0,1,(],[$]) if (E();match('$')) else error = true} E(){ if (lookahead is in [0,1,(]) //FIRST(TE') if (T();E'();) write(E->TE') else error = true else error = true} E'(){ if (lookahead is in [+]) //FIRST[+TE'] if (match('+');T();E'()) write(E'->TE') else error = true else if (lookahead is in [$,)] //FOLLOW[E'](epsilon) write(E'->epsilon); else error = true} T(){ if (lookahead is in [0,1,(]) //FIRST[FT'] if (F();T'();) write(T->FT') else error = true else error = true} Example

Joey Paquet, 2000, 2002, T'(){ if (lookahead is in [*]) //FIRST[*FT'] if (match('*');F();T'()) write(T'->*FT') else error = true else if (lookahead is in [+,),$] //FOLLOW[T'] (epsilon) write(T'->epsilon) else error = true} F(){ if (lookahead is in [0]) //FIRST[0] match('0') write(F->0) else if (lookahead is in [1]) //FIRST[1] match('1') write(F->1) else if (lookahead is in [(]) //FIRST[(E)] if (match('(');E();match(')')) write(F->(E)); else error = true else error = true} } Example