Com2010 - Functional Programming: Syntax Analysis. Marian Gheorghe, Lecture 16. ©University of Sheffield



17.3 Syntax analysis (parsing) and postfix (inverse Polish) notation

Parsing

What does parsing mean?

Parsing (syntax analysis of) a program means passing through the text of the program and checking whether the rules defining the syntax of the programming language are correctly applied. Parsing comes immediately after lexical analysis and consequently processes a sequence of token units rather than the initial sequence of characters that makes up the program. The syntax rules may be given in various equivalent forms, such as EBNF notation or syntax diagrams.

Compiler

[Diagram: Source file -> Lexical analysis -> Token units -> Parsing -> AST -> Semantics; code generation ...]

    Source file: "a = a + 1 " ++ "/*comment*/"
    Token units: [(1,"a"),(5,"="),(1,"a"),(3,"+"),(2,"1"),(0,"Eop")]
    AST:         ?

Each token unit is a pair: the key (a code) and the lexical unit (a string). Spaces and comments appear only in the source file. AST stands for Abstract Syntax Tree.

EBNF

C/Java syntax fragment:

    loop                ::= iteration_statement
    iteration_statement ::= while ( expression ) statement
                          | do statement while ( expression )
                          | for_statement
    for_statement       ::= for ( expression ; expression ; expression ) statement
    selection_statement ::= if_statement
                          | switch ( expression ) statement
    if_statement        ::= if ( expression ) statement
                          | if ( expression ) statement else statement
    additive_expression ::= multiplicative_expression {additive_operator multiplicative_expression}

::=, |, {, }, (, ) are meta-language elements. if, do, while ... are terminals (bold, blue, lowercase on the original slide); the rest are non-terminals (uppercase).

Simple programming language

SA (Sequence of Assignments) is a language consisting only of assignment statements delimited by ;. Token units are separated by spaces ' '. Each assignment has a very simple form:

    identifier = arithmetic_expression

where arithmetic_expression is built from identifiers, numbers and the operators + and -. The lexic (lexical structure) will follow from the syntax diagrams. Comments are written between "/*" and "*/".

Our language, SA, is able to express simple calculations. Ex:

    a = 1 ; a = a + a - 1 ; b = a + 7

We also assume that every program ends with a specific lexical unit called eop. The lexical analyser is responsible for adding it as the last token unit, (0,"Eop").

The syntax is expressed using syntax diagrams of four types: sequence (S), alternation (A), iteration (I) and terminal (T).

SA - syntax diagrams

    1.  Program    ::= (S) StmtList Eop
    2.  StmtList   ::= (I) Assign Delim
    3.  Assign     ::= (S) LHandS RestAss
    4.  RestAss    ::= (S) AssSymb Exp
    5.  Exp        ::= (I) Trm Operator
    6.  Eop        ::= eop (T)
    7.  Trm        ::= (A) Identifier Number
    8.  Operator   ::= (A) AddOp MinOp
    9.  LHandS     ::= ident (T)
    10. AssSymb    ::= assg (T)
    11. Identifier ::= ident (T)
    12. Number     ::= no (T)
    13. Delim      ::= sc (T)
    14. AddOp      ::= pls (T)
    15. MinOp      ::= mns (T)

S - sequence; A - alternation; I - iteration; T - terminal. The lexic is given by diagrams 9 - 15.

Terminals

The terminals are:

    ident is the key of the token unit (ident, "someIdent")
    assg  is the key of the token unit (assg, "=")
    no    is the key of the token unit (no, "123")
    sc    is the key of the token unit (sc, ";")
    pls   is the key of the token unit (pls, "+")
    mns   is the key of the token unit (mns, "-")
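The terminal names can be bound to concrete keys. The sketch below is an assumption read off the lexer outputs quoted elsewhere in these slides, for example (1,"a"), (5,"="), (3,"+"), (2,"1"), (6,";") and (0,"Eop"); the value 4 for mns is inferred from the token that follows k in the "k - 1" example. The helper keyName is hypothetical and only serves readable messages.

```haskell
-- A token unit pairs a key (code) with the lexical unit (string).
type TokenUnit = (Int, String)

-- Assumed key values, inferred from the example token lists in the slides.
eop, ident, no, pls, mns, assg, sc :: Int
eop   = 0
ident = 1
no    = 2
pls   = 3
mns   = 4
assg  = 5
sc    = 6

-- Hypothetical helper: map a key back to its terminal name.
keyName :: Int -> String
keyName k = case lookup k table of
  Just name -> name
  Nothing   -> "unknown"
  where
    table =
      [ (eop, "eop"), (ident, "ident"), (no, "no"), (pls, "pls")
      , (mns, "mns"), (assg, "assg"), (sc, "sc") ]

-- The key of a token unit is its first component.
tokenKey :: TokenUnit -> Int
tokenKey = fst
```

For instance, keyName (tokenKey (5,"=")) gives "assg".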

Syntax diagrams and EBNF notation

    1.  Program  ::= (S) StmtList Eop
    1'. Program  ::= StmtList Eop

    2.  StmtList ::= (I) Assign Delim
    2'. StmtList ::= Assign {Delim Assign}

    7.  Trm      ::= (A) Identifier Number
    7'. Trm      ::= Identifier | Number

    9.  LHandS   ::= ident (T)
    9'. LHandS   ::= ident

Legend: a marker on the diagrams indicates a decision point. S - sequence; A - alternation; I - iteration; T - terminal. ::=, |, {, } are meta-language elements.

Decision points

Alternation: two sets of symbols distinguish the two alternatives; each alternative is uniquely identified by one of them.

    7'. Trm      ::= Identifier | Number  ⇒  {ident} | {no}
    8'. Operator ::= AddOp | MinOp        ⇒  {+} | {-}

These sets MUST be disjoint.

Iteration: a set of symbols is associated with the iterative process; the set uniquely identifies the component that iterates.

    2'. StmtList ::= Assign {Delim Assign}  ⇒  {;}
    5'. Exp      ::= Trm {Operator Trm}     ⇒  {+, -}
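The disjointness requirement can be checked mechanically. The following is a minimal sketch, not part of the lecture code: disjointKeys is a hypothetical helper, and the numeric keys (1 for ident, 2 for no, 3 for +, 4 for -) are assumed from the example token lists in these slides.

```haskell
import Data.List (intersect)

type TokenUnit = (Int, String)

-- True when no key occurs in both identifying sets, so the first
-- token of the input selects at most one alternative.
disjointKeys :: [TokenUnit] -> [TokenUnit] -> Bool
disjointKeys xs ys = null (map fst xs `intersect` map fst ys)

-- Identifying sets for 7'. Trm ::= Identifier | Number (assumed keys).
trmIdentifierSet, trmNumberSet :: [TokenUnit]
trmIdentifierSet = [(1, "")]
trmNumberSet     = [(2, "")]

-- Identifying sets for 8'. Operator ::= AddOp | MinOp (assumed keys).
addOpSet, minOpSet :: [TokenUnit]
addOpSet = [(3, "+")]
minOpSet = [(4, "-")]
```

Only the keys are compared, which matches how the parsing functions in these slides test membership with map fst.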

Diagram types (normal forms)

    Sequence    (read as 'X followed by Y'):    X Y
    Alternation (read as 'either X or Y'):      X | Y
    Iteration   (read as 'X followed by, ...'): X {Y X}
    Terminal    (read as 'this is t'):          t

Decision points:

    Alternation: what identifies X and what identifies Y
    Iteration:   what identifies the iteration component, Y

Parsing functions - Sequence

Each of the four diagrams has a parsing function associated with it; each parsing function processes and returns a list of token units. Recap:

    type TokenUnit = (Int,String)

Each of the first three diagrams has two components (X, Y) that correspond to syntax diagrams; consequently they are parsing functions too. The functions for these three diagrams take the functions corresponding to X and Y as arguments.

The parsing function for the Sequence diagram:

    seqOf :: (SetOf TokenUnit -> SetOf TokenUnit)
          -> (SetOf TokenUnit -> SetOf TokenUnit)
          -> SetOf TokenUnit -> SetOf TokenUnit
    -- seqOf fX fY processes -> X -> Y ->
    seqOf fX fY = fY . fX   -- composition
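A small runnable sketch of the sequence combinator follows. It assumes that SetOf is a plain list synonym (the slides do not show its definition), and skipOne is a hypothetical stand-in for a real component parser. The point it illustrates is the order of application: fY . fX runs fX first and hands the remaining tokens to fY.

```haskell
type TokenUnit = (Int, String)

-- Assumption: SetOf is a synonym for lists.
type SetOf a = [a]

-- seqOf fX fY processes X and then Y on whatever X left over.
seqOf :: (SetOf TokenUnit -> SetOf TokenUnit)
      -> (SetOf TokenUnit -> SetOf TokenUnit)
      -> SetOf TokenUnit -> SetOf TokenUnit
seqOf fX fY = fY . fX

-- Hypothetical stand-in parser: consume one token, whatever it is.
skipOne :: SetOf TokenUnit -> SetOf TokenUnit
skipOne = drop 1

-- Two tokens are consumed in sequence, one is left over.
demoSeq :: SetOf TokenUnit
demoSeq = seqOf skipOne skipOne [(1, "a"), (5, "="), (2, "1")]
```

Here demoSeq evaluates to [(2,"1")].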

Parsing functions - Alternation

The parsing function for the Alternation diagram:

    altOf :: (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
          -> (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
          -> SetOf TokenUnit -> SetOf TokenUnit
    -- altOf fX fXTUs fY fYTUs processes 'X or Y' using their
    -- identifying sets fXTUs, fYTUs
    altOf _ _ _ _ [] = error ("Input: empty/ Alternative ")
    altOf fX fXTUs fY fYTUs ts@(t:ts')
      | fst t `elem` map fst fXTUs = fX ts
      | fst t `elem` map fst fYTUs = fY ts
      | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                           ++ show (head fXTUs) ++ " or "
                           ++ show (head fYTUs))

fXTUs and fYTUs are the sets of token units that identify fX and fY respectively. The as-pattern ts@(t:ts') lets the guards inspect the first token t while the whole list ts is passed on to fX or fY.
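The alternation function can be tried out on diagram 7 (Trm ::= Identifier | Number). This is a sketch under stated assumptions: SetOf is taken to be a list synonym, the keys 1 (ident) and 2 (no) are read off the example token lists, and the as-pattern ts@(t:_) is written out explicitly so that t and ts are both in scope in the guards.

```haskell
type TokenUnit = (Int, String)
type SetOf a = [a]   -- assumption: a list synonym

-- Terminal check: discard the first token if its key matches.
fTerm :: TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- Alternation: the key of the first token decides between fX and fY.
altOf :: (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
      -> (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
      -> SetOf TokenUnit -> SetOf TokenUnit
altOf _ _ _ _ [] = error "Input: empty/ Alternative "
altOf fX fXTUs fY fYTUs ts@(t:_)
  | fst t `elem` map fst fXTUs = fX ts
  | fst t `elem` map fst fYTUs = fY ts
  | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                       ++ show (head fXTUs) ++ " or " ++ show (head fYTUs))

-- 7. Trm ::= Identifier | Number, with assumed keys 1 and 2.
fTrm :: SetOf TokenUnit -> SetOf TokenUnit
fTrm = altOf (fTerm (1, "")) [(1, "")] (fTerm (2, "")) [(2, "")]
```

With these definitions fTrm [(2,"1"),(6,";")] leaves [(6,";")], because the number branch is selected and consumes one token.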

Parsing functions - Iteration

The parsing functions for the Iteration diagram:

    iterOf :: (SetOf TokenUnit -> SetOf TokenUnit)
           -> (SetOf TokenUnit -> SetOf TokenUnit)
           -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
    -- iterOf fX fY fYTUs processes fX and then 'seqOf fY fX' iteratively,
    -- using the set identifying the iteration component, fYTUs
    iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

    iterationOf :: (SetOf TokenUnit -> SetOf TokenUnit)
                -> (SetOf TokenUnit -> SetOf TokenUnit)
                -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
    iterationOf _ _ _ [] = error ("Input: empty/ Iteration ")
    iterationOf fX fY fYTUs ts@(t:ts')
      | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
      | otherwise = ts
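The iteration functions can be exercised on a cut-down expression of the form number {+ number}. This is a sketch, assuming SetOf is a list synonym and the keys 2 (no), 3 (pls) and 0 (eop) as in the example token lists; fSum is a hypothetical name. Note that iterationOf raises an error on an empty list, so the input has to end with a token that is outside the iteration set, such as the Eop unit.

```haskell
type TokenUnit = (Int, String)
type SetOf a = [a]   -- assumption: a list synonym

fTerm :: TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

seqOf :: (SetOf TokenUnit -> SetOf TokenUnit)
      -> (SetOf TokenUnit -> SetOf TokenUnit)
      -> SetOf TokenUnit -> SetOf TokenUnit
seqOf fX fY = fY . fX

-- Process X once, then repeat 'Y X' while the next token is in fYTUs.
iterOf :: (SetOf TokenUnit -> SetOf TokenUnit)
       -> (SetOf TokenUnit -> SetOf TokenUnit)
       -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

iterationOf :: (SetOf TokenUnit -> SetOf TokenUnit)
            -> (SetOf TokenUnit -> SetOf TokenUnit)
            -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
iterationOf _ _ _ [] = error "Input: empty/ Iteration "
iterationOf fX fY fYTUs ts@(t:_)
  | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
  | otherwise                  = ts

-- Hypothetical example: number {+ number}.
fSum :: SetOf TokenUnit -> SetOf TokenUnit
fSum = iterOf (fTerm (2, "")) (fTerm (3, "+")) [(3, "+")]
```

Applied to [(2,"1"),(3,"+"),(2,"2"),(0,"Eop")], fSum consumes the three expression tokens and returns [(0,"Eop")].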

Parsing functions - Terminal

The last diagram is handled by the terminal parsing function:

    fTerm :: TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
    -- fTerm processes the terminal x against the top element
    -- of the list of token units
    fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
    fTerm x (t:ts)
      | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
      | otherwise = ts

fTerm checks whether or not the key of the terminal x is equal to the key of the top element of the token list; if not, an error stops the parsing process; if yes, the current top element is discarded.
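Because fTerm reports failure through error, a caller that wants to inspect the message rather than abort has to catch the exception. The sketch below is one possible way to do that with Control.Exception from base; runParser is a hypothetical helper and not part of the lecture code, and SetOf is again assumed to be a list synonym.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

type TokenUnit = (Int, String)
type SetOf a = [a]   -- assumption: a list synonym

fTerm :: TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- Hypothetical helper: run a parsing function and turn a raised
-- 'error' into a Left carrying the message.
runParser :: (SetOf TokenUnit -> SetOf TokenUnit)
          -> SetOf TokenUnit
          -> IO (Either String (SetOf TokenUnit))
runParser f ts = do
  r <- try (evaluate (f ts))   -- evaluate forces the guard in fTerm
  pure $ case r of
    Left (ErrorCall msg) -> Left msg
    Right rest           -> Right rest
```

For example, runParser (fTerm (5,"=")) [(1,"t")] yields a Left value whose message names the offending token and the expected terminal.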

SA parser

With the previous four functions, writing a recursive descent parser is a routine process (replace X, Y or t with suitable components). Ex:

    fProgram :: SetOf TokenUnit -> SetOf TokenUnit
    -- 1 Program ::= StmtList Eop - Sequence
    fProgram = seqOf fStmtList fEop

    fStmtList :: SetOf TokenUnit -> SetOf TokenUnit
    -- 2 StmtList ::= Assign {Delim Assign} - Iteration
    fStmtList = iterOf fAssign fDelim [(sc, ";")]

    fOperator :: SetOf TokenUnit -> SetOf TokenUnit
    -- 8 Operator ::= AddOp | MinOp - Alternation
    fOperator = altOf fAddOp [(pls, "+")] fMinOp [(mns, "-")]

    fAssSymb :: SetOf TokenUnit -> SetOf TokenUnit
    -- 10 AssSymb ::= assg - Terminal
    fAssSymb = fTerm (assg, "=")
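Putting the pieces together, the sketch below fills in parsing functions for all fifteen diagrams in the same routine way. Several things are assumptions rather than lecture code: SetOf as a list synonym, the Parser shorthand, the numeric key values (taken from the example token lists), and the bodies of the functions the slides do not show (fAssign, fRestAss, fExp, fTrm and the terminal functions). The token list is written by hand because the lexical analyser is not reproduced here.

```haskell
type TokenUnit = (Int, String)
type SetOf a = [a]                                -- assumption: list synonym
type Parser = SetOf TokenUnit -> SetOf TokenUnit  -- hypothetical shorthand

-- Assumed keys, inferred from the lexer outputs in the slides.
eop, ident, no, pls, mns, assg, sc :: Int
eop   = 0
ident = 1
no    = 2
pls   = 3
mns   = 4
assg  = 5
sc    = 6

seqOf :: Parser -> Parser -> Parser
seqOf fX fY = fY . fX

altOf :: Parser -> SetOf TokenUnit -> Parser -> SetOf TokenUnit -> Parser
altOf _ _ _ _ [] = error "Input: empty/ Alternative "
altOf fX fXTUs fY fYTUs ts@(t:_)
  | fst t `elem` map fst fXTUs = fX ts
  | fst t `elem` map fst fYTUs = fY ts
  | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                       ++ show (head fXTUs) ++ " or " ++ show (head fYTUs))

iterOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

iterationOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterationOf _ _ _ [] = error "Input: empty/ Iteration "
iterationOf fX fY fYTUs ts@(t:_)
  | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
  | otherwise                  = ts

fTerm :: TokenUnit -> Parser
fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- Diagrams 1 - 8: sequence, iteration and alternation.
fProgram, fStmtList, fAssign, fRestAss, fExp, fTrm, fOperator :: Parser
fProgram  = seqOf fStmtList fEop
fStmtList = iterOf fAssign fDelim [(sc, ";")]
fAssign   = seqOf fLHandS fRestAss
fRestAss  = seqOf fAssSymb fExp
fExp      = iterOf fTrm fOperator [(pls, "+"), (mns, "-")]
fTrm      = altOf fIdentifier [(ident, "")] fNumber [(no, "")]
fOperator = altOf fAddOp [(pls, "+")] fMinOp [(mns, "-")]

-- Diagrams 6 and 9 - 15: terminals.
fEop, fLHandS, fAssSymb, fIdentifier, fNumber, fDelim, fAddOp, fMinOp :: Parser
fEop        = fTerm (eop, "")
fLHandS     = fTerm (ident, "")
fAssSymb    = fTerm (assg, "=")
fIdentifier = fTerm (ident, "")
fNumber     = fTerm (no, "")
fDelim      = fTerm (sc, ";")
fAddOp      = fTerm (pls, "+")
fMinOp      = fTerm (mns, "-")

-- Hand-written token units for: k = 1 ; j = k - 1
sampleTokens :: SetOf TokenUnit
sampleTokens =
  [ (1, "k"), (5, "="), (2, "1"), (6, ";")
  , (1, "j"), (5, "="), (1, "k"), (4, "-"), (2, "1"), (0, "Eop") ]
```

Under these assumptions fProgram sampleTokens consumes every token and returns the empty list, which is the success condition described in the recap.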

Invocation chain

Ex: SA program k = 1 ; j = k

    fProgram:  (fStmtList fEop)             ~~> fStmtList
    fStmtList: (fAssign {fDelim fAssign})   ~~> fAssign
    fAssign:   (fLHandS fRestAss)           ~~> fLHandS
    fLHandS:   ('ident' i.e. 'k' -- ok); then fRestAss
    fRestAss:  (fAssSymb fExp)              ~~> fAssSymb
    fAssSymb:  ('=' i.e. '=' -- ok); then fExp
    fExp:      (fTrm {fOperator fTrm})      ~~> fTrm
    fTrm:      (fIdentifier | fNumber)      ~~> fNumber
    fNumber:   ('no' i.e. '1' -- ok); then fDelim
    fDelim:    (';' i.e. ';' -- ok); then fAssign (...)

This is an Abstract Syntax Tree (from a derivation tree).

Recap

1. We have written a lexical analyser for SA (called lex_an). It is invoked as

    lex_an lex_aut in_p

where lex_aut is the automaton used by the scanner (it specifies the SA lexic) and in_p is the input program (a string). lex_an produces a list of token units. Ex:

    lex_an lex_aut "a = a + 1 /*comment*/"
    ⇒ [(1,"a"),(5,"="),(1,"a"),(3,"+"),(2,"1"),(0,"Eop")]

2. We have written a parser for SA (invoked through fProgram, the name of the first diagram). It processes a list of token units produced by the lexical analyser and returns a list of token units. If the input is correct then the returned result is []. It is invoked as

    fProgram (lex_an lex_aut parser_in)

Ex:

    fProgram (lex_an lex_aut "a = a + 1 ; b = a ")
    ⇒ []

Example

    lex_an lex_aut "k = 1 ; /*comment*/ j = k - 1"
    ⇒ [(1,"k"),(5,"="),(2,"1"),(6,";"),(1,"j"),(5,"="),(1,"k"),(4,"-"),(2,"1"),(0,"Eop")]

    fProgram (lex_an lex_aut "k = 1 ; /*comment*/ j = k - 1")
    ⇒ []

    fProgram (lex_an lex_aut "k = 1 ; /*comment*/ j = k - ")
    ⇒ Program error: Input: (0,"Eop")/ Expected: (1,"") or (2,"")

    fProgram (lex_an lex_aut "k = 1 t ; /*comment*/ j = k - ")
    ⇒ Program error: Input: (1,"t")/ Expected: (0,"")

Summary of Parsing

1. Syntax diagrams define the syntax
2. Four key diagrams (variants will follow):
    1. sequence
    2. alternation
    3. iteration
    4. terminal
3. Simplified version with only two non-terminals for the first three provided
4. Higher order functions for these diagrams
5. Every syntax diagram is written as one of these functions
6. Parser: a collection of functions illustrating a recursive descent method
7. Outcome: an output suitable for further analysis