Published by Edmund Bond. Modified over 9 years ago.
1
Com2010 - Functional Programming
Syntax Analysis
Marian Gheorghe
Lecture 16
Module homepage: Mole & http://www.dcs.shef.ac.uk/~marian
© University of Sheffield, com2010
2
Parsing
17.3 Syntax analysis (parsing) and postfix (inverse Polish) notation
3
What does parsing mean?
Parsing (syntax analysis of) a program means passing through the text of the program and checking whether the rules defining the syntax of the programming language are correctly applied.
Parsing comes immediately after lexical analysis and consequently processes a sequence of token units rather than the initial sequence of characters defining the program.
The syntax rules may be given in various equivalent forms: EBNF notation or syntax diagrams.
4
Compiler
Source file -> Lexical analysis -> Token units -> Parsing -> AST -> Semantics; code generation …
  Source file: "a = a + 1 " ++ "/*comment*/"
  Token units: [(1,"a"),(5,"="),(1,"a"),(3,"+"),(2,"1"),(0,"Eop")]
  AST: ?
AST: Abstract Syntax Tree.
Each token unit is a pair (key, lexical unit): the key is a code, the lexical unit is a string.
Spaces and comments occur only in the source.
5
EBNF
C/Java syntax fragment:
  loop ::= iteration_statement
  iteration_statement ::= while ( expression ) statement
                        | do statement while ( expression )
                        | for_statement
  for_statement ::= for ( expression ; expression ; expression ) statement
  selection_statement ::= if_statement
                        | switch ( expression ) statement
  if_statement ::= if ( expression ) statement
                 | if ( expression ) statement else statement
  additive_expression ::= multiplicative_expression {additive_operator multiplicative_expression}
::=, |, {, }, (, ) are meta-language elements.
if, do, while … are terminals (bold, blue, lowercase); the rest are non-terminals (uppercase).
http://www.csci.csusb.edu/dick/samples
6
Simple programming language
SA (Sequence of Assignments) consists only of assignment statements delimited by ;.
Token units are separated by spaces ' '.
Each assignment has a very simple form
  identifier = arithmetic_expression
where arithmetic_expression is built with identifiers, numbers and +, -.
The lexical rules follow from the syntax diagrams. Comments are between "/*" and "*/".
Our language, SA, is able to express simple calculations. Ex:
  a = 1 ; a = a + a - 1 ; b = a + 7
We also assume that every program ends with a specific lexical unit called eop; the lexical analyser is responsible for adding it as the last token unit, (0,"eop").
The syntax is expressed using syntax diagrams of four types: sequence (S), alternation (A), iteration (I) and terminal (T).
7
SA - syntax diagrams
  1.  Program    ::= (S) StmtList Eop
  2.  StmtList   ::= (I) Assign Delim
  3.  Assign     ::= (S) LHandS RestAss
  4.  RestAss    ::= (S) AssSymb Exp
  5.  Exp        ::= (I) Trm Operator
  6.  Eop        ::= eop (T)
  7.  Trm        ::= (A) Identifier Number
  8.  Operator   ::= (A) AddOp MinOp
  9.  LHandS     ::= ident (T)
  10. AssSymb    ::= assg (T)
  11. Identifier ::= ident (T)
  12. Number     ::= no (T)
  13. Delim      ::= sc (T)
  14. AddOp      ::= pls (T)
  15. MinOp      ::= mns (T)
S - sequence; A - alternation; I - iteration; T - terminal.
The lexical rules are given by diagrams 9 - 15.
8
Terminals
Where the terminals are:
  ident is the key of the token unit (ident, "someIdent")
  assg is the key of the token unit (assg, "=")
  no is the key of the token unit (no, "123")
  sc is the key of the token unit (sc, ";")
  pls is the key of the token unit (pls, "+")
  mns is the key of the token unit (mns, "-")
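The keys can be written down as plain Haskell constants. The numeric values below are an assumption: they are read off the sample lex_an outputs shown elsewhere in these slides, not taken from the module's source.

```haskell
-- Token units as recapped in the lecture: (key, lexical unit).
type TokenUnit = (Int, String)

-- Key codes. ASSUMPTION: values inferred from the sample lex_an output
-- in the slides; the module's real definitions are not shown.
eop, ident, no, pls, mns, assg, sc :: Int
eop   = 0
ident = 1
no    = 2
pls   = 3
mns   = 4
assg  = 5
sc    = 6

-- Token units for the source text "a = a + 1".
sampleTokens :: [TokenUnit]
sampleTokens =
  [ (ident, "a"), (assg, "="), (ident, "a")
  , (pls, "+"), (no, "1"), (eop, "Eop") ]
```

Only the key (the first component) is compared during parsing, so the string part matters only for error messages.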
9
Syntax diagrams and EBNF notation
  1.  Program  ::= (S) StmtList Eop        1'. Program  ::= StmtList Eop
  7.  Trm      ::= (A) Identifier Number   7'. Trm      ::= Identifier | Number
  2.  StmtList ::= (I) Assign Delim        2'. StmtList ::= Assign {Delim Assign}
  9.  LHandS   ::= ident (T)               9'. LHandS   ::= ident
The alternation (7) and iteration (2) diagrams each contain a decision point!!
S - sequence; A - alternation; I - iteration; T - terminal.
::=, |, {, } are meta-language elements.
10
Decision points
Alternation: two sets of symbols distinguish the two alternatives; each alternative is uniquely identified by one of them. These sets MUST be disjoint.
  7'. Trm ::= Identifier | Number ⇒ {ident} | {no}
  8'. Operator ::= AddOp | MinOp ⇒ {+} | {-}
Iteration: a set of symbols is associated with the iterative process; the set uniquely identifies the component that iterates.
  2'. StmtList ::= Assign {Delim Assign} ⇒ {;}
  5'. Exp ::= Trm {Operator Trm} ⇒ {+, -}
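The disjointness requirement can be checked mechanically. A minimal sketch, assuming identifying sets are represented as lists of token units and compared by key only; disjointKeys is a hypothetical helper, not part of the lecture code, and the key values in the usage note are assumed from the sample outputs.

```haskell
import Data.List (intersect)

type TokenUnit = (Int, String)

-- True when the two identifying sets share no key, so the first token
-- of the input can select at most one of the two alternatives.
-- (Hypothetical helper, for illustration only.)
disjointKeys :: [TokenUnit] -> [TokenUnit] -> Bool
disjointKeys xs ys = null (map fst xs `intersect` map fst ys)
```

For Trm, disjointKeys [(1,"")] [(2,"")] holds, so one look at the next token unit is enough to choose between Identifier and Number.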
11
Diagram types (normal forms)
  Sequence    ::= X Y      (read as 'X followed by Y')
  Alternation ::= X | Y    (read as 'either X or Y')
                           decision point: what identifies X and Y
  Iteration   ::= X {Y X}  (read as 'X followed by, …')
                           decision point: what identifies the iteration: Y
  Terminal    ::= t        (read as 'this is t')
12
Parsing functions - Sequence
Each of the four diagrams has a parsing function associated with it; each parsing function processes and returns a list of token units. Recap:
  type TokenUnit = (Int,String)
Each of the first three diagrams has two components (X, Y) that correspond to syntax diagrams; consequently they are parsing functions too. The functions for these three diagrams take the functions corresponding to X and Y as arguments.
The parsing function for the Sequence diagram:
  seqOf :: (SetOf TokenUnit -> SetOf TokenUnit)
        -> (SetOf TokenUnit -> SetOf TokenUnit)
        -> SetOf TokenUnit -> SetOf TokenUnit
  -- seqOf fX fY processes ->X -> Y->
  seqOf fX fY = fY . fX   -- composition: fX runs first, then fY
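A small check of the order of composition. This is a sketch under two assumptions: SetOf is taken to be a list synonym (its definition is not shown on these slides), and Parser is a hypothetical shorthand for the function type used throughout.

```haskell
type TokenUnit = (Int, String)
type SetOf a   = [a]                                -- ASSUMPTION: a list synonym
type Parser    = SetOf TokenUnit -> SetOf TokenUnit -- hypothetical shorthand

-- Sequence diagram: run fX on the input, then fY on what fX leaves.
seqOf :: Parser -> Parser -> Parser
seqOf fX fY = fY . fX

-- Stand-in components chosen only to make the order visible:
-- first drop one token unit, then keep one.
orderDemo :: SetOf TokenUnit
orderDemo = seqOf (drop 1) (take 1) [(1,"a"), (5,"="), (0,"Eop")]
-- orderDemo == [(5,"=")]
```

If the composition were written the other way round (fX . fY), orderDemo would be empty, which is why the argument order in fY . fX matters.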
13
Parsing functions - Alternation
The parsing function for the Alternation diagram:
  altOf :: (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
        -> (SetOf TokenUnit -> SetOf TokenUnit) -> SetOf TokenUnit
        -> SetOf TokenUnit -> SetOf TokenUnit
  -- altOf fX fXTUs fY fYTUs processes 'X or Y' using their
  -- identifying sets fXTUs, fYTUs
  altOf _ _ _ _ [] = error ("Input: empty/ Alternative ")
  altOf fX fXTUs fY fYTUs ts@(t:ts')
    | fst t `elem` map fst fXTUs = fX ts
    | fst t `elem` map fst fYTUs = fY ts
    | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                         ++ show (head fXTUs) ++ " or " ++ show (head fYTUs))
where fXTUs and fYTUs are the sets of token units that identify fX and fY respectively; ts@(t:ts') is an as-pattern that lets us refer to t:ts' by the name ts.
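A self-contained sketch of altOf applied to diagram 7 (Trm). Assumptions: SetOf is a list synonym, Parser is a hypothetical shorthand, and the keys 1 (ident) and 2 (no) are inferred from the sample outputs. fTerm is repeated here only so that the block stands on its own.

```haskell
type TokenUnit = (Int, String)
type SetOf a   = [a]                                -- ASSUMPTION: a list synonym
type Parser    = SetOf TokenUnit -> SetOf TokenUnit -- hypothetical shorthand

-- Terminal diagram: consume the top token unit if its key matches.
fTerm :: TokenUnit -> Parser
fTerm x [] = error ("Input: empty/ Expected: " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- Alternation diagram: the key of the next token unit selects fX or fY.
altOf :: Parser -> SetOf TokenUnit -> Parser -> SetOf TokenUnit -> Parser
altOf _ _ _ _ [] = error "Input: empty/ Alternative"
altOf fX fXTUs fY fYTUs ts@(t:_)
  | fst t `elem` map fst fXTUs = fX ts
  | fst t `elem` map fst fYTUs = fY ts
  | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                       ++ show (head fXTUs) ++ " or " ++ show (head fYTUs))

-- 7'. Trm ::= Identifier | Number  (keys 1 and 2 are assumed values)
fTrm :: Parser
fTrm = altOf (fTerm (1, "")) [(1, "")] (fTerm (2, "")) [(2, "")]
```

Both fTrm [(1,"k"),(0,"Eop")] and fTrm [(2,"7"),(0,"Eop")] leave [(0,"Eop")]; any other leading key stops the parse with the 'Expected: … or …' message.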
14
Parsing functions - Iteration
The parsing functions for the Iteration diagram:
  iterOf :: (SetOf TokenUnit -> SetOf TokenUnit)
         -> (SetOf TokenUnit -> SetOf TokenUnit)
         -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
  -- iterOf fX fY fYTUs processes fX and 'seqOf fY fX' iteratively using the
  -- set identifying the iteration component, fYTUs
  iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

  iterationOf :: (SetOf TokenUnit -> SetOf TokenUnit)
              -> (SetOf TokenUnit -> SetOf TokenUnit)
              -> SetOf TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
  iterationOf _ _ _ [] = error ("Input: empty/ Iteration ")
  iterationOf fX fY fYTUs ts@(t:ts')
    | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
    | otherwise = ts
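A self-contained sketch of the iteration on a simplified expression, number {+ number}. Assumptions: SetOf is a list synonym, Parser is a hypothetical shorthand, and the keys 2 (no) and 3 (pls) are inferred from the sample outputs; fSum is a hypothetical name for this stand-in diagram.

```haskell
type TokenUnit = (Int, String)
type SetOf a   = [a]                                -- ASSUMPTION: a list synonym
type Parser    = SetOf TokenUnit -> SetOf TokenUnit -- hypothetical shorthand

seqOf :: Parser -> Parser -> Parser
seqOf fX fY = fY . fX

fTerm :: TokenUnit -> Parser
fTerm x [] = error ("Input: empty/ Expected: " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- Iteration diagram X {Y X}: parse one X, then repeat 'Y X' for as long
-- as the next token unit belongs to the set identifying Y.
iterOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

iterationOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterationOf _ _ _ [] = error "Input: empty/ Iteration"
iterationOf fX fY fYTUs ts@(t:_)
  | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
  | otherwise                  = ts

-- Hypothetical stand-in diagram: Sum ::= Number {AddOp Number}
fSum :: Parser
fSum = iterOf (fTerm (2, "")) (fTerm (3, "+")) [(3, "+")]
```

On the token units of 1 + 2 + 3 followed by Eop, fSum consumes everything up to the Eop unit and returns [(0,"Eop")]. The empty-list case is an error because a well-formed token list always ends with the Eop unit, so the iteration should never run out of input.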
15
Parsing functions - Terminal
The last diagram is the terminal; its parsing function is:
  fTerm :: TokenUnit -> SetOf TokenUnit -> SetOf TokenUnit
  -- fTerm processes the terminal x against the top element
  -- of the list of token units
  fTerm x [] = error ("Input: empty/ Expected : " ++ show x)
  fTerm x (t:ts)
    | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
    | otherwise = ts
fTerm checks whether the key of the terminal x equals the key of the top element of the token list; if not, an error stops the parsing process; if so, the current top element is discarded.
16
SA parser
With the previous four functions, writing a recursive descent parser is a routine process (replace X, Y or t with suitable components). Ex:
  fProgram :: SetOf TokenUnit -> SetOf TokenUnit
  -- 1 Program ::= StmtList Eop - Sequence
  fProgram = seqOf fStmtList fEop

  fStmtList :: SetOf TokenUnit -> SetOf TokenUnit
  -- 2 StmtList ::= Assign {Delim Assign} - Iteration
  fStmtList = iterOf fAssign fDelim [(sc, ";")]

  fOperator :: SetOf TokenUnit -> SetOf TokenUnit
  -- 8 Operator ::= AddOp | MinOp - Alternation
  fOperator = altOf fAddOp [(pls, "+")] fMinOp [(mns, "-")]

  fAssSymb :: SetOf TokenUnit -> SetOf TokenUnit
  -- 10 AssSymb ::= assg - Terminal
  fAssSymb = fTerm (assg, "=")
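Putting the pieces together, all fifteen diagrams can be transcribed in the same way. The sketch below is one possible completion, not the module's actual source: SetOf as a list synonym, the Parser shorthand, the numeric key values, and the identifying sets with empty strings (suggested by the error messages in the examples) are all assumptions.

```haskell
type TokenUnit = (Int, String)
type SetOf a   = [a]                                -- ASSUMPTION: a list synonym
type Parser    = SetOf TokenUnit -> SetOf TokenUnit -- hypothetical shorthand

-- Key codes. ASSUMPTION: inferred from the sample lex_an output.
eop, ident, no, pls, mns, assg, sc :: Int
eop   = 0
ident = 1
no    = 2
pls   = 3
mns   = 4
assg  = 5
sc    = 6

seqOf :: Parser -> Parser -> Parser
seqOf fX fY = fY . fX

altOf :: Parser -> SetOf TokenUnit -> Parser -> SetOf TokenUnit -> Parser
altOf _ _ _ _ [] = error "Input: empty/ Alternative"
altOf fX fXTUs fY fYTUs ts@(t:_)
  | fst t `elem` map fst fXTUs = fX ts
  | fst t `elem` map fst fYTUs = fY ts
  | otherwise = error ("Input: " ++ show t ++ "/ Expected: "
                       ++ show (head fXTUs) ++ " or " ++ show (head fYTUs))

iterOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterOf fX fY fYTUs ts = iterationOf fX fY fYTUs (fX ts)

iterationOf :: Parser -> Parser -> SetOf TokenUnit -> Parser
iterationOf _ _ _ [] = error "Input: empty/ Iteration"
iterationOf fX fY fYTUs ts@(t:_)
  | fst t `elem` map fst fYTUs = iterationOf fX fY fYTUs (seqOf fY fX ts)
  | otherwise                  = ts

fTerm :: TokenUnit -> Parser
fTerm x [] = error ("Input: empty/ Expected: " ++ show x)
fTerm x (t:ts)
  | fst x /= fst t = error ("Input: " ++ show t ++ "/ Expected: " ++ show x)
  | otherwise      = ts

-- One parsing function per syntax diagram (numbers as on the slides).
fProgram, fStmtList, fAssign, fRestAss, fExp, fEop, fTrm, fOperator :: Parser
fLHandS, fAssSymb, fIdentifier, fNumber, fDelim, fAddOp, fMinOp :: Parser
fProgram    = seqOf fStmtList fEop                               -- 1  (S)
fStmtList   = iterOf fAssign fDelim [(sc, ";")]                  -- 2  (I)
fAssign     = seqOf fLHandS fRestAss                             -- 3  (S)
fRestAss    = seqOf fAssSymb fExp                                -- 4  (S)
fExp        = iterOf fTrm fOperator [(pls, "+"), (mns, "-")]     -- 5  (I)
fEop        = fTerm (eop, "")                                    -- 6  (T)
fTrm        = altOf fIdentifier [(ident, "")] fNumber [(no, "")] -- 7  (A)
fOperator   = altOf fAddOp [(pls, "+")] fMinOp [(mns, "-")]      -- 8  (A)
fLHandS     = fTerm (ident, "")                                  -- 9  (T)
fAssSymb    = fTerm (assg, "=")                                  -- 10 (T)
fIdentifier = fTerm (ident, "")                                  -- 11 (T)
fNumber     = fTerm (no, "")                                     -- 12 (T)
fDelim      = fTerm (sc, ";")                                    -- 13 (T)
fAddOp      = fTerm (pls, "+")                                   -- 14 (T)
fMinOp      = fTerm (mns, "-")                                   -- 15 (T)
```

Applied to the token units of k = 1 ; j = k - 1 followed by the Eop unit, fProgram returns [], the result expected for a correct program. The definitions are mutually recursive only through names, so their order in the file does not matter.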
17
Invocation chain
Ex: SA program k = 1 ; j = k
  fProgram:  (fStmtList fEop) ~~> fStmtList
  fStmtList: (fAssign {fDelim fAssign}) ~~> fAssign
  fAssign:   (fLHandS fRestAss) ~~> fLHandS
  fLHandS:   ('ident' ie 'k' -- ok); then fRestAss
  fRestAss:  (fAssSymb fExp) ~~> fAssSymb
  fAssSymb:  ('=' ie '=' -- ok); then fExp
  fExp:      (fTrm {fOperator fTrm}) ~~> fTrm
  fTrm:      (fIdentifier | fNumber) ~~> fNumber
  fNumber:   ('no' ie '1' -- ok); then fDelim
  fDelim:    (';' ie ';' -- ok)
  fAssign:   (…)
This is an Abstract Syntax Tree (from a derivation tree).
18
Recap
1. We have written a lexical analyser for SA (called lex_an). It is invoked as
     lex_an lex_aut in_p
   where lex_aut is the automaton used by the scanner (it specifies the SA lexical rules) and in_p is the input program (a string). lex_an produces a list of token units. Ex:
     lex_an lex_aut "a = a + 1 /*comment*/"
       ⇒ [(1,"a"),(5,"="),(1,"a"),(3,"+"),(2,"1"),(0,"Eop")]
2. We have written a parser for SA (invoked through fProgram, the name of the first diagram). It processes a list of token units produced by the lexical analyser and returns a list of token units. If the input is correct then the returned result is []. It is invoked as
     fProgram (lex_an lex_aut parser_in)
   Ex:
     fProgram (lex_an lex_aut "a = a + 1 ; b = a ") ⇒ []
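To experiment with a parser without the module's lexical analyser, a simple stand-in can produce the same shape of output. This is not lex_an: it is not automaton-based, simpleLex, stripComments and classify are hypothetical names, the key codes are read off the sample outputs in these slides, and the identifier rule (a letter followed by letters or digits) is an assumption. It relies on the SA convention that token units are separated by spaces.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit)
import Data.List (isPrefixOf)

type TokenUnit = (Int, String)

-- Remove every /* ... */ comment from the source text.
stripComments :: String -> String
stripComments [] = []
stripComments s@(c:cs)
  | "/*" `isPrefixOf` s = stripComments (skip (drop 2 s))
  | otherwise           = c : stripComments cs
  where
    -- Discard characters up to and including the closing "*/".
    skip [] = []
    skip r@(_:rest)
      | "*/" `isPrefixOf` r = drop 2 r
      | otherwise           = skip rest

-- Attach a key to one space-separated word (keys are assumed values).
classify :: String -> TokenUnit
classify w
  | w == "="                             = (5, w)
  | w == ";"                             = (6, w)
  | w == "+"                             = (3, w)
  | w == "-"                             = (4, w)
  | all isDigit w                        = (2, w)
  | isAlpha (head w) && all isAlphaNum w = (1, w)
  | otherwise                            = error ("Lexical error: " ++ w)

-- Stand-in for 'lex_an lex_aut': strip comments, split on spaces,
-- classify each word, and append the end-of-program unit.
simpleLex :: String -> [TokenUnit]
simpleLex src = map classify (words (stripComments src)) ++ [(0, "Eop")]
```

For instance, simpleLex "a = a + 1 /*comment*/" gives [(1,"a"),(5,"="),(1,"a"),(3,"+"),(2,"1"),(0,"Eop")], which can be passed straight to a parser for SA.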
19
Example
  lex_an lex_aut "k = 1 ; /*comment*/ j = k - 1"
    ⇒ [(1,"k"),(5,"="),(2,"1"),(6,";"),(1,"j"),(5,"="),(1,"k"),(4,"-"),(2,"1"),(0,"Eop")]
  fProgram (lex_an lex_aut "k = 1 ; /*comment*/ j = k - 1")
    ⇒ []
  fProgram (lex_an lex_aut "k = 1 ; /*comment*/ j = k - ")
    ⇒ Program error: Input: (0,"Eop")/ Expected: (1,"") or (2,"")
  fProgram (lex_an lex_aut "k = 1 t ; /*comment*/ j = k - ")
    ⇒ Program error: Input: (1,"t")/ Expected: (0,"")
20
Summary of Parsing
1. Syntax diagrams define the syntax
2. Four key diagrams (variants will follow):
   1. sequence
   2. alternation
   3. iteration
   4. terminal
3. Simplified version with only two non-terminals for the first three provided
4. Higher-order functions for these diagrams
5. Every syntax diagram is written as one of these functions
6. Parser: a collection of functions illustrating a recursive descent method
7. Outcome: an output suitable for further analysis