Core
Core: a simple programming language for which you will write an interpreter as your project.
First, define the Core grammar.
Next, look at the details of how an interpreter for Core may be written.
Approach to be used in the interpreter: recursive descent (also called "syntax directed").
CSE 3341/655; Part 2

BNF for Core
<prog>     ::= program <decl seq> begin <stmt seq> end        (1)
<decl seq> ::= <decl> | <decl> <decl seq>                     (2)
<stmt seq> ::= <stmt> | <stmt> <stmt seq>                     (3)
<decl>     ::= int <id list>;                                 (4)
<id list>  ::= <id> | <id>, <id list>                         (5)
<stmt>     ::= <assign> | <if> | <loop> | <in> | <out>        (6)
<assign>   ::= <id> = <exp>;                                  (7)
<if>       ::= if <cond> then <stmt seq> end;                 (8)
             | if <cond> then <stmt seq> else <stmt seq> end;
<loop>     ::= while <cond> loop <stmt seq> end;              (9)
<in>       ::= read <id list>;                                (10)
<out>      ::= write <id list>;                               (11)

BNF for Core (contd.)
<cond>    ::= <comp> | !<cond>                                (12)
            | [<cond> && <cond>] | [<cond> or <cond>]
<comp>    ::= (<op> <comp op> <op>)                           (13)
<exp>     ::= <fac> | <fac>+<exp> | <fac>-<exp>               (14)
<fac>     ::= <op> | <op> * <fac>                             (15)
<op>      ::= <int> | <id> | (<exp>)                          (16)
<comp op> ::= != | == | < | > | <= | >=                       (17)
<id>      ::= <let> | <let><id> | <let><int>                  (18)
<let>     ::= A | B | C | ... | X | Y | Z                     (19)
<int>     ::= <digit> | <digit><int>                          (20)
<digit>   ::= 0 | 1 | 2 | 3 | ... | 9                         (21)
Notes:
  Problem with <exp>: consider 9-5+4; fix?
  -5 is not a legal <int>; fix?
  Productions (18)-(21) have no semantic significance.

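The note about 9-5+4 can be checked with a small sketch. It assumes each <fac> is just an unsigned integer (no `*`, no parentheses), and the function names are made up for illustration. The first function follows production (14) literally, so everything after the first operator is treated as one <exp>; the second uses the usual left-to-right reading.

```python
import re

def eval_exp_right_recursive(tokens):
    """Evaluate <exp> ::= <fac> | <fac>+<exp> | <fac>-<exp>, with each
    <fac> simplified to a single unsigned integer (an assumption made
    to keep the sketch short)."""
    left = int(tokens[0])
    if len(tokens) == 1:
        return left
    op, rest = tokens[1], tokens[2:]
    right = eval_exp_right_recursive(rest)  # the whole tail is one <exp>
    return left + right if op == "+" else left - right

def eval_left_to_right(tokens):
    """Evaluate the same tokens with the usual left-associative reading."""
    total = int(tokens[0])
    for i in range(1, len(tokens), 2):
        operand = int(tokens[i + 1])
        total = total + operand if tokens[i] == "+" else total - operand
    return total

tokens = re.findall(r"\d+|[+-]", "9-5+4")
print(eval_exp_right_recursive(tokens))  # 0, grouping 9-(5+4)
print(eval_left_to_right(tokens))        # 8, grouping (9-5)+4
```

The two results differ because the grammar's right recursion makes the parse tree group to the right, which is not how subtraction is normally read.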
Parse Tree for a simple program
program int X; begin X = 25; write X; end

<prog>
  program
  <decl seq>
    <decl>
      int
      <id list>
        <id>
          <let> X
      ;
  begin
  <stmt seq>
    <stmt>
      <assign>
        <id>
          <let> X
        =
        <exp>
          <...>
        ;
    <stmt seq>
      <stmt>
        <out>
          write
          <id list>
            <id>
              <let> X
          ;
  end

Concrete vs. Abstract Parse Trees
program int X; begin X = 25; write X; end
[Figure: parse tree for this program, rooted at <prog>, showing terminals such as program, begin, end, int, ;, = and write alongside the non-terminals; several nodes are marked "?".]

Abstract Parse Tree
program int X; begin X = 25; write X; end
[Figure: abstract parse tree for this program. <prog> has two children, <decl seq> and <stmt seq>. Nodes shown under <decl seq>: <decl>, <id list>, <id> X. Nodes shown under <stmt seq>: <stmt>, <assign>, <oper>, <fac>, <int> 25, <output>.]
1. What if we had declared Y instead of X?
2. What if we had exchanged the two statements?

Core Interpreter
Tokenizer: takes a Core program as input and produces a stream of tokens.
Parser: consumes the stream of tokens and produces the abstract parse tree (PT).
Printer: given the PT, prints the original program in a pretty format.
Executor: given the PT, executes the program.
Parser, Printer, Executor: use the recursive descent approach.
Related tools (lexer and parser generators): Lex, YACC, Flex, Bison, Antlr, …

Slide 16 notes
How to do this in pure BNF? Using ε it is easy. Without it, the number of productions increases quite a bit. But using ε can cause problems for compilers. In homeworks, exams, etc., you may use it unless I say otherwise.
Relation to book: so far, mostly chapter 1 and sections 3.1, 3.2, 3.3; the rest of chapter 3 is not included. We will move to chapter 4; that will lead us to the project.
A lot of this should be familiar from 321 and 625. But going over it again should make it easier to see how it relates to programming languages and language implementations. The project also has some relation to 560.

Tokenizer
Tokens:
  Reserved words: program, begin, end, int, if, then, else, while, loop, read, write
  Operators/special symbols: ; , = ! [ ] && or ( ) + - * != == < > <= >=
  Integers (unsigned)
  Identifiers (start with an uppercase letter, followed by zero or more uppercase letters, followed by zero or more digits)

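The four token classes above can be recognized with one regular expression. The following is a minimal sketch, not the project's required design: the names `RESERVED`, `TOKEN_RE` and `tokenize`, and the (kind, lexeme) pair format, are assumptions made for illustration.

```python
import re

RESERVED = {"program", "begin", "end", "int", "if", "then", "else",
            "while", "loop", "read", "write"}

# Two-character symbols are listed before single characters so that
# "<=" is not split into "<" followed by "=".
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<word>[a-z]+)          # lowercase word: reserved word, or the symbol or
      | (?P<id>[A-Z]+[0-9]*)      # identifier: uppercase letters, then digits
      | (?P<int>[0-9]+)           # unsigned integer
      | (?P<sym>&&|!=|==|<=|>=|[;,=!\[\]()+\-*<>])
    )""", re.VERBOSE)

def tokenize(text):
    """Split a Core program into (kind, lexeme) pairs."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"illegal input near position {pos}")
        kind, lexeme = m.lastgroup, m.group(m.lastgroup)
        if kind == "word":
            if lexeme in RESERVED:
                kind = "reserved"
            elif lexeme == "or":
                kind = "symbol"
            else:
                raise ValueError(f"unknown word {lexeme!r}")
        elif kind == "sym":
            kind = "symbol"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("X = 25;"))
# [('id', 'X'), ('symbol', '='), ('int', '25'), ('symbol', ';')]
```

This sketch does not require whitespace between tokens, so an input such as `X25Y` comes out as the two identifiers `X25` and `Y`; whether that should instead be an error is a design decision left open here.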
Tokenizer methods
...
getToken(): returns (info about) the current token. Repeated calls to getToken() return the same token.
skipToken(): skips the current token; the next token becomes the current token, so the next call to getToken() will return the new token.
intVal(): returns the value of the current (integer) token. (What if the current token is not an integer? Error!)
idName(): returns the name (string) of the current (id) token. (What if the current token is not an id? Error!)

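The behavior described for these four methods can be sketched as a cursor over a list of tokens that has already been split. This is only an illustration: the (kind, lexeme) pair format, the trailing "eof" token, and the snake_case method names are assumptions, not part of the slides.

```python
class Tokenizer:
    """Cursor over a pre-split list of (kind, lexeme) pairs."""

    def __init__(self, tokens):
        # An end-of-input marker is appended so that get_token() always
        # has something to return (an assumption of this sketch).
        self._tokens = list(tokens) + [("eof", "")]
        self._pos = 0

    def get_token(self):
        """Return the current token; repeated calls return the same one."""
        return self._tokens[self._pos]

    def skip_token(self):
        """Advance to the next token, stopping at the end-of-input marker."""
        if self._pos < len(self._tokens) - 1:
            self._pos += 1

    def int_val(self):
        """Return the value of the current token, which must be an integer."""
        kind, lexeme = self.get_token()
        if kind != "int":
            raise ValueError(f"current token {lexeme!r} is not an integer")
        return int(lexeme)

    def id_name(self):
        """Return the name of the current token, which must be an identifier."""
        kind, lexeme = self.get_token()
        if kind != "id":
            raise ValueError(f"current token {lexeme!r} is not an identifier")
        return lexeme


t = Tokenizer([("id", "X"), ("symbol", "="), ("int", "25")])
print(t.id_name())   # X
t.skip_token()       # now at "="
t.skip_token()       # now at 25
print(t.int_val())   # 25
```

Keeping "look" (get_token) separate from "advance" (skip_token) lets a parse procedure inspect the current token to choose an alternative without consuming it.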
Recursive Descent
Key idea:
  A single procedure P_N corresponds to each non-terminal N.
  P_N is responsible for every occurrence of N, and only for occurrences of N.
  We will use this approach for parsing, printing, and execution.
Details:
  Obtain the abstract parse tree.
  Pass the root node to P_S (S is the starting non-terminal).
  Each P_N gets most of its work done by the procedures corresponding to the children of the node it receives as argument.

Recursive Descent (contd.)
Example: an <if> node with children <cond>, <stmt seq>, <stmt seq> (each with its own subtree ...).
  void execIf( ?? ) {
    bool b = evalCond( ?? );
    if (b) { execSS( ?? ); return; }
    else if ( ?alt? ) { execSS( ?? ); return; }
    else return;
  }
So, we need to be able to:
  1. Identify the non-terminal at the current node
  2. Identify the alternative used at the current node
  3. Move to the children nodes

A (bad!) representation of PTs
An array representation of parse trees:
  Each node in the tree ↔ a row in the array.
  Each row has 5 columns:
    Number corresponding to the non-terminal at the node;
    Number corresponding to the alternative used;
    The row numbers of the children nodes.
Representation of the <if> statement on the last page: ...

Recursive Descent (contd.)
  void execIf( int n ) {               // n is the row no. of the <if> node
    bool b = evalCond( PT[n,3] );      // PT is the parse tree array
    if (b) { execSS( PT[n,4] ); return; }
    else if (PT[n,2] == 2) { execSS( PT[n,5] ); return; }
    else return;
  }
Why do we need PT[n,1]?
Why 5 columns in a row?
What about <int>? What about <id>?

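A runnable sketch of the same idea, under several assumptions: columns are 0-based here (so the slide's PT[n,3] becomes column index 2), production numbers from the grammar are used as node tags, -1 marks "no child", and `eval_cond` and `exec_ss` are stand-ins that only look up a preset result and record which row was executed. The row contents are made up for illustration.

```python
NONTERM, ALT, CHILD1, CHILD2, CHILD3 = range(5)   # column indices
IF, COND, STMT_SEQ = 8, 12, 3                     # production numbers as tags

# One row per node: [non-terminal, alternative, child rows...]
PT = [
    [IF, 2, 1, 2, 3],            # row 0: if <cond> then <ss> else <ss> end;
    [COND, 1, -1, -1, -1],       # row 1: the condition
    [STMT_SEQ, 1, -1, -1, -1],   # row 2: the "then" branch
    [STMT_SEQ, 1, -1, -1, -1],   # row 3: the "else" branch
]

cond_result = {1: True}   # stand-in: preset outcome for each <cond> row
executed = []             # stand-in: rows of statement sequences that ran

def eval_cond(n):
    return cond_result[n]

def exec_ss(n):
    executed.append(n)

def exec_if(n):
    """Execute the <if> node stored in row n of PT."""
    if PT[n][NONTERM] != IF:
        raise ValueError(f"row {n} is not an <if> node")
    if eval_cond(PT[n][CHILD1]):
        exec_ss(PT[n][CHILD2])
    elif PT[n][ALT] == 2:        # alternative 2 is the form with "else"
        exec_ss(PT[n][CHILD3])

exec_if(0)
print(executed)   # [2]: the condition was true, so the "then" branch ran
```

Changing `cond_result[1]` to `False` and calling `exec_if(0)` again runs row 3 instead, because the node records alternative 2.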
Recursive Descent (contd.)
  void printIf( int n ) {     // n: row no. of the <if> node
    // check PT[n,1] to see if this is an <if> node
    write("if");
    printCond( PT[n,3] );     // don't we have to evaluate the condition?
    write("then");
    printSS( PT[n,4] );       // what if it was not an <SS>?
    if (PT[n,2] == 2) { write("else"); printSS( PT[n,5] ); }
    write("end;");
  }

Recursive Descent (contd.)
  void printAssign( int n ) {   // n: row no. of the <assign> node
    // check PT[n,1] to see if this is an <assign> node
    printId( PT[n,3] );
    write("=");
    printExp( PT[n,4] );
  }                             // bug in this code!

Recursive Descent (contd.)
  void execAssign( int n ) {       // n: row no. of the <assign> node
    // check PT[n,1] to see if this is an <assign> node
    int x = evalExp( PT[n,4] );    // don't we have to first take care of PT[n,3]?
    assignIdVal( PT[n,3], x );     // what about PT[n,2]? PT[n,5]?
  }

Parser
Parsing is harder: there is no tree to descend!
The trick: build the tree *as* you descend!
Approach: the calling procedure creates an "empty" node, by grabbing the next free row from the PT array, and passes it to the appropriate parse procedure.

Recursive Descent Parsing
(Note: "t" is the (global) Tokenizer.)
  void parseIf( int n ) {        // node created by *caller* - who?
    PT[n,1] = 8;                 // why?
    string s = t.getToken();     // if s != "if" error!
    PT[n,3] = nextRow++;         // next free row; initialize?
    parseCond( PT[n,3] );        // bug!
    PT[n,4] = nextRow++;
    parseSS( PT[n,4] );          // bug!
    s = t.getToken();
    if (s != "else") { return; } // bug! bug!
    t.skipToken();
    PT[n,5] = nextRow++;
    parseSS( PT[n,5] );
    return;                      // not so fast!
  }

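One possible reading of the flagged spots is that the keywords are inspected but never skipped, "then", "end" and ";" are never checked, and the alternative number is never recorded. The sketch below shows a version with that bookkeeping added. It is an illustration under assumptions, not the intended solution: `Tokens`, `new_row` and `expect` are hypothetical helpers, columns are 0-based, and `parse_cond` and `parse_ss` are stand-ins that accept the single placeholder tokens "COND" and "SS" instead of following productions (12) and (3).

```python
NONTERM, ALT, CHILD1, CHILD2, CHILD3 = range(5)   # column indices (0-based)
IF, COND, STMT_SEQ = 8, 12, 3                     # production numbers as tags

class Tokens:
    """Minimal stand-in for the tokenizer: current token plus skip."""
    def __init__(self, words):
        self.words = list(words) + ["<eof>"]
        self.pos = 0
    def get_token(self):
        return self.words[self.pos]
    def skip_token(self):
        if self.pos < len(self.words) - 1:
            self.pos += 1

PT = []   # the parse tree array; grows one row per node

def new_row():
    """Grab the next free row, initialized to 'no alternative, no children'."""
    PT.append([0, 0, -1, -1, -1])
    return len(PT) - 1

def expect(t, word):
    """Check that the current token is `word`, then skip past it."""
    if t.get_token() != word:
        raise SyntaxError(f"expected {word!r}, found {t.get_token()!r}")
    t.skip_token()

def parse_cond(t, n):            # stand-in for the real <cond> parser
    expect(t, "COND")
    PT[n][NONTERM], PT[n][ALT] = COND, 1

def parse_ss(t, n):              # stand-in for the real <stmt seq> parser
    expect(t, "SS")
    PT[n][NONTERM], PT[n][ALT] = STMT_SEQ, 1

def parse_if(t, n):
    """Fill row n (created by the caller) with an <if> node."""
    PT[n][NONTERM] = IF
    expect(t, "if")
    PT[n][CHILD1] = new_row()
    parse_cond(t, PT[n][CHILD1])
    expect(t, "then")
    PT[n][CHILD2] = new_row()
    parse_ss(t, PT[n][CHILD2])
    if t.get_token() == "else":
        t.skip_token()
        PT[n][ALT] = 2
        PT[n][CHILD3] = new_row()
        parse_ss(t, PT[n][CHILD3])
    else:
        PT[n][ALT] = 1
    expect(t, "end")
    expect(t, ";")

t = Tokens(["if", "COND", "then", "SS", "else", "SS", "end", ";"])
root = new_row()
parse_if(t, root)
print(PT[root])   # [8, 2, 1, 2, 3]
```

After the call, the current token is the end-of-input marker, so a caller such as a statement-sequence parser can continue with whatever follows the `;`.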