Simple One-Pass Compiler Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University
Outline: Translation Scheme. Annotated Parse Tree. Parsing Fundamentals. Top-Down Parsers. Abstract Stack Machine. Simple Code Generation.
Simple One-Pass Compiler
    Source program (text stream, e.g. main() {) -> Scanner -> Parser -> Object code (text stream)
Sample Grammar
    expr -> expr + term
    expr -> expr - term
    expr -> term
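Derivation
    String: 9 - 5 + 2
    expr => expr + term
         => expr - term + term
         => term - term + term
         => 9 - term + term
         => 9 - 5 + term
         => 9 - 5 + 2
    This is a leftmost derivation: each step expands the leftmost nonterminal. A rightmost derivation expands the rightmost nonterminal instead.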
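The derivation steps can be replayed mechanically. The following is a minimal sketch, assuming sentential forms are Python lists of symbols; the helper name leftmost_derivation and the STEPS list are illustrative choices, not part of the slides, and the sketch does not check that each right-hand side is a legal production.

```python
# Nonterminals of the sample grammar; every other symbol is a terminal.
NONTERMINALS = {"expr", "term"}

def leftmost_derivation(start, steps):
    """Replace the leftmost nonterminal with each right-hand side in `steps`.

    Returns every sentential form, beginning with [start].
    """
    form = [start]
    forms = [list(form)]
    for rhs in steps:
        # Index of the leftmost nonterminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        form = form[:i] + rhs + form[i + 1:]
        forms.append(list(form))
    return forms

# Right-hand sides chosen at each step to derive 9 - 5 + 2.
STEPS = [
    ["expr", "+", "term"],  # expr -> expr + term
    ["expr", "-", "term"],  # expr -> expr - term
    ["term"],               # expr -> term
    ["9"],                  # term -> 9
    ["5"],                  # term -> 5
    ["2"],                  # term -> 2
]

if __name__ == "__main__":
    for form in leftmost_derivation("expr", STEPS):
        print(" ".join(form))
```

Each printed line corresponds to one sentential form of the derivation.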
Parse Tree (for 9 - 5 + 2)
    expr
      expr
        expr
          term
            9
        -
        term
          5
      +
      term
        2
Translation Scheme
    A context-free grammar with embedded semantic actions; running the actions emits a translation.
    expr ::= expr + term   { print('+'); }
    expr ::= expr - term   { print('-'); }
    expr ::= term
    term ::= 0             { print('0'); }
    term ::= 1             { print('1'); }
    ...
    term ::= 9             { print('9'); }
Parse Tree with Semantic Actions
    expr
      expr
        expr
          term
            9  { print('9') }
        -
        term
          5  { print('5') }
        { print('-') }
      +
      term
        2  { print('2') }
      { print('+') }
    Depth-first traversal
    Input:  9 - 5 + 2
    Output: 9 5 - 2 +
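The depth-first traversal can be sketched in a few lines. This is only an illustration under assumed conventions: a node is a (symbol, children) tuple, a semantic action is an ("action", text) tuple, and the names leaf, act, digit, TREE and traverse are hypothetical.

```python
def leaf(token):
    """A terminal symbol: a node with no children."""
    return (token, [])

def act(text):
    """A semantic action that prints `text` when it is visited."""
    return ("action", text)

def digit(d):
    """term ::= d { print('d') }"""
    return ("term", [leaf(d), act(d)])

# Annotated parse tree for 9 - 5 + 2; each action is the last child
# of the node for its production.
TREE = ("expr", [
    ("expr", [
        ("expr", [digit("9")]),
        leaf("-"),
        digit("5"),
        act("-"),
    ]),
    leaf("+"),
    digit("2"),
    act("+"),
])

def traverse(node, out):
    """Visit children left to right, depth first; run actions as they are met."""
    symbol, payload = node
    if symbol == "action":
        out.append(payload)
    else:
        for child in payload:
            traverse(child, out)
    return out

if __name__ == "__main__":
    print(" ".join(traverse(TREE, [])))
```

Because the actions are ordinary children of their nodes, moving an action to a different position among the children changes where its output appears in the translation.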
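Location of Semantic Actions
    Semantic actions can be placed anywhere on the right-hand side (RHS) of a production.
    expr ::= { print('+'); } expr + term
    expr ::= { print('-'); } expr - term
    expr ::= term
    term ::= 0 { print('0'); }
    term ::= 1 { print('1'); }
    ...
    term ::= 9 { print('9'); }
    With the actions for + and - placed first, the input 9 - 5 + 2 is emitted in prefix form: + - 9 5 2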
Parsing Approaches
    Top-down parsing
        builds the parse tree from the start symbol
        matches the resulting terminal string against the input stream
        simple but limited in power
    Bottom-up parsing
        starts from the input token stream
        builds the parse tree from the terminals until it reaches the start symbol
        complex but powerful
Top Down vs. Bottom Up
    [Figure: top-down parsing starts at the root of the tree and its result is matched against the input token stream; bottom-up parsing starts at the input token stream and its result is the root of the tree.]
Example
    type   ::= simple | ^id | array [ simple ] of type
    simple ::= integer | char | num dotdot num
    Input token string: array [ num dotdot num ] of integer
Top-Down Parsing with Left-to-right Scanning of Input Stream
    Tree so far: type -> array [ simple ] of type
    Input: array [ num dotdot num ] of integer
    The lookahead token marks the current position in the input.
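Backtracking (Recursive-Descent Parsing)
    Alternatives for simple: integer, char, num (tried against the lookahead token)
    Input: array [ num dotdot num ] of integer
    The lookahead token marks the current position in the input.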
Predictive Parsing
    type   ::= simple | ^id | array [ simple ] of type
    simple ::= integer | char | num dotdot num
    Tree so far: type -> array [ simple ] of type
    Input: array [ num dotdot num ] of integer
    The lookahead token marks the current position in the input.
The Program for Predictive Parser
    [Figure: the predictive parser calls match('array'); match (the scanner) reads the input text stream "a r r a y [ ..." and answers OK; the parser produces the output.]
The Program for Predictive Parsing
    procedure match ( t : token );
    begin
        if lookahead = t then
            lookahead := nexttoken
        else error
    end;

    procedure type;
    begin
        if lookahead is in { integer, char, num } then
            simple
        else if lookahead = '^' then begin
            match ( '^' ); match ( id )
        end
        else if lookahead = array then begin
            match ( array ); match ( '[' ); simple; match ( ']' ); match ( of ); type
        end
        else error
    end;

    procedure simple;
    begin
        if lookahead = integer then
            match ( integer )
        else if lookahead = char then
            match ( char )
        else if lookahead = num then begin
            match ( num ); match ( dotdot ); match ( num )
        end
        else error
    end;
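For readers who want to run the procedures, here is a rough Python transcription. It is a sketch under assumptions: tokens arrive as a list of token names, the class PredictiveParser, the ParseError exception, the END marker and the accepts helper are our own names, and type is spelled type_ to avoid shadowing the Python built-in.

```python
class ParseError(Exception):
    """Raised when the lookahead token fits no production."""

END = "$"  # end-of-input marker (our convention, not from the slides)

class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [END]
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Advance only when the lookahead is the expected token.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        # type ::= simple | ^id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # simple ::= integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

def accepts(tokens):
    """Return True if the whole token list is a valid type."""
    parser = PredictiveParser(tokens)
    try:
        parser.type_()
        parser.match(END)
    except ParseError:
        return False
    return True

if __name__ == "__main__":
    print(accepts("array [ num dotdot num ] of integer".split()))
```

Each procedure looks only at the current lookahead token to pick a branch, so the parser never backs up over input it has already matched.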
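Mapping Between Production and Parser Codes
    Production:  type -> array [ simple ] of type
    Parser code: match(array); match('['); simple; match(']'); match(of); type
    Each match call consumes a token from the scanner; the call to simple performs the parsing (recognition) of simple in the parser.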
Lookahead Symbols
    For a production A -> α, FIRST(α) = the set of first tokens of the strings generated from α.
    FIRST(simple) = { integer, char, num }
    FIRST(^id) = { ^ }
    FIRST(array [ simple ] of type) = { array }
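FIRST sets like these can be computed by iterating until nothing changes. The sketch below assumes the grammar is stored as a dict of right-hand-side lists and, importantly, that it has no empty productions, so only the first symbol of each right-hand side matters; GRAMMAR, first_of_string and first_sets are illustrative names.

```python
# The type grammar from the earlier example slide.
GRAMMAR = {
    "type": [["simple"], ["^", "id"],
             ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def first_of_string(alpha, first):
    """FIRST of a right-hand side, assuming no empty productions."""
    head = alpha[0]
    # A nonterminal contributes its FIRST set; a terminal contributes itself.
    return set(first[head]) if head in first else {head}

def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alpha in alternatives:
                new = first_of_string(alpha, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first

if __name__ == "__main__":
    for nt, tokens in first_sets(GRAMMAR).items():
        print(nt, sorted(tokens))
```

A grammar with empty productions would need the usual extension that looks past symbols able to derive the empty string; that case is deliberately left out here.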
Rules for Predictive Parser
    If A -> α and A -> β, then FIRST(α) and FIRST(β) must be disjoint.
    ε-production: chosen when the lookahead token matches no other alternative.
        stmt -> begin opt_stmts end
        opt_stmts -> stmt_list opt_stmts | ε
Left Recursion
    expr -> expr + term | term
    Left recursion => the parser loops forever, because the procedure for expr calls itself before consuming any input.
    General form:
        A -> A α | β
    Rewriting...
        A -> β R
        R -> α R | ε
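The rewriting rule is mechanical enough to code directly. A minimal sketch, assuming productions are lists of symbols and handling immediate left recursion only; the function name eliminate_left_recursion and the EPSILON constant are our own.

```python
EPSILON = "ε"  # marker for the empty string

def eliminate_left_recursion(nt, alternatives, new_nt):
    """Rewrite A -> A α | β as A -> β R, R -> α R | ε.

    `nt` is A, `new_nt` is the fresh nonterminal R. Only immediate
    left recursion (A appearing first in its own alternative) is handled.
    """
    # α parts: what follows A in the left-recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # β parts: the alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}  # nothing to rewrite
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[EPSILON]],
    }

if __name__ == "__main__":
    rewritten = eliminate_left_recursion(
        "expr",
        [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
        "rest",
    )
    for head, alts in rewritten.items():
        print(head, "->", " | ".join(" ".join(a) for a in alts))
```

Applied to the expr productions with rest as the new nonterminal, the function yields expr -> term rest and rest -> + term rest | - term rest | ε.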
Example
    Original grammar:
        expr -> expr + term
        expr -> expr - term
        expr -> term
    After eliminating left recursion:
        expr -> term rest
        rest -> + term rest | - term rest | ε
        term -> 0 | 1 | 2 | ... | 9
Semantic Actions
    expr -> term rest
    rest -> + term { print('+'); } rest
    ...
    expr -> term rest
    rest -> + term { print('+'); } rest | - term { print('-'); } rest | ε
    term -> 0 { print('0'); }
    ...

    procedure expr;
    begin
        term(); rest();
    end;

    procedure rest;
    begin
        if lookahead = '+' then begin
            match('+'); term(); print('+'); rest();
        end
        else if lookahead = '-' then begin
            match('-'); term(); print('-'); rest();
        end
        { otherwise: ε-production, do nothing }
    end;
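Putting expr, rest and term together gives a complete infix-to-postfix translator. The following is a sketch rather than the course's reference code: it assumes single-digit operands and single-character tokens, collects the printed symbols in a list instead of printing them, and uses the names Translator, translate and END, which are ours.

```python
END = ""  # end-of-input marker; no real token is the empty string

class Translator:
    def __init__(self, text):
        # One token per non-blank character (single-digit operands only).
        self.tokens = [c for c in text if not c.isspace()] + [END]
        self.pos = 0
        self.out = []

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead != t:
            raise ValueError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def expr(self):
        # expr -> term rest
        self.term()
        self.rest()

    def rest(self):
        # rest -> + term {print('+')} rest | - term {print('-')} rest | ε
        if self.lookahead in ("+", "-"):
            op = self.lookahead
            self.match(op)
            self.term()
            self.out.append(op)  # the semantic action
            self.rest()
        # Otherwise take the ε-production: consume nothing.

    def term(self):
        # term -> 0 {print('0')} | ... | 9 {print('9')}
        d = self.lookahead
        if not d.isdigit():
            raise ValueError(f"digit expected, found {d!r}")
        self.match(d)
        self.out.append(d)

def translate(text):
    """Translate an infix expression over digits, + and - to postfix."""
    t = Translator(text)
    t.expr()
    t.match(END)  # reject trailing input
    return " ".join(t.out)

if __name__ == "__main__":
    print(translate("9 - 5 + 2"))
```

The recursive call at the end of rest is a tail call, so it could equally be written as a loop; the recursive form is kept here because it mirrors the grammar.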
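Abstract Stack Machines
    Primitive arithmetic operators
        Postfix expression: 1 3 + 5 *
        Steps: Stack 1; Stack 3; Add; Stack 5; Multiply (leaves 20 on top of the stack)
    L-values & R-values
        i := i + 1 ;
        The i on the left-hand side denotes the ADDRESS OF i (its l-value); the i on the right-hand side denotes the VALUE OF i (its r-value).
        p^ := q^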
Stack Manipulation
    push v      push v on top of the stack
    pop v       pop v from the top of the stack
    rvalue l    push the contents of data location l
    lvalue l    push the address of data location l
    :=          store the r-value on top into the l-value below it, then pop both
    copy        push a copy of the top value
Translation of Expression
    day := (1461 * y) div 4 + (153 * m + 2) div 5 + d
    Stack machine code:
        lvalue day
        push 1461
        rvalue y
        *
        push 4
        div
        push 153
        rvalue m
        *
        push 2
        +
        push 5
        div
        +
        rvalue d
        +
        :=
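To see this instruction sequence at work, a toy interpreter is enough. The sketch below makes several assumptions that the slides do not state: data memory is a Python dict keyed by variable name, an address is simply that name, div is integer division on non-negative operands, and the names run and DAY_PROGRAM are ours.

```python
DAY_PROGRAM = [
    "lvalue day", "push 1461", "rvalue y", "*", "push 4", "div",
    "push 153", "rvalue m", "*", "push 2", "+", "push 5", "div",
    "+", "rvalue d", "+", ":=",
]

def run(program, memory):
    """Execute straight-line stack machine code; return the data memory."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            stack.append(int(args[0]))
        elif op == "rvalue":
            stack.append(memory[args[0]])  # contents of the location
        elif op == "lvalue":
            stack.append(args[0])          # the "address" is the name
        elif op == "pop":
            stack.pop()
        elif op == "copy":
            stack.append(stack[-1])
        elif op == ":=":
            value = stack.pop()            # r-value on top
            address = stack.pop()          # l-value below it
            memory[address] = value
        elif op in ("+", "-", "*", "div"):
            right = stack.pop()
            left = stack.pop()
            if op == "+":
                stack.append(left + right)
            elif op == "-":
                stack.append(left - right)
            elif op == "*":
                stack.append(left * right)
            else:
                stack.append(left // right)
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return memory

if __name__ == "__main__":
    print(run(DAY_PROGRAM, {"y": 1, "m": 3, "d": 15})["day"])
```

With y = 1, m = 3 and d = 15 the program stores 365 + 92 + 15 = 472 in day.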
Translation Scheme for Assignment Statement
    stmt -> id := { emit('lvalue', id.lexeme); } expr { emit(':='); }
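A small sketch shows where the two emit actions fall during parsing. Several things here are assumptions beyond the slide: tokens are separated by blanks, an operand is either a number (emitted as push) or an identifier (emitted as rvalue), only + and - are supported, and translate_assignment is a hypothetical name.

```python
def translate_assignment(source):
    """Emit stack machine code for a statement of the form  id := expr."""
    tokens = source.split() + [""]  # "" marks the end of input
    pos = 0
    code = []

    def lookahead():
        return tokens[pos]

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise ValueError(f"expected {t!r}, found {tokens[pos]!r}")
        pos += 1

    def operand():
        tok = lookahead()
        if tok.isdigit():
            code.append(f"push {tok}")    # constant
        elif tok.isidentifier():
            code.append(f"rvalue {tok}")  # value of a variable
        else:
            raise ValueError(f"operand expected, found {tok!r}")
        match(tok)

    def expr():
        # Loop form of: expr -> term rest
        operand()
        while lookahead() in ("+", "-"):
            op = lookahead()
            match(op)
            operand()
            code.append(op)

    # stmt -> id := { emit('lvalue', id.lexeme) } expr { emit(':=') }
    ident = lookahead()
    if not ident.isidentifier():
        raise ValueError(f"identifier expected, found {ident!r}")
    match(ident)
    match(":=")
    code.append(f"lvalue {ident}")
    expr()
    code.append(":=")
    match("")  # reject trailing input
    return code

if __name__ == "__main__":
    print("\n".join(translate_assignment("i := i + 1")))
```

The lvalue instruction is emitted before the code for expr and the := instruction after it, which is exactly the order the stack machine needs: address first, value on top.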
Control Flow
    label l      target for jumps to l
    goto l       jump to label l
    gofalse l    pop the top value; jump to l if it is zero
    gotrue l     pop the top value; jump to l if it is nonzero
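The jump instructions need a program counter and a label table. The sketch below is one possible way to model them, assuming instructions are strings, memory is a dict keyed by variable name, and unset variables read as 0; the name run_with_jumps is ours and only a few non-jump instructions are included to keep it short.

```python
def run_with_jumps(program, memory=None):
    """Execute stack machine code that may contain labels and jumps."""
    memory = {} if memory is None else memory
    # Map each label name to the index of its 'label' instruction.
    labels = {
        instr.split()[1]: index
        for index, instr in enumerate(program)
        if instr.startswith("label ")
    }
    stack = []
    pc = 0  # program counter
    while pc < len(program):
        op, *args = program[pc].split()
        pc += 1
        if op == "push":
            stack.append(int(args[0]))
        elif op == "rvalue":
            stack.append(memory.get(args[0], 0))
        elif op == "lvalue":
            stack.append(args[0])
        elif op == ":=":
            value = stack.pop()
            memory[stack.pop()] = value
        elif op == "label":
            pass  # a label is only a jump target
        elif op == "goto":
            pc = labels[args[0]]
        elif op == "gofalse":
            if stack.pop() == 0:
                pc = labels[args[0]]
        elif op == "gotrue":
            if stack.pop() != 0:
                pc = labels[args[0]]
        else:
            raise ValueError(f"unknown instruction {program[pc - 1]!r}")
    return memory

if __name__ == "__main__":
    program = ["rvalue x", "gofalse out", "lvalue y", "push 1", ":=", "label out"]
    print(run_with_jumps(program, {"x": 5}))
```

In the sample program the assignment to y is skipped when x is zero, because gofalse pops the 0 and jumps to the label.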
Translation of if Statement
    stmt -> if expr { out := newlabel; emit('gofalse', out); } then stmt1 { emit('label', out); }
    Generated code layout:
        code for expr
        gofalse out
        code for stmt1
        label out
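The code layout can be produced by a short generator. This is a sketch under assumptions: the code for expr and for stmt1 is passed in as ready-made instruction lists, labels are named L1, L2, ..., and make_newlabel and translate_if are illustrative names rather than anything defined in the slides.

```python
import itertools

def make_newlabel():
    """Return a newlabel() function that yields L1, L2, ... on each call."""
    counter = itertools.count(1)

    def newlabel():
        return f"L{next(counter)}"

    return newlabel

def translate_if(expr_code, stmt_code, newlabel):
    """Code for:  if expr then stmt1

    stmt -> if expr { out := newlabel; emit('gofalse', out); }
            then stmt1 { emit('label', out); }
    """
    out = newlabel()
    code = list(expr_code)        # code for expr
    code.append(f"gofalse {out}")
    code.extend(stmt_code)        # code for stmt1
    code.append(f"label {out}")
    return code

if __name__ == "__main__":
    newlabel = make_newlabel()
    inner = translate_if(["rvalue b"], ["lvalue y", "push 1", ":="], newlabel)
    outer = translate_if(["rvalue a"], inner, newlabel)
    print("\n".join(outer))
```

Because every call to newlabel returns a fresh name, nested if statements get distinct jump targets.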