
1 241-437 Compilers: IC/10 1 Compiler Structures
241-437, Semester 1, 2011-2012
10. Intermediate Code Generation
Objective
  – describe intermediate code generation
  – explain a stack-based intermediate code for the expression language

2 241-437 Compilers: IC/10 2 Overview
1. Intermediate Code (IC) Generation
2. IC Examples
3. Expression Translation in SPIM
4. The Expressions Language

3 241-437 Compilers: IC/10 3 In this lecture
[Diagram: compiler pipeline]
Source Program
  Front End: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Int. Code Generator
Intermediate Code
  Back End: Code Optimizer, Target Code Generator
Target Lang. Prog.

4 241-437 Compilers: IC/10 4 1. Intermediate Code (IC) Generation
Helps with retargeting
  – e.g. can easily attach a back end for a new machine to an existing front end
Enables machine-independent code optimization.
[Diagram: Front end --> Intermediate code --> Back end --> Target machine code]

5 241-437 Compilers: IC/10 5 Graphical IC Representations
Abstract Syntax Trees (AST)
  – retain the basic parse tree structure, but with unneeded nodes removed
Directed Acyclic Graphs (DAG)
  – a compacted AST that avoids duplication
  – smaller memory needs
Control Flow Graphs (CFG)
  – used to model control flow

6 241-437 Compilers: IC/10 6 Linear (text-based) ICs
Stack-based (postfix)
  – e.g. the JVM
Three-address code: x := y op z
Two-address code: x := op y (the same as x := x op y)

7 241-437 Compilers: IC/10 7 2. IC Examples
ASTs and DAGs
Stack-based (postfix)
Three-address Code
SPIM

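8 241-437 Compilers: IC/10 8 2.1. ASTs and DAGs
Example: a := b * -c + b * -c
[Diagram: AST] assign(a, +( *(b, -(c)), *(b, -(c)) )), with the subtree b * -c appearing twice
[Diagram: DAG] assign(a, +(m, m)) where m = *(b, -(c)), with the subtree b * -c built once and shared
Pros: easy restructuring of code and/or expressions for intermediate code optimization
Cons: memory intensive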

9 241-437 Compilers: IC/10 9 2.2. Stack-based (postfix)
Example: a := b * -c + b * -c
Postfix form: b c uminus * b c uminus * + a assign
JVM stack instructions (for example):
  iload 2    // push b
  iload 3    // push c
  ineg       // uminus
  imul       // *
  iload 2    // push b
  iload 3    // push c
  ineg       // uminus
  imul       // *
  iadd       // +
  istore 1   // store a
Postfix notation represents operations on a stack.
Pro: easy to generate
Cons: stack operations are more difficult to optimize
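A minimal sketch of why postfix code is "easy to generate": a post-order walk of the expression tree emits the operands before their operator. The `Node` type and `emitPostfix()` are hypothetical names introduced here for illustration; they are not part of the course code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node (an assumption, not the course's Tree type). */
typedef struct Node {
    const char *label;          /* identifier or operator name, e.g. "b", "*", "uminus" */
    const struct Node *left;    /* NULL for a leaf */
    const struct Node *right;   /* NULL for a leaf or a unary operator */
} Node;

/* Append the postfix form of t to the string in out (capacity cap).
   Post-order: left operand, right operand, then the operator itself. */
void emitPostfix(const Node *t, char *out, size_t cap)
{
    if (t == NULL)
        return;
    emitPostfix(t->left, out, cap);
    emitPostfix(t->right, out, cap);
    size_t used = strlen(out);
    /* separate tokens with a single space; snprintf truncates safely */
    snprintf(out + used, cap - used, "%s%s", used > 0 ? " " : "", t->label);
}
```

Calling `emitPostfix()` on a tree for `b * -c + b * -c`, with `out` starting as an empty string, yields `b c uminus * b c uminus * +`, the expression part of the postfix form on the slide.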

10 241-437 Compilers: IC/10 10 2.3. Three-Address Code
Example: a := b * -c + b * -c
Translated from the AST:
  t1 := - c
  t2 := b * t1
  t3 := - c
  t4 := b * t3
  t5 := t2 + t4
  a := t5
Translated from the DAG:
  t1 := - c
  t2 := b * t1
  t5 := t2 + t2
  a := t5
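The difference between the two translations can be sketched in C, assuming each node remembers the temporary that already holds its value. With a tree (no shared nodes) every subexpression gets a fresh temporary; with a DAG, a shared node is translated only once. The `TNode` type and `genTAC()` are hypothetical, and this sketch numbers temporaries consecutively, so the DAG result is named t3 rather than the slide's t5.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node type (an assumption for illustration). */
typedef struct TNode {
    char op;               /* '+', '*', '-' (unary minus), or 0 for a leaf */
    const char *name;      /* variable name when op == 0 */
    struct TNode *left;    /* the only operand of unary minus */
    struct TNode *right;   /* NULL for unary minus and leaves */
    int temp;              /* temporary holding this node's value, 0 if none yet */
} TNode;

/* Write the operand text for t into buf: a variable name or tN. */
static void operand(const TNode *t, char *buf, size_t cap)
{
    if (t->op == 0)
        snprintf(buf, cap, "%s", t->name);
    else
        snprintf(buf, cap, "t%d", t->temp);
}

/* Append three-address code for t to out. A node whose temp is already
   set (a shared DAG node) is skipped, so its code is emitted only once. */
void genTAC(TNode *t, int *nextTemp, char *out, size_t cap)
{
    char l[32], r[32];
    if (t->op == 0 || t->temp != 0)
        return;
    genTAC(t->left, nextTemp, out, cap);
    if (t->right != NULL)
        genTAC(t->right, nextTemp, out, cap);
    t->temp = (*nextTemp)++;
    operand(t->left, l, sizeof l);
    size_t used = strlen(out);
    if (t->right == NULL) {
        snprintf(out + used, cap - used, "t%d := %c %s\n", t->temp, t->op, l);
    } else {
        operand(t->right, r, sizeof r);
        snprintf(out + used, cap - used, "t%d := %s %c %s\n", t->temp, l, t->op, r);
    }
}
```

If both operands of `+` point at the same `*` node, the output is three instructions; if they point at two separate copies of the subtree, the output is the five-instruction AST version. The final `a := tN` copy is left out of the sketch.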

11 241-437 Compilers: IC/10 11 2.4. SPIM
SPIM is a simulator that runs MIPS32 assembly language programs; the assembly language is a form of three-address code.
  – http://www.cs.wisc.edu/~larus/spim.html
Loading/Storing
  – lw register,var - loads value into register
  – sw register,var - stores value from register
  – many, many others
continued

12 241-437 Compilers: IC/10 12
8 registers: $t0 - $t7
Binary math ops (reg1 = reg2 op reg3):
  – add reg1,reg2,reg3
  – sub reg1,reg2,reg3
  – mul reg1,reg2,reg3
  – div reg1,reg2,reg3
Unary minus (reg1 = - reg2)
  – neg reg1,reg2

13 241-437 Compilers: IC/10 13 "a := b * -c + b * -c" in SPIM
Translated from the AST (the diagram labels each AST node with the register that holds its value):
  lw $t0,c
  neg $t1,$t0
  lw $t0,b
  mul $t2,$t1,$t0
  lw $t0,c
  neg $t1,$t0
  lw $t0,b
  mul $t1,$t1,$t0
  add $t1,$t2,$t1
  sw $t1,a

14 241-437 Compilers: IC/10 14 a := b * -c + b * -c
Translated from the DAG (the diagram labels each DAG node with the register that holds its value):
  lw $t0,c
  neg $t1,$t0
  lw $t0,b
  mul $t1,$t1,$t0
  add $t2,$t1,$t1
  sw $t2,a

15 241-437 Compilers: IC/10 15 3. Expression Translation in SPIM
Grammar:
  S => id := E
  E => E + E
  E => id
Parse tree for: a := b + c + d + e
Temporaries so far: 1
Generate:
  lw $t1,b
As we parse, use attributes to pass information about the temporary variables up the tree.
parse tree --> code using bottom-up evaluation

16 241-437 Compilers: IC/10 16 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2
Generate:
  lw $t1,b
  lw $t2,c
Each number corresponds to a temporary variable.

17 241-437 Compilers: IC/10 17 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
Each number corresponds to a temporary variable.

18 241-437 Compilers: IC/10 18 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3, 4
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
  lw $t4,d

19 241-437 Compilers: IC/10 19 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3, 4, 5
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
  lw $t4,d
  add $t5,$t3,$t4

20 241-437 Compilers: IC/10 20 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3, 4, 5, 6
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
  lw $t4,d
  add $t5,$t3,$t4
  lw $t6,e

21 241-437 Compilers: IC/10 21 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3, 4, 5, 6, 7
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
  lw $t4,d
  add $t5,$t3,$t4
  lw $t6,e
  add $t7,$t5,$t6

22 241-437 Compilers: IC/10 22 Parse tree for: a := b + c + d + e
Temporaries so far: 1, 2, 3, 4, 5, 6, 7
Generate:
  lw $t1,b
  lw $t2,c
  add $t3,$t1,$t2
  lw $t4,d
  add $t5,$t3,$t4
  lw $t6,e
  add $t7,$t5,$t6
  sw $t7,a
Pro: easy to rearrange code for global optimization
Cons: lots of temporaries
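The bottom-up walk in the preceding slides can be sketched as a recursive function that returns the number of the temporary holding each subexpression's value, which plays the role of the attribute passed up the tree. The `Expr` type, `genExpr()` and `genAssign()` are hypothetical names; the sketch handles only the `+` grammar above and does not check for running out of registers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parse tree node for E => E + E | id (an assumption). */
typedef struct Expr {
    const char *id;             /* variable name for a leaf, NULL for a '+' node */
    const struct Expr *left;
    const struct Expr *right;
} Expr;

/* Append code for e to out and return the temporary number holding its value. */
int genExpr(const Expr *e, int *nextTemp, char *out, size_t cap)
{
    int t;
    size_t used;
    if (e->id != NULL) {                       /* E => id */
        t = (*nextTemp)++;
        used = strlen(out);
        snprintf(out + used, cap - used, "lw $t%d,%s\n", t, e->id);
        return t;
    }
    int l = genExpr(e->left, nextTemp, out, cap);    /* E => E + E */
    int r = genExpr(e->right, nextTemp, out, cap);
    t = (*nextTemp)++;
    used = strlen(out);
    snprintf(out + used, cap - used, "add $t%d,$t%d,$t%d\n", t, l, r);
    return t;
}

/* S => id := E : generate the expression, then store the result. */
void genAssign(const char *target, const Expr *e, char *out, size_t cap)
{
    int nextTemp = 1;                          /* the slides number temporaries from 1 */
    int t = genExpr(e, &nextTemp, out, cap);
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "sw $t%d,%s\n", t, target);
}
```

For a left-nested tree ((b + c) + d) + e and target `a`, this produces the same eight instructions as the final slide of the walk-through, ending in `sw $t7,a`.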

23 241-437 Compilers: IC/10 23 Issues when Processing Expressions
Type checking/conversion.
Address calculation for more complex types (arrays, records, etc.).
Expressions in control structures, such as loops and if tests.

24 241-437 Compilers: IC/10 24 4. The Expressions Language
exprParse3.c builds a parse tree for the input file (reuses code from exprParse2.c).
An intermediate code is generated from the parse tree, and saved to an output file.
The input file is not executed by exprParse3.c
  – that is done by a separate emulator.

25 241-437 Compilers: IC/10 25 Usage
> gcc -Wall -o exprParse3 exprParse3.c
> ./exprParse3 < test1.txt
> cat codeGen.txt
PUSH 2
STORE x
WRITE
PUSH 3
LOAD x
ADD
STORE y
WRITE
STOP
test1.txt:
  let x = 2
  let y = 3 + x
[Diagram: test1.txt --> exprParse3 --> codeGen.txt]
exprParse3 stores the intermediate code in codeGen.txt.

26 241-437 Compilers: IC/10 26 Emulator Usage
> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop
[Diagram: codeGen.txt --> emulator]
The emulator runs the intermediate code.

27 241-437 Compilers: IC/10 27 4.1. The Instruction Set
The instructions in codeGen.txt are executed by an emulator.
  – it emulates (simulates) real hardware
The instructions refer to two data structures used in the emulator.

28 241-437 Compilers: IC/10 28 The Emulator's Data Structures
The emulator's data structures:
  – a symbol table of IDs and their integer values
  – a stack of integers for evaluating the expressions
[Diagram: a stack holding 2; a symbol table with the entry x 4]

29 241-437 Compilers: IC/10 29 The Instructions
WRITE      // pop top element off stack and print
STOP       // exit code emulation
LOAD ID    // get ID value from symbol table, and push onto stack
STORE ID   // copy stack top into symbol table for ID
continued

30 241-437 Compilers: IC/10 30
PUSH integer     // push integer onto stack
STORE0 ID        // push 0 onto stack, and save to table as value for ID (same as PUSH 0; STORE ID)
MULT             // pop two stack values, multiply them, push result back
ADD, MINUS, DIV  // same for those ops

31 241-437 Compilers: IC/10 31 Intermediate Code Type
Since the intermediate code uses a stack rather than registers to store values, it is a stack-based (postfix) representation.

32 241-437 Compilers: IC/10 32 4.2. exprParse3.c Coding
All the parsing code in exprParse3.c is the same as exprParse2.c.
The difference is that the parse tree is passed to a generateCode() function to convert it to intermediate code
  – see main()

33 241-437 Compilers: IC/10 33 main()
#define CODE_FNM "codeGen.txt"   // where to store generated code
int main(void)
/* parse, print the tree, then generate code which is stored in CODE_FNM */
{
  Tree *t;
  nextToken();
  t = statements();
  match(SCANEOF);
  printTree(t, 0);
  generateCode(CODE_FNM, t);
  return 0;
}

34 241-437 Compilers: IC/10 34 Generating the Code
void generateCode(char *fnm, Tree *t)
/* Open the intermediate code file, fnm, and write to it. */
{
  FILE *fp;
  if ((fp = fopen(fnm, "w")) == NULL) {
    printf("Could not write to %s\n", fnm);
    exit(1);
  }
  else {
    printf("Writing code to %s\n", fnm);
    cgTree(fp, t);
    fprintf(fp, "STOP\n");   // last instruction in file
    fclose(fp);
  }
} // end of generateCode()

35 241-437 Compilers: IC/10 35
void cgTree(FILE *fp, Tree *t)
/* Recurse over the parse tree looking for non-NEWLINE subtrees
   to convert into code.
   Each block of code generated for a non-NEWLINE subtree ends
   with a WRITE instruction, to print out the value of the line. */
{
  if (t == NULL)
    return;
  Token tok = TreeOper(t);
  if (tok == NEWLINE) {
    cgTree(fp, TreeLeft(t));
    cgTree(fp, TreeRight(t));
  }
  else {
    codeGen(fp, t);
    fprintf(fp, "WRITE\n");   // print value at EOL
  }
} // end of cgTree()

36 241-437 Compilers: IC/10 36
void codeGen(FILE *fp, Tree *t)
/* Convert the tree nodes for ID, INT, ASSIGNOP, PLUSOP, MINUSOP,
   MULTOP, DIVOP into instructions.
   The load/store instructions: LOAD ID, STORE ID, STORE0 ID, PUSH integer
   The math instructions: MULT, ADD, MINUS, DIV */
{
  if (t == NULL)
    return;
  :
continued

37 241-437 Compilers: IC/10 37
  Token tok = TreeOper(t);
  if (tok == ID)
    codeGenID(fp, TreeID(t));
  else if (tok == INT)
    fprintf(fp, "PUSH %d\n", TreeValue(t));
  else if (tok == ASSIGNOP) {   // id = expr
    char *id = TreeID(TreeLeft(t));
    getIDEntry(id);   // don't use Symbol info
    codeGen(fp, TreeRight(t));
    fprintf(fp, "STORE %s\n", id);
  }
  :
continued

38 241-437 Compilers: IC/10 38
  else if (tok == PLUSOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "ADD\n");
  }
  else if (tok == MINUSOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "MINUS\n");
  }
  :
continued

39 241-437 Compilers: IC/10 39
  else if (tok == MULTOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "MULT\n");
  }
  else if (tok == DIVOP) {
    codeGen(fp, TreeLeft(t));
    codeGen(fp, TreeRight(t));
    fprintf(fp, "DIV\n");
  }
} // end of codeGen()

40 241-437 Compilers: IC/10 40
void codeGenID(FILE *fp, char *id)
/* An ID may already be in the symbol table, or be new, which
   is converted into a LOAD or a STORE0 code operation. */
{
  SymbolInfo *si = NULL;
  if ((si = lookupID(id)) != NULL)   // already declared
    fprintf(fp, "LOAD %s\n", id);
  else {   // new, so add to table
    addID(id, 0);   // 0 is default value
    fprintf(fp, "STORE0 %s\n", id);
  }
} // end of codeGenID()

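41 241-437 Compilers: IC/10 41 From Tree to Code
Input:
  let x = 2
  let y = 3 + x
[Diagram: parse tree in which NEWLINE (\n) nodes link the subtrees for x = 2 and y = 3 + x; one child is NULL]
Symbol table in exprParse3.c:
  x 0
  y 0
Generated code:
  PUSH 2
  STORE x
  WRITE
  PUSH 3
  LOAD x
  ADD
  STORE y
  WRITE
  STOP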

42 241-437 Compilers: IC/10 42 4.3. The Emulator
> gcc -Wall -o emulator emulator.c
> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop

43 241-437 Compilers: IC/10 43 Emulator Data Structures
#define MAX_SYMS 15     // max no of vars
#define STACK_SIZE 10
// stack data structure
int stack[STACK_SIZE];
int stackTop = -1;
// symbol table data structures
typedef struct SymInfo {
  char *id;
  int value;
} SymbolInfo;
int symNum = 0;   // number of symbols stored
SymbolInfo syms[MAX_SYMS];
[Diagram: a stack holding 2; a symbol table with the entry x 4]
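The slides call push(), pop(), topOf(), lookupID() and addID() on these data structures without showing them. The following is a minimal sketch of how they might look, assuming addID() overwrites the value of an existing ID (which is what the STORE instruction needs) and keeps its own copy of the name. The bodies are guesses for illustration, not the course's emulator.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 15     /* max no of vars */
#define STACK_SIZE 10

static int stack[STACK_SIZE];
static int stackTop = -1;            /* index of the top element, -1 when empty */

typedef struct SymInfo {
    char *id;
    int value;
} SymbolInfo;

static int symNum = 0;               /* number of symbols stored */
static SymbolInfo syms[MAX_SYMS];

static void fail(const char *msg)    /* assumed error policy: report and exit */
{
    fprintf(stderr, "Error: %s\n", msg);
    exit(1);
}

void push(int v)
{
    if (stackTop >= STACK_SIZE - 1)
        fail("stack overflow");
    stack[++stackTop] = v;
}

int pop(void)
{
    if (stackTop < 0)
        fail("pop on empty stack");
    return stack[stackTop--];
}

int topOf(void)                      /* read the top without removing it */
{
    if (stackTop < 0)
        fail("topOf on empty stack");
    return stack[stackTop];
}

SymbolInfo *lookupID(const char *id) /* linear search; NULL if id is unknown */
{
    for (int i = 0; i < symNum; i++)
        if (strcmp(syms[i].id, id) == 0)
            return &syms[i];
    return NULL;
}

void addID(const char *id, int value) /* add a new ID, or update an existing one */
{
    SymbolInfo *si = lookupID(id);
    if (si != NULL) {
        si->value = value;
        return;
    }
    if (symNum >= MAX_SYMS)
        fail("symbol table full");
    char *copy = malloc(strlen(id) + 1);   /* caller's buffer may be reused */
    if (copy == NULL)
        fail("out of memory");
    strcpy(copy, id);
    syms[symNum].id = copy;
    syms[symNum].value = value;
    symNum++;
}
```

Copying the name matters because the emulator reads each instruction's argument into a local buffer that is overwritten on the next line.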

44 241-437 Compilers: IC/10 44 Evaluating Input Lines
void eval(FILE *fp)
/* Read in the code file a line at a time and process the lines.
   An instruction on a line may be a single command (e.g. WRITE)
   or an instruction name and an argument (e.g. LOAD x). */
{
  char buf[BUFSIZ];
  char cmd[MAX_LEN], arg[MAX_LEN];
  int no;
  :
continued

45 241-437 Compilers: IC/10 45
  while (fgets(buf, sizeof(buf), fp) != NULL) {
    no = sscanf(buf, "%s %s\n", cmd, arg);
    if ((no < 1) || (no > 2))   // expect a command, optionally followed by one argument
      printf("Unknown format: %s\n", buf);
    else
      processCmd(cmd, arg);   // process commands as they are read in
  }
} // end of eval()

46 241-437 Compilers: IC/10 46 Processing an Instruction
void processCmd(char *cmd, char *arg)
{
  SymbolInfo *si;
  if (strcmp(cmd, "LOAD") == 0) {
    if ((si = lookupID(arg)) == NULL) {
      printf("Error: load cannot find %s\n", arg);
      exit(1);
    }
    push(si->value);
  }
  else if (strcmp(cmd, "STORE") == 0)
    addID(arg, topOf());
  else if (strcmp(cmd, "STORE0") == 0) {
    push(0);
    addID(arg, 0);
  }
continued

47 241-437 Compilers: IC/10 47
  else if (strcmp(cmd, "PUSH") == 0)
    push( atoi(arg) );
  else if (strcmp(cmd, "MULT") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1*v2 );
  }
  else if (strcmp(cmd, "ADD") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1+v2 );
  }
  else if (strcmp(cmd, "MINUS") == 0) {
    int v2 = pop();
    int v1 = pop();
    push( v1-v2 );
  }
continued

48 241-437 Compilers: IC/10 48
  else if (strcmp(cmd, "DIV") == 0) {
    int v2 = pop();
    if (v2 == 0) {
      printf("Error: div by 0; using 1\n");
      v2 = 1;
    }
    int v1 = pop();
    push( v1/v2 );
  }
  else if (strcmp(cmd, "WRITE") == 0)
    printf("== %d\n", pop());
  else if (strcmp(cmd, "STOP") == 0) {
    printf("Stop\n");
    exit(1);
  }
continued

49 241-437 Compilers: IC/10 49
  else
    printf("Unknown instruction: %s\n", cmd);
} // end of processCmd()

50 241-437 Compilers: IC/10 50 Evaluating the Code for test1.txt
test1.txt:
  let x = 2
  let y = 3 + x
codeGen.txt:
  PUSH 2
  STORE x
  WRITE
  PUSH 3
  LOAD x
  ADD
  STORE y
  WRITE
  STOP
continued

51 241-437 Compilers: IC/10 51
Instruction   stack (bottom to top)   symbol table
PUSH 2        2                       (empty)
STORE x       2                       x 2
WRITE         (empty)                 x 2
PUSH 3        3                       x 2
continued

52 241-437 Compilers: IC/10 52
Instruction   stack (bottom to top)   symbol table
LOAD x        3 2                     x 2
ADD           5   (3 + 2)             x 2
STORE y       5                       x 2, y 5
WRITE         (empty)                 x 2, y 5
STOP

