Compiler Construction Code Generation Activation Records
LIR vs. assembly
                   LIR         Assembly
  #Registers       Unlimited   Limited
  Function calls   Implicit    Runtime stack
  Instruction set  Abstract    Concrete
Function calls
  Conceptually, a function call must:
  - Supply a new environment (frame) with temporary memory for local variables
  - Pass parameters to the new environment
  - Transfer flow of control (call/return)
  - Return information from the new environment (return value)
Activation records
  New environment = activation record (a.k.a. frame)
  Activation record = data of the current function / method call
  - User data: local variables, parameters, return values, register contents
  - Administration data: code addresses
Runtime stack
  - Stack of activation records
  - Call = push new activation record
  - Return = pop activation record
  - Only one "active" activation record: the top of the stack
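The push/pop discipline above can be sketched in a few lines of Python. This is only an illustrative model: the names ActivationRecord and RuntimeStack, the field names, and the return value 62 are assumptions made for the example, not part of the course material.

```python
from dataclasses import dataclass, field


@dataclass
class ActivationRecord:
    """Data of one function call (hypothetical, simplified layout)."""
    function: str
    params: dict
    local_vars: dict = field(default_factory=dict)
    return_value: object = None


class RuntimeStack:
    """A stack of activation records: call pushes, return pops."""

    def __init__(self):
        self.frames = []

    def call(self, function, **params):
        # Call = push a new activation record.
        frame = ActivationRecord(function, dict(params))
        self.frames.append(frame)
        return frame

    def ret(self, value=None):
        # Return = pop the active record and hand back its result.
        frame = self.frames.pop()
        frame.return_value = value
        return frame

    @property
    def active(self):
        # Only the top of the stack is the "active" record.
        return self.frames[-1] if self.frames else None


stack = RuntimeStack()
stack.call("main")
stack.call("bar", x=31)
active_during_call = stack.active.function
finished = stack.ret(value=62)  # 62 is an arbitrary example value
active_after_return = stack.active.function
```

After the return, main is the active record again, which is the "only one active activation record" property stated above.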
Runtime stack
  [Diagram: the stack grows downwards (towards smaller addresses). The previous frame lies above the current frame; FP marks the base of the current frame and SP its top.]
  - SP: stack pointer, top of current frame
  - FP: frame pointer, base of current frame; sometimes called BP (base pointer)
X86 runtime stack
  X86 stack and call/ret instructions
    Instruction       Usage
    push, pusha, …    Push on runtime stack
    pop, popa, …      Pop from runtime stack
    call              Transfer control to called routine
    ret               Transfer control back to caller
  X86 stack registers
    Register   Usage
    ESP        Stack pointer
    EBP        Base pointer
  pusha: push all general-purpose registers onto the stack.
  pushad: push all double-word (32-bit) registers onto the stack.
Call sequences
  The processor does not save the contents of registers on procedure calls. So who will?
  - Caller saves and restores registers: caller's responsibility
  - Callee saves and restores registers: callee's responsibility
  Calling conventions describe the interface of called code:
  - The order in which atomic (scalar) parameters, or individual parts of a complex parameter, are allocated
  - How parameters are passed (pushed on the stack, placed in registers, or a mix of both)
  - Which registers may be used by the callee without first being saved (i.e. pushed)
  - How the task of setting up for and restoring the stack after a function call is divided between the caller and the callee
Call sequences
  Control flow: caller -> call -> callee -> return -> caller
  1. Caller push code
     - Push caller-save registers
     - Push actual parameters (in reverse order)
  2. call
     - Push return address
     - Jump to call address
  3. Callee push code (prologue)
     - Push current base pointer
     - bp = sp
     - Push local variables
     - Push callee-save registers
  4. Callee pop code (epilogue)
     - Pop callee-save registers
     - Pop callee activation record
     - Pop old base pointer
  5. return
     - Pop return address
     - Jump to address
  6. Caller pop code
     - Pop parameters
     - Pop caller-save registers
  Stack layout during the call (higher addresses first):
    Reg 1 … Reg n          (caller-save registers)
    Param n … param 1
    Return address
    Previous fp            <- FP
    Local 1, Local 2, … Local n
                           <- SP
Call sequences: Foo(42,21)
  Caller push code
    push %ecx            (push caller-save registers)
    push $21             (push actual parameters, in reverse order)
    push $42
  call
    call _foo            (push return address, jump to call address)
  Callee push code (prologue)
    push %ebp            (push current base pointer)
    mov %esp, %ebp       (bp = sp)
    sub $8, %esp         (push local variables, i.e. reserve space for the callee's variables)
    push %ebx            (push callee-save registers)
  Callee pop code (epilogue)
    pop %ebx             (pop callee-save registers)
    mov %ebp, %esp       (pop callee activation record)
    pop %ebp             (pop old base pointer)
  return
    ret                  (pop return address, jump to address)
  Caller pop code
    add $8, %esp         (pop parameters)
    pop %ecx             (pop caller-save registers)
  Note: registers EAX, ECX, and EDX are caller-saved, and the rest are callee-saved. ECX is a general register.
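To see why this sequence leaves the caller's state intact, the steps can be traced on a toy machine. The sketch below is a simplified Python model, not real x86: the Machine class, the stack top 0x1000, the return address 0xCAFE, and the body of foo (which adds its two parameters and deliberately clobbers %ebx and %ecx) are all assumptions made for illustration.

```python
class Machine:
    """Toy model: word-addressed memory plus a few 32-bit registers."""

    def __init__(self, stack_top=0x1000):
        self.mem = {}
        self.reg = {"esp": stack_top, "ebp": stack_top,
                    "eax": 0, "ebx": 0, "ecx": 0}

    def push(self, value):
        self.reg["esp"] -= 4          # stack grows towards smaller addresses
        self.mem[self.reg["esp"]] = value

    def pop(self):
        value = self.mem[self.reg["esp"]]
        self.reg["esp"] += 4
        return value


RETURN_ADDRESS = 0xCAFE  # hypothetical address of the instruction after the call


def foo(m):
    # Prologue: push %ebp; mov %esp,%ebp; sub $8,%esp; push %ebx
    m.push(m.reg["ebp"])
    m.reg["ebp"] = m.reg["esp"]
    m.reg["esp"] -= 8
    m.push(m.reg["ebx"])
    # Hypothetical body: return first + second parameter, clobbering registers.
    ebp = m.reg["ebp"]
    m.reg["ebx"] = m.mem[ebp + 8]               # first parameter
    m.reg["eax"] = m.reg["ebx"] + m.mem[ebp + 12]  # plus second parameter
    m.reg["ecx"] = 0                            # caller-save, free to overwrite
    # Epilogue: pop %ebx; mov %ebp,%esp; pop %ebp; ret
    m.reg["ebx"] = m.pop()
    m.reg["esp"] = m.reg["ebp"]
    m.reg["ebp"] = m.pop()
    return m.pop()                              # ret pops the return address


def call_foo(m):
    m.push(m.reg["ecx"])      # push caller-save register
    m.push(21)                # push parameters in reverse order
    m.push(42)
    m.push(RETURN_ADDRESS)    # call: push return address, then jump
    returned_to = foo(m)
    m.reg["esp"] += 8         # add $8,%esp pops the two parameters
    m.reg["ecx"] = m.pop()    # restore caller-save register
    return returned_to


m = Machine()
m.reg["ecx"], m.reg["ebx"] = 7, 9
esp_before, ebp_before = m.reg["esp"], m.reg["ebp"]
returned_to = call_foo(m)
```

After the call, %esp, %ebp, %ebx and %ecx hold their original values and %eax carries the result, even though the callee overwrote both %ebx and %ecx.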
"To Callee-save or to Caller-save?"
  Callee-saved registers need only be saved when the callee modifies their value
  Some conventions exist (cdecl):
  - %eax, %ecx, %edx: caller save
  - %ebx, %esi, %edi: callee save
  - %esp: stack pointer
  - %ebp: frame pointer
  - Use %eax for the return value
  Note: cdecl (which stands for C declaration) is a calling convention that originates from the C programming language and is used by many C compilers for the x86 architecture.[1] In cdecl, subroutine arguments are passed on the stack. Integer values and memory addresses are returned in the EAX register; floating-point values are returned in the ST0 x87 register. Registers EAX, ECX, and EDX are caller-saved, and the rest are callee-saved.
Accessing stack variables
  - Use offset from EBP
  - Stack grows downwards
  - Above EBP = parameters
  - Below EBP = locals
  Examples:
    %ebp + 4 = return address
    %ebp + 8 = first parameter
    %ebp - 4 = first local
  Stack layout (higher addresses first):
    Param n
    …
    param 1           <- FP+8
    Return address
    Previous fp       <- FP
    Local 1           <- FP-4
    Local 2
    …
    Local n-1
    Local n           <- SP
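The offsets above follow a simple pattern, sketched here in Python. The helper names are hypothetical, and the sketch assumes every parameter and local occupies one 4-byte word and that locals start directly below the saved frame pointer.

```python
WORD = 4  # assumed size of every stack slot, in bytes


def param_offset(i):
    """Offset from %ebp of the i-th parameter (1-based).

    Slot 0 holds the previous frame pointer and slot +4 the return
    address, so the first parameter sits at +8.
    """
    return 4 + WORD * i


def local_offset(i):
    """Offset from %ebp of the i-th local (1-based), below the frame base."""
    return -WORD * i


def operand(offset):
    """Render an offset as an AT&T-syntax base displacement operand."""
    return f"{offset}(%ebp)"


first_param = operand(param_offset(1))
second_param = operand(param_offset(2))
first_local = operand(local_offset(1))
```

With these assumptions the first parameter is 8(%ebp), the second is 12(%ebp), and the first local is -4(%ebp), matching the examples on the slide.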
main calling method bar
    int bar(int x) {
      int y;
      …
    }
    static void main(string[] args) {
      int z;
      Foo a = new Foo();
      z = a.bar(31);
    }
main calling method bar
    int bar(Foo this, int x) {        (this is an implicit parameter)
      int y;
      …
    }
    static void main(string[] args) {
      int z;
      Foo a = new Foo();
      z = a.bar(a,31);                (a is an implicit argument)
    }
  Stack layout (higher addresses first):
    …                 main's frame
    31                <- EBP+12
    this ≈ a          <- EBP+8
    Return address
    Previous fp       <- EBP          (bar's frame starts here)
    y                 <- ESP, EBP-4
  Examples:
    %ebp + 4 = return address
    %ebp + 8 = first parameter (always this in virtual function calls)
    (%ebp) = old %ebp (pushed by callee)
    %ebp - 4 = first local
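Python happens to make the implicit receiver visible, so it can illustrate the rewriting shown above: a method call on an object is the same as calling the function with the object passed as an explicit first argument. The body of bar below (doubling x) is a made-up placeholder, since the slide elides it.

```python
class Foo:
    def bar(self, x):
        # `self` plays the role of the implicit `this` parameter.
        y = x * 2  # hypothetical body; the slide does not show one
        return y


a = Foo()
z_method_call = a.bar(31)           # receiver passed implicitly
z_explicit_call = Foo.bar(a, 31)    # receiver passed as the first argument
```

Both calls run the same code with the same two arguments, which is why the compiler can treat this as an ordinary first parameter at %ebp + 8.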
x86 assembly
  - Two syntaxes: AT&T syntax and Intel syntax
  - We'll be using AT&T syntax
  - Works with the GNU Assembler (GAS)
  GAS instructions generally have the form: mnemonic source, destination.
  For instance, the following mov instruction:
    movb $0x05, %al
  will move the value 5 into the register al.
IA-32
  - Eight 32-bit general-purpose registers
    - EAX, EBX, ECX, EDX, ESI, EDI
    - EBP: stack frame (base) pointer
    - ESP: stack pointer
  - EFLAGS register: info on results of arithmetic operations
    - S-Sign, Z-Zero, C-Carry, P-Parity, O-Overflow
  - EIP (instruction pointer) register
  - Machine instructions: add, sub, inc, dec, neg, mul, …
    - Generally suffixed with the letters "b", "s", "w", "l", "q" or "t" to determine what size operand is being manipulated
Immediate and register operands
  Immediate
  - Value specified in the instruction itself
  - Preceded by $
  - Example: add $4,%esp
  Register
  - Register name is used
  - Preceded by %
  - Example: mov %esp,%ebp
Memory and base displacement operands
  Memory operands
  - Obtain value at given address
  - Example: mov (%eax), %eax
  Base displacement
  - Obtain value at computed address
  - Syntax: disp(base,index,scale)
  - offset = base + (index * scale) + displacement
  - Example: movl $42, 2(%eax)           address: %eax + 2
  - Example: movl $42, (%eax,%ecx,4)     address: %eax + %ecx*4
  Addressing modes in general:
    Addressing mode      Example               Meaning
    Register             Add R4,R3             R4 <- R4+R3
    Immediate            Add R4,#3             R4 <- R4+3
    Displacement         Add R4,100(R1)        R4 <- R4+Mem[100+R1]
    Register indirect    Add R4,(R1)           R4 <- R4+Mem[R1]
    Indexed / Base       Add R3,(R1+R2)        R3 <- R3+Mem[R1+R2]
    Direct or absolute   Add R1,(1001)         R1 <- R1+Mem[1001]
    Memory indirect      Add R1,@(R3)          R1 <- R1+Mem[Mem[R3]]
    Auto-increment       Add R1,(R2)+          R1 <- R1+Mem[R2]; R2 <- R2+d
    Auto-decrement       Add R1,-(R2)          R2 <- R2-d; R1 <- R1+Mem[R2]
    Scaled               Add R1,100(R2)[R3]    R1 <- R1+Mem[100+R2+R3*d]
Accessing Variables: Illustration
  - Use offset from base pointer
  - Above BP = parameters
  - Below BP = locals (+ LIR registers)
  Examples:
    %eax = %ebp + 8
    (%eax) = the value 572
    8(%ebp) = the value 572
  Stack layout (higher addresses first):
    param n
    …
    572               <- %eax, BP+8
    Return address
    Previous bp       <- BP
    local 1           <- BP-4
    …
    local n           <- SP
Base displacement addressing
  [Figure: an array of 4-byte cells (values shown: 7 2 4 5 6 7 1); %ecx holds the array base reference and (%ecx,%ebx,4) selects one cell.]
  mov (%ecx,%ebx,4), %eax
    %ecx = base
    %ebx = 3
    offset = base + (index * scale) + displacement
    offset = %ecx + (3*4) + 0 = %ecx + 12
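The address computation for disp(base,index,scale) operands can be checked with a small Python sketch. The function name effective_address, the base address 0x2000, and the array contents are assumptions chosen for the example; only the formula comes from the slides.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Address denoted by the AT&T operand disp(base,index,scale)."""
    return base + index * scale + disp


# Hypothetical array of 4-byte integers starting at address 0x2000.
ARRAY_BASE = 0x2000
values = [10, 20, 30, 40, 50]
memory = {ARRAY_BASE + 4 * i: v for i, v in enumerate(values)}

# mov (%ecx,%ebx,4), %eax with %ecx = base and %ebx = 3
ecx, ebx = ARRAY_BASE, 3
address = effective_address(base=ecx, index=ebx, scale=4)
eax = memory[address]

# movl $42, 2(%eax) style operand: displacement only, no index
displaced = effective_address(base=0x100, disp=2)
```

With %ebx = 3 and a scale of 4 the operand refers to base + 12, which is the fourth element of the array in this model.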
Instruction examples
  Translate a=p+q into:
    mov 8(%ebp),%ecx       (load p)
    add 12(%ebp),%ecx      (arithmetic p + q)
    mov %ecx,-4(%ebp)      (store a)
Instruction examples
  Array access: a[i]=1
    mov -4(%ebp),%ebx        (load a)
    mov -8(%ebp),%ecx        (load i)
    movl $1,(%ebx,%ecx,4)    (store into the heap)
  Jumps:
    Unconditional:
      jmp label2
    Conditional:
      cmp $0, %ecx
      jne cmpFailLabel
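The array store and the conditional jump can be traced on a toy memory model in Python. This is a sketch under stated assumptions: the frame base 0x1000, the heap address 0x8000, the index value 2, and the label name "fallthrough" are invented for the example; cmpFailLabel is the label from the slide.

```python
WORD = 4


def run_array_store(mem, ebp):
    """Model of the three instructions that implement a[i] = 1."""
    ebx = mem[ebp - 4]            # mov -4(%ebp),%ebx   load a (array address)
    ecx = mem[ebp - 8]            # mov -8(%ebp),%ecx   load i
    mem[ebx + ecx * WORD] = 1     # movl $1,(%ebx,%ecx,4)
    return ebx, ecx


def next_label(ecx, target="cmpFailLabel", fallthrough="fallthrough"):
    """Model of `cmp $0,%ecx` followed by `jne target`.

    jne jumps when the compared values differ, i.e. when %ecx != 0.
    """
    return target if ecx != 0 else fallthrough


EBP = 0x1000          # hypothetical frame base
HEAP_ARRAY = 0x8000   # hypothetical heap address of the array a
mem = {EBP - 4: HEAP_ARRAY, EBP - 8: 2}   # locals: a, then i = 2

ebx, ecx = run_array_store(mem, EBP)
taken = next_label(ecx)
not_taken = next_label(0)
```

The store lands at the array address plus 2 * 4 bytes, and the jump is taken only when %ecx is non-zero.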