Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Systems Architecture Lecture 4: Compilers, Assemblers, Linkers & Loaders Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some material drawn from CMU CSAPP Slides: Kesden and Puschel Lec 4 Systems Architecture
September 4, 1997 Introduction Objective: To introduce the role of compilers, assemblers, linkers and loaders. To see what is underneath a C program: assembly language, machine language, and executable. Lec 4 Systems Architecture
September 4, 1997 Compilation Process Lec 4 Systems Architecture
September 4, 1997 Compilation Process Lec 4 Systems Architecture
Below Your Program Example from a Unix system September 4, 1997 Below Your Program Example from a Unix system Source Files: count.c and main.c Corresponding assembly code: count.s and main.s Corresponding machine code (object code): count.o and main.o Library functions: libc.a Executable file: a.out format for a.out and object code: ELF (Executable and Linking Format) Lec 4 Systems Architecture
Producing an Executable Program September 4, 1997 Producing an Executable Program Example from a Unix system (SGI Challenge running IRIX 6.5) Compiler: count.c and main.c count.s and main.s gcc -S count.c main.c Assembler: count.s and main.s count.o and main.o gcc -c count.s main.s as count.s -o count.o Linker/Loader: count.o main.o libc.a a.out gcc main.o count.o ld main.o count.o -lc (additional libraries are required) Lec 4 Systems Architecture
Source Files void main() { int n,s; printf("Enter upper limit: "); September 4, 1997 Source Files void main() { int n,s; printf("Enter upper limit: "); scanf("%d",&n); s = count(n); printf("Sum of i from 1 to %d = %d\n",n,s); } int count(int n) { int i,s; s = 0; for (i=1;i<=n;i++) s = s + i; return s; } Lec 4 Systems Architecture
Assembly Code for MIPS (count.s) September 4, 1997 Assembly Code for MIPS (count.s) #.file 1 "count.c" .option pic2 .section .text .text .align 2 .globl count .ent count count: .LFB1: .frame $fp,48,$31 # vars= 16, regs= 2/0, args= 0, extra= 1 6 .mask 0x50000000,-8 .fmask 0x00000000,0 subu $sp,$sp,48 .LCFI0: sd $fp,40($sp) Lec 4 Systems Architecture
.LCFI1: lw $3,20($fp) sd $28,32($sp) .LCFI2: move $fp,$sp .LCFI3: September 4, 1997 .LCFI1: sd $28,32($sp) .LCFI2: move $fp,$sp .LCFI3: .set noat lui $1,%hi(%neg(%gp_rel(count))) addiu $1,$1,%lo(%neg(%gp_rel(count))) daddu $gp,$1,$25 .set at sw $4,16($fp) sw $0,24($fp) li $2,1 # 0x1 sw $2,20($fp) .L3: lw $2,20($fp) lw $3,16($fp) slt $2,$3,$2 beq $2,$0,.L6 b .L4 L6: lw $2,24($fp) lw $3,20($fp) addu $2,$2,$3 sw $2,24($fp) .L5: lw $2,20($fp) addu $3,$2,1 sw $3,20($fp) b .L3 .L4: lw $3,24($fp) move $2,$3 b .L2 .L2: move $sp,$fp ld $fp,40($sp) ld $28,32($sp) addu $sp,$sp,48 j $31 .LFE1: .end count Lec 4 Systems Architecture
Executable Program for MIPS (a.out) September 4, 1997 Executable Program for MIPS (a.out) 0000000 7f45 4c46 0102 0100 0000 0000 0000 0000 0000020 0002 0008 0000 0001 1000 1060 0000 0034 0000040 0000 6c94 2000 0024 0034 0020 0007 0028 0000060 0023 0022 0000 0006 0000 0034 1000 0034 0000100 1000 0034 0000 00e0 0000 00e0 0000 0004 0000120 0000 0004 0000 0003 0000 0114 1000 0114 0000140 1000 0114 0000 0015 0000 0015 0000 0004 0000160 0000 0001 7000 0002 0000 0130 1000 0130 0000200 1000 0130 0000 0080 0000 0080 0000 0004 0000220 0000 0008 7000 0000 0000 01b0 1000 01b0 0000240 1000 01b0 0000 0018 0000 0018 0000 0004 0000260 0000 0004 0000 0002 0000 01c8 1000 01c8 0000300 1000 01c8 0000 0108 0000 0108 0000 0004 0000320 0000 0004 0000 0001 0000 0000 1000 0000 0000340 1000 0000 0000 3000 0000 3000 0000 0005 • • • • • • • • • • • • • • • • • • • • • Lec 4 Systems Architecture
Assembly Characteristics: Data Types “Integer” data of 1, 2, or 4 bytes Data values Addresses (untyped pointers) Floating point data of 4, 8, or 10 bytes No aggregate types such as arrays or structures Just contiguously allocated bytes in memory Lec 4 Systems Architecture
Assembly Characteristics: Operations Perform arithmetic function on register or memory data Transfer data between memory and register Load data from memory into register Store register data into memory Transfer control Unconditional jumps to/from procedures Conditional branches Lec 4 Systems Architecture
Object Code Code for sum Assembler Linker 0x401040 <sum>: 0x55 Translates .s into .o Binary encoding of each instruction Nearly-complete image of executable code Missing linkages between code in different files Linker Resolves references between files Combines with static run-time libraries E.g., code for malloc, printf Some libraries are dynamically linked Linking occurs when program begins execution 0x401040 <sum>: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x08 0xec 0x5d 0xc3 Total of 13 bytes Each instruction 1, 2, or 3 bytes Starts at address 0x401040 Lec 4 Systems Architecture
Disassembling Object Code Disassembled 00401040 <_sum>: 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 8b 45 0c mov 0xc(%ebp),%eax 6: 03 45 08 add 0x8(%ebp),%eax 9: 89 ec mov %ebp,%esp b: 5d pop %ebp c: c3 ret d: 8d 76 00 lea 0x0(%esi),%esi Disassembler objdump -d p Useful tool for examining object code Analyzes bit pattern of series of instructions Produces approximate rendition of assembly code Can be run on either a.out (complete executable) or .o file Lec 4 Systems Architecture
Alternate Disassembly Disassembled Object 0x401040 <sum>: push %ebp 0x401041 <sum+1>: mov %esp,%ebp 0x401043 <sum+3>: mov 0xc(%ebp),%eax 0x401046 <sum+6>: add 0x8(%ebp),%eax 0x401049 <sum+9>: mov %ebp,%esp 0x40104b <sum+11>: pop %ebp 0x40104c <sum+12>: ret 0x40104d <sum+13>: lea 0x0(%esi),%esi 0x401040: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x08 0xec 0x5d 0xc3 Within gdb Debugger gdb p disassemble sum Disassemble procedure x/13b sum Examine the 13 bytes starting at sum Lec 4 Systems Architecture
What Can be Disassembled? % objdump -d WINWORD.EXE WINWORD.EXE: file format pei-i386 No symbols in "WINWORD.EXE". Disassembly of section .text: 30001000 <.text>: 30001000: 55 push %ebp 30001001: 8b ec mov %esp,%ebp 30001003: 6a ff push $0xffffffff 30001005: 68 90 10 00 30 push $0x30001090 3000100a: 68 91 dc 4c 30 push $0x304cdc91 Anything that can be interpreted as executable code Disassembler examines bytes and reconstructs assembly source Lec 4 Systems Architecture