Separate Assembly

allows a program to be built from modules rather than a single source file

                     assembler                        linker
source file 1     ------------->  object module 1   --\
(assembly lang.)                  (machine code)       \
source file 2     ------------->  object module 2   -----> executable file
(assembly lang.)                  (machine code)       /   (machine code)
                                  prewritten library --/   or "load module"
                                  (machine code)           or "binary" / "bin"

Advantages of separate assembly

* separate source files provide a nice structure for dividing a project between several team members
* separate source files allow separate testing (and thus better isolation and detection of bugs)
* separate source files provide for easy reuse

Advantages of separate assembly

* separate source files minimize the work of reassembly and relinking needed whenever a change occurs in only one source file
* prewritten libraries provide machine extensions (higher-level functions like square root, if not available as a machine instruction)

Program testing

* bottom-up development: write and test the lowest-level (leaf) subroutines first, then write the modules or program that use them; requires the extra effort of writing test drivers
* top-down development: write the highest-level modules/programs first and test the logic; requires the extra effort of writing "stub" routines that are simply placeholders for lower-level subroutines that will be developed later; these stubs can return fixed values if necessary and are often useful just to print that the call occurred (and optionally print what parameters were passed)

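The two kinds of scaffolding described above can be sketched as follows. This is only an illustration, written in Python rather than assembly so that it runs anywhere; every name in it (`leaf_square`, `test_driver`, `sqrt_stub`, `hypotenuse`) is hypothetical and does not come from the course examples.

```python
import math

# --- bottom-up: a finished leaf routine plus a throwaway test driver ---

def leaf_square(n):
    """Lowest-level (leaf) routine: calls nothing else, so it is tested first."""
    return n * n

def test_driver():
    """Test driver: calls the leaf routine with known inputs, counts mismatches."""
    cases = [(0, 0), (3, 9), (-4, 16)]      # (argument, expected result)
    failures = 0
    for arg, expected in cases:
        actual = leaf_square(arg)
        if actual != expected:
            failures += 1
            print(f"FAIL leaf_square({arg}) = {actual}, expected {expected}")
    return failures

# --- top-down: high-level logic tested against a stub ---

def sqrt_stub(value):
    """Stub: placeholder for a square-root routine that is not written yet.
    It reports the call and its parameter, then returns a fixed value."""
    print(f"sqrt_stub called with value={value}")
    return 5

def hypotenuse(a, b, sqrt=sqrt_stub):
    """High-level routine under test; the lower-level routine is passed in
    so the stub can later be replaced by the real implementation."""
    return sqrt(leaf_square(a) + leaf_square(b))

if __name__ == "__main__":
    print("driver failures:", test_driver())
    print("with stub:", hypotenuse(3, 4))
    print("with real routine:", hypotenuse(3, 4, sqrt=math.sqrt))
```

The stub lets the logic of `hypotenuse` be exercised before the real square-root routine exists; the driver is discarded once `leaf_square` is trusted.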
Program testing

* example programs linked from web pages illustrate subroutines exercised by test drivers
* individual testing of modules like this is often not done (e.g., we are lazy or we foolishly think we will save time by skipping testing) or testing is done poorly (e.g., we neglect important test cases)

Program testing

* "write all the code, bolt it all together, and hope for the best"? - bad idea
* it is easier to detect errors and debug when we test modules individually
* when we then combine previously tested and debugged modules, the remaining errors will probably stem from misunderstandings about the interface specifications (i.e., the number, ordering, and data types of the actual parameters in the subroutine calls)

Program testing

* programmers often mix the two approaches
* top-down typically offers the ability to quickly and easily prototype a program and obtain feedback from the end user
* see this essay by Paul Graham on "Programming Bottom-Up", in which he argues that bottom-up programs are usually smaller and easier to read: http://www.paulgraham.com/progbot.html

Linking objects

* compiler or assembler produces an object file (.o) -- using the -c flag for armc
* linker (unfortunately named ld, since ln was already used as a file system command) yields an executable file (default name is a.out)

Linking objects

in file p1.s

        .global main
        .global x
main:
        push  {lr}
prt_addr:
        ldr   r1, =x            @ r1 = address of x
        ldr   r0, =fmt1
        bl    printf
prt_value:
        ldr   r0, =x
        ldr   r1, [r0]          @ r1 = value stored at x
        ldr   r0, =fmt2
        bl    printf            @ print the value of x
return:
        pop   {pc}

        .section ".rodata"
fmt1:   .asciz "the address of x is %p\n"
fmt2:   .asciz "the value of x is %d\n"

Linking objects

in file p2.s

        .global x
        .section ".data"
x:      .word 55
y:      .word 66

assembling and linking p1.s by itself fails, since x is defined only in p2.s:

[03:08:42] rlowe@joey7:~/ [84] armc p1.s
/tmp/ccSiPZZk.o: In function `return':
(.text+0x24): undefined reference to `x'
collect2: ld returned 1 exit status

Linking objects

[03:13:18] rlowe@joey7: [89] armc -c p1.s
[03:13:28] rlowe@joey7: [90] nm p1.o
00000000 t $a
00000024 t $d
00000000 r fmt1
00000018 r fmt2
00000000 T main
         U printf
00000004 t prt_addr
00000010 t prt_value
00000020 t return
         U x

nm - prints symbols in object file or executable file

* r = read only data (addresses are relative to the start of the read-only data section)
* T = text (addresses are relative to the start of the text section)
* t = text (lower case type code => private, upper case type code => global)
* U = undefined

Linking objects

[03:25:39] rlowe@joey7 [95] armc -c p2.s
[03:25:47] rlowe@joey7 [96] nm p2.o
00000000 D x
00000004 d y

* D = data (lower case type code => private, upper case type code => global)

Linking objects

[03:52:49] [107] arm-linux-gnueabi-gcc p1.o p2.o     // could use ld p1.o p2.o ...
nm a.out
00008924 t $a
00008e90 t $a
00008924 T main
..
         U printf@@GLIBC_2.4
00008928 t prt_addr
00008934 t prt_value
00008944 t return
00069078 D x
0006907c d y
[04:06:40] rlowe@joey7:[120] ./a.out
the address of x is 0x69078
the value of x is 55

* printf is still undefined in a.out since the default is dynamic linking to a shared object for printf

Linking objects

* the linker resolves external references between .o (simple object files), .a (libraries/archives), and .so (shared objects)
* the linker also performs storage management to assign regions within the executable file to each program section
* the linker also performs relocation, that is, it fixes addresses within the program sections relative to each other

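The three linker jobs listed above (storage management, resolving external references, relocation) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how ld is implemented: each module is a single list of 32-bit words, the module layout and the base address 0x8000 are made up, and the dictionaries `p1` and `p2` only loosely mirror p1.o and p2.o.

```python
def link(modules, base=0x8000):
    """Toy static linker: lay out modules, resolve symbols, patch references."""
    # Pass 1 (storage management): place the modules one after another and
    # record the absolute address of every global symbol they define.
    symtab = {}
    addr = base
    for mod in modules:
        for sym, offset in mod["defs"].items():
            if sym in symtab:
                raise ValueError(f"multiple definition of `{sym}'")
            symtab[sym] = addr + offset
        addr += 4 * len(mod["words"])       # each word occupies 4 bytes

    # Pass 2 (resolution + relocation): overwrite each word that refers to
    # an external symbol with that symbol's final address.
    image = []
    for mod in modules:
        words = list(mod["words"])
        for index, sym in mod["refs"]:
            if sym not in symtab:
                raise NameError(f"undefined reference to `{sym}'")
            words[index] = symtab[sym]
        image.extend(words)
    return symtab, image

# Hypothetical stand-ins for p1.o and p2.o: p1 defines main and needs the
# address of x in its word 9 (byte offset 0x24); p2 defines x globally.
# y is private to p2, so it does not appear in "defs".
p1 = {"words": [0] * 10, "defs": {"main": 0}, "refs": [(9, "x")]}
p2 = {"words": [55, 66], "defs": {"x": 0}, "refs": []}

if __name__ == "__main__":
    symtab, image = link([p1, p2])
    for name, address in symtab.items():
        print(f"{address:08x} {name}")
    print(f"word 9 now holds {image[9]:#x}")
```

Calling `link([p1])` without `p2` raises the toy equivalent of the "undefined reference to `x'" error from the earlier armc session.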
Linking objects

* simple object files and libraries/archives are combined using static linking to make a self-contained executable (i.e., all parts needed for execution are contained in the executable)
* however, for common library routines such as printf, static linking wastes disk space, since every executable that uses the routine has to store its own copy

Linking objects

* shared objects use dynamic linking (at run time) to save
  * disk space (since the program doesn't need to keep a copy of the shared object inside its executable file)
  * memory space (since many programs can share a single memory-resident copy of the shared object)
* e.g., using static linking (arm-linux-gnueabi-gcc --static) on p1.s and p2.s resulted in an executable file size of 589770 bytes, which was reduced to 8629 bytes when dynamic linking was used (arm-linux-gnueabi-gcc)

Linking objects

* (dynamic link libraries (DLLs) in Windows systems are similar to shared objects; however, some early Windows systems would link DLL files into a complete executable memory image at load time rather than at run time)

Linking objects

dynamic linking

* originally all programs were linked statically
  * all external references fully resolved
  * each program complete
* since the late 1980s most systems have supported shared libraries and dynamic linking
  * for common library packages, only keep a single copy in memory, shared by all processes
  * don't know where the library is loaded until run time; must resolve references dynamically, when the program runs

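Run-time resolution can be observed from a scripting language. The sketch below uses Python's standard ctypes module to look up the C library routine abs by name in a shared object while the program is running. It assumes a Unix-like system; the helper `run_time_address` is a hypothetical name, and the address it reports will generally differ between runs and machines.

```python
import ctypes
import ctypes.util

# Locate the C library's shared object (for example libc.so.6 on Linux).
# If it cannot be located, find_library returns None, and CDLL(None) then
# searches the symbols already loaded into this process (Unix-like systems).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# The name "abs" is resolved to an address only now, at run time.
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

def run_time_address(func):
    """Return the address at which the shared routine was found."""
    return ctypes.cast(func, ctypes.c_void_p).value

if __name__ == "__main__":
    print("shared object:", libc_path)
    print(f"abs resolved at {run_time_address(c_abs):#x}")
    print("abs(-55) =", c_abs(-55))
```

Nothing about the location of abs is stored in the script itself, which is the property that lets many programs share one memory-resident copy of the library.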
Linking objects

static linking advantages

* executable is self-contained
* no run-time overhead

dynamic linking advantages

* reduced disk space for the executable
* only one copy of a shared routine needs to be in memory, thus reduced memory space across several concurrently executing programs
* programs will get the latest version of the shared object

Linking example

        .global main, sub1, y
main:
        push  {lr}
        ldr   r0, =x            @ r0 = address of x, passed to sub1
        bl    sub1
        ldr   r2, =y
        ldr   r1, [r2]          @ r1 = value stored at y
        ldr   r3, =x
        ldr   r2, [r3]          @ r2 = value stored at x
        ldr   r0, =fmt
        bl    printf
        mov   r0, #0
        pop   {pc}

        .section ".data"
x:      .word 1
y:      .word 5

        .section ".rodata"
fmt:    .asciz "\nx = %d y = %d\n\n"

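Linking example

/** sub1 File: b.s **/
        .global sub1            @ make sub1 visible to the caller in the other file
sub1:
        push  {lr}
        add   r0, r0, #4        @ advance the address in r0 by one word
        bl    sub2
        pop   {pc}

/** sub2 File: c.s ***/
        .global sub2, y
sub2:
        push  {lr}
        ldr   r1, =y
        ldr   r2, [r1]          @ r2 = value stored at y
        add   r1, r2, #1
        str   r1, [r0]          @ store y + 1 at the address passed in r0
        pop   {pc}              @ return to caller
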
Loading

* running a program == actual _loading_ into memory and then _branching_ to the entry point address
* the loader usually performs address relocation as words containing absolute addresses are loaded into memory
* relocation is required in both the linker and the loader so that the program will run correctly (with the correct addresses)

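Load-time relocation can be modeled in a few lines. The sketch below is a simplification with made-up values, not the format of any real executable file: the image is a list of 32-bit words, `reloc_indexes` lists the words that hold absolute addresses, and the entry point is given as a byte offset from the start of the image.

```python
def load(image, reloc_indexes, link_base, load_base, entry_offset=0):
    """Toy loader: copy the image into 'memory', adjust every word that
    holds an absolute address, and return the entry point to branch to."""
    delta = load_base - link_base           # how far the program has moved
    memory = list(image)
    for i in reloc_indexes:
        memory[i] = (memory[i] + delta) & 0xFFFFFFFF   # keep to 32 bits
    entry = load_base + entry_offset
    return memory, entry

# Hypothetical executable: linked as if it started at 0x8000. Word 9 holds
# the absolute address 0x8028 (a data word 0x28 bytes into the image);
# the last two words are the data values 55 and 66.
image = [0] * 9 + [0x8028, 55, 66]

if __name__ == "__main__":
    memory, entry = load(image, reloc_indexes=[9],
                         link_base=0x8000, load_base=0x20000)
    print(f"entry point: {entry:#x}")
    print(f"relocated address word: {memory[9]:#x}")
    print("data words:", memory[10:])
```

Only the word listed in `reloc_indexes` changes; the data values 55 and 66 are copied as-is, which is why the loader needs to be told which words contain absolute addresses.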
Summary

where each kind of address is bound:

                      Assembler                  Linker                     Loading

PC-relative offset    * caller and subroutine    * caller and subroutine    -- (already bound)
                        in same source file        in different files

absolute address      * definition and use       * definition and use       may need relocation
                        in same source file        in different files