Computer System Chapter 7. Linking Lynn Choi Korea University.


1 Computer System Chapter 7. Linking Lynn Choi Korea University

2 A Simple Program Translation
Figure: m.c (ASCII source file) -> Translator -> p (binary executable object file, a memory image on disk)
Problems:
  Efficiency: a small change requires complete recompilation
  Modularity: hard to share common functions (e.g. printf)
Solution:
  Separate compilation: use a linker

3 A Better Scheme Using a Linker
Figure: m.c -> Translators -> m.o and a.c -> Translators -> a.o (separately compiled relocatable object files); m.o and a.o -> Linker (ld) -> p (executable object file; contains code and data for all functions defined in m.c and a.c)

4 Compiler Driver
The compiler driver coordinates all steps in the translation and linking process.
  Typically included with each compilation system (e.g., gcc)
  Invokes the preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld)
  Passes command line arguments to the appropriate phases
Example: create executable p from m.c and a.c:
  bass> gcc -O2 -v -o p m.c a.c
  cpp [args] m.c /tmp/cca07630.i
  cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s
  as [args] -o /tmp/cca076301.o /tmp/cca07630.s
  (the same steps run for a.c, producing /tmp/cca076302.o)
  ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o
  bass>

5 Example Program 1
/* main.c */
    void swap();
    int buf[2] = {1, 2};
    int main()
    {
        swap();
        return 0;
    }
/* swap.c */
    extern int buf[];
    int *bufp0 = &buf[0];
    int *bufp1;
    void swap()
    {
        int temp;
        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }
Unix> gcc -O2 -g -o p main.c swap.c
This runs the following compilation steps:
  cpp [other arguments] main.c /tmp/main.i
  cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s
  as [other arguments] -o /tmp/main.o /tmp/main.s
  (The driver goes through the same process to generate swap.o.)
  ld -o p [system object files and args] /tmp/main.o /tmp/swap.o

6 Linker
Linking
  The process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.
Linking time
  Can be done at compile time, i.e. when the source code is translated
  Or at load time, i.e. when the program is loaded into memory
  Or even at run time
Static linker
  Performs the linking at compile time.
  Takes a collection of relocatable object files and command line arguments and generates a fully linked executable object file that can be loaded and run.
  Performs two main tasks
    Symbol resolution: associate each symbol reference with exactly one symbol definition
    Relocation: relocate code and data sections and modify symbol references to point to the relocated memory locations
Dynamic linker
  Performs the linking at load time or at run time.
  Will be discussed later.

7 Object Files Relocatable object file Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file Compilers and assemblers generate relocatable object files Executable object file Contains binary code and data in a form that can be copied directly into memory and executed Linkers generate executable object files Shared object file A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or at run time

8 What Does a Linker Do?
Merges object files
  Merges multiple relocatable (.o) object files into a single executable object file that can be loaded and executed by the loader.
Resolves external references
  As part of the merging process, resolves external references.
  External reference: a reference to a symbol defined in another object file.
Relocates symbols
  Relocates symbols from their relative locations in the .o files to new absolute positions in the executable.
  Updates all references to these symbols to reflect their new positions.
  References can be in either code or data
    code: a();          /* reference to symbol a */
    data: int *xp=&x;   /* reference to symbol x */

9 Why Linkers?
Modularity
  A program can be written as a collection of smaller source files, rather than one monolithic mass.
  Can build libraries of common functions (more on this later), e.g., math library, standard C library
Efficiency
  Time: change one source file, compile, and then relink. No need to recompile other source files.
  Space: libraries of common functions can be aggregated into a single file, yet executable files and running memory images contain only code for the functions they actually use.

10 Object File Format
Object file format varies from system to system
  Examples: a.out (early UNIX systems), COFF (Common Object File Format, used by early versions of System V), PE (Portable Executable, a variation of COFF used by Microsoft Windows), ELF (used by modern UNIX systems including Linux)
ELF (Executable and Linkable Format)
  Standard binary format for object files
  Derives from AT&T System V Unix; later adopted by BSD Unix variants and Linux
  One unified format for relocatable object files (.o), executable object files, and shared object files (.so)
  Generic name: ELF binaries
  Better support for shared libraries than the old a.out formats

11 ELF Object File Format
ELF header
  Word size, byte ordering of the system, the machine type (e.g. IA32)
  The size of the ELF header, the object file type (e.g. relocatable, executable, or shared), the file offset of the section header table, and the size and number of entries in the section header table
Section header table
  Contains the locations and sizes of the various sections
  A fixed-size entry for each section
Segment header table
  Page size, virtual and physical addresses of memory segments (sections), segment sizes
.text section: machine code
.data section: initialized global variables
  Local C variables are maintained at run time on the stack and do not appear in the .data or .bss sections
.bss section: uninitialized global variables
  "Block Storage Start" or "Better Save Space!"
  Just a placeholder that occupies no space in the object file
File layout, from offset 0: ELF header, segment header table (required for executables), .text, .data, .bss, .symtab, .rel.text, .rel.data, .debug, section header table (required for relocatables)

12 ELF Object File Format (cont)
.symtab section
  Symbol table
  Information about functions and global variables
.rel.text section
  Relocation info for the .text section
  Addresses of instructions that will need to be modified in the executable
.rel.data section
  Relocation info for the .data section
  Addresses of pointer data that will need to be modified in the merged executable
.debug section
  Info for symbolic debugging (gcc -g)
.line section
  Mapping between line numbers in the source code and machine instructions in the .text section
.strtab section
  String table for symbols in the .symtab and .debug sections
File layout, from offset 0: ELF header, program header table (required for executables), .text, .data, .bss, .symtab, .rel.text, .rel.data, .debug, .line, .strtab, section header table (required for relocatables)

13 Life and Scope of an Object
Life vs. scope
  The life of an object determines whether the object is still in the memory of the process, whereas the scope of an object determines whether the object can be accessed at a given position in the program.
  An object can be live but not visible. An object cannot be visible but not live.
Local variables
  Variables defined inside a function
  The scope of these variables is limited to that function
  The life of these variables ends when the function completes
  When the function is called again, storage for the variables is created anew and their values are reinitialized.
Static local variables
  If a value must persist throughout the life of the program, define the local variable as "static".
  Initialization is performed only once, and the data is retained between function calls.

14 Life and Scope of an Object
Global variables
  Variables defined outside a function
  The scope of these variables is the entire program
  The life of these variables ends when the program completes
Static variables
  Static variables are local in scope to the module in which they are defined, but their life extends throughout the program.
  Static local variables: a static variable inside a function cannot be accessed from outside the function (because it is not in scope), but it stays alive and exists in memory.
  Static global variables: if a static variable is defined at global level (say, at the beginning of a file), it is accessible only in that file (file scope).
  If you distribute your files as a library and do not want others to access one of your global variables, make it static by prefixing its definition with the keyword static.

15 Symbols
Three kinds of linker symbols
  Global symbols that are defined by module m and that can be referenced by other modules
    Nonstatic C functions and nonstatic global variables, i.e. functions or global variables that are defined without the C static attribute
  Global symbols that are referenced by module m but defined by some other module
    These are called externals.
  Local symbols that are defined and referenced exclusively by module m
    Static C functions and static variables
    Local procedure variables that are defined with the C static attribute are not managed on the stack. Instead, the compiler allocates space for them in .data or .bss.
Local linker symbol ≠ local program variable

16 ELF Symbol Table
typedef struct {
    int name;        /* string table offset */
    int value;       /* section offset, or VM address */
    int size;        /* object size in bytes */
    char type:4,     /* data, func, section, or src file name (4 bits) */
         binding:4;  /* local or global (4 bits) */
    char reserved;   /* unused */
    char section;    /* section header index, ABS, UNDEF, or COMMON */
} Elf_Symbol;
Three entries in the symbol table for main.o:
Num:  Value  Size  Type    Bind    Ot  Ndx  Name
  8:      0     8  OBJECT  GLOBAL   0    3  buf
  9:      0    17  FUNC    GLOBAL   0    1  main
 10:      0     0  NOTYPE  GLOBAL   0  UND  swap

17 Strong and Weak Symbols
Symbol resolution
  The linker associates each reference with exactly one symbol definition
  Resolving references to local symbols defined in the same module is straightforward
  Resolving references to global variables is trickier: the same symbol might be defined by multiple object files
Program symbols are either strong or weak
  strong: procedures and initialized globals
  weak: uninitialized globals
  At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file
Example
  p1.c: int foo=5;   (strong)    p1() { }   (strong)
  p2.c: int foo;     (weak)      p2() { }   (strong)

18 Linker’s Symbol Resolution Rules
Rule 1. A strong symbol can only appear once.
Rule 2. A weak symbol can be overridden by a strong symbol of the same name. References to the weak symbol resolve to the strong symbol.
Rule 3. If there are multiple weak symbols, the linker can pick an arbitrary one.

19 Linker Puzzles
Module 1: int x; p1() {}                Module 2: p1() {}
  Link time error: two strong symbols (p1)
Module 1: int x; p1() {}                Module 2: int x; p2() {}
  References to x will refer to the same uninitialized int. Is this what you really want?
Module 1: int x; int y; p1() {}         Module 2: double x; p2() {}
  Writes to x in p2 might overwrite y! Evil!
Module 1: int x=7; int y=5; p1() {}     Module 2: double x; p2() {}
  Writes to x in p2 will overwrite y! Nasty!
Module 1: int x=7; p1() {}              Module 2: int x; p2() {}
  References to x will refer to the same initialized variable.
Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.

20 Relocation
After the symbol resolution phase
  Once the symbol resolution step has been completed, the linker has associated each symbol reference in the code with exactly one symbol definition
  The linker knows the exact sizes of the code and data sections in each object module
Relocation consists of 2 steps
  Relocating sections and symbol definitions
    Merge all sections of the same type into a new aggregate section, e.g. the .data sections from all the input modules are merged into a single .data section
    Assign run-time memory addresses to the new aggregate sections and to each symbol defined internally in each module
  Relocating symbol references within sections
    Modify external references so that they point to the correct run-time addresses
    To perform this step, the linker relies on relocation entries in the relocatable object modules

21 Example C Program
m.c:
    int e=7;
    int main() {
        int r = a();
        exit(0);
    }
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() {
        return *ep+x+y;
    }

22 Relocating Sections
Figure: relocatable object files on the left, merged executable object file on the right.
  m.o: .text holds main(); .data holds int e = 7
  a.o: .text holds a(); .data holds int *ep = &e and int x = 15; .bss holds int y
  Executable object file, from offset 0: headers; .text (system code, main(), a(), more system code); .data (system data, int e = 7, int *ep = &e, int x = 15); .bss (uninitialized data, int y); .symtab; .debug

23 Resolving External References
Symbols are lexical entities that name functions and variables.
Each symbol has a value (typically a memory address).
Code consists of symbol definitions and references.
References can be either local or external.
m.c:
    int e=7;                /* def of local symbol e */
    int main() {
        int r = a();        /* ref to external symbol a */
        exit(0);            /* ref to external symbol exit (defined in libc.so) */
    }
a.c:
    extern int e;
    int *ep=&e;             /* def of local symbol ep; ref to external symbol e */
    int x=15;               /* def of local symbol x */
    int y;                  /* def of local symbol y */
    int a() {               /* def of local symbol a */
        return *ep+x+y;     /* refs to local symbols ep, x, y */
    }

24 m.o Relocation Info
m.c:
    int e=7;
    int main() { int r = a(); exit(0); }
Disassembly of section .text:
00000000 <main>:
   0: 55                pushl %ebp
   1: 89 e5             movl %esp,%ebp
   3: e8 fc ff ff ff    call 4
                        4: R_386_PC32 a
   8: 6a 00             pushl $0x0
   a: e8 fc ff ff ff    call b
                        b: R_386_PC32 exit
   f: 90                nop
Disassembly of section .data:
00000000 <e>:
   0: 07 00 00 00
source: objdump

25 a.o Relocation Info (.text)
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() { return *ep+x+y; }
Disassembly of section .text:
00000000 <a>:
   0: 55                   pushl %ebp
   1: 8b 15 00 00 00 00    movl 0x0,%edx
                           3: R_386_32 ep
   7: a1 00 00 00 00       movl 0x0,%eax
                           8: R_386_32 x
   c: 89 e5                movl %esp,%ebp
   e: 03 02                addl (%edx),%eax
  10: 89 ec                movl %ebp,%esp
  12: 03 05 00 00 00 00    addl 0x0,%eax
                           14: R_386_32 y
  18: 5d                   popl %ebp
  19: c3                   ret

26 a.o Relocation Info (.data)
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() { return *ep+x+y; }
Disassembly of section .data:
00000000 <ep>:
   0: 00 00 00 00
                    0: R_386_32 e
00000004 <x>:
   4: 0f 00 00 00

27 Executable After Relocation (.text)
08048530 <main>:
 8048530: 55                   pushl %ebp
 8048531: 89 e5                movl %esp,%ebp
 8048533: e8 08 00 00 00       call 8048540
 8048538: 6a 00                pushl $0x0
 804853a: e8 35 ff ff ff       call 8048474
 804853f: 90                   nop
08048540 <a>:
 8048540: 55                   pushl %ebp
 8048541: 8b 15 1c a0 04 08    movl 0x804a01c,%edx
 8048547: a1 20 a0 04 08       movl 0x804a020,%eax
 804854c: 89 e5                movl %esp,%ebp
 804854e: 03 02                addl (%edx),%eax
 8048550: 89 ec                movl %ebp,%esp
 8048552: 03 05 d0 a3 04 08    addl 0x804a3d0,%eax
 8048558: 5d                   popl %ebp
 8048559: c3                   ret

28 Executable After Relocation (.data)
m.c:
    int e=7;
    int main() { int r = a(); exit(0); }
a.c:
    extern int e;
    int *ep=&e;
    int x=15;
    int y;
    int a() { return *ep+x+y; }
Disassembly of section .data:
0804a018 <e>:
 804a018: 07 00 00 00
0804a01c <ep>:
 804a01c: 18 a0 04 08
0804a020 <x>:
 804a020: 0f 00 00 00

29 Linking with Static Libraries
Static library
  A package of related object modules that can be supplied as input to the linker at compile time
  The linker copies only the object modules in the library that are actually referenced by the program
Examples
  libc.a: ANSI C standard library, which includes printf, scanf, strcpy, etc.
  libm.a: ANSI C math library
Instead of
  unix> gcc main.c /usr/lib/printf.o /usr/lib/scanf.o …
do
  unix> gcc main.c /usr/lib/libc.a
To create a library, use the ar tool
  unix> gcc -c addvec.c multvec.c
  unix> ar rcs libvector.a addvec.o multvec.o

30 Linking with Static Libraries
Figure: main2.c and vector.h (source files) -> Translators (cpp, cc1, as) -> main2.o (relocatable object file); main2.o plus the static libraries libvector.a (supplying addvec.o) and libc.a (supplying printf.o and any other modules called by printf.o) -> Linker (ld) -> p2 (fully linked executable object file)

31 Packaging Commonly Used Functions
How to package functions commonly used by programmers?
  Math, I/O, memory management, string manipulation, etc.
Awkward, given the linker framework so far:
  Option 1: put all functions in a single source file
    Programmers link one big object file into their programs
    Space and time inefficient: each executable object file occupies disk space as well as memory space, and any change in one function requires recompiling the entire source file
  Option 2: put each function in a separate source file
    Programmers explicitly link the appropriate binaries into their programs
    More efficient, but burdensome for the programmer
Solution: static libraries (.a archive files)
  Concatenate related relocatable object files into a single file with an index (called an archive).
  Enhance the linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
  If an archive member file resolves a reference, link the member file into the executable.

32 Static Libraries (archives)
Figure: p1.c -> Translator -> p1.o and p2.c -> Translator -> p2.o; p1.o, p2.o, and libc.a (static library: an archive of relocatable object files concatenated into one file) -> Linker (ld) -> p (executable object file; only contains code and data for libc functions that are actually called from p1.c and p2.c)
Further improves modularity and efficiency by packaging commonly used functions, e.g. the C standard library (libc) and the math library (libm)
The linker selects only the .o files in the archive that are actually needed by the program.

33 Creating Static Libraries
Figure: atoi.c -> Translator -> atoi.o, printf.c -> Translator -> printf.o, ..., random.c -> Translator -> random.o; all .o files -> Archiver (ar) -> libc.a (C standard library)
  ar rs libc.a \
     atoi.o printf.o … random.o
The archiver allows incremental updates: recompile the function that changes and replace its .o file in the archive.

34 Commonly Used Libraries
libc.a (the C standard library)
  8 MB archive of 900 object files.
  Standard I/O, memory allocation, signal handling, string handling, date and time, random numbers, integer math
libm.a (the C math library)
  1 MB archive of 226 object files.
  Floating point math (sin, cos, tan, log, exp, sqrt, …)
% ar -t /usr/lib/libc.a | sort
  … fork.o … fprintf.o fpu_control.o fputc.o freopen.o fscanf.o fseek.o fstab.o …
% ar -t /usr/lib/libm.a | sort
  … e_acos.o e_acosf.o e_acosh.o e_acoshf.o e_acoshl.o e_acosl.o e_asin.o e_asinf.o e_asinl.o …

35 Using Static Libraries
Linker’s algorithm for resolving external references:
  Scan .o files and .a files in command line order.
  During the scan, keep a list of the current unresolved references.
  As each new .o or .a file obj is encountered, try to resolve each unresolved reference in the list against the symbols in obj.
  If any entries remain in the unresolved list at the end of the scan, report an error.
Problem: command line order matters!
Moral: put libraries at the end of the command line.
  bass> gcc -L. libtest.o -lmine
  bass> gcc -L. -lmine libtest.o
  libtest.o: In function `main':
  libtest.o(.text+0x4): undefined reference to `libfun'

36 Executable Object File
After linking with static libraries
  The input C program (an ASCII text file) has been transformed into a single binary file that contains all of the information needed to load the program into memory and run it.
The format of an executable object file
  Similar to the format of a relocatable object file, except for the following:
  The ELF header includes the program’s entry point, which is the address of the first instruction
  The .init section defines a small function called _init that is called by the program’s initialization code
  No relocation information
  The segment header table describes
    The mapping between the segments in the executable object file and the memory segments in the virtual address space
    Read/write/execute permissions (and alignment information between segments)

37 Executable Object File
Figure: layout of an executable object file, from offset 0.
  Read-only memory segment (code segment): ELF header, segment header table, .init, .text, .rodata
  Read/write memory segment (data segment): .data, .bss
  Symbol table and debugging info, not loaded into memory: .symtab, .debug, .line, .strtab
  Section header table: describes the object file sections

38 Loading Executable Binaries
Figure: the executable object file for example program p on the left, the process image on the right.
  Executable object file, from offset 0: ELF header, program header table (required for executables), .text section, .data section, .bss section, .symtab, .rel.text, .rel.data, .debug, section header table (required for relocatables)
  Process image: init and shared lib segments, .text segment (r/o), .data segment (initialized r/w), .bss segment (uninitialized r/w)

39 Linux Run-time Memory Image
Figure: address space from high to low addresses.
  Kernel virtual memory (memory invisible to user code), above 0xc0000000
  User stack (created at run time), growing down; %esp (stack pointer) marks its top
  Memory-mapped region for shared libraries, at 0x40000000
  Run-time heap (created by malloc), growing up; brk marks its top
  Read/write segment (.data, .bss), loaded from the executable file
  Read-only segment (.init, .text, .rodata), loaded from the executable file, starting at 0x08048000
  Unused, down to address 0

40 Startup Routines for a C Program
When the loader (execve) runs
  It creates the memory image by copying chunks of the executable object file into the code and data segments (guided by the segment header table)
  The loader then jumps to the program’s entry point, which is the address of the _start symbol
  The startup code at the _start address is defined in the object file crt1.o
0x080480c0 <_start>:           /* entry point */
    call __libc_init_first     /* startup code in .text */
    call _init                 /* startup code in .init */
    call atexit                /* startup code in .text */
    call main                  /* application code */
    call _exit                 /* return control to OS */
Notes
  __libc_init_first and _init: initialization routines from the .text and .init sections
  atexit: appends a list of routines that should be called when the application calls the exit function
  exit: runs the functions registered by atexit and returns control to the OS by calling _exit

41 Shared Libraries
Static libraries have the following disadvantages:
  Potential for duplicating lots of common code in the executable files on a file system, since every C program needs the standard C library
  Potential for duplicating lots of code in the text segment of each process
  Minor bug fixes of system libraries require each application to explicitly relink
Solution: shared libraries
  Their members are dynamically loaded and linked at run time.
  Called shared objects (.so) in UNIX or dynamic link libraries (DLL) in Windows
  Shared library routines can be shared by multiple processes.
    There is exactly one .so file for a particular library in any given file system. The code and data in this .so file are shared by all the executable object files.
    A single copy of the .text section of a shared library in memory can be shared by multiple processes.
  Dynamic linking can occur when the executable is first loaded and run.
    Common case for Linux, handled automatically by ld-linux.so.
  Dynamic linking can also occur after the program has begun.
    An application can request the dynamic linker to load and link shared libraries.
    In Linux, this is done explicitly by the user with dlopen().

42 Dynamic Linking with Shared Library
Build commands:
  gcc -shared -fPIC -o libvector.so addvec.c multvec.c
  gcc -o p2 main2.c ./libvector.so
Figure: main2.c and vector.h -> Translators (cpp, cc1, as) -> main2.o (relocatable object file); main2.o plus relocation and symbol table info from libc.so and libvector.so -> Linker (ld) -> p2 (partially linked executable object file, on disk); p2 -> Loader (execve) -> Dynamic linker (ld-linux.so), which brings in code and data from libc.so and libvector.so -> fully linked executable in memory
The loader loads and runs the dynamic linker, and then passes control to the application.

43 The Complete Picture
Figure: m.c -> Translator -> m.o and a.c -> Translator -> a.o; m.o, a.o, and libwhatever.a -> Static Linker (ld) -> p; p, libc.so, and libm.so -> Loader/Dynamic Linker (ld-linux.so) -> p’

