Program Execution in Linux

David Ferry, Chris Gill
CSE 422S - Operating Systems Organization
Washington University in St. Louis
St. Louis, MO 63143
Creating an Executable File

Two stages:
- Compilation: the compiler translates source code to machine code, producing a relocatable object file.
- Linking: the linker connects binary files to libraries to create an executable.

Source code -> Compiler -> Relocatable object file -> Linker -> Executable file

Source code:
    #include <stdio.h>
    int foo = 20;
    int main(int argc, char* argv[]) {
        printf("Hello, world!\n");
        return 0;
    }

Relocatable object file (symbol table):
    00000000 D foo
    00000000 T main
             U puts
The Symbol Table

Program binaries have a symbol table that keeps track of data and code.

Example:
    int foo = 10;
    int bar = 20;
    int main(int argc, char* argv[]) {
        printf("Hello, world!\n");
        return 0;
    }

The linker must resolve undefined symbols before the program can be run!
Static vs. Dynamic Linking

- Static linking: required code and data are copied into the executable when it is built (at link time)
- Dynamic linking: required code and data are linked to the executable at runtime

Static:
    my_program.o: Program code, Program data, Library code

Dynamic:
    my_program.o: Program code, Program data
    libc.so:      Library code
Parts of a Program

A program has two components:
- Data
- Code

Either component may be:
- static (fixed at compile time)
- dynamic (linked at run time)

The compiler creates static sections as part of a binary.
The linker links dynamic sections from other binaries.

Virtual address space:
    0xc000_0000
      Stack
      Memory Map Segment
      Heap
      .bss
      .data
      .text
    0x0000_0000
Program Segmentation

- Static code: .text segment
- Dynamic code: memory map segment
- Static data: .data segment (initialized)
- Dynamic data (initialized at runtime):
    - Stack
    - Heap
    - .bss

Virtual address space:
    0xc000_0000
      Stack
      Memory Map Segment
      Heap
      .bss
      .data
      .text
    0x0000_0000
Running a Statically Linked Program

A statically linked program is entirely self-contained:
1. The loader creates a valid process by loading a binary image into memory
   - On Linux, this is the execve() system call
2. The C runtime initializes the process to execute normal C code
   - Usually called crt0.o
The C Runtime

- Initializes the C stack and heap
- Sets up argc and argv
- Calls user-specified program constructors and destructors
- Does C library initialization
Running a Statically Linked Program

1. User calls fork() on an existing process to get a new process space
2. execve() reads the program into memory
3. Execution starts at _start() in the C runtime, which sets up the environment
4. C runtime eventually calls main()
5. After main() returns, the C runtime does some cleanup
Running a Dynamically Linked Program

- Some functions and data are not yet present in the process address space when the program starts running
- The dynamic linker (called ld.so) maps these into the memory map segment on demand

Virtual address space:
      Stack
      Memory Map Segment
      Heap
      .bss
      .data
      .text
Linking at Runtime

When the executable is built:
- A reference to the dynamic linker (ld.so) is embedded in the program
- Addresses of dynamic functions are replaced with calls to the linker

At runtime the linker does lazy binding:
1. Program runs as normal until it encounters an unresolved function
2. Program jumps to the linker
3. Linker maps the shared library into the address space and replaces the unresolved address with the resolved address
Runtime Linker Implementation (before the first call)

Uses a procedure link table (PLT) to do lazy binding.

Source code:
    #include <stdio.h>
    int foo = 20;
    int main(int argc, char* argv[]) {
        printf("Hello, world!\n");
        return 0;
    }

Virtual address space:
      Stack
      Heap
      .bss
      .data
      .text

Procedure Link Table (PLT) entry for printf(): linker_stub()
Runtime Linker Implementation (after the first call)

Uses a procedure link table (PLT) to do lazy binding.

Source code:
    #include <stdio.h>
    int foo = 20;
    int main(int argc, char* argv[]) {
        printf("Hello, world!\n");
        return 0;
    }

Virtual address space:
      Stack
      Library with printf() function
      Heap
      .bss
      .data
      .text

Procedure Link Table (PLT) entry for printf(): library printf()
Static vs. Dynamic Linking

Static:
- Does not need to look up libraries at runtime
- Does not need extra PLT indirection
- Duplicates library code on disk in every executable

Dynamic:
- Less disk space (7K vs 571K for hello world)
- Shared libraries may already be in memory and in hot cache
- Incurs lookup and indirection overheads
Executable File Format

The current binary file format is called ELF (Executable and Linking Format).

- The first part of the file is the ELF header, which defines the contents of the rest of the file
- Segments contain data and code needed at runtime
- Sections contain linking and relocation data
- Adds additional sections past .text, .data, etc.:
    - .rodata: read-only data
    - .debug: debugging symbol table
    - and more...
- GCC adds its own sections...
Binary File Utilities

- nm: prints symbol table
- objdump: prints all binary data
- readelf: prints ELF data
- pmap: prints memory map of a running process
- ldd: prints dynamic library dependencies of a binary
- strip: strips symbol data from a binary