Chapter 13: Symbol Management in Linking

Binding and Name Resolution: Linkers handle several kinds of symbols, most importantly symbolic references from one module to another. Each module includes a symbol table. The table includes: global symbols defined and referenced in the module; global symbols referenced but not defined in the module (called externals); segment names; non-global symbols, usually kept for debuggers and crash dump analysis; and line number information, which tells source-language debuggers the correspondence between source lines and object code.

Linking Process: The linker reads all of the symbol tables in the input modules and extracts the useful information. It then builds the link-time symbol tables and uses them to guide the linking process. Depending on the output file format, the linker may place some or all of the symbol information in the output file. Some formats have multiple symbol tables per file. For example, an ELF shared library can have one symbol table for the dynamic linker, another for debugging, and another for relinking.

Symbol Table Format: Within the linker, a symbol table is often kept as an array of table entries, with a hash function used to locate entries quickly. The computed hash value modulo the number of buckets selects one of the hash buckets, and the linker then runs down that bucket's chain of symbols looking for the one it wants. Each entry stores its full hash value, so the linker compares full hash values first and compares the symbol names only when the hash values are equal. This avoids most of the expensive string comparisons.
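The lookup described above can be sketched in C as follows. This is only an illustration under assumed names: struct sym, NBUCKETS, hash_name, sym_lookup, and sym_intern are hypothetical, and the hash function is an arbitrary choice rather than the one any particular linker uses.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64               /* hypothetical bucket count */

struct sym {
    struct sym *next;             /* next symbol on the same hash chain */
    unsigned long hash;           /* full hash value, kept for cheap comparison */
    char *name;                   /* private copy of the symbol name */
    unsigned long value;          /* symbol value, filled in later */
};

static struct sym *buckets[NBUCKETS];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the entry for name, or NULL if it is not in the table. */
struct sym *sym_lookup(const char *name)
{
    unsigned long h = hash_name(name);
    struct sym *sp;

    for (sp = buckets[h % NBUCKETS]; sp != NULL; sp = sp->next) {
        /* Compare full hashes first; compare names only on a hash match. */
        if (sp->hash == h && strcmp(sp->name, name) == 0)
            return sp;
    }
    return NULL;
}

/* Return the entry for name, adding it to the head of its chain if absent. */
struct sym *sym_intern(const char *name)
{
    struct sym *sp = sym_lookup(name);
    unsigned long h;

    if (sp != NULL)
        return sp;
    h = hash_name(name);
    sp = malloc(sizeof *sp);
    if (sp == NULL)
        return NULL;
    sp->name = malloc(strlen(name) + 1);
    if (sp->name == NULL) {
        free(sp);
        return NULL;
    }
    strcpy(sp->name, name);
    sp->hash = h;
    sp->value = 0;
    sp->next = buckets[h % NBUCKETS];
    buckets[h % NBUCKETS] = sp;
    return sp;
}
```

Calling sym_intern("main") twice returns the same entry both times, which is the property the linker relies on when several modules mention the same name.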

Hashed Symbol Table

Module Table: The linker needs to track every input module (a.out file) seen during a linking run. This information is stored in a module table. Because most of the key information for each a.out file is in its file header, the table primarily contains a copy of each file's header. The table also contains pointers to in-memory copies of the symbol table, string table, and relocation table. During the first pass, the linker reads the symbol table from each input file and copies it into an in-memory buffer.

Build the Global Symbol Table: The linker keeps a global symbol table with an entry for every symbol referenced or defined in any input file. It links each symbol table entry from an input file to the corresponding global symbol table entry (see the next slide). This link is needed because relocation items in a module generally refer to symbols by index in the module's own symbol table, so the linker must be able to get from that index to the resolved global entry. External symbols remain to be resolved.
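A minimal sketch of this bookkeeping in C, under assumptions: the structures struct module and struct global_sym, the fixed capacities, and the functions module_bind and count_undefined are all hypothetical, and a linear search stands in for the hashed lookup a real linker would use.

```c
#include <stddef.h>
#include <string.h>

#define MAX_GLOBALS 32            /* hypothetical fixed capacities */
#define MAX_LOCALS   8

struct global_sym {
    const char *name;             /* not copied: must outlive the table */
    int defined;                  /* 1 once some module defines the symbol */
    int def_module;               /* index of the defining module, -1 if none */
};

struct module {
    int nsyms;
    const char *names[MAX_LOCALS];          /* the module's own symbol table */
    int is_def[MAX_LOCALS];                 /* 1 = defined here, 0 = external */
    struct global_sym *global[MAX_LOCALS];  /* link to the global entry */
};

static struct global_sym globals[MAX_GLOBALS];
static int nglobals;

/* Find or create the global entry for name (linear search for brevity). */
static struct global_sym *global_intern(const char *name)
{
    int i;

    for (i = 0; i < nglobals; i++)
        if (strcmp(globals[i].name, name) == 0)
            return &globals[i];
    if (nglobals == MAX_GLOBALS)
        return NULL;
    globals[nglobals].name = name;
    globals[nglobals].defined = 0;
    globals[nglobals].def_module = -1;
    return &globals[nglobals++];
}

/*
 * Link every symbol of module m to its global entry.
 * Returns the number of duplicate definitions seen, or -1 if the
 * global table is full.
 */
int module_bind(struct module *m, int module_index)
{
    int i, dups = 0;

    for (i = 0; i < m->nsyms; i++) {
        struct global_sym *g = global_intern(m->names[i]);

        if (g == NULL)
            return -1;
        m->global[i] = g;         /* a relocation naming index i can now reach g */
        if (m->is_def[i]) {
            if (g->defined) {
                dups++;
            } else {
                g->defined = 1;
                g->def_module = module_index;
            }
        }
    }
    return dups;
}

/* Count global symbols that are referenced but not yet defined. */
int count_undefined(void)
{
    int i, n = 0;

    for (i = 0; i < nglobals; i++)
        if (!globals[i].defined)
            n++;
    return n;
}
```

After every module has been bound, a nonzero count_undefined() corresponds to the unresolved externals the linker would report.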

Resolving Symbols

Symbol Resolution: During the second pass, the linker resolves symbol references. Relocation follows symbol resolution, because in most object file formats the relocation entries are what identify the program's references to each symbol. If the output file uses absolute addresses, the address of the symbol simply replaces the symbol reference. If the output file is relocatable and a symbol is resolved to, say, offset 426 in the data section, the output file has to contain a relocatable reference to data+426.
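The two cases can be sketched in C as below. Everything here is an assumption made for illustration: the section enum, struct resolved_sym, struct out_reloc, 32-bit references stored in host byte order, and the function names patch_absolute and patch_relocatable do not come from any real object format.

```c
#include <stdint.h>
#include <string.h>

enum section { SEC_TEXT, SEC_DATA, SEC_BSS, SEC_COUNT };

/* A symbol after resolution: which section it lives in, and where. */
struct resolved_sym {
    enum section sec;
    uint32_t offset;
};

/* A relocation kept in relocatable output: "add sec's base at where". */
struct out_reloc {
    uint32_t where;
    enum section sec;
};

/*
 * Absolute output: the final address (section base + offset) replaces
 * the 4-byte reference at image[where]. Returns the address stored.
 */
uint32_t patch_absolute(uint8_t *image, uint32_t where,
                        const struct resolved_sym *s,
                        const uint32_t base[SEC_COUNT])
{
    uint32_t addr = base[s->sec] + s->offset;

    memcpy(image + where, &addr, sizeof addr);
    return addr;
}

/*
 * Relocatable output: store only the offset within the section and
 * return a relocation entry so a later link step can add the base.
 */
struct out_reloc patch_relocatable(uint8_t *image, uint32_t where,
                                   const struct resolved_sym *s)
{
    struct out_reloc r;

    r.where = where;
    r.sec = s->sec;
    memcpy(image + where, &s->offset, sizeof s->offset);
    return r;
}
```

For the symbol at data+426, patch_relocatable stores 426 and returns a relocation against the data section, while patch_absolute with a data base of 0x4000 stores 0x41aa.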

Special Symbols: Many systems use a few special symbols defined by the linker itself. UNIX systems all require that the linker define etext, edata, and end as the ends of the text, data, and bss segments, respectively. The C library's sbrk() uses end as the address of the beginning of the run-time heap. For programs with constructor and destructor routines, many linkers create tables of pointers to the routines from each input file, along with a linker-created symbol such as __CTOR_LIST__ that marks the table. The language startup stub then uses this special symbol to find the list and call all the routines.

Special Symbol Example:
    0804957c l  O  .dtors 00000000 __DTOR_LIST__
    08049574 l  O  .ctors 00000000 __CTOR_LIST__
    00000000 l  df *ABS*  00000000 crtstuff.c
    08048520 l     .text  00000000 gcc2_compiled.
    08048520 l  F  .text  00000000 __do_global_ctors_aux
    08049578 l  O  .ctors 00000000 __CTOR_END__
    08048548 l  F  .text  00000000 init_dummy
    08049570 l  O  .data  00000000 force_to_data
    08049580 l  O  .dtors 00000000 __DTOR_END__
    08049630 g  O  *ABS*  00000000 _end
    08048550 g  O  *ABS*  00000000 _etext
    0804960c g  O  *ABS*  00000000 _edata
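A program can refer to the linker-defined symbols directly. The sketch below assumes a Linux system with the GNU toolchain, where the symbols are visible to C as etext, edata, and end (documented in the end(3) manual page). Other systems may spell them differently or not provide them at all. The function names are hypothetical.

```c
#include <stdint.h>

/* Defined by the linker, not by any source file. Only their
 * addresses are meaningful; their contents must not be read. */
extern char etext, edata, end;

/* Return 1 if text ends before data ends, and data ends before bss ends,
 * which is the layout the three symbols are meant to describe. */
int special_symbols_in_order(void)
{
    uintptr_t t = (uintptr_t)&etext;
    uintptr_t d = (uintptr_t)&edata;
    uintptr_t e = (uintptr_t)&end;

    return t <= d && d <= e;
}

/* Bytes between the end of initialized data and the end of bss. */
uintptr_t bss_extent(void)
{
    return (uintptr_t)&end - (uintptr_t)&edata;
}
```

No object file in the link defines these names. If the link succeeds, the linker itself supplied them.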

Name Mangling: The names used in object file symbol tables and in linking are often not the same names that are used in the source programs from which the object files were compiled. There are three reasons for this: avoiding name collisions, name overloading, and type checking. The process of turning the source program names into the object file names is called "name mangling." It is common in both C and C++ toolchains.

C Name Mangling: In older object file formats, compilers and assemblers used names from the source program directly as the names in the object file. This can result in collisions with names reserved by the assembler, the compiler, or the libraries. It is a real problem, because programmers have no way of knowing which names are safe to use and which are not.

Method 1: The first method is to mangle the names of C procedures and variables so that they cannot inadvertently collide with the names of library and other routines. C names are modified with a leading underscore, so that main becomes _main. This works reasonably well, but most current ELF-based UNIX systems no longer do it. The output of the following example is derived from Sun OS 5.7.

Method 1 Example:
    3:05pm ccsun2:test/> cat p1.c
    int xx, yy;
    foo1() {}
    foo2() {}
    main()
    {
        printf("xx %d yy %d\n", xx, yy);
    }

Method 1 Example:
    000022b8 g .text 0c1c 00 05 _main
    00002290 g .text 0beb 00 05 _foo1
    0000229c g .text 0bec 00 05 _foo2
    000040d0 g .bss  02e4 00 09 _xx
    000040d4 g .bss  02e7 00 09 _yy

Method 2: The second method relies on assemblers and linkers permitting characters in symbols that are forbidden in C and C++ identifiers, such as "." and "$". Rather than mangling names from C programs, the run-time libraries use names containing these forbidden characters, which cannot collide with application program names. Most recent operating systems now use this method. The output of the following example is derived from FreeBSD 4.2.

Method 2 Example:
    080484f0 g  F .text 00000005 foo2
    080484e8 g  F .text 00000005 foo1
    08049628 g  O .bss  00000004 xx
    080484f8 g  F .text 00000024 main
    080483ac    F *UND* 00000031 printf
    0804962c g  O .bss  00000004 yy

C++ Type Encoding: Another use of mangled names is to encode scope and type information. This makes it possible to use existing linkers to link C++ programs. In C++, the programmer can define many functions and variables with the same name but different scopes and, for functions, different argument types. For example:

C++ Type Encoding: A single program can have a global variable V and a static class member C::V. C++ also permits function name overloading, with several functions having the same name but different arguments, such as f(int x) and f(float x). C++ was initially implemented as a translator called cfront that produced C code and used an existing linker. Its author used name mangling to produce names that could sneak through the C compiler into the linker. All the linker had to do with them was its usual job of matching identically named global symbols.

C++ Type Encoding: Data variable names outside of C++ classes do not get mangled at all. Function names not associated with classes are mangled to encode the types of the arguments by appending __F and a string of letters that represent the argument types and type modifiers. For example, a function func(float, int, unsigned char) becomes func__FfiUc. Class names are considered types and are encoded as the length of the class name followed by the name, such as 4Pair.
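The encoding of a non-member function can be sketched in C as follows. The function mangle_function and its interface are hypothetical. The caller supplies the per-argument codes already spelled as in the slides' examples ("f" for float, "i" for int, "Uc" for unsigned char), and "v" is emitted for an empty argument list on the assumption that it follows the (void) case shown for member functions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build name + "__F" + the argument type codes into out.
 * codes[i] is the already-encoded type of argument i, e.g. "f", "i", "Uc".
 * Returns 0 on success, -1 if out is too small.
 */
int mangle_function(char *out, size_t outsz, const char *name,
                    const char *const codes[], size_t ncodes)
{
    size_t used, i;
    int n = snprintf(out, outsz, "%s__F", name);

    if (n < 0 || (size_t)n >= outsz)
        return -1;
    used = (size_t)n;

    if (ncodes == 0) {            /* no arguments: encode as void */
        if (used + 1 >= outsz)
            return -1;
        out[used++] = 'v';
        out[used] = '\0';
        return 0;
    }
    for (i = 0; i < ncodes; i++) {
        size_t len = strlen(codes[i]);

        if (used + len >= outsz)
            return -1;
        memcpy(out + used, codes[i], len + 1);   /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

With the codes { "f", "i", "Uc" } and the name func, the result is func__FfiUc, so two overloads of func with different argument lists reach the linker as two distinct global names.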

C++ Type Encoding: Class member functions are encoded as the function name, two underscores, the encoded class name, then F and the arguments. So cl::fn(void) becomes fn__2clFv. Special functions, including constructors, destructors, new, and delete, have encodings as well: __ct, __dt, __nw, and __dl, respectively. A constructor for class Pair that takes two character pointer arguments, Pair(char *, char *), becomes __ct__4PairFPcPc. Name mangling thus does the job of giving a unique name to every possible C++ object.
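The member-function rule is mechanical enough to sketch in a few lines of C. The function mangle_member is a hypothetical helper. It takes the argument codes as one pre-encoded string ("v", "PcPc") rather than deriving them from C++ types.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build fn + "__" + <length of cls> + cls + "F" + argcodes into out.
 * A class name is encoded as its length followed by the name ("Pair" -> "4Pair").
 * Returns 0 on success, -1 if out is too small.
 */
int mangle_member(char *out, size_t outsz, const char *fn,
                  const char *cls, const char *argcodes)
{
    int n = snprintf(out, outsz, "%s__%zu%sF%s",
                     fn, strlen(cls), cls, argcodes);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Passing ("fn", "cl", "v") yields fn__2clFv, and passing ("__ct", "Pair", "PcPc") yields __ct__4PairFPcPc, matching the two examples above.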

Link-Time Checking: Case 1: Most languages have procedures with declared argument types. If the caller does not pass the number and types of arguments that the callee expects, it is an error. Case 2: If a variable is defined in file1 but used in file2, the two files must agree on its type. For linker type checking, each defined and undefined global symbol has associated with it a string representing the argument and return types, similar to the mangled C++ argument types. When the linker resolves a symbol, it compares the type strings for the reference and the definition of the symbol, and can thus report an error if they do not match.
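The comparison itself is simple, as the following C sketch suggests. The structure struct typed_sym, the result enum, and the function check_reference are hypothetical, and the type strings are opaque here: the sketch only assumes that the compiler emits the same string for the same type.

```c
#include <string.h>

/* A global symbol together with its encoded type string. */
struct typed_sym {
    const char *name;
    const char *type;             /* e.g. "FfiUc" for (float, int, unsigned char) */
};

enum check_result {
    CHECK_OK,                     /* same symbol, same type string */
    CHECK_NOT_SAME_SYMBOL,        /* names differ: nothing to check */
    CHECK_TYPE_MISMATCH           /* same name, different type strings */
};

/* Compare a reference against the definition it was resolved to. */
enum check_result check_reference(const struct typed_sym *def,
                                  const struct typed_sym *ref)
{
    if (strcmp(def->name, ref->name) != 0)
        return CHECK_NOT_SAME_SYMBOL;
    if (strcmp(def->type, ref->type) != 0)
        return CHECK_TYPE_MISMATCH;
    return CHECK_OK;
}
```

A caller compiled against func(float, int) while the definition is func(float, int, unsigned char) would carry a different type string, so the check returns CHECK_TYPE_MISMATCH and the linker can report the error instead of producing a program that fails at run time.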