
1 OBJECT MODULE FORMATS

2 The object module format we have employed as an educational device is called OMF (Intel's Relocatable Object Module Format). It is one of the earliest formats, but all the subsequent formats contain the basic elements that are present in OMF.

3 Here is a depiction of the main formats that followed: [Figure: lineage of object module formats, showing OMF, a.out, COFF, ELF, Mach-O, PE/COFF, PE/COFF+, and Mach-O for Mac OS X 10.6]

4 All of them contain separate sections for data, code, and relocation information (i.e. fixups). All of them, incidentally, were designed by committees with the objective of making them machine and language independent to varying degrees. So the committees included a wealth of fields that they thought might possibly be helpful, but which are in fact rarely, if ever, used in practice.

5 So why didn’t we pick one of these later formats for our Project 4? It simply would not have been possible to do this in a one-semester compiler course. Even in a two-semester course, the amount of extra detail required would be out of proportion to the gain in educational value.

6 OMF was devised by Intel, and in roughly the same period AT&T released a.out for use with Unix systems.

7 In order to provide for debugging information and shared libraries, COFF (common object file format) was released by AT&T together with the introduction of Unix System V.

8 The object module formats in use today by Linux, Unix, and Microsoft are basically variants of COFF.

9 COFF supported symbolic debugging by, in effect, including a symbol table that specified not only the offsets of variables but also the offset of the code corresponding to each line number of the source, so as to aid, e.g., in setting breakpoints.
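
To make the idea concrete, here is a minimal Python sketch of a classic COFF line-number table. It assumes the 6-byte entry layout documented (now as deprecated) in the Microsoft specification: a 4-byte field holding either a symbol-table index or a code address, followed by a 2-byte line number, where a line number of 0 marks the start of a function. The function names are hypothetical, and the sketch treats line numbers as plain source lines, which simplifies how real tools record them.

```python
import struct

# Classic COFF line-number entry (6 bytes): a 4-byte field that holds a
# symbol-table index when the line number is 0 (start of a function) and a
# code address otherwise, followed by a 2-byte line number.
LINENO_FORMAT = "<IH"


def pack_line_numbers(entries):
    """entries: (address_or_symbol_index, line_number) pairs, in order."""
    return b"".join(struct.pack(LINENO_FORMAT, a, n) for a, n in entries)


def breakpoint_address(table: bytes, source_line: int):
    """Return the code address recorded for source_line, or None."""
    for addr, line in struct.iter_unpack(LINENO_FORMAT, table):
        if line != 0 and line == source_line:  # line-0 entries name a function
            return addr
    return None


# Hypothetical function (symbol index 4) whose lines 5 and 6 start at 0x10 and 0x18.
table = pack_line_numbers([(4, 0), (0x10, 5), (0x18, 6)])
print(hex(breakpoint_address(table, 6)))  # 0x18
```

A debugger setting a breakpoint on line 6 would place it at offset 0x18 of the function's code.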

10 Limitations of COFF include the following: it limits the length of section names (which correspond to our segment names) and the number of sections allowed, and its symbolic debugging information is insufficient to support some of the features of languages such as C++.

11 In response, AT&T released ELF, a successor to COFF, with the introduction of System V Release 4.

12 Microsoft created its own version of COFF. For the sake of concreteness, let’s examine its main features as described in the Microsoft document “Microsoft Portable Executable and Common Object File Format Specification”, September 21, 2010.

13 The name of the specification is abbreviated as PE/COFF, while the version released to accommodate 64-bit machines is called PE/COFF+ (the specification itself names the 64-bit image format PE32+).

14 PE is the format of the output of the linker, which the loader then reads. In it, the various modules that make up the program have been linked, the external references resolved (apart from those to DLLs, discussed later), and all relocation (fixups) completed, so that the resulting image can finally be written into memory.

15 The COFF component of PE/COFF is the format of the object module that serves as input to the linker.

16 It closely follows the original COFF specification. The main difference is that the Microsoft version does not use the debugging facilities supplied by the original COFF, such as the line number information; it relies instead on Visual C++ debug information.

17 As a compiler writer, your responsibility in writing a compiler for Windows is the production of an object module for input to the linker. The PE formatted output of the linker, and the operating system, are the responsibility of Microsoft.

18 MICROSOFT’S COFF FORMAT Here is an illustration of the COFF structure.

19 SECTIONS The sections correspond to our segments. Except for the section associated with uninitialized data, each section consists of a header, the raw data, and a relocation component.

20 The .text section is the code section, and its relocation information corresponds to our fixups.
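
In these terms a fixup amounts to "add the final address of some symbol to a field in the section". The minimal Python sketch below assumes the 10-byte relocation-entry layout (offset of the field, symbol-table index, type) and the i386 DIR32 relocation type (0x0006) from the Microsoft specification. Only that one type is handled, and `symbol_addresses` is a hypothetical stand-in for whatever addresses the linker has assigned.

```python
import struct

# A COFF relocation entry is 10 bytes: the offset of the field to patch
# (relative to the start of the section), the index of the symbol the field
# refers to, and a relocation type.
RELOC_FORMAT = "<IIH"
IMAGE_REL_I386_DIR32 = 0x0006  # field receives the symbol's 32-bit address


def apply_relocations(section: bytearray, relocs: bytes, symbol_addresses: dict) -> None:
    """Patch DIR32 fields in place; symbol_addresses maps symbol index -> address."""
    for offset, sym_index, rtype in struct.iter_unpack(RELOC_FORMAT, relocs):
        if rtype != IMAGE_REL_I386_DIR32:
            raise NotImplementedError(f"relocation type {rtype:#x} not handled")
        # COFF keeps any addend in the field itself, so add to it rather than overwrite.
        (addend,) = struct.unpack_from("<I", section, offset)
        struct.pack_into("<I", section, offset,
                         (addend + symbol_addresses[sym_index]) & 0xFFFFFFFF)


# mov eax, [num+4]: opcode A1 followed by a 32-bit address field holding the addend 4.
code = bytearray(b"\xa1\x04\x00\x00\x00")
apply_relocations(code, struct.pack(RELOC_FORMAT, 1, 0, IMAGE_REL_I386_DIR32), {0: 0x1000})
print(code.hex())  # a104100000
```

Here symbol 0 (num) has been placed at 0x1000, so the field becomes 0x1004.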

21 There are two data sections. One is for initialized data, e.g. to hold the initial values of variables, as in: num dw 23. The other data section, called .bss above, is for uninitialized data, as in: array2 dw 1000 dup(?)

22 The .bss section consists only of a header that specifies how much space is to be reserved at execution time. The “named sections”, if present, may be used for purposes such as holding individual functions that the program employs; the name of such a section would then normally be the same as that of the function.
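
The difference between the two data sections shows up when we compute what each contributes to the object file for the two declarations of slide 21. This is a minimal Python sketch; the helper names are hypothetical, and it assumes 2-byte little-endian words, as on the x86.

```python
import struct


def data_section_bytes(words):
    """Raw data for 'dw' declarations with initial values: 2 bytes each, little-endian."""
    return b"".join(struct.pack("<H", w) for w in words)


def bss_reservation(word_count):
    """An uninitialized 'dw n dup(?)' contributes no raw data, only a size in bytes."""
    return word_count * 2


print(data_section_bytes([23]).hex())  # 1700  (num dw 23)
print(bss_reservation(1000))           # 2000  (array2 dw 1000 dup(?))
```

So the .data section carries the bytes 17 00 in its raw data, whereas the .bss header merely records that 2000 bytes must be reserved when the program is loaded.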

23 Section Headers. The fields involved in the section headers include: the section name (if the name has 8 characters or fewer, it is contained in the header; otherwise it is placed in the string table, which corresponds to our ID_S, and the name field of the section header instead contains its offset there); the section’s virtual address (i.e. its offset within the object module itself); the section’s physical address (i.e. the offset from the start of the program that it will have at execution time);

24 the size of the section; a pointer to the section’s raw data; a pointer to the corresponding relocation entries; a specification of whether the section contains executable code, initialized data, or uninitialized data; and a specification of whether the section may be read, written, or executed.
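
As an illustration, the fields listed above can be decoded from the 40-byte section header with Python's struct module. This sketch assumes the field order and the Characteristics flag values given in the Microsoft specification; the helper names and dictionary keys are hypothetical. It also assumes the object-file convention for long names, in which the name field holds a slash followed by a decimal offset into the string table.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in all.
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_CNT_UNINITIALIZED_DATA = 0x00000080
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def section_name(raw_name: bytes, string_table: bytes) -> str:
    """Inline name, or '/<decimal offset>' into the string table for long names."""
    if raw_name.startswith(b"/"):
        offset = int(raw_name[1:].rstrip(b"\0"))
        return string_table[offset:string_table.index(b"\0", offset)].decode("ascii")
    return raw_name.rstrip(b"\0").decode("ascii")


def decode_section_header(raw: bytes, string_table: bytes = b"") -> dict:
    (name, _vsize, vaddr, size, raw_ptr, reloc_ptr, _lineno_ptr,
     n_relocs, _n_linenos, flags) = struct.unpack(SECTION_HEADER_FORMAT, raw)
    return {
        "name": section_name(name, string_table),
        "virtual_address": vaddr,
        "size": size,
        "raw_data_offset": raw_ptr,
        "relocations_offset": reloc_ptr,
        "relocation_count": n_relocs,
        "contains_code": bool(flags & IMAGE_SCN_CNT_CODE),
        "initialized_data": bool(flags & IMAGE_SCN_CNT_INITIALIZED_DATA),
        "uninitialized_data": bool(flags & IMAGE_SCN_CNT_UNINITIALIZED_DATA),
        "readable": bool(flags & IMAGE_SCN_MEM_READ),
        "writable": bool(flags & IMAGE_SCN_MEM_WRITE),
        "executable": bool(flags & IMAGE_SCN_MEM_EXECUTE),
    }


text = struct.pack(SECTION_HEADER_FORMAT, b".text", 0, 0, 16, 0x64, 0x74, 0, 1, 0,
                   IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ)
print(decode_section_header(text)["name"])  # .text
```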

25 THE FILE HEADER The fields involved in the file header include: a number identifying the target machine, e.g. machines employing the 386 or later Pentium, or various machines produced by Hitachi, Mitsubishi, etc.; a time and date stamp indicating when the file was created; the number of section headers; and a pointer to the starting address of the symbol table.
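
The file header is a fixed 20-byte record, so it can be decoded in one step. This is a Python sketch assuming the field order given in the Microsoft specification; the `MACHINE_NAMES` table holds only three machine codes from the specification's much longer list, and the helper and key names are hypothetical. The header also records the number of symbol-table entries, which a reader needs in order to find where the string table begins (right after the last 18-byte symbol entry).

```python
import struct
import time

# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics: 20 bytes.
FILE_HEADER_FORMAT = "<HHIIIHH"

# A few machine codes from the specification (far from a complete list).
MACHINE_NAMES = {
    0x014C: "Intel 386 or later",
    0x01A2: "Hitachi SH3",
    0x8664: "x64",
}


def decode_file_header(raw: bytes) -> dict:
    (machine, n_sections, stamp, symtab_ptr, n_symbols,
     _opt_size, _flags) = struct.unpack_from(FILE_HEADER_FORMAT, raw)
    return {
        "machine": MACHINE_NAMES.get(machine, f"unknown ({machine:#06x})"),
        # The stamp counts seconds since 00:00 January 1, 1970 (UTC).
        "created": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp)),
        "section_count": n_sections,
        "symbol_table_offset": symtab_ptr,
        "symbol_count": n_symbols,
    }


header = struct.pack(FILE_HEADER_FORMAT, 0x014C, 3, 86400, 0x200, 12, 0, 0)
print(decode_file_header(header))
```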

26 THE SYMBOL TABLE The symbol table entries are each 18 bytes long, and include: the name of the symbol, stored using essentially the same scheme as described above for section header names, i.e. if the name is longer than 8 bytes it is stored in the string table and its offset there is recorded instead;

27 the section the item is defined in; its offset within that section; and its storage class, e.g. whether it is external, static, or a function.

28 Some entries, such as those for functions, need more information than the 18 bytes of an entry can hold. In such cases, the main entry for the name is followed by additional entries (referred to as auxiliary entries).
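
A sketch of walking the symbol table in Python, assuming the 18-byte entry layout of the Microsoft specification (8-byte name, 4-byte value, signed 2-byte section number, 2-byte type, 1-byte storage class, 1-byte count of auxiliary entries) and its convention that a long name is marked by four zero bytes followed by a 4-byte string-table offset. The storage-class constants are from the specification; the function names are hypothetical. Auxiliary entries are simply skipped here; a real reader would decode them according to the kind of the main entry.

```python
import struct

SYMBOL_FORMAT = "<8sIhHBB"   # name, value, section, type, storage class, aux count
SYMBOL_SIZE = 18

IMAGE_SYM_CLASS_EXTERNAL = 2
IMAGE_SYM_CLASS_STATIC = 3


def symbol_name(raw_name: bytes, string_table: bytes) -> str:
    """Short names are inline; four zero bytes mean 'offset into the string table'."""
    if raw_name[:4] == b"\0\0\0\0":
        (offset,) = struct.unpack_from("<I", raw_name, 4)
        return string_table[offset:string_table.index(b"\0", offset)].decode("ascii")
    return raw_name.rstrip(b"\0").decode("ascii")


def read_symbols(symtab: bytes, count: int, string_table: bytes):
    """Yield (name, section_number, value, storage_class) for each main entry.

    count is the total number of 18-byte entries, auxiliary ones included."""
    i = 0
    while i < count:
        name, value, section, _type, storage_class, n_aux = struct.unpack_from(
            SYMBOL_FORMAT, symtab, i * SYMBOL_SIZE)
        yield symbol_name(name, string_table), section, value, storage_class
        i += 1 + n_aux  # step over the auxiliary entries that follow


long_name = b"_a_rather_long_name\0"
strtab = struct.pack("<I", 4 + len(long_name)) + long_name
symtab = (struct.pack(SYMBOL_FORMAT, b"_main", 0, 1, 0x20, IMAGE_SYM_CLASS_EXTERNAL, 1)
          + bytes(SYMBOL_SIZE)  # one auxiliary entry for the function, left blank
          + struct.pack(SYMBOL_FORMAT, struct.pack("<II", 0, 4), 8, 2, 0,
                        IMAGE_SYM_CLASS_STATIC, 0))
for sym in read_symbols(symtab, 3, strtab):
    print(sym)
```

The type value 0x20 on `_main` is the value Microsoft tools use to mark a function.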

29 THE STRING TABLE As mentioned, this corresponds to our ID_S. It starts with 4 bytes specifying its length (a count that includes those 4 bytes). These are followed by null-terminated strings, generally representing names.
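
Building and reading such a table takes only a few lines. This is a Python sketch with hypothetical helper names, assuming ASCII names and the convention that the 4-byte length counts itself, so the first string sits at offset 4; those offsets are what the section-header and symbol entries store.

```python
import struct


def build_string_table(names):
    """Return (table_bytes, {name: offset}); offsets count from the table start."""
    body = bytearray()
    offsets = {}
    for name in names:
        offsets[name] = 4 + len(body)           # skip the 4-byte length field
        body += name.encode("ascii") + b"\0"    # null-terminated string
    return struct.pack("<I", 4 + len(body)) + bytes(body), offsets


def lookup(table: bytes, offset: int) -> str:
    """Read the null-terminated name that starts at offset."""
    return table[offset:table.index(b"\0", offset)].decode("ascii")


table, offsets = build_string_table(["initialize_symbol_table", "emit_relocations"])
print(offsets)  # {'initialize_symbol_table': 4, 'emit_relocations': 28}
```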

30 Note that the segdef, pubdef, and extdef records we have been using are replaced by entries in the symbol table and the string table.
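
Roughly speaking, the role of a SEGDEF is taken over by a section header, while PUBDEF and EXTDEF information is carried by symbol-table entries. The hypothetical Python helper below classifies a symbol entry accordingly. It assumes the specification's conventions that an external symbol defined in this module has a positive section number, and that an undefined external has section number 0 (IMAGE_SYM_UNDEFINED). Undefined externals with a nonzero value, which the specification uses for common data, are not treated specially in this sketch.

```python
IMAGE_SYM_UNDEFINED = 0        # section number of a symbol not defined here
IMAGE_SYM_CLASS_EXTERNAL = 2


def omf_record_for(section_number: int, storage_class: int):
    """Name the OMF record our Project 4 would have emitted for this symbol, if any."""
    if storage_class != IMAGE_SYM_CLASS_EXTERNAL:
        return None          # local (e.g. static) symbols have no counterpart here
    if section_number == IMAGE_SYM_UNDEFINED:
        return "EXTDEF"      # referenced here, defined in some other module
    return "PUBDEF"          # defined in this module and visible to others


print(omf_record_for(1, IMAGE_SYM_CLASS_EXTERNAL))  # PUBDEF
print(omf_record_for(0, IMAGE_SYM_CLASS_EXTERNAL))  # EXTDEF
```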

31 THE PE MODULE FORMAT As mentioned, the compiler writer, in the case where the target is not an intermediate language, is concerned with producing the object module that is input to the linker. He or she is not directly involved with the PE module that the linker produces. Let us, however, look at the main features of the PE format.

32 Here is a diagram of its structure

33 The components the linker has added to the COFF format are: (a) the DOS stub, (b) the optional file header, and (c) the data directories.

34 THE DOS STUB The purpose of the DOS stub is to detect when an attempt is made to execute the program under DOS, and then issue an error message such as: This program can only be run under Windows
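
In practice the stub is a small MS-DOS program (the one Microsoft's linker supplies by default prints "This program cannot be run in DOS mode."). Windows skips over it by reading a 4-byte file offset stored at position 0x3C of the file, which points at the "PE\0\0" signature that precedes the COFF file header. Below is a Python sketch of that step, assuming those offsets from the Microsoft specification; the function name is hypothetical.

```python
import struct


def find_pe_header(image: bytes) -> int:
    """Return the file offset of the 'PE\\0\\0' signature that follows the DOS stub."""
    if image[:2] != b"MZ":
        raise ValueError("missing the 'MZ' signature of an MS-DOS executable")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("the DOS stub does not point at a PE signature")
    return pe_offset  # the 20-byte COFF file header starts at pe_offset + 4


# A fake 0x80-byte file: 'MZ' stub header, pointer at 0x3C, signature at 0x40.
fake = bytearray(0x80)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\0\0"
print(hex(find_pe_header(bytes(fake))))  # 0x40
```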

35 THE OPTIONAL FILE HEADER The loader needs to be able to relocate the program when it is unable to load it at the base address assumed by the linker. Some of the items listed on the next slide are included for this purpose.

36 The information the optional file header contains includes: (a) the amount of memory space that will be occupied by executable code, initialized data, and uninitialized data; (b) the offsets from the beginning of the program at which the above items will be located in memory; (c) the offset from the beginning of the program of its entry point;

37 (d) the amount of space needed for the stack; (e) the amount of space needed for the heap; (f) the alignment of the sections: the default is an address divisible by 512, but any power of 2 from 512 up to 64K can be used; (g) the offsets within the module of the data directories, and their sizes.
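
The alignment rule in (f) amounts to rounding each section's starting offset up to the next multiple of the alignment, which is cheap when the alignment is a power of 2. This is a Python sketch with hypothetical helper names; it assumes the 512-byte to 64K range quoted above and simply lays the sections out one after another.

```python
def is_valid_alignment(alignment: int) -> bool:
    """A power of 2 between 512 and 64K inclusive."""
    return 512 <= alignment <= 0x10000 and (alignment & (alignment - 1)) == 0


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (which must be a power of 2)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_sections(start: int, sizes, alignment: int = 512):
    """Return the aligned starting offset of each section, placed in order."""
    if not is_valid_alignment(alignment):
        raise ValueError(f"bad alignment {alignment}")
    offsets, pos = [], align_up(start, alignment)
    for size in sizes:
        offsets.append(pos)
        pos = align_up(pos + size, alignment)
    return offsets


# Headers end at offset 600; code is 1300 bytes, initialized data 200 bytes.
print(layout_sections(600, [1300, 200]))  # [1024, 2560]
```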

38 THE DATA DIRECTORIES These include: (a) the Export Table (b) the Import Table (c) the Resource Table (d) the Base Relocation Table

39 The Export Table is employed mainly by DLLs to supply the entry points of the various functions they provide. The Import Table is used by programs to supply the external references that the linker was unable to resolve, usually those to DLL functions. Note that the location of the DLL functions may change from one load and execution of the program to the next.

40 The unresolved calls in the memory image to such external routines are not fixed up directly. Instead, the linker turns them into calls through a table of external addresses, which the loader fills in. The Pentium has an indirect call instruction for this purpose.
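
The following Python sketch models that arrangement, with ordinary Python callables standing in for machine addresses. The loader looks each imported name up in the DLL's export table for this particular run and fills one slot of the table per name; the linker-generated code then calls through a slot rather than to a fixed address. All names here (`load_imports`, `call_indirect`, the sample exports) are hypothetical; in Microsoft's terminology the table the loader fills is the Import Address Table.

```python
from typing import Callable, Dict, List


def load_imports(import_names: List[str],
                 dll_exports: Dict[str, Callable]) -> List[Callable]:
    """Loader: build the table of external addresses, one slot per imported name."""
    table = []
    for name in import_names:
        if name not in dll_exports:
            raise LookupError(f"unresolved import: {name}")
        table.append(dll_exports[name])  # the address the DLL exports on this load
    return table


def call_indirect(table: List[Callable], slot: int, *args):
    """What the patched call does: fetch the target from the table, then call it
    (on the x86, an instruction such as call dword ptr [table + 4*slot])."""
    return table[slot](*args)


# A hypothetical DLL export table; its addresses could differ on every load.
exports = {"negate": lambda x: -x, "add": lambda a, b: a + b}
iat = load_imports(["add", "negate"], exports)
print(call_indirect(iat, 0, 2, 3))  # 5
```

Because only the table depends on where the DLL ended up, the code that calls through it never needs patching at load time.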

41 The Resource Table contains information about the resources the program employs, such as dialog boxes, menus, icons, etc. The Base Relocation Table replaces the COFF relocation entries, as much of the relocation and linking involved has already been carried out by the linker.
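
When the loader cannot place the image at the linker's preferred base, it adds the difference (delta) between the actual and preferred base to every absolute address listed in the Base Relocation Table. Below is a Python sketch assuming the block layout from the Microsoft specification: each block starts with a 4-byte page RVA and a 4-byte block size, followed by 2-byte entries whose top 4 bits give the relocation type and whose low 12 bits give the offset within that page. Only the ABSOLUTE (padding), HIGHLOW (32-bit) and DIR64 (64-bit) types are handled, and the function name is hypothetical.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry: nothing to do
IMAGE_REL_BASED_HIGHLOW = 3    # add the delta to a 32-bit address
IMAGE_REL_BASED_DIR64 = 10     # add the delta to a 64-bit address


def apply_base_relocations(image: bytearray, reloc_table: bytes, delta: int) -> None:
    """Patch image in place; delta = actual load address - linker's preferred base."""
    pos = 0
    while pos + 8 <= len(reloc_table):
        page_rva, block_size = struct.unpack_from("<II", reloc_table, pos)
        if block_size < 8:
            break  # malformed block; stop rather than loop forever
        entries = reloc_table[pos + 8:pos + block_size]
        for (entry,) in struct.iter_unpack("<H", entries):
            kind, where = entry >> 12, page_rva + (entry & 0x0FFF)
            if kind == IMAGE_REL_BASED_ABSOLUTE:
                continue
            if kind == IMAGE_REL_BASED_HIGHLOW:
                fmt, mask = "<I", 0xFFFFFFFF
            elif kind == IMAGE_REL_BASED_DIR64:
                fmt, mask = "<Q", 0xFFFFFFFFFFFFFFFF
            else:
                raise NotImplementedError(f"base relocation type {kind}")
            (old,) = struct.unpack_from(fmt, image, where)
            struct.pack_into(fmt, image, where, (old + delta) & mask)
        pos += block_size


# Image linked for base 0x400000 holds the absolute address 0x401234 at RVA 0x1010;
# the loader had to place it at 0x500000 instead.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1010, 0x401234)
block = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0x010, 0)
apply_base_relocations(image, block, 0x500000 - 0x400000)
print(hex(struct.unpack_from("<I", image, 0x1010)[0]))  # 0x501234
```

Unlike the COFF relocation entries, these entries name no symbols: by this point the linker has already resolved them, and only the load address can still change.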

42 SOURCES 1. Microsoft Portable Executable and Common Object File Format Specification, Revision 8.2, September 2010. 2. Texas Instruments, Application Report SPRAA08, April 2009.
