MIPT-Intel Student Laboratory: Assembler, Linker, MIPS Simulator. Andrey Rodchenko
2 Assembly Language
machine code
- encoded instruction set
HW-dependent low-level programming language
- mnemonic opcode
- labels and symbols
- addresses and constants
- macros
- sequence of instructions
- data sections, assembler directives
HW-independent high-level programming language
- advanced control structures
- function declarations and invocations
- abstract data types
- OOP
3 Assemblers Variety And Usage
multi-target / single-target
- multi: GAS: i386, x86-64, PowerPC, ARC, ARM, VAX
- single: NASM: i386, x86-64
usage
- direct interaction with the hardware (device drivers, interrupt handlers, compilers, OSs)
- specific instructions not implemented in a compiler
- inline assembler in high-level languages
- self-modifying code
- boot loaders, viruses, etc.
4 MIPS assembler basic structures
directives
- .data - beginning of data segment
- .text - beginning of code segment
labels
- main:
comments
- # comment
instructions and pseudo-instructions
- lw $t0, item
5 MIPS assembler syntax
directives
- .align n
- .ascii str
- .asciiz str
- .byte b1,..., bn
- .half h1,..., hn
- .word w1,..., wn
- .data
- .extern sym size
- .float f1,..., fn
- .double d1,..., dn
- .globl sym
- .kdata
- .ktext
- .space n
- .text
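To make the data directives concrete, here is a minimal Python sketch of how an assembler might lay out a data segment from a few of them. The function name layout_data, the (name, args) input format and the little-endian byte order are assumptions made for this illustration. Real MIPS assemblers may also align .half and .word data automatically, which this sketch does not do.

```python
import struct

def layout_data(directives):
    """Lay out a data segment from (name, args) pairs and return its bytes.

    Hypothetical helper: little-endian order is assumed here, although
    MIPS hardware can be configured for either byte order.
    """
    out = bytearray()
    for name, args in directives:
        if name == ".align":            # pad to a 2**n byte boundary
            size = 1 << args[0]
            out.extend(b"\0" * (-len(out) % size))
        elif name == ".ascii":          # string without terminator
            out.extend(args[0].encode("ascii"))
        elif name == ".asciiz":         # string with a trailing zero byte
            out.extend(args[0].encode("ascii") + b"\0")
        elif name == ".byte":
            out.extend(struct.pack("<%dB" % len(args), *args))
        elif name == ".half":           # 16-bit values
            out.extend(struct.pack("<%dH" % len(args), *args))
        elif name == ".word":           # 32-bit values
            out.extend(struct.pack("<%dI" % len(args), *args))
        elif name == ".space":          # n zeroed bytes
            out.extend(b"\0" * args[0])
        else:
            raise ValueError("unsupported directive: " + name)
    return bytes(out)

segment = layout_data([
    (".asciiz", ["hi"]),   # 3 bytes: 'h', 'i', 0
    (".align", [2]),       # pad to a 4-byte boundary
    (".word", [1, 2]),     # two 32-bit words
    (".space", [4]),       # four zero bytes
])
```

The explicit .align before .word shows why the directive exists: the string leaves the location counter at an odd offset, and words are expected on 4-byte boundaries.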
6 Assembler phases
1st pass
– pseudo-instruction replacement
  sd $a0, 32($sp) => sw $a0, 32($sp)
                  => sw $a1, 36($sp)
– symbol table creation
  LABEL    ADDRESS
  lbl_u1   0x10
– machine code generation and relocation creation
  j lbl_u1 => 0x4: (lbl_u1_relocation:26b)
  ADDR   LABEL    TYPE
  0x4    lbl_u1   J displacement
2nd pass
– complete address-related machine code using relocations
  j lbl_u1 => 0x4: ( )
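The two passes can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (expand, first_pass, second_pass), the tiny register table and the sample program are hypothetical, only sw, sd and j are handled, and the branch delay slot is ignored. The sample program is arranged so that the jump sits at 0x4 and lbl_u1 at 0x10, as on the slide.

```python
import re

REGS = {"$a0": 4, "$a1": 5, "$sp": 29}   # only the registers this sketch needs
SW = re.compile(r"sw (\$\w+), (\d+)\((\$\w+)\)")
SD = re.compile(r"sd \$a(\d), (\d+)\((\$\w+)\)")

def expand(line):
    """Pseudo-instruction replacement: sd $aN, off(base) becomes two sw."""
    m = SD.fullmatch(line)
    if not m:
        return [line]
    n, off, base = int(m.group(1)), int(m.group(2)), m.group(3)
    return ["sw $a%d, %d(%s)" % (n, off, base),
            "sw $a%d, %d(%s)" % (n + 1, off + 4, base)]

def first_pass(source):
    """Build the symbol table, emit code, record relocations for jumps."""
    symbols, relocs, code = {}, [], []
    for raw in source:
        for line in expand(raw.strip()):
            addr = 4 * len(code)
            if line.endswith(":"):                 # label definition
                symbols[line[:-1]] = addr
            elif line.startswith("j "):            # target not known yet
                relocs.append((addr, line[2:], "J"))
                code.append(0x02 << 26)            # opcode only, 26-bit field empty
            else:
                m = SW.fullmatch(line)
                if not m:
                    raise ValueError("unsupported instruction: " + line)
                rt, off, base = REGS[m.group(1)], int(m.group(2)), REGS[m.group(3)]
                code.append((0x2B << 26) | (base << 21) | (rt << 16) | off)
    return symbols, relocs, code

def second_pass(symbols, relocs, code):
    """Fill in the address fields recorded as relocations."""
    for addr, label, _kind in relocs:
        code[addr // 4] |= (symbols[label] >> 2) & 0x03FFFFFF
    return code

program = ["sw $a0, 0($sp)", "j lbl_u1", "sd $a0, 32($sp)",
           "lbl_u1:", "sw $a1, 4($sp)"]
symbols, relocs, code = first_pass(program)
machine_code = second_pass(symbols, relocs, list(code))
```

After the first pass the jump word holds only its opcode. The second pass ORs in the word address of lbl_u1 taken from the symbol table.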
7 Object File
contains relocatable format machine code
– header (descriptive and control information)
– text segment (executable code)
– data segment (static initialized data)
– bss segment (uninitialized data)
– relocation information
– stack unwinding information
– program symbols
– debugging information
different formats
– ELF (executable and linkable format)
– COFF (common object file format)
– PE (portable executable)
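As an illustration of the "header" item, the following Python sketch decodes the first fields of an ELF header. The function name read_elf_ident and the hand-built sample bytes are assumptions for this example. The field offsets and constant values (ET_REL = 1, EM_MIPS = 8) follow the ELF specification.

```python
import struct

ET_NAMES = {1: "relocatable", 2: "executable", 3: "shared object"}

def read_elf_ident(header):
    """Decode class, byte order, file type and machine from an ELF header."""
    if header[:4] != b"\x7fELF":                 # magic number
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]             # EI_CLASS
    order = {1: "<", 2: ">"}[header[5]]          # EI_DATA: 1 = LSB, 2 = MSB
    # e_type and e_machine are two 16-bit fields right after e_ident[16]
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    return {
        "bits": bits,
        "big_endian": order == ">",
        "type": ET_NAMES.get(e_type, "other"),
        "machine": e_machine,
    }

# Hand-built first 20 bytes of a 32-bit big-endian relocatable MIPS object
# (a made-up sample, not taken from a real file).
sample = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 1, 8)
info = read_elf_ident(sample)
```

On a real object file the same function could be applied to the first 20 bytes read in binary mode.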
8 Linking
combines several object files into a single executable
– symbol resolution
– section creation
– relocation
types of linking
– static
– dynamic
linker script
– controls how sections are merged and where they are placed
9 Symbols resolution
symbol types
– definitions (D)
  strong – several strong definitions are not allowed (e.g. int i = 1;)
  weak – may be overridden by other symbol definitions (e.g. int i;)
– externals (U)
for each object file (o) from left to right, where 'd' and 'u' are the symbols it defines and leaves undefined
– if 'o' is not a library
  O = O ∪ 'o'    adds the object file to set O
  U = (U ∪ 'u') \ ('d' ∩ (U ∪ 'u'))    updates the undefined set U
  D = D ∪ 'd'    adds 'd' to the D set, checking that it has not been strongly defined
– if 'o' is a library
  if ('d' ∩ U) = 'e' != 0    the same as above with 'o' = 'e', 'u' = 0, 'd' = 'e'
when all arguments are passed
– if U == 0 then linking is successful
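The left-to-right scan above can be sketched in Python. The function resolve and its input format are hypothetical. Strong/weak handling is left out, so any duplicate definition is treated as an error. Two details deviate from the slide's formulas and are assumptions of this sketch: U is reduced by everything already in D (not only by the new 'd'), and the undefined symbols of a pulled-in library member are added to U, as real static linkers do.

```python
def resolve(inputs):
    """Scan inputs left to right and return (O, U).

    inputs holds ("obj", name, defs, undefs) or ("lib", name, members),
    where members is a list of (name, defs, undefs) and defs/undefs are sets.
    """
    O, U, D = [], set(), set()

    def add_object(name, defs, undefs):
        dup = D & defs
        if dup:                          # no weak symbols in this sketch
            raise ValueError("multiple definition: %s" % sorted(dup))
        O.append(name)                   # O = O U 'o'
        D.update(defs)                   # D = D U 'd'
        U.update(undefs)                 # U = U U 'u' ...
        U.difference_update(D)           # ... minus everything now defined

    for item in inputs:
        if item[0] == "obj":
            add_object(*item[1:])
        else:
            # Pull in only members that define a currently undefined symbol;
            # repeat because a member may itself need another member.
            changed = True
            while changed:
                changed = False
                for name, defs, undefs in item[2]:
                    if name not in O and defs & U:
                        add_object(name, defs, undefs)
                        changed = True
    return O, U

main_o = ("obj", "main.o", {"main"}, {"helper"})
libx = ("lib", "libx.a", [
    ("a.o", {"helper"}, {"util"}),
    ("b.o", {"util"}, set()),
    ("c.o", {"unused"}, set()),
])

ok_objects, ok_undefined = resolve([main_o, libx])      # object first
bad_objects, bad_undefined = resolve([libx, main_o])    # library first
```

The second call shows why argument order matters: the library is scanned while U is still empty, so nothing is taken from it and helper stays undefined.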
10 Relocation And Dynamic Linking
relocating sections and symbol definitions
– all sections with the same name are merged
relocating symbol references
– all references are updated with the addresses of objects in merged sections
dynamic linker (loaded as a shared library itself)
– dynamic code must be position-independent
– start-up code, mapping shared libraries into the program's address space
– lazy linking can improve overall application performance if 'potentially unnecessary' references are numerous
– loads library code segments into memory on demand
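The two static relocation steps (merging same-named sections, then updating references) can be sketched in Python. Everything here is a simplified assumption: the link function, the dictionary layout of an object file, the base address 0x400000 and the single relocation kind (a 4-byte little-endian absolute address) are made up for illustration and do not correspond to a real object format.

```python
def link(objects, base=0x400000):
    """Merge same-named sections, place them, then patch symbol references."""
    merged = {}     # section name -> bytearray of merged contents
    symbols = {}    # symbol -> (section, offset inside the merged section)
    pending = []    # (section, offset inside the merged section, symbol)
    for obj in objects:
        for sec, data in obj["sections"].items():
            start = len(merged.setdefault(sec, bytearray()))
            merged[sec] += data
            # Relocate symbol definitions by the section's new start offset.
            for sym, (s, off) in obj["symbols"].items():
                if s == sec:
                    symbols[sym] = (sec, start + off)
            for s, off, sym in obj["relocs"]:
                if s == sec:
                    pending.append((sec, start + off, sym))
    # Place the merged sections one after another starting at base.
    addr, section_addr = base, {}
    for sec in merged:
        section_addr[sec] = addr
        addr += len(merged[sec])
    # Relocate symbol references: write the final absolute address.
    for sec, off, sym in pending:
        tsec, toff = symbols[sym]
        target = section_addr[tsec] + toff
        merged[sec][off:off + 4] = target.to_bytes(4, "little")
    return merged, section_addr

a = {"sections": {".text": bytes(8), ".data": bytes(4)},
     "symbols": {"main": (".text", 0)},
     "relocs": [(".text", 4, "counter")]}
b = {"sections": {".text": bytes(4), ".data": bytes(4)},
     "symbols": {"counter": (".data", 0)},
     "relocs": []}

merged, section_addr = link([a, b])
patched = int.from_bytes(merged[".text"][4:8], "little")
```

Here counter sits at offset 0 of the second object's .data, which becomes offset 4 of the merged .data section, so the reference in .text is patched with the address of the merged .data plus 4.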