Deferred segment-loading

An exercise on implementing the concept of ‘load-on-demand’ for the program-segments in an ELF executable file

Background

Recall our previous in-class exercise: we wrote a demo-program that could execute a Linux application (named ‘hello’).
A working version of that demo is now on our class website (named ‘tryexec.s’).
That demo simulated ‘loading’ of the .text and .data program-segments by copying the ‘hello’ file’s memory-image into two distinct locations in extended memory.

Memory-to-memory copying

We used the Pentium’s ‘movsb’ instruction to perform those two copying operations.
The number of bytes we copied was equal to the size of five disk-sectors (5 * 512 = 2560).
To ‘load’ the ‘.text’ program-segment, we copied from 0x to 0x
To ‘load’ the ‘.data’ program-segment, we copied from 0x to 0x

Copying to extended memory

The ‘movsb’ instruction is an example of a ‘complex’ instruction: it requires setup of several CPU registers prior to its execution.
Setup required for ‘movsb’ involves:
  - Set up DS:ESI to address the source buffer
  - Set up ES:EDI to address the destination buffer
  - Set up ECX with the number of bytes to copy
  - Clear the DF-bit in the EFLAGS register
Then ‘rep movsb’ performs the string-copying.
Note that 32-bit addressing is required here!

Example assembly code

; Source-statements to ‘load’ the ‘.text’ program-segment:
        USE32                   ; assemble for 32-bit code-seg
        mov     ax, #sel_fs     ; selector for 4GB data-segment
        mov     ds, ax          ; with base-address=0x
        mov     es, ax          ; is used for both DS and ES
        mov     esi, #0x        ; offset-address for ‘source’
        mov     edi, #0x        ; offset-address for ‘dest’n’
        mov     ecx, #2560      ; number of bytes to be copied
        cld                     ; use ‘forward’ string-copying
        rep                     ; ‘repeat-prefix’ is inserted
        movsb                   ; before the ‘movsb’ opcode

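For readers who find C easier to follow, the effect of ‘cld’ followed by ‘rep movsb’ can be modelled roughly as below. This is only a sketch: the ‘string_regs’ structure and the ‘rep_movsb_forward’ function are hypothetical names, and memory is treated as one flat array rather than as real DS and ES segments.

```c
#include <stdint.h>

/* Hypothetical model of the registers that 'rep movsb' uses. */
struct string_regs {
    uint32_t esi;   /* source offset (DS:ESI) */
    uint32_t edi;   /* destination offset (ES:EDI) */
    uint32_t ecx;   /* number of bytes still to copy */
};

/* Models 'cld' followed by 'rep movsb' on a flat memory array:
   one byte is copied per iteration, both offsets move upward
   (DF=0), and the loop ends when ECX reaches zero. */
static void rep_movsb_forward(uint8_t *mem, struct string_regs *r)
{
    while (r->ecx != 0) {
        mem[r->edi] = mem[r->esi];
        r->esi++;
        r->edi++;
        r->ecx--;
    }
}
```

After the loop, ESI and EDI have each advanced by the original count and ECX is zero, which is also the register state the real instruction leaves behind.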
Segments were ‘preloaded’

In our ‘tryexec.s’ demo, the ‘.text’ and ‘.data’ segments were initialized in advance of transferring control to the ‘hello’ program.
That technique is called ‘preloading’.
But the Pentium supports an alternative approach to program-loading, called ‘load-on-demand’.
Segments remain ‘uninitialized’ until they are actually accessed by the application.

Segment-Not-Present

The ‘Segment-Not-Present’ exception can be utilized to implement ‘demand-loading’.
Segment-descriptors are initially marked as ‘Not Present’ (i.e., the P-bit is zero).
When any instruction attempts to access these memory-segments (by moving the segment-selector into a segment-register), the CPU will generate an interrupt (int-11, i.e., exception 0x0B).

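The P-bit is bit 7 of a descriptor's access byte, which is bit 47 of the 8-byte descriptor. A minimal C sketch of testing and changing it follows; the helper names are hypothetical, and the descriptor is treated as one 64-bit value.

```c
#include <stdint.h>

/* The access byte occupies descriptor bits 40..47; P is its top bit. */
#define DESC_P_BIT (1ULL << 47)

static int desc_is_present(uint64_t d)
{
    return (d & DESC_P_BIT) != 0;
}

static uint64_t desc_mark_not_present(uint64_t d)
{
    return d & ~DESC_P_BIT;
}

static uint64_t desc_mark_present(uint64_t d)
{
    return d | DESC_P_BIT;
}

/* Extracts the access byte, e.g. 0x7A (P=0) or 0xFA (P=1). */
static uint8_t desc_access_byte(uint64_t d)
{
    return (uint8_t)(d >> 40);
}
```

For example, the descriptor value 0x00CF7A000000FFFF has access byte 0x7A, so it is ‘Not Present’; setting the P-bit turns the access byte into 0xFA.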
The Fault-Handler

The interrupt service routine for INT-0x0B (Segment-Not-Present Fault) can perform the initialization of the specified memory region (i.e., the ‘loading’ operation), mark the segment-descriptor as ‘Present’, and then ‘retry’ the instruction that triggered the fault (by executing an ‘iret’ or ‘iretd’).

Error-Code Format

        bits 31-16      reserved
        bits 15-3       table-index
        bit 2           TI
        bit 1           IDT
        bit 0           EXT

Legend:
EXT = An external event caused the exception (1=yes, 0=no)
IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
TI = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D.

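A fault-handler typically begins by picking this error-code apart. The C sketch below assumes the standard Intel layout (EXT in bit 0, IDT in bit 1, TI in bit 2, table-index in bits 3 through 15, with TI=1 meaning the LDT); the structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded error-code fields. */
struct selector_error {
    unsigned ext;    /* 1 = an external event caused the exception */
    unsigned idt;    /* 1 = table-index refers to the IDT */
    unsigned ti;     /* used when idt == 0: 1 = LDT, 0 = GDT */
    unsigned index;  /* entry number within the selected table */
};

static struct selector_error decode_error_code(uint32_t code)
{
    struct selector_error e;
    e.ext   = code & 1;
    e.idt   = (code >> 1) & 1;
    e.ti    = (code >> 2) & 1;
    e.index = (code >> 3) & 0x1FFF;   /* 13-bit table-index */
    return e;
}
```

An error-code of 0x000C, for instance, decodes to entry 1 of the LDT, not caused by an external event.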
Benefits of deferred loading?

With a small program (like ‘hello’) we might not see much benefit from the ‘load-on-demand’ mechanism, since both of the program-segments would sooner or later have to be ‘loaded’ into memory.
The only apparent benefit is that the copying can be done by ONE program-fragment (within the fault-handler) instead of by two fragments in the ‘pre-load’ procedure.

Table-driven ‘handler’

Balanced against the fewer instructions required with ‘load-on-demand’ is the need to provide a table-driven interrupt-handler that can ‘load’ whichever ‘not present’ program-segments happen to get accessed.
A very simple implementation for such a handler could use a table like this one:
memmap: ;       from      to      count   type
        .LONG   0x11800,  0x ,    2560,   0xFA
        .LONG   0x11800,  0x ,    2560,   0xF2

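The logic of such a table-driven handler can be sketched in C as a simulation, not as real ring0 code. Everything here is an assumption made for illustration: the names ‘memmap_entry’ and ‘load_on_demand’, the flat ‘mem’ array standing in for physical memory, and the idea that the ‘type’ field is written into the descriptor's access byte to mark the segment ‘Present’.

```c
#include <stdint.h>
#include <string.h>

/* One table row: four longwords, in the same order as 'memmap'. */
struct memmap_entry {
    uint32_t from;    /* source offset of the segment image */
    uint32_t to;      /* destination offset */
    uint32_t count;   /* number of bytes to copy */
    uint32_t type;    /* access byte to install (has P=1) */
};

/* Simulated fault-handler body. Returns 0 on success, or -1 if the
   error-code does not name an LDT entry covered by the table. */
static int load_on_demand(uint8_t *mem, uint64_t *ldt,
                          const struct memmap_entry *map,
                          unsigned nentries, uint32_t error_code)
{
    unsigned idt   = (error_code >> 1) & 1;
    unsigned ti    = (error_code >> 2) & 1;
    unsigned index = (error_code >> 3) & 0x1FFF;

    if (idt || !ti || index >= nentries)
        return -1;

    const struct memmap_entry *e = &map[index];

    /* The 'loading' operation: a memory-to-memory copy. */
    memmove(mem + e->to, mem + e->from, e->count);

    /* Replace the access byte (descriptor bits 40..47) with 'type'. */
    ldt[index] = (ldt[index] & ~(0xFFULL << 40))
               | ((uint64_t)(e->type & 0xFF) << 40);
    return 0;
}
```

The same entry number selects both the LDT descriptor and the table row, which is why one handler can serve every ‘not present’ segment.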
Big/Complex programs

With complex applications that use many more program-segments, ‘demand-loading’ could potentially offer some runtime efficiencies.
For example, consider interactive programs that can display various error-messages: if the error-handling routines are in separate program-segments, then those segments would not need to be loaded unless, and until, the error-condition actually occurs (maybe never).

In-class exercise

To get practical ‘hands on’ experience with implementing the demand-loading concept, we propose the following exercise:
Modify the ‘tryexec.s’ demo (see website) by deferring the memory-to-memory copy operations until the program-segments are actually referenced by the ‘hello’ program.
Then perform the copying within an ISR.

Some exercise details

Copy the ‘tryexec.s’ demo-program to a new file, named ‘ondemand.s’.
In the ‘load_and_exec_demo’ procedure, comment out the two memory-to-memory copy operations, and then mark the LDT segment-descriptors for .text and .data as ‘NOT PRESENT’ segments (i.e., P=0).
Create a ‘memmap’ table that describes the copying operations that will be needed.

Create a fault-handler

Add an interrupt-gate for exception 0x0B and a fault-handler that will perform the copy-operation for a ‘not-present’ segment.
Remember that the CPU will automatically push an error-code onto the ring0 stack if a ‘segment-not-present’ exception occurs.
Don’t forget to discard that error-code as the final step before exiting from the ISR:
        add     esp, #4         ; discard error-code
        iretd                   ; retry the instruction

Parallel table-entries

theLDT (entries of 4 words each, at offsets 0x00, 0x08, 0x10):
        0x00CF7A000000FFFF
        0x00CF FFFF

memmap (entries of 4 longwords each, at offsets 0x00, 0x10, 0x20):
        From      To    Size   Type
        0x11800   0x    2560   0xFA
        0x11800   0x    2560   0xF2
        0         0     0      0xF2
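Because each LDT entry is 8 bytes (4 words) and each memmap entry is 16 bytes (4 longwords), the byte-offset into ‘memmap’ is simply twice the byte-offset into the LDT. One way a handler might derive both offsets from the error-code is sketched below in C; the function names are hypothetical, and the mask assumes the table-index sits in bits 3 through 15 of the error-code.

```c
#include <stdint.h>

/* Clearing the low three flag bits (EXT, IDT, TI) leaves index * 8,
   which is the byte-offset of the descriptor within the LDT. */
static uint32_t ldt_offset_from_error(uint32_t error_code)
{
    return error_code & 0xFFF8;
}

/* Each memmap entry is twice the size of a descriptor, so the
   matching entry lies at twice the LDT byte-offset. */
static uint32_t memmap_offset_from_error(uint32_t error_code)
{
    return ldt_offset_from_error(error_code) * 2;
}
```

With an error-code of 0x000C, the descriptor is at LDT offset 0x08 and its copying instructions are at memmap offset 0x10.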