Deferred segment-loading
An exercise on implementing the concept of ‘load-on-demand’

The ‘do-it-later’ philosophy
Modern operating systems often follow a policy of deferring work whenever possible. The advantage of this practice is most evident in cases where it turns out that the work was not needed after all. Example: many programs contain lots of code and data for diagnosing errors, but none of it is needed if no errors actually occur.

Avoiding wasted effort
Thus it is more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which are unnecessary in the majority of situations. But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent.

Another example
In a multitasking environment, many tasks take turns at executing instructions. The CPU typically performs task-switching several times every second, and must do a ‘save’ of the outgoing task’s context and a ‘load’ of the incoming task’s context any time it switches from one task to the next. We ask: can any of this work be deferred?

The NPX registers
Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it is wasteful to do a ‘save-and-reload’ of these registers with every task-switch. The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work.

Example: effect of TS=1
Each time the CPU performs a task-switch it automatically sets the TS-bit to 1 (only an OS can execute a ‘clts’ to reset TS=0). When any task tries to execute any of the NPX instructions (to do some arithmetic with values in the floating-point registers), an exception-7 fault will occur if the TS-bit hasn’t been cleared since the last task-switch.

The fault-7 exception-handler
The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with the values that the active task expects to work on, can be deferred to the fault-handler for exception-7. The handler can then clear the TS-bit (with ‘clts’) and ‘retry’ the instruction that caused this ‘fault’.
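The lazy policy described above can be modeled in a few lines of C. This is only a user-space sketch, not kernel code: the variable `ts_bit` stands in for CR0 bit #3, and the names `task_switch`, `fault7_handler`, `fpu_store`, `fpu_load` and `swaps_so_far` are our own inventions for illustration.

```c
#include <stdbool.h>

#define NTASKS 4

/* Hypothetical model: one shared register file plus a save area per task. */
typedef struct { double st[8]; } fpu_state;

static fpu_state fpu;               /* the 'hardware' floating-point registers */
static fpu_state saved[NTASKS];     /* per-task save areas                     */
static bool ts_bit = false;         /* models the TS-bit (CR0 bit #3)          */
static int current_task = 0;        /* the task now executing                  */
static int fpu_owner = 0;           /* the task whose values are in 'fpu'      */
static int fpu_swaps = 0;           /* number of save-and-reloads performed    */

/* A task-switch only sets TS; the floating-point registers are not touched. */
void task_switch(int next) { current_task = next; ts_bit = true; }

/* Models the exception-7 handler: swap register contents only if needed. */
static void fault7_handler(void)
{
    if (fpu_owner != current_task) {
        saved[fpu_owner] = fpu;         /* save the former owner's values */
        fpu = saved[current_task];      /* reload the active task's values */
        fpu_owner = current_task;
        ++fpu_swaps;
    }
    ts_bit = false;                     /* corresponds to 'clts' */
}

/* Model NPX instructions: 'fault' first if TS is set, then proceed (retry). */
void fpu_store(int reg, double value)
{
    if (ts_bit) fault7_handler();
    fpu.st[reg] = value;
}

double fpu_load(int reg)
{
    if (ts_bit) fault7_handler();
    return fpu.st[reg];
}

int swaps_so_far(void) { return fpu_swaps; }
```

In this model any number of task-switches costs nothing for the floating-point registers until some other task actually executes an NPX instruction.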

The ‘fork()’ system-call
In a UNIX/Linux operating system, the way any new task gets created is by a call to the kernel’s ‘fork()’ service-function. This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., code, data, stack and heap, plus the kernel’s process-control data-structure). But much of this work is often wasted!

The ‘fork-and-exec’ scenario
In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program:
	if ( fork() == 0 ) execl( "newprog", newargs, (char *)0 );
In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, and so they will simply get discarded!
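For readers who want to try the pattern, here is a minimal sketch of a complete fork-and-exec helper. The function name `run_program` is hypothetical, and we use `execv()` rather than `execl()` so that the argument list can be passed as an array.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run 'path' with 'argv' in a child process.
   Returns the child's exit status, or -1 if anything went wrong. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) return -1;             /* the fork itself failed */
    if (pid == 0) {
        execv(path, argv);              /* discards the duplicated image */
        _exit(127);                     /* reached only if execv() failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything that ‘fork()’ copied for the child is thrown away at the `execv()` call, which is exactly the wasted effort that a demand-driven policy avoids.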

‘loading-on-demand’
An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy. For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the Pentium’s ‘Segment-Not-Present’ exception.

How it works
Segments remain ‘uninitialized’ until they are actually accessed by an application. Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero). When any instruction attempts to access such a memory-segment (read, write, or fetch), the CPU responds by generating exception-11: “Segment-Not-Present”.

An ‘error-code’ is pushed
Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’. The handler can then ‘load’ that segment with the proper information, set its descriptor’s P-bit, and retry the instruction.

Error-Code Format
	bits 31..16	reserved
	bits 15..3	table-index
	bit 2		TI
	bit 1		IDT
	bit 0		EXT
Legend:
	EXT = An external event caused the exception (1=yes, 0=no)
	IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
	TI  = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D.
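A handler written in C could pick the fields apart as sketched here. The type and function names are our own (hypothetical). We follow the Intel convention that TI=0 selects the GDT and TI=1 selects the LDT, the same meaning the TI bit has in a segment-selector.

```c
#include <stdint.h>

/* The fields of a selector-style error-code. */
typedef struct {
    unsigned ext;       /* bit 0: an external event caused the exception */
    unsigned idt;       /* bit 1: table-index refers to the IDT          */
    unsigned ti;        /* bit 2: 0=GDT, 1=LDT (meaningful when idt==0)  */
    unsigned index;     /* bits 15..3: index of the descriptor           */
} selector_error;

selector_error decode_error_code(uint32_t code)
{
    selector_error e;
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

/* Names the descriptor-table that the table-index refers to. */
const char *error_table_name(selector_error e)
{
    if (e.idt) return "IDT";
    return e.ti ? "LDT" : "GDT";
}
```

For example, an error-code of 0x0018 decodes as index 3 in the GDT, with no external event involved.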

Our ‘simulation’ demo
We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory:
	lcall	$sel_CS, $draw_message
The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11).

Our ‘fault-handler’
Our Interrupt-Service-Routine for fault-11 will do two things:
	1. Initialize the memory-region with code and data
	2. Mark the code-segment’s descriptor as ‘Present’
It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction.
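The fault-then-retry cycle can be imitated in ordinary C, which may help in seeing the control flow before working with the real exception. Everything in this sketch is hypothetical: a `segment` record plays the role of a descriptor, `fetch_byte` plays the role of the faulting instruction, and a 16-byte array stands in for the 64KB arena.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 16                   /* tiny, for illustration only */

typedef struct {
    bool present;                       /* models the descriptor's P-bit */
    unsigned char *base;                /* where the segment's bytes live */
} segment;

static const unsigned char program_image[ARENA_SIZE] = "draw_message";
static unsigned char high_arena[ARENA_SIZE];    /* 'uninitialized' at first */
static int faults_taken = 0;

/* Models the fault-11 handler: load the segment, then mark it 'Present'. */
static void not_present_handler(segment *seg)
{
    memcpy(high_arena, program_image, ARENA_SIZE);
    seg->base = high_arena;
    seg->present = true;
    ++faults_taken;
}

/* Models an access: if the segment is not present, 'fault' and then retry. */
unsigned char fetch_byte(segment *seg, size_t offset)
{
    while (!seg->present)
        not_present_handler(seg);
    return seg->base[offset];
}

int fault_count(void) { return faults_taken; }
```

Only the first access pays for the loading; every later access finds the segment already marked present.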

Where is the ‘error-code’?
Layout of our fault-handler’s stack (16-bit entries, because we used a 286 interrupt-gate):
	+6	FLAGS
	+4	CS
	+2	IP
	+0	error-code	<-- SS:SP
The Pentium provides a special pair of instructions that procedures can use to address any parameter-values that reside on the stack: ‘enter’ and ‘leave’.

Code using ‘enter’ and ‘leave’
isrNPF:	# Our fault-handler for exception-0x0B
	enter	$0, $0				# setup stackframe access
	call	initialize_the_high_arena
	call	mark_segment_as_ready
	leave					# discard the frame access
	add	$2, %sp				# discard the error-code
	iret					# ‘retry’ the faulting instruction

What does ‘enter’ do?
The effect of the single instruction
	enter	$0, $0
is equivalent to this instruction-sequence:
	push	%bp
	mov	%sp, %bp

How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘enter’ (16-bit entries):
	+6	FLAGS
	+4	CS
	+2	IP
	+0	error-code	<-- SS:SP
Layout of our fault-handler’s stack AFTER executing ‘enter’:
	+8	FLAGS
	+6	CS
	+4	IP
	+2	error-code
	+0	old-BP		<-- SS:BP
NOTE: Any memory-references that use indirect addressing via register BP will use the SS segment-register by default (not the DS segment-register), for example:
	testw	$0x0007, 2(%bp)
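The frame that BP points to after ‘enter’ can be described by a C structure whose member offsets match the displacements a handler would use. This is only a sketch for checking the offsets; the names `fault_frame16` and `low_flags_set` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The handler's stack as seen from SS:BP after 'enter $0, $0'
   (16-bit entries, as pushed through a 286 interrupt-gate). */
typedef struct {
    uint16_t old_bp;        /* +0 */
    uint16_t error_code;    /* +2 */
    uint16_t ip;            /* +4 */
    uint16_t cs;            /* +6 */
    uint16_t flags;         /* +8 */
} fault_frame16;

/* Mirrors 'testw $0x0007, 2(%bp)': nonzero if any of EXT, IDT, TI is set. */
int low_flags_set(const fault_frame16 *frame)
{
    return (frame->error_code & 0x0007) != 0;
}
```

An expression such as `offsetof(fault_frame16, error_code)` evaluates to 2, matching the displacement in the ‘testw’ example.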

What does ‘leave’ do?
The effect of the single instruction
	leave
is equivalent to this instruction-sequence:
	mov	%bp, %sp
	pop	%bp

How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘leave’ (16-bit entries):
	+8	FLAGS
	+6	CS
	+4	IP
	+2	error-code
	+0	old-BP		<-- SS:BP
		… other pushed words	<-- SS:SP
Layout of our fault-handler’s stack AFTER executing ‘leave’:
	+6	FLAGS
	+4	CS
	+2	IP
	+0	error-code	<-- SS:SP
So the effect of ‘leave’ is to undo the effect of ‘enter’.
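To convince ourselves that ‘leave’ really undoes ‘enter’, even when other words were pushed in between, we can step through both instruction-sequences on a toy 16-bit machine. The `cpu16` structure and the function names here are hypothetical, and the model covers only SP, BP and a small stack.

```c
#include <stdint.h>

/* A toy machine: just SP, BP and 32 words of stack (byte-offsets 0..63). */
typedef struct {
    uint16_t sp, bp;
    uint16_t stack[32];
} cpu16;

void push16(cpu16 *c, uint16_t value)
{
    c->sp -= 2;                         /* the stack grows downward */
    c->stack[c->sp / 2] = value;
}

uint16_t pop16(cpu16 *c)
{
    uint16_t value = c->stack[c->sp / 2];
    c->sp += 2;
    return value;
}

/* enter $0, $0  ==  push %bp ; mov %sp, %bp */
void do_enter(cpu16 *c) { push16(c, c->bp); c->bp = c->sp; }

/* leave  ==  mov %bp, %sp ; pop %bp */
void do_leave(cpu16 *c) { c->sp = c->bp; c->bp = pop16(c); }

/* Reads the word at disp(%bp), as in 'testw $0x0007, 2(%bp)'. */
uint16_t word_at_bp(const cpu16 *c, uint16_t disp)
{
    return c->stack[(c->bp + disp) / 2];
}
```

After `do_enter` the error-code is found at displacement 2 from BP, and after `do_leave` both SP and BP hold the values they had on entry to the handler.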

Our demo’s memory-layout
	0x00030000	ARENA #3 (not used by this demo)
	0x00020000	ARENA #2 (where our demo expects drawing code will reside)
	0x00010000	ARENA #1 (where the loader puts our program code and data)
	0x00007C00	BOOT_LOCN
	0x00000000
Copy contents of ARENA #1 to ARENA #2.

Efficient copying
We use the Pentium’s ‘rep movsw’ instruction to perform memory-to-memory copying operations. The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into register ES. The number of words we copy should match the size of our code-segment (which is 64KB, i.e., 0x8000 words). The Direction-Flag should be cleared (DF=0).

Example assembly code
	cld				# use ‘forward’ string-copying
	mov	$sel_ds, %si		# selector for arena at 0x10000
	mov	%si, %ds		# goes in segment-register DS
	xor	%si, %si		# start copying from offset zero
	mov	$sel_DS, %di		# selector for arena at 0x20000
	mov	%di, %es		# goes in segment-register ES
	xor	%di, %di		# start copying to offset zero
	mov	$0x8000, %cx		# number of words to be copied
	rep movsw			# perform the arena-copying
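For comparison, here is what the string-copy accomplishes, written as a plain C loop. This is a sketch of the effect only (the function name `copy_words_forward` is our own); it does not model segment-registers, and the two pointers stand in for DS:SI and ES:DI.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies 'count' 16-bit words in ascending address order, which is what
   'rep movsw' does when DF=0 and CX holds the word-count. */
void copy_words_forward(uint16_t *dst, const uint16_t *src, size_t count)
{
    while (count != 0) {
        *dst++ = *src++;        /* one 'movsw': copy a word, advance both */
        --count;                /* 'rep' decrements CX until it is zero   */
    }
}
```

Calling it with a count of 0x8000 moves 0x8000 × 2 = 0x10000 bytes, which is one complete 64KB arena.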

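Segment-Descriptor Format
	bits 63..56	Base[31..24]
	bit 55		G
	bit 54		D
	bit 53		RSV
	bit 52		AVL
	bits 51..48	Limit[19..16]
	bit 47		P
	bits 46..45	DPL
	bit 44		S
	bit 43		X
	bit 42		C/D
	bit 41		R/W
	bit 40		A
	bits 39..32	Base[23..16]
	bits 31..16	Base[15..0]
	bits 15..0	Limit[15..0]
The segment-descriptor’s ‘Present’ bit is bit-number 47.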
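If a descriptor is held in a 64-bit integer, the fields can be extracted with shifts and masks, as in this sketch. The function names are hypothetical, and the sample value mentioned afterward is one we constructed for illustration, not one taken from the demo.

```c
#include <stdint.h>

#define DESC_PRESENT (UINT64_C(1) << 47)    /* the P-bit is bit-number 47 */

int desc_is_present(uint64_t d) { return (d & DESC_PRESENT) != 0; }

uint64_t desc_mark_present(uint64_t d) { return d | DESC_PRESENT; }

/* Base[23..0] occupies bits 39..16, and Base[31..24] occupies bits 63..56. */
uint32_t desc_base(uint64_t d)
{
    uint32_t low  = (uint32_t)((d >> 16) & 0xFFFFFFu);
    uint32_t high = (uint32_t)((d >> 56) & 0xFFu);
    return low | (high << 24);
}

/* Limit[15..0] occupies bits 15..0, and Limit[19..16] occupies bits 51..48. */
uint32_t desc_limit(uint64_t d)
{
    uint32_t low  = (uint32_t)(d & 0xFFFFu);
    uint32_t high = (uint32_t)((d >> 48) & 0xFu);
    return low | (high << 16);
}
```

For instance, the value 0x00001A020000FFFF describes a not-present code-segment with base-address 0x00020000 and limit 0xFFFF; marking it present changes its access-byte from 0x1A to 0x9A.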

In-class exercise
To get some practical ‘hands on’ experience with implementing the demand-loading concept, we suggest the following exercise: modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack). You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment.
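As a starting point for the exercise, this sketch shows the stack-layout we would expect on entry through a 32-bit gate, assuming no privilege-level change occurs (so no SS:ESP pair is pushed). Each entry is a doubleword, so the displacements double. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The handler's stack as seen from SS:ESP on entry through a 32-bit gate,
   assuming the interrupted code ran at the handler's privilege-level. */
typedef struct {
    uint32_t error_code;    /* +0  */
    uint32_t eip;           /* +4  */
    uint32_t cs;            /* +8  (selector is in the low 16 bits) */
    uint32_t eflags;        /* +12 */
} fault_frame32;

/* Extracts the table-index (bits 15..3) from the frame's error-code. */
uint16_t frame32_table_index(const fault_frame32 *frame)
{
    return (uint16_t)((frame->error_code >> 3) & 0x1FFFu);
}
```

One consequence is that the handler must discard four bytes of error-code, not two, before it executes its 32-bit return instruction.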
