
1 Memory Management

2 Basic OS Organization
[Figure: the operating system (process, thread & resource manager; memory manager; device manager; file manager) runs on the computer hardware: processor(s), main memory, and devices.]

3 The Basic Memory Hierarchy
[Figure: the memory hierarchy of the von Neumann architecture. CPU registers sit at the top; more frequently used information is kept in primary memory (executable memory, e.g. RAM); less frequently used information is kept in secondary memory, e.g. disk or tape.]

4 Memory System
Primary memory: holds programs and data while they are being used by the CPU; referenced by byte; fast access; volatile.
Secondary memory: a collection of storage devices; referenced by block; slow access; nonvolatile.

5 Primary & Secondary Memory
[Figure: the CPU can load/store directly to primary memory (transient storage, e.g. RAM), and the control unit executes code from it; secondary memory (persistent storage, e.g. disk or tape) is accessed via I/O operations. Information can be loaded statically or dynamically.]

6 Classical Memory Manager Tasks
Memory management technology has evolved:
Early multiprogramming systems: a resource manager for space-multiplexed primary memory.
As the popularity of multiprogramming grew: robust isolation mechanisms.
Still later: mechanisms for shared memory.

7 Contemporary Memory Manager
Performs the classic functions required to manage primary memory.
Attempts to use primary memory efficiently: keep programs/data in primary memory only while they are being used by the CPU; store/restore data to secondary memory soon after it has been used or created.
Exploits storage hierarchies: the virtual memory manager.

8 Requirements on Memory Designs
The primary memory access time must be as small as possible.
The perceived primary memory must be as large as possible.
The memory system must be cost effective.

9 Functions of Memory Manager
Allocate primary memory space to processes.
Map the process address space into the allocated portion of primary memory.
Minimize access times using a cost-effective amount of primary memory.
May use static or dynamic techniques.

10 Memory Manager
Only a small number of interface functions is provided, usually calls to: request/release primary memory space, load programs, and share blocks of memory.
Provides the following: memory abstraction; allocation/deallocation of memory; memory isolation; memory sharing.

11 Memory Abstraction
Process address space: allows a process to use an abstract set of addresses to reference physical primary memory; addresses may also be mapped to objects other than memory.
[Figure: the process address space mapped onto hardware primary memory.]

12 Address Space
A program must be brought into memory and placed within a process for it to be executed; a program is a file on disk.
The CPU reads instructions from main memory and reads/writes data to main memory.
Address binding of instructions and data to memory addresses is determined by the computer architecture.

13 Creating an Executable Program
Compile time: translate elements (C source code is compiled into relocatable object code).
Link time: combine elements (the link editor merges the object code with library code and other objects into an absolute module in secondary memory).
Load time: allocate primary memory, adjust addresses in the address space (relocation), and copy the address space from secondary to primary memory; the loader produces the process address space in primary memory.

14 Bindings
Compiler: binds static variables to storage locations relative to the start of the data segment; binds automatic variables to storage locations relative to the bottom of the stack.
Linker: combines data segments and adjusts bindings accordingly; the same for the stack.

15 Bindings – cont.
Loader: binds the logical addresses used by the program to physical memory locations (address binding).
This type of binding is called static address binding.
The last stage of address binding can be deferred to runtime: dynamic address binding.

16 Dynamic Memory
Static and automatic variables are assigned addresses in the data or stack segments at compile time.
Dynamic memory allocation (e.g., new or malloc) is done at runtime.
This is not handled by the memory manager; it merely binds parts of the process’s address space to dynamic data structures.
The memory manager gets involved if the process runs out of address space.

17 Variations in program linking/loading

18 Normal linking and loading

19 Load-time dynamic linking

20 Run-time dynamic linking

21 Data Storage Allocation
Static variables: stored in the program’s data segment.
Automatic variables: stored on the stack.
Dynamically allocated space (new or malloc): taken from heap storage, with no system call.
Note: if the heap is exhausted, the kernel memory manager is invoked to get more memory for the process.

22 C Style Memory Layout
[Figure: from low address to high address: text segment; data segment (initialized part, then uninitialized part); heap storage; stack segment; environment variables, etc.]
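A small probe of this layout, assuming a typical Unix-like platform; exact addresses vary with ASLR, but the ordering usually matches the figure:

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* initialized part of the data segment */
int uninitialized_global;      /* uninitialized part (BSS) */

int main(void) {
    int  local = 0;                            /* stack segment */
    int *dynamic = malloc(sizeof *dynamic);    /* heap storage */

    printf("text  (main):          %p\n", (void *)main);
    printf("data  (initialized):   %p\n", (void *)&initialized_global);
    printf("data  (uninitialized): %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc):        %p\n", (void *)dynamic);
    printf("stack (local):         %p\n", (void *)&local);

    free(dynamic);
    return 0;
}
```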

23 Program and Process Address Spaces
[Figure: the absolute program address space is mapped into hardware primary memory; the user process address space spans the first 3 GB, and the supervisor process address space the remainder up to 4 GB.]

24 Overview of Memory Management Techniques
Memory allocation strategies: view the process address space and the primary memory as contiguous address spaces.
Paging and segmentation based techniques: view the process address space and the primary memory as a set of pages/segments; map an address in process space to a memory address.
Virtual memory: an extension of paging/segmentation based techniques; to run a program, only the current pages/segments need to be in primary memory.

25 Memory Allocation Strategies
There are two different levels of memory allocation.

26 Two levels of memory management

27 Memory Management System Calls
In UNIX, the system call is brk: it increases the amount of memory allocated to a process by moving the end of its data segment (the program break).

28 Malloc and New functions
malloc and new are user-level memory allocation functions, not system calls: they manage the heap within the process and invoke a system call such as brk only when the heap must grow.
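A minimal sketch contrasting the two levels on UNIX, using sbrk(0) to observe the program break; note that modern allocators may satisfy large requests with mmap instead, so the break does not always move:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);    /* current program break */
    char *p = malloc(4096);    /* user-level allocation from the heap */
    void *after = sbrk(0);     /* break moves only if malloc asked the kernel */

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);
    free(p);                   /* returned to malloc's free list, not the OS */
    return 0;
}
```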

29 Memory Management

30 Issues in a memory allocation algorithm
Memory layout/organization: how to divide the memory into blocks for allocation?
Fixed partition method: divide the memory once, before any bytes are allocated.
Variable partition method: divide it up as the memory is being allocated.
Memory allocation: select which piece of memory to allocate to a request.
Memory organization and memory allocation are closely related.
This is a very general problem; variations of it occur in many places, for example disk space management.

31 Static Memory Allocation
[Figure: primary memory holds the operating system and processes 0 through 3, with some regions in use and some unused.]
Issue: need a mechanism/policy for loading pi’s address space into primary memory.

32 Fixed-Partition Memory allocation
Statically divide the primary memory into fixed-size regions.
Regions can have different sizes or the same size.
A process/request can be allocated to any region that is large enough.

33 Fixed-Partition Memory allocation – cont.
Advantages: easy to implement; good when the sizes of memory requests are known.
Disadvantage: cannot handle variable-size requests effectively; a large block might be needed to satisfy a small request.
Internal fragmentation: the difference between the request and the allocated region size, i.e. space allocated to a process but not used. It can be significant if the requests vary considerably in size.

34 Fixed-Partition Memory Mechanism
[Figure: primary memory is divided into regions 0-3 of sizes N0-N3; process pi, needing ni units, is placed into a region whose size is at least ni.]

35 Which free block to allocate
How to satisfy a request of size n from a list of free blocks (see the sketch below):
First-fit: allocate the first hole that is big enough.
Next-fit: resume the search where the last one ended and choose the next block that is large enough.
Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size; produces the smallest leftover hole.
Worst-fit: allocate the largest hole; must also search the entire list; produces the largest leftover hole.
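A minimal first-fit sketch over a singly linked free list; this is simplified, as a real allocator would also split blocks, coalesce free neighbors, and keep its metadata inside the managed memory:

```c
#include <stddef.h>

struct block {
    size_t size;          /* bytes available in this free block */
    struct block *next;   /* next block on the free list */
};

/* Return the first block with size >= n, unlinking it from the list;
 * NULL if no block is large enough. Best-fit would instead scan the
 * whole list for the smallest adequate block, worst-fit for the
 * largest, and next-fit would resume from the previous stop point. */
struct block *first_fit(struct block **head, size_t n) {
    struct block **prev = head;
    for (struct block *b = *head; b != NULL; prev = &b->next, b = b->next) {
        if (b->size >= n) {
            *prev = b->next;    /* unlink the chosen block */
            return b;
        }
    }
    return NULL;                /* no hole big enough: allocation fails */
}
```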

36 Fixed-Partition Memory -- Best-Fit
[Figure: pi is placed in the smallest region with Nk ≥ ni, leaving internal fragmentation at the end of the region. The loader must adjust every address in the absolute module when it is placed in memory.]

37 Fixed-Partition Memory -- Worst-Fit
[Figure: pi is placed in the largest available region.]

38 Fixed-Partition Memory -- First-Fit
[Figure: pi is placed in the first region large enough to hold it.]

39 Fixed-Partition Memory -- Next-Fit
[Figure: the search resumes where the previous one ended; pi is placed in one region and the next request, pi+1, in a later one.]

40 Variable partition memory allocation
Grant only the size requested.
Example with 512 bytes total: allocate(r1, 100), allocate(r2, 200), allocate(r3, 200), free(r2), allocate(r4, 10), free(r1), allocate(r5, 200). With first-fit placement, r4 splits the freed 200-byte hole, so over 300 bytes are free at the end, yet no single hole can satisfy r5's 200-byte request.
External fragmentation: memory ends up divided into blocks so small that none of them can satisfy any request.

41 Issues in Variable partition memory allocation
Where are the free memory blocks? Keeping track of the memory blocks: the list method and the bitmap method.
Which memory block to allocate? Multiple free blocks may satisfy a request; which one should be used? Fragmentation must be minimized.
How to keep track of free and allocated memory blocks?

42 Variable Partition Memory Mechanism
[Figure: a sequence of memory snapshots. Processes 0-4 are loaded; as processes terminate and others (5, 6) arrive, external fragmentation appears in (c); compaction moves programs in memory in (d) to coalesce the holes.]
The loader adjusts every address in every absolute module when it is placed in memory.

43 Cost of Moving Programs
Compaction requires that a program be moved. For example, the instruction load R1, 0x02010 assembles to 3F013010 when the program is loaded at 0x01000, but must become 3F016010 when the program is loaded at 0x04000: we must run the loader over the program again!
Consider dynamic techniques instead.

44 Dynamic Memory Allocation
Could use dynamically allocated memory: a process wants to change the size of its address space.
Smaller: creates an external fragment.
Larger: may have to move/relocate the program.
Allocate “holes” in memory according to best-/worst-/first-/next-fit.

45 Contemporary Memory Allocation
Use some form of variable partitioning.
Usually allocate memory in fixed-size blocks (pages): this simplifies management of the free list but greatly complicates the binding problem.

46 Dynamic Address Space Binding
Recall, in static binding: symbols are first bound to relative addresses in a relocatable module at compile time, then to addresses in an absolute module at link time, then to primary memory addresses at load time.
Dynamic binding: wait to bind absolute program addresses until run time.
The simplest mechanism is dynamic relocation, usually implemented by the processor.

47 Dynamic Address Relocation
Performed automagically by the processor.
[Figure: the CPU issues the relative address 0x02010 (from load R1, 0x02010); the relocation register (0x10000) is added to produce 0x12010 in the MAR.]
Program loaded at 0x10000: relocation register = 0x10000. Program loaded at 0x04000: relocation register = 0x04000.
We never have to change the load module addresses!

48 Dynamic Address Relocation
The same holds for multiple segment registers.
[Figure: the CPU-generated relative address is relocated through the appropriate code, stack, or data register before reaching the MAR and primary memory.]

49 Runtime Bound Checking
[Figure: the relative address is compared against the limit register and added to the relocation register; an out-of-range reference raises an interrupt before the MAR is loaded.]
Bound checking is inexpensive to add and provides excellent memory protection.
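A minimal sketch of relocation with bound checking, as the hardware would apply it to every reference; the register values are illustrative:

```c
#include <stdint.h>
#include <stdio.h>

struct mmu {
    uint32_t relocation;   /* base added to every relative address */
    uint32_t limit;        /* size of the process address space */
};

/* Translate a relative address; -1 models the protection interrupt. */
static int64_t translate(const struct mmu *m, uint32_t rel) {
    if (rel >= m->limit)
        return -1;                       /* interrupt: out of bounds */
    return (int64_t)m->relocation + rel;
}

int main(void) {
    struct mmu m = { 0x10000, 0x08000 };
    printf("0x02010 -> 0x%llx\n",
           (long long)translate(&m, 0x02010));   /* 0x12010 */
    printf("0x09000 -> %lld\n",
           (long long)translate(&m, 0x09000));   /* -1: bound violation */
    return 0;
}
```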

50 Memory Mgmt Strategies
Fixed-partition: used only in batch systems.
Variable-partition: used everywhere (except in virtual memory).
Swapping systems: popularized in timesharing; rely on dynamic address relocation.
Dynamic loading (virtual memory): exploits the memory hierarchy; paging is mainstream in contemporary systems.
Shared-memory multiprocessors.

51 Swapping
A special case of dynamic memory allocation.
Suppose there is high demand for executable memory. An equitable policy might be to time-multiplex processes into the memory (as well as space-multiplex).
This means a process can have its address space unloaded while it still needs memory, usually only when it is blocked.

52 Swapping – cont.
Objective: optimize system performance by removing a process from memory when it is blocked, allowing that memory to be used by other processes.
Blocking may be caused by a request for a resource, or by the memory manager.
Swapping only becomes necessary when processes are being denied access to memory.

53 Swapping – cont.
[Figure: the image for pi is swapped out of primary memory to secondary memory while the image for pj is swapped in.]

54 Cost of Swapping
Need to consider the time to copy the execution image from primary to secondary memory, and back; this is the major part of the swap time.
In addition, there is the time required by the memory manager, and the usual context switching time.

55 Swapping Systems
Standard swapping is used in few systems: it requires too much swapping time and provides too little execution time.
Most systems use some modified version of swapping: in UNIX, swapping is normally disabled, but is enabled if memory usage exceeds a threshold and disabled again when usage drops below it.

56 Virtual Memory
Allows a process to execute when only part of its address space is loaded in primary memory; the rest is in secondary memory.
Need to be able to partition the address space into parts that can be loaded into primary memory when needed.

57 Virtual Memory – cont.
A characteristic of programs that is very important to the strategy used by virtual memory systems is spatial reference locality.
It refers to the implicit partitioning of code and data segments due to the functioning of the program (a portion for initializing data, another for reading input, others for computation, etc.).
It can be used to select which parts of the process should be loaded into primary memory.

58 Virtual Memory Barriers
Must be able to divide the address space into parts that correspond to the various localities that will exist during the program’s execution.
Must be able to load a part anywhere in physical memory and dynamically bind the addresses appropriately.
More on this in the next chapter.

59 Shared-memory Multiprocessors
Several processors share an interconnection network to access a set of shared-memory modules; any CPU can read/write any memory unit.
[Figure: multiple CPUs connected through an interconnection network to multiple memory modules.]

60 Shared-memory Multiprocessors – cont.
The goal is to use processes or threads to implement units of computation on different processors while sharing information via common primary memory locations.
One technique would be to have the address spaces of two processes overlap; another would split the address space of a process into a private part and a public part.

61 Sharing a Portion of the Address Space
[Figure: the address spaces of process 1 and process 2 each map part of themselves onto a common region of primary memory.]

62 Figure 11-26: Multiple Segments
[Figure: the CPU executing process 1 and the CPU executing process 2 each use per-segment limit/relocation register pairs; each process has a private segment plus a shared segment in primary memory.]

63 Shared-memory Multiprocessors – cont.
A major problem is synchronization: how can one process detect when the other process has written or read information? Interprocess communication is needed to handle the synchronization.
Another problem is overloading the interconnection network: use cache memories to decrease the load on the network.

64 Virtual Memory

65 Virtual Memory Manager
Provides an abstraction of physical memory.
Creates a virtual address space in secondary memory and then “automatically” determines which part of the address space to load into primary memory at any given time.
Allows application programmers to think they have a very large address space in which to write programs.

66 Virtual Memory Organization
[Figure: the complete memory image for pi resides in secondary memory; fragments of it are loaded into primary memory as needed.]

67 Locality
Programs do not access their address space uniformly; they access the same locations over and over.
Spatial locality: processes tend to access locations near the ones they just accessed, because of sequential program execution and because the data for a function is grouped together.
Temporal locality: processes tend to access the same data over and over again, because of program loops and because data is processed repeatedly.

68 Spatial Reference Locality
The address space for pi is logically partitioned: text, data, stack; initialization, main, error handling.
Different parts have different reference patterns.
[Figure: over execution time, initialization code is used once; the remaining references are split across the main code sections (30%, 20%, 35%, 15%), with under 1% in the error-handling code; data and stack are shown separately.]

69 Virtual Memory
Every process has code and data locality.
Dynamically load/unload currently-used address space fragments as the process executes.
Uses dynamic address relocation/binding: a generalization of base-limit registers.
The physical address corresponding to a compile-time address is not bound until run time.

70 Virtual Memory – cont.
Since the binding changes with time, use a dynamic (time-varying) virtual address map, Yt, from the virtual address space to primary memory.

71 Virtual Memory – cont.
[Figure: the virtual address spaces for pi, pj, and pk are each fragmented; every complete virtual address space is stored in secondary memory, and at any given time only some fragments are loaded into the physical address space (primary memory, addresses 0 to n-1).]

72 Address Translation
Virtual memory systems distinguish among symbolic name, virtual address, and physical address spaces.
Need to map symbolic names to virtual addresses, and then to physical addresses.
The compiler/assembler and link editor handle the mapping from symbolic names in the name space to virtual addresses.
When the program is executed, the virtual addresses are mapped to physical addresses.

73 Names, Virtual Addresses & Physical Addresses
[Figure: the source program’s name space is bound to pi’s virtual address space when the absolute module is produced; the dynamically executable image is then mapped at run time by Yt: Virtual Address Space → Physical Address Space.]

74 Address Formation
The translation system creates an address space, but its addresses are virtual instead of physical.
A virtual address x is mapped to physical address y = Yt(x) if x is loaded at physical address y, and to W (the null address) if x is not loaded.
The map Yt changes as the process executes; it is “time varying”.
Yt: Virtual Address → Physical Address ∪ {W}

75 Translation Process
If Yt(k) = W at time t and the process references location k, then:
The virtual memory manager stops the process.
The referenced location is loaded at some location, say m.
The manager sets Yt(k) = m.
The manager lets the process continue execution.
Note that the referenced element was found missing after an instruction started executing; the CPU needs to be able to “back out” of the instruction and re-execute it after the translation mapping is updated.

76 Size of Blocks of Memory
The virtual memory system transfers “blocks” of the address space to/from primary memory.
Fixed-size blocks: system-defined pages are moved back and forth between primary and secondary memory.
Variable-size blocks: programmer-defined segments, corresponding to logical fragments, are the unit of movement.
Paging is the commercially dominant form of virtual memory today.

77 Paging
A page is a fixed-size block of 2^h virtual addresses.
A page frame is a fixed-size block of 2^h physical memory addresses (the same size as a page).
When a virtual address x in page i is referenced by the CPU: if page i is loaded at page frame j, the virtual address is relocated to page frame j; if page i is not loaded, the OS interrupts the process and loads the page into a page frame.

78 Practicality of paging
Paging only works because of locality: at any one point in time, programs don’t need most of their pages.
Page fault rates must be very, very low for paging to be practical: like one page fault per 100,000 or more memory references.

79 Addresses
Suppose there are G = 2^g · 2^h = 2^(g+h) virtual addresses and H = 2^(j+h) physical addresses assigned to a process.
Each page/page frame holds 2^h addresses; there are 2^g pages in the virtual address space, and 2^j page frames are allocated to the process.
Rather than mapping individual addresses, Yt maps the 2^g pages to the 2^j page frames: page_frame_j = Yt(page_i).
Address k in page_i corresponds to address k in page_frame_j.

80 Page-Based Address Translation
Let N = {d0, d1, …, d(n-1)} be the pages and M = {b0, b1, …, b(m-1)} be the page frames.
A virtual address i satisfies 0 ≤ i < G = 2^(g+h).
A physical address k = U·2^h + V (0 ≤ V < 2^h), where U is the page frame number and V is the line number within the page.
Yt: [0 : G-1] → <U, V> ∪ {W}
Since every page has size c = 2^h: page number U = ⌊i/c⌋ and line number V = i mod c.
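Because the page size is a power of two, the division and modulus reduce to a shift and a mask. A minimal sketch, assuming h = 12 (4 KiB pages) for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define H           12u               /* log2(page size) */
#define PAGE_SIZE   (1u << H)
#define OFFSET_MASK (PAGE_SIZE - 1u)

int main(void) {
    uint32_t i = 0x1234ABCD;          /* a virtual address */
    uint32_t U = i >> H;              /* page number = i / 2^h   */
    uint32_t V = i & OFFSET_MASK;     /* line number = i mod 2^h */
    printf("page U = 0x%X, line V = 0x%X\n", U, V);
    return 0;
}
```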

81 Address Translation (cont)
[Figure: the CPU emits a virtual address of g page-number bits and h line-number bits; the “page table” (Yt) maps the page number to a j-bit frame number, or signals a missing page; the frame number concatenated with the line number forms the physical address placed in the MAR.]

82 Paging Algorithms
Two basic types of paging algorithms: static allocation and dynamic allocation.
Three basic policies define any paging algorithm:
Fetch policy: when a page should be loaded.
Replacement policy: which page is unloaded.
Placement policy: where a page should be loaded.

83 Fetch Policy
Determines when a page should be brought into primary memory.
We usually don’t have prior knowledge about what pages will be needed, so the majority of paging mechanisms use a demand fetch policy: a page is loaded only when the process references it.

84 Demand Paging Algorithm
A page fault occurs.
The process with the missing page is interrupted.
The memory manager locates the missing page.
A page frame is unloaded (replacement policy).
The page is loaded into the vacated page frame.
The page table is updated.
The process is restarted.

85 Page references
Processes continually reference memory, and so generate a stream of page references.
The page reference stream tells us everything about how a process uses memory: for a given page size, we only need to consider the page numbers.
If we have a reference to a page, then immediately following references to the same page will never generate a page fault.
Example addresses: 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0103, 0104, 0101, 0609, 0102, 0105.
Suppose the page size is 100 bytes; what is the page reference stream? Dividing each address by 100 gives 1, 4, 1, 6, 1, 1, 1, 1, 6, 1, 1, 1, 1, 6, 1, 1, 1, 6, 1, 1.
We use page reference streams to evaluate paging algorithms.

86 Modeling Page Behavior
Let w = r1, r2, r3, …, ri, … be a page reference stream; ri is the ith page number referenced by the process, and the subscript is the virtual time for the process.
Given a page frame allocation of m, the memory state at time t, St(m), is the set of pages loaded:
St(m) = St-1(m) ∪ Xt - Yt
where Xt is the set of pages fetched at time t and Yt is the set of pages replaced at time t.

87 More on Demand Paging
If rt was loaded at time t-1: St(m) = St-1(m).
If rt was not loaded at time t-1 and there were empty page frames: St(m) = St-1(m) ∪ {rt}.
If rt was not loaded at time t-1 and there were no empty page frames: St(m) = St-1(m) ∪ {rt} - {y}, where y is the page unloaded from its frame.

88 Replacement Policy
When there is no empty page frame in memory, we need to find one to replace.
Write the page out to the swap area if it has been changed since it was read in from the swap area.
Dirty (modified) bit: pages that have been changed are referred to as “dirty”; these pages must be written out to disk because the disk version is out of date; this is called “cleaning” the page.
Which page to remove from memory to make room for a new page? We need a page replacement algorithm.

89 Page replacement algorithms
The goal of a page replacement algorithm is to produce the fewest page faults.
We can compare two algorithms on a range of page reference streams, or compare an algorithm to the best possible algorithm.
We will start by considering static page replacement algorithms.

90 Static Paging Algorithms
A fixed number of page frames is allocated to each process when it is created.
The paging policy defines how these page frames will be loaded and unloaded.
The placement policy is fixed: the page frame holding the new page is always the one vacated by the page selected for replacement.

91 Static Allocation, Demand Paging
The number of page frames is static over the life of the process, and the fetch policy is demand.
Since St(m) = St-1(m) ∪ {rt} - {y}, the replacement policy must choose y, which uniquely identifies the paging policy.

92 Random page replacement
Algorithm: replace a page randomly.
Theory: we cannot predict the future at all.
Implementation: easy.
Performance: poor, but the best case, worst case, and average case are all the same.

93 Random Replacement
The replaced page, y, is chosen from the m loaded page frames with probability 1/m.
[Table: a sample reference stream traced through three page frames under random replacement, producing 13 page faults.]
No knowledge of v: doesn’t perform well.

94 Belady’s Optimal algorithm
The algorithm that produces the fewest possible page faults on all page reference sequences.
Algorithm: replace the page that will not be used for the longest time in the future.
Problem: it requires knowledge of the future.
Not realizable in practice, but it is used to measure the effectiveness of realizable algorithms.

95-102 Belady’s Optimal Algorithm (trace)
Replace the page with maximal forward distance: yt = max{ FWDt(x) : x ∈ St-1(m) }.
[Table: a reference stream traced step by step through three page frames, computing at each fault the forward distances of the loaded pages: FWD4(2) = 1, FWD4(0) = 2, FWD4(3) = 3; FWD7(2) = 2, FWD7(0) = 3, FWD7(1) = 1; FWD10(2) = ∞, FWD10(3) = 2, FWD10(1) = 3; FWD13(0) = ∞, FWD13(3) = ∞, FWD13(1) = ∞.]
10 page faults in total. Perfect knowledge of v gives perfect performance, but it is impossible to implement.

103 Theories of program behavior
All replacement algorithms try to predict the future and act like Belady’s optimal algorithm.
Every replacement algorithm embodies a theory of how programs behave; it uses that theory to predict the future (that is, when pages will be referenced), and then replaces the page that it thinks won’t be referenced for the longest time.

104 LRU page replacement
Least-recently used (LRU).
Algorithm: remove the page that hasn’t been referenced for the longest time.
Theory: the future will be like the past; page accesses tend to be clustered in time.
Implementation: hard; requires hardware assistance (and even then it is not easy).
Performance: very good, within 30%-40% of optimal.
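A minimal timestamp-based LRU sketch; the reference stream is an assumed sample, and real hardware cannot afford a timestamp per reference, which is what motivates the approximations on the later slides:

```c
#include <stdio.h>

#define M 3                                      /* page frames */

int main(void) {
    const int v[] = {2, 0, 3, 1, 0, 2, 3, 0, 3, 1, 2, 0};  /* assumed stream */
    const int n = sizeof v / sizeof v[0];
    int page[M], last[M], used = 0, faults = 0;

    for (int t = 0; t < n; t++) {
        int hit = -1;
        for (int f = 0; f < used; f++)
            if (page[f] == v[t]) { hit = f; break; }
        if (hit >= 0) { last[hit] = t; continue; }   /* refresh timestamp */
        faults++;
        if (used < M) {                              /* fill an empty frame */
            page[used] = v[t]; last[used++] = t;
        } else {                                     /* evict oldest last-use */
            int lru = 0;
            for (int f = 1; f < M; f++)
                if (last[f] < last[lru]) lru = f;
            page[lru] = v[t]; last[lru] = t;
        }
    }
    printf("LRU page faults: %d\n", faults);
    return 0;
}
```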

105 LRU model of the future

106-111 Least Recently Used (LRU) (trace)
Replace the page with maximal backward distance: yt = max{ BKWDt(x) : x ∈ St-1(m) }.
[Table: a reference stream traced step by step through three page frames, computing at each fault the backward distances of the loaded pages: BKWD4(2) = 3, BKWD4(0) = 2, BKWD4(3) = 1; BKWD5(1) = 1, BKWD5(0) = 3, BKWD5(3) = 2; BKWD6(1) = 2, BKWD6(2) = 1, BKWD6(3) = 3.]
Backward distance is a good predictor of forward distance: locality.

112 LFU page replacement
Least-frequently used (LFU).
Algorithm: remove the page that hasn’t been used often in the past.
Theory: an actively used page should have a large reference count.
Implementation: hard; also requires hardware assistance (and even then it is not easy).
Performance: not very good.

113-116 Least Frequently Used (LFU) (trace)
Replace the page with minimum use: yt = min{ FREQt(x) : x ∈ St-1(m) }.
[Table: a reference stream traced step by step through three page frames, computing reference counts at each fault: FREQ4(2) = 1, FREQ4(0) = 1, FREQ4(3) = 1; FREQ6(2) = 2, FREQ6(1) = 1, FREQ6(3) = 1; FREQ7(2) = ?, FREQ7(1) = ?, FREQ7(0) = ? (left unresolved in the source).]

117 FIFO page replacement
Algorithm: replace the oldest page.
Theory: pages are used for a while and then stop being used.
Implementation: easy.
Performance: poor, because old pages are often still being accessed; that is, the theory behind FIFO is not correct.

118-121 First In First Out (FIFO) (trace)
Replace the page that has been in memory the longest: yt = max{ AGEt(x) : x ∈ St-1(m) }.
[Table: a reference stream traced step by step through three page frames, computing ages at each fault: AGE4(2) = 3, AGE4(0) = 2, AGE4(3) = 1; AGE5(1) = ?, AGE5(0) = ?, AGE5(3) = ? (left unresolved in the source).]

122 Belady’s Anomaly
[Table: the same reference stream traced under FIFO with m = 3 frames and with m = 4 frames.]
FIFO with m = 3 has 9 faults; FIFO with m = 4 has 10 faults.
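A minimal FIFO simulator that reproduces the anomaly. The slide’s own table is not recoverable, so the classic demonstration stream 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 is assumed here; it yields exactly the 9-versus-10 fault counts quoted above:

```c
#include <stdio.h>

static int fifo_faults(const int *refs, int n, int m) {
    int frames[16];
    int used = 0, oldest = 0, faults = 0;
    for (int t = 0; t < n; t++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == refs[t]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < m) {
            frames[used++] = refs[t];        /* fill an empty frame */
        } else {
            frames[oldest] = refs[t];        /* evict the oldest page */
            oldest = (oldest + 1) % m;
        }
    }
    return faults;
}

int main(void) {
    const int v[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    const int n = sizeof v / sizeof v[0];
    printf("FIFO, m = 3: %d faults\n", fifo_faults(v, n, 3));   /* 9  */
    printf("FIFO, m = 4: %d faults\n", fifo_faults(v, n, 4));   /* 10 */
    return 0;
}
```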

123 Belady’s Anomaly
The paging algorithm performs worse when the amount of primary memory allocated to the process increases.
The problem arises because the set of pages loaded with the smaller memory allocation is not necessarily also loaded with the larger memory allocation.

124 Avoiding Belady’s Anomaly
Inclusion property: the set of pages loaded with an allocation of m frames is always a subset of the set of pages loaded with an allocation of m+1 frames.
FIFO does not satisfy the inclusion property; LRU and LFU do.
Algorithms that satisfy the inclusion property are called stack algorithms.

125-129 Stack Algorithms
Some algorithms are well-behaved. Inclusion property: the pages loaded at time t with m frames are also loaded at time t with m+1 frames.
[Tables: an LRU trace run side by side with m and m+1 frames; at every step the smaller allocation’s contents are a subset of the larger’s.]

130 Stack Algorithms
Some algorithms are not well-behaved.
[Table: a FIFO trace run side by side with m and m+1 frames; at some steps, pages loaded with m frames aren’t loaded with m+1 frames.]

131 Implementation
LRU has become the preferred algorithm, but it is difficult to implement: recording when each page was referenced is difficult to do in hardware.
Approximate LRU with a reference bit: periodically reset; set for a page when it is referenced.
Dirty bit: pages that have been changed are referred to as “dirty”; these pages must be written out to disk because the disk version is out of date; this is called “cleaning” the page.

132 First LRU approximation
When you get a page fault: replace any page whose referenced bit is off, then turn off all the referenced bits.
Two classes of pages: pages referenced since the last page fault, and pages not referenced since the last page fault; the least recently used page is in the second class, but you don’t know which one it is.
A crude approximation of LRU.

133 Second LRU approximation
Algorithm: keep a counter for each page, and have a daemon wake up every 500 ms to:
add one to the counter of each page that has not been referenced;
zero the counter of pages that have been referenced;
turn off all referenced bits.
When you get a page fault, replace the page whose counter is largest.
Divides pages into 256 classes.
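A minimal sketch of this counter-based approximation, assuming a hypothetical referenced[] bit array maintained by hardware on each access; an 8-bit counter is what yields the 256 classes:

```c
#include <stdint.h>
#include <stdbool.h>

#define NPAGES 1024

static uint8_t counter[NPAGES];   /* 8-bit counter: 256 classes */
static bool referenced[NPAGES];   /* set by hardware on each access (assumed) */

/* Called by the daemon every 500 ms. */
void age_pages(void) {
    for (int p = 0; p < NPAGES; p++) {
        if (referenced[p])
            counter[p] = 0;       /* referenced since last tick: reset */
        else if (counter[p] < 255)
            counter[p]++;         /* not referenced: one tick older */
        referenced[p] = false;    /* turn off all referenced bits */
    }
}

/* On a page fault, replace the page whose counter is largest. */
int choose_victim(void) {
    int victim = 0;
    for (int p = 1; p < NPAGES; p++)
        if (counter[p] > counter[victim])
            victim = p;
    return victim;
}
```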

134 Dynamic Paging Algorithms
Static page replacement algorithms assume that a process is allocated a fixed amount of primary memory.
But the amount of physical memory (the number of page frames) varies as the process executes.
How much memory should be allocated? The fault rate must be “tolerable”, and the right amount will change according to the phase of the process.
Need to define a placement & replacement policy; contemporary models are based on the working set.

135 Working Set
Intuitively, the working set is the set of pages in the process’s locality: somewhat imprecise and time varying.
Given k processes in memory, let mi(t) be the number of page frames allocated to pi at time t, with mi(0) = 0 and Σ(i=1..k) mi(t) ≤ |primary memory|.
Also have St(mi(t)) = St(mi(t-1)) ∪ Xt - Yt, or, more simply, S(mi(t)) = S(mi(t-1)) ∪ Xt - Yt.

136 Placed/Replaced Pages
S(mi(t)) = S(mi(t-1)) ∪ Xt - Yt
For the missing page: allocate a new page frame; Xt = {rt} is loaded into the new page frame.
How should Yt be defined? Consider a parameter, Δ, called the window size.
Determine BKWDt(y) for every y ∈ S(mi(t-1)): if BKWDt(y) ≥ Δ, unload y and deallocate its frame; if BKWDt(y) < Δ, do not disturb y.

137 Working Set Principle
A process pi should only be loaded and active if it can be allocated enough page frames to hold its entire working set.
The size of the working set is estimated using Δ.
Unfortunately, a “good” value of Δ depends on the size of the locality; empirically this works with a fixed Δ.

138 Working set algorithm
Algorithm: keep track of the working set of each running process, and only run a process if its entire working set fits in memory. This is called the working set principle. A sketch of the working set computation follows.
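A minimal sketch computing the working set size |W(t, Δ)|, i.e. the number of distinct pages referenced in the last Δ references; the stream and Δ = 3 are assumed for illustration:

```c
#include <stdio.h>
#include <stdbool.h>

#define MAXPAGE 16

/* Count the distinct pages among refs[t-delta+1 .. t] (clamped at 0). */
static int working_set_size(const int *refs, int t, int delta) {
    bool in_ws[MAXPAGE] = { false };
    int size = 0;
    for (int k = t; k > t - delta && k >= 0; k--)
        if (!in_ws[refs[k]]) { in_ws[refs[k]] = true; size++; }
    return size;
}

int main(void) {
    const int v[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};   /* assumed stream */
    const int n = sizeof v / sizeof v[0];
    for (int t = 0; t < n; t++)
        printf("t=%2d  ref=%d  |W(t,3)|=%d\n", t, v[t],
               working_set_size(v, t, 3));
    return 0;
}
```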

139 Working set algorithm example
With Δ = 3, there are 16 page faults.
With Δ = 4, there are 8: the minimum possible, since there are 8 distinct pages.

140 Working set algorithm example – cont.
Letting Δ = 9 does not reduce the number of page faults; in fact, not all the page frames are used.

141 Working set algorithm example – cont.
Here the page frame allocation changes dynamically, increasing and decreasing.

142 Implementing the Working Set
Global LRU will behave similarly to a working set algorithm: on a page fault, add a page frame to one process and take away a page frame from another.
Use the LRU implementation idea: a reference bit for every page frame, cleared periodically and set with each reference; change the allocation of some page frame whose reference bit is clear.
Clock algorithms use this technique by searching for cleared reference bits in a circular fashion.

143 Performance of Demand Paging
Page fault rate (probability): 0 ≤ p ≤ 1.0. If p = 0, there are no page faults; if p = 1, every reference is a fault.
Effective Access Time (EAT):
EAT = (1 - p) × memory access time + p × (page fault overhead + [swap page out] + swap page in + restart overhead)

144 Demand Paging Performance Example
Assume memory access time = 100 nanoseconds and fault service time = 25 ms = 25,000,000 ns.
Then EAT = (1 - p) × 100 + p × 25,000,000 = 100 + 24,999,900p (in ns).
So, if one out of 1000 accesses causes a page fault, then EAT = 100 + 24,999,900 × 0.001 = 25,099.9 ns ≈ 25 microseconds.

145 Demand Paging Performance Example
So, if one access out of 1000 causes a page fault, the computer is slowed down by a factor of about 250 by demand paging!
One can calculate that to keep the degradation below 10%, only one access out of 2,500,000 may be allowed to fault.
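A small check of these numbers, plugging the fault rates into the EAT formula from the previous slides:

```c
#include <stdio.h>

/* EAT in ns: 100 ns memory access, 25,000,000 ns fault service time. */
static double eat_ns(double p) {
    return (1.0 - p) * 100.0 + p * 25000000.0;
}

int main(void) {
    double slow = eat_ns(0.001);
    printf("p = 1/1000:     EAT = %.1f ns (~%.0fx slowdown)\n",
           slow, slow / 100.0);            /* 25099.9 ns, ~251x */
    printf("p = 1/2500000:  EAT = %.5f ns (~10%% degradation)\n",
           eat_ns(1.0 / 2500000.0));       /* ~110 ns */
    return 0;
}
```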

146 Evaluating paging algorithms
Mathematical modeling: powerful where it works, but most real algorithms cannot be analyzed.
Measurement: implement the algorithm on a real system and measure it; extremely expensive.
Simulation: test on page reference traces; reasonably efficient and effective.

147 Performance of paging algorithms

148 Thrashing
VM allows more processes in memory, so several processes are more likely to be ready to run at the same time.
If CPU usage is low, it seems logical to bring more processes into memory.
But low CPU use may be due to too many page faults, because too many processes are competing for memory.
Bringing in more processes then makes things worse, and leads to thrashing.

149 Thrashing Diagram
There are too many processes in memory and no process has enough memory to run. As a result, the page fault rate is very high and the system spends all of its time handling page fault interrupts.

150 Load control
Load control: deciding how many processes should be competing for page frames. Too many leads to thrashing; too few means that memory is underused.
Load control determines which processes are running at a point in time; the others have no page frames and cannot run.
CPU load is a bad load control measure; the page fault rate is a good one.

151 Load control and page replacement

152 Two levels of scheduling

153 Load control algorithms
A load control algorithm measures memory load and swaps processes in and out depending on the current load.
Possible load control measures: the rotational speed of the clock hand; the average time spent in the standby list; the page fault rate.

154 Page fault frequency load control
Let L = mean time between page faults and S = mean time to service a page fault.
Try to keep L = S: if L < S, swap a process out; if L > S, swap a process in.
If L = S, the paging system can just keep up with the page faults.

155 Windows NT Paging System
[Figure: a reference to address k in page i of the user-space virtual address space is looked up as (page i, addr k), translated to (page frame j, addr k), and referenced in primary memory; missing pages are brought in from the paging disk (secondary memory). The virtual address space is divided into user space and supervisor space.]

156 Windows Address Translation
[Figure: the virtual address is split into three fields: a page directory index (a), a page table index (b), and a byte index (c); fields a and b together form the virtual page number, and c is the line number. The page directory entry (A) selects one of the page tables; the page table entry (B) selects the target page; the byte index (C) selects the target byte.]
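A minimal sketch of the field extraction in this two-level scheme, assuming the classic 32-bit x86 split of 10 directory bits, 10 table bits, and a 12-bit byte index; the table contents themselves are not modeled:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0x12345678;               /* a virtual address */
    uint32_t dir   = va >> 22;              /* a: page directory index */
    uint32_t table = (va >> 12) & 0x3FF;    /* b: page table index */
    uint32_t byte  = va & 0xFFF;            /* c: byte index within the page */
    printf("directory = 0x%03X, table = 0x%03X, byte = 0x%03X\n",
           dir, table, byte);
    return 0;
}
```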

157 Linux Virtual Address Translation

158 Segmentation
The unit of memory movement is: variably sized; defined by the programmer.
Two-component addresses, <Seg#, offset>: the segment number is a reference to a base location, and the offset is the offset of the target within the segment.
Address translation is more complex than paging.

159 Segment Address Translation
Address translation is more complex than paging.
Yt: segments × offsets → physical addresses ∪ {W}
Yt(i, j) = k, where i = segment, j = offset, k = physical address.
Segment names are typically symbolic, bound at runtime: s: segment names → segment addresses, so Yt(s(segName), j) = k.
The offset may also not be bound until runtime: l: offset names → offset addresses.
So the address map could be as complex as Yt(s(segName), l(offsetName)) = k.

160 Segment Address Translation
The task of designing a segmentation system to handle such general address translation is very challenging.
Each memory reference is theoretically a pair of symbols to be translated when the reference occurs.
In addition, the mappings are time-varying: the segment could be anywhere in primary and/or secondary memory.

161 Address Translation
[Figure: a reference <segmentName, offsetName> is mapped by s and l to a (segment #, offset) pair; the segment table entry selected through Yt supplies limit and base/relocation values; the offset is checked against the limit (a failure raises a missing-segment fault), added to the base, and sent to the memory address register.]

162 Address Translation – cont.
The system maintains a segment table for each process (which is itself a segment).
The table contains a set of entries called segment descriptors.
Descriptors contain fields to support relocation and indicate whether the segment is loaded:
Base: the relocation register for the segment.
Limit: the length of the segment.
Protection: the allowable forms of access.
163 Implementation
Most implementations do not fully implement the address translation model.
Segmentation requires special hardware: segment descriptor support; segment base registers (segment, code, stack); translation hardware.
Some of the translation can be static: no dynamic offset name binding; limited protection.

164 Multics
Designed in the late ’60s: old, but still state-of-the-art segmentation.
Uses linkage segments to support sharing, and dynamic offset name binding.
Required a sophisticated memory management unit. See pp

