Virtual Memory
Virtual Memory: Motivation
Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory. [Patt&Henn 04]
"…a system has been devised to make the core-drum combination appear to the programmer as a single-level store, the requisite transfers taking place automatically." – Kilburn et al.
So, the purpose of VM:
- provide sharing
- automatically manage the memory hierarchy (as a "one-level" store)
- simplify loading (for relocation)
[Figure: the main processor sends a logical address to the memory management unit, which produces a physical address for the high-speed cache, main memory, and backing store; control and data lines connect the units]
Structure of Virtual Memory
[Figure: a virtual address from the processor enters the address translator, which sends a physical address to memory; a page fault is handled by an elaborate software page-fault handling algorithm]
A Paging System
[Figure: (a) a 64 KB virtual address space mapped onto (b) a 32 KB main memory]
Page Table
[Figure: a page table maps each virtual page to a page frame in main memory; presence bit: 1 = present in main memory, 0 = not present in main memory]
[Figure: the virtual page number indexes the page table, whose entries point either to physical memory or to disk storage]
The virtual page number is used to index the page table. If the valid bit is on, the page table supplies the physical page number (i.e., the starting address of the page in memory) corresponding to the virtual page. If the valid bit is off, the page currently resides only on disk, at a specified address. In many systems, the table of physical page addresses and disk page addresses, while logically one table, is stored in two separate data structures. Dual tables are justified in part because we must keep the disk addresses of all the pages, even if they are currently in main memory. The page table maps each page in virtual memory to either a page in physical memory or a page stored on disk, which is the next level in the hierarchy.
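As a rough illustration, here is a minimal C sketch of that lookup, assuming 4 KB pages, a 32-bit virtual address space, and a single-level table; `page_table` and `handle_page_fault` are illustrative names, not from any particular system:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS   12                 /* assumed 4 KB pages              */
#define PAGE_SIZE   (1u << PAGE_BITS)
#define NUM_VPAGES  (1u << 20)         /* assumed 32-bit virtual address  */

typedef struct {
    bool     valid;        /* 1 = page present in main memory            */
    uint32_t ppn;          /* physical page number, if valid             */
    uint32_t disk_addr;    /* where the page lives on disk               */
} pte_t;

static pte_t page_table[NUM_VPAGES];   /* one entry per virtual page     */

extern void handle_page_fault(uint32_t vpn);   /* OS fault handler (stub) */

/* Translate a virtual address to a physical address. */
uint32_t translate(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_BITS;        /* index into the page table */
    uint32_t offset = va & (PAGE_SIZE - 1);   /* unchanged by translation  */

    if (!page_table[vpn].valid)
        handle_page_fault(vpn);   /* page only on disk: OS brings it in   */

    return (page_table[vpn].ppn << PAGE_BITS) | offset;
}
```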
Technology        Access time         $ per GB in 2004
SRAM              0.5 – 5 ns          $4,000 – $10,000
DRAM              50 – 70 ns          $100 – $200
Magnetic disk     5 – 20 × 10^6 ns    $0.50 – $2
Typical ranges of parameters for virtual memory.
These figures, contrasted with the values for caches, represent increases of 10 to 100,000 times.
Virtual Address Mapping
[Figure: a virtual address consists of a page number and a displacement within the page; the page map translates the page number into the base address of the page in memory, and the displacement selects the word within that page]
Terminology
- Page
- Page fault
- Virtual address
- Physical address
- Memory mapping or address translation
VM Simplifies Loading
VM provides a relocation function: address mapping allows a program to be loaded at any location in physical memory. Under VM, relocation no longer needs the special OS and hardware support it required in the past.
Address Translation Considerations
- Direct mapping using register sets
- Indirect mapping using tables
- Associative mapping of frequently used pages
How Many Pages? How Large Is the PT?
The page table (PT) must have one entry for each page in virtual memory. Note that the number of virtual pages need not equal the number of pages addressable with a physical address.
4 Key Design Decisions in VM Design
- Pages should be large enough to amortize the high access time. (Sizes from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.)
- Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).
4 Key Design Decisions in VM Design (cont'd)
- Page faults (misses) in a virtual memory system can be handled in software, because the overhead is small compared to the access time of disk. Furthermore, the software can afford to use clever algorithms for choosing how to place pages, because even small reductions in the miss rate will pay for the cost of such algorithms.
- Write-through cannot be used to manage writes in virtual memory, since writes take too long. Instead, we need a scheme that reduces the number of disk writes.
What happens on a write?
Write-through to secondary storage is impractical for VM, so write-back is used. Advantages: it reduces the number of writes to disk and amortizes their cost. A dirty bit records whether a page has been modified since it was loaded, as in the sketch below.
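A minimal sketch of write-back with a dirty bit; the helper `write_page_to_disk` is a hypothetical stub, not a real API:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     valid;
    bool     dirty;      /* set on first write; page differs from its disk copy */
    uint32_t ppn;
    uint32_t disk_addr;
} pte_t;

extern void write_page_to_disk(uint32_t ppn, uint32_t disk_addr);  /* stub */

/* On a store, just mark the page dirty -- no disk traffic at all. */
void note_store(pte_t *pte)
{
    pte->dirty = true;
}

/* Only when the page is evicted do we write it back, and only if it was
 * actually modified. Clean pages can simply be dropped. */
void evict(pte_t *pte)
{
    if (pte->dirty)
        write_page_to_disk(pte->ppn, pte->disk_addr);
    pte->valid = false;
    pte->dirty = false;
}
```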
Page Size Selection Constraints
- Efficiency of the secondary memory device
- Page table size
- Internal fragmentation (the last part of the last page is wasted)
- Program logic structure: logical block sizes of < 1K ~ 4K
- Table fragmentation [Kai, P68] (the PT itself occupies some space)
Page Size Selection
Considerations:
- PT size (smaller pages mean more entries)
- Miss ratio
- Efficiency of PT transfer from disk to memory
- Internal fragmentation: a process typically has three segments (text, heap, stack), and on average half of the last page of each is wasted, i.e., 3 × 0.5 = 1.5 page sizes per process!
- Start-up time of a process: the smaller the page, the faster!
An Example
Case 1: VM page size 512 bytes, VM address space 64 KB.
Total virtual pages = 64K / 512 = 128 pages.
Case 2: VM page size 512 = 2^9 bytes, VM address space 4G = 2^32 bytes.
Total virtual pages = 4G / 512 = 2^32 / 2^9 = 2^23 = 8M pages.
If each PTE has 32 bits, total PT size = 8M × 4 = 32M bytes.
Note: assuming main memory holds a working set of ~4 MB, that is 4M / 512 = 2^22 / 2^9 = 2^13 = 8192 frames.
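The same arithmetic as a runnable C check (no numbers beyond the slide's own):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Case 1: 64 KB virtual address space, 512-byte pages */
    uint64_t pages1 = (64ull * 1024) / 512;
    printf("Case 1: %llu pages\n", (unsigned long long)pages1);     /* 128 */

    /* Case 2: 4 GB virtual address space, 512-byte pages, 4-byte PTEs */
    uint64_t pages2   = (4ull << 30) / 512;     /* 2^23 = 8M pages        */
    uint64_t pt_bytes = pages2 * 4;             /* 32 MB page table       */
    uint64_t frames   = (4ull << 20) / 512;     /* 8192 frames, 4 MB WS   */
    printf("Case 2: %llu pages, PT = %llu MB, %llu frames\n",
           (unsigned long long)pages2,
           (unsigned long long)(pt_bytes >> 20),
           (unsigned long long)frames);
    return 0;
}
```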
How about a VM address space of 2^52 bytes (R-6000) (4 petabytes) with a 4 KB = 2^12 byte page size?
Total virtual pages = 2^52 / 2^12 = 2^40 pages!
Techniques for Reducing PT Size
- Set a lower limit, and permit dynamic growth
- Permit growth from both directions
- Inverted page table (a hash table)
- Multi-level page table (segments and pages)
- The PT itself can be paged: i.e., put the PT itself in the virtual address space (note: some small portion of its pages should stay in main memory and never be paged out)
Two-Level Address Mapping
[Figure: a virtual address split into an 11-bit segment number, an 11-bit page number, and a 10-bit displacement; the segment number indexes the segment table (entries 0 to 2047) to find the base address of a page table, the page number indexes that page table to find the base address of the page in memory, and the displacement gives the address within the page]
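A hedged C sketch of the two-level walk the figure describes, using its 11/11/10 bit split; the table names and in-memory layout are illustrative assumptions:

```c
#include <stdint.h>

/* Field widths from the figure: 11-bit segment number, 11-bit page
 * number, 10-bit displacement. */
#define SEG_BITS   11
#define PAGE_BITS  11
#define DISP_BITS  10

/* Each segment-table entry is the base of that segment's page table.
 * Page tables for unused segments need not exist -- that is the space
 * saving over one flat table. */
extern uint32_t *segment_table[1 << SEG_BITS];

uint32_t translate2(uint32_t va)
{
    uint32_t seg  = (va >> (PAGE_BITS + DISP_BITS)) & ((1u << SEG_BITS) - 1);
    uint32_t page = (va >> DISP_BITS) & ((1u << PAGE_BITS) - 1);
    uint32_t disp = va & ((1u << DISP_BITS) - 1);

    uint32_t *page_table = segment_table[seg];  /* first-level lookup  */
    uint32_t  page_base  = page_table[page];    /* base of the page    */

    return page_base + disp;                    /* word within the page */
}
```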
Placement: OS designers always pick lower miss rates over a simpler placement algorithm. So, full associativity: VM pages can go anywhere in main memory (compare with a sector cache). Question: why not use associative hardware for the lookup? (The number of PT entries is too big!)
VM: Implementation Issues
- Page fault handling
- Translation lookaside buffer (TLB)
- Protection issues
Fast Address Translation
Going through the PT requires at least two memory accesses for each memory reference (one for the PTE, one for the data). Improvements: store the PT in fast registers (example: Xerox, 256 registers?), or use a TLB. For multiprogramming, the pid should be stored as part of each tag in the TLB.
Page Fault Handling
When a virtual page number is not in the TLB, the PT in memory is accessed (through the PTBR) to find the PTE. If the PTE indicates that the page is missing, a page fault occurs, and the OS performs a context switch while the page is fetched.
The TLB Acts as a Cache on the Page Table
[Figure: the TLB contains a subset of the virtual-to-physical page mappings that are in the page table; the page table maps the remaining virtual pages to physical memory or to disk storage]
Because the TLB is a cache, it must have a tag field. If there is no matching entry in the TLB for a page, the page table must be examined. The page table either supplies a physical page number for the page (which can then be used to build a TLB entry) or indicates that the page resides on disk, in which case a page fault occurs. Since the page table has an entry for every virtual page, it is not a cache and needs no tag field. The TLB acts as a cache on the page table for the entries that map to physical pages only, as in the sketch below.
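A simplified C sketch of this "TLB first, page table second" lookup; the fully associative search, the 64-entry size, and `page_table_walk` are illustrative assumptions, not a particular machine's design:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64        /* illustrative size */
#define PAGE_BITS   12        /* assumed 4 KB pages */

typedef struct {
    bool     valid;
    uint32_t tag;   /* virtual page number -- needed because the TLB is a cache */
    uint32_t ppn;   /* translation cached from the page table */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];    /* fully associative, for simplicity */

extern uint32_t page_table_walk(uint32_t vpn);  /* may raise a page fault */

uint32_t lookup(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);

    /* 1. Search the TLB (hardware does this comparison in parallel). */
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].tag == vpn)
            return (tlb[i].ppn << PAGE_BITS) | offset;

    /* 2. TLB miss: examine the page table and build a new TLB entry. */
    uint32_t ppn = page_table_walk(vpn);   /* page fault if only on disk */
    int victim = 0;                        /* trivial replacement choice */
    tlb[victim] = (tlb_entry_t){ .valid = true, .tag = vpn, .ppn = ppn };
    return (ppn << PAGE_BITS) | offset;
}
```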
Some typical values for a TLB might be:
The miss penalty may sometimes be as high as 100 cycles, and a TLB may have as few as 16 entries.
TLB Design
Placement policy:
- small TLBs: full associativity can be used
- large TLBs: full associativity may be too slow
Replacement policy: sometimes even a random policy is used, for speed/simplicity (sketched below).
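For instance, a random replacement choice needs no usage bookkeeping at all; this is a generic sketch, not any particular machine's policy:

```c
#include <stdlib.h>

/* Random replacement: on a TLB miss, evict a randomly chosen entry.
 * No LRU state is kept, which keeps the hardware (or miss handler) fast,
 * at the cost of occasionally evicting a hot translation. */
int pick_victim(int tlb_entries)
{
    return rand() % tlb_entries;   /* any entry is fair game */
}
```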
Processing a read or a write through the DECStation 3100 TLB and cache
[Flowchart: the virtual address enters the TLB access; a TLB miss raises a TLB miss exception; on a TLB hit, a read tries to read the data from the cache (a cache miss stalls), while a write checks protection, writes the data into the cache, updates the dirty bit, and puts the data and the address into the write buffer]
If the TLB generates a hit, the cache can be accessed with the resulting physical address. If the operation is a write, the cache entry is overwritten and the data is sent to the write buffer; remember, though, that a cache write miss cannot occur for the DECStation 3100 cache, which uses one-word blocks and write-through to memory. In actuality, the TLB does not contain a true dirty bit; instead, it uses the write protection bit to detect the first write. How this works will be explained in the next section. Notice that a TLB hit and a cache hit are independent events; this is examined further in the exercises at the end of this chapter.
Virtual-to-Real Address Translation Using a Page Map
[Figure: a virtual address (pid, ip, iw) is looked up in the TLB and the page map; a matching page map entry yields the page frame address in memory (PFA), or a PFA in secondary memory, which is combined with iw to form the physical address; operation validation compares the entry's RWX bits against the requested access type and S/U mode, raising an access fault on a mismatch, while a missing page raises a page fault and invokes the replacement policy]
Legend (PME = page map entry):
- if S/U = 1: supervisor mode
- PME(x).C = 1: the page at PFA has been modified
- PME(x).P = 1: the page is private to the process
- PME(x).pid: process identification number
- PME(x).PFA: page frame address
Translation Lookaside Buffer
- The TLB miss rate is low (per Clark and Emer's data [85], 3~4 times smaller than typical cache miss ratios)
- On a TLB miss, the penalty is relatively low (a TLB miss usually results in a cache fetch)
(cont'd)
- A TLB miss implies a higher miss rate for the main cache
- TLB translation is process-dependent; strategies for context switching:
  1. tagging entries by context
  2. flushing: complete purge, or purge by context (for shared TLBs)
There is no absolute answer.
A Case Study: DECStation 3100
[Figure: the 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset; the fully associative TLB (valid, dirty, tag, and physical page number fields) is searched for the virtual page number, and a TLB hit yields the physical address; its 16-bit tag, 14-bit index, and 2-bit byte offset then access the direct-mapped cache (valid, tag, and 32-bit data fields), and a tag match yields a cache hit and the data]
Review: The Memory Hierarchy
Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
[Figure: processor ↔ L1$ ↔ L2$ ↔ main memory ↔ secondary memory; the transfer unit grows with distance from the processor: 4-8 bytes (word), 8-32 bytes (block), 1 to 4 blocks, 1,024+ bytes (disk sector = page); access time and the (relative) size of the memory increase at each level]
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM.
Virtual Memory
Use main memory as a "cache" for secondary memory:
- allows efficient and safe sharing of memory among multiple programs
- provides the ability to easily run programs larger than the size of physical memory
- simplifies loading a program for execution by providing for code relocation (i.e., the code can be loaded anywhere in main memory)
What makes it work? Again, the Principle of Locality: a program is likely to access a relatively small portion of its address space during any period of time.
Each program is compiled into its own address space – a "virtual" address space. During run time, each virtual address must be translated to a physical address (an address in main memory).
Two Programs Sharing Physical Memory
- A program's address space is divided into pages (all one fixed size) or segments (variable sizes)
- The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table
[Figure: Program 1's and Program 2's virtual address spaces both map pages into the same main memory]
Address Translation
A virtual address is translated to a physical address by a combination of hardware and software.
[Figure: the virtual address (VA) is split into a virtual page number and a page offset; translation replaces the virtual page number with a physical page number, while the page offset passes through unchanged, forming the physical address (PA)]
Here the page size is 2^12 = 4 KB, the number of physical pages allowed in memory is 2^18, the physical address space is 1 GB, and the virtual address space is 4 GB.
So each memory request first requires an address translation from the virtual space to the physical space. A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault.
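These parameters can be checked with a few lines of C (the example address is arbitrary):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned page_bits = 12;   /* 2^12 = 4 KB pages                  */
    unsigned ppn_bits  = 18;   /* 2^18 physical pages allowed        */
    unsigned va_bits   = 32;   /* 4 GB virtual address space         */

    printf("physical address space: %llu MB\n",
           (1ull << (ppn_bits + page_bits)) >> 20);   /* 1 GB = 1024 MB */
    printf("virtual page number: %u bits\n", va_bits - page_bits); /* 20 */

    uint32_t va  = 0x12345678;                     /* arbitrary example  */
    uint32_t vpn = va >> page_bits;                /* translated via PT  */
    uint32_t off = va & ((1u << page_bits) - 1);   /* passes unchanged   */
    printf("VA 0x%08x -> VPN 0x%05x, offset 0x%03x\n", va, vpn, off);
    return 0;
}
```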
Address Translation Mechanisms
[Figure: the virtual page number indexes the page table (in main memory); if the valid bit V = 1, the entry supplies the physical page base address, which is combined with the offset to form the physical address; if V = 0, the entry points to the page's location in disk storage]
Virtual Addressing with a Cache
Thus it takes an extra memory access to translate a VA to a PA.
[Figure: CPU → translation → cache → main memory; the VA is translated to a PA before the cache lookup, a hit returns data, and a miss goes to main memory]
This makes memory (cache) accesses very expensive (if every access were really two accesses). The hardware fix is to use a Translation Lookaside Buffer (TLB) – a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup.
Making Address Translation Fast
[Figure: a TLB caches recently used entries (valid bit, tag, physical page base address) of the page table (in physical memory), which maps each virtual page number to a physical page base address or to disk storage]
Translation Lookaside Buffers (TLBs)
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
A typical TLB entry: V | Virtual Page # | Physical Page # | Dirty | Ref | Access
TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches). TLBs are typically not more than 128 to 256 entries even on high-end machines.
A TLB in the Memory Hierarchy
[Figure: CPU → TLB lookup → cache → main memory; a TLB hit supplies the PA directly, while a TLB miss goes through translation via the page table]
A TLB miss – is it a page fault or merely a TLB miss?
- If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB. This takes tens of cycles.
- If the page is not in main memory, then it's a true page fault. This takes millions of cycles to service.
TLB misses are much more frequent than true page faults.
Some Virtual Memory Design Parameters

                        Paged VM                     TLBs
Total size              16,000 to 250,000 words      16 to 512 entries
Total size (KB)         250,000 to 1,000,000,000     0.25 to 16
Block size (B)          4000 to 64,000               4 to 32
Miss penalty (clocks)   10,000,000 to 100,000,000    10 to 1000
Miss rates              0.00001% to 0.0001%          0.01% to 2%
Two Machines' TLB Parameters

Intel P4:
- 1 TLB for instructions and 1 TLB for data
- both 4-way set associative
- both use ~LRU replacement
- both have 128 entries
- TLB misses handled in hardware

AMD Opteron:
- 2 TLBs for instructions and 2 TLBs for data
- both L1 TLBs fully associative with ~LRU replacement
- both L2 TLBs 4-way set associative with round-robin LRU
- both L1 TLBs have 40 entries
- both L2 TLBs have 512 entries
- TLB misses handled in hardware

Note: the P4 uses a trace cache, which finds a dynamic sequence of instructions, including taken branches, to load into a cache block. The cache blocks thus contain dynamic traces of the executed instructions as determined by the CPU, rather than static sequences of instructions as determined by memory layout; branch prediction is folded into the cache.
TLB Event Combinations

TLB    Page Table   Cache      Possible? Under what circumstances?
Hit    Hit          Hit        Yes – what we want!
Hit    Hit          Miss       Yes – although the page table is not checked if the TLB hits
Miss   Hit          Hit        Yes – TLB miss, PA in page table
Miss   Hit          Miss       Yes – TLB miss, PA in page table, but data not in cache
Miss   Miss         Miss       Yes – page fault
Hit    Miss         Miss/Hit   Impossible – TLB translation not possible if page is not present in memory
Miss   Miss         Hit        Impossible – data not allowed in cache if page is not in memory
Reducing Translation Time
Can overlap the cache access with the TLB access: this works when the high-order bits of the VA are used to access the TLB while the low-order bits are used as the index into the cache.
[Figure: in a 2-way set-associative cache, the VA tag is translated by the TLB into a PA tag while the untranslated index and block offset bits select a set; the stored PA tags of both ways are then compared, so TLB hit AND cache hit deliver the desired word]
Overlapped access only works as long as the address bits used to index into the cache do not change as a result of VA translation. This usually limits things to small caches, large page sizes, or high n-way set-associative caches if you want a large cache (see the check below).
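A small C check of that constraint, under the usual power-of-two assumptions; the example sizes in the comment are illustrative:

```c
#include <assert.h>

/* Overlapped TLB/cache access is only safe if the cache index is drawn
 * entirely from the untranslated page-offset bits of the address. */
void check_overlap_ok(unsigned cache_bytes, unsigned block_bytes,
                      unsigned ways, unsigned page_bits)
{
    unsigned sets = cache_bytes / (block_bytes * ways);
    unsigned index_bits = 0, offset_bits = 0;
    while ((1u << index_bits)  < sets)        index_bits++;
    while ((1u << offset_bits) < block_bytes) offset_bits++;

    /* e.g. 16 KB cache, 4-way, 16 B blocks, 4 KB pages: 8 + 4 <= 12, OK.
     * Growing the cache means adding ways or enlarging pages, not
     * adding index bits. */
    assert(index_bits + offset_bits <= page_bits);
}
```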
Why Not a Virtually Addressed Cache?
A virtually addressed cache would only require address translation on cache misses:
[Figure: CPU → cache (accessed with the VA) → translation → main memory; the PA is needed only on a miss]
But two different virtual addresses can map to the same physical address (when processes are sharing data), i.e., two different cache entries could hold data for the same physical address – synonyms. All cache entries with the same physical address must be updated together, or the memory becomes inconsistent. Doing synonym updates requires significant hardware – essentially an associative lookup on the physical address tags to see if you have multiple hits.
The Hardware/Software Boundary
Which parts of the virtual-to-physical address translation are done by, or assisted by, the hardware?
- The Translation Lookaside Buffer (TLB) that caches the recent translations: TLB access time is part of the cache hit time, and an extra pipeline stage may be allotted for TLB access
- Page table storage, fault detection, and updating: page faults result in (precise) interrupts that are then handled by the OS; hardware must support (i.e., update appropriately) the Dirty and Reference bits (e.g., for ~LRU) in the page tables
- Disk placement: bootstrap (e.g., out of disk sector 0) so the system can service a limited number of page faults before the OS is even loaded
Summary
The Principle of Locality: a program is likely to access a relatively small portion of the address space at any instant of time.
- Temporal Locality: locality in time
- Spatial Locality: locality in space
Caches, TLBs, and virtual memory can all be understood by examining how they deal with four questions:
1. Where can a block be placed?
2. How is a block found?
3. What block is replaced on a miss?
4. How are writes handled?
Page tables map virtual addresses to physical addresses; TLBs are important for fast translation.

Let's summarize today's lecture. I know you have heard this many times and many ways, but it is still worth repeating. The memory hierarchy works because of the Principle of Locality, which says a program will access a relatively small portion of the address space at any instant of time. There are two types of locality: temporal locality, or locality in time, and spatial locality, or locality in space. So far, we have covered three major categories of cache misses. Compulsory misses are cache misses due to cold start; you cannot avoid them, but if you are going to run billions of instructions anyway, compulsory misses usually don't bother you. Conflict misses are caused by multiple memory locations being mapped to the same cache location; the nightmare scenario is the ping-pong effect, when a block is read into the cache but, before we have a chance to use it, is immediately forced out by another conflict miss. You can reduce conflict misses by increasing the cache size, increasing the associativity, or both. Finally, capacity misses occur when the cache is not big enough to contain all the cache blocks required by the program; you can reduce this miss rate by making the cache larger. There are two write policies as far as cache writes are concerned. Write-through requires a write buffer, and the nightmare scenario is when stores occur so frequently that they saturate the write buffer. The second write policy is write-back: you write only to the cache, and only when the cache block is being replaced do you write it back to memory.