1
Lecture 12 Virtual Memory
Peng Liu
2
Last time in Lecture 11 Cache
The memory system has a significant effect on program execution time: the number of memory-stall cycles depends on both the miss rate and the miss penalty. To reduce the miss rate, use associative placement schemes; to reduce the miss penalty, let a larger secondary cache handle misses to the primary cache. Topics covered: cache performance, using associativity to reduce the miss rate, and using multilevel cache hierarchies to reduce miss penalties.
3
Review: Direct-Mapped Cache
[Diagram: the address is split into a t-bit tag, a k-bit index, and a b-bit block offset. The index selects one of 2^k cache lines, each holding a valid bit, a tag, and a data block; if the stored tag matches the address tag, HIT is asserted and the offset selects the data word or byte.]
4
2-Way Set-Associative Cache
Review: [Diagram: the address is split into tag, index, and block offset; the index selects a set with two ways, each holding a valid bit, a tag, and a data block. Both stored tags are compared with the address tag in parallel; a match asserts HIT and selects the data word or byte.] Compare latency to the direct-mapped case?
5
Fully Associative Cache
Review: [Diagram: every entry holds a valid bit, a tag, and a data block. The address tag is compared against all stored tags in parallel; any match asserts HIT, and the block offset selects the data word or byte. No index bits are needed.]
6
Tag & Index with Set-Associative Caches
Review: Assume a 2^n-byte cache with 2^m-byte blocks that is 2^a-way set-associative. Which bits of the address are the tag and which are the index?
The m least significant bits are the byte select within the block.
Basic idea: the cache contains 2^n / 2^m = 2^(n-m) blocks, and each cache way contains 2^(n-m) / 2^a = 2^(n-m-a) blocks.
Cache index: the (n-m-a) bits after the byte select; the same index is used with all cache ways.
Observation: for a fixed cache size, the length of the tags increases with the associativity, so associative caches incur more overhead for tags. (See the sketch below.)
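To make the bit arithmetic concrete, here is a minimal C sketch of the address decomposition above. The parameters n, m, and a are the slide's symbols; the specific values are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* 2^n-byte cache, 2^m-byte blocks, 2^a ways:                          */
    /* here a 32 KB, 4-way cache with 64-byte blocks (n=15, m=6, a=2).     */
    const unsigned n = 15, m = 6, a = 2;
    const unsigned index_bits = n - m - a;    /* bits after the byte select */

    uint32_t addr   = 0x12345678;
    uint32_t offset = addr & ((1u << m) - 1);                 /* byte select */
    uint32_t index  = (addr >> m) & ((1u << index_bits) - 1); /* set index   */
    uint32_t tag    = addr >> (m + index_bits);               /* the rest    */

    printf("tag=0x%x index=0x%x offset=0x%x\n", tag, index, offset);
    return 0;
}
```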
7
Review: Replacement Methods. Which line do you replace on a miss?
Direct mapped: easy, you have only one choice; replace the line at the index you need.
N-way set associative: you need to choose which way to replace.
Random: choose one at random.
Least Recently Used (LRU): replace the one used least recently. True LRU is often difficult to track, so people use approximations; often these really identify lines that are simply not recently used.
8
Replacement Policy (Review)
In an associative cache, which block from a set should be evicted when the set becomes full?
Random.
Least Recently Used (LRU): LRU cache state must be updated on every access; a true implementation is only feasible for small sets (2-way), so a pseudo-LRU binary tree is often used for 4-8 ways (see the sketch below).
First In, First Out (FIFO), a.k.a. Round-Robin: used in highly associative caches.
NLRU: used in Alpha TLBs.
Replacement policy is a second-order effect. Why? Replacement only happens on misses.
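To illustrate the pseudo-LRU binary tree mentioned above, here is a minimal C sketch for one 4-way set. The three-bit encoding is a common convention, not any particular processor's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Tree pseudo-LRU for a 4-way set: three bits, each pointing toward the */
/* half of its subtree that should be evicted next.                      */
typedef struct {
    uint8_t b0;   /* root: 1 = evict from ways 2-3 next, 0 = ways 0-1    */
    uint8_t b1;   /* left subtree: 1 = evict way 1 next, 0 = way 0       */
    uint8_t b2;   /* right subtree: 1 = evict way 3 next, 0 = way 2      */
} plru4_t;

/* On an access to way w, make every bit on w's path point away from w.  */
static void plru4_touch(plru4_t *s, unsigned w) {
    s->b0 = (w < 2);                /* used left half -> evict right next */
    if (w < 2) s->b1 = (w == 0);
    else       s->b2 = (w == 2);
}

/* Pick a victim by following the bits from the root. */
static unsigned plru4_victim(const plru4_t *s) {
    if (s->b0) return s->b2 ? 3 : 2;
    return s->b1 ? 1 : 0;
}

int main(void) {
    plru4_t s = {0, 0, 0};
    plru4_touch(&s, 0);
    plru4_touch(&s, 1);
    plru4_touch(&s, 2);
    /* Prints way 0: pseudo-LRU is only an approximation (true LRU would */
    /* pick the never-touched way 3, but only the tree bits are kept).   */
    printf("victim: way %u\n", plru4_victim(&s));
    return 0;
}
```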
9
Causes for Cache Misses
Review: the three causes.
Compulsory: first reference to a block (a.k.a. cold-start misses); these misses would occur even with an infinite cache.
Capacity: the cache is too small to hold all data needed by the program; these misses would occur even under a perfect replacement policy.
Conflict: misses that occur because of collisions due to the block-placement strategy; these would not occur with full associativity.
10
Review: Write Policy Choices
Cache hit:
Write through: write both cache and memory; generally higher traffic, but it simplifies cache coherence.
Write back: write the cache only (memory is written only when the entry is evicted); a dirty bit per block can further reduce the traffic.
Cache miss:
No write allocate: only write to main memory.
Write allocate (a.k.a. fetch on write): fetch the block into the cache.
Common combinations: write through with no write allocate; write back with write allocate (sketched below).
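A minimal C sketch of the write-back, write-allocate combination above. The line structure and the two memory helpers are illustrative stand-ins for the cache/DRAM interface, not a real API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint8_t  data[64];
} line_t;

/* Hypothetical memory interface, stubbed so the sketch is self-contained. */
static void writeback_to_memory(line_t *l) { (void)l; /* write 64 B block */ }
static void fetch_from_memory(line_t *l, uint32_t t) { (void)l; (void)t; }

/* Write-back + write-allocate: on a write miss, fetch the block first;  */
/* memory is updated only when a dirty block is evicted.                 */
void cache_write(line_t *line, uint32_t tag, unsigned offset, uint8_t byte) {
    if (!line->valid || line->tag != tag) {          /* write miss        */
        if (line->valid && line->dirty)
            writeback_to_memory(line);               /* evict dirty block */
        fetch_from_memory(line, tag);                /* write allocate    */
        line->valid = true;
        line->tag   = tag;
        line->dirty = false;
    }
    line->data[offset] = byte;                       /* write cache only  */
    line->dirty = true;                              /* defer DRAM update */
}
```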
11
Cache Design: Datapath + Control
Review: most design errors come from incorrect specification of state machine behavior! Common bugs: stalls, block replacement, write buffer handling.
[Diagram: a control state machine sits between the CPU and the lower-level memory, driving the address, data-in/data-out, tag, and block storage of the cache.]
12
Virtual Memory
Virtual memory makes multiprogramming more effective and removes the limits users would otherwise face on the amount of main memory.
Many virtual memory schemes use a special cache to hold page table entries, usually called a translation lookaside buffer (TLB). This cache serves the same function as the caches in the memory hierarchy: it stores the page table entries of recently used pages.
Virtual memory is a technique that uses main memory as a "cache" for secondary storage.
13
Motivation #1: Large Address Space for Each Executing Program
Each program thinks it has a ~2^32-byte address space of its own, though it may not use all of it; the available main memory may be much smaller. Goals: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory.
14
Motivation #2: Memory Management for Multiple Programs
At any point in time, a computer may be running multiple programs, e.g., Firefox + Foxmail. Questions: How do we share memory between multiple programs? How do we avoid address conflicts? How do we protect programs from each other? We need isolation with selective sharing.
15
Virtual Memory in a Nutshell
Use the hard disk (or Flash) as a large store for the data of all programs; main memory (DRAM) is a cache for the disk, managed jointly by hardware and the operating system (OS).
Each running program has its own virtual address space (the address space shown in the previous figure), protected from the other programs.
Frequently used portions of the virtual address space are copied to DRAM; DRAM = physical address space.
Hardware + OS translate the virtual addresses (VA) used by the program into the physical addresses (PA) used by the hardware. Translation enables relocation (DRAM <-> disk) and protection.
16
Reminder: Memory Hierarchy Everything is a Cache for Something Else
Access time  | Capacity | Managed by
1 cycle      | ~500 B   | software/compiler (registers)
1-3 cycles   | ~64 KB   | hardware (L1 cache)
5-10 cycles  | 1-10 MB  | hardware (L2 cache)
~100 cycles  | ~10 GB   | software/OS (main memory)
~10^7 cycles | ~100 GB  | software/OS (disk)
17
DRAM vs. SRAM as a “Cache”
DRAM vs. disk is more extreme than SRAM vs. DRAM.
Access latencies: DRAM is ~10X slower than SRAM; disk is ~100,000X slower than DRAM.
Importance of exploiting spatial locality: on disk, the first byte is ~100,000X slower than successive bytes, vs. only a ~4X improvement for page-mode over regular accesses to DRAM.
18
Impact of These Properties on Design
Bottom line: design decisions for virtual memory are driven by the enormous cost of misses (disk accesses). Consider the following parameters for DRAM as a "cache" for the disk.
Line size? Large, since the disk is better at transferring large blocks, and large pages minimize the miss rate.
Associativity? High, to minimize the miss rate.
Write through or write back? Write back, since we can't afford to perform small writes to disk. Write-through will not work for virtual memory because writes take far too long; instead, virtual memory systems use write-back.
19
Terminology for Virtual Memory
Virtual memory uses DRAM as a cache for the disk. New terms:
A VM block is called a "page": the unit of data moved between disk and DRAM. It is typically larger than a cache block (e.g., 4KB or 16KB); the virtual and physical address spaces are divided into virtual pages and physical pages (e.g., contiguous chunks of 4KB).
A VM miss is called a "page fault" (more on this later): an event that occurs when an accessed page is not present in main memory.
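As a quick worked example under these assumptions (a 32-bit virtual address space with 4KB pages):

```latex
\frac{2^{32}\ \text{bytes of virtual address space}}{2^{12}\ \text{bytes per page}}
  = 2^{20} \approx 10^{6}\ \text{virtual pages}
```

So a flat page table with one entry per virtual page (as described on a later slide) needs about a million entries.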
20
Locating an Object in a “Cache”
SRAM cache (L1, L2, etc.): the tag is stored with the cache line and maps from the cache block to a memory address. There is no tag for blocks that are not in the cache; if a block is not in the cache, it is in main memory. Hardware retrieves and manages the tag information and can quickly match against multiple tags.
21
Locating an Object in a “Cache” (cont.)
DRAM "cache" (virtual memory): each allocated page of virtual memory has an entry in the page table, which maps virtual pages to physical pages. There is one entry per page in the virtual address space; the page table entry exists even if the page is not in memory, in which case it specifies the disk address. The OS retrieves and manages the page table information.
22
A System with Physical Memory Only
Examples: most Cray machines, early PCs, nearly all embedded systems, etc. Addresses generated by the CPU point directly to bytes in physical memory.
23
A System with Virtual Memory
Examples: workstations, servers, modern PCs, etc. Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table).
24
Page Faults (Similar to “Cache Misses”)
What if an object is on disk rather than in memory? The page table entry indicates that the virtual address is not in memory; the OS exception handler is invoked to move the data from disk into memory. The OS has full control over placement, giving full associativity to minimize future misses. [Diagram: the page table before and after the fault.]
25
Does VM Satisfy Original Motivations?
Multiple active programs can share the physical address space, and address conflicts are resolved: all programs can think their code is at 0x400000. Data from different programs can be protected, and programs can share data or code when desired.
26
Answer: Yes, Using Separate Address Spaces Per Program
Each program has its own virtual address space and its own page table. The same virtual address (e.g., 0x400000) in different programs can map to different physical locations, or to the same location, as desired. The OS controls how virtual pages are assigned to physical memory.
27
Bare Machine
[Diagram: a five-stage pipeline (PC, decode, execute, memory, writeback) in which the instruction cache, data cache, and memory controller are all accessed with physical addresses, backed by main memory (DRAM).]
In a bare machine, the only kind of address is a physical address.
28
Dynamic Address Translation
Motivation:
In early machines, I/O operations were slow and each word transferred involved the CPU. Throughput is higher if the CPU and the I/O of two or more programs are overlapped. How? Multiprogramming with DMA I/O devices and interrupts.
Location-independent programs: ease of programming and of storage management creates the need for a base register.
Protection: independent programs should not affect each other inadvertently, which creates the need for a bound register.
Multiprogramming drives the requirement for resident supervisor software to manage context switches between multiple programs.
[Diagram: physical memory holding the OS together with prog1 and prog2.]
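A minimal sketch of the base-and-bound scheme these registers enable: every address the program issues is checked against the bound register and then offset by the base register. The struct and names are illustrative.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Base and bound registers, reloaded by the OS on each context switch. */
typedef struct { uint32_t base, bound; } seg_regs_t;

uint32_t translate(seg_regs_t r, uint32_t va) {
    if (va >= r.bound) {                 /* protection check               */
        fprintf(stderr, "bound violation: va=0x%x\n", va);
        exit(1);                         /* in hardware: raise an exception */
    }
    return r.base + va;                  /* relocation                     */
}

int main(void) {
    seg_regs_t prog1 = { .base = 0x100000, .bound = 0x40000 };
    printf("va 0x1234 -> pa 0x%x\n", translate(prog1, 0x1234));
    return 0;
}
```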
29
Translation: High-level View
Fixed-size pages; a physical page is sometimes called a frame.
30
Translation: Process
31
Address Translation & Protection
[Diagram: the virtual address (virtual page number VPN + offset) passes through a kernel/user-mode protection check and a read/write address translation, which either raises an exception or produces the physical address (physical page number PPN + offset).]
Every instruction and data access needs address translation and protection checks. A good VM design needs to be fast (~one cycle) and space efficient.
32
Translation Process Explained
Valid page:
Check the access rights (R, W, X) against the access type; generate the physical address if the access is allowed, or a protection fault (exception) if the access is illegal.
Invalid page:
The page is not currently mapped, and a page fault is generated.
Faults are handled by the operating system. Sometimes a fault is due to a program error, and the program is terminated (e.g., accessing out of the bounds of an array). Sometimes it is due to "caching": refill and restart. The desired data or code is available on disk, so space is allocated in DRAM, the page is copied in from disk, and the page table is updated; replacement may be needed. (See the sketch below.)
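A minimal C sketch of this translation check, assuming a flat (single-level) page table and an illustrative PTE layout with valid/read/write bits and a physical page number; real PTE formats differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12                         /* 4 KB pages */

typedef struct {
    bool     valid, readable, writable;
    uint32_t ppn;                            /* physical page number */
} pte_t;

typedef enum { OK, PROTECTION_FAULT, PAGE_FAULT } result_t;

/* Translate va to *pa, or report a fault for the OS to handle. */
result_t translate(const pte_t *page_table, uint32_t va, bool is_write,
                   uint32_t *pa) {
    pte_t pte = page_table[va >> PAGE_BITS];      /* index by VPN            */
    if (!pte.valid)
        return PAGE_FAULT;                        /* OS refills and restarts */
    if (is_write ? !pte.writable : !pte.readable)
        return PROTECTION_FAULT;                  /* illegal access          */
    *pa = (pte.ppn << PAGE_BITS) | (va & ((1u << PAGE_BITS) - 1));
    return OK;
}
```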
33
VM: Replacement and Writes
To reduce the page fault rate, the OS uses least-recently-used (LRU) replacement: a reference bit (a.k.a. use bit) in the PTE is set to 1 on each access to the page and periodically cleared to 0 by the OS, so a page with reference bit = 0 has not been used recently.
Disk writes take millions of cycles, so pages are written back a block at a time, not as individual locations. Write-through is impractical; use write-back, with a dirty bit in the PTE set when the page is written. (A sketch follows.)
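A minimal sketch of this reference-bit approximation of LRU. The periodic sweep and the victim scan are illustrative OS policy, not any specific kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool valid, referenced, dirty;
    /* translation fields elided */
} pte_t;

/* Run periodically by the OS: after the sweep, only pages touched since */
/* the last sweep have referenced == true.                               */
void clear_reference_bits(pte_t *pt, size_t npages) {
    for (size_t i = 0; i < npages; i++)
        pt[i].referenced = false;
}

/* On a page fault with memory full: prefer a victim that is both not    */
/* recently used and clean, so no disk write is needed to evict it.      */
size_t pick_victim(const pte_t *pt, size_t npages) {
    for (size_t i = 0; i < npages; i++)
        if (pt[i].valid && !pt[i].referenced && !pt[i].dirty) return i;
    for (size_t i = 0; i < npages; i++)
        if (pt[i].valid && !pt[i].referenced) return i;
    return 0;   /* fallback: every page was recently used */
}
```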
34
Fast Translation Using a TLB
Address translation would appear to require extra memory references: one to access the PTE, and then the actual memory access. But access to page tables has good locality, so use a fast hardware cache of PTEs within the processor, called a Translation Lookaside Buffer (TLB). Typical figures: 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, and a 0.01%-1% miss rate. Misses can be handled by hardware or by software.
35
Fast Translation Using a TLB
36
Translation Lookaside Buffers (TLB)
Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses (in the worst case, 3 memory references plus 2 page faults, i.e., disk accesses).
Solution: cache translations in the TLB. TLB hit: single-cycle translation. TLB miss: page-table walk to refill the TLB.
[Diagram: the virtual address (VPN + offset) is matched against TLB entries holding V, R, W, D bits, a tag, and a PPN; on a hit, the PPN is concatenated with the page offset to form the physical address. (VPN = virtual page number, PPN = physical page number.)]
37
TLB Entries The TLB is a cache for page table entries (PTE)
The data for a TLB entry (== a PTE entry):
Physical page number (frame #).
Access rights (R/W bits).
Any other PTE information (dirty bit, LRU info, etc.).
The tag for a TLB entry:
Virtual page number (the portion of it not used for indexing into the TLB).
Valid bit.
LRU bits, if the TLB is associative and LRU replacement is used.
(A sketch follows.)
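A minimal C sketch of a direct-mapped TLB built from these fields; the size, the layout, and the names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64                        /* illustrative size         */
#define PAGE_BITS   12                        /* 4 KB pages                */

typedef struct {
    bool     valid, dirty, writable;          /* PTE bits cached here      */
    uint32_t tag;                             /* VPN bits not in the index */
    uint32_t ppn;                             /* physical page number      */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Direct-mapped lookup: low VPN bits index the TLB, high VPN bits are    */
/* the tag. On a miss, the page table must be walked to refill the TLB.   */
bool tlb_lookup(uint32_t va, uint32_t *pa) {
    uint32_t vpn   = va >> PAGE_BITS;
    uint32_t index = vpn % TLB_ENTRIES;
    uint32_t tag   = vpn / TLB_ENTRIES;
    if (tlb[index].valid && tlb[index].tag == tag) {
        *pa = (tlb[index].ppn << PAGE_BITS) | (va & ((1u << PAGE_BITS) - 1));
        return true;                          /* hit                       */
    }
    return false;                             /* miss: walk the page table */
}
```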
38
TLB Misses
If the page is in memory:
Load the PTE from memory and retry. This can be handled in hardware, though it gets complex for more complicated page table structures, or in software, by raising a special exception with an optimized handler. This is what MIPS does, using a special vectored interrupt.
If the page is not in memory (page fault):
The OS handles fetching the page and updating the page table, then the faulting instruction is restarted.
39
TLB & Memory Hierarchies
Once the address is translated, it is used to access the memory hierarchy: a hierarchy of caches (L1, L2, etc.).
40
TLB and Cache Interaction
Basic process: use the TLB to get the PA, then use the PA to access the caches and DRAM.
Question: can you ever access the TLB and the cache in parallel? (See the note below.)
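One standard answer, not spelled out on the slide: yes, if the cache's set-index and block-offset bits fall entirely within the page offset, the cache can be indexed with the untranslated bits while the TLB translates the rest (a virtually indexed, physically tagged cache). A small C check of when this works, with illustrative parameters:

```c
#include <stdbool.h>
#include <stdio.h>

/* Parallel TLB + cache access works when index + offset bits fit in the */
/* page offset, i.e. cache_size / associativity <= page_size.            */
bool parallel_access_ok(unsigned cache_bytes, unsigned ways, unsigned page_bytes) {
    return cache_bytes / ways <= page_bytes;
}

int main(void) {
    /* e.g., a 32 KB 8-way cache with 4 KB pages: 32K/8 = 4K <= 4K -> OK  */
    printf("%s\n", parallel_access_ok(32 * 1024, 8, 4096)
                       ? "parallel access OK"
                       : "must translate first");
    return 0;
}
```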
41
Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)
[Diagram: the pipeline (PC, decode, execute, memory, writeback) presents virtual addresses to an instruction TLB and a data TLB; the resulting physical addresses feed the instruction cache and data cache, and then the memory controller and main memory (DRAM). On a TLB miss, a hardware page-table walker uses the page-table base register to fetch PTEs; page faults and protection violations raise exceptions.]
This design assumes the page tables are held in untranslated physical memory.
42
Virtual Memory Summary
Use the hard disk (or Flash) as a large store for the data of all programs; main memory (DRAM) is a cache for the disk, managed jointly by hardware and the operating system (OS).
Each running program has its own virtual address space (the address space shown in the previous figure), protected from the other programs.
Frequently used portions of the virtual address space are copied to DRAM; DRAM = physical address space.
Hardware + OS translate the virtual addresses (VA) used by the program into the physical addresses (PA) used by the hardware. Translation enables relocation and protection.
43
Acknowledgements
These slides contain material from courses: UCB CS152, Stanford EE108B.
Read the book, and do exercises 5.10 (use reference stream a) and 5.11 (use reference stream b).