CMPE 421 Parallel Computer Architecture

PART 5: More Elaborations with Cache and Virtual Memory

Cache Optimization Categories: Reducing Miss Penalty
- Multilevel caches.
- Critical word first: don't wait for the full block to be loaded before sending the requested word and restarting the CPU.
- Giving read misses priority over writes: this optimization serves reads before earlier writes have completed. Consider the sequence below, where all three addresses map to cache index 0:

    SW R3, 512(R0)   ; M[512] ← R3   (cache index 0)
    LW R1, 1024(R0)  ; R1 ← M[1024]  (cache index 0)
    LW R2, 512(R0)   ; R2 ← M[512]   (cache index 0)

  If the write buffer hasn't completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2. The fix is to check the write buffer contents on a read miss before going to memory.
- Victim caches.

Victim Caches
One approach to lowering the miss penalty is to remember what was discarded, in case it is needed again. A victim cache contains only blocks that are discarded from a cache because of a miss ("victims"), and it is checked on a miss to see if it has the desired data before going to the next lower-level memory. The AMD Athlon has a victim cache with eight entries. Jouppi [1990] found that victim caches of one to five entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.
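To make the mechanism concrete, here is a minimal C simulation sketch of a direct-mapped cache backed by a small fully associative victim cache. The sizes, structures, and function names are illustrative assumptions, not any particular processor's design:

    #include <stdbool.h>
    #include <stdint.h>

    #define L1_SETS     64   /* direct-mapped L1 with 64 blocks (illustrative) */
    #define VICTIM_WAYS 4    /* four-entry, fully associative victim cache     */

    typedef struct { bool valid; uint32_t tag; } Line;

    static Line l1[L1_SETS];          /* indexed by set; tag = addr / L1_SETS  */
    static Line victim[VICTIM_WAYS];  /* tag holds the full block address      */

    /* Returns true on a hit (in L1 or in the victim cache). */
    bool access_block(uint32_t block_addr)
    {
        static int next = 0;          /* FIFO pointer for victim replacement   */
        uint32_t set = block_addr % L1_SETS;
        uint32_t tag = block_addr / L1_SETS;

        if (l1[set].valid && l1[set].tag == tag)
            return true;              /* L1 hit */

        /* L1 miss: check all victim entries (fully associative search). */
        for (int w = 0; w < VICTIM_WAYS; w++) {
            if (victim[w].valid && victim[w].tag == block_addr) {
                /* Victim hit: swap the block back into L1; the block it
                   displaces becomes the new occupant of this victim entry. */
                Line displaced = l1[set];
                l1[set] = (Line){ true, tag };
                victim[w] = (Line){ displaced.valid,
                                    displaced.tag * L1_SETS + set };
                return true;
            }
        }

        /* Miss in both: the displaced L1 block becomes a "victim", and the
           requested block is fetched from the next lower-level memory. */
        if (l1[set].valid) {
            victim[next] = (Line){ true, l1[set].tag * L1_SETS + set };
            next = (next + 1) % VICTIM_WAYS;
        }
        l1[set] = (Line){ true, tag };
        return false;
    }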

Cache Optimization Categories: Reducing the Miss Rate and the Hit Time
Reducing the miss rate:
- Larger block size
- Larger cache size
- Higher associativity
- Way prediction and pseudo-associativity: in way prediction, extra bits are kept in the cache to predict the way (the block within the set) of the next cache access.
- Compiler optimizations
Reducing the time to hit in the cache:
- Small and simple caches
- Avoiding address translation
- Pipelined cache access

Cache Optimization: Compiler Techniques
Compiler-based cache optimization reduces the miss rate without any hardware change.
For instructions:
- Reorder procedures in memory to reduce conflict misses.
- Use profiling to determine likely conflicts among groups of instructions.
For data:
- Merging arrays: improve spatial locality by using a single array of compound elements instead of two separate arrays.
- Loop interchange: change the nesting of loops to access data in the order it is stored in memory.
- Loop fusion: combine two independent loops that have the same looping structure and overlapping variables.
- Blocking: improve temporal locality by accessing "blocks" of data repeatedly instead of going down whole columns or rows.

Examples: Merging Arrays and Loop Interchange
- Merging arrays reduces misses by improving spatial locality: two arrays that are accessed simultaneously are combined into one array of compound elements.
- Loop interchange replaces strides through memory of 100 words with sequential accesses, again improving spatial locality. (See the sketch below.)
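A C sketch of the two transformations, following the shapes described above (the 100-word stride suggests a 100-column array); the array names and sizes are illustrative:

    /* Merging arrays: if val[i] and key[i] are always used together,
       one array of structs keeps them in the same cache block. */
    #define SIZE 1000

    /* before: two separate arrays, so each element pulls in two blocks */
    int val[SIZE];
    int key[SIZE];

    /* after: one array of compound elements */
    struct merge { int val; int key; };
    struct merge merged_array[SIZE];

    /* Loop interchange: C stores x[i][j] in row-major order, so the
       inner loop should walk j (stride 1), not i (stride of 100 words). */
    void interchange(int x[][100])
    {
        /* before: strides through memory 100 words at a time
           for (int j = 0; j < 100; j++)
               for (int i = 0; i < 5000; i++)
                   x[i][j] = 2 * x[i][j];                          */

        /* after: sequential accesses, much better spatial locality */
        for (int i = 0; i < 5000; i++)
            for (int j = 0; j < 100; j++)
                x[i][j] = 2 * x[i][j];
    }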

Examples: Loop Fusion
Some programs have separate sections of code that access the same arrays, performing different computations on common data. Fusing multiple loops into a single loop allows the data in the cache to be reused before being swapped out. Loop fusion reduces misses through improved temporal locality (rather than the spatial locality exploited by array merging and loop interchange). Without fusion, accessing arrays "a" and "c" would cause twice the number of misses, as the sketch below illustrates.
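A C sketch of the fusion just described, with the unfused two-loop version shown in comments; the array names and sizes are illustrative:

    /* Loop fusion: two loops over the same arrays become one, so each
       a[i][j] and c[i][j] is reused while it is still in the cache. */
    #define N 100

    void fused(double a[N][N], double b[N][N],
               double c[N][N], double d[N][N])
    {
        /* before: a and c are each traversed twice -> twice the misses
           for (i...) for (j...) a[i][j] = 1.0 / b[i][j] * c[i][j];
           for (i...) for (j...) d[i][j] = a[i][j] + c[i][j];        */

        /* after: one pass; a[i][j] and c[i][j] hit in the cache on
           their second use (improved temporal locality) */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = 1.0 / b[i][j] * c[i][j];
                d[i][j] = a[i][j] + c[i][j];
            }
    }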

Blocking Example

Blocking Example (continued)
- B is called the blocking factor.
- Conflict misses can go down too (not just capacity misses).
- Blocking is also useful for register allocation.
A C sketch of a blocked matrix multiply follows.
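A sketch of blocking applied to matrix multiply, with blocking factor B; the sizes are illustrative assumptions, and x is assumed to be zero-initialized:

    /* Blocked matrix multiply: each B x B submatrix of z (and strip of y)
       is reused many times while resident in the cache, instead of
       streaming whole rows and columns through the cache. */
    #define N 512
    #define B 32   /* blocking factor, chosen so the working set fits in cache */

    void matmul_blocked(double x[N][N], double y[N][N], double z[N][N])
    {
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                for (int i = 0; i < N; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double r = 0.0;
                        for (int k = kk; k < kk + B; k++)
                            r += y[i][k] * z[k][j];
                        x[i][j] += r;   /* x assumed zeroed beforehand */
                    }
    }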

Summary of Performance Equations
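The key equations:

    AMAT = Hit time + Miss rate × Miss penalty
    CPU time = (CPU execution cycles + Memory stall cycles) × Clock cycle time
    Memory stall cycles = Number of misses × Miss penalty
                        = IC × (Misses / Instruction) × Miss penalty
    Misses / Instruction = (Memory accesses / Instruction) × Miss rate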

VIRTUAL MEMORY
Suppose you're running a huge program that requires 32 MB, but your PC has only 16 MB available. One option is to rewrite your program so that it implements overlays:
- Execute the first portion of code (sized to fit in the available memory).
- When you need more memory: find some memory that isn't needed right now, save it to disk, and use that memory for the latter portion of code. And so on...
Virtual memory automates this: the disk acts as an extension of memory, and main memory can act as a "cache" for secondary storage (magnetic disk). Main memory is to disk as registers are to main memory.

A Memory Hierarchy: Extending It with Disk
The hierarchy (CPU registers at the top, then cache, main memory (DRAM), and disk, with load/I-fetch/store traffic at the top) is extended so that main memory acts like a cache for the disk:
- Cache: about $20/MByte, <2 ns access time, 512 KB typical. Hardware manages movement between cache and main memory.
- Memory: about $0.15/MByte, 50 ns access time, 256 MB typical.
- Disk: about $0.0015/MByte, 15 ms (15,000,000 ns) access time, 40 GB typical. Software manages movement between main memory and disk.
The operating system is responsible for managing the movement of memory between disk and main memory, and for keeping the address translation table accurate.

Virtual Memory
Idea: keep only the portions of a program (code, data) that are currently needed in main memory. Currently unused data is saved on disk, ready to be brought in when needed. The result appears as a very large virtual memory, limited only by the disk size.
Advantages:
- Programs that require large amounts of memory can be run (as long as they don't need it all at once).
- Multiple programs can be in virtual memory at once; only active programs will be loaded into physical memory.
- A program can be written (linked) to use whatever addresses it wants to; it doesn't matter where it is physically loaded.
- When a program is loaded, it doesn't need to be placed in contiguous memory locations.
Disadvantages:
- The memory a program needs may all be on disk.
- The operating system has to manage virtual memory.

Virtual Memory
We will focus on using the disk as a storage area for chunks of main memory that are not being used. The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as being the "memory." Two observations make this work: only a few programs are active at any time, and an active program might not need all the memory that has been reserved for it (the rest can stay on the hard disk).

The Virtual Memory Concept
- Virtual memory space: all possible memory addresses (4 GB in 32-bit systems). All that can be held as an option (conceived of).
- Disk swap space: an area on the hard disk that can be used as an extension of memory (typically equal to the RAM size). All that can be used.
- Main memory: physical memory (typically 1 GB). All that physically exists.

The Virtual Memory Concept (continued)
A virtual address falls into one of three cases:
- It can be conceived of, but doesn't correspond to any memory (outside the swap space). Accessing it will produce an error.
- It can be accessed, but is currently only on disk and must be read into main memory before being used. A table maps its virtual address to the disk location (e.g., disk address 58984).
- It can be accessed immediately, since it is already in memory. A table maps its virtual address to its physical address (e.g., physical address 883232); there will also be a backup location on disk (e.g., disk address 322321).

The Process
The CPU deals with virtual addresses. Steps to accessing memory with a virtual address:
1. Convert the virtual address to a physical address.
   - This needs a special table (virtual address -> physical address).
   - The table may indicate that the desired address is on disk but not in physical memory; in that case, read the location from the disk into memory (this may require moving something else out of memory to make room).
2. Do the memory access using the physical address.
   - Check the cache first (note: the cache uses only physical addresses).
   - Update the cache if needed.

Structure of Virtual Memory
Returning to our library analogy: a virtual address is like the title of a book, and a physical address is like the location of that book in the library.
The flow: the virtual address comes from the processor to the address translator, which sends the physical address to memory. On a page fault, an elaborate software page-fault handling algorithm takes over.

Translation
Translation is performed by hardware that converts virtual addresses to physical addresses. Since the hardware accesses memory with physical addresses, we need to convert each logical address to a physical address in hardware. The Memory Management Unit (MMU) provides this functionality: the CPU sends a virtual (logical) address to the MMU, and the MMU sends the corresponding physical (real) address, in the range 0 to 2^n - 1, to physical memory.

Address Translation
In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).

Page Faults
The virtual page number is used to index the page table. If the valid bit is on, the page table supplies the physical page number (i.e., the starting address of the page in memory) corresponding to the virtual page. If the valid bit is off, the page currently resides only on disk, at a specified disk address. In many systems, the table of physical page addresses and the table of disk page addresses, while logically one table, are stored in two separate data structures. Dual tables are justified in part because we must keep the disk addresses of all the pages, even those currently in main memory. If the valid bit for a virtual page is off, a page fault occurs and the operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.
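A minimal C sketch of this lookup; the page-table entry layout and the OS handler hook are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_BITS 13                      /* 8 KB pages, as in the slides */

    typedef struct {
        bool     valid;       /* is the page present in main memory?          */
        uint32_t ppn;         /* physical page number, meaningful if valid    */
        uint32_t disk_addr;   /* where the page lives on disk (always kept)   */
    } PageTableEntry;

    /* Both declared elsewhere: the per-process page table (indexed by VPN),
       and the OS page-fault handler, which reads the page in from
       disk_addr, picks a frame to evict, and returns the new PPN. */
    extern PageTableEntry page_table[];
    extern uint32_t os_handle_page_fault(uint32_t vpn);

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_BITS;
        uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

        if (!page_table[vpn].valid) {
            /* Page fault: trap to the operating system. */
            page_table[vpn].ppn   = os_handle_page_fault(vpn);
            page_table[vpn].valid = true;
        }
        return (page_table[vpn].ppn << PAGE_BITS) | offset;
    }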

Terminology
- Page: the unit of memory transferred between disk and main memory.
- Page fault: occurs when a program accesses a virtual memory location that is not currently in main memory.
- Address translation: the process of finding the physical address that corresponds to a virtual address.
Cache vs. virtual memory terms:
- Block ⇒ Page
- Cache miss ⇒ Page fault
- Block addressing ⇒ Address translation

Differences Between Virtual Memory and Cache
- The miss penalty is huge (millions of clock cycles). Solution: increase the block size (page size) to around 8 KB, because disk transfers have a large startup time but data transfer is relatively fast once started.
- Even on faults (misses) the VM must provide information on the disk location, so the VM system must have an entry for all possible locations.
- When there is a hit, the VM system provides the physical address in memory, not the actual data (in the cache we have the data itself). This saves room: one address rather than 8 KB of data.
- Since the miss penalty is so huge, VM systems typically have a miss (page fault) rate of 0.00001% to 0.0001%.
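To see why the penalty is millions of cycles: with the 15 ms disk access time quoted earlier and an assumed 1 GHz clock (1 ns per cycle), a single page fault costs about 15,000,000 ns / 1 ns = 15 million cycles.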

In Virtual Memory Systems
- Pages should be large enough to amortize the high access time. (Pages from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.)
- Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative placement).
- A sophisticated LRU replacement policy is preferable.
- Page faults can be handled in software.
- Write-back is used (a write-through scheme does not work, since every write would go to disk); we need a scheme that reduces the number of disk writes.

Keeping Track of Pages: The Page Table
- All programs use the same virtual address space, but each program must have its own memory mapping; therefore each program has its own page table to map virtual addresses to physical addresses.
- The page table resides in memory and is pointed to by the page table register.
- The page table has an entry for every possible page (in principle, not in practice...), so no tags are necessary.
- A valid bit indicates whether the page is in memory or on disk.

Virtual to Physical Mapping
Both the virtual and the physical address are broken down into a page number and a page offset (index). There is no tag: all entries are unique.
Example: 4 GB (32-bit) virtual address space, 32 MB (25-bit) physical address space, 8 KB (13-bit) page size (block size).
- Virtual address: bits 31-13 are the virtual page number, bits 12-0 are the page offset.
- Physical address: bits 24-13 are the physical page number, bits 12-0 are the page offset.
Translation (which may involve reading from disk; page tables are stored in main memory) works as follows:
- A 32-bit virtual address is given to the VM hardware.
- The virtual page number (the index) is derived from it by removing the page (block) offset.
- The virtual page number is looked up in the page table. When found, the entry is either the physical page number, if the page is in memory (V = 1), or the disk address, if not (V = 0, a page fault).
- If not found, the address is invalid.
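A small numeric example with these parameters: virtual address 40960 has VPN = 40960 / 8192 = 5 and offset 0; if the page table maps VPN 5 to physical page 3, the physical address is 3 × 8192 + 0 = 24576.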

Virtual Memory Example (32-bit system): 8 KB Page Size, 16 MB Memory
- Virtual address: a 19-bit virtual page number (bits 31-13, the index) plus a 13-bit page offset (bits 12-0).
- The page table has 4 GB / 8 KB = 512K (2^19) entries; each entry holds a valid bit (V), a physical page number, and a disk address.
- Physical address: an 11-bit physical page number (bits 23-13) plus the 13-bit page offset.

Virtual Memory Arithmetic
To size a virtual memory system, compute:
- Bits for the page offset (page size)
- Bits for the virtual page number
- Number of virtual pages
- Entries in the page table
- Bits for the physical page number
- Number of physical pages
- Bits per page table entry
- Total page table size
A worked example follows.
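For the configuration on the previous slide (32-bit virtual addresses, 8 KB pages, 16 MB of physical memory):
- Page offset: 8 KB = 2^13, so 13 bits.
- Virtual page number: 32 - 13 = 19 bits, so 2^19 = 512K virtual pages and 512K page table entries.
- Physical page number: 16 MB = 2^24, so 24 - 13 = 11 bits and 2^11 = 2K physical pages.
- Bits per entry: 11 (PPN) + valid + dirty + reference bits = 14 bits, rounded up in practice; assuming 4 bytes per entry, the total page table size is 512K × 4 B = 2 MB per process.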

Write Issues
Write-through: update both disk and memory.
+ Easy to implement
- Requires a write buffer
- Requires a separate disk write for every write to memory
- A write miss requires reading in the page first, then writing back the single word
Write-back: write only to main memory; write to the disk only when the page is replaced.
+ Writes are fast
+ Multiple writes to a page are combined into one disk write
- Must keep track of whether the page has been written (dirty bit)

Page Replacement Policy
Exact least recently used (LRU) is too expensive, so use approximate LRU:
- A use bit (or reference bit) is added to every page table entry.
- On a hit, the PPN is used to form the address and the reference bit is turned on; thus the bit is set at every access.
- The OS periodically clears all use bits.
- The page to replace is chosen among the ones with their use bit at zero (e.g., one of them is picked as the victim at random).
If the OS chooses to replace a page, the dirty bit indicates whether the page must be written out to disk before its location in memory can be given to another page. A sketch follows.
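A C sketch of the use-bit mechanism together with the dirty-bit check at eviction; the frame-table layout and helper names are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_FRAMES 512   /* physical page frames (illustrative)     */

    typedef struct {
        bool     valid;
        bool     ref;        /* use/reference bit, set on every access  */
        bool     dirty;      /* set on every write to the page          */
        uint32_t vpn;        /* which virtual page occupies this frame  */
    } Frame;

    static Frame frames[NUM_FRAMES];

    /* Invoked periodically by the OS: clear all reference bits. */
    void clear_reference_bits(void)
    {
        for (int i = 0; i < NUM_FRAMES; i++)
            frames[i].ref = false;
    }

    /* Pick a victim: any frame whose reference bit is still zero
       (here, simply the first one found; fall back to frame 0). */
    int choose_victim(void)
    {
        for (int i = 0; i < NUM_FRAMES; i++)
            if (frames[i].valid && !frames[i].ref)
                return i;
        return 0;
    }

    /* On replacement: a dirty page must be written back to disk
       before its frame can be reused (write-back policy). */
    void evict(int f, void (*write_page_to_disk)(uint32_t vpn))
    {
        if (frames[f].dirty)
            write_page_to_disk(frames[f].vpn);
        frames[f].valid = false;
        frames[f].dirty = false;
    }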

Virtual Memory Example
System with a 20-bit virtual address, 16 KB pages, and 256 KB of physical memory. The page offset takes 14 bits, leaving 6 bits for the virtual page number; 256 KB / 16 KB = 16 physical pages, so the physical page number takes 4 bits.

Page table:

    VPN (index)  Valid  PPN / disk address
    000000       1      1001
    000001       0      sector 5000...
    000010       1      0010
    000011       0      sector 4323...
    000100       1      1011
    000101       1      1010
    000110       0      sector 1239...
    000111       1      0001

Access to 0000 1000 1100 1010 1010: VPN = 000010 (the top 6 bits), which is valid with PPN = 0010, so the physical address is 0010 followed by the 14-bit offset: 00 1000 1100 1010 1010.

Access to 0001 1001 0011 1100 0000: VPN = 000110, whose valid bit is 0 (the page is on disk at sector 1239...), so this access causes a page fault. Pick a page to "kick out" of memory (use LRU); assume the LRU page is VPN 000101 for this example, which occupies PPN 1010. Read the data from sector 1239 into PPN 1010 and update both entries: VPN 000110 now maps to PPN 1010, and VPN 000101 is marked invalid with its disk address.