ENGS 116 Lecture 13, Slide 1
Caches and Virtual Memory
Vincent H. Berk, October 31st, 2008
Reading for Today: Sections C.1 – C.3 (Jouppi article)
Reading for Monday: Sections C.4 – C.7
Reading for Wednesday: Sections 5.1 – 5.3

ENGS 116 Lecture 12, Slide 2
Improving Cache Performance
Average memory-access time (AMAT) = Hit time + Miss rate × Miss penalty (ns or clocks)
Improve performance by:
1. Reducing the miss rate
2. Reducing the miss penalty
3. Reducing the time to hit in the cache
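To make the formula concrete, here is a minimal sketch in Python; the cache parameters are illustrative assumptions, not numbers from the lecture:

    def amat(hit_time, miss_rate, miss_penalty):
        """Average memory-access time, in whatever units the inputs use."""
        return hit_time + miss_rate * miss_penalty

    # Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty
    print(amat(1.0, 0.05, 100.0))   # 6.0 cycles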

ENGS 116 Lecture 12, Slide 3
Reducing Miss Rate
– Larger blocks
– Larger cache
– Higher associativity

ENGS 116 Lecture 12, Slide 4
Classifying Misses: 3 Cs
– Compulsory: the first access to a block cannot be in the cache, so the block must be brought in. Also called cold-start misses or first-reference misses. (Misses even in an infinite cache.)
– Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in a fully associative cache of size X.)
– Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way set-associative cache of size X.)
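The 3 Cs can be measured by simulation: compulsory misses are those even an infinite cache would take, capacity misses are the additional misses of a fully associative LRU cache of the same total size, and conflict misses are the extra misses the real placement policy adds on top. A minimal Python sketch over a trace of block addresses (a per-access approximation of this decomposition, with a direct-mapped cache as the target and illustrative inputs):

    from collections import OrderedDict

    def classify_misses(trace, num_blocks):
        """Approximate 3C classification for a direct-mapped cache of num_blocks blocks."""
        seen = set()                 # infinite cache: its misses are compulsory
        fa = OrderedDict()           # fully associative LRU cache, same size
        dm = [None] * num_blocks     # direct-mapped cache, same size
        counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
        for blk in trace:
            if blk not in seen:
                seen.add(blk)
                counts["compulsory"] += 1
            elif blk not in fa:
                counts["capacity"] += 1
            elif dm[blk % num_blocks] != blk:
                counts["conflict"] += 1
            fa.pop(blk, None)        # update LRU state: move block to MRU position
            fa[blk] = True
            if len(fa) > num_blocks:
                fa.popitem(last=False)
            dm[blk % num_blocks] = blk
        return counts

    # Blocks 0 and 4 collide in a 4-block direct-mapped cache -> conflict misses
    print(classify_misses([0, 4, 0, 4, 1, 2, 3, 5, 0], 4))
    # {'compulsory': 6, 'capacity': 1, 'conflict': 2}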

ENGS 116 Lecture 12, Slide 5
3Cs Absolute Miss Rate (SPEC92)
[Figure: absolute miss rate per type (compulsory, capacity, conflict) vs. cache size in KB, for 1-way through 8-way associativity; compulsory misses are vanishingly small]

ENGS 116 Lecture 12, Slide 6
2:1 Cache Rule
Miss rate of a 1-way associative cache of size X ≈ miss rate of a 2-way associative cache of size X/2
[Figure: miss rate per type (compulsory, capacity, conflict) vs. cache size in KB, 1-way through 8-way]

ENGS 116 Lecture 12, Slide 7
3Cs Relative Miss Rate
[Figure: relative miss rate per type (0%–100%) vs. cache size in KB, 1-way through 8-way]
Flaws: for fixed block size
Good: insight

ENGS 116 Lecture 12, Slide 8
How Can We Reduce Misses?
3 Cs: Compulsory, Capacity, Conflict
In all cases, assume total cache size is not changed. What happens if we:
1) Change block size: which of the 3 Cs is obviously affected?
2) Change associativity: which of the 3 Cs is obviously affected?
3) Change the compiler: which of the 3 Cs is obviously affected?

ENGS 116 Lecture 12, Slide 9
1. Reduce Misses via Larger Block Size
[Figure: miss rate vs. block size for several cache sizes]

ENGS 116 Lecture 12, Slide 10
2. Reduce Misses: Larger Cache Size
An obvious improvement, but:
– Longer hit time
– Higher cost
Each cache size favors a block size, based on memory bandwidth
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

ENGS 116 Lecture 12, Slide 11
3. Reduce Misses via Higher Associativity
2:1 Cache Rule:
– Miss rate of a direct-mapped cache of size N ≈ miss rate of a 2-way set-associative cache of size N/2
Beware: execution time is the final measure!
– Will clock cycle time increase?
8-way is almost as good as fully associative

ENGS 116 Lecture 12, Slide 12
Example: Avg. Memory Access Time vs. Miss Rate
Example: assume clock cycle time (CCT) = 1.10 for 2-way, 1.12 for 4-way, and 1.14 for 8-way, relative to the CCT of a direct-mapped cache.
[Table: A.M.A.T. for each cache size (KB) and associativity, 1-way through 8-way; red entries mean A.M.A.T. is not improved by more associativity]
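To see how the slower clock trades against the lower miss rate, a sketch with assumed miss rates and miss penalty (the slide's actual table values are not reproduced here):

    # Relative clock cycle time per associativity, from the slide
    cct = {1: 1.00, 2: 1.10, 4: 1.12, 8: 1.14}
    # Miss rates and miss penalty below are illustrative assumptions
    miss_rate = {1: 0.050, 2: 0.042, 4: 0.040, 8: 0.039}
    miss_penalty = 50                # cycles

    for ways, factor in cct.items():
        amat = 1.0 * factor + miss_rate[ways] * miss_penalty
        print(f"{ways}-way: AMAT = {amat:.2f} cycles")
    # Higher associativity keeps helping here, but with diminishing returns;
    # with a lower miss penalty the slower clock could win (the "red" cases).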

ENGS 116 Lecture 12, Slide 13
Reducing Miss Penalty
– Multilevel caches
– Read priority over write
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

ENGS 116 Lecture 12, Slide 14
1. Reduce Miss Penalty: L2 Caches
L2 equations:
AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2
AMAT = Hit Time_L1 + Miss Rate_L1 × (Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2)
Definitions:
– Local miss rate: misses in this cache divided by the total number of memory accesses to this cache (Miss Rate_L2)
– Global miss rate: misses in this cache divided by the total number of memory accesses generated by the CPU (Miss Rate_L1 × Miss Rate_L2)
– The global miss rate is what matters: it indicates what fraction of the memory accesses from the CPU go all the way to main memory
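Plugging assumed numbers into the L2 equations (all values illustrative):

    hit_l1, miss_rate_l1 = 1, 0.04        # cycles; local L1 miss rate
    hit_l2, miss_rate_l2 = 10, 0.25       # cycles; *local* L2 miss rate
    miss_penalty_l2 = 100                 # cycles to main memory

    miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2   # 35 cycles
    amat = hit_l1 + miss_rate_l1 * miss_penalty_l1              # 2.4 cycles
    global_miss_rate = miss_rate_l1 * miss_rate_l2              # 0.01
    print(amat, global_miss_rate)   # 1% of CPU accesses go all the way to memory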

ENGS 116 Lecture 12, Slide 15
Comparing Local and Global Miss Rates
– 32-KB first-level cache; increasing second-level cache size
– Global miss rate is close to the single-level-cache miss rate, provided L2 >> L1
– Don't use the local miss rate
– L2 is not tied to the CPU clock cycle!
– Cost & A.M.A.T.: generally fast hit times and fewer misses
– Since hits are few, target miss reduction
[Figure: local vs. global miss rate as L2 size grows]

ENGS 116 Lecture 12, Slide 16
L2 Cache Block Size & A.M.A.T.
[Figure: A.M.A.T. vs. L2 block size; 32-KB L1, 8-byte path to memory]

ENGS 116 Lecture 12, Slide 17
2. Reduce Miss Penalty: Read Priority over Write on Miss
Write-through caches with write buffers risk RAW conflicts between buffered writes and main-memory reads on cache misses:
– Simply waiting for the write buffer to empty can increase the read miss penalty (by 50% on the old MIPS 1000)
– Instead, check the write-buffer contents before the read; if there are no conflicts, let the memory access continue
What about write back?
– A read miss may replace a dirty block
– Normal: write the dirty block to memory, then do the read
– Instead: copy the dirty block to a write buffer, then do the read, and then do the write
– The CPU stalls less frequently since it restarts as soon as the read finishes
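A minimal sketch of the read-path check; the data structures and names are invented for illustration:

    class WriteBuffer:
        """Pending stores waiting to drain to main memory."""
        def __init__(self):
            self.pending = {}                # address -> data

        def add(self, addr, data):
            self.pending[addr] = data

    def read_on_miss(addr, write_buffer, memory):
        # Give the read priority, but check the write buffer first: if the
        # freshest copy of this address is still buffered, forward it rather
        # than reading a stale value from memory (the RAW hazard above).
        if addr in write_buffer.pending:
            return write_buffer.pending[addr]
        return memory[addr]

    wb, memory = WriteBuffer(), {0x100: 1}
    wb.add(0x100, 2)                         # buffered store not yet in memory
    print(read_on_miss(0x100, wb, memory))   # 2, not the stale 1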

ENGS 116 Lecture 12, Slide 18
Reducing Hit Time
– Avoid address translation in the cache index
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

ENGS 116 Lecture 13, Slide 19
1. Fast Hits by Avoiding Address Translation
Send the virtual address to the cache? This is called a virtually addressed cache, or virtual cache, as opposed to a physical cache.
– Every time the process is switched, the cache logically must be flushed; otherwise it can return false hits
» Cost: the time to flush plus the "compulsory" misses from an empty cache
– Must handle aliases (sometimes called synonyms): two different virtual addresses that map to the same physical address
Solution to aliases:
– Hardware guarantees each block a unique physical address, OR page coloring is used to ensure that virtual and physical addresses match in their last x bits
Solution to cache flushes:
– Add a process-identifier tag that identifies the process as well as the address within the process: there cannot be a hit if the process is wrong

ENGS 116 Lecture 13, Slide 20
Virtually Addressed Caches
[Diagram: three ways to arrange the CPU, translation buffer (TB), cache ($), and memory (MEM)]
– Conventional organization: CPU → TB → $ → MEM; the VA is translated to a PA before every cache access
– Virtually addressed cache: CPU → $ → TB → MEM; translate only on a miss, but must deal with the synonym problem
– Overlapped organization: the $ is accessed with the VA while the TB translates in parallel, and PA tags are compared afterward (with an L2 $ behind it); requires the $ index to remain invariant across translation

ENGS 116 Lecture 13, Slide 21
2. Fast Cache Hits by Avoiding Translation: Index with the Physical Portion of the Address
If the index comes from the physical part of the address (the page offset), tag access can start in parallel with translation, and the physical tag is compared afterward.
This limits the cache to the page size: what if we want a bigger cache while using the same trick?
– Higher associativity moves the barrier to the right
– Page coloring (the OS requires that all aliases share their lower address bits, which leads to set-associative page placement!)
[Diagram: the address split into Page Address | Page Offset, overlaid with Address Tag | Index | Block Offset]
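The size limit is easy to quantify: the index and block-offset bits must fit within the untranslated page offset, so cache size ≤ page size × associativity. A quick check, assuming 8-KB pages:

    page_size = 8 * 1024                    # 8-KB pages -> 13 untranslated bits
    for ways in (1, 2, 4, 8):
        max_cache = page_size * ways        # index + block offset stay in the page offset
        print(f"{ways}-way: largest such cache = {max_cache // 1024} KB")
    # 1-way: 8 KB, 2-way: 16 KB, 4-way: 32 KB, 8-way: 64 KB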

ENGS 116 Lecture 14, Slide 22
Virtual Memory
Mapping virtual addresses (2^32 or 2^64 bytes) to physical addresses (2^28 bytes)
Virtual memory in terms of a cache:
– Cache block?
– Cache miss?
How is virtual memory different from caches?
– What controls replacement
– Size (transfer unit, mapping mechanisms)
– Lower-level use

ENGS 116 Lecture 14, Slide 23
Figure 5.36: The logical program in its contiguous virtual address space is shown on the left; it consists of four pages, A, B, C, and D.
[Figure: virtual pages A–D at virtual addresses 0–12K; A, C, and B sit in physical main memory (0–28K) while D resides on disk]

ENGS 116 Lecture 14, Slide 24
Figure 5.37: Typical ranges of parameters for caches and virtual memory.

ENGS 116 Lecture 14, Slide 25
Virtual Memory
The 4 questions for virtual memory (VM):
– Q1: Where can a block be placed in the upper level? (fully associative, set associative, or direct mapped?)
– Q2: How is a block found if it is in the upper level?
– Q3: Which block should be replaced on a miss? (random or LRU?)
– Q4: What happens on a write? (write back or write through?)
Other issues: size; pages, segments, or a hybrid

ENGS 116 Lecture 14, Slide 26
Figure 5.40: The mapping of a virtual address to a physical address via a page table.
[Figure: the virtual address splits into a virtual page number and a page offset; the page table maps the virtual page number to a physical page frame in main memory, while the offset passes through unchanged]
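A minimal sketch of the translation in the figure, assuming 8-KB pages (13 offset bits) and a flat page table held as a Python dict; all names and numbers are illustrative:

    PAGE_BITS = 13                        # 8-KB pages
    PAGE_SIZE = 1 << PAGE_BITS

    def translate(vaddr, page_table):
        """Map a virtual address to a physical address via the page table."""
        vpn = vaddr >> PAGE_BITS          # virtual page number
        offset = vaddr & (PAGE_SIZE - 1)  # page offset passes through unchanged
        if vpn not in page_table:
            raise LookupError("page fault")   # the OS would fetch the page from disk
        pfn = page_table[vpn]             # physical page-frame number
        return (pfn << PAGE_BITS) | offset

    page_table = {0: 5, 1: 2}             # VPN -> PFN, illustrative
    print(hex(translate(0x2042, page_table)))   # VPN 1 -> frame 2: 0x4042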

ENGS 116 Lecture 14, Slide 27
Fast Translation: Translation Buffer (TLB)
– A cache of translated addresses
– The data portion usually includes the physical page-frame number, a protection field, a valid bit, a use bit, and a dirty bit
– Alpha data TLB: 32-entry, fully associative
[Figure: the page-frame address (the high-order 21 bits of the virtual address) is compared against all 32 tags at once (32:1 MUX), with V/R/W bits checked; on a hit, the entry's physical page number is concatenated with the 13-bit page offset to form the 34-bit physical address]
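A sketch of the fast path, modeling the TLB as a small fully associative map consulted before the page-table lookup; this reuses PAGE_BITS, PAGE_SIZE, and page_table from the previous sketch, and the eviction policy is omitted for brevity:

    def translate_with_tlb(vaddr, tlb, page_table):
        """A TLB hit avoids the page-table walk; a miss fills the TLB."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & (PAGE_SIZE - 1)
        if vpn in tlb:                    # fully associative: match against any entry
            pfn = tlb[vpn]
        else:
            pfn = page_table[vpn]         # slow path: walk the page table
            tlb[vpn] = pfn                # fill (a real TLB would also evict an entry)
        return (pfn << PAGE_BITS) | offset

    tlb = {}
    print(hex(translate_with_tlb(0x2042, tlb, page_table)))   # miss, then fill
    print(hex(translate_with_tlb(0x2050, tlb, page_table)))   # hit in the TLB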

ENGS 116 Lecture 14, Slide 28
Selecting a Page Size
Reasons for a larger page size:
– Page table size is inversely proportional to the page size, so memory is saved
– A fast cache hit time is easy when the cache ≤ page size (virtually addressed caches); a bigger page makes larger caches feasible
– Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
– The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
Reasons for a smaller page size:
– Fragmentation: don't waste storage; data must be contiguous within a page
– Quicker process start for small processes
Hybrid solution: multiple page sizes
– Alpha: 8-KB, 16-KB, 32-KB, and 64-KB pages (43, 47, 51, and 55 virtual address bits, respectively)
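The inverse relation in the first bullet is easy to check. A quick calculation, assuming a flat page table with 8-byte entries over a 43-bit virtual address space (both assumptions, chosen to match the Alpha numbers above):

    VA_BITS, PTE_BYTES = 43, 8                    # assumed address width and PTE size
    for page_bits in (13, 14, 15, 16):            # 8-KB through 64-KB pages
        entries = 1 << (VA_BITS - page_bits)      # one PTE per virtual page
        size_gb = entries * PTE_BYTES / 2**30
        print(f"{1 << (page_bits - 10)}-KB pages: flat table = {size_gb:.0f} GB")
    # Doubling the page size halves the table; multi-level tables shrink it further.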

ENGS 116 Lecture 14, Slide 29
Alpha VM Mapping
The "64-bit" address is divided into 3 segments:
– seg0 (bit 63 = 0): user code/heap
– seg1 (bit 63 = 1, bit 62 = 1): user stack
– kseg (bit 63 = 1, bit 62 = 0): kernel segment for the OS
Three-level page table, each level one page in size:
– The Alpha uses only 43 bits of VA
– (future minimum page sizes of up to 64 KB raise this to 55 bits of VA)
PTE bits: valid, kernel & user, read & write enable (no reference, use, or dirty bit)
– What do you do?
[Figure: the seg0/seg1 selector and the level-1, level-2, and level-3 fields of the virtual address index the L1, L2, and L3 page tables in turn, starting from the Page Table Base Register; each 8-byte PTE holds a 32-bit address field, and the final physical page-frame number is concatenated with the page offset to form the physical address]
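A sketch of the three-level walk, under assumed field widths (8-KB pages and 10-bit indices per level, which is what yields a 43-bit VA); the nested-dict page tables are illustrative:

    LEVEL_BITS, PAGE_BITS = 10, 13      # 1024 PTEs per level, 8-KB pages -> 43-bit VA

    def walk(vaddr, l1_table):
        """Three-level page-table walk: L1 -> L2 -> L3 -> page frame."""
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        table = l1_table
        for level in (2, 1, 0):         # level-1, level-2, then level-3 index fields
            index = (vaddr >> (PAGE_BITS + level * LEVEL_BITS)) & ((1 << LEVEL_BITS) - 1)
            table = table[index]        # the last step yields the page-frame number
        return (table << PAGE_BITS) | offset

    l1 = {0: {0: {1: 7}}}               # sparse nested tables, illustrative
    print(hex(walk(0x2042, l1)))        # VPN 1 -> frame 7: 0xe042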

ENGS 116 Lecture 14, Slide 30
Memory Hierarchy

ENGS 116 Lecture 14, Slide 31
Protection
Prevent separate processes from accessing each other's memory
– A violation causes a segmentation fault: SIGSEGV
– Useful for multitasking systems
– An operating-system issue
Each process has its own state:
– Page tables
– Heap, text, and stack pages
– Registers, PC
To prevent processes from modifying their own page tables:
– Rings of protection, kernel vs. user
To prevent processes from modifying other processes' memory:
– Page tables point to distinct physical pages

ENGS 116 Lecture 14, Slide 32
Protection 2
Each page needs:
– A PID bit
– Read/Write/Execute bits
Each process needs:
– Stack frame page(s)
– Text or code pages
– Data or heap pages
– State table keeping:
» The PC and other CPU status registers
» The state of all registers

ENGS 116 Lecture 14, Slide 33
Alpha: Separate Instruction & Data TLBs & Caches
– TLBs: fully associative
– TLB updates in SW ("Private Arch Lib")
– Caches: 8-KB, direct mapped, write through
– Critical 8 bytes first
– Prefetch: instruction-stream buffer
– 2-MB L2 cache, direct mapped, write back (off-chip)
– 256-bit path to main memory, 4 × 64-bit modules
– Victim buffer: to give reads priority over writes
– 4-entry write buffer between the D$ and the L2$
[Diagram: instruction and data paths through the stream buffer, write buffer, and victim buffer]