Published by Reynold Ryan. Modified over 9 years ago.
1
Review of Mem. Hierarchy CSCE430/830
Review of Memory Hierarchy & Storage
CSCE430/830 Computer Architecture
Lecturer: Prof. Hong Jiang
Fall, 2008
Portions of these slides are derived from: Dave Patterson © UCB
2
The Principle of Locality
The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
For the last 15 years, hardware has relied on locality for speed.
Locality is a property of programs that is exploited in machine design.
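As a rough illustration of spatial locality, the sketch below counts how often consecutive accesses land in a different memory block when a row-major 2-D array is walked by rows versus by columns. The array shape, element size, and block size are assumed values chosen for illustration; they are not taken from the slides.

```python
def block_changes(addresses, block_size):
    """Count accesses that land in a different block than the previous access."""
    changes = 0
    last_block = None
    for addr in addresses:
        block = addr // block_size
        if block != last_block:
            changes += 1
            last_block = block
    return changes


ROWS, COLS = 64, 64   # assumed array shape
ELEM = 8              # assumed element size in bytes
BLOCK = 64            # assumed block size in bytes


def address(r, c):
    # Row-major layout: elements of one row are adjacent in memory.
    return (r * COLS + c) * ELEM


row_order = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_order = [address(r, c) for c in range(COLS) for r in range(ROWS)]

row_changes = block_changes(row_order, BLOCK)
col_changes = block_changes(col_order, BLOCK)
print(row_changes, col_changes)
```

Walking by rows touches each block once per 8 elements, while walking by columns moves to a new block on every access, which is the access pattern a cache cannot exploit.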
3
Memory Hierarchy - the Big Picture
Problem: memory is too slow and too small
Solution: memory hierarchy
[Figure: Processor (Control, Datapath, Registers), L1 On-Chip Cache, L2 Off-Chip Cache, Main Memory (DRAM), Secondary Storage (Disk)]
Level                       Speed (ns)          Size (bytes)
Registers                   0.25-0.5            <1K
L1 / L2 Cache               0.5-25              <16M
Main Memory (DRAM)          80-250              <16G
Secondary Storage (Disk)    5,000,000 (5 ms)    >100G
4
Fundamental Cache Questions
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
5
Q1: Where can a block be placed in the upper level?
Block 12 placed in an 8-block cache:
  – Fully associative, direct mapped, 2-way set associative
  – S.A. Mapping = (Block Number) Modulo (Number of Sets)
[Figure: an 8-block cache (blocks 0-7) and a 32-block memory (blocks 0-31)]
  – Fully associative: block 12 can go in any block frame
  – Direct mapped: (12 mod 8) = 4
  – 2-way set associative: (12 mod 4) = 0, so set 0
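The mapping rule above can be sketched in a few lines. The function name and the assumption that set i occupies consecutive block frames starting at i times the associativity are illustrative choices, not something the slide specifies.

```python
def candidate_frames(block_number, num_blocks, associativity):
    """Return the cache block frames where a memory block may be placed.

    Assumes set i occupies frames i*associativity .. (i+1)*associativity - 1.
    """
    num_sets = num_blocks // associativity
    set_index = block_number % num_sets      # (Block Number) mod (Number of Sets)
    first = set_index * associativity
    return list(range(first, first + associativity))


direct = candidate_frames(12, 8, 1)    # direct mapped: 8 sets of 1 frame
two_way = candidate_frames(12, 8, 2)   # 2-way: 4 sets of 2 frames
fully = candidate_frames(12, 8, 8)     # fully associative: 1 set of 8 frames
print(direct, two_way, fully)
```

Direct mapped and fully associative are just the two extremes of the same rule: one frame per set, or one set holding every frame.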
6
Q2: How is a block found if it is in the upper level?
Tag on each block
  – No need to check index or block offset
Increasing associativity shrinks index, expands tag
Address layout: | Tag | Index | Block Offset |, where Tag and Index together form the Block Address
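A minimal sketch of the tag / index / block offset split, assuming the block size and the number of sets are powers of two. The function name and the example address are made up for illustration.

```python
def split_address(addr, block_size, num_sets):
    """Split an address into (tag, index, offset).

    Assumes block_size and num_sets are powers of two.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


addr = 0x12345678                       # arbitrary example address
tag_a, index_a, offset_a = split_address(addr, block_size=64, num_sets=128)
# Doubling associativity at fixed capacity halves the number of sets:
tag_b, index_b, offset_b = split_address(addr, block_size=64, num_sets=64)
print(hex(tag_a), index_a, offset_a)
print(hex(tag_b), index_b, offset_b)
```

With half as many sets the index loses one bit and the tag gains one, which is the "shrinks index, expands tag" effect noted on the slide.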
7
Q3: Which block should be replaced on a miss?
Easy for Direct Mapped
Set Associative or Fully Associative:
  – Random
  – LRU (Least Recently Used)
Miss rate by cache size, associativity, and replacement policy:
            2-way             4-way             8-way
Size        LRU     Random    LRU     Random    LRU     Random
16 KB       5.2%    5.7%      4.7%    5.3%      4.4%    5.0%
64 KB       1.9%    2.0%      1.5%    1.7%      1.4%    1.5%
256 KB      1.15%   1.17%     1.13%   1.13%     1.12%   1.12%
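To make the two policies concrete, here is a small set-associative miss counter. It is a toy model under assumed parameters (the trace, the function name, and the seed are invented here) and is not the simulator that produced the miss rates in the table.

```python
import random
from collections import OrderedDict


def count_misses(trace, num_sets, ways, policy="lru", seed=0):
    """Count misses for a trace of block numbers in a set-associative cache."""
    rng = random.Random(seed)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)           # hit: mark as most recently used
            continue
        misses += 1
        if len(s) >= ways:                 # set is full: pick a victim
            if policy == "lru":
                s.popitem(last=False)      # evict the least recently used block
            else:
                del s[rng.choice(list(s))] # evict a random block
        s[block] = True
    return misses


trace = [0, 4, 0, 8, 0, 4]                 # all blocks map to set 0 when num_sets=4
lru_misses = count_misses(trace, num_sets=4, ways=2, policy="lru")
random_misses = count_misses(trace, num_sets=4, ways=2, policy="random")
direct_misses = count_misses(trace, num_sets=4, ways=1)
print(lru_misses, random_misses, direct_misses)
```

With one way per set there is no choice of victim, which is why replacement is "easy" for a direct-mapped cache.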
8
Q4: What happens on a write?
Write-Through:
  – Policy: data written to the cache block is also written to lower-level memory
  – Debug: Easy
  – Do read misses produce writes? No
  – Do repeated writes make it to lower level? Yes
Write-Back:
  – Policy: write data only to the cache; update lower level when a block falls out of the cache
  – Debug: Hard
  – Do read misses produce writes? Yes
  – Do repeated writes make it to lower level? No
Additional option (on miss): let writes to an un-cached address allocate a new cache line ("write-allocate").
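The differences between the two policies can be seen by counting lower-level writes in a deliberately tiny model. The sketch assumes a cache with a single block frame and write-allocate for both policies; the function name and trace are hypothetical.

```python
def lower_level_writes(trace, policy):
    """Count writes reaching the lower level for a one-frame cache.

    trace is a list of (op, block) pairs with op in {"read", "write"}.
    Assumes write-allocate for both policies.
    """
    cached = None
    dirty = False
    writes = 0
    for op, block in trace:
        if block != cached:                       # miss: replace the cached block
            if policy == "write-back" and dirty:
                writes += 1                       # dirty victim is written back
            cached, dirty = block, False
        if op == "write":
            if policy == "write-through":
                writes += 1                       # every write goes to the lower level
            else:
                dirty = True                      # only the cache is updated
    return writes


trace = [("write", 1), ("write", 1), ("write", 1), ("read", 2)]
wt = lower_level_writes(trace, "write-through")
wb = lower_level_writes(trace, "write-back")
print(wt, wb)
```

Under write-through all three repeated writes reach the lower level; under write-back only one write happens, and it is triggered by the read miss that evicts the dirty block.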
9
Set Associative Cache Design
Key idea:
  – Divide cache into sets
  – Allow block anywhere in a set
Advantages:
  – Better hit rate
Disadvantages:
  – More tag bits
  – More hardware
  – Higher access time
[Figure: A Four-Way Set-Associative Cache]
10
Cache Performance Measures
Hit rate: fraction of accesses found in the cache
  – So high that we usually talk about Miss rate = 1 - Hit rate
Hit time: time to access the cache
Miss penalty: time to replace a block from the lower level, including time to replace in CPU
  – access time: time to access the lower level
  – transfer time: time to transfer the block
Average memory-access time (AMAT) = Hit time + Miss rate x Miss penalty (ns or clocks)
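A worked instance of the AMAT formula, using assumed numbers (1-clock hit time, 5% miss rate, 100-clock miss penalty) that are not from the slides:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed example values, in clocks.
base = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
# Halving the miss rate under the same assumptions.
improved = amat(hit_time=1, miss_rate=0.025, miss_penalty=100)
print(base, improved)
```

Because the miss penalty is so much larger than the hit time, even a small miss rate dominates the average.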
11
Cache performance
Miss-oriented approach to memory access:
  – CPI_Execution includes ALU and memory instructions
Separating out the memory component entirely:
  – AMAT = Average Memory Access Time
  – CPI_ALUOps does not include memory instructions
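The equations for this slide are not present in the text. The sketch below uses the usual textbook forms that match the labels above; treat both formulas as an assumption about what the slide showed, and all numeric values as invented for illustration.

```python
def cpu_time_miss_oriented(ic, cpi_execution, mem_accesses_per_inst,
                           miss_rate, miss_penalty, cycle_time):
    # Assumed form: IC * (CPI_Execution + MemAccess/Inst * MissRate * MissPenalty) * CycleTime
    return ic * (cpi_execution
                 + mem_accesses_per_inst * miss_rate * miss_penalty) * cycle_time


def cpu_time_separated(ic, alu_ops_per_inst, cpi_alu_ops,
                       mem_accesses_per_inst, amat, cycle_time):
    # Assumed form: IC * (AluOps/Inst * CPI_ALUOps + MemAccess/Inst * AMAT) * CycleTime
    return ic * (alu_ops_per_inst * cpi_alu_ops
                 + mem_accesses_per_inst * amat) * cycle_time


# Assumed workload: half ALU ops, half memory ops, 1-clock hits.
ic, cycle_time = 1_000_000, 1e-9
hit_time, miss_rate, miss_penalty = 1, 0.02, 50
cpi_execution = 0.5 * 1 + 0.5 * hit_time          # memory ops counted as hits
amat = hit_time + miss_rate * miss_penalty

t_miss = cpu_time_miss_oriented(ic, cpi_execution, 0.5,
                                miss_rate, miss_penalty, cycle_time)
t_sep = cpu_time_separated(ic, 0.5, 1, 0.5, amat, cycle_time)
print(t_miss, t_sep)
```

Under these assumptions the two views give the same CPU time; they differ only in whether the hit time of memory instructions is folded into CPI_Execution or into AMAT.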
12
Details of Page Table
Page table maps virtual page numbers to physical frames ("PTE" = Page Table Entry)
Virtual memory => treat memory as a cache for disk
[Figure: Physical Memory Space. The virtual address (virtual page no., 12-bit offset) is used to index into the page table, which is located in physical memory at the address held in the Page Table Base Reg. Each entry holds V (valid), Access Rights, and PA. The physical address is the physical page no. (frame) followed by the same 12-bit offset.]
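A minimal sketch of a single-level translation with the 12-bit offset shown on the slide. The dictionary-based page table, the `PageFault` exception, and the example mapping are hypothetical; access-rights checks are left out.

```python
PAGE_OFFSET_BITS = 12   # 12-bit offset as on the slide (4 KB pages)


class PageFault(Exception):
    """Raised when the entry is missing or its valid bit is clear."""


def translate(vaddr, page_table):
    """Translate a virtual address using a {vpn: {"valid", "frame"}} table."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(hex(vaddr))
    # The offset passes through unchanged; only the page number is mapped.
    return (entry["frame"] << PAGE_OFFSET_BITS) | offset


page_table = {0x12: {"valid": True, "frame": 0x345}}   # made-up mapping
paddr = translate(0x12ABC, page_table)
print(hex(paddr))
```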
13
Two-level Page Tables
Page tables may not fit in memory!
  – A table for 4KB pages for a 32-bit address space has 1M entries
  – Each process needs its own address space!
32-bit virtual address:
  – P1 index: bits 31-22
  – P2 index: bits 21-12
  – Page Offset: bits 11-0
Top-level table wired in main memory
Subset of 1024 second-level tables in main memory; rest are on disk or unallocated
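The 10/10/12 split of the 32-bit virtual address can be sketched as follows. Nested dictionaries stand in for the top-level and second-level tables, and the example mapping is made up; a real table would also carry valid and protection bits.

```python
def split_virtual_address(vaddr):
    """Split a 32-bit virtual address into (P1 index, P2 index, page offset)."""
    p1 = (vaddr >> 22) & 0x3FF    # bits 31-22
    p2 = (vaddr >> 12) & 0x3FF    # bits 21-12
    offset = vaddr & 0xFFF        # bits 11-0
    return p1, p2, offset


def walk(vaddr, top_level):
    """Return the physical address, or None if a table or page is absent."""
    p1, p2, offset = split_virtual_address(vaddr)
    second_level = top_level.get(p1)
    if second_level is None:      # second-level table on disk or unallocated
        return None
    frame = second_level.get(p2)
    if frame is None:
        return None
    return (frame << 12) | offset


flat_entries = 2**32 // 2**12     # entries in a flat table for 4 KB pages
top_level = {1: {3: 0x77}}        # made-up mapping
parts = split_virtual_address(0x00403ABC)
paddr = walk(0x00403ABC, top_level)
print(flat_entries, parts, hex(paddr))
```

Only the second-level tables that are actually in use need to exist, which is what keeps the per-process cost below the 1M entries of a flat table.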
14
The TLB caches page table entries
V=0 pages either reside on disk or have not yet been allocated. The OS handles V=0 as a "page fault".
Physical and virtual pages must be the same size!
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
[Figure: a virtual address (page, offset) is looked up in the TLB, which holds page-to-frame entries from the page table for an ASID, giving a physical address (frame, offset)]
15
Virtually Indexed, Physically Tagged Cache
What is the motivation?
  – Fast cache hit by parallel TLB access
  – No virtual cache shortcomings
How could it be correct?
  – Require cache way size <= page size; now the physical index comes from the page offset
  – Then virtual and physical indices are identical ⇒ works like a physically indexed cache!
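The way-size condition is easy to check numerically. The cache and page sizes below are assumed examples, and the function name is hypothetical.

```python
def index_within_page_offset(cache_size, associativity, page_size):
    """True if index + block offset bits fit inside the page offset.

    Way size = cache size / associativity; the condition is way size <= page size.
    """
    way_size = cache_size // associativity
    return way_size <= page_size


KB = 1024
ok_8way = index_within_page_offset(32 * KB, 8, 4 * KB)   # 4 KB per way
ok_4way = index_within_page_offset(32 * KB, 4, 4 * KB)   # 8 KB per way
print(ok_8way, ok_4way)
```

When the check fails, some index bits come from the virtual page number, so the virtual and physical indices can differ.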
16
Virtually Indexed, Physically Tagged Cache
[Figure]
17
Summary #1/3: The Cache Design Space
Several interacting dimensions
  – cache size
  – block size
  – associativity
  – replacement policy
  – write-through vs write-back
  – write allocation
The optimal choice is a compromise
  – depends on access characteristics
    » workload
    » use (I-cache, D-cache, TLB)
  – depends on technology / cost
Simplicity often wins
[Figure: design space with axes Cache Size, Associativity, and Block Size; a Bad-to-Good versus Less-to-More curve for Factor A and Factor B]
18
Summary #2/3: Caches
The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
    » Temporal Locality: Locality in Time
    » Spatial Locality: Locality in Space
Three Major Categories of Cache Misses:
  – Compulsory Misses: sad facts of life. Example: cold start misses.
  – Capacity Misses: increase cache size
  – Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
Write Policy: Write Through vs. Write Back
Today CPU time is a function of (ops, cache misses) vs. just f(ops): affects Compilers, Data structures, and Algorithms
19
Summary #3/3: TLB, Virtual Memory
Page tables map virtual addresses to physical addresses
TLBs are important for fast translation
TLB misses are significant in processor performance
  – funny times, as most systems can't access all of 2nd level cache without TLB misses!
Caches, TLBs, Virtual Memory all understood by examining how they deal with 4 questions:
  1) Where can block be placed?
  2) How is block found?
  3) What block is replaced on miss?
  4) How are writes handled?
Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is more important than memory hierarchy benefits, but computers are insecure
Prepare for debate + quiz on Wednesday
20
Summary of Virtual Machine Monitor
Virtual Machine Revival
  – Overcome security flaws of modern OSes
  – Processor performance no longer highest priority
  – Manage Software, Manage Hardware
"… VMMs give OS developers another opportunity to develop functionality no longer practical in today's complex and ossified operating systems, where innovation moves at geologic pace." [Rosenblum and Garfinkel, 2005]
Virtualization challenges for processor, virtual memory, I/O
  – Paravirtualization, ISA upgrades to cope with those difficulties
Xen as example VMM using paravirtualization
  – 2005 performance on non-I/O bound, I/O intensive apps: 80% of native Linux without driver VM, 34% with driver VM
Opteron memory hierarchy still critical to performance
21
Disk Device Performance
[Figure: disk with labels Platter, Spindle, Arm, Actuator, Head, Sector, Inner Track, Outer Track, Controller]
Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
Seek Time? Depends on the number of tracks the arm must move and the seek speed of the disk
Rotation Time? Depends on how fast the disk rotates and how far the sector is from the head
Transfer Time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
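A worked instance of the latency sum. The drive parameters are assumed for illustration, the average rotation time is taken as half a revolution (a common assumption, not stated on the slide), and 1 MB is treated as 10^6 bytes.

```python
def disk_latency_ms(seek_ms, rpm, request_bytes, transfer_mb_per_s, controller_ms):
    """Seek + average rotation + transfer + controller overhead, in ms."""
    rotation_ms = 0.5 * 60_000 / rpm                       # half a revolution on average
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms + controller_ms


# Assumed drive: 5 ms seek, 7200 RPM, 50 MB/s, 0.2 ms controller overhead.
latency = disk_latency_ms(seek_ms=5, rpm=7200, request_bytes=4096,
                          transfer_mb_per_s=50, controller_ms=0.2)
print(round(latency, 3))
```

For a small request like this one, seek and rotation account for nearly all of the latency; transfer time only matters for large requests.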
22
Redundant Arrays of (Inexpensive) Disks
Files are "striped" across multiple disks
Redundancy yields high data availability
  – Availability: service still provided to user, even if some components failed
Disks will still fail
Contents reconstructed from data redundantly stored in the array
  – Capacity penalty to store redundant info
  – Bandwidth penalty to update redundant info
23
Summary: RAID Techniques: Goal was performance, popularity due to reliability of storage
Disk Mirroring, Shadowing (RAID 1)
  – Each disk is fully duplicated onto its "shadow"
  – Logical write = two physical writes
  – 100% capacity overhead
Parity Data Bandwidth Array (RAID 3)
  – Parity computed horizontally
  – Logically a single high data bw disk
High I/O Rate Parity Array (RAID 5)
  – Interleaved parity blocks
  – Independent reads and writes
  – Logical write = 2 reads + 2 writes
[Figure: example bit patterns stored across the disks of each array]
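Parity-based redundancy can be sketched with XOR over small integers standing in for disk blocks. The block values and function names are invented for illustration; real arrays operate on whole sectors or stripes.

```python
from functools import reduce


def parity(blocks):
    """XOR of all blocks: the horizontal parity across the stripe."""
    return reduce(lambda a, b: a ^ b, blocks)


def reconstruct(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])


def small_write_parity(old_data, new_data, old_parity):
    """RAID 5 small write: read old data and old parity (2 reads),
    then write new data and this new parity (2 writes)."""
    return old_parity ^ old_data ^ new_data


data = [0b10010011, 0b11001101, 0b10010011]    # assumed data blocks
p = parity(data)
recovered = reconstruct([data[0], data[2]], p) # pretend disk 1 failed

new_block = 0b00001111
new_p = small_write_parity(data[1], new_block, p)
print(bin(p), bin(recovered), bin(new_p))
```

The small-write shortcut updates parity without touching the other data disks, which is where the "2 reads + 2 writes" cost of a logical write comes from.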