Lecture 12: Storage Hierarchy
CS510 Computer Architecture
Who Cares about Memory Hierarchy?
The course has dealt with the processor only so far:
–CPU cost/performance, ISA, pipelined execution
In 1980 microprocessors had no cache; by 1995 the Alpha 21164 microprocessor had a 2-level on-chip cache that used 60% of its transistors.
The CPU-DRAM performance gap keeps growing.
General Principles
Locality
–Temporal locality: a recently referenced item is likely to be referenced again soon
–Spatial locality: items near a referenced item are likely to be referenced soon
Locality + "smaller hardware is faster" => memory hierarchy
–Levels: each level is smaller, faster, and more expensive per byte than the level below it
–Inclusive: data found in the upper level is also found in the lower level
Definitions
–Upper level: the level closer to the processor
–Block: the minimum unit of data that is either present or not present in the upper level
–Address = block frame address + block offset address
–Hit time: time to access the upper level, including the time to determine hit or miss
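As an illustration of spatial locality (not part of the original slides), the sketch below contrasts two traversal orders over a row-major byte buffer; the buffer size and one-byte elements are arbitrary assumptions, and the point is the access pattern, not measured Python timing.

```python
# Minimal sketch: loop order and spatial locality over a row-major buffer.
N = 512
ROW_BYTES = N                  # one byte per element, rows laid out contiguously
data = bytearray(N * N)

def sum_row_major(buf):
    # Good spatial locality: consecutive iterations touch adjacent bytes, so
    # most accesses fall in the same cache block as the previous one.
    total = 0
    for i in range(N):
        for j in range(N):
            total += buf[i * ROW_BYTES + j]
    return total

def sum_column_major(buf):
    # Poor spatial locality: consecutive iterations are ROW_BYTES apart and
    # touch a different cache block almost every time.
    total = 0
    for j in range(N):
        for i in range(N):
            total += buf[i * ROW_BYTES + j]
    return total
```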
Measures
Hit rate: fraction of accesses found in that level
–Usually so high that we talk about the miss rate (or fault rate) instead
–Miss-rate fallacy: miss rate can be as misleading a measure of memory performance as MIPS is of CPU performance; what matters is average memory access time
Average memory access time = Hit time + Miss rate x Miss penalty (in ns or clock cycles)
Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU
–Access time: time to access the lower level (a function of lower-level latency)
–Transfer time: time to transfer the block (a function of the bandwidth between the upper and lower levels and of the block size)
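The formula can be checked in a few lines; the hit time, miss rate, and miss penalty values below are made up for illustration and are not from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Example: 1-cycle hit, 5% miss rate, 40-cycle miss penalty -> 3.0 cycles.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=40))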
Block Size vs. Measures
Increasing the block size generally increases the miss penalty and, up to a point, decreases the miss rate.
Miss rate and miss penalty combine with hit time to give the average memory access time (AMAT), so there is an optimal block size.
[Figures: Miss Penalty vs. Block Size (access time + transfer time), Miss Rate vs. Block Size, Average Memory Access Time vs. Block Size]
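A rough sketch of this tradeoff is shown below; the latency, bandwidth, and miss-rate model are invented for illustration only, but they reproduce the U-shaped AMAT curve from the figures.

```python
ACCESS_TIME = 40      # cycles of lower-level latency (assumed)
WORDS_PER_CYCLE = 1   # transfer bandwidth between levels (assumed)
HIT_TIME = 1          # cycles (assumed)

def miss_penalty(block_words):
    # Miss penalty = access time + transfer time; grows with block size.
    return ACCESS_TIME + block_words / WORDS_PER_CYCLE

def miss_rate(block_words):
    # Toy model: larger blocks exploit spatial locality (fewer misses) up to
    # a point, after which useless data crowds out useful blocks.
    return 0.10 / block_words + 0.002 * block_words

for words in (1, 2, 4, 8, 16, 32):
    avg = HIT_TIME + miss_rate(words) * miss_penalty(words)
    print(f"block = {words:2d} words, AMAT = {avg:.2f} cycles")
```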
Implications for CPU
Hit check must be fast, since every memory access needs it
–Hits are the common case
Memory access time is unpredictable
–10s of clock cycles: stall and wait
–1000s of clock cycles: interrupt, switch, and do something else
 New style: multithreaded execution
How to handle a miss? (10s of cycles => hardware, 1000s => software)
4 Questions for Memory Hierarchy Designers
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
Q1: Block Placement: Where can a Block be Placed in the Upper Level?
Example: memory block 12 placed in an 8-block cache
–Fully Associative (FA): block 12 can go into any of the 8 block frames
–Direct Mapped (DM): block 12 can go only into frame 4, since (12 mod 8) = 4
–2-way Set Associative (SA): block 12 can go anywhere in set 0, since (12 mod 4) = 0
–SA mapping: set number = (block number) mod (number of sets)
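The slide's example can be written out directly; the numbering of frames within a set below assumes sets occupy contiguous frames, which is one common layout convention.

```python
# Where can memory block 12 go in an 8-block cache under each placement policy?
CACHE_BLOCKS = 8
block = 12

# Direct mapped: exactly one frame, (block number) mod (number of frames).
dm_frame = block % CACHE_BLOCKS                        # 12 mod 8 = 4

# n-way set associative: one set, (block number) mod (number of sets);
# the block may occupy any of the n frames within that set.
ways = 2
num_sets = CACHE_BLOCKS // ways                        # 4 sets
sa_set = block % num_sets                              # 12 mod 4 = 0
sa_frames = [sa_set * ways + w for w in range(ways)]   # frames 0 and 1 (assumed layout)

# Fully associative: effectively a single set containing all frames.
fa_frames = list(range(CACHE_BLOCKS))
```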
Q2: Block Identification: How to Find a Block in the Upper Level?
A tag on each block identifies it
–No need to check the index or block offset bits when comparing
Increasing associativity shrinks the index and expands the tag
–Fully associative: no index
–Direct mapped: large index
Address fields: Block Address = Tag | Index, followed by the Block Offset
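A sketch of splitting an address into tag, index, and block offset is shown below; the cache geometry (size, block size, associativity) is assumed for illustration and is not from the lecture.

```python
import math

CACHE_BYTES = 16 * 1024
BLOCK_BYTES = 32
WAYS        = 2

num_sets    = CACHE_BYTES // (BLOCK_BYTES * WAYS)   # 256 sets
offset_bits = int(math.log2(BLOCK_BYTES))           # 5 bits
index_bits  = int(math.log2(num_sets))              # 8 bits

def split_address(addr):
    offset = addr & (BLOCK_BYTES - 1)                # within-block byte
    index  = (addr >> offset_bits) & (num_sets - 1)  # selects the set
    tag    = addr >> (offset_bits + index_bits)      # compared on lookup
    return tag, index, offset

# Doubling WAYS halves num_sets: the index shrinks by one bit and the tag
# grows by one bit; a fully associative cache has no index bits at all.
```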
Q3: Block Replacement: Which Block Should be Replaced on a Miss?
Easy for direct mapped: only one candidate
For set associative or fully associative caches:
–Random
–LRU (least recently used)
Miss rates by replacement policy:

Cache Size   2-way LRU   2-way Random   4-way LRU   4-way Random   8-way LRU   8-way Random
16 KB        5.18%       5.69%          4.67%       5.29%          4.39%       4.96%
64 KB        1.88%       2.01%          1.54%       1.66%          1.39%       1.53%
256 KB       1.15%       1.17%          1.13%       1.13%          1.12%       1.12%
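A minimal sketch of the two policies for a single cache set follows (not from the slides); tags are arbitrary hashable values, and the set size is whatever `ways` is set to.

```python
import random
from collections import OrderedDict

def lru_set_misses(ways, accesses):
    blocks = OrderedDict()              # ordered oldest -> most recently used
    misses = 0
    for tag in accesses:
        if tag in blocks:
            blocks.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(blocks) == ways:
                blocks.popitem(last=False)   # evict the least recently used
            blocks[tag] = True
    return misses

def random_set_misses(ways, accesses):
    blocks, misses = [], 0
    for tag in accesses:
        if tag not in blocks:
            misses += 1
            if len(blocks) == ways:
                blocks.pop(random.randrange(ways))   # evict a random block
            blocks.append(tag)
    return misses
```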
Q4: Write Strategy: What Happens on a Write?
In DLX integer programs, stores are 9% and loads 26% of instructions
–Stores are 9% / (100% + 26% + 9%) ≈ 7% of the overall memory traffic
–Stores are 9% / (26% + 9%) ≈ 25% of the data cache traffic
–Reads are the majority, so make the common case fast: optimize caches for reads
–High-performance designs still cannot neglect the speed of writes
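The two fractions follow from counting one instruction fetch per instruction plus the load and store frequencies:

```python
# Checking the slide's traffic fractions for DLX integer programs.
stores, loads, fetches = 0.09, 0.26, 1.00

store_share_all  = stores / (fetches + loads + stores)  # ~0.067 -> about 7%
store_share_data = stores / (loads + stores)            # ~0.257 -> about 25%
```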
Q4: What Happens on a Write?
Write Through: the information is written both to the block in the cache and to the block in the lower-level memory.
Write Back: the information is written only to the block in the cache. The modified cache block (dirty block) is written to main memory only when it is replaced.
–Requires a state bit recording whether each block is clean or dirty
Pros and cons of each:
–WT: read misses never cause writes to the lower level (no dirty blocks to write back on replacement)
–WB: repeated writes to the same block require only one write to the lower level
WT needs to be combined with write buffers so that the CPU does not wait for the lower-level memory.
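The contrast can be sketched for a single cache block as below; the class names and structure are invented for illustration and are not a full cache model.

```python
class WriteThroughBlock:
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
        self.data = memory[addr]

    def write(self, value):
        self.data = value
        self.memory[self.addr] = value   # every write also goes to memory

class WriteBackBlock:
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
        self.data = memory[addr]
        self.dirty = False               # clean until the first write

    def write(self, value):
        self.data = value
        self.dirty = True                # memory is now stale

    def evict(self):
        if self.dirty:                   # write memory only on replacement
            self.memory[self.addr] = self.data
```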
Q4: What Happens on a Write? (Write Misses)
Write miss options:
–Write Allocate (fetch on write): the block is brought into the cache, then the write proceeds as a write hit
–No-Write Allocate (write around): the write goes directly to the lower level and the block is not brought into the cache
Write-back caches generally use write allocate, while write-through caches often use no-write allocate.
Summary
The CPU-memory gap is a major performance obstacle, for both hardware and software.
Take advantage of program behavior: locality.
Execution time of a program is still the only reliable performance measure.
The 4 questions of memory hierarchy design:
–Block placement
–Block identification
–Block replacement
–Write strategy