Slide 1: Memory Hierarchy Design
Chapter 2 (and Appendix B) of Computer Architecture: A Quantitative Approach, Fifth Edition. Copyright © 2012, Elsevier Inc. All rights reserved.
Slide 2: In the beginning…
- Main memory (i.e., RAM) was faster than processors: memory access times were less than clock cycle times.
- Everything was "great" until ~1980.
Slide 3: Memory Performance Gap (figure)
Slide 4: Memory Performance Gap
- Programmers want unlimited amounts of memory with low latency.
- Fast memory technology is more expensive per bit than slower memory.
- Solution: organize the memory system into a hierarchy:
  - The entire addressable memory space is available in the largest, slowest memory.
  - Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor.
- Temporal and spatial locality ensure that nearly all references can be found in the smaller memories.
- This gives the processor the illusion of a single large, fast memory.
Slide 5: Memory Hierarchy (figure)
Slide 6: Memory Hierarchy Design
- Memory hierarchy design becomes even more crucial with recent multi-core processors, since aggregate peak bandwidth grows with the number of cores:
  - An Intel Core i7 can generate two data references per core per clock.
  - With four cores and a 3.2 GHz clock: 25.6 billion 64-bit data references/second + 12.8 billion 128-bit instruction references/second = 409.6 GB/s! (The arithmetic is worked in the sketch below.)
  - DRAM bandwidth is only 6% of this (25 GB/s).
- This requires:
  - Multi-ported, pipelined caches
  - Two levels of cache per core
  - A shared third-level cache on chip
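The bandwidth arithmetic above, checked in a few lines of C (the per-clock reference counts come from the slide; the rest is unit conversion):

    #include <stdio.h>

    int main(void) {
        const double cores = 4, clock_hz = 3.2e9;
        double data_bw = cores * clock_hz * 2 * 8;   /* 2 x 64-bit data refs/clock */
        double inst_bw = cores * clock_hz * 1 * 16;  /* 1 x 128-bit inst ref/clock */
        double total   = (data_bw + inst_bw) / 1e9;  /* -> 409.6 GB/s              */
        printf("peak demand: %.1f GB/s\n", total);
        printf("25 GB/s DRAM covers %.1f%%\n", 25.0 / total * 100.0);
        return 0;
    }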
Slide 7: Performance and Power
- High-end microprocessors have >10 MB of on-chip cache.
- This consumes a large share of the area and power budget.
Slide 8: Cache Hits
- When a word is found in the cache, a hit occurs: yay!
Slide 9: Cache Misses
- When a word is not found in the cache, a miss occurs:
  - Fetch the word from a lower level in the hierarchy, requiring a higher-latency reference.
  - The lower level may be another cache or main memory.
  - Also fetch the other words contained in the block, taking advantage of spatial locality.
  - Place the block into the cache at a location determined by its address.
Slide 10: Causes of Misses
- Miss rate: the fraction of cache accesses that result in a miss.
- Causes of misses (the "three Cs"):
  - Compulsory: the first reference to a block.
  - Capacity: blocks discarded for lack of space and later retrieved.
  - Conflict: the program makes repeated references to multiple addresses from different blocks that map to the same location in the cache.
Slide 11: Causes of Misses (figure)
Slide 12: Quantifying Misses
- Note that speculative and multithreaded processors may execute other instructions during a miss, which reduces the performance impact of misses.
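For reference, the standard formula the textbook uses to quantify miss cost is the average memory access time:

    Average memory access time = Hit time + Miss rate × Miss penalty

The note above is the caveat: a processor that overlaps misses with other instructions pays less than this nominal cost.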
Slide 13: Cache Design
Four questions:
1. Where can a block be placed in the cache?
2. How is a block found?
3. Which block should be replaced?
4. How is a write handled?
Slide 14: Block Placement
- Divide the cache into sets to reduce conflict misses:
  - n blocks per set => n-way set associative
  - Direct-mapped cache => one block per set
  - Fully associative => one set
Slide 15: Block Placement (figure)
Slide 16: Block Placement
- For a direct-mapped cache, the address determines the block's location uniquely.
- For example, with 32-bit addresses, 64 blocks, and 8-byte blocks:
  - The low-order 3 address bits determine the byte location within a block.
  - The high-order 32 - 3 = 29 bits determine the block's location in the cache.
  - With 64 blocks, the location is the value of those high-order 29 bits mod 64.
  - Since 64 = 2^6, we can simply use the low-order 6 bits of those 29 bits!
Slide 17: Block Placement
- In general, the block address is divided into tag and index based on associativity:
  - An n-way set-associative cache has blocks/associativity sets, so the index is lg(blocks/associativity) bits.
  - For example, a 2-way set-associative cache with 64 blocks has 32 sets, so 5 index bits.
  - The tag is used to identify the block during lookup.
- (See the address-decoding sketch below.)
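A minimal sketch of this tag/index/offset split, assuming 32-bit addresses and power-of-two geometry (the decode function and the example address are illustrative, not from the slides):

    #include <stdint.h>
    #include <stdio.h>

    /* Split a 32-bit address into tag/index/offset for a cache with
       the given geometry; block_size and num_sets are powers of two. */
    static void decode(uint32_t addr, uint32_t block_size, uint32_t num_sets,
                       uint32_t *tag, uint32_t *index, uint32_t *offset) {
        *offset = addr % block_size;               /* byte within block  */
        *index  = (addr / block_size) % num_sets;  /* lg(num_sets) bits  */
        *tag    = (addr / block_size) / num_sets;  /* remaining high bits */
    }

    int main(void) {
        uint32_t tag, index, offset;
        uint32_t addr = 0x12345678;  /* illustrative address */

        decode(addr, 8, 64, &tag, &index, &offset);  /* direct-mapped, slide 16 */
        printf("direct: tag=0x%X index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);

        decode(addr, 8, 32, &tag, &index, &offset);  /* 2-way, 32 sets, slide 17 */
        printf("2-way:  tag=0x%X index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }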
Slide 18: Causes of Misses (figure)
Slide 19: Cache Lookup
- Easy for direct-mapped: each block has a unique location, so just verify the tag.
- For n-way: check the tag at that index in all n ways of the set simultaneously.
- Higher associativity means more complex hardware (i.e., slower).
- Also, each entry needs a "valid" bit to avoid initialization issues.
- (A lookup sketch follows below.)
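A sketch of the lookup logic this slide describes, modeled sequentially in C (real hardware compares all ways in parallel; names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WAYS 2

    struct line { bool valid; uint32_t tag; };  /* data payload omitted */

    /* Check every way of the indexed set; the valid bit guards
       lines that have never been filled since power-on. */
    static bool lookup(const struct line set[WAYS], uint32_t tag, int *way) {
        for (int w = 0; w < WAYS; w++) {
            if (set[w].valid && set[w].tag == tag) {
                *way = w;
                return true;   /* hit */
            }
        }
        return false;          /* miss */
    }

    int main(void) {
        struct line set[WAYS] = { { true, 0xAABB }, { false, 0 } };
        int w;
        printf("0xAABB -> %s\n", lookup(set, 0xAABB, &w) ? "hit" : "miss");
        printf("0xCBAF -> %s\n", lookup(set, 0xCBAF, &w) ? "hit" : "miss");
        return 0;
    }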
Slide 20: Opteron L1 Data Cache (2-way) (figure)
Slide 21: Replacement
- Easy for direct-mapped: each block has a unique location (again).
- For higher associativities, common policies are:
  - Least Recently Used (LRU)
  - First In, First Out (FIFO)
  - Random
- (An LRU sketch follows below.)
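One simple way to realize LRU, sketched with per-way use timestamps (illustrative; real caches typically use cheaper approximations such as pseudo-LRU):

    #include <stdint.h>
    #include <stdio.h>

    #define WAYS 4

    /* One timestamp per way; the smallest is the least recently used. */
    static uint64_t last_use[WAYS];
    static uint64_t now;

    static int lru_victim(void) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (last_use[w] < last_use[victim]) victim = w;
        return victim;
    }

    static void touch(int way) { last_use[way] = ++now; }

    int main(void) {
        touch(0); touch(1); touch(2); touch(3);
        touch(1);                                  /* way 0 is now LRU */
        printf("victim: way %d\n", lru_victim());  /* prints 0 */
        return 0;
    }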
Slide 22: Writes
- Writing to the cache: two strategies:
  - Write-through: immediately update the lower levels of the hierarchy.
  - Write-back: only update the lower levels when an updated block is replaced (tracked with a dirty bit).
- Both strategies use a write buffer to make writes asynchronous.
- (A write-back sketch follows below.)
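A sketch of the dirty-bit mechanism under write-back (names are illustrative; write-through would instead update memory on every write and needs no dirty bit):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct wline { bool valid, dirty; uint32_t tag; };

    /* Write-back: a write hit only marks the line dirty;
       memory is updated when the dirty line is evicted. */
    static void write_hit(struct wline *l) { l->dirty = true; }

    static void evict(struct wline *l) {
        if (l->valid && l->dirty)
            printf("writing tag 0x%X back to memory\n", (unsigned)l->tag);
        l->valid = false;
    }

    int main(void) {
        struct wline l = { true, false, 0xAABB };
        write_hit(&l);   /* dirty bit now set      */
        evict(&l);       /* triggers the writeback */
        return 0;
    }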
Slide 23: Example
- 32-bit addressing; 128 KB cache, 256 B blocks (512 blocks), 2-way set associative (256 sets), LRU replacement policy.
- The low-order 8 bits are the offset, the next 8 bits are the index, and the high-order 16 bits are the tag.
Slide 24: Example
Starting from an empty cache (this trace is replayed in the sketch below):
- Read 0xAABB0100: Tag AABB, Index 01, Offset 00; place in set 1, bank 0. Miss (compulsory).
- Read 0xABBA0101: Tag ABBA, Index 01, Offset 01; place in set 1, bank 1. Miss (compulsory).
- Read 0xABBA01A3: Tag ABBA, Index 01, Offset A3; found in set 1, bank 1. Hit!
- Read 0xCBAF01FF: Tag CBAF, Index 01, Offset FF; place in set 1, bank 0 (LRU). Miss (conflict).
- Read 0xAABB01CC: Tag AABB, Index 01, Offset CC; place in set 1, bank 1 (LRU). Miss (conflict).
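A small simulation of this trace (a sketch assuming slide 23's geometry; only the one set the trace touches is modeled, and names such as cache_access are illustrative):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct way { bool valid; uint32_t tag; };
    static struct way set1[2];   /* the trace only touches set 0x01 */
    static int lru = 0;          /* which way is least recently used */

    static void cache_access(uint32_t addr) {
        uint32_t offset = addr & 0xFF;         /* low 8 bits   */
        uint32_t index  = (addr >> 8) & 0xFF;  /* next 8 bits  */
        uint32_t tag    = addr >> 16;          /* high 16 bits */
        for (int w = 0; w < 2; w++) {
            if (set1[w].valid && set1[w].tag == tag) {
                printf("0x%08X: hit in bank %d\n", (unsigned)addr, w);
                lru = 1 - w;                   /* the other way is now LRU */
                return;
            }
        }
        set1[lru].valid = true;                /* miss: fill the LRU way */
        set1[lru].tag   = tag;
        printf("0x%08X: miss, filled bank %d (index %02X, offset %02X)\n",
               (unsigned)addr, lru, (unsigned)index, (unsigned)offset);
        lru = 1 - lru;
    }

    int main(void) {
        uint32_t trace[] = { 0xAABB0100, 0xABBA0101, 0xABBA01A3,
                             0xCBAF01FF, 0xAABB01CC };
        for (int i = 0; i < 5; i++) cache_access(trace[i]);
        return 0;
    }

Running it reproduces the slide's pattern: miss, miss, hit, miss, miss, with the two conflict misses landing in banks 0 and 1 respectively.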
Slide 25: Basic Optimizations
Six basic cache optimizations:
1. Larger block size: reduces compulsory misses; increases conflict misses and miss penalty.
2. Larger total cache capacity: reduces capacity misses; increases hit time and power consumption.
3. Higher associativity: reduces conflict misses; increases hit time and power consumption.
4. More levels of cache: reduces overall memory access time.
5. Giving priority to read misses over writes: reduces miss penalty.
6. Avoiding address translation when indexing the cache: reduces hit time.
Slide 26: Pitfalls
- Ignoring the impact of the operating system on the performance of the memory hierarchy.