1
The Goal: illusion of large, fast, cheap memory
Fact: large memories are slow; fast memories are small.
How do we create a memory that is large, cheap, and fast (most of the time)?
Hierarchy
Parallelism
2
Types of Memories
SRAM:
value is stored on a pair of inverting gates
very fast but takes up more space than DRAM (4 to 6 transistors)
DRAM:
value is stored as a charge on a capacitor (must be refreshed)
very small but slower than SRAM (by a factor of 5 to 10)
3
Exploiting Memory Hierarchy
Users want large and fast memories!
SRAM access times are 5-25 ns at a cost of $100 to $250 per Mbyte.
DRAM access times are 60-120 ns at a cost of $5 to $10 per Mbyte.
Disk access times are 10 to 20 million ns at a cost of $0.10 to $0.20 per Mbyte.
Try and give it to them anyway: build a memory hierarchy.
(1997 figures)
4
An Expanded View of the Memory System
[Diagram: the processor (control and datapath) connected to a series of memory levels. Speed: fastest to slowest. Size: smallest to biggest. Cost per byte: highest to lowest.]
Instead, the memory system of a modern computer consists of a series of black boxes ranging from the fastest to the slowest. Besides variation in speed, these boxes also vary in size (smallest to biggest) and cost. What makes this kind of arrangement work is one of the most important principles in computer design: the principle of locality.
5
Memory Hierarchy: How Does it Work?
Temporal Locality (Locality in Time): keep the most recently accessed data items closer to the processor.
Spatial Locality (Locality in Space): move blocks consisting of contiguous words to the upper levels.
How does the memory hierarchy work? It is rather simple, at least in principle. To take advantage of temporal locality, the memory hierarchy keeps the more recently accessed data items closer to the processor, because chances are the processor will access them again soon. To take advantage of spatial locality, not only do we move the item that has just been accessed to the upper level, we also move the data items that are adjacent to it.
[Diagram: blocks Blk X and Blk Y moving between the upper level and the lower level memory, with the upper level transferring data to and from the processor.]
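To make the two kinds of locality concrete, here is a minimal C sketch (the array, its size, and the loop are invented for illustration, not taken from the slides):

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];   /* zero-initialized array, contiguous in memory */
    int sum = 0;       /* "sum" is touched every iteration: temporal locality */

    for (int i = 0; i < N; i++) {
        sum += a[i];   /* a[0], a[1], a[2], ... are adjacent: spatial locality,
                          so fetching a[i]'s block also brings in its neighbors */
    }
    printf("%d\n", sum);
    return 0;
}
```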
6
Memory Hierarchy of a Modern Computer System
By taking advantage of the principle of locality:
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.
[Diagram: processor (control, datapath, registers, on-chip cache), second level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (disk).]
Speed (ns): 1s / 10s / 100s / 10,000,000s (10s ms) / 10,000,000,000s (10s sec)
Size (bytes): 100s / Ks / Ms / Gs / Ts
The design goal is to present the user with as much memory as is available in the cheapest technology (the disk), while taking advantage of the principle of locality to provide an average access speed very close to the speed offered by the fastest technology. (We will go over this slide in detail in the next lecture on caches.)
7
How is the hierarchy managed?
Registers <-> Memory: by the compiler (programmer?)
Cache <-> Memory: by the hardware
Memory <-> Disks: by the hardware and operating system (virtual memory), and by the programmer (files)
8
Hits vs. Misses
Read hits: this is what we want!
Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart the access.
Write hits: either replace the data in both cache and memory (write-through), or write the data only into the cache and write it back to memory later (write-back).
Write misses: read the entire block into the cache, then write the word.
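As a rough sketch of the two write-hit policies, here is a toy C model (the line_t structure, the memory array, and the function names are all hypothetical; a real cache does this in hardware):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical one-line cache model, just to contrast the two policies. */
typedef struct {
    bool     valid;
    bool     dirty;   /* used only by write-back */
    uint32_t tag;
    uint32_t data;
} line_t;

static uint32_t memory[256];   /* stand-in for main memory */

/* Write-through: on a write hit, update the cache AND memory immediately. */
static void write_through_hit(line_t *line, uint32_t addr, uint32_t value) {
    line->data = value;
    memory[addr % 256] = value;
}

/* Write-back: on a write hit, update only the cache and mark the line dirty;
   memory is updated later, when the dirty line is evicted. */
static void write_back_hit(line_t *line, uint32_t value) {
    line->data = value;
    line->dirty = true;
}

int main(void) {
    line_t line = { true, false, 0, 0 };
    write_through_hit(&line, 4, 42);
    write_back_hit(&line, 99);
    printf("cache data = %u, dirty = %d\n", (unsigned)line.data, line.dirty);
    return 0;
}
```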
9
Cache
Two issues:
How do we know if a data item is in the cache?
If it is, how do we find it?
Our first example: block size is one word of data, "direct mapped".
For each item of data at the lower level, there is exactly one location in the cache where it might be (so lots of items at the lower level share locations in the upper level).
10
Direct Mapped Cache
Mapping: the cache index is the address modulo the number of blocks in the cache.
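A minimal sketch of this mapping in C, assuming an invented 64-block cache with one-word (4-byte) blocks:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS  64   /* assumed cache size, in blocks */
#define BLOCK_BYTES 4    /* one 32-bit word per block */

int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t block_addr = addr / BLOCK_BYTES;        /* drop the byte offset  */
    uint32_t index      = block_addr % NUM_BLOCKS;   /* which cache block     */
    uint32_t tag        = block_addr / NUM_BLOCKS;   /* identifies whose data */

    printf("index = %u, tag = 0x%x\n", (unsigned)index, (unsigned)tag);
    return 0;
}
```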
11
Direct Mapped Cache for MIPS
What kind of locality are we taking advantage of? (With one-word blocks, hits exploit temporal locality.)
12
Direct Mapped Cache
Taking advantage of spatial locality: use a multiword block, so a miss brings in adjacent words as well.
13
Choosing a block size
Large block sizes help with spatial locality, but...
It takes time to read the memory in: larger block sizes increase the time for misses.
A larger block also reduces the number of blocks in the cache: number of blocks = cache size / block size.
Need to find a middle ground: 16-64 bytes works nicely.
Use split caches (instruction vs. data) because there is more spatial locality in code.
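For example (sizes assumed for illustration): in a 16 KB cache, 16-byte blocks give 16384 / 16 = 1024 blocks, while 64-byte blocks give only 16384 / 64 = 256 blocks, so fewer distinct addresses can live in the cache at once.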
14
Performance
Simplified model:
execution time = (execution cycles + stall cycles) × cycle time
stall cycles = # of instructions × miss ratio × miss penalty
Two ways of improving performance:
decreasing the miss ratio
decreasing the miss penalty
What happens if we increase block size?
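A quick worked example with assumed numbers (base CPI of 1.0, 5% miss ratio, 40-cycle miss penalty, one memory reference per instruction): stall cycles per instruction = 0.05 × 40 = 2, so the effective CPI = 1 + 2 = 3, and execution time triples relative to a perfect cache. Both factors in the stall term are targets: shrinking either the miss ratio or the miss penalty shrinks the product.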
15
Decreasing miss ratio with associativity
16
Fully Associative vs. Direct Mapped
Fully associative caches provide much greater flexibility: nothing gets “thrown out” of the cache until it is completely full.
Direct-mapped caches are more rigid: any cached data goes directly where the index says to, even if the rest of the cache is empty.
A problem, though: fully associative caches require a complete search through all the tags to see if there’s a hit, while direct-mapped caches only need to look in one place.
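A minimal C sketch of the lookup-cost difference (the line_t structure and cache size are assumptions for illustration; hardware performs the associative search with parallel comparators rather than a loop):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 64

typedef struct { bool valid; uint32_t tag; } line_t;

/* Direct mapped: exactly one place to look, a single tag compare. */
static bool dm_hit(const line_t c[], uint32_t tag, uint32_t index) {
    return c[index].valid && c[index].tag == tag;
}

/* Fully associative: the block could be anywhere, so every tag must be
   checked (in hardware, one comparator per line, all working at once). */
static bool fa_hit(const line_t c[], uint32_t tag) {
    for (int i = 0; i < NUM_LINES; i++)
        if (c[i].valid && c[i].tag == tag)
            return true;
    return false;
}

int main(void) {
    static line_t cache[NUM_LINES];
    cache[5] = (line_t){ true, 0xABC };
    printf("dm: %d, fa: %d\n", dm_hit(cache, 0xABC, 5), fa_hit(cache, 0xABC));
    return 0;
}
```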
17
A Compromise: Set Associativity
2-way set associative: each address has two possible locations with the same index. One fewer index bit than direct mapped: 1/2 the indexes.
4-way set associative: each address has four possible locations with the same index. Two fewer index bits: 1/4 the indexes.
Address = Tag | Index | Block offset
[Figure: a 2-way table of V/Tag/Data entries with indexes 0-7, and a 4-way table with indexes 0-3.]
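A minimal C sketch of a set-associative lookup, assuming the 2-way organization pictured above (names and sizes invented): the index selects one set, and only that set's ways are searched.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 8   /* assumed: matches the 2-way picture above */
#define NUM_WAYS 2

typedef struct { bool valid; uint32_t tag; } line_t;

/* Set associative: the index picks one set; search only that set's ways. */
static bool sa_hit(line_t c[NUM_SETS][NUM_WAYS], uint32_t tag, uint32_t index) {
    for (int w = 0; w < NUM_WAYS; w++)
        if (c[index][w].valid && c[index][w].tag == tag)
            return true;
    return false;
}

int main(void) {
    static line_t cache[NUM_SETS][NUM_WAYS];
    cache[3][1] = (line_t){ true, 0x7F };
    printf("hit: %d\n", sa_hit(cache, 0x7F, 3));
    return 0;
}
```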
18
Decreasing miss penalty with multilevel caches
Add a second level cache:
often the primary cache is on the same chip as the processor
use SRAMs to add another cache above primary memory (DRAM)
miss penalty goes down if the data is in the 2nd level cache
Example: CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access. Adding a 2nd level cache with a 20 ns access time decreases the miss rate (to DRAM) to 2%.
Using multilevel caches:
try to optimize the hit time on the 1st level cache
try to optimize the miss rate on the 2nd level cache
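Working the example through (cycle time = 1 / 500 MHz = 2 ns, using the slide's simplified one-reference-per-instruction model): the DRAM miss penalty is 200 ns / 2 ns = 100 cycles, so without the 2nd level cache CPI = 1.0 + 5% × 100 = 6.0. The 2nd level cache costs 20 ns / 2 ns = 10 cycles per primary miss, and only 2% of accesses still go all the way to DRAM, so CPI = 1.0 + 5% × 10 + 2% × 100 = 1.0 + 0.5 + 2.0 = 3.5, a speedup of about 6.0 / 3.5 ≈ 1.7.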