10/20: Lecture Topics HW 3 Problem 2 Caches –Types of cache misses –Cache performance –Cache tradeoffs –Cache summary Input/Output –Types of I/O Devices.


1 10/20: Lecture Topics
HW 3 Problem 2
Caches
–Types of cache misses
–Cache performance
–Cache tradeoffs
–Cache summary
Input/Output
–Types of I/O Devices
–How devices communicate with the rest of the system: communicating with the processor, communicating with memory

2 Problem #2 on HW 3
        move $a0, $s0
        move $a1, $s1
        move $a2, $s2
        move $a3, $s3
        # Position A
        jal Add4
        # Position B
        move $t0, $v0
        move $a0, $s4
        move $a1, $s5
        move $a2, $s6
        move $a3, $s7
        # Position C
        jal Add4
        # Position D
        move $t1, $v0
        add $t2, $t0, $t1
Add4:
        # Position E
        jal Add2
        # Position F
        move $s0, $v0
        move $a0, $a2
        move $a1, $a3
        # Position G
        jal Add2
        # Position H
        move $s1, $v0
        add $v0, $s0, $s1
        # Position I
        jr $ra
Add2:
        add $v0, $a0, $a1
        jr $ra

3 Preservation Conventions
Preserved                          Not Preserved
Saved registers: $s0-$s7           Temporary registers: $t0-$t9
Stack pointer register: $sp        Argument registers: $a0-$a3
Return address register: $ra       Return value registers: $v0-$v1
Stack above the stack pointer      Stack below the stack pointer

4 Callee-Saved Registers
Add4:
        jal Add2
        move $s0, $v0
        move $a0, $a2
        move $a1, $a3
        jal Add2
        move $s1, $v0
        add $v0, $s0, $s1
        jr $ra

5 Caller-Saved Registers
        move $a0, $s0
        move $a1, $s1
        move $a2, $s2
        move $a3, $s3
        jal Add4
        move $t0, $v0
        move $a0, $s4
        move $a1, $s5
        move $a2, $s6
        move $a3, $s7
        jal Add4
        move $t1, $v0
        add $t2, $t0, $t1

6 Tag, Index, Block Offset Recall an address can be decomposed into [tag,index,block offset] The general rule for determining this decomposition is to start from the right and work to the left Be careful of word vs. byte addresses
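As a rough illustration of the right-to-left decomposition, the sketch below splits a byte address with shifts and masks. The function name `decompose` and the field widths (3 offset bits, 6 index bits) are assumptions chosen for the example, not values from the slides.

```python
def decompose(addr, offset_bits, index_bits):
    """Split a byte address into (tag, index, block_offset), working right to left."""
    # The lowest offset_bits bits select a byte within the block.
    block_offset = addr & ((1 << offset_bits) - 1)
    # The next index_bits bits select the row (set) of the cache.
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    # Everything to the left of the index is the tag.
    tag = addr >> (offset_bits + index_bits)
    return tag, index, block_offset


# Hypothetical address and field widths, for illustration only.
addr = 0x12345678
tag, index, block_offset = decompose(addr, offset_bits=3, index_bits=6)
print(hex(tag), index, block_offset)
```

If the cache is addressed by word rather than by byte, the two lowest bits of a 32-bit MIPS byte address would be dropped first, which is the "word vs. byte" caution on the slide.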

7 Steps to bits for tag,index,b.o.
Step 1: Determine how many bits for the block offset. If the block size is 2^b bytes, then b bits are required for the block offset.
Step 2: Determine how many blocks fit in the cache: (Bytes in cache)/(Bytes in a block).
Step 3: Determine how many rows (unique indices) the cache has.
–For direct mapped, rows = number of blocks
–For fully associative, rows = 1
–For set associative, rows = (number of blocks)/associativity

8 Steps to bits for tag,index,b.o.
Step 4: Determine how many bits are needed to represent the index. If there are 2^r rows, then you need r bits.
Step 5: The tag bits are whatever is left of the address after removing the block offset bits (Step 1) and the index bits (Step 4).

9 Cache Examples 4Kbyte, 8-way associative, cache with 2 words per block –How do you split up the address?
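One way to work this example is to follow Steps 1 to 5 mechanically. The sketch below does that, assuming 32-bit byte addresses and 4-byte words (neither is stated on the slide) and assuming all sizes are powers of two. The function name `address_split` is made up for the illustration.

```python
def address_split(cache_bytes, block_bytes, associativity, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache, per Steps 1-5."""
    # Step 1: a block of 2^b bytes needs b offset bits.
    offset_bits = block_bytes.bit_length() - 1
    # Step 2: number of blocks that fit in the cache.
    num_blocks = cache_bytes // block_bytes
    # Step 3: rows = blocks / associativity (1 row if fully associative).
    num_rows = num_blocks // associativity
    # Step 4: 2^r rows need r index bits.
    index_bits = num_rows.bit_length() - 1
    # Step 5: the tag is whatever is left of the address.
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Slide 9: 4 KB cache, 8-way set associative, 2 words (assumed 8 bytes) per block.
split = address_split(cache_bytes=4 * 1024, block_bytes=2 * 4, associativity=8)
print(split)  # (23, 6, 3)
```

Under those assumptions the cache holds 512 blocks in 64 sets, so the address splits into a 23-bit tag, a 6-bit index, and a 3-bit block offset.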

10 i-Cache and d-Cache There usually are two separate caches for instructions and data. Why? –Avoids structural hazards in pipelining –The combined cache is twice as big but still has an access time of a small cache –Allows both caches to operate in parallel, for twice the bandwidth

11 Handling i-Cache Misses
1. Stall the pipeline and send the address of the missed instruction to the memory
2. Instruct memory to perform a read; wait for the access to complete
3. Update the cache
4. Restart the instruction, this time fetching it successfully from the cache
d-Cache misses are even easier, but still require a pipeline stall

12 Cache Replacement How do you decide which cache block to replace? If the cache is direct-mapped, it’s easy Otherwise, common strategies: –Random –Least Recently Used (LRU) –Other strategies are used at lower levels of the hierarchy. More on those later.

13 LRU Replacement Replace the block that hasn’t been used for the longest time. Reference stream: A B C D B D E B A C B C E D C B
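The slide does not say how many blocks the cache holds, so the sketch below assumes a fully associative cache with 4 blocks and replays the reference stream under LRU. The function name `simulate_lru` and the capacity are assumptions for illustration.

```python
from collections import OrderedDict


def simulate_lru(stream, capacity):
    """Replay a reference stream on a fully associative LRU cache.

    Returns (number of misses, list of evicted blocks in order).
    """
    cache = OrderedDict()  # ordered from least to most recently used
    misses = 0
    evicted = []
    for block in stream:
        if block in cache:
            # Hit: mark the block as most recently used.
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) >= capacity:
                # Evict the least recently used block (front of the order).
                victim, _ = cache.popitem(last=False)
                evicted.append(victim)
            cache[block] = True
    return misses, evicted


stream = "A B C D B D E B A C B C E D C B".split()
misses, evicted = simulate_lru(stream, capacity=4)
print(misses, evicted)  # 8 ['A', 'C', 'D', 'A']
```

With 4 blocks, the first four references are misses that fill the cache, and E, A, C, and D later miss and evict A, C, D, and A respectively, for 8 misses out of 16 references.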

14 LRU Implementations LRU is very difficult to implement for high degrees of associativity 4-way approximation: –1 bit to indicate least recently used pair –1 bit per pair to indicate least recently used item in this pair Much more complex approximations at lower levels of the hierarchy
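A minimal sketch of the 3-bit, 4-way approximation described above, assuming ways 0 and 1 form one pair and ways 2 and 3 the other. The class name `TreePLRU4` and its methods are hypothetical.

```python
class TreePLRU4:
    """3-bit pseudo-LRU state for one 4-way set (ways 0-3)."""

    def __init__(self):
        self.lru_pair = 0        # which pair (0 or 1) was used least recently
        self.lru_item = [0, 0]   # per pair: which item (0 or 1) was used least recently

    def touch(self, way):
        """Record an access to the given way."""
        pair, item = divmod(way, 2)
        self.lru_pair = 1 - pair         # the other pair is now least recent
        self.lru_item[pair] = 1 - item   # the sibling is now least recent in this pair

    def victim(self):
        """Return the way to replace next."""
        pair = self.lru_pair
        return 2 * pair + self.lru_item[pair]


plru = TreePLRU4()
for way in (0, 1, 2, 3, 0):
    plru.touch(way)
print(plru.victim())  # 2
```

After the accesses 0, 1, 2, 3, 0, true LRU would replace way 1, but the three bits point at way 2. That gap is why this scheme is an approximation rather than exact LRU.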

15 The Three C’s of Caches Three reasons for cache misses: –Compulsory miss: item has never been in the cache –Capacity miss: item has been in the cache, but space was tight and it was forced out (occurs even with fully associative caches) –Conflict miss: item was in the cache, but the cache was not associative enough, so it was forced out (never occurs with fully associative caches)
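One common way to tell the three kinds apart is to compare the real cache against a fully associative LRU cache of the same size. The sketch below does this for a direct-mapped cache indexed by block address modulo the number of blocks. The function name `classify_misses` and the traces are assumptions for illustration.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Classify each reference to a direct-mapped cache as a hit or one of the three C's."""
    seen = set()            # every block ever referenced
    fully = OrderedDict()   # same-size fully associative LRU cache
    direct = {}             # direct-mapped cache: row index -> block
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        row = block % num_blocks
        direct_hit = direct.get(row) == block
        fully_hit = block in fully
        if direct_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1   # never been in the cache
        elif not fully_hit:
            counts["capacity"] += 1     # would miss even if fully associative
        else:
            counts["conflict"] += 1     # only the limited associativity hurt
        # Update all three models after classifying.
        seen.add(block)
        direct[row] = block
        if fully_hit:
            fully.move_to_end(block)
        else:
            if len(fully) >= num_blocks:
                fully.popitem(last=False)
            fully[block] = True
    return counts


conflict_trace = classify_misses([0, 4, 0, 4], num_blocks=4)
capacity_trace = classify_misses([0, 1, 2, 3, 4, 0], num_blocks=4)
print(conflict_trace)
print(capacity_trace)
```

In the first trace, blocks 0 and 4 fight over the same row, so the repeats are conflict misses. In the second, five distinct blocks do not fit in four, so the return to block 0 is a capacity miss.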

16 Eliminating Cache Misses What cache parameters (cache size, block size, associativity) can you change to eliminate the following kinds of misses –compulsory –capacity –conflict

17 Multi-Level Caches
Use each level of the memory hierarchy as a cache over the next lower level
Inserting level 2 between levels 1 and 3 allows:
–level 1 to have a higher miss rate (so can be smaller and cheaper)
–level 3 to have a longer access time (so can be slower and cheaper)
The new effective access time equation:
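The equation itself did not survive in this transcript. One commonly used form is sketched below. It is an assumption about what the slide showed, not a recovery of it. Here $t_i$ is the access time of level $i$ and $m_i$ is the local miss rate of level $i$ (the fraction of accesses reaching level $i$ that miss there); both symbols are introduced for this illustration.

```latex
t_{\text{eff}} = t_1 + m_1 \left( t_2 + m_2 \, t_3 \right)
```

Every access pays $t_1$; the fraction $m_1$ that misses in level 1 also pays $t_2$; and the fraction $m_1 m_2$ that misses in both levels also pays $t_3$.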

18 Which cache system is better?
32 KB unified data and instruction cache
–hit rate of 97%
16 KB data cache
–hit rate of 92%
And 16 KB instruction cache
–hit rate of 98%
Assume
–20% of instructions are loads or stores
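One way to compare the two designs is to weight the split caches' hit rates by how often each is accessed. The sketch below assumes every instruction makes exactly one instruction fetch and every load or store makes exactly one data access. The function name `combined_hit_rate` is made up for the illustration.

```python
def combined_hit_rate(instr_hit, data_hit, data_refs_per_instr):
    """Overall hit rate of split i/d caches, weighted by accesses per instruction."""
    # Each instruction causes 1 fetch plus data_refs_per_instr data accesses.
    accesses = 1.0 + data_refs_per_instr
    hits = instr_hit * 1.0 + data_hit * data_refs_per_instr
    return hits / accesses


unified = 0.97
split = combined_hit_rate(instr_hit=0.98, data_hit=0.92, data_refs_per_instr=0.20)
print(round(split, 4), unified)
```

Under these assumptions the split design's overall hit rate is (0.98 + 0.2 × 0.92) / 1.2 = 0.97, the same as the unified cache. Hit rate alone therefore does not separate the two, and other factors, such as the split caches being able to serve an instruction fetch and a data access in parallel, decide the comparison.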

19 Cache Parameters and Tradeoffs If you are designing a cache, what choices do you have and what are their tradeoffs?

20 Cache Comparisons
L1 i-Cache
–Alpha 21164: 8KB, direct-mapped, 32B block
–MIPS R10000: 32KB, 2-way (LRU), 64B block
–Pentium Pro: 8KB, 4-way, 32B block
–UltraSparc 1: 16KB, pseudo 2-way, 32B block
L1 d-Cache
–Alpha 21164: 8KB, direct-mapped, 32B block
–MIPS R10000: 32KB, 2-way (LRU), 32B block
–Pentium Pro: 8KB, 2-way, 32B block
–UltraSparc 1: 16KB, direct-mapped, 32B block
L2 unified Cache
–Alpha 21164: 96KB, 3-way, 64B block, on chip
–Pentium Pro: 256KB, 4-way, 32B block, same package

21 Summary: Classifying Caches
Where can a block be placed?
–Direct mapped, Set/Fully associative
How is a block found?
–Direct mapped: by index
–Set associative: by index and search
–Fully associative: by search
What happens on a write access?
–Write-back or Write-through
Which block should be replaced?
–Random
–LRU (Least Recently Used)

