15-213 Recitation 6 – 3/11/01
Outline:
–Cache Organization
–Replacement Policies
–MESI Protocol (cache coherency for multiprocessor systems)
Anusha, e-mail: anusha@andrew.cmu.edu
Office Hours: Tuesday 11:30-1:00, Wean Cluster 52xx
Reminders:
–Lab 4 due Tuesday night
–Exam 1 grade adjustments at end of recitation
Cache organization (review)
[diagram: the cache is drawn as S = 2^s sets; each set contains E lines; each line has 1 valid bit, t tag bits, and a block of B = 2^b bytes (bytes 0 through B–1)]
Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.
Addressing the cache (review)
Address A is in the cache if its tag matches the tag of one of the valid lines in the set selected by the set index of A.
[diagram: the m-bit address A is split into t tag bits (high), s set index bits, and b block offset bits (low)]
Parameters of cache organization
Parameters:
–s = number of set index bits
–b = number of block offset bits
–t = number of tag bits
–m = address size in bits
–t + s + b = m
B = 2^b = line size in bytes
E = associativity (# lines per set)
S = 2^s = number of sets
Cache size C = B × E × S
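As a quick sketch, the relationships above can be checked in a few lines of Python (the helper name is ours, not from the recitation):

```python
# Derive t, s, b from cache size C, associativity E, line size B,
# and address width m, using the formulas above.
def cache_params(C, E, B, m):
    b = B.bit_length() - 1   # B = 2^b, so b = log2(B)
    S = C // (E * B)         # number of sets: C = B * E * S
    s = S.bit_length() - 1   # S = 2^s
    t = m - s - b            # the remaining address bits are the tag
    return t, s, b
```

Both worked examples on the following slides fall out of this function directly.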
Determining cache parameters
Suppose we are told we have an 8 KB, direct-mapped cache with 64-byte lines, and the word size is 32 bits.
–A direct-mapped cache has an associativity of 1.
What are the values of t, s, and b?
B = 2^b = 64, so b = 6
B × E × S = C = 8192 (8 KB), and we know E = 1
S = 2^s = C / B = 128, so s = 7
t = m – s – b = 32 – 6 – 7 = 19
[address layout: bits 31–13 tag (t = 19), bits 12–6 set index (s = 7), bits 5–0 block offset (b = 6)]
One more example
Suppose our cache is 16 KB, 4-way set associative with 32-byte lines. These are the parameters of the L1 cache of the P3 Xeon processors used by the fish machines.
B = 2^b = 32, so b = 5
B × E × S = C = 16384 (16 KB), and E = 4
S = 2^s = C / (E × B) = 128, so s = 7
t = m – s – b = 32 – 5 – 7 = 20
[address layout: bits 31–12 tag (t = 20), bits 11–5 set index (s = 7), bits 4–0 block offset (b = 5)]
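Given t, s, and b, splitting an address into its fields is just shifting and masking. A hypothetical helper (not from the recitation) makes the layout concrete:

```python
# Split an address into (tag, set index, block offset) given the
# number of set-index bits s and block-offset bits b.
def split_address(addr, s, b):
    offset = addr & ((1 << b) - 1)          # low b bits: byte within the block
    index = (addr >> b) & ((1 << s) - 1)    # next s bits: which set
    tag = addr >> (s + b)                   # remaining high bits: the tag
    return tag, index, offset
```

For example, with the 6-bit addresses of the next slide (s=2, b=2), address 17 splits into tag 1, set 0, offset 1.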
Example 1: Direct Mapped Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Assume a direct-mapped cache with 4 four-byte lines and 6-bit addresses (t=2, s=2, b=2):

Line | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3
  0  |   |     |        |        |        |
  1  |   |     |        |        |        |
  2  |   |     |        |        |        |
  3  |   |     |        |        |        |
Direct Mapped Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Direct-mapped cache, 4 four-byte lines, final state:

Line | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3
  0  | 1 | 01  |   16   |   17   |   18   |   19
  1  | 1 | 00  |    4   |    5   |    6   |    7
  2  | 1 | 00  |    8   |    9   |   10   |   11
  3  | 0 |     |        |        |        |
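The final state above can be reproduced with a small simulation sketch (function name is ours; it tracks only which block tag occupies each line, not the data bytes):

```python
# Simulate the direct-mapped cache from the example: 4 lines,
# 4-byte blocks, 6-bit addresses (t=2, s=2, b=2).
def simulate_direct_mapped(refs, num_lines=4, block_bytes=4):
    lines = {}   # set index -> tag of the resident block
    misses = 0
    for addr in refs:
        index = (addr // block_bytes) % num_lines
        tag = addr // (block_bytes * num_lines)
        if lines.get(index) != tag:
            misses += 1
            lines[index] = tag   # fetch the block, evicting any previous occupant
    return lines, misses

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
final, misses = simulate_direct_mapped(refs)
```

After the run, line 0 holds tag 1 (bytes 16–19), lines 1 and 2 hold tag 0 (bytes 4–7 and 8–11), and line 3 was never filled, matching the table.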
Example 2: Set Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Four-way set associative, 4 sets, one-byte blocks (t=4, s=2, b=0). Each set holds up to four (V, Tag, Data) entries:

Set 0: (empty)
Set 1: (empty)
Set 2: (empty)
Set 3: (empty)
Set Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Four-way set associative, 4 sets, one-byte blocks, final state (one V, Tag, Data entry per occupied way):

Set 0: (1, 0001, 4) (1, 0010, 8) (1, 0101, 20) (1, 1110, 56)
Set 1: (1, 0000, 1) (1, 0001, 5) (1, 0100, 17) (1, 0010, 9)
Set 2: (1, 0001, 6)
Set 3: (1, 0100, 19) (1, 0010, 11) (1, 1010, 43)
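This example can also be checked by simulation. A sketch (names ours), with LRU ordering inside each set even though no set actually overflows here:

```python
# Simulate a 4-way set-associative cache with 4 sets and one-byte
# blocks (s=2, b=0): set index = addr % 4, tag = addr // 4.
def simulate_set_assoc(refs, num_sets=4, ways=4):
    sets = {i: [] for i in range(num_sets)}   # per-set tags, LRU-first
    misses = 0
    for addr in refs:
        index, tag = addr % num_sets, addr // num_sets
        if tag in sets[index]:
            sets[index].remove(tag)        # hit: refresh recency position
        else:
            misses += 1
            if len(sets[index]) == ways:
                sets[index].pop(0)         # evict the least-recently used way
        sets[index].append(tag)            # most-recently used at the end
    return sets, misses

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
sets, misses = simulate_set_assoc(refs)
```

With 16 ways of total capacity and only 12 distinct addresses, nothing is evicted; each set ends up holding exactly the tags shown in the table.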
Example 3: Fully Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Fully associative, 4 four-byte lines (t=4, s=0, b=2):

Line | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3
  0  |   |     |        |        |        |
  1  |   |     |        |        |        |
  2  |   |     |        |        |        |
  3  |   |     |        |        |        |
Fully Associative Cache
Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Fully associative, 4 four-byte lines (t=4, s=0, b=2), final state:

Line | V | Tag  | Byte 0 | Byte 1 | Byte 2 | Byte 3
  0  | 1 | 1010 |   40   |   41   |   42   |   43
  1  | 1 | 0010 |    8   |    9   |   10   |   11
  2  | 1 | 0100 |   16   |   17   |   18   |   19
  3  | 1 | 0001 |    4   |    5   |    6   |    7

Note: Used LRU eviction policy.
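Here evictions do happen, so the LRU policy matters. A simulation sketch of this example (helper name ours):

```python
# Simulate a fully associative cache with 4 four-byte lines and LRU
# eviction: only the block tag (addr // 4) selects a line.
def simulate_fully_assoc_lru(refs, num_lines=4, block_bytes=4):
    lines = []   # resident block tags, least-recently used first
    misses = 0
    for addr in refs:
        tag = addr // block_bytes
        if tag in lines:
            lines.remove(tag)          # hit: refresh recency position
        else:
            misses += 1
            if len(lines) == num_lines:
                lines.pop(0)           # evict the least-recently used block
        lines.append(tag)              # most-recently used at the end
    return lines, misses

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
final, misses = simulate_fully_assoc_lru(refs)
```

The surviving tags are 1, 2, 4, and 10, i.e. the blocks holding bytes 4–7, 8–11, 16–19, and 40–43, matching the table above.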
Replacement Policy
Replacement policy:
–Determines which cache line is evicted on a miss
–Matters for set-associative and fully associative caches
–Nonexistent for a direct-mapped cache, which has no choice to make
Example
Assume a 2-way set-associative cache. Determine the number of misses for the following trace:
A B C A B C B A B D
A, B, C, and D all map to the same set.
Ideal Case: OPTIMAL
Policy 0: OPTIMAL
–Replace the cache line that is accessed furthest in the future
Properties:
–Requires knowledge of the future, so it cannot be built in practice
–Serves as the best-case scenario against which real policies are measured
Ideal Case: OPTIMAL

Ref:     A    B    C    A    B    C    B    A    B    D
Optimal: A+   A,B+ A,C+ A,C  B,C+ B,C  B,C  B,A+ B,A  D,A+

(+ marks a miss; each entry shows the cache contents after the reference)
Optimal # of misses: 6
Policy 1: FIFO
–Replace the oldest cache line
Policy 1: FIFO

Ref:     A    B    C    A    B    C    B    A    B    D
Optimal: A+   A,B+ A,C+ A,C  B,C+ B,C  B,C  B,A+ B,A  D,A+   (6 misses)
FIFO:    A+   A,B+ C,B+ C,A+ B,A+ B,C+ B,C  A,C+ A,B+ D,B+   (9 misses)

(+ marks a miss)
Policy 2: LRU
Policy 2: Least-Recently Used
–Replace the least-recently used cache line
Properties:
–Approximates the OPTIMAL policy by using past behavior to predict the future: the least-recently used cache line is unlikely to be accessed again in the near future
Policy 2: LRU

Ref:     A    B    C    A    B    C    B    A    B    D
Optimal: A+   A,B+ A,C+ A,C  B,C+ B,C  B,C  B,A+ B,A  D,A+   (6 misses)
FIFO:    A+   A,B+ C,B+ C,A+ B,A+ B,C+ B,C  A,C+ A,B+ D,B+   (9 misses)
LRU:     A+   A,B+ C,B+ C,A+ B,A+ B,C+ B,C  B,A+ B,A  B,D+   (8 misses)

(+ marks a miss)
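All three policies on this trace can be reproduced with one small simulator sketch (names ours; the set holds two lines, since A, B, C, and D all map to the same set):

```python
# Count misses for one cache set under "opt", "fifo", or "lru".
def count_misses(trace, capacity, policy):
    cache, misses = [], 0
    for i, ref in enumerate(trace):
        if ref in cache:
            if policy == "lru":
                cache.remove(ref)
                cache.append(ref)   # move to most-recently-used position
            continue                # FIFO and OPT leave order unchanged on a hit
        misses += 1
        if len(cache) == capacity:
            if policy == "opt":
                future = trace[i + 1:]
                # Evict the line whose next use is furthest away (or never).
                victim = max(cache,
                             key=lambda x: future.index(x) if x in future
                             else len(future) + 1)
                cache.remove(victim)
            else:
                cache.pop(0)        # FIFO: oldest; LRU: least-recently used
        cache.append(ref)
    return misses

trace = list("ABCABCBABD")
results = {p: count_misses(trace, 2, p) for p in ("opt", "fifo", "lru")}
```

The counts come out 6, 9, and 8, matching the tables: LRU beats FIFO here but cannot match OPTIMAL.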
Reality: Pseudo-LRU
Reality:
–True LRU is hard to implement, since tracking the exact recency order of the lines in a set is expensive in hardware
–Pseudo-LRU is implemented as an approximation of LRU
Pseudo-LRU:
–Each cache line is equipped with a bit
–The bit is set when the cache line is accessed
–All the bits are cleared periodically
–On a miss, evict a cache line whose bit is unset
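The one-bit scheme above can be sketched for a single set as follows (class and method names are ours, and real hardware clears the bits on a timer rather than by an explicit call):

```python
# Sketch of one-bit pseudo-LRU for a single cache set.
class PseudoLRUSet:
    def __init__(self, ways):
        self.tags = [None] * ways
        self.referenced = [False] * ways   # one "accessed" bit per line

    def clear_bits(self):
        # In hardware this happens periodically.
        self.referenced = [False] * len(self.referenced)

    def access(self, tag):
        if tag in self.tags:
            self.referenced[self.tags.index(tag)] = True
            return True                    # hit
        # Miss: evict a line whose bit is unset (fall back to way 0
        # if every bit happens to be set).
        unset = [i for i, r in enumerate(self.referenced) if not r]
        way = unset[0] if unset else 0
        self.tags[way] = tag
        self.referenced[way] = True
        return False
```

The approximation error is visible here: after a clear, the eviction choice no longer distinguishes recently used lines from old ones until their bits are set again.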
Multiprocessor Systems
Multiprocessor systems are common, but they are not as easy to build as "adding a processor."
You might think of a multiprocessor system like this:
[diagram: Processor 1 and Processor 2 both connected directly to a shared Memory]
The Problem…
In reality, each processor has its own cache, and the caches can become unsynchronized.
–This is a big problem for any system: memory should be viewed consistently by each processor.
[diagram: Processor 1 and Processor 2 each access Memory through their own cache, Cache 1 and Cache 2]
Cache Coherency
Imagine that each processor's cache could see what the other is doing:
–Both of them could stay up to date ("coherent")
–How they manage to do so is a "cache coherency protocol"
The most widely used protocol is MESI:
–MESI = Modified, Exclusive, Shared, Invalid
–Each of these is a possible state for each cache line
–Modified – This cache holds a modified copy of the data (no other cache has the updated copy)
–Exclusive – This processor has exclusive access to the data
–Shared – Other caches also have copies of the data
–Invalid – The data is invalid and must be retrieved from memory
MESI Protocol
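The core state transitions of MESI, seen from one processor's cache line, can be tabulated as a sketch. This is a simplification: event names are ours, and real protocols also involve bus transactions and write-backs that are omitted here.

```python
# Simplified MESI transitions for one cache line, from this
# processor's point of view. (current state, event) -> next state.
MESI_NEXT = {
    ("I", "local_read_no_sharers"): "E",  # read miss, no other cache has it
    ("I", "local_read_shared"):     "S",  # read miss, another cache has a copy
    ("I", "local_write"):           "M",  # write miss: fetch exclusively, modify
    ("E", "local_write"):           "M",  # silent upgrade, no bus traffic needed
    ("E", "remote_read"):           "S",  # another processor reads our line
    ("E", "remote_write"):          "I",  # another processor takes ownership
    ("S", "local_write"):           "M",  # must invalidate the other copies
    ("S", "remote_write"):          "I",  # another processor writes: invalidate
    ("M", "remote_read"):           "S",  # supply the data, write back, share
    ("M", "remote_write"):          "I",  # the other writer takes ownership
}

def next_state(state, event):
    # Events not listed (e.g. a local read hit) leave the state unchanged.
    return MESI_NEXT.get((state, event), state)
```

A useful way to read the table: M and E both mean "only this cache holds the line," and the difference is just whether the line is dirty.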