Published by Briana Young. Modified over 9 years ago.
2
IntroductionSnoopingDirectoryConclusion
4
Figure: Memory holds blocks 1A, 2B, 3C, 4D, 5E. Cache 1 holds 1A, 2B, 3C; Cache 2 holds 3C, 4D, 5E; Cache 3 holds 1A, 3C, 5E; Cache 4 holds 1A, 2B, 4D. After a write to block 3, Cache 3 holds 3F, while memory and Caches 1 and 2 still hold 3C.
5
Figure: same initial contents as above. After the write to block 3, both Cache 3 and memory hold 3F, but Caches 1 and 2 still hold the stale value 3C.
6
The goal of a coherence protocol is to maintain coherence by enforcing the Single-Writer, Multiple-Read (SWMR) invariant: for any memory location A, at any given time, there exists either a single core that may write A (and also read it) or some number of cores that may only read A.
Figure: the core sends loads and stores to its cache controller and receives loaded values; the cache controller and the memory controller each issue and receive coherence requests and responses over the interconnection network.
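A minimal sketch of the invariant as a check over one snapshot of per-core permissions for a single location. The `Perm` enum and `swmr_holds` function are hypothetical names for illustration; real protocols enforce SWMR over logical time rather than by checking snapshots.

```python
from enum import Enum
from typing import Dict


class Perm(Enum):
    """Permission a core currently holds for one memory location."""
    NONE = 0
    READ = 1          # read-only
    READ_WRITE = 2    # may write (and read)


def swmr_holds(perms: Dict[str, Perm]) -> bool:
    """Either exactly one core has READ_WRITE and all others have NONE,
    or no core has READ_WRITE (any number of READ holders is allowed)."""
    writers = [core for core, p in perms.items() if p is Perm.READ_WRITE]
    readers = [core for core, p in perms.items() if p is Perm.READ]
    if len(writers) > 1:
        return False          # two writers at the same time
    if writers and readers:
        return False          # a writer coexists with readers
    return True


print(swmr_holds({"C1": Perm.READ, "C2": Perm.READ, "C3": Perm.NONE}))  # True
print(swmr_holds({"C1": Perm.READ_WRITE, "C2": Perm.NONE}))              # True
print(swmr_holds({"C1": Perm.READ_WRITE, "C2": Perm.READ}))              # False
```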
7
Figure: the core with its cache controller and cache, and the memory controller with memory, connected through the interconnection network.
10
Transient states occur during the transition from one stable state to another. Notation XY^Z: the block is transitioning from stable state X to stable state Y, and the transition will not be complete until an event of type Z occurs. For example, IM^D denotes that a block was in state I and will become M when the data (D) is received.
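The XY^Z naming scheme is regular enough to parse mechanically. This is a sketch that assumes the `^` spelling used in these notes; multi-letter superscripts such as AD, which appear in later tables, simply list several awaited events. The `Transient` type and `parse_transient` function are hypothetical.

```python
import re
from typing import NamedTuple


class Transient(NamedTuple):
    start: str     # stable state X the block is leaving
    target: str    # stable state Y it is heading to
    waiting: str   # event type(s) Z that must occur before the transition completes


# X and Y are stable-state letters (M, O, E, S, I); Z is one or more event letters.
_PATTERN = re.compile(r"^([MOESI])([MOESI])\^([A-Z]+)$")


def parse_transient(name: str) -> Transient:
    """Parse names such as 'IM^D' or 'IS^AD'."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not an XY^Z transient state: {name!r}")
    return Transient(*m.groups())


print(parse_transient("IM^D"))   # Transient(start='I', target='M', waiting='D')
print(parse_transient("IS^AD"))  # Transient(start='I', target='S', waiting='AD')
```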
12
To maintain the state of each block in a cache, the most common approach is to add a few extra state bits to each block. For example, MOESI has five stable states, so it needs 3 bits. The state of blocks in memory can be maintained the same way. Alternatively, it can be derived with logic gates: for example, a NOR gate whose inputs are the per-cache OWNED signals outputs 0 (memory state I) as soon as any one of its inputs is 1.
Figure: example blocks with data and state bits (10011…000 → I, 11111…001 → O, 00000…101 → M); the block states in caches 1, 2, and 3 are combined to produce the state of the block in memory.
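The bit count follows from ceil(log2(number of states)), and the NOR example can be written as a one-line function. A sketch for illustration: `state_bits` and `memory_owner_bit` are hypothetical names, and reading a NOR output of 1 as "memory owns the block" is an interpretation, since the text only defines the 0 output.

```python
import math
from typing import List


def state_bits(num_states: int) -> int:
    """Minimum number of bits needed to encode num_states stable states."""
    return max(1, math.ceil(math.log2(num_states)))


def memory_owner_bit(owned_inputs: List[bool]) -> int:
    """NOR over the per-cache OWNED signals.

    0 (memory state I) if any cache owns the block; 1 only when no cache
    owns it (interpreted here as 'memory is the owner').
    """
    return 0 if any(owned_inputs) else 1


for protocol, states in [("MSI", 3), ("MESI", 4), ("MOSI", 4), ("MOESI", 5)]:
    print(protocol, state_bits(states))   # MSI 2, MESI 2, MOSI 2, MOESI 3 (one per line)

print(memory_owner_bit([False, True, False]))   # 0 -> I
print(memory_owner_bit([False, False, False]))  # 1
```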
13
Most protocols have a similar set of transactions, because the basic goals of the coherence controllers are similar. All transactions are initiated by cache controllers in response to requests from their associated cores.
Transaction | Goal
GetShared (GetS) | Obtain block in Shared (read-only) state.
GetModified (GetM) | Obtain block in Modified (read-write) state.
Upgrade (Upg) | Upgrade block state from read-only (Shared or Owned) to read-write (Modified); Upg (unlike GetM) does not require data to be sent to the requestor.
PutShared (PutS) | Evict block in Shared state.
PutExclusive (PutE) | Evict block in Exclusive state.
PutOwned (PutO) | Evict block in Owned state.
PutModified (PutM) | Evict block in Modified state.
14
Events are core requests to their cache controllers.
Event | Response of cache controller
Load | If cache hit, respond with data from cache; else initiate GetS transaction.
Store | If cache hit in state E or M, write data into cache; else initiate GetM or Upg transaction.
Atomic read-modify-write | If cache hit in state E or M, atomically execute read-modify-write semantics; else initiate GetM or Upg transaction.
Instruction fetch | If cache hit (in I-cache), respond with instruction from cache; else initiate GetS transaction.
Read-only prefetch | If cache hit, ignore; else may optionally initiate GetS transaction.
Read-write prefetch | If cache hit in state M, ignore; else may optionally initiate GetM or Upg transaction.
Replacement | Depending on state of block, initiate PutS, PutE, PutO, or PutM transaction.
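The event table maps directly onto a small dispatch function. This is a sketch, not a controller implementation: the event strings and function names are hypothetical, and restricting the Upg option to blocks already held in S or O is an interpretation drawn from the transaction table's definition of Upg.

```python
from typing import Optional

PUT_FOR_STATE = {"S": "PutS", "E": "PutE", "O": "PutO", "M": "PutM"}


def write_request(state: Optional[str]) -> str:
    """Transaction needed to gain write permission when the block is not in E or M."""
    # Upg applies only to a block already held read-only (S or O);
    # on a miss the data is needed too, so GetM is required.
    return "GetM or Upg" if state in ("S", "O") else "GetM"


def handle_core_event(event: str, state: Optional[str]) -> str:
    """Cache-controller response to one core event.

    `state` is the block's stable state (M, O, E or S) on a hit, or None on a miss.
    """
    hit = state is not None
    if event in ("load", "instruction fetch"):
        return "respond from cache" if hit else "GetS"
    if event in ("store", "atomic rmw"):
        return "perform in cache" if state in ("E", "M") else write_request(state)
    if event == "read-only prefetch":
        return "ignore" if hit else "optional GetS"
    if event == "read-write prefetch":
        return "ignore" if state == "M" else "optional " + write_request(state)
    if event == "replacement":
        return PUT_FOR_STATE[state]   # only valid blocks are replaced
    raise ValueError(f"unknown event: {event!r}")


print(handle_core_event("load", None))         # GetS
print(handle_core_event("store", "S"))         # GetM or Upg
print(handle_core_event("store", None))        # GetM
print(handle_core_event("atomic rmw", "E"))    # perform in cache
print(handle_core_event("replacement", "O"))   # PutO
```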
17
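Example: C1 and C2 both issue a GetM for block A.
Case 1: all controllers observe the two GetMs in the same order.
Time | C1 | C2 | Memory
0 | A: I | A: I | Owner
1 | A: GetM from C1/M, Owner | A: GetM from C1/I | GetM from C1/M
2 | A: GetM from C2/I | A: GetM from C2/M, Owner | GetM from C2/M
Case 2: C2 observes the two GetMs in the opposite order.
Time | C1 | C2 | Memory
0 | A: I | A: I | Owner
1 | A: GetM from C1/M, Owner | A: GetM from C2/M, Owner | GetM from C1/M
2 | A: GetM from C2/I | A: GetM from C1/I | GetM from C2/M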
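Replaying the two cases shows why snooping needs every controller to see requests in the same order. A minimal sketch, assuming each cache reacts to a snooped GetM only by going to M (own request) or I (another core's request); the function names are hypothetical.

```python
from typing import List


def snoop_trace(order: List[str], me: str) -> List[str]:
    """State of block A in cache `me` after each GetM it snoops, in `order`.

    Seeing its own GetM makes the cache the owner in M; another core's GetM
    invalidates its copy (I).
    """
    return ["M" if requestor == me else "I" for requestor in order]


def writers_per_step(traces: List[List[str]]) -> List[int]:
    """Number of caches in M at each step; SWMR requires at most 1."""
    return [sum(states[i] == "M" for states in traces) for i in range(len(traces[0]))]


# Case 1: both caches see C1's GetM, then C2's.
same = [snoop_trace(["C1", "C2"], "C1"), snoop_trace(["C1", "C2"], "C2")]
# Case 2: C2 sees the two GetMs in the opposite order.
diff = [snoop_trace(["C1", "C2"], "C1"), snoop_trace(["C2", "C1"], "C2")]

print(writers_per_step(same))  # [1, 1]
print(writers_per_step(diff))  # [2, 0]: two writers at time 1, then no owner at all
```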
18
Figure: system model. A multicore processor chip in which each core has a private data (L1) cache with its own cache controller; the cores connect through an interconnection network to the LLC/directory controller and the last-level cache (LLC), which sit in front of main memory.
20
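MSI snooping protocol, cache controller. Events: core events (load, store, replacement); bus events for the cache's own transaction (GetS, GetM, PutM, data) and for other cores' transactions (GetS, GetM, PutM). Notation: event → action/next state; (A) marks events that cannot occur because transactions are atomic.
I: load → GetS/IS^D; store → GetM/IM^D
IS^D: load, store, replacement → stall; own data → copy data into cache, load hit/S; other cores' transactions → (A)
IM^D: load, store, replacement → stall; own data → copy data into cache, store hit/M; other cores' transactions → (A)
S: load → load hit; store → GetM/SM^D; replacement → -/I; other GetM → -/I
SM^D: load → load hit; store, replacement → stall; own data → copy data into cache, store hit/M; other cores' transactions → (A)
M: load → load hit; store → store hit; replacement → PutM, send data to memory/I; other GetS → send data to requestor and memory/S; other GetM → send data to requestor/I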
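The cache-controller table can be encoded as a lookup from (state, event) to (action, next state). A sketch under stated assumptions: event names are hypothetical strings, a core event missing from the table is treated as a stall, and another core's request that the table does not list is treated as needing no action.

```python
from typing import Dict, Tuple

# (state, event) -> (action, next state), following the MSI snooping table above.
MSI_CACHE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("I", "load"): ("issue GetS", "IS^D"),
    ("I", "store"): ("issue GetM", "IM^D"),
    ("IS^D", "own data"): ("copy data into cache, load hit", "S"),
    ("IM^D", "own data"): ("copy data into cache, store hit", "M"),
    ("S", "load"): ("load hit", "S"),
    ("S", "store"): ("issue GetM", "SM^D"),
    ("S", "replacement"): ("-", "I"),
    ("S", "other GetM"): ("-", "I"),
    ("SM^D", "load"): ("load hit", "SM^D"),
    ("SM^D", "own data"): ("copy data into cache, store hit", "M"),
    ("M", "load"): ("load hit", "M"),
    ("M", "store"): ("store hit", "M"),
    ("M", "replacement"): ("issue PutM, send data to memory", "I"),
    ("M", "other GetS"): ("send data to requestor and memory", "S"),
    ("M", "other GetM"): ("send data to requestor", "I"),
}


def step(state: str, event: str) -> Tuple[str, str]:
    if (state, event) in MSI_CACHE:
        return MSI_CACHE[(state, event)]
    if event.startswith("other "):
        return ("-", state)      # another core's request that needs no action here
    return ("stall", state)      # core request in a transient state must wait


state = "I"
for event in ["load", "own data", "store", "own data", "other GetS"]:
    action, state = step(state, event)
    print(f"{event:>10} -> {action} / {state}")
```

The trace goes I → IS^D → S → SM^D → M → S.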
21
MSI snooping protocol, memory controller. Bus events: GetS, GetM, PutM, data from owner.
IorS: GetS → send data block to requestor/IorS; GetM → send data block to requestor/M
IorS^D: bus requests → (A); data from owner → update data block in memory/IorS
M: GetS → -/IorS^D
23
This MESI protocol has atomic transactions but non-atomic requests. The Exclusive state is used in almost all commercial coherence protocols because it optimizes a common case: a core first reads a block and then subsequently writes it. In MSI, on a cache miss the core must issue a GetS to obtain read permission and then issue a GetM to obtain write permission. In MESI, if no other cache holds the block, the core obtains it in the Exclusive state, in which no other cache has a copy. The core can then write it by silently changing E to M, so it does not need to issue a GetM.
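Counting the bus transactions for the read-then-write pattern makes the saving concrete. A minimal sketch, assuming the block starts in I and no other request for it intervenes; `read_then_write` is a hypothetical helper.

```python
from typing import List


def read_then_write(protocol: str, other_copies: bool) -> List[str]:
    """Bus transactions issued when a core misses on a load to a block and then
    stores to it, assuming no other request for the block intervenes."""
    txns = ["GetS"]                    # load miss: obtain read permission
    if protocol == "MSI":
        txns.append("GetM")            # S never grants write permission
    elif protocol == "MESI":
        if other_copies:
            txns.append("GetM")        # block arrived in S, so it must still upgrade
        # else: block arrived in E; E -> M happens silently on the store
    else:
        raise ValueError(f"unsupported protocol: {protocol}")
    return txns


print(read_then_write("MSI", other_copies=False))   # ['GetS', 'GetM']
print(read_then_write("MESI", other_copies=False))  # ['GetS']
print(read_then_write("MESI", other_copies=True))   # ['GetS', 'GetM']
```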
24
MESI snooping protocol, cache controller. Columns: load; store; replacement; own GetS, GetM, PutM; other cores' GetS, GetM, PutM; data. R = requestor, M = memory; (A) = cannot occur with atomic transactions.
I: GetS/IS^AD; GetM/IM^AD; -; -; -
IS^AD: stall; -/IS^D; -; -; -
IS^D: stall; (A); -/S; -/E
IM^AD: stall; -/IM^D; -; -; -
IM^D: stall; (A); -/M
S: hit; GetM/SM^AD; -/I; -; -
SM^AD: hit; stall; -/SM^D; -; -/IM^AD; -
SM^D: hit; stall; (A); -/M
E: hit; hit/M; PutM/EI^A; data to R & M/S; data to R/I; -
M: hit; PutM/MI^A; data to R & M/S; data to R/I; -
MI^A: hit; stall; data to M/I; data to M & R/II^A; data to R/II^A; -
EI^A: hit; stall; -/I; data to M & R/II^A; data to R/II^A; -
II^A: stall; -/I; -; -; -
25
MESI snooping protocol, memory controller. Columns: GetS; GetM; PutM; Data; NoData; NoData-E. R = requestor, M = memory.
I: data to R/EorM; data to R/EorM; -/I^D
S: data to R/EorM; data to R/EorM; -/S^D
EorM: -/S^D; -; -/EorM^D
I^D: (A); write data to M/I; -/I
S^D: (A); write data to M/S; -/S
EorM^D: (A); write data to M/I; -/EorM; -/I
29
MOSI snooping protocol, cache controller. Columns: load; store; replacement; own GetS, GetM, PutM; other GetS, GetM, PutM; own data response.
I: issue GetS/IS^AD; issue GetM/IM^AD; -; -; -
IS^AD: stall; -/IS^D; -; -; -
IS^D: stall; (A); -/S
IM^AD: stall; -/IM^D; -; -; -
IM^D: stall; (A); -/M
S: hit; issue GetM/SM^AD; -/I; -; -
SM^AD: hit; stall; -/SM^D; -; -
SM^D: hit; stall; (A); -/M
O: hit; issue GetM/OM^A; issue PutM/OI^A; send data to requestor; send data to requestor/I; -
OM^A: hit; stall; -/M; send data to requestor; send data to requestor/IM^AD; -
M: hit; issue PutM/MI^A; send data to requestor/O; send data to requestor/I; -
MI^A: hit; stall; send data to memory/I; send data to requestor/OI^A; send data to requestor/II^A; -
OI^A: hit; stall; send data to memory/I; send data to requestor; send data to requestor/II^A; -
II^A: stall; send NoData to memory/I; -; -; -
30
MOSI snooping protocol, memory controller. Columns: GetS; GetM; PutM; data from owner; NoData.
IorS: send data to requestor; send data to requestor/MorO; -/IorS^D
IorS^D: (A); write data to memory/IorS; -/IorS
MorO: -; -; -/MorO^D
MorO^D: (A); write data to memory/IorS; -/MorO
31
Metric | MSI | MOSI
# Messages | 6 | 13
# Stalls | 20 | 24

Metric | MSI | MOSI
# Messages | 2 | 2
# Stalls | 0 | 0
32
Figure: Core 1 and Core 2 (each with a cache controller and a cache) and the memory controller (with memory) are attached to the bus. Core 1's cache controller: issue GetS / IS^AD.
33
Core 2's cache controller: issue GetM / IM^AD.
34
Request on bus: GetS (C1).
35
Core 1: - / IS^D. Memory controller: send data to C1 / IorS.
36
Data on bus: data from LLC/memory.
37
Request on bus: GetM (C2). Core 1: copy data from LLC/memory / S.
38
Core 1: - / I. Memory controller: send data to C2 / MorO. Core 2: - / IM^D.
39
Data on bus: data from LLC/memory.
40
Core 2: copy data from LLC/memory / M.
41
Core 1's cache controller: issue GetS / IS^AD.
42
Request on bus: GetS (C1).
43
Core 1: - / IS^D. Memory controller: - / MorO. Core 2: send data to C1 / O.
44
Data on bus: data from C2.
45
Core 1: copy data from C2 / S.
47
Figure: atomic bus. Address bus: Request 1, Request 2, Request 3. Data bus: Response 1, Response 2, Response 3.
48
Figure: split-transaction bus. Left: address bus carries Request 1, Request 2, Request 3; data bus carries Response 1, Response 2, Response 3 in order. Right: address bus carries Request 1, Request 2, Request 3; data bus carries Response 2, Response 3, Response 1 (out of order).
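A back-of-the-envelope timing model shows what splitting requests from responses buys. The model is an assumption for illustration, not taken from the slides: each request uses the address bus for one cycle, memory takes a fixed `latency` cycles, each response uses the data bus for one cycle, and on the atomic bus a transaction holds the bus until its response completes.

```python
def atomic_bus_cycles(n: int, latency: int) -> int:
    """Atomic bus: each transaction holds the bus from its request until its
    response, so n transactions take n * (request + latency + data) cycles."""
    return n * (1 + latency + 1)


def split_bus_cycles(n: int, latency: int) -> int:
    """Split-transaction bus with in-order responses: request i uses the address
    bus in cycle i; its data transfer starts once it is ready and the data bus
    is free."""
    data_bus_free = 0
    for i in range(n):
        ready = i + 1 + latency              # request ends at cycle i + 1
        start = max(ready, data_bus_free)
        data_bus_free = start + 1            # one cycle on the data bus
    return data_bus_free


print(atomic_bus_cycles(3, latency=4))  # 18
print(split_bus_cycles(3, latency=4))   # 8
```

With these hypothetical parameters, overlapping the three memory latencies cuts the total from 18 to 8 cycles; the out-of-order variant would additionally let a fast response overtake a slow one.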
50
Figure annotations: FIFO queues buffer incoming and outgoing messages. The memory controller does not have a connection for making requests.
51
Cache controller. Columns: load; store; replacement; own GetS; own GetM; own PutM; other GetS; other GetM; other PutM; own data response (for own request).
I: issue GetS/IS^AD; issue GetM/IM^AD; -; -; -
IS^AD: stall; -/IS^D; -; -; -; -/IS^A
IS^D: stall; -; load hit/S
IS^A: stall; load hit/S; -; -
IM^AD: stall; -/IM^D; -; -; -; -/IM^A
IM^D: stall; store hit/M
IM^A: stall; store hit/M; -; -
S: hit; issue GetM/SM^AD; -/I; -
SM^AD: hit; stall; stall; -/SM^D; -; -/IM^AD; -/SM^A
SM^D: hit; stall; store hit/M
SM^A: hit; stall; store hit/M; -; -/IM^A
M: hit; issue PutM/MI^A; send data to requestor and to memory/S; send data to requestor/I
MI^A: hit; stall; send data to requestor/I; send data to requestor and to memory/II^A; send data to requestor/II^A
II^A: stall; -/I; -; -; -
52
Annotation on the table above: it can now receive an Other-GetS.
53
Memory controller. Columns: GetS; GetM; PutM from owner; PutM from non-owner; Data.
IorS: send data to requestor; send data to requestor, set Owner to requestor/M; -
M: clear Owner/IorS^D; set Owner to requestor; clear Owner/IorS^D; -; write data to memory/IorS^A
IorS^D: stall; -; write data to memory/IorS
IorS^A: clear Owner/IorS; -; -
54
Metric | MSI | MSI with split-transaction bus
# Messages | 6 | 5
# Stalls | 20 | 33

Metric | MSI | MSI with split-transaction bus
# Messages | 2 | 2
# Stalls | 0 | 3
57
Cache controller. Columns: load; store; replacement; own GetS; own GetM; own PutM; other GetS; other GetM; other PutM; own data response (for own request).
I: issue GetS/IS^AD; issue GetM/IM^AD; -; -; -
IS^AD: stall; -/IS^D; -; -; -; -/IS^A
IS^D: stall; -; -/IS^D_I; load hit/S
IS^A: stall; load hit/S; -; -
IS^D_I: stall; -; -; load hit/I
IM^AD: stall; -/IM^D; -; -; -; -/IM^A
IM^D: stall; -/IM^D_S; -/IM^D_I; store hit/M
IM^A: stall; store hit/M; -; -
IM^D_I: stall; -; -; store hit, send data to GetM requestor/I
IM^D_S: stall; -; -/IM^D_SI; store hit, send data to GetM requestor and mem/S
IM^D_SI: stall; -; store hit, send data to GetM requestor and mem/I
S: hit; issue GetM/SM^AD; -/I; -
SM^AD: hit; stall; stall; -/SM^D; -; -/IM^AD; -/SM^A
SM^D: hit; stall; -/SM^D_S; -/SM^D_I; store hit/M
SM^A: hit; stall; store hit/M; -; -/IM^A
SM^D_I: hit; stall; -; -; store hit, send data to GetM requestor/I
SM^D_S: hit; stall; -; -/SM^D_SI; store hit, send data to GetM requestor and mem/S
SM^D_SI: hit; stall; -; -; store hit, send data to GetM requestor and mem/I
M: hit; issue PutM/MI^A; send data to requestor and to memory/S; send data to requestor/I
MI^A: hit; stall; send data to requestor/I; send data to requestor and to memory/II^A; send data to requestor/II^A
II^A: stall; -/I; -; -; -
58
Memory controller. Columns: GetS; GetM; PutM from owner; PutM from non-owner; Data.
IorS: send data to requestor; send data to requestor, set Owner to requestor/M; -
M: clear Owner/IorS^D; set Owner to requestor; clear Owner/IorS^D; -; write data to memory/IorS^A
IorS^D: stall; -; write data to memory/IorS
IorS^A: clear Owner/IorS; -; -
59
Uses MOESI with non-atomic requests and non-atomic transactions. Supports up to 64 processors. Wired snooping buses consume a lot of energy and therefore do not scale to a large number of cores; to address this, the E10000 uses point-to-point links instead. It uses a separate bus for sending out-of-order data response messages.
61
Benchmark suite: Splash-2. Simulator: gem5, SE mode. Hardware: four CPUs, each with a private 32 KB, 4-way set-associative L1 cache. The default cache line size is 64 bytes; we vary it in our experiment.
L1 block size (bytes) | Write-backs / memory references
16 | 11214
32 | 12350
64 | 12672
128 | 13001
Charts: write-backs vs. L1 cache size (KB); write-backs vs. L1 block size (bytes).
64
Benchmark suite: Splash-2. Benchmark applications: Barnes-Hut, LU, Ocean, Radiosity, Radix, Ray Trace. Protocols: MESI and MSI. Hardware: ?
65
Protocols: MSI, MESI, MOSI, and MOESI. Panels: Hardware; Splash-2 inputs and applications.
66
Directory protocols were originally developed to address the lack of scalability of snooping protocols. Their goal is to avoid the broadcast nature of snooping: snooping systems broadcast all requests on a totally ordered interconnection network, and every request is snooped by every coherence controller. Directory protocols instead use indirection to avoid both the ordered broadcast network and having each cache controller process every request. Directory-based protocols should nevertheless be competitive with snooping protocols.
67
Figure: directory system model. A multicore processor chip in which each core has a private data (L1) cache and cache controller; the cores connect through an interconnection network to the LLC/directory controller, the last-level cache (LLC) with its directory, and main memory.
68
Protocol | Ordered network | Advantages | Disadvantages
Snooping protocol | Yes | Simple | Difficult to scale
Directory-based protocol | No | Scalable | Indirection, extra hardware
69
In the directory system model, the directory maintains a global view of the coherence state of each block: it keeps track of which caches hold copies of each block and in what states, so every block has associated directory information. Every request goes to the directory, which then sends directives to the caches involved. The one restriction placed on the interconnection network is that it enforces point-to-point ordering: if controller A sends two messages to controller B, the messages arrive at controller B in the same order in which they were sent.
70
The figure shows the transactions in which a cache controller issues coherence requests to change permissions from I to S, from I or S to M, from M to I, and from S to I. For I or S to M, the cache sends a GetM request to the directory, which takes two actions. First, it responds to the requestor with a message that includes the data and the AckCount, the number of current sharers of the block. Second, it sends an Invalidation message to all of the current sharers. Each sharer, upon receiving the Invalidation, sends an Invalidation-Ack to the requestor; the requestor's transaction is complete once it has received the data and all AckCount Invalidation-Acks. For M to I, the cache sends a PutM message that includes the data to the directory, and the directory responds with a Put-Ack. If the PutM did not carry the data with it, the protocol would require a third message in the PutM transaction: a data message from the cache controller to the directory carrying the evicted block that had been in state M.
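The GetM handshake reduces to some simple bookkeeping on each side. A minimal sketch with hypothetical class names: the directory entry computes the AckCount and the invalidation targets, and the requestor counts Inv-Acks, which may arrive before the Data message that carries the AckCount. Excluding the requestor itself from the AckCount (relevant for S to M) is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class SharedBlockEntry:
    """Hypothetical directory entry for one block currently in state S."""
    sharers: Set[str] = field(default_factory=set)
    owner: Optional[str] = None

    def on_getm(self, req: str) -> Tuple[int, List[str]]:
        """Return (AckCount sent with the Data, caches that receive an Invalidation)."""
        to_invalidate = sorted(self.sharers - {req})   # sharers other than the requestor
        self.sharers = set()
        self.owner = req          # directory now records the block as M, owned by req
        return len(to_invalidate), to_invalidate


@dataclass
class PendingGetM:
    """Requestor-side bookkeeping while its GetM is outstanding."""
    ack_count: Optional[int] = None    # unknown until the Data message arrives
    acks_received: int = 0

    def on_data(self, ack_count: int) -> None:
        self.ack_count = ack_count

    def on_inv_ack(self) -> None:
        # An Inv-Ack may overtake the Data message, so simply count it.
        self.acks_received += 1

    def complete(self) -> bool:
        return self.ack_count is not None and self.acks_received == self.ack_count


entry = SharedBlockEntry(sharers={"C2", "C3"})
ack_count, invalidate = entry.on_getm("C1")
print(ack_count, invalidate)   # 2 ['C2', 'C3']
req = PendingGetM()
req.on_inv_ack()               # C2's Inv-Ack arrives before the Data
req.on_data(ack_count)
print(req.complete())          # False: still waiting for C3
req.on_inv_ack()
print(req.complete())          # True: the block can move to M
```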
74
I to S (common case #1): The cache controller sends a GetS request to the directory and changes the block state from I to IS^D. The directory receives this request and, if the directory is the owner (i.e., no cache currently has the block in M), the directory responds with a Data message, changes the block's state to S (if it is not S already), and adds the requestor to the sharer list. When the Data arrives at the requestor, the cache controller changes the block's state to S, completing the transaction.
I to S (common case #2): The cache controller sends a GetS request to the directory and changes the block state from I to IS^D. If the directory is not the owner (i.e., a cache currently has the block in M), the directory forwards the request to the owner and changes the block's state to the transient state S^D. The owner responds to this Fwd-GetS message by sending Data to the requestor and changing the block's state to S. The now-previous owner must also send Data to the directory, since it is relinquishing ownership to the directory, which must have an up-to-date copy of the block. When the Data arrives at the requestor, the cache controller changes the block state to S and considers the transaction complete. When the Data arrives at the directory, the directory copies it to memory, changes the block state to S, and considers the transaction complete.
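The two common cases can be sketched from the directory's side. The class, method names, and message tuples are hypothetical, and the sharer-list bookkeeping in case #2 (requestor and previous owner both become sharers, owner cleared) is an assumption, since the description above does not spell it out.

```python
from typing import List, Optional, Set, Tuple

Msg = Tuple[str, str, str]   # (message type, sender, receiver)


class DirEntry:
    """Hypothetical directory entry for one block: state I, S, M or transient S^D."""

    def __init__(self, state: str = "I", owner: Optional[str] = None):
        self.state = state
        self.owner = owner
        self.sharers: Set[str] = set()

    def on_gets(self, req: str) -> List[Msg]:
        if self.state in ("I", "S"):
            # Case #1: the directory is the owner and replies with Data directly.
            self.state = "S"
            self.sharers.add(req)
            return [("Data", "Dir", req)]
        if self.state == "M":
            # Case #2: a cache owns the block, so forward the request to it.
            msgs = [("Fwd-GetS", "Dir", self.owner)]
            # Assumption: requestor and previous owner both become sharers.
            self.sharers |= {req, self.owner}
            self.owner = None
            self.state = "S^D"
            return msgs
        raise RuntimeError("request arrived in a transient state; it would stall")

    def on_data_from_owner(self) -> None:
        # The previous owner's Data is copied to memory; the transaction completes.
        self.state = "S"


def owner_on_fwd_gets(owner: str, req: str) -> Tuple[List[Msg], str]:
    """The owner sends Data to the requestor and to the directory, then moves to S."""
    return [("Data", owner, req), ("Data", owner, "Dir")], "S"


d = DirEntry()
print(d.on_gets("C1"), d.state)        # [('Data', 'Dir', 'C1')] S
d2 = DirEntry(state="M", owner="C2")
print(d2.on_gets("C1"), d2.state)      # [('Fwd-GetS', 'Dir', 'C2')] S^D
msgs, c2_state = owner_on_fwd_gets("C2", "C1")
d2.on_data_from_owner()
print(d2.state, sorted(d2.sharers))    # S ['C1', 'C2']
```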
78
We consider a complete directory that maintains the complete state of each block, including the full set of caches that may have shared copies, and assume point-to-point ordering for the forwarded-request network.
79
Recall: if a cache has a block in the Owned state, then the block is valid, read-only, dirty (i.e., it must eventually update memory), and owned (i.e., the cache must respond to coherence requests for the block).
Adding the Owned state changes the protocol (compared with MSI) in three important ways:
1. More coherence requests are satisfied by caches (in O state) than by the LLC/memory.
2. There are more 3-hop transactions.
80
Figure: (a) If the directory is the owner: Req I → S, S → S; (1) GetS. (b) If the directory is not the owner: Req I → S; Dir M → O, O → O; Owner M → O, O → O; (1) GetS, (2) Fwd-GetS.
MOSI Directory Protocol – Cache Controller. Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
Rows: I (load: send GetS to Dir/IS^D), IS^D, S.
81
Figure: (a) If the directory is the owner: Req I → S, S → S; (1) GetS. (b) If the directory is not the owner: Req I → S; Dir M → O, O → O; Owner M → O, O → O; (1) GetS, (2) Fwd-GetS.
MOSI Directory Protocol – Cache Controller. Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
I: send GetS to Dir/IS^D
IS^D: stall; -/S
S:
82
Figure: (a) If the directory is the owner: Req I → S, S → S; (1) GetS, (2) Data. (b) If the directory is not the owner: Req I → S; Dir M → O, O → O; Owner M → O, O → O; (1) GetS, (2) Fwd-GetS, (3) Data.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
I: send GetS to Dir/IS^D
IS^D: stall; -/S
S: hit
IS^D: I → S, waiting for D.
83
Figure: (a) Req I → M; (1) GetM. (b) Req I or S → M; Dir S → M; each sharer S → I; (1) GetM.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
Rows: I (store: send GetM to Dir/IM^AD), IM^AD, IM^A, S, SM^AD, SM^A, M.
IM^AD: the cache wants to go from I to M and waits for the data (D) plus possibly Inv-Acks (A). The cache knows how many acks it expects to receive.
84
Figure: (a) Req I → M; (1) GetM, (2) Data [ack = 0]. (b) Req I or S → M; Dir S → M; each sharer S → I; (1) GetM, (2) Data [ack > 0], (2) Inv.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
I: send GetM to Dir/IM^AD
IM^AD: stall; -/M; -/IM^A; -/M; Ack--
IM^A: stall
S: send GetM to Dir/SM^AD; send Inv-Ack to Req/I
SM^AD: hit; stall; send Inv-Ack to Req/IM^AD; -/M; -/SM^A; -/M; Ack--
SM^A: hit; stall
M:
85
Figure: (a) Req I → M; (1) GetM, (2) Data [ack = 0]. (b) Req I or S → M; Dir S → M; each sharer S → I; (1) GetM, (2) Data [ack > 0], (2) Inv, (3) Inv-Ack.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
I: send GetM to Dir/IM^AD
IM^AD: stall; -/M; -/IM^A; -/M; Ack--
IM^A: stall
S: hit; send GetM to Dir/SM^AD; send PutS to Dir/SI^A; send Inv-Ack to Req/I
SM^AD: hit; stall; send Inv-Ack to Req/IM^AD; -/M; -/SM^A; -/M; Ack--
SM^A: hit; stall
M: hit; send PutM + data to Dir/MI^A; send data to Req/O; send data to Req/I; -/I
86
Figure: Req O → M; Dir O → M; each sharer S → I; (1) GetM, (2) AckCount, (2) Inv, (3) Inv-Ack.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
M: hit; send PutM + data to Dir/MI^A; send data to Req/O; send data to Req/I; -/I
O: hit; send GetM to Dir/OM^AC; send PutO + data to Dir/OI^A; send data to Req; send data to Req/I
OM^AC: hit; stall; send data to Req; send data to Req/IM^AD; -/OM^A; Ack--
OM^A: hit; stall; send data to Req; stall; Ack--; -/M
87
Figure: (a) Req O → I; Dir O → M; (1) PutO + data, (2) Put-Ack. (b) Req S → I; Dir S → I (or S → S); (1) PutS, (2) Put-Ack. (c) Req M → I; Dir M → I; (1) PutM + data, (2) Put-Ack.
Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack.
M: hit; send PutM + data to Dir/MI^A; send data to Req/O; send data to Req/I; -/I
MI^A: stall; send data to Req/OI^A; send data to Req/II^A; Ack--; -/M
O: hit; send GetM to Dir/OM^AC; send PutO + data to Dir/OI^A; send data to Req; send data to Req/I
OM^AC: hit; stall; stall; send data to Req; send data to Req/IM^AD; -/OM^A; Ack--
OM^A: hit; stall; send data to Req; stall; Ack--; -/M
OI^A: stall; send data to Req; send data to Req/II^A; -/I
SI^A: stall; send Inv-Ack to Req/II^A; -/I
II^A: stall; -/I
88
MOSI directory protocol, cache controller. Columns: load; store; replacement; Fwd-GetS; Fwd-GetM; Inv; Put-Ack; Data from Dir (ack = 0); Data from Owner (ack > 0); AckCount from Dir; Inv-Ack; Last-Inv-Ack. Ack-- = decrement the number of Inv-Acks still expected.
I: send GetS to Dir/IS^D; send GetM to Dir/IM^AD
IS^D: stall; -/S
IM^AD: stall; -/M; -/IM^A; -/M; Ack--
IM^A: stall
S: hit; send GetM to Dir/SM^AD; send PutS to Dir/SI^A; send Inv-Ack to Req/I
SM^AD: hit; stall; send Inv-Ack to Req/IM^AD; -/M; -/SM^A; -/M; Ack--
SM^A: hit; stall
M: hit; send PutM + data to Dir/MI^A; send data to Req/O; send data to Req/I; -/I
MI^A: stall; send data to Req/OI^A; send data to Req/II^A
O: hit; send GetM to Dir/OM^AC; send PutO + data to Dir/OI^A; send data to Req; send data to Req/I
OM^AC: hit; stall; send data to Req; send data to Req/IM^AD; -/OM^A; Ack--
OM^A: hit; stall; send data to Req; stall; Ack--; -/M
OI^A: stall; send data to Req; send data to Req/II^A; -/I
SI^A: stall; send Inv-Ack to Req/II^A; -/I
II^A: stall; -/I
89
MOSI directory protocol, directory controller. Columns: GetS; GetM from Owner; GetM from Non-Owner; PutS-NotLast; PutS-Last; PutM + data from Owner; PutM + data from Non-Owner; PutO + data from Owner; PutO + data from Non-Owner.
I: send Data to Req, add Req to Sharers/S; send Data to Req, set Owner to Req/M; send Put-Ack to Req; send Put-Ack to Req; send Put-Ack to Req
S: send Data to Req, add Req to Sharers; send Data to Req, send Inv to Sharers, set Owner to Req, clear Sharers/M; remove Req from Sharers, send Put-Ack to Req; remove Req from Sharers, send Put-Ack to Req/I; remove Req from Sharers, send Put-Ack to Req
O: forward GetS to Owner, add Req to Sharers; send AckCount to Req, send Inv to Sharers, clear Sharers/M; forward GetM to Owner, send Inv to Sharers, set Owner to Req, clear Sharers, send AckCount to Req/M; remove Req from Sharers, send Put-Ack to Req; remove Req from Sharers, send Put-Ack to Req; remove Req from Sharers, copy data to memory, send Put-Ack to Req, clear Owner/S; remove Req from Sharers, send Put-Ack to Req; copy data to memory, send Put-Ack to Req, clear Owner/S; remove Req from Sharers, send Put-Ack to Req
M: forward GetS to Owner, add Req to Sharers/O; forward GetM to Owner, set Owner to Req; send Put-Ack to Req; send Put-Ack to Req; copy data to memory, send Put-Ack to Req, clear Owner/I; remove Req from Sharers, send Put-Ack to Req; send Put-Ack to Req
90
Comparison between cache controllers in MSI and MOSI:
Metric | MSI | MOSI
Total # of messages | 15 | 20
Total # of stalls | 31 | 38

Comparison between memory controllers in MSI and MOSI:
Metric | MSI | MOSI
Total # of messages | 19 | 28
Total # of stalls | 2 | 2
91
We have assumed a complete directory that maintains the complete state of each block, including the full set of caches that may have shared copies. Coarse directories and limited pointers are two ways to reduce how much state the directory maintains. With C caches, the directory entry formats are:
Complete directory: state (2 bits) | owner (log2 C bits) | complete sharer list (bit vector, C bits). Each bit in the sharer list represents one cache.
Coarse directory: state (2 bits) | owner (log2 C bits) | coarse sharer list (bit vector, C/K bits). Each bit in the sharer list represents K caches.
Limited-pointer directory: state (2 bits) | owner (log2 C bits) | pointers to i sharers (i × log2 C bits). The sharer list is divided into i entries, each of which is a pointer to a cache.
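The three entry formats translate into simple bit-count formulas. A sketch using the field widths above (2-bit state, log2 C-bit owner) with log2 C rounded up; the example values C = 64, K = 4 and i = 4 are hypothetical.

```python
import math


def ptr_bits(num_caches: int) -> int:
    """Bits needed to name one of num_caches caches (log2 C, rounded up)."""
    return math.ceil(math.log2(num_caches))


def complete_entry_bits(C: int) -> int:
    return 2 + ptr_bits(C) + C                   # state + owner + one bit per cache


def coarse_entry_bits(C: int, K: int) -> int:
    return 2 + ptr_bits(C) + math.ceil(C / K)    # one bit per group of K caches


def limited_entry_bits(C: int, i: int) -> int:
    return 2 + ptr_bits(C) + i * ptr_bits(C)     # i pointers of log2 C bits each


def coarse_bit_for(cache_id: int, K: int) -> int:
    """Sharer-list bit that covers cache_id in a coarse directory."""
    return cache_id // K


C = 64
print(complete_entry_bits(C))       # 72
print(coarse_entry_bits(C, K=4))    # 24
print(limited_entry_bits(C, i=4))   # 32
print(coarse_bit_for(13, K=4))      # 3 (bit 3 covers caches 12-15)
```

The coarse format saves space at the cost of precision: when bit 3 is set, an invalidation must go to all four caches 12 to 15, even if only one of them holds a copy.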
92
Figure: multiple directories. Each node has a core, a cache with its cache controller, and a directory controller with its memory and directory; the nodes are connected by an interconnection network.
Multiple directories provide greater bandwidth for coherence transactions. Idea: because the allocation of memory addresses to nodes is often static, the location of a block's directory can be computed directly; in a system with N directories, block B's directory might be at directory B modulo N.
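With a static interleaving, finding a block's home directory is a single modulo operation. A sketch assuming byte addresses and a 64-byte block size (both assumptions for illustration); `home_directory` is a hypothetical name.

```python
def home_directory(addr: int, num_dirs: int, block_size: int = 64) -> int:
    """Directory responsible for the block containing byte address `addr`.

    Block number B = addr // block_size; home directory = B mod N.
    """
    block = addr // block_size
    return block % num_dirs


N = 4
for addr in (0x0000, 0x0040, 0x0080, 0x00C0, 0x0100):
    print(hex(addr), "-> directory", home_directory(addr, N))
# 0x0 -> 0, 0x40 -> 1, 0x80 -> 2, 0xc0 -> 3, 0x100 -> 0
```

Consecutive blocks land on consecutive directories, which spreads coherence traffic across the N directory controllers.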
93
Recall: one limitation of the directory protocols presented so far is that they stall frequently. When a cache controller has a block in state IM^A and receives a Fwd-GetS, it can instead process the request and change the block's state to IM^AS. This state indicates that after the cache controller's GetM transaction completes (i.e., when the last Inv-Ack arrives), the cache controller will change the block state to S. The cache controller must also send the block to the requestor of the GetS and to the directory, which is now the owner. Conclusion: by not stalling on the Fwd-GetS, the cache controller can improve performance by continuing to process other forwarded requests behind that Fwd-GetS in its incoming queue.
94
Note: now suppose the interconnection network does not provide point-to-point ordering, using MOSI as an example. One approach is to add a customized message to handle the resulting situation. Figure: (a) example with point-to-point ordering; (b) example without point-to-point ordering. Note that C2's Fwd-GetS arrives at C1 in state I, and thus C1 does not respond.
95
Figure (a): adaptive routing example. Adaptive routing lets a message dynamically choose its path as it traverses the network, so congested links and switches can be avoided. However, messages between the same pair of controllers may then take different paths and arrive out of order, so adaptive routing fits protocols that do not rely on point-to-point ordering.
96
Flat, memory-based directory protocol. Uses a bit-vector directory representation. Up to 512 nodes, with two processors per node; there is no snooping protocol within a node, and combining multiple processors in a node reduces cost.
97
Distinguishing features: For scalability, each directory entry contains fewer bits than necessary to represent every possible cache that could be sharing a block; the directory dynamically chooses between a coarse bit-vector and a limited-pointer representation. Since the network provides no ordering, several new messages are used to handle reordering, and the protocol accounts for all of these cases instead of enforcing ordering in the network. Only two networks (request and response) are used to avoid deadlock, even though the directory protocol has three types of messages (request, forwarded request, and response).
99
Benchmarks: SPLASH-2 (FFT, Barnes-Hut, LU, Ocean, Radiosity, Radix, Ray Trace); SPECjbb, a benchmark for evaluating the performance of Java server applications; PARSEC, a benchmark suite for shared-memory, multithreaded programs.
Metrics: system performance (time efficiency); processor utilization (time spent waiting for memory); directory utilization; number of accesses to physical memory; power consumption (difficult).
100
Benchmark suite: Splash-2. Simulator: gem5, SE mode. Hardware: Hydra (UCDenver).
Example results:
L1 cache size (KB) | Write-backs / memory references
16 | 17300
32 | 12672
64 | 5251
128 | 0
L1 block size (bytes) | Write-backs / memory references
16 | 11214
32 | 12350
64 | 12672
128 | 13001
Charts: write-backs vs. L1 cache size (KB); write-backs vs. L1 block size (bytes).