Published by Hannah Emily Bates. Modified over 9 years ago.
1
5/8/2015, slide 1. PCOD: MIMD II Lecture (Coherence). Per Stenström (c) 2008, Sally A. McKee (c) 2011
Snoop-based multiprocessor design
  Correctness issues
    - semantic model: coherence and memory consistency
    - deadlock, livelock, and starvation
  Design issues, from simplistic to realistic, one by one:
    - Single-level cache and an atomic bus
    - Multi-level cache design issues
    - Split-transaction bus design issues
    - Scalable snoop-based design techniques
More Architectural Support for MIMD
2
Key goals
  - Correctness
  - Design simplicity (verification is costly)
  - High performance
  Design simplicity and performance are often at odds.
  Get a picture of the bus-based coherence organization: dual tags, processor-side and bus-side controllers.
3
Correctness Requirements
  Semantic model: contract between HW and SW
    - cache coherence -> write serialization
    - sequential consistency -> program order, write atomicity
  Deadlock: no forward progress and no system activity
    - resources are held in a cyclic relationship
  Livelock: no forward progress, but system activity
    - allocation/de-allocation of resources with no progress
  Starvation: some processes are denied service
    - often temporary
4
Single-Level Cache and Atomic Bus
  Design with single-level caches and an atomic bus:
    - Tag and cache controller design issues
    - Snoop protocol design
    - Race conditions: non-atomic state transitions
    - Correctness issues: serialization; deadlock, livelock, and starvation
    - Atomic (synchronization) operations
5
Cache Controller Design
  Recall actions on a cache access:
    1. Indexing cache with tag check
    2. Get/request data
    3. Update state bits
  Extension for snoop support: bus requests also access the cache
    - processor-side controller
    - bus-side controller
  [Figure: cached data with two tag arrays; processor requests and bus requests]
  Performance issue: simultaneous tag accesses from processor and bus
  Solution: duplicate the tags, but keep them consistent
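The duplicate-tag idea can be sketched as a small Python model. This is an illustrative sketch only: the class name `DualTagCache`, the direct-mapped organization, the 64-byte block size, and the MSI-style state letters are assumptions, not details from the slides. The point it shows is that processor lookups and bus snoops read different tag copies, while every state change is written to both.

```python
from dataclasses import dataclass, field


@dataclass
class DualTagCache:
    """Hypothetical direct-mapped cache with duplicated tag/state arrays."""
    num_sets: int
    block_bytes: int = 64                            # assumed block size
    proc_tags: dict = field(default_factory=dict)    # index -> (tag, state)
    snoop_tags: dict = field(default_factory=dict)   # duplicate for the bus side

    def _split(self, addr):
        block = addr // self.block_bytes
        return block % self.num_sets, block // self.num_sets

    def _set_entry(self, index, tag, state):
        # Any state update goes to both copies so they stay consistent.
        self.proc_tags[index] = (tag, state)
        self.snoop_tags[index] = (tag, state)

    def fill(self, addr, state):
        index, tag = self._split(addr)
        self._set_entry(index, tag, state)

    @staticmethod
    def _lookup(tags, index, tag):
        entry = tags.get(index)
        return entry[1] if entry and entry[0] == tag else "I"

    def processor_lookup(self, addr):
        # Processor-side controller reads only its own tag copy.
        return self._lookup(self.proc_tags, *self._split(addr))

    def snoop_lookup(self, addr):
        # Bus-side controller reads only the duplicate copy.
        return self._lookup(self.snoop_tags, *self._split(addr))

    def snoop_invalidate(self, addr):
        index, tag = self._split(addr)
        if self.snoop_lookup(addr) != "I":
            self._set_entry(index, tag, "I")
```

In hardware the benefit is that the two lookups can proceed in the same cycle; only the (rarer) state updates need access to both arrays.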
6
Reporting Snoop Results
  Where to read (memory or cache), and what state transition to make?
    - supported by wired-AND/wired-OR bus lines
  When is the snoop result available? (main alternatives)
    - synchronous: requires dual tags and must adapt to the worst case, because the processor may be updating state bits
    - asynchronous (variable-delay snoop): assume minimum delay, but add cycles when necessary
    - memory state bit: distinguishes between a valid and an invalid memory block
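As a rough sketch of how wired-OR lines could report a snoop result, the function below combines the per-cache states for one block. The line names `shared`, `dirty`, and `inhibit`, the MSI state letters, and the `still_snooping` parameter are assumptions chosen for illustration; the slides do not name the lines.

```python
def combine_snoop_results(states, still_snooping=0):
    """Combine per-cache MSI states for one block into wired-OR bus lines.

    states: state of the block in each snooping cache ("M", "S" or "I").
    still_snooping: number of caches that have not finished their snoop.
    """
    lines = {
        "shared": any(s in ("S", "M") for s in states),  # someone has a copy
        "dirty": any(s == "M" for s in states),          # someone owns the block
        "inhibit": still_snooping > 0,                   # result not yet valid
    }
    if lines["inhibit"]:
        lines["source"] = None        # wait: variable-delay snoop not finished
    elif lines["dirty"]:
        lines["source"] = "cache"     # the owner supplies the data
    else:
        lines["source"] = "memory"    # memory copy is valid
    return lines
```

For example, `combine_snoop_results(["M", "I"])` selects the owning cache as the data source, while any non-zero `still_snooping` holds the decision back.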
7
Dealing with Write-backs
  One would like to service the miss before writing back the replaced block.
  Two implications:
    - Add a write-back buffer
    - Bus snoops must also look into the write-back buffer
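The two implications can be sketched as follows. The class and method names are hypothetical, and the model tracks whole blocks by address only; it is meant to show why a snoop that checks only the cache array would miss the most recent copy of a block that sits in the write-back buffer.

```python
class CacheWithWriteBackBuffer:
    """Hypothetical cache model: dirty victims wait in a write-back buffer."""

    def __init__(self):
        self.lines = {}       # block address -> (state, data)
        self.wb_buffer = {}   # block address -> dirty data awaiting write-back

    def replace(self, victim_addr, new_addr, new_data):
        state, data = self.lines.pop(victim_addr)
        if state == "M":
            # Defer the write-back so the miss can be serviced first.
            self.wb_buffer[victim_addr] = data
        self.lines[new_addr] = ("S", new_data)

    def snoop_read(self, addr):
        """Return the dirty data this node must supply, or None."""
        if addr in self.lines and self.lines[addr][0] == "M":
            return self.lines[addr][1]
        # The only up-to-date copy may be sitting in the buffer.
        return self.wb_buffer.get(addr)

    def drain(self, memory):
        """Perform the deferred write-backs once the bus is available."""
        memory.update(self.wb_buffer)
        self.wb_buffer.clear()
```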
8
Baseline Architecture
  [Figure: baseline architecture, including the write-back buffer]
9
State Transitions Must Appear Atomic
  Assume a block is in shared state in both caches.
  [Figure: Cache 1 and Cache 2 each issue an Upgr]
    1. Cache 1 awaits use of the bus
    2. Cache 2 gets access to the bus
    3. The Upgrade from Cache 2 updates the state of Cache 1 to invalid
    4. The Upgrade from Cache 1 is performed. However, an Upgrade is no longer appropriate, because Cache 1 no longer holds a valid copy
10
Non-Atomic State Transitions
  There is a time window between issuing and performing a bus operation.
  Problem: another transaction may change the action that is required
  Solution: extend the protocol with non-atomic (transient) states
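One way to picture the transient-state solution is the sketch below, which replays the race between two Upgrades. The state names `"S->M"` and `"I->M"` and the class `Controller` are assumptions made for illustration; a real protocol would have more states and events. The key step is in `snoop`: a controller that loses its copy while its Upgr is still pending turns the pending request into a ReadX.

```python
class Controller:
    """Hypothetical MSI cache controller with two transient states."""

    def __init__(self, state="S"):
        self.state = state
        self.pending = None   # bus request issued but not yet granted

    def processor_write(self):
        if self.state == "S":
            self.state, self.pending = "S->M", "Upgr"
        elif self.state == "I":
            self.state, self.pending = "I->M", "ReadX"

    def snoop(self, bus_op):
        if bus_op not in ("Upgr", "ReadX"):
            return
        if self.state == "S->M":
            # Our copy was invalidated in the window between issuing and
            # performing the Upgr, so data must now be fetched as well.
            self.state, self.pending = "I->M", "ReadX"
        elif self.state in ("S", "M"):
            self.state = "I"

    def bus_granted(self):
        """Perform the pending request; returns the operation put on the bus."""
        op, self.pending = self.pending, None
        if op is not None:
            self.state = "M"
        return op


c1, c2 = Controller("S"), Controller("S")
c1.processor_write()
c2.processor_write()
c1.snoop(c2.bus_granted())   # Cache 2 wins arbitration with its Upgr
c2.snoop(c1.bus_granted())   # Cache 1 now issues ReadX instead of Upgr
```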
11
Correctness Issues
  Write serialization: ownership acquisition and cache block modification should appear atomic
    - the processor may not write data into the cache until the read-exclusive request is on the bus, i.e. until it is committed
  Deadlock: two cache controllers may be in a circular dependence if one locks its cache while waiting for the bus (fetch deadlock)
  Livelock: can arise if several controllers issue read-exclusive requests for the same block at the same time
    - let each one complete before taking care of the next
  Starvation: bus arbitration is unfair to some nodes
12
A Fetch-Deadlock Situation
  [Figure: Cache 1 issues ReadX B and Cache 2 issues BusRd A; blocks A and B are shown in the caches]
    1. Cache 1 awaits use of the bus, but Cache 1 is locked
    2. Cache 2 gets access to the bus
    3. Cache 2 waits for Cache 1 to respond, and Cache 1 waits for Cache 2 to release the bus
  Deadlock!
13
A Livelock Situation
  [Figure: Cache 1 and Cache 2 each issue ReadX A]
  A read-exclusive operation involves:
    1. Acquisition of an exclusive block
    2. Reattempting the write in the local cache
  Sequence:
    1. Try to get the bus
    2. Make Cache 1’s copy invalid
    3. Make Cache 2’s copy invalid
    Etc. ... Livelock!
14
Remedies to Correctness Issues
  - Do not update the cache until the Upgrade is on the bus
  - Service incoming snoops while waiting for the bus
  - Complete the transaction with no interruption
  [Figure: Cache 1 and Cache 2 each issue an Upgr]
15
Implementation of Atomic Memory Operations
  Test&set should result in an atomic read-modify-write
  Cacheable test&set vs. memory-based implementation
    - cacheable: lower latency and bandwidth for spinning and self-acquisition
    - cacheable: longer time to transfer the lock to another node
    - memory-based: requires the bus to be locked down
  Load-linked (LL) and store-conditional (SC) implementation
    - Lock flag and lock address register at each processor
    - LL reads the block, sets the lock flag, and puts the block address in the register
    - Incoming invalidates are checked against the address: on a match, reset the flag
    - SC checks the lock flag as an indicator of an intervening conflicting write: if reset, fail; if not, succeed
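The LL/SC mechanism described above can be modeled in a few lines. This is a software sketch under simplifying assumptions: the names `Processor`, `load_linked`, `store_conditional`, and `snoop_invalidate` are hypothetical, addresses stand for whole blocks, and the invalidation broadcast is modeled by passing the other processors explicitly.

```python
class Processor:
    """Hypothetical model of per-processor LL/SC state (flag + address register)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: block address -> value
        self.lock_flag = False
        self.lock_addr = None

    def load_linked(self, addr):
        # LL reads the block, sets the flag, and records the block address.
        self.lock_flag = True
        self.lock_addr = addr
        return self.memory[addr]

    def snoop_invalidate(self, addr):
        # An incoming invalidate to the linked address resets the flag.
        if self.lock_flag and addr == self.lock_addr:
            self.lock_flag = False

    def store_conditional(self, addr, value, others=()):
        # SC fails if a conflicting write intervened since the LL.
        if not (self.lock_flag and addr == self.lock_addr):
            return False
        self.memory[addr] = value
        self.lock_flag = False
        for p in others:          # the write invalidates other copies
            p.snoop_invalidate(addr)
        return True


memory = {0: 0}                   # block 0 holds a lock variable, 0 = free
p0, p1 = Processor(memory), Processor(memory)
p0.load_linked(0)
p1.load_linked(0)
p1_ok = p1.store_conditional(0, 1, others=[p0])   # p1 acquires the lock
p0_ok = p0.store_conditional(0, 1, others=[p1])   # p0's SC fails, must retry
```

A lock acquire would wrap the LL/SC pair in a loop that retries whenever the loaded value shows the lock is held or the SC returns `False`.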
16
Multi-Level Cache Designs
  [Figure: hierarchy P, L1, L2, M]
  Coherence needs to be extended across L1 and L2
  L1 is on-chip; snoop support in L1 is expensive. Is snoop support needed in L1?
  Definition: L1 is included in L2 iff all blocks in L1 are also in L2
  If inclusion is maintained, snoop support is needed only at L2 (which must be able to invalidate blocks in L1)
  Consequence: a block in owned state in L1 (M in MSI) must be marked modified in L2
17
Maintaining Inclusion
  Violations of the inclusion property can arise from:
    - Set-associative L1 with a history-based replacement algorithm
    - Split I- and D-caches at L1 and a unified cache at L2
    - Different cache block sizes in L1 and L2
  Techniques to maintain inclusion:
    - Direct-mapped L1, and L2 with any associativity, given some additional constraints on block size, fetch policy, ...
  Note: one can always displace a block in L1 on replacement in L2 to maintain inclusion
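The closing note (displace the L1 block when L2 replaces it) can be sketched with a toy two-level hierarchy. The class `InclusiveHierarchy`, the fully associative organization, and the LRU policy are assumptions for illustration. Because L1 hits are not seen by L2, the L2 replacement history diverges from L1's, which is exactly the history-based violation listed above; the back-invalidation in `access` repairs it.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Hypothetical fully associative L1/L2 pair with LRU replacement."""

    def __init__(self, l1_blocks, l2_blocks):
        self.l1_blocks, self.l2_blocks = l1_blocks, l2_blocks
        self.l1 = OrderedDict()   # oldest entry first
        self.l2 = OrderedDict()

    def access(self, block):
        if block in self.l1:
            self.l1.move_to_end(block)   # L2 does not see this hit
            return "L1 hit"
        if block in self.l2:
            self.l2.move_to_end(block)
            result = "L2 hit"
        else:
            result = "miss"
            if len(self.l2) >= self.l2_blocks:
                victim, _ = self.l2.popitem(last=False)
                # Back-invalidate: displace the victim from L1 as well.
                self.l1.pop(victim, None)
            self.l2[block] = True
        if len(self.l1) >= self.l1_blocks:
            self.l1.popitem(last=False)
        self.l1[block] = True
        return result

    def inclusion_holds(self):
        return all(b in self.l2 for b in self.l1)


h = InclusiveHierarchy(l1_blocks=2, l2_blocks=3)
for blk in ["A", "B", "C", "B", "D", "B", "E"]:
    h.access(blk)
# B was hot in L1 but least recently used in L2, so the access to E
# evicted it from L2 and the back-invalidation removed it from L1.
```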
18
Split Transaction Buses
  Separate request and response phases improve bus utilization
  [Figure: bus timeline with bus arbitration, Address/CMD, Mem Access Delay, and Data phases]
  Challenging issues:
    - Avoid conflicting requests in progress simultaneously
    - Buffers needed => flow control
    - Correctness issues (coherence, SC, deadlock, livelock, ...)
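The utilization argument can be made concrete with a small calculation. The cycle counts below (2 address/command cycles, 20 cycles of memory access delay, 4 data cycles) are made-up numbers for illustration only, and arbitration overhead is ignored. The function counts how many cycles one read transaction keeps the bus busy.

```python
def bus_occupancy(addr_cycles, mem_delay_cycles, data_cycles, split):
    """Cycles for which one read transaction holds the bus."""
    if split:
        # The bus is released during the memory access delay.
        return addr_cycles + data_cycles
    # An atomic bus is held for the whole transaction.
    return addr_cycles + mem_delay_cycles + data_cycles


# Hypothetical timing parameters, not taken from any real machine.
atomic = bus_occupancy(2, 20, 4, split=False)
split = bus_occupancy(2, 20, 4, split=True)
print(atomic, split)   # prints "26 6"
```

Under these assumed numbers the atomic bus does useful transfers in only 6 of the 26 cycles it is held, which is the idle time a split-transaction bus makes available to other requests.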
19
Example of Conflict Situation
  [Figure: Cache 1 and Cache 2 each issue an Upgr]
  With an atomic bus, an Upgrade is committed when the bus is granted
  Here, two Upgrades can be on the bus at once and may invalidate both copies
20
Some real examples
  - Details can be interesting
  - Supports historical emphasis of the course
  - SGI Power Challenge
21
SGI Challenge 1(4): High-level design decisions
  - Avoid conflicts: allow a fixed number of requests, to different blocks, in progress at a time
  - Flow control: limited buffers, so NACK when full and retry
  - Ordering: allow out-of-order responses (to cope with non-uniform delays)
22
SGI Challenge 2(4): Separate request-response buses
  Request phase (uses the address request bus):
    - present the address and initiate snooping
    - report the snoop result (prolong or NACK if necessary)
  Response phase (uses the data request bus):
    - send data back
23
Design of SGI Challenge 3(4)
  - Max 8 outstanding requests; a 3-bit tag distinguishes the requests
  - A request table in each node keeps track of outstanding requests
  - Writes are committed when the request is granted
  - Flow control: NACK and retry when buffers are full
  - Conflict resolution:
      - before an address request is issued, the request table is checked
      - memory and caches check the request independently
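A request table along these lines can be sketched as follows. Only the limit of 8 outstanding requests and the 3-bit tag come from the slide; the class `RequestTable`, its method names, and the status strings are hypothetical. Each node would hold an identical copy, so the conflict check can be done locally before a request is issued.

```python
class RequestTable:
    """Hypothetical per-node table of outstanding bus requests."""

    MAX_OUTSTANDING = 8   # 3-bit tag -> tags 0..7

    def __init__(self):
        self.entries = {}   # tag -> block address

    def try_issue(self, block):
        """Return (status, tag); tag is None unless the request is issued."""
        if block in self.entries.values():
            return "conflict", None      # a request to this block is in progress
        if len(self.entries) >= self.MAX_OUTSTANDING:
            return "table_full", None    # must wait for a free tag
        tag = next(t for t in range(self.MAX_OUTSTANDING)
                   if t not in self.entries)
        self.entries[tag] = block
        return "issued", tag

    def complete(self, tag):
        """A response carrying this tag arrived; free the entry."""
        return self.entries.pop(tag)
```

Because responses carry the tag rather than the address, they can return out of order and still be matched to the right request.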
24
Serialization and SC 4(4)
  Serialization to a single location is guaranteed:
    1. Only a single outstanding request to each block is allowed
    2. A request is committed when it is on the bus
  Problems in guaranteeing SC:
    - requires serialization across writes to different locations
    - requests can be reordered in buffers, so being committed is not the same as being performed
  A solution: servicing incoming requests before the processor’s own requests guarantees write atomicity
25
Multiple Outstanding Processor Requests
  Modern processors allow multiple outstanding memory operations
  Problem: may violate sequential consistency
  Solution:
    - Buffer all outstanding requests
    - Don’t make writes visible to any other processor until committed
    - Don’t perform reads before previously issued requests are committed
  Lockup-free caches implement the buffering capability needed to enforce ordering of uncommitted memory operations
26
Commercial Machines
  - SGI Challenge: 36 MIPS R8000 processors with a 1.2 GB/s bus; peak: 5.4 GFLOPS
  - Sun Enterprise 6000: 30 UltraSPARC processors with a 2.67 GB/s bus; peak: 9 GFLOPS
  Look these up on the net