Published by Jocelyn Tucker. Modified over 9 years ago.
1
Lecture 9 ECE/CSC 506 - Spring 2007 - E. F. Gehringer, based on slides by Yan Solihin
Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations
2
Lower-Level Protocol Choices
BusRd observed in M state: what transition to make?
- Change to S: assumes I'll read again soon; good for mostly-read data
- What about "migratory" data? Then change to I: assumes another processor will write to it (Synapse)
  - I read and write, then you read and write, then X reads and writes...
- Sequent Symmetry and MIT Alewife use adaptive protocols
3
MESI (4-state) Invalidation Protocol
Problem with the MSI protocol:
- A Rd, Wr sequence incurs 2 bus transactions even when no one is sharing (e.g., a serial program!)
  - BusRd (I → S) followed by BusRdX or BusUpgr (S → M)
- In general, coherence traffic from serial programs is unacceptable
Add an Exclusive state, giving four states:
- Invalid
- Modified (dirty)
- Shared (two or more caches may have copies)
- Exclusive (only this cache has a clean copy, same value as in memory)
How to decide I → E or I → S?
- Need to check whether someone else has a copy
- "Shared" signal on the bus: wired-OR line asserted in response to BusRd
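The read-then-write cost described on this slide can be checked with a small sketch. It is only an illustration: the function name `bus_transactions` and the `other_sharers` flag (standing in for the wired-OR "Shared" signal) are assumptions, not part of the lecture, and the model tracks a single block in a single cache.

```python
def bus_transactions(protocol, ops, other_sharers=False):
    """Count bus transactions for one block that starts in state I.

    protocol: "MSI" or "MESI"; ops: sequence of "Rd" / "Wr".
    other_sharers models the "Shared" line seen on BusRd (assumed input).
    """
    state, count = "I", 0
    for op in ops:
        if op == "Rd":
            if state == "I":
                count += 1  # BusRd
                # MESI loads into E only when no other cache has a copy
                if protocol == "MESI" and not other_sharers:
                    state = "E"
                else:
                    state = "S"
        elif op == "Wr":
            if state in ("I", "S"):
                count += 1  # BusRdX from I; BusRdX or BusUpgr from S
            # E -> M and M -> M need no bus transaction
            state = "M"
    return count


print(bus_transactions("MSI", ["Rd", "Wr"]))   # MSI: BusRd, then BusRdX/BusUpgr
print(bus_transactions("MESI", ["Rd", "Wr"]))  # MESI with no sharers: BusRd only
```

With another sharer present, MESI loads the block in S and behaves like MSI for this sequence.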
4
MESI: Processor-Initiated Transactions
- I: PrRd/BusRd(~S) → E; PrRd/BusRd(S) → S; PrWr/BusRdX → M
- E: PrRd/– (stays E); PrWr/– → M
- S: PrRd/– (stays S); PrWr/BusRdX or BusUpgr → M
- M: PrRd/– and PrWr/– (stays M)
5
MESI: Bus-Initiated Transactions
- M: BusRd/Flush → S; BusRdX/Flush → I
- E: BusRd → S; BusRdX → I
- S: BusRd/Flush' (stays S); BusRdX/Flush' → I
- I: BusRd/– and BusRdX/– (stays I)
6
MESI State Transition Diagram
BusRd(S) means the shared line is asserted on the BusRd transaction
7
Flush vs. Flush'
- Flush: mandatory
- Flush': happens only when cache-to-cache sharing is used, and only one cache flushes the data
8
MESI Visualization
Setup: P1, P2, and P3 each have a cache and a snooper on a shared bus; main memory, behind the memory controller, holds X=1.
9
MESI Visualization
P1 issues rd &X, misses, and places BusRd on the bus.
10
MESI Visualization
Memory supplies the block. No other cache has a copy, so P1 holds X=1 in state E.
11
MESI Visualization
P1 issues wr &X (X=2). The block goes E → M with no bus transaction; memory still holds X=1.
One fewer bus request thanks to the Exclusive state, especially for serial programs.
12
MESI Visualization
P3 issues rd &X, misses, and places BusRd on the bus while P1 holds X=2 in state M.
13
MESI Visualization
P1 flushes X=2 (Flush). Memory is updated to 2, and P1 and P3 both hold the block in state S.
14
MESI Visualization
P3 issues wr &X (X=3) and places BusUpgr on the bus. P1's copy goes S → I; P3's copy goes S → M with X=3. Memory still holds X=2.
Note: BusUpgr instead of BusRdX.
15
MESI Visualization
P1 issues rd &X, misses, and places BusRd on the bus. P3 flushes X=3 (Flush) and goes M → S; P1 receives X=3 in state S.
16
MESI Visualization
Memory now holds X=3, and P1 and P3 hold X=3 in state S. P3 issues rd &X, which hits in its own cache.
17
MESI Visualization
P2 issues rd &X, misses, and places BusRd on the bus. A cache holding the block in state S supplies it (Flush'), and P2 holds X=3 in state S.
This is referred to as cache-to-cache transfer in the Illinois MESI protocol.
18
MESI Example (Cache-to-Cache Transfer)
Proc Action | State P1 | State P2 | State P3 | Bus Action   | Data From
R1          | E        | –        | –        | BusRd        | Mem
W1          | M        | –        | –        | –            | Own cache
R3          | S        | –        | S        | BusRd/Flush  | P1 cache
W3          | I        | –        | M        | BusRdX       | Mem
R1          | S        | –        | S        | BusRd/Flush  | P3 cache
R3          | S        | –        | S        | –            | Own cache
R2          | S        | S        | S        | BusRd/Flush' | P1/P3 cache*
* Data comes from memory if there is no cache-to-cache transfer (BusRd/–)
19
MESI Example (Cache-to-Cache Transfer + BusUpgr)
Proc Action | State P1 | State P2 | State P3 | Bus Action   | Data From
R1          | E        | –        | –        | BusRd        | Mem
W1          | M        | –        | –        | –            | Own cache
R3          | S        | –        | S        | BusRd/Flush  | P1 cache
W3          | I        | –        | M        | BusUpgr      | Own cache
R1          | S        | –        | S        | BusRd/Flush  | P3 cache
R3          | S        | –        | S        | –            | Own cache
R2          | S        | S        | S        | BusRd/Flush' | P1/P3 cache*
* Data comes from memory if there is no cache-to-cache transfer (BusRd/–)
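The state columns of the example trace can be replayed with a minimal sketch of the MESI transitions. The names `mesi_step`, `run`, and `TRACE` are illustrative assumptions. The sketch tracks states for one block only, writes "I" where the table shows "–", and records only the requester's bus transaction (the Flush and Flush' responses and the data source are not modelled).

```python
def mesi_step(states, op, proc, use_upgr=True):
    """Apply one processor read ("R") or write ("W") to one block.

    states maps processor name -> "M", "E", "S", or "I" and is updated
    in place. Returns the requester's bus transaction, or "-" for none.
    """
    me = states[proc]
    others = [p for p in states if p != proc]
    if op == "R":
        if me != "I":
            return "-"                          # read hit: PrRd/-
        sharers = [p for p in others if states[p] != "I"]
        for p in sharers:                       # snoopers in M/E/S go to S
            states[p] = "S"
        states[proc] = "S" if sharers else "E"  # BusRd(S) vs. BusRd(~S)
        return "BusRd"
    # Write
    if me in ("M", "E"):
        states[proc] = "M"                      # E -> M without the bus
        return "-"
    action = "BusUpgr" if (me == "S" and use_upgr) else "BusRdX"
    for p in others:                            # all other copies invalidated
        states[p] = "I"
    states[proc] = "M"
    return action


def run(trace, use_upgr=True):
    """Replay (op, processor) pairs; return one row per step."""
    states = {"P1": "I", "P2": "I", "P3": "I"}
    rows = []
    for op, proc in trace:
        action = mesi_step(states, op, proc, use_upgr)
        rows.append((op + proc[1], states["P1"], states["P2"], states["P3"], action))
    return rows


TRACE = [("R", "P1"), ("W", "P1"), ("R", "P3"), ("W", "P3"),
         ("R", "P1"), ("R", "P3"), ("R", "P2")]

for row in run(TRACE):
    print(row)
```

Calling `run(TRACE, use_upgr=False)` gives the same states but issues BusRdX on the W3 step, matching the variant of the example without BusUpgr.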
20
Lower-Level Protocol Choices
Who supplies data on a miss when the block is not in M state: memory or cache?
- Original Illinois MESI: cache
  - Assumes cache is faster than memory (cache-to-cache transfer)
  - Not necessarily true
- Adds complexity
  - How does memory know it should supply data? (It must wait for the caches)
  - Needs a selection algorithm if multiple caches have valid data
- Valuable for distributed memory
  - May be cheaper to obtain from a nearby cache than from distant memory
  - Especially when constructed out of SMP nodes (Stanford DASH)
21
Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations
22
Dragon Writeback Update Protocol
Four states
- Exclusive-clean (E): I and memory have it
- Shared-clean (Sc): I, others, and maybe memory have it, but I'm not the owner
- Shared-modified (Sm): I and others have it but memory does not, and I'm the owner
  - Sm and Sc can coexist in different caches, with at most one Sm
- Modified or dirty (M): I have it and no one else does
On replacement: Sc can be silently dropped; Sm has to be flushed
No invalid state
- If a block is in the cache, it cannot be invalid
- If it is not present in the cache, it can be viewed as being in a not-present or invalid state
New processor events: PrRdMiss, PrWrMiss
- Introduced to specify actions when the block is not present in the cache
New bus transaction: BusUpd
- Broadcasts the single word written on the bus; updates other relevant caches
23
Dragon: Processor-Initiated Transactions
- Not present: PrRdMiss/BusRd(~S) → E; PrRdMiss/BusRd(S) → Sc; PrWrMiss/BusRd(~S) → M; PrWrMiss/(BusRd(S); BusUpd) → Sm
- E: PrRd/– (stays E); PrWr/– → M
- Sc: PrRd/– (stays Sc); PrWr/BusUpd(S) → Sm; PrWr/BusUpd(~S) → M
- Sm: PrRd/– (stays Sm); PrWr/BusUpd(S) (stays Sm); PrWr/BusUpd(~S) → M
- M: PrRd/– and PrWr/– (stays M)
24
Dragon: Bus-Initiated Transactions
- E: BusRd/– → Sc
- Sc: BusRd/– and BusUpd/Update (stays Sc)
- Sm: BusRd/Flush (stays Sm); BusUpd/Update → Sc
- M: BusRd/Flush → Sm
25
Dragon State Transition Diagram
[Diagram: states E, Sc, Sm, and M, combining the processor-initiated and bus-initiated transitions of the two preceding slides]
26
Dragon Visualization
Setup: P1, P2, and P3 each have a cache and a snooper on a shared bus; main memory, behind the memory controller, holds X=1.
27
Dragon Visualization
P1 issues rd &X, misses, and places BusRd on the bus.
28
Dragon Visualization
Memory supplies the block. No other cache has a copy, so P1 holds X=1 in state E.
29
Dragon Visualization
P1 issues wr &X (X=2). The block goes E → M with no bus transaction; memory still holds X=1.
One fewer bus request thanks to the Exclusive state, especially for serial programs.
30
Dragon Visualization
P3 issues rd &X, misses, and places BusRd on the bus while P1 holds X=2 in state M.
31
Dragon Visualization
P1 supplies the block and goes M → Sm; P3 loads X=2 in state Sc. Memory still holds X=1.
32
Dragon Visualization
P3 issues wr &X (X=3) and places BusUpd on the bus. P3 goes Sc → Sm with X=3; P1's copy is updated to X=3 and goes Sm → Sc.
Note: BusUpd instead of BusUpgr (no invalidation is performed).
33
Dragon Visualization
P1 issues rd &X and hits on its Sc copy (X=3).
This would be a miss in the MESI and MSI protocols.
34
Dragon Visualization
P3 issues rd &X and hits on its Sm copy (X=3).
35
Dragon Visualization
P2 issues rd &X, misses, and places BusRd on the bus. P3, the owner in Sm, supplies the block; P2 holds X=3 in state Sc.
Note: only the cache in state Sm is responsible for cache-to-cache transfer.
36
Dragon Visualization
P1 replaces X. P2 still holds X=3 in Sc and P3 holds X=3 in Sm; memory still holds X=1.
37
Dragon Visualization
P3 replaces X. The owner is responsible for writing the block back to memory (X=3), vs. MSI or MESI, where a write-back happens only when the line is in M state.
38
Dragon Example
Proc Action | State P1 | State P2 | State P3 | Bus Action  | Data From
R1          | E        | –        | –        | BusRd       | Mem
W1          | M        | –        | –        | –           | Own cache
R3          | Sm       | –        | Sc       | BusRd/Flush | P1 cache
W3          | Sc       | –        | Sm       | BusUpd/Upd  | Own cache
R1          | Sc       | –        | Sm       | –           | Own cache
R3          | Sc       | –        | Sm       | –           | Own cache
R2          | Sc       | Sc       | Sm       | BusRd/Flush | P3 cache
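The Dragon example can be replayed the same way. The sketch below is a simplification under stated assumptions: `dragon_step` and `run_dragon` are hypothetical names, `None` stands for "not present" (shown as "–" in the table), only one block is tracked, replacements are not modelled, and only the requester's bus transactions are reported.

```python
def dragon_step(states, op, proc):
    """Apply one processor read ("R") or write ("W") to one block.

    states maps processor name -> "E", "Sc", "Sm", "M", or None (not
    present) and is updated in place. Returns the requester's bus
    transactions joined by ";", or "-" if the bus is not used.
    """
    others = [p for p in states if p != proc and states[p] is not None]
    bus = []
    if states[proc] is None:                 # PrRdMiss or PrWrMiss
        bus.append("BusRd")
        for p in others:                     # snoop BusRd: E -> Sc, M -> Sm
            states[p] = {"E": "Sc", "M": "Sm"}.get(states[p], states[p])
        states[proc] = "Sc" if others else "E"
    if op == "W":
        if states[proc] in ("Sc", "Sm"):
            bus.append("BusUpd")             # broadcast the written word
            for p in others:                 # previous owner, if any, drops to Sc
                states[p] = "Sc"
            states[proc] = "Sm" if others else "M"
        else:
            states[proc] = "M"               # E or M: write without the bus
    return ";".join(bus) or "-"


def run_dragon(trace):
    """Replay (op, processor) pairs; return one row per step."""
    states = {"P1": None, "P2": None, "P3": None}
    rows = []
    for op, proc in trace:
        action = dragon_step(states, op, proc)
        rows.append((op + proc[1], states["P1"], states["P2"], states["P3"], action))
    return rows


DRAGON_TRACE = [("R", "P1"), ("W", "P1"), ("R", "P3"), ("W", "P3"),
                ("R", "P1"), ("R", "P3"), ("R", "P2")]

for row in run_dragon(DRAGON_TRACE):
    print(row)
```

Unlike the invalidation-based trace, the R1 after W3 is a hit here because P1's copy was updated rather than invalidated.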
39
Lower-Level Protocol Choices
Can the shared-modified state be eliminated?
- Yes, if memory is updated as well on BusUpd transactions (DEC Firefly)
- The Dragon protocol doesn't do this (it assumes DRAM memory is slow to update)
Should replacement of an Sc block be broadcast?
- Would allow the last copy to go to the Exclusive state and not generate updates
- The replacement bus transaction is not on the critical path; a later update may be
Shouldn't update the local copy on a write hit before the controller gets the bus
- Can mess up serialization
Coherence and consistency considerations are much like the write-through case
In general, there are many subtle race conditions in protocols
But first, let's illustrate quantitative assessment at the logical level
40
Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations
41
Assessing Protocol Tradeoffs
Methodology:
- Use a simulator; choose parameters per the earlier methodology (default: 1 MB, 4-way cache, 64-byte blocks, 16 processors; 64 KB cache for some)
- Focus on frequencies, not end performance, for now
  - Transcends architectural details, but is not what we're really after
- Use an idealized memory performance model to avoid changes of reference interleaving across processors with machine parameters
  - Cheap simulation: no need to model contention
42
Impact of Protocol Optimizations
[Figure: MESI vs. MSI (w/ BusUpgr) vs. MSI (w/ BusRdX)]
- MSI = MESI
- Using upgrades instead of read-exclusives helps
- Same story when working sets don't fit for Ocean, Radix, Raytrace
43
Impact of Cache-Block Size
Multiprocessors add a new kind of miss to cold, capacity, and conflict misses
- Coherence misses: due to invalidations
  - True sharing: write to the same word
  - False sharing: write to a different word in the same block
Reducing misses architecturally in an invalidation protocol
- Capacity: enlarge the cache; increase block size (if there is spatial locality)
- Conflict: increase associativity
- Cold and coherence: only block size
Increasing block size has advantages and disadvantages
- Can reduce misses if spatial locality is good
- Can hurt too
  - Increases misses due to false sharing if spatial locality is not good
  - Increases misses due to conflicts in a fixed-size cache
  - Increases traffic due to fetching unnecessary data and due to false sharing
  - Can increase miss penalty and perhaps hit cost
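The true/false sharing distinction depends only on how addresses map to words and blocks, which a short sketch can make concrete. The function names and the 4-byte word size are assumptions chosen for illustration.

```python
def block_of(addr, block_size):
    """Index of the cache block that contains a byte address."""
    return addr // block_size


def classify_sharing(writer_addr, reader_addr, block_size, word_size=4):
    """Classify how a remote write relates to a word another cache uses.

    word_size=4 is an assumed word width, not taken from the slides.
    """
    if block_of(writer_addr, block_size) != block_of(reader_addr, block_size):
        return "no sharing"      # different blocks: no invalidation at all
    if writer_addr // word_size == reader_addr // word_size:
        return "true sharing"    # the same word is written and used
    return "false sharing"       # same block, different words


# Two words 32 bytes apart: unrelated with 16-byte blocks,
# falsely shared once the block grows to 64 bytes.
for size in (16, 64):
    print(size, classify_sharing(0, 32, size))
```

The loop shows why larger blocks raise the false-sharing component: the same pair of addresses falls into one block only at the larger size.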
44
Impact of Block Size on Miss Rate
For the default problem size: vary block/line size from 8 to 256 bytes
- Decreases with larger lines: cold, capacity (due to spatial locality), true sharing (due to spatial locality)
- Increases with larger lines: false sharing
Working set doesn't fit: impact of capacity misses is large (Ocean, Radix)
45
Impact of Block Size on Traffic
Results differ from those for miss rate: traffic almost always increases
- When the working set fits, overall traffic is still small, except for Radix
- Fixed overhead is a significant component
  - So total traffic is often minimized at a 16-32 byte block, not smaller
- Working set doesn't fit: even a 128-byte block is good for Ocean due to capacity
Address bus traffic behaves in the opposite way to data bus traffic
Traffic (bytes/inst) affects performance indirectly through contention
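The role of fixed overhead can be seen in a simple traffic model, sketched here under assumptions: each miss moves one block plus a fixed per-transaction overhead, and the 6-byte default is a hypothetical value, not a figure from the lecture. Miss rates must come from simulation; none are supplied here.

```python
def traffic_bytes_per_instr(misses_per_instr, block_bytes, overhead_bytes=6):
    """Bus traffic per instruction: each miss moves one block plus overhead.

    overhead_bytes (address/command bytes per transaction) is an assumed
    parameter for illustration only.
    """
    return misses_per_instr * (block_bytes + overhead_bytes)


def overhead_fraction(block_bytes, overhead_bytes=6):
    """Share of each transaction's bytes that is fixed overhead."""
    return overhead_bytes / (block_bytes + overhead_bytes)


# Overhead dominates small blocks and fades for large ones.
for block in (8, 16, 32, 64, 128, 256):
    print(block, round(overhead_fraction(block), 3))
```

Because the overhead share is largest for the smallest blocks, shrinking the block below some point adds transactions without saving many bytes, which is consistent with total traffic bottoming out at a moderate block size rather than the smallest one.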