Lecture 9, ECE/CSC 506, Spring 2007. E. F. Gehringer, based on slides by Yan Solihin.


1 Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

2 Lower-Level Protocol Choices
- BusRd observed in M state: what transition to make?
  - Change to S: assume I'll read again soon
    - Good for mostly-read data
    - But what about "migratory" data?
  - Change to I: assume another processor will write to it (Synapse)
    - Migratory pattern: I read and write, then you read and write, then X reads and writes...
- Sequent Symmetry and MIT Alewife use adaptive protocols

3 MESI (4-state) Invalidation Protocol
- Problem with the MSI protocol: a Rd, Wr sequence incurs 2 bus transactions
  - Even when no one is sharing (e.g., a serial program!)
  - BusRd (I → S) followed by BusRdX or BusUpgr (S → M)
  - In general, coherence traffic from serial programs is unacceptable
- Add an exclusive state:
  - Invalid
  - Modified (dirty)
  - Shared (two or more caches may have copies)
  - Exclusive (only this cache has a clean copy, same value as in memory)
- How to decide I → E or I → S?
  - Need to check whether someone else has a copy
  - "Shared" signal on bus: wired-OR line asserted in response to BusRd
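
The saving from the Exclusive state can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `bus_transactions` and the `shared_line` flag are hypothetical, and the sketch tracks a single cache and a single block with no snooped invalidations between accesses.

```python
def bus_transactions(protocol, ops, shared_line=False):
    """Bus transactions one cache issues for a single block, starting from I.

    protocol: "MSI" or "MESI". ops: sequence of "rd" / "wr".
    shared_line: whether another cache asserts the shared signal on BusRd.
    """
    state = "I"
    issued = []
    for op in ops:
        if op == "rd":
            if state == "I":
                issued.append("BusRd")
                # Only MESI can use the shared signal to enter E.
                if protocol == "MESI" and not shared_line:
                    state = "E"
                else:
                    state = "S"
        elif op == "wr":
            if state == "I":
                issued.append("BusRdX")
            elif state == "S":
                issued.append("BusUpgr")
            # From E or M the write is silent: no bus transaction.
            state = "M"
        else:
            raise ValueError(f"unknown op: {op}")
    return issued


print(bus_transactions("MSI", ["rd", "wr"]))   # ['BusRd', 'BusUpgr']
print(bus_transactions("MESI", ["rd", "wr"]))  # ['BusRd']
```

With `shared_line=True` the MESI cache loads the block in S, so the write needs BusUpgr again and the two protocols issue the same transactions.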

4 MESI: Processor-Initiated Transactions
[State diagram over states M, E, S, I. Edge labels: PrRd/–, PrWr/–, PrRd/BusRd(~S), PrRd/BusRd(S), PrWr/BusRdX.]

5 MESI: Bus-Initiated Transactions
[State diagram over states M, E, S, I. Edge labels: BusRd/–, BusRdX/–, BusRd/Flush, BusRdX/Flush, BusRd/Flush', BusRdX/Flush'.]

6 MESI State Transition Diagram
[Combined state diagram.] BusRd(S) means the shared line is asserted on the BusRd transaction.

7 Flush vs. Flush'
- Flush: mandatory
- Flush': happens only when cache-to-cache sharing is used, and only one cache flushes the data
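
The snooper's side of MESI, including the Flush / Flush' distinction, can be written as a lookup table. The sketch below is an assumption-laden illustration: the transcript lists the edge labels but not which state each edge leaves, so the assignment here follows the usual textbook MESI diagram. The BusUpgr entry is inferred from the visualization slides, and the names `SNOOP_TABLE` and `snoop` are hypothetical. It does not model the rule that only one cache performs Flush'.

```python
# (current state, snooped bus transaction) -> (next state, action)
# Assumed mapping, following the usual MESI diagram.
SNOOP_TABLE = {
    ("M", "BusRd"): ("S", "Flush"),     # dirty copy: flush is mandatory
    ("M", "BusRdX"): ("I", "Flush"),
    ("E", "BusRd"): ("S", "Flush'"),    # clean copy: flush is optional
    ("E", "BusRdX"): ("I", "Flush'"),
    ("S", "BusRd"): ("S", "Flush'"),
    ("S", "BusRdX"): ("I", "Flush'"),
    ("S", "BusUpgr"): ("I", None),      # requester already has the data
}


def snoop(state, bus_op, cache_to_cache=True):
    """Return (next_state, action) for a snooped bus transaction."""
    # Anything not in the table (e.g. state I) leaves the block unchanged.
    next_state, action = SNOOP_TABLE.get((state, bus_op), (state, None))
    # Flush' happens only when cache-to-cache sharing is used.
    if action == "Flush'" and not cache_to_cache:
        action = None
    return next_state, action


print(snoop("M", "BusRd"))                        # ('S', 'Flush')
print(snoop("S", "BusRd", cache_to_cache=False))  # ('S', None)
```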

8 MESI Visualization
[Diagram: processors P1, P2, P3, each with a cache and a snooper, on a shared bus with the memory controller and main memory. Initially no cache holds X; memory has X=1.]

9 MESI Visualization
P1 issues rd &X. It misses, so BusRd goes on the bus.

10 MESI Visualization
Memory supplies X=1. No other cache has a copy, so P1 loads X in state E.

11 MESI Visualization
P1 issues wr &X (X=2). The block goes from E to M with no bus transaction. One less bus request due to the Exclusive state, esp. for serial programs.

12 MESI Visualization
P3 issues rd &X. It misses, so BusRd goes on the bus while P1 holds X=2 in M.

13 MESI Visualization
P1 flushes X=2 and goes from M to S. P3 loads X=2 in S.

14 MESI Visualization
Memory now holds X=2 and both copies are in S. P3 issues wr &X (X=3) and posts BusUpgr. P1 goes to I; P3 goes to M with X=3. Note: BusUpgr instead of BusRdX.

15 MESI Visualization
P1 (X=2 in I) issues rd &X, so BusRd goes on the bus. P3 flushes X=3 and goes from M to S; P1 loads X=3 in S.

16 MESI Visualization
Memory now holds X=3. A rd &X by a cache that holds X in S is a hit, with no bus transaction.

17 MESI Visualization
P2 issues rd &X, so BusRd goes on the bus. P2 loads X=3 in S, supplied by another cache (Flush'). This is referred to as a cache-to-cache transfer in the Illinois MESI protocol.

18 MESI Example (Cache-to-Cache Transfer)

Proc Action  State P1  State P2  State P3  Bus Action    Data From
R1           E         -         -         BusRd         Mem
W1           M         -         -         -             Own cache
R3           S         -         S         BusRd/Flush   P1 cache
W3           I         -         M         BusRdX        Mem
R1           S         -         S         BusRd/Flush   P3 cache
R3           S         -         S         -             Own cache
R2           S         S         S         BusRd/Flush'  P1/P3 cache*

* Data from memory if no cache-to-cache transfer (BusRd/-)

19 MESI Example (Cache-to-Cache Transfer + BusUpgr)

Proc Action  State P1  State P2  State P3  Bus Action    Data From
R1           E         -         -         BusRd         Mem
W1           M         -         -         -             Own cache
R3           S         -         S         BusRd/Flush   P1 cache
W3           I         -         M         BusUpgr       Own cache
R1           S         -         S         BusRd/Flush   P3 cache
R3           S         -         S         -             Own cache
R2           S         S         S         BusRd/Flush'  P1/P3 cache*

* Data from memory if no cache-to-cache transfer (BusRd/-)
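
The state columns of the example table can be reproduced with a small simulator. This is a sketch under stated assumptions, not the course's simulator: it tracks one block in three caches, uses BusUpgr for writes from S (the second variant of the example), writes "I" where the table writes "-", and does not track data values or who supplies the data. The names `mesi_step`, `run_mesi`, and `MESI_TRACE` are hypothetical.

```python
def mesi_step(states, proc, op):
    """Apply one access ("R" or "W") by proc; return the bus transaction or None."""
    me = states[proc]
    others = [p for p in states if p != proc]
    bus = None
    if op == "R":
        if me == "I":
            bus = "BusRd"
            # The shared line is asserted if any other cache holds the block.
            shared = any(states[p] != "I" for p in others)
            for p in others:
                if states[p] in ("M", "E"):
                    states[p] = "S"  # snooped BusRd demotes M/E to S
            states[proc] = "S" if shared else "E"
    else:  # "W"
        if me in ("I", "S"):
            bus = "BusRdX" if me == "I" else "BusUpgr"
            for p in others:
                states[p] = "I"  # all other copies are invalidated
        states[proc] = "M"  # from E or M the write is silent
    return bus


def run_mesi(trace):
    states = {"P1": "I", "P2": "I", "P3": "I"}
    rows = []
    for op, proc in trace:
        bus = mesi_step(states, proc, op)
        rows.append((op + proc[1], states["P1"], states["P2"], states["P3"], bus or "-"))
    return rows


MESI_TRACE = [("R", "P1"), ("W", "P1"), ("R", "P3"), ("W", "P3"),
              ("R", "P1"), ("R", "P3"), ("R", "P2")]
for row in run_mesi(MESI_TRACE):
    print(*row)
```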

20 Lower-Level Protocol Choices
- Who supplies data on a miss when the block is not in M state: memory or cache?
- Original Illinois MESI: cache, assuming a cache is faster than memory (cache-to-cache transfer)
  - Not necessarily true
- Adds complexity
  - How does memory know whether it should supply data? (It must wait for the caches.)
  - Selection algorithm needed if multiple caches have valid data
- Valuable for distributed memory
  - May be cheaper to obtain from a nearby cache than from distant memory
  - Especially when constructed out of SMP nodes (Stanford DASH)

21 Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

22 Dragon Writeback Update Protocol
- Four states
  - Exclusive-clean (E): I and memory have it
  - Shared-clean (Sc): I, others, and maybe memory have it, but I'm not the owner
  - Shared-modified (Sm): I and others have it but memory does not, and I'm the owner
    - Sm and Sc can coexist in different caches, with at most one Sm
  - Modified or dirty (M): I have it and no one else does
  - On replacement: Sc can be dropped silently; Sm has to flush
- No invalid state
  - If in cache, cannot be invalid
  - If not present in cache, can view as being in a not-present or invalid state
- New processor events: PrRdMiss, PrWrMiss
  - Introduced to specify actions when the block is not present in the cache
- New bus transaction: BusUpd
  - Broadcasts the single word written on the bus; updates other relevant caches
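
The replacement rule can be stated as a one-function sketch. The slide gives the rule for Sc and Sm only; treating M like Sm (dirty, so it must be written back) and E like Sc (clean, so it can be dropped) is an assumption added here, as is the name `on_replacement`.

```python
def on_replacement(state):
    """Bus action when a Dragon cache evicts a block held in `state`."""
    if state == "Sm":
        return "Flush"  # owner: memory may be stale, so write back
    if state == "Sc":
        return None     # not the owner: drop silently
    if state == "M":
        return "Flush"  # assumption: dirty sole copy must be written back
    if state == "E":
        return None     # assumption: clean and memory is up to date
    raise ValueError(f"unknown Dragon state: {state}")


for s in ("E", "Sc", "Sm", "M"):
    print(s, on_replacement(s))
```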

23 Dragon: Processor-Initiated Transactions
[State diagram over states E, Sc, Sm, M. Edge labels: PrRd/–, PrWr/–, PrRdMiss/BusRd(~S), PrRdMiss/BusRd(S), PrWrMiss/(BusRd(S);BusUpd), PrWr/BusUpd(S), PrWr/BusUpd(~S).]

24 Dragon: Bus-Initiated Transactions
[State diagram over states E, Sc, Sm, M. Edge labels: BusRd/–, BusRd/Flush, BusUpd/Update.]

25 Dragon State Transition Diagram
[Combined state diagram over states E, Sc, Sm, M, carrying the processor-initiated and bus-initiated edge labels from the two previous slides.]

26 Dragon Visualization
[Diagram: processors P1, P2, P3, each with a cache and a snooper, on a shared bus with the memory controller and main memory. Initially no cache holds X; memory has X=1.]

27 Dragon Visualization
P1 issues rd &X. It misses, so BusRd goes on the bus.

28 Dragon Visualization
Memory supplies X=1. No other cache has a copy, so P1 loads X in state E.

29 Dragon Visualization
P1 issues wr &X (X=2). The block goes from E to M with no bus transaction. One less bus request due to the Exclusive state, esp. for serial programs.

30 Dragon Visualization
P3 issues rd &X. It misses, so BusRd goes on the bus while P1 holds X=2 in M.

31 Dragon Visualization
P1 goes from M to Sm; P3 loads X=2 in Sc. Memory still holds X=1.

32 Dragon Visualization
P3 issues wr &X (X=3) and posts BusUpd. P3 becomes Sm with X=3; P1's copy is updated to X=3 and becomes Sc. Note: BusUpd instead of BusUpgr (no invalidation is performed).

33 Dragon Visualization
P1 (X=3 in Sc) issues rd &X: a hit. This is a miss in the MESI and MSI protocols.

34 Dragon Visualization
P3 (X=3 in Sm) issues rd &X: also a hit.

35 Dragon Visualization
P2 issues rd &X, so BusRd goes on the bus. P2 loads X=3 in Sc. Note: only the cache in state Sm is responsible for the cache-to-cache transfer.

36 Dragon Visualization
P1 replaces X. Its copy is in Sc, so it is dropped without a bus transaction.

37 Dragon Visualization
P3 replaces X. As the owner (Sm), P3 is responsible for writing X=3 back to memory, vs. MSI or MESI, where a write-back occurs only when the line is in M state.

38 Dragon Example

Proc Action  State P1  State P2  State P3  Bus Action   Data From
R1           E         -         -         BusRd        Mem
W1           M         -         -         -            Own cache
R3           Sm        -         Sc        BusRd/Flush  P1 cache
W3           Sc        -         Sm        BusUpd/Upd   Own cache
R1           Sc        -         Sm        -            Own cache
R3           Sc        -         Sm        -            Own cache
R2           Sc        Sc        Sm        BusRd/Flush  P3 cache
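
The state columns of the Dragon example can be reproduced the same way. The sketch below is an illustration under assumptions: it tracks one block in three caches, uses "-" for "not present", omits replacements and data values, and the names `dragon_step`, `run_dragon`, and `DRAGON_TRACE` are hypothetical.

```python
NOT_PRESENT = "-"


def dragon_step(states, proc, op):
    """Apply one access ("R" or "W") by proc; return the bus transactions issued."""
    holders = [p for p in states if p != proc and states[p] != NOT_PRESENT]
    me = states[proc]
    bus = []
    if me == NOT_PRESENT:
        # PrRdMiss / PrWrMiss: fetch the block with BusRd.
        bus.append("BusRd")
        for p in holders:
            # Snooped BusRd: E -> Sc, M -> Sm (owner flushes); Sc and Sm stay.
            states[p] = {"E": "Sc", "M": "Sm"}.get(states[p], states[p])
        me = "Sc" if holders else "E"
    if op == "W":
        if me in ("Sc", "Sm"):
            # Write to a possibly shared block: broadcast the written word.
            bus.append("BusUpd")
            for p in holders:
                states[p] = "Sc"  # copies are updated; ownership moves to the writer
            me = "Sm" if holders else "M"
        else:
            me = "M"  # E or M: no sharers, so the write is silent
    states[proc] = me
    return bus


def run_dragon(trace):
    states = {"P1": NOT_PRESENT, "P2": NOT_PRESENT, "P3": NOT_PRESENT}
    rows = []
    for op, proc in trace:
        bus = dragon_step(states, proc, op)
        rows.append((op + proc[1], states["P1"], states["P2"], states["P3"],
                     "+".join(bus) or "-"))
    return rows


DRAGON_TRACE = [("R", "P1"), ("W", "P1"), ("R", "P3"), ("W", "P3"),
                ("R", "P1"), ("R", "P3"), ("R", "P2")]
for row in run_dragon(DRAGON_TRACE):
    print(*row)
```

Compared with the MESI run of the same trace, the second R1 is a hit here because W3 updated P1's copy instead of invalidating it.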

39 Lower-Level Protocol Choices
- Can the shared-modified state be eliminated?
  - Yes, if memory is updated as well on BusUpd transactions (DEC Firefly)
  - The Dragon protocol doesn't do this (it assumes DRAM memory is slow to update)
- Should replacement of an Sc block be broadcast?
  - Would allow the last copy to go to Exclusive state and not generate updates
  - The replacement bus transaction is not on the critical path; a later update may be
- Shouldn't update the local copy on a write hit before the controller gets the bus
  - Can mess up serialization
- Coherence and consistency considerations are much like the write-through case
- In general, many subtle race conditions in protocols
- But first, let's illustrate quantitative assessment at the logical level

40 Lecture 9 Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

41 Assessing Protocol Tradeoffs
- Methodology:
  - Use a simulator; choose parameters per the earlier methodology (default: 1MB, 4-way cache, 64-byte block, 16 processors; 64K cache for some)
  - Focus on frequencies, not end performance, for now
    - Transcends architectural details, but is not what we're really after
  - Use an idealized memory performance model, so that the interleaving of references across processors does not change with machine parameters
    - Cheap simulation: no need to model contention

42 Impact of Protocol Optimizations
[Chart: MESI vs. MSI (w/ BusUpgr) vs. MSI (w/ BusRdX)]
- MSI = MESI
- Upgrades instead of read-exclusive helps
- Same story when working sets don't fit for Ocean, Radix, Raytrace

43 Impact of Cache-Block Size
- Multiprocessors add a new kind of miss to cold, capacity, and conflict misses
  - Coherence misses: due to invalidations
    - True sharing: write to the same word
    - False sharing: write to a different word of the same block
- Reducing misses architecturally in an invalidation protocol
  - Capacity: enlarge cache; increase block size (if spatial locality)
  - Conflict: increase associativity
  - Cold and coherence: only block size
- Increasing block size has advantages and disadvantages
  - Can reduce misses if spatial locality is good
  - Can hurt too
    - Increases misses due to false sharing if spatial locality is not good
    - Increases misses due to conflicts in a fixed-size cache
    - Increases traffic due to fetching unnecessary data and due to false sharing
    - Can increase miss penalty and perhaps hit cost
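
The true/false sharing distinction, and why larger blocks create more false sharing, can be illustrated with address arithmetic. This is a minimal sketch: the function name `classify_sharing`, the byte-address inputs, and the 4-byte word size are assumptions for illustration only.

```python
def classify_sharing(write_addr, read_addr, block_size, word_size=4):
    """Classify the miss a reader takes after another processor's write.

    Addresses are byte addresses; word_size=4 is an assumed word size.
    """
    if write_addr // block_size != read_addr // block_size:
        return "no coherence miss"  # different blocks: nothing was invalidated
    if write_addr // word_size == read_addr // word_size:
        return "true sharing"       # the reader needs the word that was written
    return "false sharing"          # same block, but a different word


# One processor writes the word at byte 0; another later reads the word at byte 16.
for block_size in (8, 16, 32, 64):
    print(block_size, classify_sharing(0, 16, block_size))
```

With 8- and 16-byte blocks the two words fall in different blocks, so the write causes no miss for the reader; from 32 bytes up they share a block and the same access pattern produces a false-sharing miss.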

44 Impact of Block Size on Miss Rate
- For the default problem size: vary block/line size from 8 to 256 bytes
  - Decreases with larger lines: cold, capacity (due to spatial locality), true sharing (due to spatial locality)
  - Increases with larger lines: false sharing
- Working set doesn't fit: impact of capacity misses is large (Ocean, Radix)

45 Impact of Block Size on Traffic
- Results differ from those for miss rate: traffic almost always increases
- When working sets fit, overall traffic is still small, except for Radix
- Fixed overhead is a significant component
  - So total traffic is often minimized at a 16-32 byte block, not smaller
- Working set doesn't fit: even a 128-byte block is good for Ocean, due to capacity
- Address bus traffic behaves in the opposite way from data bus traffic
- Traffic (bytes/inst) affects performance indirectly through contention

