Published by Candace Gilmore. Modified over 9 years ago.
1
Coherence Decoupling: Making Use of Incoherence
J. Huh, J. Chang, D. Burger, G. Sohi
ASPLOS 2004
2
Motivation
– Multi-threading and multi-processing have become common
– When a cache line is marked invalid, very often not all of the data in the line is incorrect
– If the data in invalid lines can be used speculatively, there is great potential for performance improvement
3
Background
Cache Coherence Protocol
– Used in shared-memory multiprocessors to manage correct data sharing
– Vital to multiprocessor design, since it contributes the most to inter-processor communication latency
4
Proposed Idea
Separate the traditional cache coherence protocol into two parts:
– Speculative cache lookup (SCL): uses a speculative value from an invalid cache line, allowing the processor to keep working
– Safe coherence protocol: obtains the correct value, which is then compared with the value provided by SCL
5
Coherence Decoupling
6
Related Work
– Customized coherence protocols
– Speculative coherence operations
– Dynamic self-invalidation, coherence message predictor, token coherence, etc.
– Speculation on the outcome of events in multi-processor execution
7
Coherence Decoupling Architecture
Must support the following:
1. Split: a means to split a memory operation into a speculative load and a coherence operation
2. Compute: mechanisms to support execution with speculative values
3. Recover: a means to recover and roll back upon misprediction
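The three requirements can be illustrated with a small software sketch. It is not the hardware mechanism from the paper: `CacheWord`, `decoupled_load`, `coherent_fetch`, and `compute` are hypothetical names, and the coherence request runs sequentially here, whereas in hardware it would overlap with the speculative computation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CacheWord:
    value: int   # last value seen locally (possibly stale)
    valid: bool  # False once the line has been invalidated


def decoupled_load(word: CacheWord,
                   coherent_fetch: Callable[[], int],
                   compute: Callable[[int], int]) -> Tuple[int, bool]:
    """Return (result, misspeculated) for a load plus its dependent work."""
    if word.valid:
        # Ordinary hit: no speculation needed.
        return compute(word.value), False

    # Split: the load becomes a speculative read plus a coherence request.
    speculative_value = word.value

    # Compute: dependent work proceeds on the speculative value.
    speculative_result = compute(speculative_value)

    # The safe coherence protocol delivers the correct value
    # (modeled as a plain function call in this sketch).
    correct_value = coherent_fetch()
    word.value, word.valid = correct_value, True

    if correct_value == speculative_value:
        return speculative_result, False

    # Recover: discard the speculative result and re-execute.
    return compute(correct_value), True
```

For example, `decoupled_load(CacheWord(7, False), lambda: 7, lambda v: v + 1)` keeps the speculative result, while a fetch that returns a different value forces re-execution.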
8
SCL Protocols for Coherence Decoupling
– Use a simple safe coherence protocol and rely on an aggressive SCL protocol to increase performance
– Two components of an SCL protocol:
  – Read component: obtains the speculative value
  – Update component: updates an invalid cache line so that subsequent speculative reads can use it (can be left out in some SCL protocols)
9
Read vs. Update Components
An SCL protocol with only a read component can be used if the word in an invalid block has:
– Not changed remotely (false sharing)
– Changed remotely to the same value (silent stores)
– Changed remotely to a different value and then back to the original value (temporally silent stores)
For truly shared data, an update component needs to be added:
– Speculatively sends data around the system by writing it into invalid cache lines
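The cases on this slide can be made concrete with a small classifier. This is only an illustrative sketch: the function name, the representation of a line as a list of words, and the `(index, value)` encoding of remote writes are assumptions, not anything defined in the presentation.

```python
from typing import List, Tuple


def classify_speculative_read(stale_line: List[int],
                              remote_writes: List[Tuple[int, int]],
                              word_index: int) -> str:
    """Classify a read of stale_line[word_index] after the line was invalidated.

    remote_writes lists (word index, value) pairs in the order they were
    applied remotely since the invalidation.
    """
    original = stale_line[word_index]
    current = original
    touched = False          # was this word written remotely at all?
    ever_different = False   # did it ever hold a different value?

    for index, value in remote_writes:
        if index != word_index:
            continue  # write to another word in the same line
        touched = True
        if value != original:
            ever_different = True
        current = value

    if not touched:
        return "false sharing"
    if current != original:
        return "true sharing"  # a read-only SCL would misspeculate here
    return "temporally silent store" if ever_different else "silent store"
```

With `line = [1, 2, 3, 4]`, a remote write to word 0 leaves a read of word 2 as false sharing, while a remote write of 5 to word 2 makes it true sharing, which is the case the update component targets.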
10
SCL Protocol: Read Component
CD: use the locally cached incoherent value for every L2 miss
– Simple, but since it is triggered on every load operation it could produce many mis-speculations
CD-F: add a PC-indexed confidence predictor to filter speculations
– Reduces the number of (mis)speculative reads, thus improving the average accuracy
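A PC-indexed confidence filter of the kind CD-F describes might look like the sketch below. The slide does not give the predictor's organization, so the table size, the saturating-counter scheme, the threshold, and the reset-on-misprediction policy are all assumptions. The sketch also assumes the predictor is trained on every coherence miss by comparing the stale value with the correct one, whether or not speculation was actually used.

```python
class ConfidenceFilter:
    """Hypothetical PC-indexed table of saturating confidence counters."""

    def __init__(self, entries: int = 1024, max_count: int = 3,
                 threshold: int = 2) -> None:
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.table = [0] * entries  # every load PC starts unconfident

    def _index(self, pc: int) -> int:
        # Drop the low bits (assumed 4-byte instructions), then wrap.
        return (pc >> 2) % self.entries

    def should_speculate(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc: int, stale_value_was_correct: bool) -> None:
        i = self._index(pc)
        if stale_value_was_correct:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = 0  # assumed policy: reset on any mismatch
```

Under these assumptions, a load PC starts speculating only after two consecutive matches and stops again after a single mismatch, trading some coverage for fewer mis-speculations.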
11
SCL Protocol: Update Component
CD-IA: use invalidation piggyback to update all invalid blocks
CD-C: use invalidation piggyback if the value is compressed
12
SCL Protocol: Update Component (Ctd.)
CD-N: update all sharers after N writes to a block
– Increases the number of messages (bandwidth)
CD-W: update on every write if any sharers exist
CD is assumed wherever write update is being used
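The CD-N trigger can be sketched as a per-block write counter. The class and method names are hypothetical, and two details are assumptions not stated on the slide: the counter resets after each update, and it is cleared when a block has no sharers. With `n=1` this sketch sends an update on every write to a shared block, which resembles CD-W.

```python
from collections import defaultdict
from typing import DefaultDict


class WriteUpdatePolicy:
    """Decide when a writer should push an update to the sharers of a block."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.writes: DefaultDict[int, int] = defaultdict(int)

    def on_write(self, block_addr: int, has_sharers: bool) -> bool:
        """Return True if this write should trigger an update message."""
        if not has_sharers:
            self.writes[block_addr] = 0  # nobody to update
            return False
        self.writes[block_addr] += 1
        if self.writes[block_addr] >= self.n:
            self.writes[block_addr] = 0  # assumed: restart counting
            return True
        return False
```

A larger `n` sends fewer update messages but leaves stale values in remote invalid lines for longer, which is the bandwidth trade-off the slide points to.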
13
Methodology
– Simulator: MP-Sauce & SimpleScalar
– 16-node SMP systems simulated
– Coherence protocol used: simple invalidation snooping-bus protocol
– 3 commercial applications and 5 scientific shared-memory benchmarks from the SPLASH2 suite simulated
14
Results - Microbenchmarks
– Simple-fs: loads falsely shared data and then executes (in)dependent instructions
– Critical-fs: forces data dependence between two loads by placing consecutive false-sharing misses on the critical path
15
L2 Miss Profiling Results
16
Coherence Decoupling Accuracy Results
CD, CD-F, CD-IA, CD-C, CD-N, CD-W
17
Timing Results
18
Bandwidth Requirements
19
Latency Tolerance Profiles
– Executed instructions during coherence decoupling
– The number of control dependent instructions will grow in future processors
20
Conclusions
– Coherence misses are a significant fraction of L2 misses, ranging from 10% to 80%
– Coherence decoupling has the potential to hide the miss latency for 40% to 90% of coherence misses
– Mis-speculation occurs 20% of the time