Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451
What Does This Program Print?
    public class VisiblityExample extends Thread {
        private static int x = 1;
        private static int y = 1;
        private static boolean ready = false;

        public static void main(String[] args) {
            Thread t = new VisiblityExample();
            t.start();

            x = 2;
            y = 2;
            ready = true;
        }

        public void run() {
            while (!ready)
                Thread.yield(); // give up the processor
            System.out.println("x= " + x + " y= " + y);
        }
    }
Answer
- It’s a race condition. Many different outputs are possible:
    x=2, y=2
    x=1, y=2
    x=2, y=1
    x=1, y=1
- Or the program may print nothing: the ready loop may run forever
What’s Going on Here?
- Processor caches ($) can get out of sync
[Figure: CPUs, each with its own cache ($), attached to a shared memory]
A Mental Model
- Every thread/processor has its own copy of every variable
- Yikes!
    // Not real code; for illustration purposes only
    public class Example extends Thread {
        private static final int NUM_PROCESSORS = 4;
        private static int[] x = new int[NUM_PROCESSORS];
        private static int[] y = new int[NUM_PROCESSORS];
        private static boolean[] ready = new boolean[NUM_PROCESSORS];
        // …
Two Issues
- Cache coherence
    - Do caches eventually converge on the same state?
    - All modern caches are coherent
- Cache consistency
    - When are operations by one processor visible on other processors?
    - Sometimes called “publication”
    - How much reordering is possible across processors?
Subjective View of Cache Consistency Strategies
[Figure: consistency strategies placed on a spectrum from strict to relaxed; labels: “Amount of reordering”, “Fast and scalable”]
Factors Pushing Towards Relaxed Consistency Models
- Hardware perspective: consistency operations are expensive
    - Writing processor must invalidate the copies held by all other processors
    - Reading processor must re-validate its cached state
- Compiler perspective: optimizations frequently rearrange memory operations to hide latency
    - These are guaranteed to be transparent, but only on a single processor
Caches 101
- Caches store blocks of main memory
    - Blocks are fairly small (perhaps 64 bytes)
- Each cache block exists in one of three states
    - Invalid, shared, exclusive
- Memory operations cause the cache block to change states
- CPUs must communicate to implement cache block state changes
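The three-state scheme above can be sketched as a toy simulation of a single cache block. This is only an illustration under simplifying assumptions: the class and method names (CacheBlockSim, read, write, stateOf) are hypothetical, and real hardware protocols track more states and exchange explicit bus messages.

```java
import java.util.Arrays;

// Toy model of one memory block as seen by several per-CPU caches.
// Hypothetical names; for illustration only, not a real coherence protocol.
class CacheBlockSim {
    enum State { INVALID, SHARED, EXCLUSIVE }

    private final State[] states; // states[i] = state of the block in CPU i's cache

    CacheBlockSim(int numCpus) {
        states = new State[numCpus];
        Arrays.fill(states, State.INVALID);
    }

    // A read miss downgrades any exclusive holder to shared, then shares the block.
    void read(int cpu) {
        if (states[cpu] != State.INVALID) {
            return; // hit: the block is already readable, no state change
        }
        for (int i = 0; i < states.length; i++) {
            if (states[i] == State.EXCLUSIVE) {
                states[i] = State.SHARED;
            }
        }
        states[cpu] = State.SHARED;
    }

    // A write invalidates every other copy and takes exclusive ownership.
    void write(int cpu) {
        for (int i = 0; i < states.length; i++) {
            states[i] = (i == cpu) ? State.EXCLUSIVE : State.INVALID;
        }
    }

    State stateOf(int cpu) {
        return states[cpu];
    }

    public static void main(String[] args) {
        CacheBlockSim block = new CacheBlockSim(3);
        block.read(0);
        block.read(1);
        block.write(2);
        System.out.println(Arrays.toString(block.states)); // [INVALID, INVALID, EXCLUSIVE]
        block.read(0);
        System.out.println(Arrays.toString(block.states)); // [SHARED, INVALID, SHARED]
    }
}
```

Note how the single write forces a state change in every other cache, which is the communication cost the slides refer to.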
Cache Block State During a Coherence Operation
[Figure: cache block states Invalid, Shared (read-only), and Exclusive (read-write), as seen by the writing processor and the reading processors]
Some Terminology
- Publication: A CPU announces its updates to some or all of cache memory
- Fetch: A CPU loads the latest values for previously published updates
Hardware Support: Memory Fences (Barriers)
- No memory operation can be moved across a fence
    - No operation after the fence appears before the fence
    - No operation before the fence appears after the fence
- Several variants:
    - Write fences (for publication)
    - Read fences (for fetch)
    - Read/write (total) fences
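Sequential Consistency
- All writes are immediately published
- All reads fetch the latest value
- All processors agree on the order of memory accesses
- Every operation is a fence
- Behaves like shuffling cards: each processor’s operations are interleaved into one global sequence, and each processor’s own order is preserved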
Sequential Consistency Example
    Processor 1        Processor 2
    A. x = 2;          C. x = 4;
    B. y = 3;          D. y = 5;
- A always appears before B
- C always appears before D
- A subset of legal orderings:
    A. x = 2;  B. y = 3;  C. x = 4;  D. y = 5;
    C. x = 4;  D. y = 5;  A. x = 2;  B. y = 3;
    C. x = 4;  A. x = 2;  D. y = 5;  B. y = 3;
    A. x = 2;  C. x = 4;  D. y = 5;  B. y = 3;
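To see the full set of legal orderings rather than a subset, the “shuffle” can be enumerated mechanically. The sketch below uses hypothetical names (Interleavings, merge, legalOrderings) and generates every interleaving of the two processors’ operations that keeps A before B and C before D.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the orderings sequential consistency allows for the example above.
// Hypothetical helper for illustration only.
class Interleavings {
    // Collects every merge of p1 and p2 that keeps each processor's own order.
    static void merge(String[] p1, int i, String[] p2, int j, String prefix, List<String> out) {
        if (i == p1.length && j == p2.length) {
            out.add(prefix);
            return;
        }
        if (i < p1.length) {
            merge(p1, i + 1, p2, j, prefix + p1[i], out); // next op comes from processor 1
        }
        if (j < p2.length) {
            merge(p1, i, p2, j + 1, prefix + p2[j], out); // next op comes from processor 2
        }
    }

    static List<String> legalOrderings() {
        List<String> out = new ArrayList<>();
        merge(new String[] {"A", "B"}, 0, new String[] {"C", "D"}, 0, "", out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(legalOrderings());
        // prints [ABCD, ACBD, ACDB, CABD, CADB, CDAB]
    }
}
```

There are six legal orderings in total. Orderings such as BACD never appear because they would violate processor 1’s program order.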
The Cost of Sequential Consistency
- Every write requires a complete cache invalidation
    - Writing processor acquires exclusive access
    - Writing processor sends an invalidation message
    - Writing processor receives acknowledgements from all processors
- Expensive!
Relaxed Consistency Models
- Updates are published lazily
    - Therefore, updates may appear out of order
- Challenge: exposing a programming model that a human can understand
Release Consistency
- Observation: concurrent programs usually use proper synchronization
    - “All shared, mutable state must be properly synchronized”
- It suffices to sync up memory during synchronization operations
- Big performance win: the number of cache coherence operations scales with the amount of synchronization, not the number of loads and stores
Simple Example
    synchronized (this) {   // on entry: fetch current values
        x++;
        y++;
    }                       // on exit: publish new values
- Within the critical section, updates can be reordered
- Without publication, updates may never be visible
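A runnable version of this critical section might look like the sketch below. The class and method names (SyncCounter, incrementBoth, runDemo) are made up for illustration. Because every access to x and y happens while holding the same lock, each thread fetches the latest values on entry and publishes its updates on exit, so no increments are lost.

```java
// Minimal sketch: several threads update shared fields under one lock.
// Hypothetical names; for illustration only.
class SyncCounter {
    private int x = 0;
    private int y = 0;

    void incrementBoth() {
        synchronized (this) { // lock acquire: fetch current values
            x++;
            y++;
        }                     // lock release: publish new values
    }

    int sum() {
        synchronized (this) { // reads also need the lock to see published values
            return x + y;
        }
    }

    static int runDemo(int numThreads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementBoth();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 80000
    }
}
```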
Java Volatile Variables
- Java synchronized does double duty
    - It provides mutual exclusion, atomicity
    - It ensures safe publication of updates
- Sometimes, we don’t want to pay the cost of mutual exclusion
- Volatile variables provide safe publication without mutual exclusion
    volatile int x = 7;
More on Volatile
- Updates to volatile fields are propagated immediately
    - “Don’t cache me!”
    - Effectively, this activates sequential consistency
- Volatile serves as a fence to the compiler and hardware
    - Memory operations are not reordered around a volatile
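As a sketch of how volatile repairs the opening visibility example: marking only the ready flag volatile is enough, because the writes to x and y come before the volatile write, and the reads of x and y come after the volatile read. The class name VolatileVisibility, the observed field, and the runOnce helper are assumptions added so the result can be checked; they are not part of the original program.

```java
// Variant of the earlier visibility example with a volatile flag.
// Hypothetical names; for illustration only.
class VolatileVisibility extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static volatile boolean ready = false; // the only change that matters
    private static String observed;                // read by the caller after join()

    @Override
    public void run() {
        while (!ready) {
            Thread.yield(); // give up the processor
        }
        // The volatile read of ready above acts as the fetch for x and y.
        observed = "x=" + x + " y=" + y;
    }

    static String runOnce() {
        x = 1;
        y = 1;
        ready = false;
        observed = null;
        Thread t = new VolatileVisibility();
        t.start();
        x = 2;
        y = 2;
        ready = true; // volatile write: publishes the updates to x and y
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints x=2 y=2
    }
}
```

With the flag volatile, the loop is guaranteed to terminate once ready is set, and the only possible output is x=2 y=2.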
Rule #1, Revised
- All shared, mutable state must be properly synchronized
    - With a synchronized statement, an Atomic variable, or with volatile
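For the Atomic variable option, a minimal sketch using java.util.concurrent.atomic.AtomicInteger is shown below. The class and method names (AtomicCounterDemo, count) are hypothetical. Unlike a plain volatile int, incrementAndGet performs the read-modify-write as one atomic step, so concurrent increments are not lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch: a shared counter kept consistent with an atomic variable.
// Hypothetical names; for illustration only.
class AtomicCounterDemo {
    static int count(int numThreads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementAndGet(); // atomic increment with safe publication
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(count(4, 10_000)); // prints 40000
    }
}
```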
Example: Lazy Initialization
    class Example {
        static List list = null;

        // Not thread-safe as written: nothing publishes the new list
        public static List getList() {
            if (list == null) {
                list = new LinkedList();
            }
            return list;
        }
    }
- Need synchronization to ensure publication
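One possible fix, sketched under the assumption that simplicity matters more than peak performance, is to make the accessor synchronized. The class name LazyListExample and the String element type are choices made here for illustration. The lock makes the null check and the assignment atomic, and it publishes the fully constructed list to every later caller.

```java
import java.util.LinkedList;
import java.util.List;

// Minimal sketch: lazy initialization made safe with a synchronized accessor.
// Hypothetical names; for illustration only.
class LazyListExample {
    private static List<String> list = null;

    // Holding the class lock gives both mutual exclusion and safe publication.
    static synchronized List<String> getList() {
        if (list == null) {
            list = new LinkedList<>();
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> first = getList();
        List<String> second = getList();
        System.out.println(first == second); // prints true
    }
}
```

Every call now pays for the lock, which is the cost of mutual exclusion discussed on the volatile slides.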