Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.


1 Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451

2 What Does This Program Print?

    public class VisiblityExample extends Thread {
        private static int x = 1;
        private static int y = 1;
        private static boolean ready = false;

        public static void main(String[] args) {
            Thread t = new VisiblityExample();
            t.start();

            x = 2;
            y = 2;
            ready = true;
        }

        public void run() {
            while (!ready)
                Thread.yield(); // give up the processor
            System.out.println("x= " + x + " y= " + y);
        }
    }

3 Answer
It’s a race condition. Many different outputs are possible:
  - x=2, y=2
  - x=1, y=2
  - x=2, y=1
  - x=1, y=1
  - Or, the program may print nothing! The ready loop runs forever

4 What’s Going on Here?
Processor caches ($) can get out of sync
[Figure: CPUs with caches ($) and memory]

5 A Mental Model
Every thread/processor has its own copy of every variable. Yikes!

    // Not real code; for illustration purposes only
    public class Example extends Thread {
        private static final int NUM_PROCESSORS = 4;
        private static int[] x = new int[NUM_PROCESSORS];
        private static int[] y = new int[NUM_PROCESSORS];
        private static boolean[] ready = new boolean[NUM_PROCESSORS];
        // …
    }

6 Two Issues
Cache coherence
  - Do caches eventually converge on the same state?
  - All modern caches are coherent
Cache consistency
  - When are operations by one processor visible on other processors? Sometimes called “publication”
  - How much reordering is possible across processors?

7 Subjective View of Cache Consistency Strategies
[Figure: spectrum of consistency strategies from strict to relaxed; labels: amount of reordering, fast and scalable]

8 Factors Pushing Towards Relaxed Consistency Models
Hardware perspective: consistency operations are expensive
  - Writing processor must invalidate the cached copies held by all other processors
  - Reading processor must re-validate its cached state
Compiler perspective: optimizations frequently rearrange memory operations to hide latency
  - These are guaranteed to be transparent, but only on a single processor

9 Caches 101
Caches store blocks of main memory
  - Blocks are fairly small (perhaps 64 bytes)
Each cache block exists in one of three states
  - Invalid, shared, exclusive
Memory operations cause the cache block to change states
CPUs must communicate to implement cache block state changes

10 Cache Block State During a Coherence Operation
[Figure: cache block states Invalid, Shared (read-only), and Exclusive (read-write); labels: writing processor, reading processors]
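
The three block states above can be sketched as a toy state machine. This is a deliberately simplified model, not a description of any real coherence protocol: the class name `CacheBlockModel` and its methods are hypothetical, it tracks a single block, and it ignores data movement and message costs.

```java
import java.util.Arrays;

// Toy model of ONE cache block as seen by several CPUs (illustrative assumption only).
class CacheBlockModel {
    enum State { INVALID, SHARED, EXCLUSIVE }

    // states[i] is the state of the block in CPU i's cache
    private final State[] states;

    CacheBlockModel(int numCpus) {
        states = new State[numCpus];
        Arrays.fill(states, State.INVALID);
    }

    // Read: a hit changes nothing; a miss downgrades any exclusive
    // holder to shared, and the reader then holds a shared copy.
    void read(int cpu) {
        if (states[cpu] != State.INVALID) {
            return;
        }
        for (int i = 0; i < states.length; i++) {
            if (states[i] == State.EXCLUSIVE) {
                states[i] = State.SHARED;
            }
        }
        states[cpu] = State.SHARED;
    }

    // Write: the writer takes exclusive access; every other copy is invalidated.
    void write(int cpu) {
        for (int i = 0; i < states.length; i++) {
            states[i] = (i == cpu) ? State.EXCLUSIVE : State.INVALID;
        }
    }

    State stateOf(int cpu) {
        return states[cpu];
    }
}
```

For instance, after `read(0)`, `read(1)`, and then `write(2)` on a three-CPU model, CPU 2 holds the block exclusively and CPUs 0 and 1 are invalid.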

11 Some Terminology
Publication: A CPU announces its updates to some or all of cache memory
Fetch: A CPU loads the latest values for previously published updates

12 Hardware Support: Memory Fences (Barriers)
No memory operation can be moved across a fence
  - No operation after the fence appears before the fence
  - No operation before the fence appears after the fence
Several variants:
  - Write fences (for publication)
  - Read fences (for fetch)
  - Read/write (total) fences

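13 Sequential Consistency
All writes are immediately published
All reads fetch the latest value
All processors agree on order of memory accesses
  - Every operation is a fence
Behaves like shuffling cards: the processors' operations are interleaved into one sequence, and each processor's own order is preserved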

14 Sequential Consistency Example
Processor 1:  A. x = 2;  B. y = 3;
Processor 2:  C. x = 4;  D. y = 5;
A always appears before B
C always appears before D
A subset of legal orderings:
  - A. x = 2;  B. y = 3;  C. x = 4;  D. y = 5;
  - C. x = 4;  D. y = 5;  A. x = 2;  B. y = 3;
  - C. x = 4;  A. x = 2;  D. y = 5;  B. y = 3;
  - A. x = 2;  C. x = 4;  D. y = 5;  B. y = 3;
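
The full set of legal orderings for this example can be enumerated mechanically. The sketch below is only an illustration: the class `Interleavings` and its method names are hypothetical, and each operation is reduced to its letter label. It merges the two programs in every way that keeps A before B and C before D.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates all interleavings of two programs that preserve each program's own order.
class Interleavings {
    static List<String> legalOrders(String p1, String p2) {
        List<String> out = new ArrayList<>();
        merge(p1, 0, p2, 0, "", out);
        return out;
    }

    // At each step, take the next operation from either processor.
    private static void merge(String p1, int i, String p2, int j,
                              String prefix, List<String> out) {
        if (i == p1.length() && j == p2.length()) {
            out.add(prefix);
            return;
        }
        if (i < p1.length()) {
            merge(p1, i + 1, p2, j, prefix + p1.charAt(i), out);
        }
        if (j < p2.length()) {
            merge(p1, i, p2, j + 1, prefix + p2.charAt(j), out);
        }
    }

    public static void main(String[] args) {
        // Prints [ABCD, ACBD, ACDB, CABD, CADB, CDAB]
        System.out.println(legalOrders("AB", "CD"));
    }
}
```

Six orderings are legal in total; the four listed on the slide are among them, and an ordering such as B, A, C, D never appears because it violates Processor 1's program order.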

15 The Cost of Sequential Consistency
Every write requires a complete cache invalidation
  - Writing processor acquires exclusive access
  - Writing processor sends an invalidation message
  - Writing processor receives acknowledgements from all processors
Expensive!

16 Relaxed Consistency Models
Updates are published lazily
  - Therefore, updates may appear out of order
Challenge: Exposing a programming model that a human can understand

17 Release Consistency
Observation: concurrent programs usually use proper synchronization
  - “All shared, mutable state must be properly synchronized”
It suffices to sync up memory during synchronized operations
Big performance win: the number of cache coherency operations scales with synchronization, not the number of loads and stores

18 Simple Example

    synchronized (this) {   // on entry: fetch current values
        x++;
        y++;
    }                       // on exit: publish new values

Within the critical section, updates can be reordered
Without publication, updates may never be visible
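
A runnable version of this critical section might look like the sketch below. The class `SyncCounters`, its `run` helper, and the thread and iteration counts are assumptions made for illustration; the point is only that every access to `x` and `y` goes through the same lock, so each thread fetches current values on entry and publishes its updates on exit.

```java
// Two counters that are only ever touched while holding the object's lock.
class SyncCounters {
    private int x = 0;
    private int y = 0;

    void increment() {
        synchronized (this) {   // acquire: see values published by earlier holders
            x++;
            y++;
        }                       // release: publish the new values
    }

    // Reads also take the lock so they observe published values.
    synchronized int[] snapshot() {
        return new int[] { x, y };
    }

    // Hypothetical driver: several threads increment concurrently.
    static int[] run(int threads, int perThread) {
        SyncCounters c = new SyncCounters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.snapshot();
    }
}
```

With 4 threads and 10,000 increments each, both counters end at 40,000; removing the `synchronized` block would allow lost updates.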

19 Java Volatile Variables
Java synchronized does double duty
  - It provides mutual exclusion, atomicity
  - It ensures safe publication of updates
Sometimes, we don’t want to pay the cost of mutual exclusion
Volatile variables provide safe publication without mutual exclusion

    volatile int x = 7;

20 More on Volatile
Updates to volatile fields are propagated immediately
  - “Don’t cache me!”
  - Effectively, this activates sequential consistency for accesses to that field
Volatile serves as a fence to the compiler and hardware
  - Memory operations are not reordered around a volatile
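
Applied to the program from slide 2, one possible fix is to declare only the `ready` flag volatile. The sketch below is an adaptation, not the original slide code: the class name `VolatileVisibility` and the `runOnce` helper are hypothetical, and the result is returned as a string so it can be checked. Because the writes to `x` and `y` come before the volatile write to `ready`, a thread that reads `ready == true` is guaranteed by the Java memory model to see them.

```java
// Slide 2's example with the flag made volatile (hypothetical adaptation).
class VolatileVisibility {
    private static int x = 1;
    private static int y = 1;
    private static volatile boolean ready = false;

    static String runOnce() {
        // Reset state so the method can be called more than once.
        x = 1;
        y = 1;
        ready = false;

        final String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield();   // each iteration performs a volatile read
            }
            seen[0] = "x=" + x + " y=" + y;
        });
        reader.start();

        x = 2;
        y = 2;
        ready = true;             // volatile write publishes x and y as well

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

With the volatile flag, the loop always terminates and the only possible result is `x=2 y=2`.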

21 Rule #1, Revised
All shared, mutable state must be properly synchronized
  - With a synchronized statement, an Atomic variable, or with volatile
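
Of the three options in the rule, the Atomic variable is the only one not shown elsewhere in the deck. Here is a minimal sketch using `java.util.concurrent.atomic.AtomicInteger`; the class `AtomicCounterDemo` and its `countWith` helper are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A shared counter kept correct with an atomic variable instead of a lock.
class AtomicCounterDemo {
    static int countWith(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementAndGet();   // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Unlike a plain `volatile int`, where `count++` is still a separate read and write, `incrementAndGet` performs the whole update atomically, so no increments are lost.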

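22 Example: Lazy Initialization

    import java.util.LinkedList;
    import java.util.List;

    class Example {
        static List list = null;

        // Unsynchronized on purpose: this is the problem case
        public static List getList() {
            if (list == null) {
                list = new LinkedList();
            }
            return list;
        }
    }

Need synchronization to ensure publication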
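
One straightforward way to add the missing synchronization is to make the accessor a synchronized method. This is a sketch of one option rather than the fix presented in the lecture; the class name `LazyListHolder` and the `String` element type are assumptions.

```java
import java.util.LinkedList;
import java.util.List;

// Lazy initialization guarded by the class lock.
class LazyListHolder {
    private static List<String> list = null;

    // synchronized gives mutual exclusion (only one thread initializes)
    // and safe publication (every caller sees the initialized list).
    static synchronized List<String> getList() {
        if (list == null) {
            list = new LinkedList<>();
        }
        return list;
    }
}
```

Every call pays for the lock here, which is the price of the simplest correct version; all callers receive the same list instance.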

