Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.

What Does This Program Print?

    public class VisiblityExample extends Thread {
        private static int x = 1;
        private static int y = 1;
        private static boolean ready = false;

        public static void main(String[] args) {
            Thread t = new VisiblityExample();
            t.start();
            x = 2;
            y = 2;
            ready = true;
        }

        public void run() {
            while (!ready)
                Thread.yield(); // give up the processor
            System.out.println("x= " + x + " y= " + y);
        }
    }

Answer
It’s a race condition. Many different outputs are possible:
- x=2, y=2
- x=1, y=2
- x=2, y=1
- x=1, y=1
- Or the program may print nothing! The ready loop may run forever.

What’s Going on Here?
Processor caches ($) can get out of sync.
[Figure: several CPUs, each with its own cache ($), attached to a shared memory.]

A Mental Model
Every thread/processor has its own copy of every variable. Yikes!

    // Not real code; for illustration purposes only
    public class Example extends Thread {
        private static final int NUM_PROCESSORS = 4;
        private static int[] x = new int[NUM_PROCESSORS];         // one copy of x per processor
        private static int[] y = new int[NUM_PROCESSORS];         // one copy of y per processor
        private static boolean[] ready = new boolean[NUM_PROCESSORS];
        // …
    }

Two Issues
- Cache coherence
  - Do caches eventually converge on the same state?
  - All modern caches are coherent
- Cache consistency
  - When are operations by one processor visible on other processors? Sometimes called “publication”
  - How much re-ordering is possible across processors?

Subjective View of Cache Consistency Strategies
[Figure: chart with the axes “Fast and scalable” and “Amount of reordering”, with the labels “Relaxed” and “Strict”.]

Factors Pushing Towards Relaxed Consistency Models
- Hardware perspective: consistency operations are expensive
  - The writing processor must invalidate the copies held by all other processors
  - A reading processor must re-validate its cached state
- Compiler perspective: optimizations frequently rearrange memory operations to hide latency
  - These rearrangements are guaranteed to be transparent, but only on a single processor

Caches 101
- Caches store blocks of main memory
  - Blocks are fairly small (perhaps 64 bytes)
- Each cache block exists in one of three states
  - Invalid, shared, exclusive
- Memory operations cause the cache block to change states
- CPUs must communicate to implement cache block state changes

Cache Block State During a Coherence Operation
[Figure: the states Invalid, Shared (read-only), and Exclusive (read-write), with the labels “Writing processor” and “Reading processors”.]
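The three block states can be sketched as a toy state machine. This is a simplified illustration, not a real coherence protocol: the class name `ToyCacheBlock` and the exact transition rules (a write makes the writer exclusive and invalidates everyone else; a read miss downgrades an exclusive holder to shared) are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical toy model of ONE cache block as seen by several CPUs.
// The three states come from the slides; the transition rules are a simplified assumption.
class ToyCacheBlock {
    enum State { INVALID, SHARED, EXCLUSIVE }

    private final State[] states;

    ToyCacheBlock(int numCpus) {
        states = new State[numCpus];
        Arrays.fill(states, State.INVALID);
    }

    // Read: a hit changes nothing; a miss downgrades any exclusive holder to shared.
    void read(int cpu) {
        if (states[cpu] != State.INVALID) {
            return; // already readable in this cache
        }
        for (int i = 0; i < states.length; i++) {
            if (states[i] == State.EXCLUSIVE) {
                states[i] = State.SHARED;
            }
        }
        states[cpu] = State.SHARED;
    }

    // Write: the writer becomes exclusive and every other copy is invalidated.
    void write(int cpu) {
        for (int i = 0; i < states.length; i++) {
            states[i] = (i == cpu) ? State.EXCLUSIVE : State.INVALID;
        }
    }

    State stateOf(int cpu) {
        return states[cpu];
    }

    public static void main(String[] args) {
        ToyCacheBlock block = new ToyCacheBlock(3);
        block.read(0);
        block.read(1);
        block.write(2);
        System.out.println(Arrays.toString(block.states)); // [INVALID, INVALID, EXCLUSIVE]
    }
}
```

The loop inside `write` stands in for the invalidation messages that real CPUs must exchange, which is why writes to shared blocks are costly.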

Some Terminology
- Publication: a CPU announces the updates it has made to some or all of cache memory
- Fetch: a CPU loads the latest values for previously published updates

Hardware Support: Memory Fences (Barriers)
- No memory operation can be moved across a fence
  - No operation after the fence appears before the fence
  - No operation before the fence appears after the fence
- Several variants:
  - Write fences (for publication)
  - Read fences (for fetch)
  - Read/write (total) fences
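Java 9 and later expose fences through static methods on `java.lang.invoke.VarHandle`. The sketch below is one possible illustration, not something from the slides: it treats `VarHandle.releaseFence()` as the write fence and `VarHandle.acquireFence()` as the read fence (`VarHandle.fullFence()` would be the total fence). The class name `FenceSketch` and its fields are hypothetical, and the flag is accessed in opaque mode so the spin loop is guaranteed to observe the write eventually.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration; requires Java 9 or later.
class FenceSketch {
    private static int data = 0;          // payload, accessed as a plain field
    private static boolean ready = false; // flag, accessed through READY below

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(FenceSketch.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Starts one writer thread, then fetches the payload on the calling thread.
    static int publishAndFetch() {
        data = 0;
        READY.setOpaque(false);
        Thread writer = new Thread(() -> {
            data = 42;
            VarHandle.releaseFence(); // write fence: the store to data cannot move below here
            READY.setOpaque(true);    // raise the flag
        });
        writer.start();
        while (!(boolean) READY.getOpaque()) {
            Thread.onSpinWait();      // spin until the flag becomes visible
        }
        VarHandle.acquireFence();     // read fence: the load of data cannot move above here
        int seen = data;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(publishAndFetch()); // prints 42
    }
}
```

Most application code should prefer `synchronized`, `volatile`, or the `java.util.concurrent` classes, which insert the needed fences automatically.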

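Sequential Consistency
- All writes are immediately published
- All reads fetch the latest value
- All processors agree on the order of memory accesses
  - Every operation is a fence
- Behaves like shuffling cards: each processor’s operations keep their own order, but the processors’ streams can interleave in any way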

Sequential Consistency Example
Processor 1: A. x = 2; B. y = 3;
Processor 2: C. x = 4; D. y = 5;
A always appears before B. C always appears before D.
A subset of legal orderings:
- A. x = 2; B. y = 3; C. x = 4; D. y = 5;
- C. x = 4; D. y = 5; A. x = 2; B. y = 3;
- C. x = 4; A. x = 2; D. y = 5; B. y = 3;
- A. x = 2; C. x = 4; D. y = 5; B. y = 3;
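The full set of legal orderings can be enumerated mechanically. The sketch below, with the hypothetical class name `Interleavings`, represents each operation by its letter and generates every merge of the two processors' streams that preserves each processor's program order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: each character stands for one memory operation.
class Interleavings {
    // Returns every merge of p1 and p2 that keeps each processor's own order intact.
    static List<String> merge(String p1, String p2) {
        List<String> out = new ArrayList<>();
        build(p1, 0, p2, 0, new StringBuilder(), out);
        return out;
    }

    private static void build(String p1, int i, String p2, int j,
                              StringBuilder prefix, List<String> out) {
        if (i == p1.length() && j == p2.length()) {
            out.add(prefix.toString());
            return;
        }
        if (i < p1.length()) {                      // next operation comes from processor 1
            prefix.append(p1.charAt(i));
            build(p1, i + 1, p2, j, prefix, out);
            prefix.setLength(prefix.length() - 1);  // backtrack
        }
        if (j < p2.length()) {                      // next operation comes from processor 2
            prefix.append(p2.charAt(j));
            build(p1, i, p2, j + 1, prefix, out);
            prefix.setLength(prefix.length() - 1);  // backtrack
        }
    }

    public static void main(String[] args) {
        // Processor 1 runs A then B; processor 2 runs C then D.
        System.out.println(merge("AB", "CD"));
        // prints [ABCD, ACBD, ACDB, CABD, CADB, CDAB]
    }
}
```

There are six legal orderings in total, four of which appear on the slide. An ordering such as BACD is never produced because it would place B before A.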

The Cost of Sequential Consistency
- Every write requires a complete cache invalidation
  - Writing processor acquires exclusive access
  - Writing processor sends an invalidation message
  - Writing processor receives acknowledgements from all processors
- Expensive!

Relaxed Consistency Models
- Updates are published lazily
  - Therefore, updates may appear out of order
- Challenge: exposing a programming model that a human can understand

Release Consistency
- Observation: concurrent programs usually use proper synchronization
  - “All shared, mutable state must be properly synchronized”
- It suffices to sync up memory at synchronization operations
- Big performance win: the number of cache coherency operations scales with the number of synchronization operations, not with the number of loads and stores

Simple Example

    synchronized (this) {   // fetch current values
        x++;
        y++;
    }                       // publish new values

Within the critical section, updates can be reordered. Without publication, updates may never be visible.
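A runnable version of this pattern might look like the following sketch. The class name `SyncCounter` and the helper `runTwoThreads` are hypothetical. Two threads increment the same pair of counters inside a `synchronized` block, and the final read also takes the lock so that it fetches the published values.

```java
// Hypothetical class for illustration.
class SyncCounter {
    private int x = 0;
    private int y = 0;

    void increment() {
        synchronized (this) { // entering the block fetches current values
            x++;
            y++;
        }                     // leaving the block publishes the new values
    }

    // Reading under the same lock guarantees the latest published values are seen.
    synchronized int[] snapshot() {
        return new int[] {x, y};
    }

    // Runs two threads that each call increment() perThread times.
    static int[] runTwoThreads(int perThread) {
        SyncCounter counter = new SyncCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.snapshot();
    }

    public static void main(String[] args) {
        int[] result = runTwoThreads(10_000);
        System.out.println("x=" + result[0] + " y=" + result[1]); // x=20000 y=20000
    }
}
```

Removing the `synchronized` block from `increment` would allow lost updates, so the totals could come out lower than 20000.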

Java Volatile Variables
- Java synchronized does double-duty
  - It provides mutual exclusion, atomicity
  - It ensures safe publication of updates
- Sometimes, we don’t want to pay the cost of mutual exclusion
- Volatile variables provide safe publication without mutual exclusion

    volatile int x = 7;

More on Volatile
- Updates to volatile fields are propagated immediately
  - “Don’t cache me!”
  - Effectively, this activates sequential consistency
- Volatile serves as a fence to the compiler and hardware
  - Memory operations are not re-ordered around a volatile access
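Applied to the opening example, marking only the `ready` flag as `volatile` is enough to make the output deterministic. The sketch below is a reworked variant under a hypothetical class name, `VolatileVisibility`, and it returns the message instead of printing it from the worker thread. It relies on the Java memory model rule that a write to a volatile field happens-before every later read of that field.

```java
// Hypothetical reworking of the opening example; only `ready` is volatile.
class VolatileVisibility {
    private static int x = 1;
    private static int y = 1;
    private static volatile boolean ready = false;

    static String run() {
        x = 1;
        y = 1;
        ready = false;
        String[] message = new String[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield(); // give up the processor
            }
            // Seeing ready == true guarantees the earlier writes to x and y are visible.
            message[0] = "x=" + x + " y=" + y;
        });
        reader.start();
        x = 2;
        y = 2;
        ready = true; // volatile write publishes x and y along with the flag
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return message[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints x=2 y=2
    }
}
```

The fields `x` and `y` stay non-volatile here. They are written before the volatile write and read after the volatile read, so the flag carries them along.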

Rule #1, Revised
- All shared, mutable state must be properly synchronized
  - With a synchronized statement, an Atomic variable, or volatile
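The three options are not interchangeable. `volatile` gives visibility but does not make a read-modify-write such as `x++` atomic, whereas an Atomic variable does both. A minimal sketch of the Atomic option follows, using `java.util.concurrent.atomic.AtomicInteger`. The class name `AtomicCounterSketch` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class for illustration.
class AtomicCounterSketch {
    // Two threads each increment a shared AtomicInteger perThread times.
    static int countWithTwoThreads(int perThread) {
        AtomicInteger hits = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                hits.incrementAndGet(); // atomic read-modify-write, no lock needed
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(10_000)); // prints 20000
    }
}
```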

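Example: Lazy Initialization

    import java.util.LinkedList;
    import java.util.List;

    class Example {
        static List list = null;

        public static List getList() {
            if (list == null) {
                list = new LinkedList(); // unsynchronized: other threads may not see this write
            }
            return list;
        }
    }

Need synchronization to ensure publication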
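One straightforward way to add the missing synchronization is to make the accessor `synchronized`. This is a sketch of one option rather than the slides' own fix. The class name `LazyListHolder` and the `String` element type are assumptions made for the example.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical class for illustration.
class LazyListHolder {
    private static List<String> list = null;

    // The lock makes the check-then-create step atomic and publishes the new list.
    static synchronized List<String> getList() {
        if (list == null) {
            list = new LinkedList<>();
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(getList() == getList()); // prints true
    }
}
```

Every caller now pays for the lock, which is the cost the volatile slides aim to avoid, but every caller is guaranteed to see the same fully constructed list.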