CDP 2013 Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki and Herb Sutter’s.

Slides:

Advertisements

Similar presentations

1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )

Advertisements

Threads Cannot be Implemented As a Library Andrew Hobbs.

50.003: Elements of Software Construction Week 6 Thread Safety and Synchronization.

Process management Information maintained by OS for process management  process context  process control block OS virtualization of CPU for each process.

Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.

CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.

ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto

D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.

5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.

Outline CPU caches Cache coherence Placement of data Hardware synchronization instructions Correctness: Memory model & compiler Performance: Programming.

Slides 8d-1 Programming with Shared Memory Specifying parallelism Performance issues ITCS4145/5145, Parallel Programming B. Wilkinson Fall 2010.

“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510.

Lecture 13: Consistency Models

Computer Architecture II 1 Computer architecture II Lecture 9.

1 Lecture 15: Consistency Models Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models.

1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.

Memory Consistency Models

CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.

Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Meenaktchi Venkatachalam.

Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.

More on Locks: Case Studies

CS510 Concurrent Systems Introduction to Concurrency.

Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.

IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.

CDP 2012 Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC WikiThe C++11 Memory Model and GCC Created by Eran.

Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.

Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp of Wilkinson.

By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.

Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.

Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.

Threads Cannot be Implemented as a Library Hans-J. Boehm.

Java Thread and Memory Model

Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.

Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)

CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics Advanced.NET Programming I 5 th Lecture Pavel Ježek

Fundamentals of Parallel Computer Architecture - Chapter 71 Chapter 7 Introduction to Shared Memory Multiprocessors Yan Solihin Copyright.

Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.

Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)

Week 9, Class 3: Java’s Happens-Before Memory Model (Slides used and skipped in class) SE-2811 Slide design: Dr. Mark L. Hornick Content: Dr. Hornick Errors:

The C++11 Memory Model CDP Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki.

1 Written By: Adi Omari (Revised and corrected in 2012 by others) Memory Models CDP

Lecture 20: Consistency Models, TM

The C++11 Memory Model CDP

Outline CPU caches Cache coherence Placement of data

Concurrency 2 CS 2110 – Spring 2016.

Advanced .NET Programming I 11th Lecture

Speculative Lock Elision

Memory Consistency Models

Threads Cannot Be Implemented As a Library

Memory Consistency Models

Specifying Multithreaded Java semantics for Program Verification

Lecture 5: GPU Compute Architecture

Threads and Memory Models Hal Perkins Autumn 2011

Lecture 5: GPU Compute Architecture for the last time

Shared Memory Consistency Models: A Tutorial

The C++ Memory model Implementing synchronization)

Implementing synchronization

Synchronization Issues

COP 4600 Operating Systems Fall 2010

Threads and Memory Models Hal Perkins Autumn 2009

COT 5611 Operating Systems Design Principles Spring 2012

Lecture 22: Consistency Models, TM

Dr. Mustafa Cem Kasapbaşı

Memory Consistency Models

CSE 153 Design of Operating Systems Winter 19

CS333 Intro to Operating Systems

Compilers, Languages, and Memory Models

Programming with Shared Memory Specifying parallelism

Lecture 21: Synchronization & Consistency

Problems with Locks Andrew Whitaker CSE451.

Presentation transcript:

CDP 2013 Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki and Herb Sutter’s atomic<> Weapons talk.atomic<> Weapons talk Created by Eran Gilad 1

 The guarantees provided by the runtime environment to a multithreaded program, regarding the order of memory operations.  Each level of the environment might have a different memory model – CPU, virtual machine, language.  The correctness of parallel algorithms depends on the memory model. 2

Isn’t the CPU memory model enough?  Threads are now part of the language. Their behavior must be fully defined.  Until now, different code was required for every compiler, OS and CPU combination.  Having a standard guarantees portability and simplifies the programmer’s work. 3

Given the following data race: Thread 1: x = 10; y = 20; Thread 2: if (y == 20) assert(x == 10) Pre-2011 C++ C++11 Huh? What’s a “Thread”? Namely – behavior is unspecified. Undefined Behavior. But now there’s a way to get it right. 4

if (x >= 0 && x < 3) { switch (x) { case 0: do0(); break; case 1: do1(); break; case 2: do2(); break; } Can the compiler convert the switch to a quick jump table, based on x’s value? At this point, on some other thread: x = 8; Switch jumps to unknown location. Your monitor catches fire. Optimizations that are safe on sequential programs might not be safe on parallel ones. So, races are forbidden! 5

Optimizations are crucial for increasing performance, but might be dangerous:  Compiler optimizations: ◦ Reordering to hide load latencies ◦ Using registers to avoid repeating loads and stores  CPU optimizations: ◦ Loose cache coherence protocols ◦ Out Of Order Execution  A thread must appear to execute serially to other threads. Optimizations must not be visible to other threads. 6

 Every variable of a scalar (simple) type occupies one memory location.  Bit fields are different.  One memory location must not be affected by writes to an adjacent memory location! 7 struct s { char c[4]; int i: 3, j: 4; struct in { double d; } id; }; Can’t read and write all 4 bytes when changing just one. Same mem. location Considered one mem. loc. even if hardware has no 64-bit atomic ops.

Sequenced before : The order imposed by the appearance of (most) statements in the code, for a single thread. 8 Thread 1 read x, 1 Sequenced Before write y, 2

Synchronized with : The order imposed by an atomic read of a value that has been atomically written by some other thread. 9 Thread 1 write y, 2 Thread 2 read y, 2 Synchronized with

Inter-thread happens-before : A combination of sequenced before and synchronized with. 10 Thread 1 read x, 1 Sequenced Before write y, 2 Thread 2 read y, 2 Sequenced Before write z, 3 Synchronized With

An event A happens-before an event B if: ◦ A inter-thread happens-before B or ◦ A is sequenced-before B. 11 Thread 1 (1) read x, 1 Happens before (2) write y, 2 Thread 2 (3) read y, 2 Happens before (4) write z, 3 Happens before

Use std::mutex and friends, or: 12 template struct atomic { void store(T val, memory_order m = memory_order_seq_cst); T load(memory_order m = memory_order_seq_cst); // etc... }; // and on some specializations of atomic: T operator++(); T operator+=(T val); bool compare_exchange_strong(T& expected, T desired, memory_order m = memory_order_seq_cst); // etc...

 Strongest ordering, default memory_order.  Atomic writes are globally ordered.  Compiler and processor optimizations are limited – no action can be reordered across an atomic action.  Atomic writes flush non-atomic writes that are sequenced-before them. 13

atomic x; int y; // not atomic void thread1() { y = 1; x.store(2); } void thread2() { if (x.load() == 2) assert(y == 1); } The assert is guaranteed not to fail. 14

Now, if you like to play with sharp tools… 15

 Synchronized-with relation exists only between the releasing thread and the acquiring thread.  Other threads might see updates in a different order.  All writes before a release are visible after an acquire, even if they were relaxed or non- atomic.  Similar to release consistency memory model. 16

#define rls memory_order_release /* save slide space… */ #define acq memory_order_acquire /* save slide space… */ atomic x, y; void thread1() { y.store(20, rls); } void thread2() { x.store(10, rls); } void thread3() { assert(y.load(acq) == 20 && x.load(acq) == 0); } void thread4() { assert(y.load(acq) == 0 && x.load(acq) == 10); } Both asserts might succeed – no order between writes to x and y. 17

 Happens-before now only exists between actions to the same variables.  Similar to cache coherence, but can be used on systems with incoherent caches.  Imposes fewer limitations on the compiler – atomic actions can now be reordered if they access different memory locations.  Less synchronization is required on the hardware side as well. 18

#define rlx memory_order_relaxed /* save slide space… */ atomic x, y; void thread1() { y.store(20, rlx); x.store(10, rlx); } void thread2() { if (x.load(rlx) == 10) { assert(y.load(rlx) == 20); y.store(10, rlx); } void thread3() { if (y.load(rlx) == 10) assert(x.load(rlx) == 10); } 19 So, what was removed? a) sequenced before b) synchronized with Both asserts might fail – no order between operations on x and y.

Let’s look at the trace of a run in which the assert failed (read 0 instead of 10 or 20): T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; 20

 Predates C++ multi-threading  Serves different purpose  However Java (and C#) volatile is very similar to C++ atomic. 21

 Lock-free code is hard to write. Unless speed is crucial, better use mutexes.  Use the default sequential consistency and avoid data races. Really. ◦ Acquire-release is hard ◦ Relaxed is much harder ◦ Mixing orders in insane  Only covered main orders.  C++11 is new. Not all features are supported by common compilers. 22