Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

Threads Cannot be Implemented As a Library Andrew Hobbs.
Global Environment Model. MUTUAL EXCLUSION PROBLEM The operations used by processes to access to common resources (critical sections) must be mutually.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
Chapter 6: Process Synchronization
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 5: Process Synchronization.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
Mutual Exclusion.
CH7 discussion-review Mahmoud Alhabbash. Q1 What is a Race Condition? How could we prevent that? – Race condition is the situation where several processes.
CS492B Analysis of Concurrent Programs Consistency Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.
Parallel Processing (CS526) Spring 2012(Week 6).  A parallel algorithm is a group of partitioned tasks that work with each other to solve a large problem.
CS444/CS544 Operating Systems Introduction to Synchronization 2/07/2007 Prof. Searleman
Slides 8d-1 Programming with Shared Memory Specifying parallelism Performance issues ITCS4145/5145, Parallel Programming B. Wilkinson Fall 2010.
“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510.
Summary of Boehm’s “threads … as a library” + other thoughts and class discussions CS 5966, Feb 4, 2009, Week 4.
By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
Computer Architecture II 1 Computer architecture II Lecture 9.
1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.
CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.
CPS110: Implementing threads/locks on a uni-processor Landon Cox.
A. Frank - P. Weisberg Operating Systems Introduction to Cooperating Processes.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.
© 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen 1 Concurrency in Programming Languages Matthew J. Sottile Timothy G. Mattson Craig.
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
CDP 2012 Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC WikiThe C++11 Memory Model and GCC Created by Eran.
CDP 2013 Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki and Herb Sutter’s.
Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.
By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Threads Cannot be Implemented as a Library Hans-J. Boehm.
CS399 New Beginnings Jonathan Walpole. 2 Concurrent Programming & Synchronization Primitives.
Concurrency Properties. Correctness In sequential programs, rerunning a program with the same input will always give the same result, so it makes sense.
CS533 Concepts of Operating Systems Jonathan Walpole.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Computer Systems Principles Synchronization Emery Berger and Mark Corner University.
CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution Tom Bergan Owen Anderson, Joe Devietti, Luis Ceze, Dan Grossman To appear.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
The C++11 Memory Model CDP Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki.
6/27/20161 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Memory Consistency Models
Threads Cannot Be Implemented As a Library
Memory Consistency Models
143a discussion session week 3
Threads and Memory Models Hal Perkins Autumn 2011
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Designing Parallel Algorithms (Synchronization)
Chapter 26 Concurrency and Thread
The C++ Memory model Implementing synchronization)
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Implementing synchronization
Threads and Memory Models Hal Perkins Autumn 2009
Computer Organization and Design Assembly & Compilation
Grades.
Shared Memory Consistency Models: A Tutorial
Chapter 6: Process Synchronization
Memory Consistency Models
Kernel Synchronization II
CSE 153 Design of Operating Systems Winter 19
CS333 Intro to Operating Systems
Chapter 6: Synchronization Tools
Relaxed Consistency Finale
Compilers, Languages, and Memory Models
Foundations and Definitions
Programming with Shared Memory Specifying parallelism
Synchronization CS Spring 2002.
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)

Concurrency as a Library ●Many common languages are designed without native concurrency support. ●Consequently, many of these languages have libraries that provide locking primitives. ●This mostly works, but there are some unpleasant surprises that are non-obvious.

Fundamental Problems ●Many of the problems with multiprocessors lie at the hardware or compiler levels, and thus cannot be addressed with libraries. ●This requires a good understanding of underlying implementation to do this correctly.

Overview of sequential consistency ●View concurrent execution as an “interleaving of the steps from the threads.” x = 1; r1 = y; y = 1; r2 = x; ●One or both of r1 and r2 must have the value 1.

The real world ●This model is inefficient and disallows many types of optimizations. ●Nearly all modern architectures (like x86) allow the hardware to reorder memory operations. ●Compilers will generally reorder memory loads and stores for efficiency.

The Pthreads way ●Pthreads uses functions that guaranteed to “synchronize memory”. These include hardware instructions to prevent memory reordering. ● pthread_mutex_lock() and others ●The compiler treats them as black boxes which may touch global memory, so it won’t move memory across a function call.

Surprises ●This approach is not precise enough. ●Programs may fail intermittently, or when using a new compiler or hardware. ●Thread ordering is often non-deterministic, making this trickier to debug. ●In some cases, it excludes efficient algorithms

Consider this example if (x == 1) ++y; if (y == 1) ++x;

Consider this example Becomes ++y; if (x != 1) --y; ++x; if (y != 1) --x; Outcome of the later is not consistent.

Adjacent data struct { int a:17; int b:15 } x; becomes { tmp = x; // Read both fields into // 32-bit variable. tmp &= ~0x1ffff; // Mask off old a. tmp |= 42; x = tmp; // Overwrite all of x. }

Adjacent data struct { char a; char b; char c; char d; char e; char f; char g; char h; } x; Let’s say you wish to assign each variable independently: x.b = ’b’; x.c = ’c’; x.d = ’d’; x.e = ’e’; x.f = ’f’; x.g = ’g’; x.h = ’h’;

Adjacent data Instead of assigning each variable independently, the compiler may compress that to. x = ’hgfedcb\0’ | x.a; Pthreads allows this, but this may lead to non- portable code. Why?

Register promotion ●It is unsafe to promote variables in a critical section to places outside the critical section.

Register Promotion Example for (...) {... if (mt) pthread_mutex_lock(...); x =... x... if (mt) pthread_mutex_unlock(...); } This gets transformed to

Register Promotion Example r = x; for (...) {... if (mt) { x = r; pthread_mutex_lock(...); r = x; } r =... r... if (mt) { x = r; pthread_mutex_unlock(...); r = x; } x = r;

Performance Issues ●Pthreads assumes that memory operation order is irrelevant unless explicitly specified by the coder. ●This strategy precludes lock-free and wait- free strategies. ●The performance of these strategies is lost when reordering is restricted.