“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510.

Slides:



Advertisements
Similar presentations
Threads Cannot be Implemented As a Library Andrew Hobbs.
Advertisements

Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
Chapter 6: Process Synchronization
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Mutual Exclusion.
CH7 discussion-review Mahmoud Alhabbash. Q1 What is a Race Condition? How could we prevent that? – Race condition is the situation where several processes.
Lock-free Cache-friendly Software Queue for Decoupled Software Pipelining Student: Chen Wen-Ren Advisor: Wuu Yang 學生 : 陳韋任 指導教授 : 楊武 Abstract Multicore.
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
CS533 Concepts of Operating Systems Class 3 Monitors.
Summary of Boehm’s “threads … as a library” + other thoughts and class discussions CS 5966, Feb 4, 2009, Week 4.
6: Process Synchronization 1 1 PROCESS SYNCHRONIZATION I This is about getting processes to coordinate with each other. How do processes work with resources.
Computer Architecture II 1 Computer architecture II Lecture 9.
1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.
CS533 - Concepts of Operating Systems 1 Class Discussion.
CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.
Synchronization CSCI 444/544 Operating Systems Fall 2008.
/ PSWLAB Eraser: A Dynamic Data Race Detector for Multithreaded Programs By Stefan Savage et al 5 th Mar 2008 presented by Hong,Shin Eraser:
Evaluation of Memory Consistency Models in Titanium.
Concurrency, Mutual Exclusion and Synchronization.
CDP 2012 Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC WikiThe C++11 Memory Model and GCC Created by Eran.
Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Transactional Coherence and Consistency Presenters: Muhammad Mohsin Butt. (g ) Coe-502 paper presentation 2.
Consider the program fragment below left. Assume that the program containing this fragment executes t1() and t2() on separate threads running on separate.
Threads Cannot be Implemented as a Library Hans-J. Boehm.
Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew.
Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)
CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics Advanced.NET Programming I 5 th Lecture Pavel Ježek
CS399 New Beginnings Jonathan Walpole. 2 Concurrent Programming & Synchronization Primitives.
ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.
CS533 Concepts of Operating Systems Jonathan Walpole.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Computer Systems Principles Synchronization Emery Berger and Mark Corner University.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)
The C++11 Memory Model CDP Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki.
CS510 Concurrent Systems Jonathan Walpole. Introduction to Concurrency.
Mutual Exclusion -- Addendum. Mutual Exclusion in Critical Sections.
December 1, 2006©2006 Craig Zilles1 Threads & Atomic Operations in Hardware  Previously, we introduced multi-core parallelism & cache coherence —Today.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Advanced .NET Programming I 11th Lecture
Software Coherence Management on Non-Coherent-Cache Multicores
Threads Cannot Be Implemented As a Library
Atomic Operations in Hardware
Atomic Operations in Hardware
Atomicity CS 2110 – Fall 2017.
Chapter 5: Process Synchronization
Specifying Multithreaded Java semantics for Program Verification
Lecture 5: GPU Compute Architecture
Threads and Memory Models Hal Perkins Autumn 2011
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 5: GPU Compute Architecture for the last time
The C++ Memory model Implementing synchronization)
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Implementing synchronization
Threads and Memory Models Hal Perkins Autumn 2009
Kernel Synchronization II
CSE 153 Design of Operating Systems Winter 19
CS333 Intro to Operating Systems
Chapter 6: Synchronization Tools
Relaxed Consistency Finale
Programming with Shared Memory Specifying parallelism
Problems with Locks Andrew Whitaker CSE451.
CSE 542: Operating Systems
Presentation transcript:

“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510

Premise Multithreaded programs are often written in languages that are not designed with threading primitives. Thread support in these languages are added through thread libraries (Eg: Pthread for C/C++). Authors highlight that without a compiler designed explicitly to handle threads/concurrency through language semantics, correctness of code can’t be guaranteed.

Sequential Consistency Consider the code: initially with x =0 and y =0; Sequential consistency: Assignment of either x=1 or y=1 must happen first before assigning r1/r2 to x/y. So, value of either r1 or r2 should be >0; Thread #1Thread #2 x = 1;y = 1; r1 = y;r2 = x;

Sequential consistency (cont…) This memory model is very restrictive and not true for many modern architectures Compilers can reorder memory operations if they don’t violate intra- thread dependencies; Hardware can reorder memory operations as well to improve performance. r1 and r2 can both be equal to 0 Thread #1Thread #2 r1 = y;r2 = x; x = 1;y = 1;

Concurrency and Language Languages such as C/C++ don’t include threads as part of the language specifications. Hence, No concurrency support. Compiler reorders memory instructions without knowing concurrency impact to improve performance which might lead to race conditions.

Pthread library C/C++ languages support thread using Pthread library. But Pthread standard doesn’t specify how language should handle concurrency and delegates to the developer to ensure that all shared memory location access to be synchronized using locks. pthread_mutex_lock – to acquire a lock pthread_mutex_unlock – to release a lock

Programmer’s dilemma The onus is now on application developer to ensure correctness of the software by applying locks at appropriate places. How to determine when the memory reordering will occur leading to race conditions? This requires formal memory model which the C/C++ language doesn’t provide. Thus making this problem circular.

Disabling Optimizations In practice, C/C++ implementations supporting Pthread Use locking functions that include hardware instructions (“memory barriers”) that prevent hardware reordering of memory operations around the call. Disable compiler optimization to prevent compiler from reordering memory operations around locking calls. This works but NOT always! Race conditions are still possible.

Correctness Issue 1:Concurrent modification Scenario #1: If x =0, y=0 initially, Expected final value: x = 0 and y =0 as well. Scenario #2: If compiler reorders memory instructions Here x and y values can’t be guaranteed to be 0 in concurrent environment. Thread #1Thread #2 If (x == 1) ++y;If (y==1) ++x; Thread #1Thread #2 ++y; If (x != 1) { --y;} ++x; If (y!=1) { --x;} Compiler assumes no concurrency leading to race conditions

Solution to the issue Programmer’s action: Programmer can avoid race condition by placing locks that will prevent the compiler from optimizing the code Compiler’s action: Compiler can avoid reordering the code if it’s aware of the concurrency issues that can arise due to threads. In both the cases, change in the language specification is needed i.e Memory model and concurrency semantics addition

Correctness issue 2:Adjacent Data overwrite Consider the code: struct { int a:17; int b:15;} x; { tmp =x; // Store it to temp variable tmp &= ~0x1ffff; // Mask the lower 17 bits tmp |= 42; // Adding the value of 42 to variable x=tmp; // updating the entire 32-bits into x } Byte 0Byte 1Byte 2Byte 3 b = 15 bits a = 17 bits Lets update x.a = 42; At this instant, another thread sets x.b = tmp x Original x.b is lost

Correctness issue 2:Adjacent Data overwrite (cont…) Even if x.a and x.b are protected by separate locks, the overwrite will happen Programmer has to acquire both locks for x.a and x.b before update How can the programmer know about this? Language specification needs to be define clearly when adjacent bit-fields can be overwritten How many adjacent bit-fields will be copied depends if architecture allows bit, byte or word sized copies. Optimizing for such architecture will make the SW non-portable.

Correctness issue 3: Register promotion Compiler optimizations can introduce race conditions that were not in source for (…) { if (mt) { pthread_mutex_lock(); } x = … x …; if(mt){ pthread_mutex_unlock(); } r = x; // extra reads/ writes introduced for (…) { if (mt) { x = r; pthread_mutex_lock(); r= x; } r = … r …; if(mt){ x = r; pthread_mutex_unlock(); r=x;} } x = r;

Correctness issue 3: Register promotion Compiler promotes the frequently updated variable within the loop to a register for better performance. The extra read and write operations on the variable are introduced when the lock is not held. This can introduces race condition despite not being in source code. Compiler being cognizant of threads can avoid this register promotion optimization thereby preventing race.

Performance Pthreads standard advocate concurrent access to shared variable be protected by using mutual exclusion. Mutual exclusion locks utilize hardware atomic instructions that are much more expensive than normal operations. This has an impact on the performance of the concurrent SW. Author suggests C/C++ language specification should support lock-free programming methods as well.

Performance comparison Graphs show that performance of different concurrency techniques for Sieve of Eratosthenes algorithm. Non-blocking methods using atomic operations provide the biggest performance boost. Sieve execution time for bit array(sec)HT P4 time for byte array (sec)

Conclusion Compilers need to be aware of threads for correctness of concurrent programs Threads as libraries cannot guarantee correctness. Memory model should be part of language specification. Potential opportunity to increase performance on multiprocessor system using lock-free methods. Joint effort from Boehm et. al to address these issues in C++ standard.

Attributions Prof. Jonathan Walpole, from CS510-Spring 2010