Threads Cannot be Implemented as a Library Hans-J. Boehm.


About the Author
Hans-J. Boehm
– Boehm conservative garbage collector (parallel GC for C/C++)
– Participated in revising the Java Memory Model
– Co-authored the memory model for multithreaded C++
– Compiler-centric background

Introduction
Multi-threaded programs are ubiquitous
– Many programs need to manage logically concurrent interactions
Multiprocessors are becoming mainstream
– Desktop computers support multiple hardware contexts, which makes them logically multiprocessors
Multi-threaded programs are a good way to utilize increasing hardware parallelism

Thread support
Threads included in the language specification
– Java
– C#
– Ada
Threads not part of the language specification
– C/C++
Thread support provided by add-on libraries
– POSIX threads (Pthreads)
The Pthreads standard does not specify formal semantics for concurrency

Memory Model
Determines which assignments to a variable by one thread can be seen by a concurrently executing thread
Sequential Consistency
– All actions occur in a total order (the execution order) that is consistent with program order; furthermore, each read r of a variable v sees the value written by the write w to v such that:
  w comes before r in the execution order, and
  there is no other write w´ to v such that w comes before w´ and w´ comes before r in the execution order
Happens-Before
– Simple version of the Java memory model, slightly too weak
Weak
– Allows for compiler optimizations

Surprising results caused by statement reordering
r1 and r2 are local; A and B are shared
A write in one thread
A read of the same variable in another thread
The write and the read are not ordered by synchronization
Race condition!
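The code listing for this slide did not survive in the transcript. As an illustration only, the sketch below assumes a two-thread program of the usual shape: T1 runs `r2 = A; B = 1;` and T2 runs `r1 = B; A = 2;`. That program, the helper names, and the `swap_t1` flag are all assumptions made for this example. The code enumerates every interleaving that keeps each thread's statements in order and checks whether `r1 == 1 && r2 == 2` can occur. Under sequential consistency it cannot; it becomes reachable once T1's two independent statements are swapped, which is the kind of reordering the slide refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared variables A and B plus the thread-local r1 and r2, all initially 0. */
struct state { int A, B, r1, r2; };

/* Run statement number pc (0 or 1) of thread tid (1 or 2).
 * Assumed program: T1 is "r2 = A; B = 1;" and T2 is "r1 = B; A = 2;".
 * With swap_t1 set, T1 runs its two independent statements in the
 * opposite order, as a compiler or processor may choose to do. */
static void step(struct state *s, int tid, int pc, bool swap_t1)
{
    assert(pc == 0 || pc == 1);
    if (tid == 1) {
        int stmt = swap_t1 ? 1 - pc : pc;
        if (stmt == 0) s->r2 = s->A;
        else           s->B = 1;
    } else {
        if (pc == 0) s->r1 = s->B;
        else         s->A = 2;
    }
}

/* Try every interleaving that keeps each thread's statements in order and
 * report whether any of them ends with r1 == 1 and r2 == 2. */
static bool explore(struct state s, int pc1, int pc2, bool swap_t1)
{
    if (pc1 == 2 && pc2 == 2)
        return s.r1 == 1 && s.r2 == 2;

    bool found = false;
    if (pc1 < 2) {
        struct state next = s;
        step(&next, 1, pc1, swap_t1);
        found = explore(next, pc1 + 1, pc2, swap_t1);
    }
    if (!found && pc2 < 2) {
        struct state next = s;
        step(&next, 2, pc2, swap_t1);
        found = explore(next, pc1, pc2 + 1, swap_t1);
    }
    return found;
}

/* True if some execution ends with r1 == 1 and r2 == 2. */
bool surprising_outcome_reachable(bool swap_t1)
{
    struct state initial = {0, 0, 0, 0};
    return explore(initial, 0, 0, swap_t1);
}
```

Calling `surprising_outcome_reachable(false)` models the sequentially consistent case, and `surprising_outcome_reachable(true)` models the reordered one.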

Pthread approach
Provided as an add-on library
Synchronization functions include hardware instructions (memory barriers) to prevent hardware reordering
Compiler reordering is avoided because these calls appear as opaque functions
Requires a disciplined style of synchronization
Valid 98% of the time
– What about the other two percent?

Pthread correctness
Apparently correct programs may fail intermittently
– New compiler- or hardware-induced failures
– Poor performance may force slight rule bending
Difficult for the programmer to reason about correctness
Let's see some examples of why…

Concurrent modification
Pthread specifications prohibit races
– But is this enough?
Initially x = y = 0
T1 (source): if (x == 1) ++y;
T2 (source): if (y == 1) ++x;
Is the outcome x == 1, y == 1 acceptable?
– No, under a sequentially consistent interpretation
But if the compiler makes the following transformation, there is a race!
T1 (transformed): ++y; if (x != 1) --y;
T2 (transformed): ++x; if (y != 1) --x;
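A deterministic replay can make the slide's point concrete. The sketch below is only an illustration: `struct shared`, `run_source`, `replay_transformed`, and the schedule-string convention are hypothetical names introduced here, and each transformed statement is treated as one indivisible step. It shows that the source program always ends with x == 0 and y == 0, while the transformed program can end with x == 1 and y == 1 when the two threads interleave.

```c
#include <assert.h>

struct shared { int x, y; };

/* Source program, with x = y = 0 initially:
 *   T1: if (x == 1) ++y;      T2: if (y == 1) ++x;
 * Neither condition can become true, so every interleaving ends at 0, 0. */
struct shared run_source(void)
{
    struct shared s = {0, 0};
    if (s.x == 1) ++s.y;   /* T1 */
    if (s.y == 1) ++s.x;   /* T2 */
    return s;
}

/* Transformed program:
 *   T1: ++y; if (x != 1) --y;     T2: ++x; if (y != 1) --x;
 * schedule is a string of '1' and '2' saying which thread runs its next
 * statement; each thread must appear exactly twice. */
struct shared replay_transformed(const char *schedule)
{
    struct shared s = {0, 0};
    int pc1 = 0, pc2 = 0;
    for (const char *p = schedule; *p != '\0'; ++p) {
        if (*p == '1') {
            assert(pc1 < 2);
            if (pc1 == 0) ++s.y;
            else if (s.x != 1) --s.y;
            ++pc1;
        } else {
            assert(*p == '2' && pc2 < 2);
            if (pc2 == 0) ++s.x;
            else if (s.y != 1) --s.x;
            ++pc2;
        }
    }
    assert(pc1 == 2 && pc2 == 2);
    return s;
}
```

The schedule "1122" runs T1 to completion before T2 and matches the source result; the schedule "1212" alternates the threads and produces the outcome that the source program forbids.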

Why threads cannot be implemented as a library
Argument (1)
– Since the compiler is unaware of threads, it is allowed to transform code subject only to sequential correctness constraints, and can thereby produce a race
But the example is kind of far-fetched

Rewriting of Adjacent Data
Bit fields on a little-endian 32-bit machine
Concurrent writes are to memory locations, not variables
struct { int a:17; int b:15; } x;
Implementation of x.a = 42:
{
    tmp = x;
    tmp &= ~0x1ffff;   // mask off old a
    tmp |= 42;
    x = tmp;           // replace x
}
Updates to x.b introduce a race
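The lost update can be replayed deterministically on a plain 32-bit word. This is a minimal sketch under stated assumptions: the word layout (a in the low 17 bits, b in the high 15) follows the slide, but the fields are treated as unsigned for simplicity, and the helper names and the value 7 written to b are invented for the example. One thread's read-modify-write of a straddles another thread's complete update of b, so the new b is overwritten with the stale one.

```c
#include <assert.h>
#include <stdint.h>

#define A_BITS 17
#define A_MASK 0x1ffffu   /* low 17 bits hold a */
#define B_MASK 0x7fffu    /* b occupies the remaining 15 bits */

uint32_t field_a(uint32_t word) { return word & A_MASK; }
uint32_t field_b(uint32_t word) { return word >> A_BITS; }

/* Middle steps of "x.a = value", expanded the way the slide shows. */
static uint32_t store_a(uint32_t tmp, uint32_t value)
{
    assert(value <= A_MASK);
    tmp &= ~A_MASK;            /* mask off old a */
    tmp |= value;
    return tmp;
}

/* Middle steps of "x.b = value", by analogy. */
static uint32_t store_b(uint32_t tmp, uint32_t value)
{
    assert(value <= B_MASK);
    tmp &= A_MASK;             /* mask off old b */
    tmp |= value << A_BITS;
    return tmp;
}

/* T1 performs x.a = 42 while T2 performs x.b = 7, interleaved badly. */
uint32_t replay_lost_update(void)
{
    uint32_t x = 0;
    uint32_t tmp1 = x;         /* T1: tmp = x; */
    uint32_t tmp2 = x;         /* T2: tmp = x; */
    tmp2 = store_b(tmp2, 7);
    x = tmp2;                  /* T2: x.b = 7 is complete */
    tmp1 = store_a(tmp1, 42);
    x = tmp1;                  /* T1: writes back the stale b */
    return x;
}

/* The same two assignments run one after the other. */
uint32_t replay_serial(void)
{
    uint32_t x = 0;
    x = store_b(x, 7);
    x = store_a(x, 42);
    return x;
}
```

In the serial replay both fields keep their new values; in the interleaved replay a is 42 but b is back to 0, even though the two threads never touched the same field.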

Why threads cannot be implemented as a library
Argument (2)
– For languages like C, if the specification does not define when adjacent data can be overwritten, then race conditions can be introduced
– If the specification did define this, the compiler would know to avoid this optimization

Register promotion
The source repeatedly updates the globally shared variable x:
for (…) {
    …
    if (mt) pthread_mutex_lock(…);
    x = … x …;
    if (mt) pthread_mutex_unlock(…);
}
Using profile feedback or static heuristics, it becomes beneficial to promote x to a register r in the loop:
r = x;
for (…) {
    …
    if (mt) { x = r; pthread_mutex_lock(…); r = x; }
    r = … r …;
    if (mt) { x = r; pthread_mutex_unlock(…); r = x; }
}
x = r;
Thus the extra reads and writes of x introduce possible race conditions
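One way to see the extra accesses is to instrument both versions of the loop and count how often x is touched while the lock is not held. The sketch below is a single-threaded model, not real Pthreads code: `lock`, `unlock`, `read_x`, `write_x`, and the counter are hypothetical stand-ins, and the elided update `x = … x …` is assumed to be `x = x + 1`.

```c
#include <assert.h>
#include <stdbool.h>

static int x;                  /* the shared global from the slide */
static bool lock_held;         /* stand-in for the mutex state */
static int unlocked_accesses;  /* accesses to x made without the lock */

static int  read_x(void)   { if (!lock_held) ++unlocked_accesses; return x; }
static void write_x(int v) { if (!lock_held) ++unlocked_accesses; x = v; }
static void lock(void)     { assert(!lock_held); lock_held = true; }
static void unlock(void)   { assert(lock_held);  lock_held = false; }

static void reset(void) { x = 0; lock_held = false; unlocked_accesses = 0; }

/* Loop as written in the source; the update is modelled as x = x + 1. */
int run_source(bool mt, int iterations)
{
    reset();
    for (int i = 0; i < iterations; ++i) {
        if (mt) lock();
        write_x(read_x() + 1);
        if (mt) unlock();
    }
    return x;
}

/* Loop after promoting x to the register r, following the slide. */
int run_promoted(bool mt, int iterations)
{
    reset();
    int r = read_x();
    for (int i = 0; i < iterations; ++i) {
        if (mt) { write_x(r); lock(); r = read_x(); }
        r = r + 1;
        if (mt) { write_x(r); unlock(); r = read_x(); }
    }
    write_x(r);
    return x;
}

int unlocked_access_count(void) { return unlocked_accesses; }
```

With mt set, both versions compute the same final value, but only the promoted loop reads and writes x outside the critical section. Those are the accesses that can race with another thread holding the lock.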

Why threads cannot be implemented as a library
Argument (3)
– If the compiler is not aware of the existence of threads, and the language specification does not address thread-specific semantic issues, then optimizations might cause race conditions

Implications
Compilers are forced into blanket removal of optimizations in many cases
– Or perhaps a toned-down version of the optimization
This can degrade the performance of code that is not thread-specific

Sieve of Eratosthenes
[Figure: array of true/false flags for the numbers 10,000 through 100,000,000]
for (mp = start; mp < 10000; ++mp)
    if (!get(mp)) {
        for (multiple = mp; multiple < 100000000; multiple += mp)
            if (!get(multiple))
                set(multiple);
    }

Synchronizing global array access
for (mp = start; mp < 10000; ++mp)
    if (!get(mp)) {
        for (multiple = mp; multiple < 100000000; multiple += mp)
            if (!get(multiple))
                set(multiple);
    }
Synchronization options for get and set:
– Mutex
– Spin-locks
– Non-blocking
– None
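The variants differ only in how `get` and `set` touch the shared array, so the loop can take them as parameters. The sketch below is an assumption-laden illustration rather than the benchmark code: the limits are scaled down, the function names are invented, and relaxed C11 atomics are swapped in for the unsynchronized variant so that concurrent use would stay well defined. Only the mutex and relaxed variants are shown, and the harness that would start one thread per `sieve` call is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SMALL_LIMIT 100L     /* the slide uses 10,000 */
#define LIMIT       10000L   /* the slide uses 100,000,000 */

typedef bool (*get_fn)(long);
typedef void (*set_fn)(long);

/* Variant 1: a plain array protected by a single Pthreads mutex. */
static bool locked_flags[LIMIT];
static pthread_mutex_t flags_lock = PTHREAD_MUTEX_INITIALIZER;

bool get_locked(long i)
{
    assert(i >= 0 && i < LIMIT);
    pthread_mutex_lock(&flags_lock);
    bool v = locked_flags[i];
    pthread_mutex_unlock(&flags_lock);
    return v;
}

void set_locked(long i)
{
    assert(i >= 0 && i < LIMIT);
    pthread_mutex_lock(&flags_lock);
    locked_flags[i] = true;
    pthread_mutex_unlock(&flags_lock);
}

/* Variant 2: relaxed atomics, standing in for "no synchronization". */
static atomic_bool atomic_flags[LIMIT];

bool get_relaxed(long i)
{
    assert(i >= 0 && i < LIMIT);
    return atomic_load_explicit(&atomic_flags[i], memory_order_relaxed);
}

void set_relaxed(long i)
{
    assert(i >= 0 && i < LIMIT);
    atomic_store_explicit(&atomic_flags[i], true, memory_order_relaxed);
}

/* The loop from the slide, with the access functions passed in. */
void sieve(long start, get_fn get, set_fn set)
{
    assert(start >= 2);
    for (long mp = start; mp < SMALL_LIMIT; ++mp)
        if (!get(mp))
            for (long multiple = mp; multiple < LIMIT; multiple += mp)
                if (!get(multiple))
                    set(multiple);
}

/* Entries in [SMALL_LIMIT, LIMIT) that were never flagged are primes. */
long count_unmarked(get_fn get)
{
    long count = 0;
    for (long i = SMALL_LIMIT; i < LIMIT; ++i)
        if (!get(i))
            ++count;
    return count;
}
```

Flags are only ever changed from false to true, so a stale read costs at most a redundant `set`. That is why the algorithm tolerates weak ordering, and why the cost of the mutex calls in the first variant is pure overhead for this workload.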

Performance results
Pthreads library approaches (1) and (2) cannot reach optimal levels
This algorithm is designed for a weak memory model, which is not possible using a thread library

Performance results
Similar results for a hyper-threaded P4 processor
Even more dramatic performance differences when moving to a more parallel processor
[Charts: Itanium and HT P4]

Additional Implications of Pthreads approach
If we choose to allow concurrent accesses to shared variables within library code
– Unpredictable results can occur without language specifications
Original:
    x = 1;
    pthread_mutex_lock(lock);
    y = 1;
    pthread_mutex_unlock(lock);
With x = 1 moved into the critical section, after y = 1:
    pthread_mutex_lock(lock);
    y = 1;
    x = 1;
    pthread_mutex_unlock(lock);

Additional Implications of Pthreads approach
If we choose to allow concurrent accesses to shared variables within library code
– Unpredictable results can occur without language specifications
Original:
    x = 1;
    pthread_mutex_lock(lock);
    y = 1;
    pthread_mutex_unlock(lock);
With x = 1 moved into the critical section, before y = 1:
    pthread_mutex_lock(lock);
    x = 1;
    y = 1;
    pthread_mutex_unlock(lock);
Is this a problem?

Conclusion
Compilers can introduce race conditions where there are none in the source code
– Library code cannot intervene
Impossible to achieve the performance gains of a multiprocessor without direct fine-grained use of atomic operations
– Which is impossible to do in a library-based thread implementation
Why not just use the Java memory model?
– It is designed to preserve type safety
– which C/C++ do not have
C++ needs its own memory model


Appendix Happens-Before

Appendix Section 5

Appendix Section 5 (cont.)