Summary of Boehm’s “Threads Cannot Be Implemented as a Library”, plus other thoughts and class discussions (CS 5966, Feb 4, 2009, Week 4)

Assignment: Dining Philosophers code
Some versions of the Dining Philosophers code have data races
What are races? Why are they harmful? Are they always harmful?
– P1: temp = shared-x
– P2: shared-x = 1
versus
– the same code inside a single lock/unlock
In this case the atomicity of the individual locations gives the same computational semantics either way
Be sure of the atomicity being assumed!
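
Below is a minimal Pthreads sketch of both cases (names like shared_x, p1_racy and p1_locked are illustrative, not from the assignment): the racy pair accesses shared_x with no synchronization, while the locked pair puts the same code inside a single lock/unlock.

#include <pthread.h>
#include <stdio.h>

int shared_x = 0;                 /* the shared location from the slide */
int observed = -1;                /* what P1 saw                        */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Racy version: P1 reads shared_x while P2 writes it, no synchronization. */
void *p1_racy(void *arg)  { observed = shared_x; return NULL; }
void *p2_racy(void *arg)  { shared_x = 1;        return NULL; }

/* Locked version: the same code inside a single lock/unlock. */
void *p1_locked(void *arg) {
    pthread_mutex_lock(&m);
    observed = shared_x;
    pthread_mutex_unlock(&m);
    return NULL;
}
void *p2_locked(void *arg) {
    pthread_mutex_lock(&m);
    shared_x = 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1_locked, NULL);   /* swap in p1_racy/p2_racy to get the racy pair */
    pthread_create(&t2, NULL, p2_locked, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("P1 observed shared_x = %d\n", observed);
    return 0;
}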

Why we should know memory models
Not very intuitive; takes time to sink in
– Something this important sticks with one only through repeated exposure
Other classes do not give it emphasis
– They attempt to sweep it under the rug; they are playing ‘head in the sand’!
– A memory model is like a grain of sand: tiny, yet ruinous under an eyelid or inside a ball bearing
– Ignoring it is dangerous and stifles understanding
We are in a world where even basic rules are being broken
– Academia is about not buying into decrees, e.g. are “goto”s always harmful?

Why we should know memory models
Clearly, success in multi-core programming depends on having high-level primitives
Unfortunately, nobody has a clue as to which high-level primitives “work”
– are safe and predictable
– are efficient
Offering an inefficient high-level primitive does more damage
– People will swing clear back to a much lower-level primitive!

Why we should know memory models
Until we form a good shared understanding of which high-level primitives work well, we must be prepared to evaluate the low-level effects of the existing high-level primitives
The added surprises that compilers throw in can cause such non-intuitive outcomes that we had better know they exist, and be able to sort the issues out when they arise

Why we should know memory models
Locks are expensive
– in both performance and energy
If lock-free code runs dramatically faster, and there is an alternative (lock-free) line of reasoning that explains such behavior, one must clearly entertain such approaches
– Need all the tools in one’s kit
HW costs are becoming very skewed
– Attend Uri Weiser’s talk on Feb 12th
Finally, we need to understand what tools such as Inspect are actually doing!

Where memory models have mattered
PCI bus ordering (producer/consumer idiom broken)
Holzmann’s experience with multi-core SPIN
Our class experiments
The OpenMP memory model in conflict with the GCC memory model
In understanding architectural consequences
– the hit-under-miss optimization in speculative execution (on snoopy buses such as HP Runway)

On the “HW / SW” split
Until the dust settles (if it ever does) in multi-core computing, you had better be interested in both HW and SW matters
– HW matters
– C-like low-level behavior matters
Later we will learn whether “comfortable” abstractions such as C# / Java are viable
Of course, when programming in the large we will prefer such high-level views; when understanding the concepts, however, we need all the “nuts and bolts” exposed…

Boehm’s points
Threads are going to be increasingly used
We focus on languages such as C/C++ where threads are not built into the language
– but are provided through add-on libraries
The ability to program in C/Pthreads comes through ‘painful experience’
– not through strict adherence to standards
This paper is an attempt to ameliorate that

Page 2: Thread library, language, compiler…
Thread semantics cannot be argued purely within the context of the libraries
They also involve the
– compiler semantics
– language semantics
(together, the “software” or “language” memory model)
Disciplined use of concurrency through thread APIs is fine for 98% of users
But we need to know about the 2% of uses that fall outside, especially in a world where we rely on MP systems for performance

P2 S3: The Pthreads approach to concurrency
Sequential consistency is the intuitive model
Too expensive to implement as such
– Thread 1: x = 1; r1 = y;
– Thread 2: y = 1; r2 = x;
– the outcome r1 = r2 = 0 is allowed (and is what can happen today)
Compilers may reorder, subject only to intra-thread dependencies
HW may reorder, subject only to intra-thread dependencies
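
A minimal Pthreads sketch of this litmus test (thread and variable names are illustrative); whether r1 == r2 == 0 is actually observed depends on the compiler and hardware, and the unsynchronized accesses are themselves a data race under the Pthreads rules:

#include <pthread.h>
#include <stdio.h>

int x = 0, y = 0;       /* shared locations, both initially 0 */
int r1, r2;             /* per-thread results                 */

void *thread1(void *arg) { x = 1; r1 = y; return NULL; }
void *thread2(void *arg) { y = 1; r2 = x; return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Under sequential consistency at least one of r1, r2 must be 1;
       with compiler/HW reordering, r1 == r2 == 0 can also be observed. */
    printf("r1 = %d, r2 = %d\n", r1, r2);
    return 0;
}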

P2 S3: Pthreads is silent on memory model semantics; reasons:
Many don’t understand memory models
So the committee preferred “simple rules”
Instead, the standard “decrees”:
– synchronize thread execution using pthread_mutex_lock / pthread_mutex_unlock
– then it is expected that no two threads race on a single location
(Java is more precise, even about the semantics of races)

P2 S3: Pthreads is silent on memory model semantics; reasons:
In practice, pthread_mutex_lock etc. contain memory barriers (fences) that prevent HW reordering around the call
Calls to pthread_mutex_lock etc. are treated as opaque function calls
– no instructions can be moved across them
If f() calls pthread_mutex_lock(), then f() itself is treated the same way
Unfortunately, many real systems intentionally or unknowingly violate these rules
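
A small sketch of what the “opaque call” rule buys (shared_count and bump are illustrative names, not from the paper):

#include <pthread.h>

int shared_count = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* To the compiler, pthread_mutex_lock() is an opaque call that might itself
   read or write shared_count, so the increment below cannot be moved above
   the lock or below the unlock.  The lock/unlock implementations also contain
   the hardware fences that keep the CPU from reordering around them.  A
   function like bump(), when called from another translation unit, is itself
   opaque and gets the same treatment. */
void bump(void) {
    pthread_mutex_lock(&m);
    ++shared_count;
    pthread_mutex_unlock(&m);
}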

P4 S4: Correctness issues
Consider this program (x and y initially 0):
– Thread 1: if (x == 1) ++y;
– Thread 2: if (y == 1) ++x;
– Is (x == 1, y == 1) acceptable? Is there a race? Not under SC!
However, if the compiler transforms the code to
– Thread 1: ++y; if (x != 1) --y;
– Thread 2: ++x; if (y != 1) --x;
– then there is a race, and “x == 1, y == 1 is allowed” is a possible conclusion (or say the semantics are undefined)
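
A sketch of the original program as a two-thread Pthreads test, assuming x and y start at 0 (thread names are illustrative):

#include <pthread.h>
#include <stdio.h>

int x = 0, y = 0;     /* both initially 0 */

void *t1(void *arg) { if (x == 1) ++y; return NULL; }
void *t2(void *arg) { if (y == 1) ++x; return NULL; }

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Under SC neither condition can ever become true, so this prints 0, 0;
       the speculative rewrite above writes both variables unconditionally
       and so admits a race (and, arguably, the outcome 1, 1). */
    printf("x = %d, y = %d\n", x, y);
    return 0;
}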

P5 S4.2: Rewriting of adjacent data
Bugs of this type have actually arisen
struct { int a:17; int b:15; } x;
A compiler may realize “x.a = 42” as { tmp = x; tmp &= ~0x1ffff; tmp |= 42; x = tmp; }
This introduces an “unintended” write of b as well!
OK for sequential code
But in a concurrent setting, a concurrent update of “b” could now race!!
The race is not “seen” at the source level!
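
A sketch of how such an update can lose a concurrent write, assuming the compiler really does realize each bit-field store as a whole-word read-modify-write (writer names are illustrative):

#include <pthread.h>
#include <stdio.h>

struct { int a : 17; int b : 15; } x;   /* the bit-fields from the slide */

/* Each store may be compiled into a read-modify-write of the whole word
   containing both fields, so the two threads can race and one update can
   be silently lost. */
void *writer_a(void *arg) { x.a = 42; return NULL; }
void *writer_b(void *arg) { x.b = 7;  return NULL; }

int main(void) {
    pthread_t ta, tb;
    pthread_create(&ta, NULL, writer_a, NULL);
    pthread_create(&tb, NULL, writer_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    printf("x.a = %d, x.b = %d\n", x.a, x.b);   /* one field may come out 0 */
    return 0;
}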

P5: Another example
struct { char a; char b; … char h; } x;
x.b = ‘b’; x.c = ‘c’; … ; x.h = ‘h’; can be realized as x = ‘hgfedcb\0’ | x.a
– i.e. one whole-struct store that also re-stores the old value of a
Now if you protect “a” with one lock and “b” through “h” with another lock, you are hosed – there is a data race!
C should define when adjacent data may be overwritten

P5/P6: Register promotion
Compilers must be aware of the existence of threads
Consider this code, which the compiler optimizes to speed up the serial case:
for (...) {
    if (mt) lock(...);
    x = ...x...;
    if (mt) unlock(...);
}

P5/P6: Register promotion
for (...) {
    if (mt) lock(...);
    x = ...x...;
    if (mt) unlock(...);
}
can be optimized, according to the Pthreads rules, to
r = x;
for (...) {
    ...
    if (mt) { x = r; lock(...); r = x; }
    r = ...r...;
    if (mt) { x = r; unlock(...); r = x; }
}
x = r;
Fully broken – x is read and written without holding the lock!
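
For concreteness, here is a compilable sketch of roughly what the promoted loop amounts to, assuming a global int x guarded by a single Pthreads mutex and an “mt” flag (the body “r = r + 1” merely stands in for “x = ...x...”):

#include <pthread.h>

int x = 0;                          /* the shared, lock-protected variable      */
int mt = 1;                         /* the “multithreaded?” flag from the slide */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void loop_promoted(int n) {
    int r = x;                                              /* unprotected read  */
    for (int i = 0; i < n; ++i) {
        if (mt) { x = r; pthread_mutex_lock(&m);   r = x; }
        r = r + 1;                                          /* stands in for x = ...x... */
        if (mt) { x = r; pthread_mutex_unlock(&m); r = x; } /* unprotected read  */
    }
    x = r;                                                  /* unprotected write */
}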

Avoiding expensive synchronization
for (mp = start; mp < 10^4; ++mp)
    if (!get(mp)) {
        for (mult = mp; mult < 10^8; mult += mp)
            if (!get(mult)) set(mult);
    }
Sieve algorithm – benefits from races!!
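
A runnable sketch along these lines, assuming get/set are backed by a plain (unsynchronized) byte array and that several threads all run the same loop; the constants follow the slide’s 10^4 and 10^8 bounds:

#include <pthread.h>

#define NTHREADS    4
#define SQRT_LIMIT  10000         /* the slide's 10^4 */
#define LIMIT       100000000     /* the slide's 10^8 (the flag array is ~100 MB) */

static unsigned char crossed[LIMIT];   /* static storage is zero-initialized */

static int  get_flag(long i) { return crossed[i]; }
static void set_flag(long i) { crossed[i] = 1; }

/* Every thread runs the same loop over the whole range.  The unsynchronized
   reads and writes of crossed[] race, but the races are benign: threads only
   ever store the value 1, so at worst some work is duplicated.  Indices in
   (SQRT_LIMIT, LIMIT) left clear at the end are the primes in that range. */
static void *sieve(void *arg) {
    for (long mp = 2; mp < SQRT_LIMIT; ++mp)
        if (!get_flag(mp))
            for (long mult = mp; mult < LIMIT; mult += mp)
                if (!get_flag(mult))
                    set_flag(mult);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; ++i) pthread_create(&t[i], NULL, sieve, NULL);
    for (int i = 0; i < NTHREADS; ++i) pthread_join(t[i], NULL);
    return 0;
}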