Summary of Boehm’s “threads … as a library” + other thoughts and class discussions CS 5966, Feb 4, 2009, Week 4.

1 Summary of Boehm’s “threads … as a library” + other thoughts and class discussions CS 5966, Feb 4, 2009, Week 4

2 Assignment: Dining Philosophers code
Some versions of Dining Philosophers have data races
What are races? Why are they harmful? Are they always harmful?
– P1: temp = x (x is shared)
– P2: x = 1
versus
– the same code inside a single lock/unlock
In this case, the atomicity of the accesses to the location gives the same computational semantics
Be sure of the atomicity being assumed!

3 Why we should know memory models
Not very intuitive
Takes time to sink in
– Something as important as this stays with one only through repeated exposure
Other classes do not give emphasis
– They attempt to sweep things under the rug
– They are playing ‘head in the sand’!
Memory models are like a grain of sand: tiny, but under the eyelid or in a ball bearing its presence matters
– This is dangerous! Stifles understanding
We are in a world where even basic rules are being broken
– Academia is about not buying into decrees, e.g. are “goto”s always harmful?

4 Why we should know memory models
Clearly, success in multi-core programming depends on having high-level primitives
Unfortunately, nobody has a clue as to which high-level primitives “work”
– are safe and predictable
– are efficient
Offering an inefficient high-level primitive does more damage
– People will swing clear back to a much lower-level primitive!

5 Why we should know memory models
Until we form a good shared understanding of which high-level primitives work well, we must be prepared to evaluate the low-level effects of existing high-level primitives
The added surprises that compilers throw in can cause such non-intuitive outcomes that we had better know that they exist, and solve issues when they arise

6 Why we should know memory models
Locks are expensive
– Performance and energy
If lock-free code runs much faster, and there is an alternate (lock-free) line of reasoning that explains its behavior, clearly one must entertain such thoughts
– Need all tools in one’s kit
HW costs are becoming very skewed
– Attend Uri Weiser’s talk, Feb 12th
Finally, we need to understand what tools such as Inspect are actually doing!

7 Where mem models mattered
PCI bus ordering (producer/consumer broken)
Holzmann’s experience in multi-core SPIN
Our class experiments
OpenMP mem model in conflict with GCC mem model
In understanding architectural consequences
– Hit-under-miss optimization in speculative execution (in snoopy buses such as HP Runway)

8 On “HW / SW” split
Until the dust settles (if at all) in multi-core computing, you had better be interested in HW and SW matters
– HW matters
– C-like low-level behavior matters
Later we will learn whether “comfortable” abstractions such as C# / Java are viable
Of course, when programming in the large we will prefer such high-level views; when understanding concepts, however, we need all the “nuts and bolts” exposed…

9 Boehm’s points
Threads are going to be increasingly used
We focus on languages such as C/C++ where threads are not built into the language
– but are provided through add-on libraries
Ability to program in C/Pthreads comes through ‘painful experience’
– not through strict adherence to standards
This paper is an attempt to ameliorate that

10 Page 2: Thread lib, lang, compiler …
Thread semantics cannot be argued purely within the context of the libraries
They involve the
– compiler semantics
– language semantics (together the “software” or “language” mem model)
Disciplined use of concurrency through thread APIs is OK for 98% of the users
But we need to know the 2% of uses outside, especially in a world where we rely on MP systems for performance

11 P2 S3: Pthread Approach to Concurrency
Sequential consistency is the intuitive model
Too expensive to implement as such
– Thread 1: x = 1; r1 = y;
– Thread 2: y = 1; r2 = x;
– with x and y initially 0, the final outcome r1 = r2 = 0 is allowed (and is what happens today)
Compilers may reorder subject to intra-thread dependencies
HW may reorder subject to intra-thread dependencies
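The two-thread example on this slide can be tried directly. What follows is a minimal sketch, assuming C11 atomics and POSIX threads are available. The names `sb_trial` and `count_sb_outcomes` are hypothetical and do not come from Boehm's paper. Relaxed atomics are used so the program has no data race in the C11 sense, while the compiler and hardware remain free to reorder each thread's store and load. Whether the non-sequentially-consistent outcome r1 = r2 = 0 actually appears depends on the machine, the compiler, and timing, so no particular count should be expected.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* the shared locations from the slide */
static int r1, r2;        /* each is written by exactly one thread */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);    /* x = 1;  */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);   /* r1 = y; */
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);    /* y = 1;  */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);   /* r2 = x; */
    return NULL;
}

/* Runs the litmus test once; returns 1 if r1 == r2 == 0 was observed. */
int sb_trial(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    if (pthread_create(&a, NULL, thread1, NULL) != 0)
        return 0;
    if (pthread_create(&b, NULL, thread2, NULL) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);   /* joins make r1 and r2 safe to read here */
    pthread_join(b, NULL);
    return r1 == 0 && r2 == 0;
}

/* Repeats the test and counts how often the non-SC outcome showed up. */
int count_sb_outcomes(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i)
        hits += sb_trial();
    return hits;
}
```

Because thread start-up cost dwarfs the two instructions under test, a count of zero on a given run does not show the outcome is forbidden.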

12 P2 S3: Pthreads is silent on mem model semantics; reasons:
Many don’t understand memory models
So the designers preferred “simple rules”
Instead, the standard “decrees”:
– Synchronize thread execution using mutex_lock, mutex_unlock
– It is then expected that no two threads race on a single location
(Java is more precise even about racing semantics)

13 P2 S3: Pthreads is silent on mem model semantics; reasons:
In practice, mutex_lock etc. contain memory barriers (fences) that prevent HW reordering around the call
Calls to mutex_lock etc. are treated as opaque function calls
– No instructions can be moved across
If f() calls mutex_lock(), even f() is treated as such
Unfortunately, many real systems intentionally or unknowingly violate these rules
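The “decree” described on these two slides (take a mutex around every access to shared data, so that no two threads race on a location) can be sketched as follows. This is an illustration only, assuming POSIX threads. The names `run_counter`, `worker`, and the cap of 16 threads are hypothetical choices for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* assumption: small fixed cap for the sketch */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;     /* shared; only touched while holding lock */

static void *worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; ++i) {
        pthread_mutex_lock(&lock);     /* opaque call: accesses stay inside */
        ++counter;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Starts nthreads workers, each adding iters to the shared counter. */
long run_counter(int nthreads, long iters) {
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (int i = 0; i < nthreads; ++i) {
        if (pthread_create(&tid[started], NULL, worker, &iters) != 0)
            break;                     /* stop creating on failure */
        ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);
    return counter;                    /* all workers joined: safe to read */
}
```

With every increment under the mutex, the result is the number of started threads times `iters`, regardless of scheduling.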

14 P4 S4: Correctness Issues
Consider this program (x and y initially 0)
– Thread 1: if (x == 1) ++y;
– Thread 2: if (y == 1) ++x;
– Is (x == 1, y == 1) acceptable? Is there a race? Not under SC!
However, if the compiler transforms the code to
– Thread 1: ++y; if (x != 1) --y;
– Thread 2: ++x; if (y != 1) --x;
– then there is a race, and x == 1, y == 1 becomes a possible outcome (or say the semantics are undefined)
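The effect of the transformation can be checked by hand-simulating one interleaving in a single thread. The sketch below is not a concurrent program. It replays the steps of both threads in a fixed order, so the result is deterministic. The struct and function names are hypothetical, and x and y are assumed to start at 0.

```c
struct state { int x, y; };

/* Original thread bodies from the slide. */
static void t1_original(struct state *s) { if (s->x == 1) ++s->y; }
static void t2_original(struct state *s) { if (s->y == 1) ++s->x; }

/* Transformed bodies, split into two steps each so they can be interleaved. */
static void t1_spec_write(struct state *s) { ++s->y; }
static void t1_spec_undo(struct state *s)  { if (s->x != 1) --s->y; }
static void t2_spec_write(struct state *s) { ++s->x; }
static void t2_spec_undo(struct state *s)  { if (s->y != 1) --s->x; }

/* Original code: each thread body is a single step, so only two orders exist. */
struct state run_original(int t1_first) {
    struct state s = { 0, 0 };
    if (t1_first) { t1_original(&s); t2_original(&s); }
    else          { t2_original(&s); t1_original(&s); }
    return s;
}

/* Transformed code: both speculative writes happen before either check. */
struct state run_transformed_interleaved(void) {
    struct state s = { 0, 0 };
    t1_spec_write(&s);   /* y becomes 1 */
    t2_spec_write(&s);   /* x becomes 1 */
    t1_spec_undo(&s);    /* sees x == 1, so y is not rolled back */
    t2_spec_undo(&s);    /* sees y == 1, so x is not rolled back */
    return s;
}
```

Under this simplified model, both orders of the original leave x == 0 and y == 0, while the chosen interleaving of the transformed code ends with x == 1 and y == 1.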

15 P5 S4.2: Rewriting of adjacent data
Bugs of this type have actually arisen
struct { int a:17; int b:15; } x;
Now realize “x.a = 42” as
{ tmp = x; tmp &= ~0x1ffff; tmp |= 42; x = tmp; }
Introduces an “unintended” write of b also!
OK for sequential code
But in a concurrent setting, a concurrent update of “b” could now race!!
Race is not “seen” at source level!
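The lost update can be made concrete by replaying one interleaving by hand. The sketch below assumes a layout in which `a` occupies the low 17 bits and `b` the high 15 bits of a 32-bit word. Real bit-field layout is implementation-defined, so the word is packed explicitly with masks rather than with an actual bit-field. All names are hypothetical, and the code is a single-threaded simulation, not a concurrent program.

```c
#include <stdint.h>

#define A_BITS 17u
#define A_MASK 0x1ffffu   /* low 17 bits hold field a (assumed layout) */
#define B_MASK 0x7fffu    /* field b is 15 bits wide */

static uint32_t pack(uint32_t a, uint32_t b) {
    return (a & A_MASK) | ((b & B_MASK) << A_BITS);
}

uint32_t get_a(uint32_t word) { return word & A_MASK; }
uint32_t get_b(uint32_t word) { return word >> A_BITS; }

/* Replays: thread A starts "x.a = 42", thread B sets x.b = 7 in between. */
uint32_t lost_update_demo(void) {
    uint32_t x = pack(0, 1);       /* initially a == 0, b == 1 */

    uint32_t tmp = x;              /* A: tmp = x;  reads both fields */
    x = pack(get_a(x), 7);         /* B: x.b = 7;  runs concurrently */
    tmp &= ~A_MASK;                /* A: mask off the old a          */
    tmp |= 42;                     /* A: insert the new a            */
    x = tmp;                       /* A: x = tmp;  writes stale b    */

    return x;
}
```

In this replay thread B's store of 7 is overwritten: the final word has a == 42 and b == 1, even though thread A never mentions `b` in its source.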

16 P5: another example
struct { char a; char b; … char h; } x;
x.b = 'b'; x.c = 'c'; … ; x.h = 'h';
can be realized as
x = 'hgfedcb\0' | x.a;
Now if you protect “a” with one lock and “b through h” with another lock, you are hosed
– there is a data race!
C should define when adjacent data may be overwritten

17 P5/P6: register promotion
Compilers must be aware of the existence of threads
Consider this code, written to skip locking and run fast in the serial case
for (..) {
  if (mt) lock(…);
  x = ..x…
  if (mt) unlock(..);
}

18 P5/P6: register promotion
for (..) {
  if (mt) lock(…);
  x = ..x…
  if (mt) unlock(..);
}
can be optimized, according to Pthreads rules, to
r = x;
for (…) {
  ..
  if (mt) { x = r; lock(…); r = x; }
  r = …r…;
  if (mt) { x = r; unlock(..); r = x; }
}
x = r;
Fully broken: reads/writes to x without holding the lock!
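To see why the promoted version is broken, one interleaving can be replayed by hand. The sketch below is a single-threaded simulation under stated assumptions: the elided loop body is taken to be `x = x + 1`, the loop runs once, `mt` is true, and a second thread performs one correctly locked increment at the worst possible moment. The function names are hypothetical.

```c
/* Thread 2: lock; x = x + 1; unlock;  modeled as one indivisible step. */
static void t2_locked_increment(int *x) { *x = *x + 1; }

/* Source-level behavior: both increments happen under the lock. */
int original_update(void) {
    int x = 0;
    t2_locked_increment(&x);   /* T2's increment */
    x = x + 1;                 /* T1: lock; x = x + 1; unlock; */
    return x;
}

/* Promoted code from the slide, with T2 running right after T1's first read. */
int promoted_lost_update(void) {
    int x = 0;
    int r;

    r = x;                     /* T1: r = x;       before the loop, no lock */
    t2_locked_increment(&x);   /* T2: x is now 1                            */
    x = r;                     /* T1: x = r;       no lock, erases T2's work */
                               /* T1: lock(...)                             */
    r = x;                     /* T1: r = x;                                */
    r = r + 1;                 /* T1: r = ...r...                           */
    x = r;                     /* T1: x = r;                                */
                               /* T1: unlock(..)                            */
    r = x;                     /* T1: r = x;                                */
    x = r;                     /* T1: x = r;       after the loop           */
    return x;
}
```

Two increments were requested, but in this replay the promoted code ends with x == 1 because the unlocked `x = r` store discards thread 2's update.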

19 Avoiding expensive synchronization
for (mp = start; mp < 10000; ++mp)
  if (!get(mp)) {
    for (mult = mp; mult < 100000000; mult += mp)
      if (!get(mult)) set(mult);
  }
Sieve algorithm
Benefits from races!!
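A scaled-down version of the sieve can be sketched as below. Several choices here are assumptions made for illustration and are not in the slide: the bounds are shrunk to 100 and 10, the inner loop starts at `2 * mp` so that a prime does not mark itself, and `get`/`set` are implemented with C11 relaxed atomics so that concurrent callers would not be a data race in the C11 sense. The names `sieve_from` and `count_primes` are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define LIMIT      100u   /* assumption: tiny bound instead of 10^8 */
#define ROOT_LIMIT 10u    /* primes below 10 suffice to sieve below 100 */

static atomic_uchar composite[LIMIT];   /* zero-initialized: nothing marked */

static int get(size_t i) {
    return atomic_load_explicit(&composite[i], memory_order_relaxed);
}

static void set(size_t i) {
    atomic_store_explicit(&composite[i], 1, memory_order_relaxed);
}

/* One worker's loop; different threads could call it with different starts. */
void sieve_from(size_t start) {
    for (size_t mp = start; mp < ROOT_LIMIT; ++mp) {
        if (!get(mp)) {
            /* A stale "not composite" read only causes redundant marking:
               multiples of a composite are themselves composite. */
            for (size_t mult = 2 * mp; mult < LIMIT; mult += mp)
                if (!get(mult))
                    set(mult);
        }
    }
}

/* Counts unmarked numbers in [2, LIMIT) once all workers have finished. */
size_t count_primes(void) {
    size_t n = 0;
    for (size_t i = 2; i < LIMIT; ++i)
        if (!get(i))
            ++n;
    return n;
}
```

The reason unsynchronized access is tolerable here is that `set` only ever moves an entry from 0 to 1, so a missed or repeated update cannot produce a wrong final table, provided at least one worker starts at 2 and all workers finish before the table is read.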
