1 Dynamic Data-Race Detection in Lock-Based Multi-Threaded Programs Prepared by Eli Pozniansky under the supervision of Prof. Assaf Schuster
2 Table of Contents What is a Data-Race? Why Are Data-Races Undesired? How Can Data-Races Be Prevented? Can Data-Races Be Easily Detected? Feasible and Apparent Data-Races Complexity of Data-Race Detection Program Execution Model Complexity of Computing Ordering Relations Proof of NP/Co-NP Hardness
3 Table of Contents Cont. So How Can Data-Races Be Detected? Lamport’s Happens-Before Approximation Approaches to Detection of Apparent Data-Races: Static Methods Dynamic Methods: Post-Mortem Methods On-The-Fly Methods
4 Table of Contents Cont. Closer Look at Dynamic Methods: DJIT Local Time Frames Vector Time Frames Predicate for Data-Race Detection Which Accesses to Check? Which Time Frames to Check? Access History First Data-Race Results
5 Table of Contents Cont. Lockset Locking Discipline The Basic Algorithm Improving Locking Discipline Initialization Read-Sharing Refinement for Read-Write Locks False Alarms Results Summary References
6 What is a Data-Race? A data-race is an anomaly in which two or more threads access a shared variable concurrently and at least one of the accesses is a write. Example (variable X is global and shared): Thread 1: X=1; Z=2. Thread 2: T=Y; T=X.
7 Why Are Data-Races Undesired? Programs which contain data-races usually demonstrate unexpected and even non-deterministic behavior. The outcome might depend on the specific execution order (a.k.a. the threads’ interleaving). Re-running the program may not always produce the same results. Programs with data-races are thus hard to debug, and correct programs are hard to write.
8 Why Are Data-Races Undesired? – Example First interleaving: (1) Thread 1: X=0; (2) Thread 2: T=X; (3) Thread 1: X++. Second interleaving: (1) Thread 1: X=0; (2) Thread 1: X++; (3) Thread 2: T=X. So is T==0 or T==1?
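The two interleavings can be replayed deterministically; a small Python sketch (the `run` helper and the step closures are our own encoding, not from the slides) showing that T’s final value depends only on the order:

```python
# Deterministically replay the two interleavings from the slide.
# Each step is a closure over a shared state dict; no real threads are
# needed to see that the final value of T depends on the order alone.

def run(interleaving):
    state = {"X": 0, "T": None}
    for step in interleaving:
        step(state)
    return state["T"]

x_init = lambda s: s.update(X=0)           # Thread 1: X = 0
t_read = lambda s: s.update(T=s["X"])      # Thread 2: T = X
x_incr = lambda s: s.update(X=s["X"] + 1)  # Thread 1: X++

first  = [x_init, t_read, x_incr]  # T reads X before the increment -> T == 0
second = [x_init, x_incr, t_read]  # T reads X after the increment  -> T == 1

print(run(first), run(second))  # 0 1
```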
9 Execution Order Each thread has a different execution speed, which may change over time. For an external observer of the time axis, the instructions’ execution is ordered in an execution order; any such order is legal. The execution order restricted to a single thread is called its program order.
10 How Can Data-Races Be Prevented? – Explicit Synchronization Idea: in order to prevent undesired concurrent accesses to shared locations, we must explicitly synchronize between threads. The means for explicit synchronization are: Locks, Mutexes and Critical Sections; Barriers; Binary and Counting Semaphores; Monitors; Single-Writer/Multiple-Readers (SWMR) Locks; and others.
11 Synchronization – “Bad” Bank Account Example Thread 1: Deposit( amount ) { balance += amount; } Thread 2: Withdraw( amount ) { if (balance < amount) print( “Error” ); else balance -= amount; } ‘Deposit’ and ‘Withdraw’ are not “atomic”!!! What is the final balance after a series of concurrent deposits and withdraws?
12 Synchronization – “Good” Bank Account Example Thread 1: Deposit( amount ) { Lock( m ); balance += amount; Unlock( m ); } Thread 2: Withdraw( amount ) { Lock( m ); if (balance < amount) print( “Error” ); else balance -= amount; Unlock( m ); } The bodies form critical sections. Since the critical sections can never execute concurrently, this version exhibits no data-races.
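A hedged Python rendering of the locked account (the slide’s pseudocode uses Lock/Unlock; here Python’s `threading.Lock` plays that role, and the starting balance of 1000 is our own choice so that no withdrawal can ever fail):

```python
import threading

# Every access to `balance` happens under the same lock, so deposits and
# withdrawals are atomic with respect to each other and the final balance
# is deterministic regardless of scheduling.

balance = 1000
m = threading.Lock()

def deposit(amount):
    global balance
    with m:                      # critical section
        balance += amount

def withdraw(amount):
    global balance
    with m:                      # critical section
        if balance < amount:
            print("Error")
        else:
            balance -= amount

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(50)]
threads += [threading.Thread(target=withdraw, args=(10,)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()

print(balance)  # 1000: 50 deposits and 50 withdrawals of 10 cancel out
```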
13 Is This Enough? Theoretically – YES. Practically – NO. What if a programmer accidentally forgets to place the correct synchronization? How can all such data-race bugs be detected in a large program?
14 Can Data-Races Be Easily Detected? – No! Unfortunately, the problem of deciding whether a given program contains potential data-races is computationally hard!!! There are a lot of execution orders: for t threads of n instructions each, the number of possible orders is about t^(n·t). In addition to all the different schedulings, all possible inputs should be tested as well. To compound the problem, inserting detection code into a program can perturb its execution schedule enough to make all errors disappear.
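The exact number of interleavings of t threads with n instructions each is the multinomial (n·t)!/(n!)^t, for which the slide’s t^(n·t) is an upper bound; a quick sketch:

```python
from math import factorial

# Exact count of interleavings of t threads with n instructions each:
# the multinomial (n*t)! / (n!)^t.  The slide's t^(n*t) is an upper bound.

def interleavings(t, n):
    return factorial(n * t) // factorial(n) ** t

print(interleavings(2, 2))   # 6 orders for two 2-instruction threads
print(interleavings(2, 10))  # 184756 -- already large for tiny programs
print(2 ** (10 * 2))         # 1048576, the t^(n*t) bound for t=2, n=10
```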
15 Feasible Data-Races Feasible data-races: races that are based on the possible behavior of the program (i.e. the semantics of the program’s computation). These are the actual (!) data-races that can possibly happen in any specific execution. Locating feasible data-races requires fully analyzing the program’s semantics to determine whether the execution could have allowed a and b (accesses to the same shared variable) to execute concurrently.
16 Apparent Data-Races Apparent Data-Races: approximations (!) of feasible data-races that are based on only the behavior of the explicit synchronization performed by some feasible execution (and not the semantics of the program’s computation, i.e. ignoring all conditional statements). Important, since data-races are usually a result of improper synchronization. Thus easier to detect, but less accurate.
17 Apparent Data-Races Cont. For example, a and b, accesses to same shared variable in some execution, are said to be ordered, if there is a chain of corresponding explicit synchronization events between them. Similarly, a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.
18 Feasible vs. Apparent Example 1 [F=false initially] Thread 1: X++; F=true; Thread 2: while (F==false) {}; X--; Apparent data-races in the execution above – both the race on F and the race on X (there is no synchronization chain between the racing accesses). Feasible data-race – the race on F only!!! No feasible execution exists in which ‘X--’ is performed before ‘X++’ (since F is false at the start). Note that protecting ‘F’ alone will protect X as well.
19 Feasible vs. Apparent Example 2 [F=false initially] Thread 1: X++; Lock( m ); F=true; Unlock( m ); Thread 2: while( 1 ) { Lock( m ); if ( F == true ) break; Unlock( m ); } Unlock( m ); X--; No feasible or apparent data-races exist under any execution order!!! F is protected by means of a lock. The accesses to X are always ordered and properly synchronized.
20 Complexity of Data-Race Detection Exactly locating the feasible data-races is an NP-hard problem. Thus, the apparent races, which are simpler to locate, must be detected for debugging. Fortunately, apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution. Yet, the problem of exhaustively locating all apparent data-races still remains NP-hard.
21 Reminder: NP and Co-NP NP is a set of problems for which: No polynomial-time solution is known. An exponential-time solution exists. A problem is NP-hard if there is a polynomial reduction from every problem in NP to this problem. A problem is NP-complete if, in addition, it resides in NP. Intuitively – if the answer for the problem is ‘yes’/‘no’, we can either answer ‘yes’ and stop, or never stop (at least not in polynomial time).
22 Reminder: NP and Co-NP Cont. There is also a set of Co-NP problems, which is complementary to the set of NP problems. For a Co-NP-hard problem with answers ‘yes’ or ‘no’, we can only answer ‘no’ efficiently. (Whether every problem that is in both NP and Co-NP also lies in P is an open question.) The problem of checking whether a boolean formula is satisfiable is NP-complete (answer ‘yes’ if a satisfying assignment for the variables was found). The same problem for non-satisfiability is Co-NP-complete.
23 Why Is Data-Race Detection NP-Hard? How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent? Intuitively – we must check all execution orders of P and see. If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. Otherwise we should continue checking.
24 Program Execution Model Consider a class of multi-threaded programs that synchronize by counting semaphores. A program execution is described by a collection of events and two relations over the events. Synchronization event – an instance of some synchronization operation (e.g. signal, wait). Computation event – an instance of a group of statements in the same thread, none of which are synchronization operations (e.g. x=x+1).
25 Program Execution Model – Events’ Relations Temporal ordering relation – a T→ b means that a completes before b begins (i.e. the last action of a can affect the first action of b). Shared data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the accesses modifies the variable. It indicates when one event causally affects another.
26 Program Execution Model – Program Execution Program execution P – a triple ⟨E, T→, D→⟩, where E is a finite set of events and T→ and D→ are the above relations, satisfying the following axioms: A1: T→ is an irreflexive partial order (a T↛ a). A2: If a T→ b, b T↮ c and c T→ d, then a T→ d. A3: If a D→ b then b T↛ a. Notes: a ↛ b is a shorthand for ¬(a → b); a ↮ b is a shorthand for ¬(a → b) ⋀ ¬(b → a). Notice that A1 and A2 imply transitivity of the T→ relation.
27 Program Execution Model – Feasible Program Execution Feasible program execution for P – an execution of the program that performs exactly the same events as P, but may exhibit a different temporal ordering. Definition: P’ = ⟨E’, T’→, D’→⟩ is a feasible program execution for P = ⟨E, T→, D→⟩ (i.e. could potentially have occurred) if: F1: E’ = E (exactly the same events), and F2: P’ satisfies the axioms A1 - A3 of the model, and F3: a D→ b ⇒ a D’→ b (same data dependencies). Note: any execution that exhibits the same shared-data dependencies as P will execute exactly the same events as P.
28 Program Execution Model – Ordering Relations Given a program execution P = ⟨E, T→, D→⟩ and the set F(P) of feasible program executions for P, the following relations (summarizing the temporal orderings present in the feasible program executions) are defined: Must-have-happened-before: a MHB→ b ⇔ ∀ ⟨E, T’→, D’→⟩ ∈ F(P), a T’→ b. Could-have-happened-before: a CHB→ b ⇔ ∃ ⟨E, T’→, D’→⟩ ∈ F(P), a T’→ b. Must-be-concurrent-with: a MCW↔ b ⇔ ∀ ⟨E, T’→, D’→⟩ ∈ F(P), a T’↮ b. Could-be-concurrent-with: a CCW↔ b ⇔ ∃ ⟨E, T’→, D’→⟩ ∈ F(P), a T’↮ b. Must-be-ordered-with: a MOW↔ b ⇔ ∀ ⟨E, T’→, D’→⟩ ∈ F(P), ¬(a T’↮ b). Could-be-ordered-with: a COW↔ b ⇔ ∃ ⟨E, T’→, D’→⟩ ∈ F(P), ¬(a T’↮ b).
29 Program Execution Model – Ordering Relations - Explanation The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P). The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P). The happened-before relations show events that execute in a specific order, the concurrent-with relations show events that execute concurrently, and the ordered-with relations show events that execute in either order but not concurrently.
30 Complexity of Computing Ordering Relations The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard, and the problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard. Theorem 1: Given a program execution P = ⟨E, T→, D→⟩ that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.
31 Proof of Theorem 1 – Notes The presented proof is only for the must-have-happened-before (MHB) relation. Proofs for the other relations are analogous. The proof is a reduction from 3CNFSAT such that a boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction. The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete. The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).
32 Proof of Theorem 1 – 3CNFSAT An instance of 3CNFSAT is given by: A set of n variables, V = {X_1, X_2, …, X_n}. A boolean formula B consisting of a conjunction of m clauses, B = C_1 ⋀ C_2 ⋀ … ⋀ C_m. Each clause C_j = (L_1 ⋁ L_2 ⋁ L_3) is a disjunction of three literals. Each literal L_k is a variable from V or its negation: L_k = X_i or L_k = ¬X_i. Example: B = (X_1 ⋁ X_2 ⋁ ¬X_3) ⋀ (¬X_2 ⋁ ¬X_5 ⋁ X_6) ⋀ (X_1 ⋁ X_4 ⋁ ¬X_5)
33 Proof of Theorem 1 – Idea of the Proof Given an instance of 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0). The execution of this program simulates a nondeterministic evaluation of B. Semaphores are used to represent the truth values of each variable and clause. The execution exhibits certain orderings iff B is not satisfiable.
34 Proof of Theorem 1 – The Construction per Variable For each variable X_i the following three threads are constructed: Thread 1: wait( A_i ); signal( X_i ); … ; signal( X_i ). Thread 2: wait( A_i ); signal( not-X_i ); … ; signal( not-X_i ). Thread 3: signal( A_i ); wait( Pass2 ); signal( A_i ). “…” indicates as many signal(X_i) (or signal(not-X_i)) operations as the number of occurrences of the literal X_i (or ¬X_i) in the formula B.
35 Proof of Theorem 1 – The Construction per Variable The semaphores X_i and not-X_i are used to represent the truth value of the variable X_i. Signaling the semaphore X_i (or not-X_i) represents the assignment of True (or False) to X_i. The assignment is accomplished by allowing either signal(X_i) or signal(not-X_i) to proceed, but not both (due to the concurrent wait(A_i) operations in the two leftmost threads).
36 Proof of Theorem 1 – The Construction per Clause For each clause C_j the following three threads are constructed: Thread 1: wait( L_1 ); signal( C_j ). Thread 2: wait( L_2 ); signal( C_j ). Thread 3: wait( L_3 ); signal( C_j ). L_1, L_2 and L_3 are the semaphores corresponding to the literals in clause C_j (i.e. X_i or not-X_i). The semaphore C_j represents the truth value of clause C_j. It is signaled iff the truth assignments to the variables cause the clause C_j to evaluate to True.
37 Proof of Theorem 1 – Explanation of Construction The first 3n threads operate in two passes: The first pass is a non-deterministic guessing phase in which each variable used in the boolean formula B is assigned a unique truth value. Only one of the X_i and not-X_i semaphores is signaled. The second pass, which begins after the semaphore Pass2 is signaled, is used to ensure that the program doesn’t deadlock – the semaphore operations that were not allowed to execute during the first pass are allowed to proceed.
38 Proof of Theorem 1 – The Final Construction Two additional threads are created: Thread 1: wait( C_1 ); … ; wait( C_m ); b: skip. (There are m ‘wait(C_j)’ operations – one for each clause.) Thread 2: a: skip; signal( Pass2 ); … ; signal( Pass2 ). (There are n ‘signal(Pass2)’ operations – one for each variable.)
39 Proof of Theorem 1 – Putting It All Together Event b is reached only after the semaphore C_j, for each clause j, has been signaled. Since the program contains no conditional statements or shared variables, every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none). Claim: For any execution, a MHB→ b iff B is not satisfiable.
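To connect the claim to the example formula from the 3CNFSAT slide: B there is satisfiable, so in the program constructed for it, a MHB→ b would not hold. A brute-force satisfiability check (the encoding of literals as signed integers is our own):

```python
from itertools import product

# Brute-force satisfiability check of the example formula
# B = (X1 v X2 v -X3) ^ (-X2 v -X5 v X6) ^ (X1 v X4 v -X5).
# A literal is encoded as +i for X_i and -i for not-X_i.

clauses = [(1, 2, -3), (-2, -5, 6), (1, 4, -5)]

def satisfiable(clauses, n_vars=6):
    for bits in product([False, True], repeat=n_vars):
        val = lambda lit: bits[abs(lit) - 1] if lit > 0 else not bits[abs(lit) - 1]
        if all(any(val(l) for l in clause) for clause in clauses):
            return True
    return False

print(satisfiable(clauses))  # True -> in the reduction, a MHB-> b does NOT hold
```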
40 Proof of Theorem 1 – Proving the “if” Part Assume that B is not satisfiable. Then there is always some clause C_j that is not satisfied by the truth values guessed during the first pass. Thus, no signal(C_j) operation for it is performed during the first pass. Event b can’t execute until this signal(C_j) operation is performed, which can only be done during the second pass. The second pass doesn’t occur until after event a executes, so event a must precede event b. Therefore, a MHB→ b.
41 Proof of Theorem 1 – Proving the “only if” Part Assume that a MHB→ b. This means that there is no execution in which b either precedes a or executes concurrently with a. Assume by way of contradiction that B is satisfiable. Then some truth assignment can be guessed during the first pass that satisfies all of the clauses. Event b can then execute before event a, contradicting the assumption. Therefore, B is not satisfiable.
42 Complexity of Computing Ordering Relations – Cont. Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard. By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard. Theorem 2: Given a program execution P = ⟨E, T→, D→⟩ that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard. Proof: by similar reductions.
43 Complexity of Race Detection – Conditions, Loops and Input The presented model is too simplistic. What if conditional statements, like “if” and “while”, are used? What if input from the user is allowed? Thread 1: Y = ReadFromInput( ); while ( Y < 0 ) Print( Y ); X--; Thread 2: X++; If Y≥0 there is a data-race on X. Otherwise it is not possible, since ‘X--’ is never reached.
44 Complexity of Race Detection – “NP-Harder”? The proof above does not use conditional statements, loops or input from outside. This suggests that the problem of data-race detection may be even harder than deciding an NP-complete problem. With loops and recursion, we do not know whether potentially concurrent accesses will indeed be executed, so the question becomes equivalent to the halting problem. Thus, in the general case, race detection is undecidable.
45 So How Can Data-Races Be Detected? – Approximations Since it is an intractable problem to decide whether a CHB→ b or a CCW↔ b (needed to detect feasible data-races), the temporal ordering relation T→ should be approximated and apparent data-races located instead. Recall that apparent data-races exist if and only if at least one feasible race exists. Yet it remains a hard problem to locate all apparent data-races.
46 Approximation Example – Lamport’s Happens-Before The happens-before partial order, denoted hb→, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows: Program order: if a and b are events performed by the same thread, with a preceding b in program order, then a hb→ b. Release and acquire: let a be a release and b an acquire. If a and b take part in the same synchronization event, then a hb→ b. Transitivity: if a hb→ b and b hb→ c, then a hb→ c. Shared accesses a and b are concurrent (denoted a hb↮ b) if neither a hb→ b nor b hb→ a holds.
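A minimal sketch of computing happens-before over a recorded trace (not the DJIT algorithm itself): edges come from program order and from matching release/acquire pairs, and hb is their transitive closure. The event names and edge encoding are our own.

```python
# hb-> as the transitive closure of program-order and release/acquire edges.

def happens_before(events, edges):
    # events: list of labels; edges: set of (a, b) pairs meaning a hb-> b
    hb = {e: set() for e in events}
    for a, b in edges:
        hb[a].add(b)
    changed = True
    while changed:                       # naive transitive closure
        changed = False
        for a in events:
            for b in list(hb[a]):
                new = hb[b] - hb[a]
                if new:
                    hb[a] |= new
                    changed = True
    return hb

# Trace shaped like Example 2: t1: a1 = X++, rel(m); t2: acq(m), a2 = X--
events = ["a1", "rel_m", "acq_m", "a2"]
edges = {("a1", "rel_m"), ("acq_m", "a2"),  # program order
         ("rel_m", "acq_m")}               # release/acquire on the same lock
hb = happens_before(events, edges)
print("a2" in hb["a1"])   # True: the accesses are ordered, no race
print("a1" in hb["a2"])   # False
```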
47 Approaches to Detection of Apparent Data-Races – Static There are two main approaches to detection of apparent data-races (sometimes a combination of both is used): Static Methods – perform a compile-time analysis of the code. – Too conservative. Can’t know or understand the semantics of the program. Result in excessive number of false alarms that hide the real data-races. + Test the program globally – see the full code of the tested program and can warn about all possible errors in all possible executions.
48 Approaches to Detection of Apparent Data-Races – Dynamic Dynamic Methods – use tracing mechanism to detect whether a particular execution of a program actually exhibited data-races. + Detect only those apparent data-races that occur during a feasible execution. – Test the program locally - consider only one specific execution path of the program each time. Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found. On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.
49 Approaches to Detection of Apparent Data-Races No “silver bullet” exists. Accuracy is of great importance (especially in large programs), yet there is always a tradeoff between the amount of false negatives (undetected races) and false positives (false alarms). The space and time overheads imposed by the techniques are significant as well.
50 Closer Look at Dynamic Methods We will see two dynamic methods for on-the-fly detection of apparent data-races in lock-based multi-threaded programs: DJIT – based on Lamport’s happens-before partial order relation and Mattern’s virtual time (vector clocks). Implemented in the Millipede and Multipage systems. Lockset – based on a locking discipline and lockset refinement. Implemented in the Eraser tool.
51 DJIT (1) Description Detects the first apparent data-race in a program when it actually occurs. It is enough to announce only the very first data-race, since later races can be after-effects of the first one. After the race (or its cause) is fixed, the search for other races can proceed. The main disadvantage of the technique is that it is highly dependent on the scheduling order.
52 DJIT (2) Logical Token Observation – each synchronization event involves some logical token. The token is released by one set of threads that reach a certain point in their execution and is acquired by another set of threads. Once all the members of the corresponding releasing set have released their tokens, the members of the acquiring set are allowed to proceed with their execution.
53 DJIT (3) Local Time Frames The execution of each thread is split into a sequence of time frames. A new time frame starts on each release. Note that, per the above observation concerning logical tokens: Lock = Acquire (acq); Unlock = Release (rel). Example thread, with the time frame of each access: X = 1 (TF 1); Lock( m1 ); Z = 2 (TF 1); Lock( m2 ); Y = 3 (TF 1); Unlock( m2 ); Z = 4 (TF 2); Unlock( m1 ); X = 5 (TF 3).
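The time-frame rule can be replayed directly; a small sketch (the op strings are our own encoding) that reproduces the frames of the example thread:

```python
# A thread's local time frame starts at 1 and is incremented after every
# release (Unlock), per the DJIT definition.

def time_frames(ops):
    tf, out = 1, []
    for op in ops:
        out.append((op, tf))
        if op.startswith("unlock"):   # a release ends the current time frame
            tf += 1
    return out

trace = ["X=1", "lock(m1)", "Z=2", "lock(m2)", "Y=3",
         "unlock(m2)", "Z=4", "unlock(m1)", "X=5"]
for op, tf in time_frames(trace):
    print(op, tf)
# The shared accesses fall in frames: X=1:1, Z=2:1, Y=3:1, Z=4:2, X=5:3
```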
54 DJIT (4) Local Time Frames Claim 1: Let a in thread t_a and b in thread t_b be two accesses, where a occurs at time frame T_a, and the release in t_a corresponding to the latest acquire in t_b which precedes b occurs at time frame T_sync in t_a. Then a hb→ b iff T_a < T_sync. [Figure: a possible sequence of release-acquire pairs leading from t_a to t_b.]
55 DJIT (5) Local Time Frames Proof: If T_a < T_sync then a hb→ release, and since release hb→ acquire and acquire hb→ b, we get a hb→ b. Conversely, if a hb→ b, then since a and b are in distinct threads, by definition there exists a pair of corresponding release and acquire such that a hb→ release and acquire hb→ b. It follows that T_a < T_release ≤ T_sync.
56 DJIT (6) Vector Time Frames (VTF) For each thread t a vector st_t[·] exists, whose size is the maximum number of threads (maxthreads). st_t[t] is the local time frame of thread t; it actually holds the number of ‘releases’ made by thread t. st_t[u] stores the latest local time frame of u whose release is known by t (to have happened before t’s latest acquire). If u is an acquirer of t’s release, then u’s vector is updated in the following way: for k = 0 to maxthreads - 1: st_u[k] = max( st_u[k], st_t[k] )
57 DJIT (7) Vector Time Frames In this way, the vector of u is notified of: The latest time frame of t. The latest time frames of other threads, according to the knowledge of t. Note that a thread can learn about a release performed by another thread through “gossip”, when this information is transferred through a chain of corresponding release-acquire pairs.
58 DJIT (8) Vector Time Frames Example: Thread 1: write X at (1 1 1); release( m1 ); read Z at (2 1 1). Thread 2: acquire( m1 ) – vector becomes (2 1 1); read Y; release( m2 ) – vector becomes (2 2 1); write X. Thread 3: acquire( m2 ) – vector becomes (2 2 1); write X at (2 2 1).
59 DJIT (9) Vector Time Frames Claim 2: Let a and b be two accesses in respective threads t_a and t_b, which happened during respective local time frames T_a and T_b. Let f denote the value of st_tb[t_a] at the time when b occurs. Then a hb→ b iff T_a < f. [Figure: a chain of release-acquire pairs from t_a through t_c to t_b.]
60 DJIT (10) Vector Time Frames Proof: If a hb→ b, then since a and b are in distinct threads, there exists a chain of releases and corresponding acquires, with the first release in t_a and the last acquire in t_b, such that a hb→ first release and first release hb→ last acquire. The information on t_a’s local time frame is transferred through that chain, reaches t_b and is stored in st_tb[t_a] (= f). Thus it follows that T_a < T_first-release ≤ f. Conversely, if T_a < f then there is a sequence of corresponding release-acquire pairs which transfer the local time frame from t_a to t_b, finally resulting in t_b “hearing” that t_a entered a time frame later than T_a. This same sequence can be used to transitively apply the hb→ relation from a to b.
61 DJIT (11) Sequential Consistency The proposed algorithm assumes a sequential consistency (SC) memory model, which is common in multi-threaded environments. This means that there exists a global order, R, on all the events in the execution, where R conforms to the view of all processes, and all reads see the most recently written values. The definition of the hb→ partial relation is consistent with R, in the sense that if a hb→ b then a precedes b in R (otherwise an acquire could precede its corresponding release in the global order, contradicting the view of the acquirer).
62 DJIT (12) Data-Race Detection Using VTF Theorem 1: Let a and b be two accesses to the same shared variable in respective threads t_a and t_b during respective local time frames T_a and T_b. Suppose that at least one of a or b is a write. Assume that a is performed in the global order R prior to b and that it doesn’t constitute a data-race with any of the preceding accesses in R. Then a and b form a data-race iff at the time when b occurs it holds that st_tb[t_a] ≤ T_a.
63 DJIT (13) Data-Race Detection Using VTF Proof: If st_tb[t_a] ≤ T_a then, by Claim 2, a hb→ b doesn’t hold. Since a precedes b in R, it cannot hold that b hb→ a. Thus a and b are concurrent and form a data-race (since at least one of them is a write). Conversely, if a and b form a data-race then a hb→ b doesn’t hold. Thus, by Claim 2, st_tb[t_a] ≤ T_a.
64 DJIT (14) Predicate for Data-Race Detection The algorithmic aspect of Theorem 1 is encapsulated in the following predicate P: P(a,b) ≜ ( a.type = write ⋁ b.type = write ) ⋀ ( a.time_frame ≥ st_{b.thread_id}[a.thread_id] ) P gets two accesses, a and b, to the same shared variable, where a occurred earlier (according to the global order R) and b has just been performed. P returns True iff a and b form a data-race.
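A direct transcription of P as a Python function (the field and parameter names are ours, not the paper's):

```python
# P(a, b): two accesses to the same variable race iff at least one is a
# write and a's time frame is not smaller than what b's thread knows
# about a's thread.

def P(a, b, st):
    # a, b: dicts with 'type', 'tf', 'tid'; st[t][u] = thread t's entry for u
    return ((a["type"] == "write" or b["type"] == "write")
            and a["tf"] >= st[b["tid"]][a["tid"]])

st = {2: {1: 2}}                       # thread 2 knows thread 1 reached frame 2
a = {"type": "write", "tf": 1, "tid": 1}
b = {"type": "read",  "tf": 5, "tid": 2}
print(P(a, b, st))   # False: a's frame 1 < 2, so a hb-> b holds
a2 = {"type": "write", "tf": 2, "tid": 1}
print(P(a2, b, st))  # True: frame 2 >= 2, the accesses are concurrent
```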
65 DJIT(15) Which Accesses to Check? We have assumed that there is a logging mechanism, which records all accesses. Logging all accesses in all threads and testing the predicate P for each pair of them will impose a great overhead on the system. Actually some of the accesses can be discarded.
66 DJIT (16) Which Accesses to Check? Claim 3: Consider an access a in thread t_a during time frame T_a, and accesses b and c in thread t_b = t_c during time frame T_b = T_c. Assume that c precedes b in the program order. If a and b are concurrent, then a and c are concurrent as well.
67 DJIT (17) Which Accesses to Check? Proof: Let f_b and f_c denote the respective values of st_tb[t_a] when b and c happen. Since st_tb[t_a] is monotonically increasing and c precedes b, we know that f_b ≥ f_c. Since a hb→ b does not hold, we know by Claim 2 that T_a ≥ f_b. Thus T_a ≥ f_c, and again by Claim 2 we get that a hb→ c is false. For the other direction, let f_a denote the value of st_ta[t_b] when a happens. Since b hb→ a does not hold, we know by Claim 2 that T_b ≥ f_a. Since T_b = T_c we get that T_c ≥ f_a. Thus by Claim 2, c hb→ a is false.
68 DJIT(18) Which Accesses to Check? Recall that we are interested in recording only the first apparent data race which occurs during the execution. Claim 3 implies that for this purpose, it is sufficient to record only the first read access and the first write access to a variable in each time frame. In addition it’s sufficient to apply the predicate P to pairs of accesses which are the first in their respective time frames.
69 DJIT (19) Which Accesses to Check? Example (only the accesses marked with ‘!!!’ are checked): Thread 1: acquire( m ); write X (!!!); read X (!!!); write X; release( m ); read X (!!!). Thread 2: acquire( m ); write X (!!!); release( m ); acquire( m ); read X (!!!); write X (!!!); release( m ). DR – the data-race detected between Thread 1’s unsynchronized read of X after its release and Thread 2’s write to X.
70 DJIT (20) Which Time Frames to Check? Assume that in thread t_a an access a occurs, and thread t_b = t_c performed a previous (according to the global order R) access b in time frame T_b and another previous access c in time frame T_c, so that T_b < T_c.
71 DJIT (21) Which Time Frames to Check? We want to find only the very first data-race, when it actually occurs (assuming that all previous accesses didn’t form a data-race). Claim 4: If a is concurrent with b, then it is certainly concurrent with c. Proof: Easy, since T_c > T_b ≥ st_ta[t_b] = st_ta[t_c]. Thus either pair (a-b or a-c) can be considered to be the first apparent data-race (since there were no races until a occurred). This also means that if there is no race between a and c, then there is also no data-race between a and b. Therefore, the pair a-b need not be checked.
72 DJIT(22) Which Time Frames to Check? We want to support the common SWMR (Single Writer / Multiple Readers) semantics, allowing concurrent reads but not writes. Thus, developing the observation above, we need to check current write access to a shared variable v against the last time frame in each of the other threads which recently read from v, and the last time frame in a thread which recently wrote to v. For current read access to v, it is enough to check against the last time frame in a thread which recently wrote to v.
73 DJIT(23) Which Time Frames to Check? More formally - Let a be a current access to a shared variable v in thread t a : If there was a prior write to v in t a and since that write there were no accesses to v in other threads then there is no need to check anything. If there was a prior write to v in other thread t b (according to the global order R) and since that write there were no accesses to v in other threads besides t a and t b then it’s sufficient to check a only with the latest access to v in t b (since otherwise we would have found the race earlier according to Claims 3 & 4).
74 DJIT(24) Which Time Frames to Check? If there were prior reads from v in other threads t 1, t 2,…,t k (according to the global order R). Then, if a is a write, it should be checked with each of the most recent reads in t 1, t 2,…,t k. If a is a read then it should be checked with the most recent write to v (according to R).
75 DJIT (25) Access History Applying the above observations (concerning which accesses and which time frames to check), it is easy to see that the complexity of checking whether a given access races with previous accesses is small. For each variable v, the access history stores, per thread, the last time frame in which that thread read from v, and also the last time frame in which any thread wrote to v. The IDs of the accessing threads are saved as well.
76 DJIT (26) Access History On each first read and first write to v in a time frame, the thread updates the access history of v. If the access to variable v is a read, the thread checks the recent write to v. If the access is a write, the thread checks all reads from v by other threads and the recent write to v. Per-variable history: tf_1/id_1, tf_2/id_2, …, tf_n/id_n – the time frames of the recent reads from v, one for each thread; tf_k/id_k – the time frame of the recent write to v.
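Putting the pieces together, a condensed, illustrative sketch of a DJIT-style checker (class and method names are ours; this is a toy, not the paper's implementation): per-thread vectors, per-lock vectors, and a per-variable access history checked with predicate P.

```python
class Detector:
    def __init__(self, nthreads):
        self.n = nthreads
        self.st = [[1] * nthreads for _ in range(nthreads)]  # per-thread VTF
        self.locks = {}   # lock name -> vector carried by the lock
        self.hist = {}    # var -> {'reads': {tid: tf}, 'write': (tf, tid)}

    def _h(self, var):
        return self.hist.setdefault(var, {"reads": {}, "write": None})

    def release(self, tid, lock):
        st = self.st[tid]
        st[tid] += 1                          # new local time frame
        vec = self.locks.setdefault(lock, [0] * self.n)
        self.locks[lock] = [max(a, b) for a, b in zip(vec, st)]

    def acquire(self, tid, lock):
        vec = self.locks.get(lock, [0] * self.n)
        self.st[tid] = [max(a, b) for a, b in zip(self.st[tid], vec)]

    def _races(self, tid, prev_tf, prev_tid):
        # predicate P; same-thread accesses are ordered by program order
        return prev_tid != tid and prev_tf >= self.st[tid][prev_tid]

    def read(self, tid, var):                 # returns True iff a race is found
        h = self._h(var)
        race = h["write"] is not None and self._races(tid, *h["write"])
        h["reads"][tid] = self.st[tid][tid]
        return race

    def write(self, tid, var):                # checks the write and all reads
        h = self._h(var)
        race = h["write"] is not None and self._races(tid, *h["write"])
        race = race or any(self._races(tid, tf, t) for t, tf in h["reads"].items())
        h["write"] = (self.st[tid][tid], tid)
        return race

d = Detector(2)
assert not d.write(0, "X")   # t0: write X (first access, no race)
d.release(0, "m")            # t0: unlock(m)
d.acquire(1, "m")            # t1: lock(m)
assert not d.write(1, "X")   # ordered through m -> no race
print(d.read(0, "X"))        # True: t0 reads X with no sync after t1's write
```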
77 DJIT (27) Coherency Actually, the presented algorithm uses only coherency guarantees. Coherency means that for each variable v there is a global order, R_v, on all operations performed on it. Hence, the algorithm described above is correct also for coherent systems, which are not necessarily sequentially consistent. In fact, the algorithm may also be applied to systems with even more relaxed consistency (a.k.a. weakly ordered systems). Example: Thread 1: write v1 ← 1; write v2 ← 2. Thread 2: read v2 → 2; read v1 → 0. The history is coherent, but not sequentially consistent.
78 DJIT (28) “First” Apparent Data-Race Note that if a and b race with each other, then a might also race with accesses that occurred in t_b previous to b (as shown in the example of Claim 4). It is impossible to find these data-races before a occurs. By the definitions, although the corresponding accesses in t_b precede b, their races with a occur simultaneously with the race of b and a, and thus are not considered “earlier”. The definitions can be refined, defining the first apparent data-race to be the first access in t_b with which a apparently races. This would clearly require a bigger access history.
79
79 DJIT(29) Why Only “First Data-Race”? Where in the proofs did we use the fact that there were no prior data-races? Consider the following example:
Thread 1: write X [1]
Thread 2: write X [2]; release(m); write X [3]
Since the access history of each variable keeps only the most recent write, the data-race [1]-[3] is not detected (though the accesses are concurrent). This is due to the prior race [1]-[2] and the fact that [2] and [3] are in the same thread. Hint: in order to detect more than only the first data-race on each variable, the write history should contain the last time frames of all threads (and not only the most recent write).
80
80 DJIT(30) More Than One Data-Race Actually, DJIT can be extended to detect more than one data-race in a program. Still, there are good reasons for not doing so: Recall that later data-races can be after-effects of the first one (the program “goes crazy” after the first race). Only the first data-race is guaranteed to be feasible (though it is not necessarily a crucial bug). Later races can be merely apparent and hence irrelevant:
Thread 1: X=1; [1] F=true;
Thread 2: while( !F ); X=2; [2]
There is only one feasible data-race – on F (it is false at start). Thus, if we report all possible races, false alarms are inevitable.
81
81 DJIT (31) Results The DJIT algorithm was implemented in several academic systems – Millipede and Multipage. +Currently DJIT detects only the very first apparent data-race. After the race (or its cause) is fixed (or marked to be ignored), the search for further races can proceed. The extended version of DJIT can detect all races that appear during the execution. –Very sensitive to differences in threads’ interleaving; it is therefore recommended to apply the algorithm on every execution of the program (and not only in debug mode). –Still requires an enormous number of runs to gain confidence that the tested program is race-free, yet cannot prove it.
82
82 Lockset (1) Locking Discipline A locking discipline is a programming policy that ensures the absence of data-races. A simple, yet common, locking discipline is to require that every shared variable be protected by a mutual-exclusion lock. The Lockset algorithm detects violations of a locking discipline. Its main drawback is a possibly excessive number of false alarms.
83
83 Lockset (2) What is the Difference? First program:
Thread 1: Y = Y + 1; [1] Lock( m ); V = V + 1; Unlock( m );
Thread 2: Lock( m ); V = V + 1; Unlock( m ); Y = Y + 1; [2]
Here [1] hb→ [2] in the observed execution, yet there is a feasible data-race on Y under a different scheduling. Second program:
Thread 1: Y = Y + 1; [1] Lock( m ); Flag = true; Unlock( m );
Thread 2: Lock( m ); T = Flag; Unlock( m ); if ( T == true ) Y = Y + 1; [2]
Here no locking discipline protects Y, yet [1] and [2] are ordered under all possible schedulings.
84
84 Lockset (3) The Basic Algorithm For each shared variable v, let C(v) be the set of locks that have protected v during the computation so far. Let locks_held(t) at any moment be the set of locks held by thread t at that moment. The Lockset algorithm:
- for each v, initialize C(v) to the set of all possible locks
- on each access to v by thread t:
  - C(v) ← C(v) ∩ locks_held(t)
  - if C(v) = ∅, issue a warning
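The algorithm above can be sketched in a few lines of Python. This is a minimal sketch, not Eraser's code; the names (`candidate`, `locks_held`, `access`) are mine, and initializing C(v) to "the set of all possible locks" is simulated lazily: on the first access, intersecting the universal set with the held locks simply yields the held locks.

```python
candidate = {}   # C(v): variable name -> candidate lock set
locks_held = {}  # thread ID -> set of locks the thread currently holds

def access(var, tid):
    """Record an access to `var` by thread `tid`; return True on a warning."""
    held = locks_held.get(tid, set())
    if var not in candidate:
        # First access: C(v) starts as "all locks", so the intersection
        # with the held locks is just the held locks.
        candidate[var] = set(held)
    else:
        candidate[var] &= held  # lockset refinement
    if not candidate[var]:
        return True  # C(v) = ∅: possible data-race on var
    return False
```

For example, if thread 1 accesses v holding {m1} and thread 2 then accesses v holding {m2}, C(v) shrinks to ∅ and a warning is issued, exactly as in the slide's trace.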
85
85 Lockset (4) Explanation Clearly, a lock m is in C(v) if, in the execution up to that point, every thread that accessed v was holding m at the moment of access. This process, called lockset refinement, ensures that any lock that consistently protects v is contained in C(v). If some lock m consistently protects v, it will remain in C(v) until the program terminates.
86
86 Lockset (5) Example The locking discipline for v is violated since no lock protects it consistently.
Program        locks_held   C(v)
               { }          {m1, m2}
Lock( m1 );    {m1}
v = v + 1;                  {m1}
Unlock( m1 );  { }
Lock( m2 );    {m2}
v = v + 1;                  { }  → warning
Unlock( m2 );  { }
87
87 Lockset (6) Improving the Locking Discipline The locking discipline described above is too strict. Three very common programming practices violate it, yet are free of data-races: Initialization: shared variables are usually initialized without holding any locks. Read-Shared Data: some shared variables are written only during initialization and are read-only thereafter. Read-Write Locks: read-write locks allow multiple readers to access a shared variable, but only a single writer.
88
88 Lockset (7) Initialization When initializing newly allocated data there is no need to lock it, since no other thread can hold a reference to it yet. Unfortunately, there is no easy way of knowing when initialization is complete. Therefore, a shared variable is considered initialized when it is first accessed by a second thread. As long as a variable is accessed by a single thread, reads and writes do not update C(v).
89
89 Lockset (8) Read-Shared Data There is no need to protect a variable if it’s read-only. To support unlocked read-sharing, races are reported only after an initialized variable has become write-shared by more than one thread.
90
90 Lockset (9) Initialization and Read-Sharing Newly allocated variables begin in the Virgin state. As various threads read and write the variable, its state changes according to the transitions below. Races are reported only for variables in the Shared-Modified state. The algorithm becomes more dependent on the scheduler. (State diagram: Virgin → Exclusive on the first thread’s write; Exclusive → Shared on a read by a new thread; Exclusive → Shared-Modified on a write by a new thread; Shared → Shared-Modified on a write by any thread; reads by any thread leave Shared unchanged, and reads/writes by the first thread leave Exclusive unchanged.)
91
91 Lockset (10) Initialization and Read-Sharing The states are: Virgin – indicates that the data is new and has not yet been accessed by any thread. Exclusive – entered on the first access (by a single thread). Subsequent accesses by that thread do not update C(v) (this handles initialization). Shared – entered on a read access by a new thread. C(v) is updated, but data-races are not reported. In this way, multiple threads can read the variable without causing a race to be reported (this handles read-sharing). Shared-Modified – entered when more than one thread accesses the variable and at least one access is a write. C(v) is updated and races are reported, as in the original algorithm.
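The state transitions described above can be captured in a small class. This is my own illustrative sketch of the state machine, not Eraser's code; the names are invented, and the lockset refinement and race reporting performed alongside the transitions are deliberately elided here.

```python
# States of a shared variable in the Eraser-style state machine.
VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED = range(4)

class VarState:
    def __init__(self):
        self.state = VIRGIN
        self.owner = None  # the first thread to access the variable

    def on_access(self, tid, is_write):
        """Advance the state machine on an access; return the new state."""
        if self.state == VIRGIN:
            # First access: the variable becomes exclusive to this thread.
            self.state = EXCLUSIVE
            self.owner = tid
        elif self.state == EXCLUSIVE and tid != self.owner:
            # A second thread joins: a read gives Shared,
            # a write gives Shared-Modified.
            self.state = SHARED_MODIFIED if is_write else SHARED
        elif self.state == SHARED and is_write:
            # A write to a read-shared variable.
            self.state = SHARED_MODIFIED
        # Races would be reported only in SHARED_MODIFIED (checking elided).
        return self.state
```

Accesses by the first thread in Exclusive, and reads in Shared, leave the state unchanged, matching the self-loops in the slide's diagram.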
92
92 Lockset (11) Read-Write Locks Many programs use Single Writer/Multiple Readers (SWMR) locks as well as simple locks. The basic algorithm does not correctly support this style of synchronization. Definition: for a variable v, a lock m protects v if m is held in write mode for every write of v, and in some mode (read or write) for every read of v.
93
93 Lockset (12) Read-Write Locks – Final Refinement When the variable enters the Shared- Modified state, the checking is different: Let locks_held(t) be the set of locks held in any mode by thread t. Let write_locks_held(t) be the set of locks held in write mode by thread t.
94
94 Lockset (13) Read-Write Locks – Final Refinement The refined algorithm (for Shared-Modified):
- for each v, initialize C(v) to the set of all locks
- on each read of v by thread t:
  - C(v) ← C(v) ∩ locks_held(t)
  - if C(v) = ∅, issue a warning
- on each write of v by thread t:
  - C(v) ← C(v) ∩ write_locks_held(t)
  - if C(v) = ∅, issue a warning
Since locks held purely in read mode do not protect against data-races between the writer and other readers, they are not considered when a write occurs and are thus removed from C(v).
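The refined check can be sketched as a single function. This is a minimal sketch under the slide's definitions; the function name `refine` and the set-based representation are my own choices.

```python
def refine(cand, locks_held, write_locks_held, is_write):
    """Refine C(v) per the read-write-lock rule; report if it empties.

    cand: current C(v).
    locks_held: locks held by the accessing thread in any mode.
    write_locks_held: locks held by the thread in write mode only.
    Returns (new C(v), warning_issued).
    """
    # Writes are protected only by locks held in write mode;
    # reads may be protected by a lock held in either mode.
    new_cand = cand & (write_locks_held if is_write else locks_held)
    return new_cand, len(new_cand) == 0
```

For example, a reader holding m in read mode keeps m in C(v), but a subsequent writer holding m only in read mode empties C(v) and triggers a warning, which is exactly the case the refinement is meant to catch.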
95
95 Lockset (14) Still False Alarms The refined algorithm will still produce a false alarm in the following simple case. Thread 1 executes Lock( m1 ); v = v + 1; Unlock( m1 ); and later Lock( m2 ); v = v + 1; Unlock( m2 );. Thread 2 executes Lock( m1 ); Lock( m2 ); v = v + 1; Unlock( m2 ); Unlock( m1 );. With Thread 2’s access occurring first, C(v) shrinks from {m1, m2} to {m1} and finally to { }, so a warning is issued. Yet there is no data-race: every pair of accesses from different threads shares a common lock, and the only pair of accesses that shares no lock consists of two accesses by the same thread, which cannot race.
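Under one possible reading of the example above (Thread 2's doubly-locked access first, then Thread 1's two singly-locked accesses), the false alarm can be reproduced with a tiny simulation. This is my own trace of the lockset intersection, not Eraser's code.

```python
def refine_trace(accesses, universe):
    """Simulate lockset refinement over a list of held-lock sets.
    Returns the successive values of C(v)."""
    c = set(universe)  # C(v) starts as the set of all locks
    history = []
    for held in accesses:
        c &= held  # lockset refinement on each access
        history.append(frozenset(c))
    return history

trace = refine_trace(
    [{"m1", "m2"},  # Thread 2: v = v + 1 holding both m1 and m2
     {"m1"},        # Thread 1: v = v + 1 holding m1
     {"m2"}],       # Thread 1: v = v + 1 holding m2
    {"m1", "m2"})
# C(v) shrinks {m1, m2} -> {m1} -> { }, so a warning fires even though
# every pair of accesses from *different* threads shares a lock.
```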
96
96 Lockset (15) Additional False Alarms Additional sources of false alarms are: A queue that implicitly protects its elements by accessing the queue only through locked head and tail fields. A main thread that passes arguments to a worker thread: since the main thread and the worker thread never access the arguments concurrently, they do not use any locks to serialize their accesses. Privately implemented SWMR locks, which do not communicate with Lockset. True data-races that do not affect the correctness of the program (for example, “benign” races such as double-checked initialization):
if (f == 0) {
  lock(m);
  if (f == 0)
    f = 1;
  unlock(m);
}
97
97 Lockset (16) Results Lockset was implemented in a full-scale testing tool called Eraser, which is used in industry (not “on paper only”). +Eraser was found to be quite insensitive to differences in threads’ interleaving (when applied to programs that are “deterministic enough”). –Since a superset of the apparent data-races is located, false alarms are inevitable. –Still requires an enormous number of runs to gain confidence that the tested program is race-free, yet cannot prove it. –The measured slowdowns are by a factor of 10 to 30.
98
98 Dynamic Data-Race Detection Summary There is no single best solution. DJIT reports one apparent data-race – the very first in the execution. Lockset reports a set of apparent data-races, some or even all of which may be false alarms. Maybe combine both techniques? Maybe combine them with other known techniques? Maybe combine them with static analysis? Maybe better approximations can be found...?
99
99 Dynamic Data-Race Detection Summary – Cont. The solutions are not universal. The data-races that are found are apparent and not necessarily feasible. A large number of runs is still required to cover as many execution paths as possible. Since slowdowns can be large, satisfactory testing can take months. Different (or new) types of synchronization require different detection techniques. Inserting detection code into a program can perturb the threads’ interleaving so that races disappear (Lockset is less sensitive to this).
100
100 References S. Adve, M. Hill and R. Netzer. Detecting Data Races on Weak Memory Systems. In Proceedings of the 18th Annual Symposium on Computer Architecture, pp. 234-243, May 1991. A. Itzkovitz, A. Schuster, and O. Zeev-Ben-Mordechai. Towards Integration of Data Race Detection in DSM Systems. In The Journal of Parallel and Distributed Computing (JPDC), 59(2): pp. 180-203, Nov. 1999. L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. In Communications of the ACM, 21(7): pp. 558-565, Jul. 1978. F. Mattern. Virtual Time and Global States of Distributed Systems. In Parallel & Distributed Algorithms, pp. 215-226, 1989.
101
101 References Cont. R. H. B. Netzer and B. P. Miller. What Are Race Conditions? Some Issues and Formalizations. In ACM Letters on Programming Languages and Systems, 1(1): pp. 74-88, Mar. 1992. R. H. B. Netzer and B. P. Miller. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. In 1990 International Conference on Parallel Processing, 2: pp. 93-97, Aug. 1990. R. H. B. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. In Advances in Languages and Compilers for Parallel Processing, MIT Press, 1991, pp. 109-129.
102
102 References Cont. S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. E. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. In ACM Transactions on Computer Systems, 15(4): pp. 391-411, 1997. O. Zeev-Ben-Mordehai. Efficient Integration of On-The-Fly Data Race Detection in Distributed Shared Memory and Symmetric Multiprocessor Environments. Research Thesis, May 2001.
103
103 The End