1
Analysis of Multithreaded Programs Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology
2
What is a multithreaded program? Multiple parallel threads of control; shared mutable memory (read, write); lock acquire and release. NOT general parallel programs: no message passing, no tuple spaces, no functional programs, no concurrent constraint programs. NOT just multiple threads of control: no continuations, no reactive systems
3
Why do programmers use threads? Performance (parallel computing programs) Single computation Execute subcomputations in parallel Example: parallel sort Program structuring mechanism (activity management programs) Multiple activities Thread for each activity Example: web server Properties have big impact on analyses
4
Practical Implications Threads are useful and increasingly common POSIX threads standard for C, C++ Java has built-in thread support Widely used in industry Threads introduce complications Programs viewed as more difficult to develop Analyses must handle new model of execution Lots of interesting and important problems!
5
Outline Examples of multithreaded programs Parallel computing program Activity management program Analyses for multithreaded programs Handling data races Future directions
6
Parallel Sort
7
Example - Divide and Conquer Sort 47615382
8
82536147 47615382 Divide
9
28531674 82536147 47615382 Example - Divide and Conquer Sort Conquer Divide
10
Example - Divide and Conquer Sort 28531674 Conquer 82536147 Divide 47615382 41673258 Combine
11
Example - Divide and Conquer Sort 28531674 Conquer 82536147 Divide 47615382 41673258 Combine 21346578
12
Divide and Conquer Algorithms Lots of Recursively Generated Concurrency Solve Subproblems in Parallel
13
Divide and Conquer Algorithms Lots of Recursively Generated Concurrency Recursively Solve Subproblems in Parallel
14
Divide and Conquer Algorithms Lots of Recursively Generated Concurrency Recursively Solve Subproblems in Parallel Combine Results in Parallel
15
“Sort n Items in d, Using t as Temporary Storage” void sort(int *d, int *t, int n) { if (n > CUTOFF) { spawn sort(d,t,n/4); spawn sort(d+n/4,t+n/4,n/4); spawn sort(d+2*(n/4),t+2*(n/4),n/4); spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4)); sync; spawn merge(d,d+n/4,d+n/2,t); spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2); sync; merge(t,t+n/2,t+n,d); } else insertionSort(d,d+n); }
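The Cilk-style spawn/sync pattern above can be sketched in plain Java, with a child thread standing in for spawn and join for sync. This is an illustrative two-way split rather than the slide's four-way split, and the class and method names are my own:

```java
// Sketch of the slide's divide-and-conquer sort using Java threads.
// Two-way split for brevity; the slide splits the array four ways.
class ParSort {
    static final int CUTOFF = 8;

    static void sort(int[] a) {
        int[] t = new int[a.length];
        sort(a, 0, a.length, t);
    }

    // Sort a[lo..hi) using t as temporary storage.
    static void sort(int[] a, int lo, int hi, int[] t) {
        int n = hi - lo;
        if (n <= CUTOFF) { insertionSort(a, lo, hi); return; }
        int mid = lo + n / 2;
        Thread left = new Thread(() -> sort(a, lo, mid, t));
        left.start();                     // "spawn" the left half
        sort(a, mid, hi, t);              // sort the right half in this thread
        try { left.join(); }              // "sync"
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        merge(a, lo, mid, hi, t);         // combine sorted halves via t
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int v = a[i], j = i - 1;
            while (j >= lo && a[j] > v) { a[j + 1] = a[j]; j--; }
            a[j + 1] = v;
        }
    }

    static void merge(int[] a, int lo, int mid, int hi, int[] t) {
        int i = lo, j = mid, k = lo;
        while (i < mid || j < hi)
            t[k++] = (j >= hi || (i < mid && a[i] <= a[j])) ? a[i++] : a[j++];
        System.arraycopy(t, lo, a, lo, hi - lo);
    }
}
```

The two recursive calls touch disjoint halves of both arrays, so they can run unsynchronized in parallel, which is exactly the deterministic-parallelism property the later slides rely on.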
16
“Sort n Items in d, Using t as Temporary Storage” (same sort code as slide 15) Divide array into subarrays and recursively sort subarrays in parallel
17
“Sort n Items in d, Using t as Temporary Storage” (same sort code as slide 15) Subproblems identified using pointers into the middle of the array: 4 7 6 1 5 3 8 2, with pointers d, d+n/4, d+n/2, d+3*(n/4)
18
“Sort n Items in d, Using t as Temporary Storage” (same sort code as slide 15) Sorted results written back into the input array: 7 4 1 6 5 3 2 8, with pointers d, d+n/4, d+n/2, d+3*(n/4)
19
“Merge Sorted Quarters of d Into Halves of t” (same sort code as slide 15) d: 7 4 1 6 5 3 2 8; t: 4 1 6 7 3 2 5 8, with pointers t, t+n/2
20
“Merge Sorted Halves of t Back Into d” (same sort code as slide 15) d: 2 1 3 4 6 5 7 8; t: 4 1 6 7 3 2 5 8, with pointers t, t+n/2
21
“Use a Simple Sort for Small Problem Sizes” (same sort code as slide 15) 4 7 6 1 5 3 8 2, from d to d+n
22
“Use a Simple Sort for Small Problem Sizes” (same sort code as slide 15) 4 7 1 6 5 3 8 2, from d to d+n
23
Key Properties of Parallel Computing Programs Structured form of multithreading Parallelism confined to small region Single thread coming in Multiple threads exist during computation Single thread going out Deterministic computation Tasks update disjoint parts of data structure in parallel without synchronization May also have parallel reductions
24
Web Server
25
Accept new connection Start new client thread Main Loop Client Threads
27
Accept new connection Start new client thread Main Loop Client Threads Wait for input Produce output
29
Accept new connection Start new client thread Main Loop Client Threads Wait for input Produce output Wait for input
31
Accept new connection Start new client thread Main Loop Client Threads Wait for input Produce output Wait for input Produce output
32
Accept new connection Start new client thread Main Loop Wait for input Produce output Wait for input Produce output Wait for input Produce output Client Threads
33
Main Loop class Main { static public void loop(ServerSocket s) { Counter c = new Counter(); while (true) { Socket p = s.accept(); Worker t = new Worker(p,c); t.start(); } } } Accept new connection Start new client thread
34
Worker threads class Worker extends Thread { Socket s; Counter c; public void run() { try { DataOutputStream out = new DataOutputStream(s.getOutputStream()); BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream())); String inputLine; while ((inputLine = in.readLine()) != null) { c.increment(); out.writeBytes(inputLine + "\n"); } } catch (IOException e) {} } } Wait for input Increment counter Produce output
35
Synchronized Shared Counter class Counter { int contents = 0; synchronized void increment() { contents++; } } Acquire lock Increment counter Release lock
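The counter on this slide can be completed into a self-contained demonstration. The harness below (thread count and increments per thread) is my own addition; it shows that `synchronized` makes the final count deterministic:

```java
// The slide's counter, completed so it compiles, plus a small demo harness.
class Counter {
    private int contents = 0;
    synchronized void increment() { contents++; }   // acquire lock, increment, release lock
    synchronized int get() { return contents; }
}

class CounterDemo {
    // Run several threads, each incrementing the shared counter; return the final count.
    static int run(int threads, int perThread) {
        Counter c = new Counter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < perThread; j++) c.increment(); });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return c.get();
    }
}
```

Without the `synchronized` keyword the increments could interleave between the read and the write of `contents`, and the final count would be timing-dependent.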
36
Simple Activity Management Programs Fixed, small number of threads Based on functional decomposition User Interface Thread Device Management Thread Compute Thread
37
Key Properties of Activity Management Programs Threads manage interactions One thread per client or activity Blocking I/O for interactions Unstructured form of parallelism Object is unit of sharing Mutable shared objects (mutual exclusion) Private objects (no synchronization) Read shared objects (no synchronization) Inherited objects passed from parent to child
38
Common Properties Dynamic thread creation Many threads execute same code Threads larger than procedures Data accessed via pointers or references Concept of data ownership Passed from parent thread to child thread Acquired with lock operations Private data that never escapes creator
39
Why analyze multithreaded programs? Discover or certify absence of errors (multithreading introduces new kinds of errors) Discover or verify application-specific properties (interactions between threads complicate analysis) Enable optimizations (new kinds of optimizations with multithreading) (complications with traditional optimizations)
40
Classic Errors in Multithreaded Programs Deadlocks Data Races
41
Deadlock Thread 1: lock(l); lock(m); x = x + y; unlock(m); unlock(l); Thread 2: lock(m); lock(l); y = y * x; unlock(l); unlock(m); Deadlock if circular waiting for resources (typically mutual exclusion locks)
42
Deadlock: Threads 1 and 2 start execution (same code as slide 41)
43
Deadlock: Thread 1 acquires lock l (same code as slide 41)
44
Deadlock: Thread 2 acquires lock m (same code as slide 41)
45
Deadlock: Thread 1 holds l and waits for m, while Thread 2 holds m and waits for l (same code as slide 41)
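The standard fix for this deadlock (not shown on the slides) is a global lock order: if both threads acquire l before m, circular waiting cannot arise. A minimal Java sketch, with synchronized blocks standing in for lock/unlock and illustrative class names:

```java
// Both threads acquire the locks in the same order (l before m),
// so the circular wait from the slide is impossible.
class Account {
    static final Object l = new Object(), m = new Object();
    static int x = 1, y = 2;

    static void thread1() { synchronized (l) { synchronized (m) { x = x + y; } } }
    static void thread2() { synchronized (l) { synchronized (m) { y = y * x; } } }

    static void runBoth() {
        Thread t1 = new Thread(Account::thread1);
        Thread t2 = new Thread(Account::thread2);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The result still depends on which thread runs first (y ends up 6 or 2), but the program always terminates; the deadlock is gone, at the cost of losing the fine-grained locking the original code attempted.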
46
Data Races A[i] = v || A[j] = w: a data race occurs if two parallel threads access the same memory location and at least one access is a write. If i == j, the two statements write the same element: data race. If i != j, they touch different elements: no data race
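The definition on this slide can be written down directly as a predicate over a pair of accesses from parallel threads. The `Access` class and names below are illustrative, not from the slides:

```java
// An access to a memory location by one of two parallel threads.
class Access {
    final String location;
    final boolean isWrite;
    Access(String location, boolean isWrite) { this.location = location; this.isWrite = isWrite; }
}

class RaceCheck {
    // Two accesses by parallel threads form a data race iff they touch the
    // same memory location and at least one of them is a write.
    static boolean races(Access a, Access b) {
        return a.location.equals(b.location) && (a.isWrite || b.isWrite);
    }
}
```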
47
Synchronization and Data Races Thread 1: lock(l); x = x + 1; unlock(l); Thread 2: lock(l); x = x + 2; unlock(l); No data race if synchronization separates accesses Synchronization protocol: Associate lock with data Acquire lock to update data atomically
48
Why are data races errors? There exist correct programs that contain races, but most races are programming errors: code intended to execute atomically, synchronization omitted by mistake. Consequences can be severe: nondeterministic, timing-dependent errors; data structure corruption. Races also complicate analysis and optimization
49
New Optimization Opportunities from Multithreading Lock Elimination Lock Coarsening Barrier Elimination Data Layout Communication Optimizations
50
Lock Elimination for Private Data If i is accessible to only one thread, Integer i; lock(i); i.value++; unlock(i); becomes Integer i; i.value++; Blanchet – OOPSLA 99; Bogda, Hoelzle – OOPSLA 99; Choi, Gupta, Serrano, Sreedhar, Midkiff – OOPSLA 99; Whaley, Rinard – OOPSLA 99; Ruf – PLDI 2000. Lock Elimination for Nested Data Diniz, Rinard – JPDC 1998; Aldrich, Chambers, Sirer, Eggers – SAS 1999
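The transformation is behavior-preserving because i never escapes the allocating thread, so no other thread can contend for the lock. A sketch with my own `Box` class and method names, showing the before and after versions compute the same result:

```java
// A mutable object standing in for the slide's Integer with a value field.
class Box { int value; }

class LockElim {
    static int withLock() {
        Box i = new Box();                  // i never escapes this method
        synchronized (i) { i.value++; }     // lock(i); i.value++; unlock(i)
        return i.value;
    }

    static int withoutLock() {
        Box i = new Box();
        i.value++;                          // same effect, no synchronization
        return i.value;
    }
}
```

An escape analysis proves the precondition (i is reachable from only one thread); the optimizer then deletes the lock/unlock pair.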
51
Barrier Elimination Tseng – PPoPP 1995 Analysis Problem No interthread dependences across barrier
52
Lock Coarsening Integer i; lock(i); i.value++; unlock(i); … lock(i); i.value++; unlock(i); becomes lock(i); i.value++; … i.value++; unlock(i); Key challenge: managing the trade-off between serialization and synchronization overhead. Plevyak, Chien – POPL 1995; Diniz, Rinard – POPL 1997, PLDI 1997
53
Overview of Analyses for Multithreaded Programs Key problem: interactions between threads Flow-insensitive analyses Escape analyses Dataflow analyses Explicit parallel flow graphs Interference summary analysis State space exploration
54
Escape Analyses
55
void compute(d,e) ———— void multiplyAdd(a,b,c) ————————— void multiply(m) ———— void add(u,v) —————— void main(i,j) ——————— void evaluate(i,j) —————— void abs(r) ———— void scale(n,m) —————— Program With Allocation Sites
56
void compute(d,e) ———— void multiplyAdd(a,b,c) ————————— void multiply(m) ———— void add(u,v) —————— void main(i,j) ——————— void evaluate(i,j) —————— void abs(r) ———— void scale(n,m) —————— Program With Allocation Sites Correlate lifetimes of objects with lifetimes of computations
57
void compute(d,e) ———— void multiplyAdd(a,b,c) ————————— void multiply(m) ———— void add(u,v) —————— void main(i,j) ——————— void evaluate(i,j) —————— void abs(r) ———— void scale(n,m) —————— Program With Allocation Sites Correlate lifetimes of objects with lifetimes of computations Objects allocated at this site Do not escape computation of this method
58
Classical Approach Reachability analysis If an object is reachable only from local variables of current procedure, then object does not escape that procedure
59
Escape Analysis for Multithreaded Programs Extend analysis to recognize when objects do not escape to parallel thread – OOPSLA 1999 Blanchet Bogda, Hoelzle Choi, Gupta, Serrano, Sreedhar, Midkiff Whaley, Rinard Analyze interactions to recapture objects that do not escape multithreaded subcomputation Salcianu, Rinard – PPoPP 2001
60
Applications Synchronization elimination Stack allocation Region-based allocation Data race detection Eliminate accesses to captured objects as source of data races
61
Analysis via Parallel Flow Graphs
62
Parallel Flow Graphs Thread 1: p = &x; *p = &y; p = &z. Thread 2: q = &a; *q = &b. Intrathread control-flow edges and interthread control-flow edges connect the statements. Heap: x→y, p→z, q→a, a→b. Basic Idea: do dataflow analysis on the parallel flow graph
63
Infeasible Paths Issue Thread 1: p = &x; *p = &y; p = &z. Thread 2: q = &a; *q = &b. Infeasible paths cause the analysis to lose precision: because of an infeasible path, the analysis infers spurious points-to edges that no real execution can create
66
Analysis Time Issue Potential Solutions Partial Order Approaches – remove edges between statements in independent regions How to recognize independent regions? Seems like it might need analysis… Thread 1: p = &x; *p = &y; p = &z. Thread 2: q = &a; *q = &b
67
Potential Solutions Partial Order Approaches Control flow/synchronization analysis Synchronization may prevent m from immediately preceding n in execution If so, no edge from m to n No edges between these statements y = 1 lock(a) y = y + w x = x + 1 unlock(a) x = 1 lock(a) x = x + v y = y + 1 unlock(a) Analysis Time Issue
68
Experience Lots of research in field over last two decades Deadlock detection Data race detection Control analysis for multithreaded programs (mutual exclusion, precedence properties) Finite-state properties Scope – simple activity management programs Inlinable programs Bounded threads and objects
69
References FLAVERS: Dwyer, Clarke – FSE 1994; Naumovich, Avrunin, Clarke – FSE 1999; Naumovich, Clarke, Cobleigh – PASTE 1999. Masticola, Ryder – ICPP 1990 (deadlock detection), PPoPP 1993 (control-flow analysis). Duesterwald, Soffa – TAV 1991 (handles procedures). Blieberger, Burgstaller, Scholz – Ada-Europe 2000 (symbolic analysis for dynamic thread creation). Scope: inlinable programs, bounded objects and threads
70
Interference Approaches
71
Dataflow Analysis for Bitvector Problems Knoop, Steffen, Vollmer – TOPLAS 1996 Bitvector problems Dataflow information is a vector of bits Transfer function for one bit does not depend on values of other bits Examples Reaching definitions Available expressions As efficient and precise as sequential version!
72
Available Expressions Example a = x + y c = x + y x = b b = x + y d = x + y Where is x+y available? Available here! parbegin parend Available here! Not available here (killed by x = b) ???
73
Three Interleavings a = x + y c = x + y x = b b = x + y d = x + y Available here! a = x + y c = x + y x = b b = x + y d = x + y a = x + y c = x + y x = b b = x + y d = x + y Not available here (killed by x = b) Available here!
74
Available Expressions Example a = x + y c = x + y x = b b = x + y d = x + y Where is x+y available? Available here! Not available here (killed by x = b) Not available here (killed by x = b) parbegin parend Available here!
75
Key Concept: Interference x=b interferes with x+y x+y not available at any statement that executes in parallel with x=b Nice algorithm: Precompute interference Propagate information along sequential control-flow edges only! Handle parallel joins specially a = x + y c = x + y x = b b = x + y d = x + y parbegin parend
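The interference rule can be made concrete for this one example. The toy code below tracks only the single expression x+y over straight-line threads, so it is a sketch of the interference idea, not the full Knoop-Steffen-Vollmer algorithm; the class and method names are my own:

```java
import java.util.List;

// Statements are modeled as "var = rhs" strings. x+y is killed by any
// assignment to x or y; a kill in one thread makes x+y unavailable
// throughout the parallel thread (the interference rule).
class AvailExpr {
    // Does this statement kill x+y (i.e., assign to x or to y)?
    static boolean kills(String stmt) {
        String lhs = stmt.split("=")[0].trim();
        return lhs.equals("x") || lhs.equals("y");
    }

    // Does any statement of the given thread kill x+y?
    static boolean interferes(List<String> thread) {
        return thread.stream().anyMatch(AvailExpr::kills);
    }

    // Is x+y available at the end of 'thread', which runs in parallel with
    // 'other', assuming x+y was available on entry to the parallel region?
    static boolean availableAtEnd(List<String> thread, List<String> other) {
        if (interferes(other)) return false;     // a parallel kill interferes
        boolean avail = true;
        for (String s : thread) {
            if (kills(s)) avail = false;                                  // killed locally
            if (s.split("=")[1].trim().equals("x + y")) avail = true;     // recomputed
        }
        return avail;
    }
}
```

On the slide's example, x+y is unavailable after thread one (the parallel x = b interferes) but available after thread two (it recomputes x+y in b = x + y), matching the result on the previous slide.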
76
Limitations No procedures Bitvector problems only (no pointer analysis) But can remove these limitations Integrate interference into abstraction Adjust rules to flow information from end of thread to start of parallel threads Iteratively compute interactions Summary-based approach for procedures Lose precision for non-bitvector problems
77
Pointer Analysis for Multithreaded Programs Dataflow information is a triple <C, I, E>: C = current points-to information; I = interference points-to edges from parallel threads; E = set of points-to edges created by the current thread. Interference: I_k = ∪_{j≠k} E_j, where t_1 … t_n are the n parallel threads. Invariant: I ⊆ C. Within each thread, interference points-to edges are always added to the current information
78
Analysis for Example parbegin { Thread 1: p = &x; *p = 1 } || { Thread 2: p = &y; *p = 2 } parend
79
Analysis for Example parbegin { Thread 1: p = &x; *p = 1 } || { Thread 2: p = &y; *p = 2 } parend Where does p point to at the statement *p = 2?
80
Analysis for Example After p = &x, Thread 1's triple is <{p→x}, ∅, {p→x}>; after p = &y, Thread 2's triple is <{p→y}, ∅, {p→y}>.
Analysis of Parallel Threads Each thread's created edges E become interference edges for the parallel thread, and interference edges are always added to the current information, so within Thread 1 the current information becomes {p→x, p→y}, and symmetrically within Thread 2.
Analysis of Thread Joins At parend the parent thread combines the two threads' results.
Final Result After the join, p may point to x or to y: the final triple is <{p→x, p→y}, ∅, {p→x, p→y}>.
91
General Dataflow Equations Parent thread before parbegin: <C, I, E>. Thread 1 is analyzed with entry <C ∪ E_2, I ∪ E_2, ∅> and produces <C_1, I ∪ E_2, E_1>; Thread 2 is analyzed with entry <C ∪ E_1, I ∪ E_1, ∅> and produces <C_2, I ∪ E_1, E_2>. Parent thread after parend: <C_1 ∩ C_2, I, E ∪ E_1 ∪ E_2>
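The parbegin/parend equations on this slide can be executed as a small fixed-point computation on the running example. The sketch below handles only top-level assignments of the form p = &x with strong updates; the set-of-strings representation and all names are my own:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MtPointsTo {
    // Analyze one thread from entry information c and interference i;
    // returns [C_out, E] (info at thread exit, edges created by the thread).
    static List<Set<String>> analyze(String[] stmts, Set<String> c, Set<String> i) {
        Set<String> cur = new HashSet<>(c), created = new HashSet<>();
        for (String s : stmts) {
            String[] parts = s.split(" = &");
            String lhs = parts[0].trim(), target = parts[1].trim();
            cur.removeIf(e -> e.startsWith(lhs + "->"));   // strong update of lhs
            cur.add(lhs + "->" + target);
            created.add(lhs + "->" + target);
            cur.addAll(i);        // interference edges are always added back in
        }
        return List.of(cur, created);
    }

    // parbegin t1 || t2 parend, starting from the empty triple <C, I, E>.
    static Set<String> parallel(String[] t1, String[] t2) {
        Set<String> e1 = new HashSet<>(), e2 = new HashSet<>();
        for (int pass = 0; pass < 2; pass++) {             // iterate to a fixed point
            e1 = analyze(t1, e2, e2).get(1);               // entry <C ∪ E2, I ∪ E2, ∅>
            e2 = analyze(t2, e1, e1).get(1);               // entry <C ∪ E1, I ∪ E1, ∅>
        }
        Set<String> c1 = analyze(t1, e2, e2).get(0);
        Set<String> c2 = analyze(t2, e1, e1).get(0);
        Set<String> joined = new HashSet<>(c1);
        joined.retainAll(c2);                              // C1 ∩ C2
        joined.addAll(e1);                                 // ∪ E1
        joined.addAll(e2);                                 // ∪ E2
        return joined;
    }
}
```

Run on the example threads p = &x and p = &y, it reproduces the final result from the walkthrough: p may point to x or to y.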
94
Compositionality Extension Compositional at thread level Analyze each thread once in isolation Abstraction captures potential interactions Compute interactions whenever need information Combine with escape analysis to obtain partial program analysis
95
Experience & Expectations Limited implementation experience Pointer analysis (Rugina, Rinard – PLDI 2000) Compositional pointer and escape analysis (Salcianu, Rinard – PPoPP 2001) Small but real programs Promising approach Scales like analyses for sequential programs Partial program analyses
96
Issues Developing abstractions Need interference abstraction Need fork/join rules Need interaction analysis Analysis time Precision for richer abstractions
97
State Space Exploration
98
State Space Exploration for Multithreaded Programs lock a, b; int x, y; /* a controls x, b controls y */ Thread 1: lock(a); lock(b); t = x; x = y; y = t; unlock(b); unlock(a); Thread 2: lock(b); lock(a); s = y; y = x; x = s; unlock(a); unlock(b);
99
State Space Exploration The search enumerates interleavings of the two threads. States in which Thread 1 has executed lock(a) and Thread 2 has executed lock(b) are deadlocked states: each thread waits for the lock the other holds
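The exploration can be made concrete with a tiny explicit-state search over just the lock operations of the two threads (the data moves do not affect deadlock). The encoding and all names below are my own:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Tiny explicit-state search over two threads' lock operations: "+a" means
// lock(a), "-a" means unlock(a). A reachable state in which neither
// unfinished thread can take a step is a deadlock.
class DeadlockSearch {
    static boolean canStep(String op, int who, int ownA, int ownB) {
        int owner = op.charAt(1) == 'a' ? ownA : ownB;
        return op.charAt(0) == '+' ? owner == 0 : owner == who;
    }

    static boolean hasDeadlock(String[] t1, String[] t2) {
        Set<Long> seen = new HashSet<>();
        ArrayDeque<int[]> work = new ArrayDeque<>();   // state = {pc1, pc2, ownerA, ownerB}
        work.push(new int[]{0, 0, 0, 0});
        while (!work.isEmpty()) {
            int[] s = work.pop();
            long key = ((long) s[0] << 24) | ((long) s[1] << 16) | (s[2] << 8) | s[3];
            if (!seen.add(key)) continue;              // already explored
            boolean stepped = false;
            for (int who = 1; who <= 2; who++) {
                String[] prog = who == 1 ? t1 : t2;
                int pc = s[who - 1];
                if (pc < prog.length && canStep(prog[pc], who, s[2], s[3])) {
                    int[] n = s.clone();
                    n[who - 1]++;
                    int lock = prog[pc].charAt(1) == 'a' ? 2 : 3;
                    n[lock] = prog[pc].charAt(0) == '+' ? who : 0;
                    work.push(n);
                    stepped = true;
                }
            }
            if (!stepped && !(s[0] == t1.length && s[1] == t2.length))
                return true;                           // deadlocked state found
        }
        return false;
    }
}
```

The search finds the deadlock in the slide's opposite-order programs and verifies its absence once both threads use the same lock order.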
100
Strengths Conceptually simple (at least at first…) Harmony with other areas of computer science (simple search often beats more sophisticated approaches) Can test for lots of properties and errors Lots of technology and momentum in this area Packaged model checkers Big successes in hardware verification
101
Challenges Analysis time Unbounded program features Dynamic thread creation Dynamic object creation Potential solutions Sophisticated abstractions (increases complexity…) Cousot, Cousot - 1984 Chow, Harrison – POPL 1992 Yahav – POPL 2001 Granularity coarsening/partial-order techniques Chow, Harrison – ICCL 1994 Valmari – CAV 1990 Godefroid, Wolper – LICS 1991
102
Granularity Coarsening Basic Idea: eliminate analysis of interleavings of independent statements. Instead of exploring every interleaving of x = 1; y = 2 with a = 3; b = 4, treat each independent group as a single atomic step
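The payoff is easy to quantify: two threads with m and n atomic steps have C(m+n, m) interleavings, so coarsening each pair of statements into one block shrinks the slide's example from six interleavings to two. A one-function sketch (the class name is my own):

```java
// Number of interleavings of two threads with m and n atomic steps:
// the binomial coefficient C(m+n, m), computed with an overflow-safe
// running product (each intermediate value is itself a binomial coefficient).
class Interleavings {
    static long count(int m, int n) {
        long r = 1;
        for (int i = 1; i <= m; i++) r = r * (n + i) / i;   // C(m+n, m)
        return r;
    }
}
```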
103
Issue: Aliasing Are the two statements x = 1 and *p = 3 independent? Depends… Potential Solution: layered analysis (Ball, Rajamani – PLDI 2001): Program + Properties → Pointer Analysis → Model Extraction → Model Checking. Potential Problem: information from later analyses may be needed or useful in earlier analyses
104
Experience Program analysis style Has been used for very detailed properties Analysis time issues limit to tiny programs Explicit model extraction/model checking style Still exploring how to work for software in general, not just multithreaded programs No special technology required for multithreaded programs (at first …)
105
Expectations In principle, approach should be quite useful Multithreaded programs typically have sparse interaction patterns Just not obvious from code Need some way to target tool to only those that can actually occur/are interesting Pointer preanalysis seems like promising approach
106
Application to safety problems Deadlock detection Variety of existing approaches Complex programs can have very simple synchronization behavior Ripe for model extraction/model checking Data race detection More complicated problem Largely unsolved Very important in practice
107
Why data races are so important Inadvertent atomicity violations Timing-dependent data structure corruption Nondeterministic, irreproducible failures Architecture effects Data races expose weak memory consistency models Destroy abstraction of single shared memory Compiler optimization effects Data races expose effect of standard optimizations Compiler can change meaning of program Analysis complications
108
Atomicity Violations class list { static int length = 0; static list head = null; list next; int value; static void insert(int i) { list n = new list(i); n.next = head; head = n; length++; } } length = 1, head → 4
109
Atomicity Violations (same list code as slide 108) length = 1, head → 4; two parallel calls begin: insert(5) || insert(6)
110
Atomicity Violations (same list code as slide 108) Both threads allocate their nodes (5 and 6) and set each node's next field to head, which still points to 4
113
Atomicity Violations (same list code as slide 108) The threads' updates of head interleave; length = 2 after the first increment
114
Atomicity Violations (same list code as slide 108) Final state: length = 3, but one inserted node has been lost because both threads read the same value of head before either wrote it
115
Atomicity Violation Solution class list { static int length = 0; static list head = null; list next; int value; synchronized static void insert(int i) { list n = new list(i); n.next = head; head = n; length++; } } With insert synchronized, the parallel calls insert(5) || insert(6) execute atomically
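The fix can be completed into a runnable demonstration. The harness below (thread and insertion counts, and the reachable-node check) is my own addition; with insert synchronized, length always equals the number of nodes actually on the list:

```java
// The slide's list, completed so it compiles, with a consistency check.
class list {
    static int length = 0;
    static list head = null;
    list next;
    int value;
    list(int i) { value = i; }

    static synchronized void insert(int i) {   // the slide's fix: atomic insert
        list n = new list(i);
        n.next = head;
        head = n;
        length++;
    }

    static int reachable() {                   // count nodes actually on the list
        int c = 0;
        for (list n = head; n != null; n = n.next) c++;
        return c;
    }
}

class InsertDemo {
    // Insert from several threads, then check length against reachable nodes.
    static boolean consistentAfter(int threads, int perThread) {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < perThread; j++) list.insert(j); });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return list.length == list.reachable();
    }
}
```

Without the synchronized keyword, the interleaving shown on the previous slides can lose a node while still incrementing length twice, so the two counts diverge nondeterministically.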
116
Analysis Complications Analysis unsound if does not take effect of data races into account Desirable to analyze program at granularity of atomic operations Reduces state space Required to extract interesting properties But must verify that operations are atomic! Complicated analysis problem Extract locking protocol Verify that program obeys protocol
117
Architecture Effects Weak Memory Consistency Models
118
Initially: y = 1, x = 0. Thread 1: y = 0; x = 1. Thread 2: z = x + y. What is the value of z?
121
Initially: y = 1, x = 0. Thread 1: y = 0; x = 1. Thread 2: z = x + y. What is the value of z? The three interleavings give z = 1, z = 0, and z = 1, so z can be 0 or 1. INCORRECT REASONING!
122
Initially: y = 1, x = 0. Thread 1: y = 0; x = 1. Thread 2: z = x + y. z can be 0 or 1 OR 2! The memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread, and different threads can observe different orders: Thread 2 may see the write x = 1 before the write y = 0, reading x = 1 and y = 1, so z = 2
123
Analysis Complications Interleaving semantics is incorrect No soundness guarantee for current analyses Formal semantics of weak memory consistency models still under development Maessen, Arvind, Shen – OOPSLA 2000 Manson, Pugh – Java Grande/ISCOPE 2001 Unclear how to prove ANY analysis sound… State space is larger than one might think Complicates state space exploration Complicates human reasoning
124
How does one write a correct program? Initially: y = 1, x = 0. Thread 1: lock(l); y = 0; x = 1; unlock(l). Thread 2: lock(l); z = x + y; unlock(l). Operations are not reordered across synchronizations. If synchronization separates conflicting actions from parallel threads, then reorderings are not visible, and race-free programs can use interleaving semantics. z is 1
125
Compiler Optimization Effects Standard optimizations assume single thread With interleaving semantics, optimizations may change meaning of program Even if only apply optimizations within serial parts of program! Superset of reordering effects Midkiff, Padua – ICPP 1990
126
Options Rethink and reimplement all compilers Lee, Padua, Midkiff – PPoPP 1999 Transform program to restore sequential memory consistency model Shasha, Snir – TOPLAS 1988 Lee, Padua – PACT 2000 No optimizations across synchronizations Java memory model (Pugh – JavaGrande 1999) Semantics no longer interleaving semantics
127
Program Analysis Analyze program, verify absence of data races Appealing option Unlikely to be feasible for full range of programs Reconstruct association between locks, data that they protect, threads that access data Dynamic object and thread creation References and pointers Diversity of locking protocols Whole-program analysis Exception: simple activity management programs
128
Eliminate races at language level Type system formalizes sharing patterns Check accesses properly synchronized Not as difficult as fully automatic approach Separate analysis of each module No need to reconstruct locking protocol Types provide locking information Limits sharing patterns program can use Key question: Is limitation worth benefit? Depends on expressiveness, flexibility, intrusiveness, perceived value of system
129
Standard Sharing Patterns for Activity Management Programs Private data - single thread ownership Mutual exclusion data lock protects data, acquire lock to get ownership Migrating data Ownership moves between threads in response to data structure insertions and removals Published data - distributed for read-only access
130
General Principle of Ownership Formalize as ownership relation Relation between data items and threads Basic requirement for reads When a thread reads a data item Must own item (but can share ownership with other threads) Basic requirement for writes When a thread writes data item Must be sole owner of item
131
Typical Actions to Change Ownership Object creation (creator owns new object) Synchronization operations Lock acquire (acquire data that lock protects) Lock release (release data) Similarly for post/wait, Ada accept, … Thread creation (thread inherits data from parent) Thread termination (parent gets data back) Unique reference acquisition and release (acquire or release referenced data)
132
Proposed Systems Monitors + copy in/copy out Concurrent Pascal (Brinch Hansen – TSE 1975) Guava (Bacon, Strom, Tarafdar – OOPSLA 2000) Mutual exclusion data + private data Flanagan, Abadi – ESOP 2000 Flanagan, Freund – PLDI 2000 Mutual exclusion data + private data + linear/ownership types DeLine, Fähndrich – PLDI 2001 Boyapati, Rinard – OOPSLA 2001
133
Basic Approach Thread + Private Data: private data identified as such in the type system; the type system ensures it is reachable only from local variables and other private data. Lock + Shared Data: the type system identifies the correspondence and ensures that threads hold the lock when they access the data and that the data is accessible only from other data protected by the same lock. Copy model of communication
134
Extension: Unique References The type system additionally ensures at most one reference to this object (basic approach otherwise as on the previous slide)
135
Extension: Unique References Step One: Grab Lock
137
Extension: Unique References Step Two: Transfer Reference
138
Extension: Unique References Step Three: Release Lock
139
Extension: Unique References Result: Transferred Object. The ownership relation changes over time
140
Prospects Remaining challenge: general data structures Objects with multiple references Ownership changes correlated with movements between data structures Recognize insertions and deletions Language-level solutions are the way to go for activity management programs Tractable for typical sharing patterns Big impact in practice
141
Benefits of ownership formalization Identification of atomic regions Weak memory invisible to programmer Enables coarse-grain program analysis Promotes lots of new and interesting analyses Component interaction analyses Object propagation analyses Better understanding of software structure Analysis and transformation Software engineering
142
What about parallel computing programs?
143
Parallel Computing Sharing Patterns Specialized Sharing Patterns Unsynchronized accesses to disjoint regions of a single aggregate structure Threads update disjoint regions of array Threads update disjoint subtrees Generalized reductions Commuting updates Reduction trees
144
Parallel Computing Prospects No language-level solution likely to be feasible Race freedom depends on arbitrarily complicated properties of updated data structures Impact of data races not as large Parallelism confined to specific algorithms Range of targeted analysis algorithms Parallel loops with dense matrices Divide and conquer programs Generalized reduction recognition
145
Future Directions
146
Integrating Specifications Past focus: discovering properties Future focus: verifying properties Understanding atomicity structure crucial Assume race-free programs Type system or previous analysis Enable Owicki/Gries style verification Assume property holds Show that each atomic action preserves it Consider only actions that affect property
147
Failure Containment Threads as unit of partial failure Partial executions of failed atomic actions Rollback mechanism Optimization opportunity New analyses and transformations Failure propagation analysis Failure response transformations
148
Model Checking Avalanche of model checking research Layered analyses for model extraction Flow-insensitive pointer analysis Initial focus on control problems Deadlock detection Operation sequencing constraints Checking finite-state properties
149
Steps towards practicality Java threads prompt experimentation Threads as standard part of safe language Available multithreaded benchmarks Open Java implementation platforms More implementations Interprocedural analyses Scalability emerges as key concern Directs analyses to relevant problems
150
Summary Multithreaded programs common and important Two kinds of multithreaded programs Parallel computing programs Activity management programs Data races as key analysis problem Programming errors Complicate analysis and transformation Different solutions for different programs Language solution for activity management Targeted analyses for parallel computing Future directions – specifications, failure containment, model checking, practical implementations