Published by Randell Jefferson. Modified over 9 years ago.
1
May/01/2000, HIPS 2000
Online Computation of Critical Paths for Multithreaded Languages
Yoshihiro Oyama, Kenjiro Taura, Akinori Yonezawa
University of Tokyo
2
Presentation Outline
What is a critical path?
Background & Overview
Our work
– Target language
– Critical path computation algorithm
– Instrumentation scheme
Experimental results
Related work
3
What is a Critical Path (CP)?
The longest execution path
– Nodes: sequential program parts
– Edges: fork/sync points
[Figure: example DAG with weights 3, 1, 3, 6, 5, 2, 2, 2, 8, 7, 4, 1; CP length: 31]
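The definition above can be illustrated with a small offline computation. This is only a sketch: the graph, its node names, and its costs are hypothetical (they are not the values from the slide's figure), and the talk's own algorithm deliberately avoids building such a graph.

```python
from functools import lru_cache

# Hypothetical DAG: node -> (cost of the sequential part, successor nodes).
GRAPH = {
    "a": (3, ["b", "c"]),
    "b": (1, ["d"]),
    "c": (6, ["d"]),
    "d": (2, []),
}

def critical_path(graph, start):
    """Return (length, path) of the longest node-weighted path from start."""
    @lru_cache(maxsize=None)
    def longest(node):
        cost, successors = graph[node]
        best_len, best_path = 0, ()
        for succ in successors:
            length, path = longest(succ)
            if length > best_len:
                best_len, best_path = length, path
        return cost + best_len, (node,) + best_path
    return longest(start)

length, path = critical_path(GRAPH, "a")
print(length, path)
```

Memoizing per node keeps the traversal linear in the number of edges, but it still requires the whole DAG in memory.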
4
Benefits of Getting CPs (1/2)
CP info gives us
– Performance upper bound = exec. time lower bound = $\lim_{PE \to \infty}$ {exec. time}
– Important parts in need of tuning
5
Benefits of Getting CPs (2/2)
CP info is useful for
– Tuning
    CP is short → overhead should be reduced
    Otherwise → CP should be shortened
– Performance prediction
    $T_P = T_1 / P + T_\infty$ (by the Cilk group)
    Exec. time is close to CP length → adding more processors is futile
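A minimal sketch of the prediction formula, assuming T_1 is the one-processor execution time and T_∞ is the CP length. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def predicted_time(t1, t_inf, p):
    """Estimate T_P = T_1 / P + T_inf for p processors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t1 / p + t_inf

# Hypothetical numbers: 100 s of total work and a 10 s critical path.
for p in (1, 10, 100, 1000):
    print(p, predicted_time(100.0, 10.0, p))
```

As p grows the estimate approaches the CP length, which is why adding processors stops helping once the execution time is close to it.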
7
This Work
Computing critical paths
– Primary targets:
    Multithreaded languages
    Shared-memory machines
– On-the-fly
    Not using tracefiles
– Source code instrumentation
8
Background (Shortcoming of Existing Work)
Cilk [Frigo et al. 98]
– Provides online computation of CPs
– Supports fork-join synchronization only
– Unrealistic setting
    Fork: zero cost
    Join: zero cost
9
Contribution
Developed an algorithm for computing CPs
– It handles languages with threads and synchronization via first-class data
    Not limited to the fork-join model
– It takes fork/communication cost into account
– It gives the length of each subpath in a CP
    Helps us “pinpoint” important program parts
Demonstrated its usefulness through experiments on an SMP
10
CP Info Example
Displaying a sequence of all subpaths in a CP

frame entry point     frame exit point                   time
=============================================================
main() --- move_mols(mols,100)                       741 usec
spawn                                                 10 usec
move_mols(mols,n) --- spawn move_one_mol(mols[i])     39 usec
spawn                                                 10 usec
move_one_mol(molp) --- return                       4982 usec
communication                                         15 usec
v = recv(r) --- send(s, v*2)                         128 usec
communication                                         15 usec
u = recv(s) --- die                                 1207 usec
=============================================================
critical path length                                7147 usec
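The listing can be checked and summarized with a few lines of code. The sketch below copies the subpath times from the example; the `summarize` helper is hypothetical and not part of the authors' tool.

```python
# Subpath times (usec) copied from the CP info example.
SUBPATHS = [
    ("main() --- move_mols(mols,100)", 741),
    ("spawn", 10),
    ("move_mols(mols,n) --- spawn move_one_mol(mols[i])", 39),
    ("spawn", 10),
    ("move_one_mol(molp) --- return", 4982),
    ("communication", 15),
    ("v = recv(r) --- send(s, v*2)", 128),
    ("communication", 15),
    ("u = recv(s) --- die", 1207),
]

def summarize(subpaths):
    """Return (total length, name of the longest subpath, its share of the total)."""
    total = sum(cost for _, cost in subpaths)
    name, longest = max(subpaths, key=lambda s: s[1])
    return total, name, longest / total

total, name, share = summarize(SUBPATHS)
print(f"CP length {total} usec; longest subpath: {name} ({share:.0%})")
```

The subpaths sum to the reported 7147 usec, and the `move_one_mol(molp) --- return` subpath accounts for roughly 70% of it, which is the kind of pinpointing the breakdown is meant to support.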
12
Target Language
Sequential language (C, Scheme, …)
+ Threads
    spawn f(x1,…,xn)
+ Channels
    are first-class sync. media
    can express locks, barriers, and monitors
[Figure: thread th1 executes send(r,8); thread th2 executes v = recv(r) and receives 8 through channel r]
13
Sample Program
    main() {
      spawn sum(r,vec);
      ...
      v = recv(r);
      ...
      die;
    }
    sum(r,vec) {
      ...
      send(r,ans);
    }
[Figure labels: Beginning of Program, End of Program]
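For readers who want to run something similar, the sample program can be approximated in Python. This is an analogy only, under the assumption that a `queue.Queue` stands in for a channel and a `threading.Thread` for a spawned thread; the input vector is made up.

```python
import queue
import threading

def sum_worker(r, vec):
    ans = sum(vec)
    r.put(ans)                   # send(r, ans)

def main():
    r = queue.Queue()            # channel r
    vec = [1, 2, 3]              # hypothetical input
    worker = threading.Thread(target=sum_worker, args=(r, vec))
    worker.start()               # spawn sum(r, vec)
    v = r.get()                  # v = recv(r); blocks until the value arrives
    worker.join()
    return v                     # die

print(main())
```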
15
Behavior of Sample Program
DAG-structured execution
– Nodes: fork & sync. points
– Edges: inter-node dependencies
[Figure: DAG of the sample program with nodes main, spawn sum(r,vec), sum(r,vec), send(r,ans), v=recv(r), die]
16
Three Kinds of Edges (Dependencies)
Arithmetic edges
Spawn edges
Communication edges
[Figure: the sample-program DAG (main, spawn sum(r,vec), sum(r,vec), send(r,ans), v=recv(r), die) with edge values as extracted: 8 3 5 142 9]
17
CP Computation Algorithm: Basic Idea
DAG not constructed
– Each thread keeps only the longest path up to the current program point
[Figure: two paths from main, Path1 and Path2, meet at a recv; the shorter one is thrown away]
18
Key Questions
How to determine edge values?
How to compute CP without constructing DAG?
– How to manage CP info?
– How to keep the longest path?
19
Determining Edge Values
Computing the amount of time that elapsed after leaving the previous node
[Figure: nodes X, Y, Z with t1=time() at X, t2=time() at Y, t3=time() at Z; the two edge values are 8 and 6]
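A minimal sketch of this timestamp differencing. The `EdgeClock` class and its injectable `now` parameter are hypothetical names introduced for illustration; the experiments in the talk use `gethrtime()`, for which `time.perf_counter` is only a stand-in here.

```python
import time

class EdgeClock:
    """Measures the time elapsed since the previous node was left."""

    def __init__(self, now=time.perf_counter):
        self._now = now
        self._last = now()       # timestamp taken at the first node

    def edge_value(self):
        """Return the time since the previous node and restart the measurement."""
        t = self._now()
        elapsed = t - self._last
        self._last = t
        return elapsed

# Replay the slide's numbers with a fake clock: t1=0, t2=8, t3=14.
ticks = iter([0, 8, 14])
clock = EdgeClock(now=lambda: next(ticks))
edge_xy = clock.edge_value()
edge_yz = clock.edge_value()
print(edge_xy, edge_yz)
```

Passing the clock in as a parameter makes the measurement reproducible in tests.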
20
Extending CP with Arithmetic Edge
CP info = a sequence of edge info
The amount of time in nodes: NOT accounted
[Figure: nodes X (label L1), Y (label L2), Z (label L3); edge X→Y has value 8 and edge Y→Z has value 6]
At X: CP=( {…},{…},{…} )
At Y: CP=( {…},{…},{…}, {L1,L2,8} )
At Z: CP=( {…},{…},{…}, {L1,L2,8}, {L2,L3,6} )
21
Extending CP with Spawn Edge
[Figure: node X spawns a thread; nodes Y and Z follow the spawn]
CP before the spawn: CP=( {…},{…},{…} )
CPs after the spawn: CP=( {…},{…},{…} ) and CP=( {…},{…},{…}, {…,…, C_spawn} )
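One way to represent CP info as a sequence of edge records is sketched below. The `Edge` and `CP` classes, the `C_SPAWN` value, and the choice that the spawned thread starts from the parent's CP plus one spawn edge are assumptions for illustration, not details confirmed by the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: str     # label of the node where the edge begins
    end: str       # label of the node where the edge ends
    cost: float    # elapsed time attributed to the edge

@dataclass(frozen=True)
class CP:
    edges: tuple = ()

    def extend(self, start, end, cost):
        """Return a new CP with one more edge appended."""
        return CP(self.edges + (Edge(start, end, cost),))

    def length(self):
        return sum(e.cost for e in self.edges)

C_SPAWN = 10   # assumed constant spawn cost (usec)

# Arithmetic edges: X (L1) -> Y (L2) takes 8, Y (L2) -> Z (L3) takes 6.
cp = CP().extend("L1", "L2", 8).extend("L2", "L3", 6)
# Spawn edge: the child's CP is the parent's CP plus the spawn cost.
child_cp = cp.extend("L3", "child entry", C_SPAWN)
print(cp.length(), child_cp.length())
```

Keeping `CP` immutable means the parent and the child can share the common prefix without copying or locking.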
22
Extending CP with Communication Edge
Piggyback a sent value with CP
[Figure: send passes [ v, CP_send ] to recv]
At send: CP_send=( {…},{…} )
After adding the communication edge: CP_send=( {…},{…}, {…,…, C_comm} )
23
Keeping the Longest Path (Throwing Shorter Paths Away)
[Figure: send passes [ v, CP_send ] to recv; the sender holds CP_send = …, the receiver holds CP_recv = …]
CP_send=( {…},{…}, {…,…, C_comm} )
CP = max( CP_send, CP_recv )
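The piggybacking and the max at the receiver can be sketched as follows, with timing left out. A `queue.Queue` stands in for a channel, a CP is a list of `(name, cost)` pairs, and `C_COMM` is an assumed constant; none of these names come from the original system.

```python
import queue

C_COMM = 15   # assumed constant communication cost (usec)

def cp_length(cp):
    return sum(cost for _, cost in cp)

def send(channel, value, cp_send):
    """Send the value together with the sender's CP."""
    channel.put((value, cp_send))

def recv(channel, cp_recv):
    """Receive a value and keep whichever CP is longer."""
    value, cp_send = channel.get()
    cp_send = cp_send + [("communication", C_COMM)]   # extend with comm. edge
    return value, max(cp_send, cp_recv, key=cp_length)

r = queue.Queue()
send(r, 8, [("sum", 100)])
v, cp = recv(r, [("main", 20)])
print(v, cp_length(cp))
```

Here the sender's path (100 + 15) is longer than the receiver's (20), so the receiver's own path is discarded.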
25
Instrumentation
Source-to-source transformation
– Independent of the implementation details
    Ex. management of activation frames
– Instrumentation code is inserted into
    Sends, recvs, spawns
    Entry/exit points of functions
26
Transformation Rule Example
    l: v = recv( r );
is transformed into
    t = time() - et;                           // Compute CP up to recv
    [v, cp'] = recv( r );                      // Receive a value piggybacked with CP
    cp'' = addCommEdge( cp' );                 // Extend CP with comm. edge
    if( t + length( cp ) < length( cp'' ) ){   // Compare the two CPs
      cp = cp'';                               // Use the sender's CP
      el = l;
      et = time();
    } else {
      et = time() - t;                         // Use the receiver's CP
    }
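The rule can be exercised in a small runnable model. This is a sketch under several assumptions: `ThreadState`, `instrumented_send`, `usec_now`, and `C_COMM` are hypothetical names, the send-side rule is not shown on the slide and is guessed here, and the variables `cp`, `el`, and `et` are read as the committed path, the entry label, and the entry time of the current subpath.

```python
import queue
import time

C_COMM = 15.0   # assumed communication cost (usec)

def usec_now():
    return time.perf_counter() * 1e6

def length(cp):
    return sum(cost for _, cost in cp)

class ThreadState:
    """Per-thread bookkeeping: committed CP, entry label, entry time."""
    def __init__(self, label, now=usec_now):
        self.now = now
        self.cp = []           # committed subpaths as (name, cost) pairs
        self.el = label        # label where the current subpath began
        self.et = now()        # time at which the current subpath began

def instrumented_send(state, channel, value, label):
    # Assumed send rule: close the current subpath, then piggyback the CP.
    t = state.now() - state.et
    state.cp = state.cp + [(f"{state.el} --- {label}", t)]
    state.el, state.et = label, state.now()
    channel.put((value, list(state.cp)))

def instrumented_recv(state, channel, label):
    t = state.now() - state.et                  # time in the current subpath
    value, cp_sender = channel.get()            # value piggybacked with CP
    cp_sender = cp_sender + [("communication", C_COMM)]
    if t + length(state.cp) < length(cp_sender):
        state.cp = cp_sender                    # use the sender's CP
        state.el = label
        state.et = state.now()
    else:
        state.et = state.now() - t              # use the receiver's CP
    return value
```

In the else branch the entry time is shifted forward so that time spent blocked inside `recv` is not charged to the receiver's current subpath.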
27
Discussion (1/2): Nondeterminism
DAG shape varies between different runs
– The amounts of time for each part vary (e.g., cache effects)
    [Figure: the edge from X to Y has different values (28, 5) in different runs]
– Comm. edges may connect different pairs
    [Figure: sends and recvs are matched differently in different runs]
28
Discussion (2/2): What We Compute as CP
CP of a DAG created in an actual run
– Programs may give different CPs in different runs
– Other reasonable ways?
30
Experiments
Schematic: concurrent OO language [Taura et al. 96]
Sun Ultra Enterprise 10000
– UltraSPARC x 64
Apps:
– Prime
– Natural Language Parser
– Raytrace
Timer function: gethrtime()
31
Purpose of Experiments
Checking that execution times get close to computed CPs
Identifying how large instrumentation overhead is
32
Raytrace
We could predict the best performance by using only one processor
33
Prime
Small (< 5%) difference between the actual execution time and the predicted execution time
34
Information Useful for Future Tuning of Prime
Gathering primes into a list → 95% of CP
Dividing prime candidates by smaller primes → 5% of CP
35
Natural Language Parser
36
Information Useful for Future Tuning of NL Parser
Application of lexical rules → 4% of CP
Application of production rules → 96% of CP
37
Instrumentation Overhead (Execution Time on One Processor)
39
Related Work (1/2)
Cilk
– Breakdown of CP not shown
    CP info: not detailed enough for tuning
    Which function should we tune???

    % foo -nproc 10 20
    result: 524288
    Running time on 10 procs: 416.33 ms
    Total work = 3.94 s
    Critical path = 1.08 ms
    Parallelism = 2800.92
    %
40
Related Work (2/2)
Paradyn [Hollingsworth 98]
– Main target is message-passing programs
– It does not display all subpaths in CP
Tracefile-based offline scheme (Dimemas [Pallas] etc.)
– Tracefile contains the parameters and the timings of all communication operations
– Required memory/storage is very large
41
Summary (1/2)
Scheme for online CP computation
– Supports synchronization via first-class data
    Piggybacking communicated values with CP info
    Keeping the maximum of two paths in receives
– Takes spawn/communication cost into account
– Shows all subpaths in CP
    Attaching subpath info in each CP update
42
Summary (2/2)
CP info we compute
– Helps predict the MP performance
    Small (< 10%) difference between the actual and the predicted execution time
– Gives a useful guide to tuning
    Prime: Tune list construction part!
    Parser: Tune production rule application part!
43
Future Work
More precise performance prediction
– Taking thread mapping into account
Adaptive optimization using CP info
– Time-consuming optimizations are applied to the parts included in CP
44
Any Comments?