MPICL/ParaGraph Evaluation Report Adam Leko, Hans Sherburne UPC Group HCS Research Laboratory University of Florida Color encoding key: Blue: Information.


1 MPICL/ParaGraph Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida
Color encoding key:
- Blue: Information
- Red: Negative note
- Green: Positive note

2 Basic Information
- Name: MPICL/ParaGraph
- Developer:
  - ParaGraph: University of Illinois, University of Tennessee
  - MPICL: ORNL
- Current versions:
  - ParaGraph (no version number; last available update 1999)
  - MPICL 2.0
- Websites:
  - http://www.csar.uiuc.edu/software/paragraph/
  - http://www.csm.ornl.gov/picl/
- Contacts:
  - ParaGraph: Michael Heath (heath@cs.uiuc.edu), Jennifer Finger
  - MPICL: Patrick Worley (worleyph@ornl.gov)
- Note: ParaGraph last updated 1999, MPICL last updated 2001 [both seem dead]

3 MPICL/ParaGraph Overview
- MPICL
  - Trace file creation library
  - Uses MPI profiling interface
  - Only records MPI commands
    - Support for “custom” events using manual instrumentation
  - Writes traces in documented ASCII PICL format
- ParaGraph
  - PICL trace visualization tool
  - Very old tool (first written during 1989-1991)
  - Offers a lot of visualizations
- Analogy: MPICL is to ParaGraph as MPE is to Jumpshot

4 MPICL Overview
- Installation a nightmare
  - Requires knowledge of F2C symbol naming convention (!)
  - Had to edit and remove some code to work with new version of MPICH
    - Hardcoded values for certain field sizes had to be updated
    - One statement in the Fortran environment setup was causing a coredump of instrumented programs on startup
- Automatic instrumentation of MPI programs offered via profiling interface
  - Once installed, very easy to use
  - Have to add 3 lines of code to enable creation of trace files
    - Calls to tracefiles(), tracelevel(), and tracenode() (see ParaGraph documentation)
    - Minor annoyance, could be done automatically
- Manual instrumentation routines also available
  - Calls to tracedata() and traceevent() (see ParaGraph documentation)
  - Notion of program “phases” which allow crude form of source code correlation
- Also has extra code to ensure accurate clock synchronization
  - Extra work is done to ensure consistent ordering of events
  - Helps prevent “tachyons” (showing messages received before they are sent)
  - Delays startup by several seconds (but is not mandatory)
- After trace file is collected, it has to be sorted using tracesort
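The “tachyon” problem mentioned above can be illustrated with a small sketch. The record layout here is hypothetical and is not the PICL trace format; the per-node offset correction is only one simple way to model clock skew, not necessarily what MPICL's synchronization code does.

```python
from typing import NamedTuple

class Message(NamedTuple):
    # Hypothetical record layout for illustration; not the PICL trace format.
    src: int
    dst: int
    send_time: float  # timestamp taken on the sender's clock
    recv_time: float  # timestamp taken on the receiver's clock

def find_tachyons(messages):
    """Return messages whose receive timestamp precedes their send timestamp."""
    return [m for m in messages if m.recv_time < m.send_time]

def apply_clock_offsets(messages, offsets):
    """Shift each timestamp by the offset (seconds) of the node clock that produced it."""
    return [
        Message(m.src, m.dst,
                m.send_time + offsets.get(m.src, 0.0),
                m.recv_time + offsets.get(m.dst, 0.0))
        for m in messages
    ]

# Made-up trace in which node 0's clock lags node 1's by 0.5 s.
trace = [
    Message(0, 1, 1.00, 1.70),
    Message(1, 0, 2.00, 1.60),  # appears to arrive before it was sent
]
tachyons = find_tachyons(trace)
corrected = apply_clock_offsets(trace, {0: 0.5})
```

With the assumed 0.5 s offset applied to node 0, the second message no longer appears to travel backwards in time.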

5 MPICL Overhead
- Instrumentation performed using MPI profiling interface
  - Used a 5 MB buffer for trace files
- On average, instrumentation relatively intrusive, but within 20%
- Does not include overhead for synchronizing clocks
- Note: Benchmarks marked with * have high variability in runtimes
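For reference, overhead figures of this kind are usually computed as the relative increase in wall-clock time. The sketch below assumes that definition; the two runtimes are made-up numbers chosen only to show the arithmetic, not measurements from this evaluation.

```python
def overhead_percent(baseline_seconds, instrumented_seconds):
    """Relative runtime increase caused by instrumentation, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline runtime must be positive")
    return 100.0 * (instrumented_seconds - baseline_seconds) / baseline_seconds

# Hypothetical runtimes (seconds) for an uninstrumented and an instrumented run.
example = overhead_percent(50.0, 59.0)
```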

6 ParaGraph Overview
- Uses its own widget set
  - Probably necessary when it was first written in 1989
  - Widgets look extremely crude by today’s standards
    - Button = square with text in the middle
    - Uses its own conventions, which take a bit of getting used to
    - Once you adjust to the interface, this becomes less of an issue, but at times the conventions used become cumbersome
  - Example: closing any child window shuts down entire application
- ParaGraph philosophy
  - Provide as many different types of visualizations as possible
    - 4 categories: utilization, communication, tasks, other
  - Use a tape player abstraction for viewing trace data
    - Similar to Paraver, cumbersome for trying to maneuver to specific times
  - All visualizations use a form of animation
  - Trace data is drawn as fast as possible
    - This creates problems on modern machines
    - “Slow motion” option available, but doesn’t work that well
- Supports application-specific visualizations
  - Have to write custom code and link against it during ParaGraph compilation

7 ParaGraph Visualizations
- Utilization visualizations
  - Display rough estimate of processor utilization
  - Utilization broken down into 3 states:
    - Idle: when program is blocked waiting for a communication operation (or it has stopped execution)
    - Overhead: when a program is performing communication but is not blocked (time spent within MPI library)
    - Busy: when executing any part of the program other than communication
  - “Busy” doesn’t necessarily mean useful work is being done, since it assumes (not communication) := busy
- Communication visualizations
  - Display different aspects of communication
  - Frequency, volume, overall pattern, etc.
  - “Distance” computed by setting topology in options menu
- Task visualizations
  - Display information about when processors start & stop tasks
  - Requires manually instrumented code to identify when processors start/stop tasks
- Other visualizations
  - Miscellaneous things
- Can load/save a visualization window set (does not work)
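The three-state breakdown can be sketched as follows for a single processor. This is a simplified model, not ParaGraph's code: it assumes each MPI call is recorded as a hypothetical `(start, end, blocked)` tuple and that a call is either entirely blocked or entirely not blocked.

```python
def utilization_totals(run_start, run_end, mpi_calls):
    """Split one processor's run time into busy, overhead and idle totals.

    mpi_calls: non-overlapping (start, end, blocked) tuples inside the run.
    Anything outside an MPI call is counted as busy, mirroring the
    "(not communication) := busy" assumption.
    """
    totals = {"busy": 0.0, "overhead": 0.0, "idle": 0.0}
    cursor = run_start
    for start, end, blocked in sorted(mpi_calls):
        totals["busy"] += start - cursor                      # gap between MPI calls
        totals["idle" if blocked else "overhead"] += end - start
        cursor = end
    totals["busy"] += run_end - cursor                        # tail after the last call
    return totals

# Made-up timeline: one non-blocked call (2-3 s) and one blocked call (5-8 s).
totals = utilization_totals(0.0, 10.0, [(2.0, 3.0, False), (5.0, 8.0, True)])
```

The sketch also makes the limitation noted above concrete: a purely sequential program has no MPI calls, so all of its time lands in "busy".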

8 Utilization Visualizations – Utilization Count
- Displays # of processors in each state at a given moment in time
- Busy shown on bottom, overhead in middle, idle on top

9 Utilization Visualizations – Gantt Chart
- Displays utilization state of each processor as a function of time

10 Utilization Visualizations – Kiviat Diagram
- Shows our friend, the Kiviat diagram
- Each spoke is a single processor
- Dark green shows moving average, light green shows current high watermark
  - Timing parameters for each can be adjusted
- Metric shown can be “busy” or “busy + overhead”
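A minimal sketch of the two quantities drawn on each Kiviat spoke, assuming utilization is sampled at fixed intervals as a fraction between 0 and 1. The class name and window size are hypothetical, and tracking the high watermark of the moving average (rather than of the raw samples) is an assumption about how the display works.

```python
from collections import deque

class SpokeMetric:
    """Moving average and high watermark for one processor's utilization."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)  # keeps only the newest `window` samples
        self.high_water = 0.0

    def update(self, utilization):
        """Add a sample and return the current moving average."""
        self.samples.append(utilization)
        average = sum(self.samples) / len(self.samples)
        self.high_water = max(self.high_water, average)
        return average

spoke = SpokeMetric(window=2)
averages = [spoke.update(u) for u in (1.0, 0.0, 0.5)]
```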

11 Utilization Visualizations – Streak
- Shows “streak” of state
  - Similar to winning/losing streaks of baseball teams
  - Win = overhead or busy
  - Loss = idle
- Not sure how useful this is

12 Utilization Visualizations – Utilization Summary
- Shows percentage of time spent in each utilization state up to current time

13 Utilization Visualizations – Utilization Meter
- Shows percentage of processors in each utilization state at current time

14 Utilization Visualizations – Concurrency Profile
- Shows histograms of # processors in a particular utilization state
- Ex: Diagram shows
  - Only 1 processor was busy ~5% of the time
  - All 8 processors were busy ~90% of the time
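A concurrency profile of this kind can be computed with a sweep over interval endpoints. The sketch below assumes busy periods are available as hypothetical `(start, end)` pairs per processor; it is an illustration of the idea, not ParaGraph's implementation.

```python
from collections import defaultdict

def concurrency_profile(busy_intervals, run_start, run_end):
    """Return {k: fraction of run time during which exactly k processors were busy}.

    busy_intervals: one list of (start, end) busy periods per processor.
    """
    events = []
    for intervals in busy_intervals:
        for start, end in intervals:
            events.append((start, 1))    # a processor becomes busy
            events.append((end, -1))     # a processor stops being busy
    events.sort()

    time_at_level = defaultdict(float)
    level, cursor = 0, run_start
    for t, delta in events:
        time_at_level[level] += t - cursor
        level += delta
        cursor = t
    time_at_level[level] += run_end - cursor

    total = run_end - run_start
    return {k: v / total for k, v in sorted(time_at_level.items()) if v > 0}

# Made-up two-processor run: both busy for 9 s, then only one for the last second.
profile = concurrency_profile([[(0.0, 10.0)], [(0.0, 9.0)]], 0.0, 10.0)
```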

15 Communication Visualizations – Color Code
- Color code controls colors used on most communication visualizations
- Can have color indicate message sizes, message distance, or message tag
  - Distance computed by topology set in options menu
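To show why distance depends on the chosen topology, here is a sketch of standard hop-count formulas for three common topologies. The node numbering (row-major for the mesh, no wraparound links) is an assumption, and whether ParaGraph computes distance exactly this way is not confirmed by the slides.

```python
def ring_distance(a, b, n):
    """Hops between nodes a and b on an n-node ring (shorter direction)."""
    d = abs(a - b) % n
    return min(d, n - d)

def mesh_distance(a, b, cols):
    """Manhattan distance on a 2D mesh with row-major numbering and no wraparound."""
    a_row, a_col = divmod(a, cols)
    b_row, b_col = divmod(b, cols)
    return abs(a_row - b_row) + abs(a_col - b_col)

def hypercube_distance(a, b):
    """Hamming distance between node labels: number of differing bits."""
    return bin(a ^ b).count("1")
```

The same pair of nodes can be neighbors in one topology and far apart in another: nodes 0 and 7 are 1 hop apart on an 8-node ring but 3 hops apart on a 3-dimensional hypercube.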

16 Communication Visualizations – Communication Traffic
- Shows overall traffic at a given time
  - Bandwidth used, or
  - Number of messages in flight
- Can show single node or aggregate of all nodes

17 Communication Visualizations – Spacetime Diagram
- Shows standard space-time diagram for communication
  - Messages sent from node to node at which times

18 Communication Visualizations – Message Queues
- Shows data about message queue lengths
  - Incoming/outgoing
  - Number of bytes queued/number of messages queued
- Colors mean different things
  - Dark color shows current moving average
  - Light color shows high watermark

19 Communication Visualizations – Communication Matrix
- Shows which processors sent data to which other processors
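The data behind a communication matrix is a simple accumulation over send events. The sketch below assumes messages are given as hypothetical `(src, dst, nbytes)` tuples and sums bytes per sender/receiver pair; counting messages instead of bytes would work the same way.

```python
def communication_matrix(messages, num_procs):
    """Return matrix[src][dst] = total bytes sent from src to dst."""
    matrix = [[0] * num_procs for _ in range(num_procs)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix

# Made-up traffic among three processors.
matrix = communication_matrix([(0, 1, 100), (0, 1, 50), (2, 0, 8)], 3)
```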

20 Communication Visualizations – Communication Meter
- Shows percentage of communication used at the current time
- Message count or bandwidth
- 100% = max # of messages / max bandwidth used by the application at a specific time
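The normalization described above (100% equals the application's own peak) can be sketched as follows. The input is assumed to be a list of per-time-step message counts or bandwidth samples; the function name is hypothetical.

```python
def communication_meter(samples):
    """Scale each sample to a percentage of the peak value seen in the run."""
    peak = max(samples, default=0)
    if peak == 0:
        return [0.0 for _ in samples]   # no communication at all
    return [100.0 * s / peak for s in samples]

# Made-up counts of messages in flight at three time steps.
readings = communication_meter([2, 4, 1])
```

Because the scale is relative to the run's own maximum, readings from two different runs are not directly comparable.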

21 Communication Visualizations – Animation
- Animates messages as they occur in trace file
- Can overlay messages over topology
- Available topologies
  - Mesh
  - Ring
  - Hypercube
  - User-specified
    - Can layout each node as you want
    - Can store to a file and load later on

22 Communication Visualizations – Node Data
- Shows detailed communication data
- Can display
  - Metrics
    - Which node
    - Message tag
    - Message distance
    - Message length
  - For a single node, or aggregate for all nodes

23 Task Visualizations – Task Count
- Shows number of processors that are executing a task at the current time
- At end of run, changes to show summary of all tasks

24 Task Visualizations – Task Gantt
- Shows Gantt chart of which task each processor was working on at a given time

25 Task Visualizations – Task Speed
- Similar to Gantt chart, but displays “speed” of each task
- Must record work done by task in instrumentation call (not done for example shown above)

26 Task Visualizations – Task Status
- Shows which tasks have started and finished at the current time

27 Task Visualizations – Task Summary
- Shows % time spent on each task
- Also shows any overlap between tasks

28 Task Visualizations – Task Surface
- Shows time spent on each task by each processor
- Useful for seeing load imbalance on a task-by-task basis

29 Task Visualizations – Task Work
- Displays work done by each processor
- Shows rate and volume of work being done
- Example doesn’t show anything because no work amounts recorded in trace being visualized

30 Other Visualizations – Clock, Coordinates
- Clock
  - Shows current time
- Coordinate information
  - Shows coordinates when you click on any visualization

31 Other Visualizations – Critical Path
- Highlights critical path in space-time diagram in red
  - Longest serial path shown in red
  - Depends on point-to-point communication (collective operations can screw it up)
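The idea of a "longest serial path" can be sketched as a longest-path search over a dependency graph in which each node is a compute or communication segment and each point-to-point message adds an edge from the send to the matching receive. This is a generic textbook formulation with hypothetical segment names, not ParaGraph's algorithm.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def critical_path(durations, deps):
    """Return (length, path) of the longest chain of dependent segments.

    durations: {segment: time}; deps: {segment: set of predecessor segments}.
    """
    order = TopologicalSorter({n: deps.get(n, ()) for n in durations}).static_order()
    finish, prev = {}, {}
    for node in order:
        # Pick the predecessor that finishes last; None if there is none.
        best = max(deps.get(node, ()), key=lambda p: finish[p], default=None)
        finish[node] = durations[node] + (finish[best] if best is not None else 0.0)
        prev[node] = best
    node = max(finish, key=finish.get)
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]

# Made-up two-process example: p1 cannot proceed until p0's message arrives.
durations = {"p0:compute": 3, "p0:send": 1, "p1:init": 1, "p1:recv": 1, "p1:compute": 4}
deps = {"p0:send": {"p0:compute"}, "p1:recv": {"p0:send", "p1:init"}, "p1:compute": {"p1:recv"}}
length, path = critical_path(durations, deps)
```

A collective operation would add edges between all participants at once, which is one way to see why collectives complicate a path built from point-to-point dependencies.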

32 Other Visualizations – Phase Portrait
- Shows relationship between processor utilization and communication usage

33 Other Visualizations – Statistics
- Gives overall statistics for run
- Data
  - % busy, overhead, idle time
  - Total count and bandwidth of messages
  - Max, min, average
    - Message size
    - Distance
    - Transit time
- Shows max of 16 processors at a time

34 Other Visualizations – Processor Status
- Shows
  - Processor status
  - Which task each processor is executing
  - Communication (sends & receives)
- Each processor is a square in the grid (8-processor example shown)

35 Other Visualizations – Trace Events
- Shows text output of all trace file events

36 Bottleneck Identification Test Suite
- Testing metric: what did visualizations tell us (no manual instrumentation)?
- Program correctness not affected by instrumentation
- CAMEL: PASSED
  - Space-time diagram & bandwidth utilization visualizations showed large number of small messages at beginning
  - Utilization graphs showed low overhead, few idle states
- LU: PASSED
  - Space-time diagram showed large number of small messages
  - Kiviat diagram showed moving average of processor utilization low
  - Phase portrait showed large correlation between communication and low processor utilization
- Big messages: PASSED
  - Utilization Gantt and space-time diagrams showed large amount of overhead at time of each send
- Diffuse procedure: PASSED
  - Utilization Gantt showed one processor busy & rest idle
  - Need manual instrumentation to determine that one routine takes too long

37 Bottleneck Identification Test Suite (2)
- Hot procedure: FAILED
  - Purely sequential code, so ParaGraph could not distinguish between idle and busy states
- Intensive server: PASSED
  - Utilization Gantt chart showed all processors except first idle
  - Space-time chart showed processor 0 being inundated with messages
- Ping-pong: PASSED
  - Space-time chart showed large # of small messages dependent on each other
- Random barrier: TOSS-UP
  - Utilization count showed one processor busy through execution
  - Utilization Gantt chart showed busy processor randomly dispersed
  - However, “waiting for barrier” state shown as idle, so difficult to track down to barrier without extra manual instrumentation

38 Bottleneck Identification Test Suite (3)
- Small messages: PASSED
  - Utilization Gantt chart showed lots of time spent in MPI code (overhead)
  - Space-time diagram showed large numbers of small messages
- System time: FAILED
  - All processes show as busy, no distinction of user vs. system time
  - With no communication, processor states are not really classified at all; everything just gets attributed to busy time
- Wrong order: PASSED
  - Space-time diagram showed messages being received in the reverse order they were sent
  - But, have to pay close attention to how the diagram is drawn

39 How to Best Use ParaGraph/MPICL
- Don’t use MPICL
  - Better trace file formats and libraries are available now
  - We probably should look over the clock synchronization code, but this probably isn’t useful if high-resolution timers are available
    - Especially for shared-memory machines
- Don’t use ParaGraph’s code directly
  - But, has a lot of neat visualizations we could copy
  - At the most we should scan the code to see how a visualization is calculated
- In summary: just take the best ideas & visualizations

40 Evaluation (1)
- Available metrics: 2/5
  - Only records communication, task entrance and exit
  - Approximates processor state by equating (not communication) with busy
- Cost: 5/5
  - Free!
- Documentation quality: 2/5
  - ParaGraph has excellent manual
  - Very hard to find information on MPICL
  - MPICL installation instructions woefully inadequate
- Extensibility: 2/5
  - Can add custom visualizations, but must write code and recompile ParaGraph
  - Open source, but uses old X-Windows API & its own widget set
  - Dead project (no updates since 1999)
- Filtering and aggregation: 1/5
  - Not really performed
  - A few visualizations can be restricted to a certain processor
  - Can output summary statistics (other visualization -> stats)

41 Evaluation (2)
- Hardware support: 5/5
  - Cray X1, AlphaServer (Tru64), IBM SP (AIX), SGI Altix, 64-bit Linux clusters (Opteron & Itanium)
  - Support for a large number of vendor-specific MPI libraries
  - Would probably need a lot of effort to port to more modern architectures though
- Heterogeneity support: 0/5 (not supported)
- Installation: 1.5/5
  - ParaGraph relatively easy to compile and install
  - MPICL installation is extremely difficult, especially with modern versions of MPICH/LAM
- Interoperability: 0/5
  - Does not interoperate with other tools
- Learning curve: 2.5/5
  - MPICL library easy to use
  - ParaGraph interface unintuitive, can get in the way

42 Evaluation (3)
- Manual overhead: 1/5
  - Can record all MPI calls by linking, but this requires the addition of trace control instructions in source code
  - Task visualizations depend on manual instrumentation
- Measurement accuracy: 2/5
  - CAMEL: ~18% overhead
  - Instrumentation adds a bit of runtime overhead, especially when many messages are sent
- Multiple executions: 0/5 (not supported)
- Multiple analyses & views: 5/5
  - Many, many ways of looking at trace data
- Performance bottleneck identification: 4/5
  - Bottleneck identification must be performed manually
  - Many visualizations help with bottleneck detection, but no guidance is provided on which one you should examine first

43 Evaluation (4)
- Profiling/tracing support: 3/5
  - Only tracing supported
  - Profiling data can be shown in ParaGraph after processing trace file
- Response time: 2/5
  - Nothing reported until after program runs
  - Also need (computationally expensive) trace sort to be performed before you can view trace file
  - Large trace files take a while to load (ParaGraph must pass over entire trace before displaying anything)
- Searching: 0/5 (not supported)
- Software support: 3/5
  - Can link against any library using the MPI profiling interface, but that library's code will not be instrumented
  - Only MPI and some (very old, obsolete) vendor-specific message-passing libraries are supported

44 Evaluation (5)
- Source code correlation: 0/5
  - Not supported
  - Can do indirectly via manual instrumentation of tasks, but still hard to figure out exactly where things occur in source code
- System stability: 3.5/5
  - MPICL relatively stable after bugs were fixed during compilation
  - ParaGraph stable as long as you don’t try to do weird things (load the wrong file)
    - Not very robust with error handling
  - ParaGraph’s load/save window set doesn’t work
- Technical support: 0/5
  - Dead project
  - Project email addresses still seem valid, but not sure how much help we could get from the developers now

