Projections Overview
Ronak Buch & Laxmikant (Sanjay) Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
Manual
The full reference for Projections; it contains more detail than these slides.
Projections
- Performance analysis/visualization tool for use with Charm++
- Works to a limited degree with MPI
- Charm++ uses its runtime system to log program execution
- Trace-based, post-mortem analysis
- Configurable levels of detail
- Java-based visualization tool for performance analysis
Instrumentation
- Enabling Instrumentation
- Basics
- Customizing Tracing
- Tracing Options
How to Instrument Code
- Build Charm++ with the --enable-tracing flag
- Select a -tracemode when linking (sketch below)
- That's all! The runtime system takes care of tracking events.
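A minimal build-and-link sketch, assuming a netlrts-linux-x86_64 target; the target, object, and program names are illustrative:

    # Build Charm++ with tracing support (one-time step)
    ./build charm++ netlrts-linux-x86_64 --enable-tracing -j8

    # Link the application with a trace mode; no source changes are required
    charmc -language charm++ -tracemode projections -o app app.o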
Basics
Traces include a variety of events:
- Entry methods (methods that can be remotely invoked)
- Messages sent and received
- System events: idleness, message queue times, message pack times, etc.
Basics - Continued
- Traces are logged in memory and incrementally written to disk
- The runtime system instruments computation and communication
- Generates useful data without excessive overhead (usually)
Custom Tracing - User Events
Users can add custom events to traces by inserting calls into their application.
- Register an event: int traceRegisterUserEvent(char* EventDesc, int EventNum=-1)
- Track a point-event: void traceUserEvent(int EventNum)
- Track a bracketed-event: void traceUserBracketEvent(int EventNum, double StartTime, double EndTime)
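A minimal sketch of these calls; CkWallTimer() is the Charm++ wall-clock timer, and the event names and helper functions here are illustrative:

    #include "charm++.h"

    static int checkpointEvent, solverEvent;

    void registerEvents() {
      // Called once at startup (e.g., from a mainchare constructor); omitting
      // the event number lets the runtime assign one.
      checkpointEvent = traceRegisterUserEvent("checkpoint reached");
      solverEvent     = traceRegisterUserEvent("solver phase");
    }

    void step() {
      traceUserEvent(checkpointEvent);  // point-event: a single timestamped mark

      double start = CkWallTimer();
      // ... work attributed to the "solver phase" event ...
      traceUserBracketEvent(solverEvent, start, CkWallTimer());
    }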
Custom Tracing - Annotations
Annotation support allows users to easily customize the set of methods that are traced.
- Annotating an entry method with notrace disables tracing for it and saves overhead
- Adding local to non-entry methods (not traced by default) adds tracing automatically
(See the interface-file sketch below.)
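A hedged sketch of the corresponding Charm++ interface (.ci) file; the chare and method names are hypothetical:

    // worker.ci (hypothetical)
    chare Worker {
      entry Worker();
      entry [notrace] void heartbeat();   // frequent method: skip tracing, save overhead
      entry [local] void computeChunk();  // local method: traced once marked local
    };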
Custom Tracing - API
The API allows users to turn tracing on or off:
- Trace only at certain times
- Trace only a subset of processors
Simple API: void traceBegin() and void traceEnd(). Works at the granularity of a PE.
Custom Tracing - API
- Often used at synchronization points to instrument only a few iterations, as in the sketch below
- Reduces the size of logs while still capturing the important data
- Allows analysis to focus on only certain parts of the application
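A minimal sketch of tracing a ten-iteration window; doTimestep() and the iteration bounds are illustrative, and since the API works per PE, each PE executes these calls:

    #include "charm++.h"

    void doTimestep(int iter);  // hypothetical application work

    void runIterations(int numIters) {
      for (int iter = 0; iter < numIters; ++iter) {
        if (iter == 100) traceBegin();  // start tracing on this PE
        doTimestep(iter);
        if (iter == 109) traceEnd();    // stop after ten traced iterations
      }
    }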
Tracing Options
Two link-time options (compared in the sketch below):
- -tracemode projections: full tracing (time, sending/receiving processor, method, object, …)
- -tracemode summary: performance of each PE aggregated into time bins of equal size
Tradeoff between detail and overhead.
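A hedged link-line comparison of the two modes (program and object names are illustrative):

    charmc -language charm++ -tracemode projections -o app app.o  # full event logs
    charmc -language charm++ -tracemode summary -o app app.o      # aggregated time bins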
Tracing Options - Runtime
- +traceoff disables tracing until a traceBegin() API call
- +traceroot <dir> specifies the output folder for tracing data
- +traceprocessors RANGE traces only the PEs in RANGE
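A hedged launch-line sketch combining these flags; the PE count, program name, and paths are illustrative:

    # Tracing off until traceBegin(); logs go to /scratch/logs; trace PEs 0-7 only
    ./charmrun +p16 ./app +traceoff +traceroot /scratch/logs +traceprocessors 0-7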
Tracing Options - Summary
- +sumdetail aggregates data by entry method as well as by time interval (normal summary data is aggregated only by time interval)
- +numbins <k> reserves enough memory to hold information for <k> time intervals (default: 10,000 bins)
- +binsize <duration> aggregates data so that each time interval represents <duration> seconds of execution time (default: 1 ms)
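A worked sizing example, assuming a binary linked with -tracemode summary: 20,000 bins of 0.5 ms each cover 20,000 × 0.0005 s = 10 s of execution before any rebinning is needed (numbers are illustrative):

    ./charmrun +p16 ./app +sumdetail +numbins 20000 +binsize 0.0005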
Tracing Options - Projections
- +logsize <k> reserves enough buffer memory to hold <k> events (default: 1,000,000 events)
- +gz-trace / +gz-no-trace enable/disable compressed (gzip) log files
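A hedged example for a projections-mode run, raising the event buffer and enabling compressed logs (values are illustrative):

    ./charmrun +p16 ./app +logsize 5000000 +gz-trace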
Memory Usage
What happens when we run out of reserved memory?
- -tracemode summary: doubles the time interval represented by each bin, aggregates the existing data into the first half of the bins, and continues
- -tracemode projections: asynchronously flushes the event log to disk and continues; this can perturb performance significantly in some cases
Projections Client
- Scalable tool that can analyze up to 300,000 log files
- A rich set of tools: time profile, timelines, usage profile, histogram, extrema tool
- Detects performance problems: load imbalance, grain size, communication bottlenecks, etc.
- Multi-threaded and optimized for memory efficiency
Visualizations and Tools
Tools for viewing aggregated performance:
- Time Profile
- Histogram
- Communication
Tools at processor-level granularity:
- Overview
- Timeline
Tools for derived/processed data:
- Outlier analysis: identifies outliers
Analysis at Scale
- Fine-grained details can sometimes look like one big solid block on the timeline.
- It is hard to mouse over items that represent fine-grained events.
- Other times, tiny slivers of activity become too small to be drawn.
Analysis Techniques
- Zoom in/out to find potential problem spots.
- Mouse over graphs for extra details.
- Load sufficient data, but not too much.
- Set colors to highlight trends.
- Use the history feature in dialog boxes to track the time ranges explored.
Dialog Box
- Select processors: e.g., 0-2,4-7:2 gives PEs 0, 1, 2, 4, 6
- Select a time range
- Add presets to history
Aggregate Views
Time Profile
Time spent in each EP, summed across all PEs, per time interval.
Histogram
Shows statistics in the “frequency” domain:
- How many methods took a certain amount of time?
- What percentage of total execution time was made up of methods with a certain execution duration?
Communication vs. Time
Shows communication over all PEs in the time domain. The Y-axis is in units of bytes sent; the chart shows the number of bytes sent by each method in each time interval.
Communication per Processor
Shows how much each PE communicated over the whole job.
Processor Level Views
Overview
Time on the X-axis, different PEs on the Y-axis.
The intensity of the plot represents each PE's utilization at that time.
Timeline
The most common view; much more detailed than the Overview.
Clicking on an EP traces its messages; mousing over shows EP details.
Colors represent different EPs. White ticks on the bottom represent message sends; red ticks on the top represent user events.
Processed Data Views
We will mention only one; other processed-data views exist (e.g., NoiseMiner) but are useful only in narrow domains.
Outlier Analysis
Uses k-means clustering to find “extreme” processors. Successive views show the global average, the non-outlier average, the outlier average, and the cluster representatives and outliers.
Advanced Features
- Live Streaming: run a server from the job to send performance traces in real time
- Online Extrema Analysis: perform clustering during the job; save only representatives and outliers
- Multirun Analysis: side-by-side comparison of data from multiple runs
Future Directions
- PICS: expose application settings to the RTS for on-the-fly tuning
- End-of-run analysis: use the time remaining after job completion to process performance logs
- Simulation: increased reliance on simulation for generating performance logs
Most analysis is currently done on users' workstations; for large jobs this becomes untenable, and end-of-run analysis would improve it.
Conclusions
- Projections has been used to effectively solve performance problems
- The tools are constantly improving
- Scalable analysis is becoming increasingly important