Towards Scalable Performance Analysis and Visualization through Data Reduction
Chee Wai Lee, Celso Mendes, L. V. Kale
University of Illinois at Urbana-Champaign

Motivation
Event trace-based performance tools help applications scale well. As applications scale, so must the performance tools themselves. Why?

Nature of Event Traces
Event traces tend to be thread- or processor-centric. The volume of data per thread is proportional to the number of performance events encountered, which in turn depends on the duration of the run and the frequency of events.
- Strong scaling: more threads, more communication events.
- Weak scaling: more threads, more communication events, more work per thread.
More events mean more work for performance tools; the back-of-envelope estimate below makes the growth concrete.
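
As a rough illustration (all numbers hypothetical, not from the talk), trace volume grows linearly with thread count, event rate, run duration, and record size:

```python
# Back-of-envelope estimate of event-trace volume (hypothetical numbers):
# volume grows linearly in threads, event rate, duration, and record size.
def trace_volume_mb(threads, events_per_sec, duration_s, bytes_per_event=24):
    return threads * events_per_sec * duration_s * bytes_per_event / 1e6

# e.g. 4096 threads logging 1000 events/s for 60 s at 24 B/event:
print(trace_volume_mb(4096, 1000, 60))  # ~5898 MB
```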

Reducing the data: Part 1
Baseline: record events for the entire run. What are simple ways of reducing the volume of performance data?
- Cut inconsequential event blocks (e.g. initialization and shutdown).
- Keep important snapshots (e.g. important iteration blocks).
Example from NAMD: drop the startup phase, keep the first 300 steps with load balancing and steps 300-500 with a load refinement. A filtering sketch follows.
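
A minimal sketch of this kind of window filtering, assuming a hypothetical record format of (timestamp, payload) tuples; real trace formats carry far more detail:

```python
# Keep only events whose timestamps fall inside "interesting" windows,
# dropping startup/teardown blocks (hypothetical record format).
def keep_snapshots(events, windows):
    """events: iterable of (timestamp, payload); windows: list of (t0, t1)."""
    return [e for e in events
            if any(t0 <= e[0] < t1 for t0, t1 in windows)]

trace = [(0.5, "startup"), (12.0, "step"), (30.0, "step"), (99.0, "shutdown")]
print(keep_snapshots(trace, [(10.0, 40.0)]))  # [(12.0, 'step'), (30.0, 'step')]
```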

Quantifying the Problem
Event-trace volume generated by the Projections performance tool for NAMD molecular dynamics simulations over 200 "interesting" time steps:

               92k Atoms    327k Atoms    1000k Atoms
  512 cores      827 MB      1,800 MB       2,800 MB
 1024 cores      938 MB      2,200 MB       3,900 MB
 2048 cores    1,200 MB      2,800 MB       4,800 MB
 4096 cores          -             -        5,700 MB

Reading down a column (fixed problem size, more cores) shows strong scaling; growing the problem along with the core count corresponds to weak scaling.

Reducing the data: Part 2
- Drop "uninteresting" events, or some specific classes of events.
- Compress and/or characterize event patterns.
Our approach: drop "uninteresting" processors (threads).

Our Approach
Choose a subset of processors: representatives and outliers. Employ k-means clustering for equivalence-class discovery. The chosen processors' performance data are written to disk at the end of the run. A sketch of the selection scheme appears below.
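
The talk does not spell out the exact algorithm, so the following is a minimal sketch of the general scheme, assuming each processor is summarized by a small vector of metrics; select_processors and all counts are illustrative, and scikit-learn's KMeans stands in for whatever clustering implementation the tool actually uses:

```python
# Cluster per-processor metric vectors with k-means, keep the processor
# nearest each centroid as a representative, and the processors farthest
# from their own centroid as outliers (illustrative sketch only).
import numpy as np
from sklearn.cluster import KMeans

def select_processors(X, k=3, n_outliers=5):
    """X: (n_procs, n_metrics) array of per-processor metrics."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    dist = km.transform(X)                        # proc-to-centroid distances
    d_own = dist[np.arange(len(X)), km.labels_]   # distance to own centroid
    reps = []
    for c in range(k):                            # per cluster, keep the member
        members = np.where(km.labels_ == c)[0]    # closest to the centroid
        reps.append(int(members[np.argmin(d_own[members])]))
    outliers = np.argsort(d_own)[-n_outliers:].tolist()  # farthest overall
    return reps, outliers

X = np.random.rand(1024, 4)  # hypothetical: 1024 processors x 4 metrics
reps, outliers = select_processors(X)
```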

Equivalence Class Discovery
[Figure: processors plotted in a metric space (Metric X vs. Metric Y); clusters are formed by Euclidean distance, with representatives near the cluster centers and outliers far from any center.]

Things to Consider
- Distance measures may require normalization.
- Certain metrics may be strongly correlated with one another.
- Number of initial seeds.
- Placement of initial seeds.
- Number of representatives chosen.
- Number of outliers chosen.
The sketch below illustrates the first two considerations.
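
A short sketch of those two checks, using z-score normalization so no single metric dominates Euclidean distances, and a correlation matrix to spot nearly redundant metrics (all data hypothetical):

```python
# Pre-clustering checks: z-score normalization and metric correlations.
import numpy as np

X = np.random.rand(1024, 4) * [1.0, 1e6, 50.0, 3.0]  # hypothetical raw metrics
Xn = (X - X.mean(axis=0)) / X.std(axis=0)            # z-score: mean 0, std 1
corr = np.corrcoef(Xn, rowvar=False)                 # 4x4 metric correlations
print(np.round(corr, 2))  # off-diagonal values near +/-1 flag redundant metrics
```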

Experimental Methodology
Target: the NAMD (NAnoscale Molecular Dynamics) task grain-size performance problem (2002). We roll back a performance improvement we made in 2002 to address this problem, turning the tuned NAMD into a version with the problem injected.

Experimental Methodology (2)
A 1-million-atom simulation of the Satellite Tobacco Mosaic Virus, run on 512 to 4096 processors of PSC's BigBen Cray XT3 supercomputer. Two criteria for validation: the amount of data reduced, and the quality of the reduced dataset.

Histogram Quality Measure
Compare a histogram built from the original data against the same histogram built from the reduced dataset. For each bar i, let H_o^i be its height in the original data (e.g. 1000 processors) and H_r^i its height in the reduced data (e.g. 100 processors). The question: how close is H_r^i / H_o^i to 0.100 on average?
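
A minimal sketch of this measure, assuming the histograms are given as arrays of bar heights (all data hypothetical):

```python
# Quality of a reduced dataset: the per-bar ratio H_r[i] / H_o[i] should
# stay close to the fraction of processors kept (P_r / P_o).
import numpy as np

def histogram_quality(H_o, H_r):
    """H_o, H_r: bar heights of the original and reduced histograms."""
    mask = H_o > 0                      # ignore empty bars
    ratios = H_r[mask] / H_o[mask]
    return ratios.mean(), ratios.std()  # mean should sit near P_r / P_o

H_o = np.array([400.0, 300.0, 200.0, 100.0])  # hypothetical, 1000 procs
H_r = np.array([ 41.0,  29.0,  21.0,   9.0])  # hypothetical, 100 procs kept
print(histogram_quality(H_o, H_r))            # mean ~0.10, small std dev
```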

Results: Data Reduction

 Orig # Cores   Original Data   Reduced # Cores   Reduced Dataset
      512          2,800 MB            51               275 MB
     1024          3,900 MB           102               402 MB
     2048          4,800 MB           204               551 MB
     4096          5,700 MB           409               667 MB

Results: Data Reduction
[Figure: chart of the original vs. reduced data volumes across core counts.]

Results: Quality

  P_o    P_r   P_r/P_o   Average H   Std Dev
  512     25   0.0488     0.0641     0.00732
  512     51   0.0996     0.1180     0.00768
  512    102   0.1992     0.2237     0.00732
 1024     51   0.0498     0.0511     0.00168
 1024    102   0.0996     0.1008     0.00157
 1024    204   0.1992     0.1921     0.00264
 2048    102   0.0498     0.0487     0.00122
 2048    204   0.0996     0.0977     0.00216
 2048    408   0.1992     0.1883     0.00575
 4096    204   0.0498     0.0501     0.00170
 4096    409   0.0998     0.0981     0.00203
 4096    818   0.1997     0.1975     0.00163

Average H is the mean of H_r^i / H_o^i over all histogram bars; it should track P_r/P_o closely.

Conclusion
The approach offers a potential way of controlling the volume of performance data generated. The heuristics have been reasonably good at capturing the performance characteristics of the NAMD grain-size problem.

Future Work
- Conduct experiments on more problem types and classes for verification.
- Find better (more practical) ways of performing equivalence-class discovery.