Towards Scalable Performance Analysis and Visualization through Data Reduction
Chee Wai Lee, Celso Mendes, L. V. Kale
University of Illinois at Urbana-Champaign

Motivation  Event trace-based performance tools help developers make applications scale well.  As applications scale, so must the performance tools themselves. Why?

Nature of Event Traces  Tend to be thread- or processor-centric.  Volume of data per thread is proportional to the number of performance events encountered.  Number of performance events per thread depends on the duration of the run and the frequency of events.  Strong Scaling: more threads, more communication events.  Weak Scaling: more threads, more communication events, more work per thread.  More events = more work for performance tools.

Reducing the data: Part 1  Baseline: record events of the entire run.  What are simple ways of reducing the volume of performance data? Cut inconsequential event blocks (e.g. initialization/end). Keep important snapshots (e.g. important iteration blocks). [Timeline figure: NAMD run showing startup, the first 300 steps with load balancing, and steps with a load refinement]

Quantifying the Problem

              92k Atoms   327k Atoms   1000k Atoms
  512 cores     827 MB     1,800 MB      2,800 MB
 1024 cores     938 MB     2,200 MB      3,900 MB
 2048 cores   1,200 MB     2,800 MB      4,800 MB
 4096 cores                              5,700 MB

NAMD molecular dynamics simulations and the event trace volume generated by the Projections performance tool over 200 ("interesting") time steps. Reading across a row is weak scaling; reading down a column is strong scaling.

Reducing the data: Part 2  Drop “uninteresting” or some specific classes of events.  Compress and/or characterize event patterns. Our Approach: Drop “uninteresting” processors (Threads)

Our Approach  Choose a subset of processors:  Representatives  Outliers  Employ k-Means Clustering for Equivalence-Class discovery.  Chosen processors' performance data are written to disk at the end of the run.
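The selection step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it runs a plain Lloyd's k-means over per-processor metric vectors, then keeps the members nearest each cluster centroid as representatives and the farthest as outliers. All function names (`kmeans`, `choose_processors`) and parameter defaults are hypothetical.

```python
# Sketch: pick "representative" and "outlier" processors by k-means
# clustering on per-processor metric vectors (one row per processor).
# Illustrative only; names and defaults are not from the paper.
import random

def dist(p, q):
    # Euclidean distance between two metric vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def kmeans(points, k, iters=50):
    # Plain Lloyd's algorithm: returns (centroids, cluster assignment).
    centroids = random.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep old centroid if the cluster emptied
                centroids[c] = [sum(xs) / len(members)
                                for xs in zip(*members)]
    return centroids, assign

def choose_processors(points, k=3, n_reps=2, n_outliers=1):
    # Representatives: processors closest to their cluster centroid.
    # Outliers: processors farthest from their cluster centroid.
    centroids, assign = kmeans(points, k)
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        by_dist = sorted(members,
                         key=lambda i: dist(points[i], centroids[c]))
        chosen.update(by_dist[:n_reps])       # representatives
        chosen.update(by_dist[-n_outliers:])  # outliers
    return sorted(chosen)
```

Only the chosen processors' trace data would then be flushed to disk; the rest is discarded.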

Equivalence Class Discovery [Scatter plot: processors plotted by Metric X vs. Metric Y; Euclidean distance groups them into clusters, with representatives near the cluster centers and outliers far from any cluster]

Things to Consider  Distance measures may require normalization.  Certain metrics may be strongly correlated with one another.  Number of initial seeds.  Placement of initial seeds.  Number of representatives chosen.  Number of outliers chosen.
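The first point above matters because metrics with very different ranges (say, bytes sent vs. message counts) would otherwise dominate the Euclidean distance. A minimal min-max normalization sketch, with an illustrative function name not taken from the paper:

```python
# Sketch: min-max normalize each metric column to [0, 1] so no single
# metric dominates the Euclidean distance. `rows` is a list of
# per-processor metric vectors; name and shape are illustrative.
def normalize(rows):
    cols = list(zip(*rows))                     # transpose to columns
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    span = [h - l or 1.0 for h, l in zip(hi, lo)]  # avoid div-by-zero
    return [[(v, l, s) and (v - l) / s
             for v, l, s in zip(r, lo, span)] for r in rows]
```

Normalization would run once over all processors' metric vectors before clustering begins.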

Experimental Methodology  NAMD (NAnoscale Molecular Dynamics) task grain-size performance problem (2002).  Roll back a performance improvement we made in 2002 to address this problem. [Figure: tuned NAMD vs. NAMD with the problem injected]

Experimental Methodology (2)  1 million atom simulation of the Satellite Tobacco Mosaic Virus.  512 processors to 4096 processors on PSC's BigBen Cray XT3 supercomputer.  Two criteria for validation:  Amount of data reduced.  Quality of the reduced dataset.

Histogram Quality Measure  Build a histogram of the original data (e.g. 1000 PEs) with bars H^o_i, and a histogram of the reduced data (e.g. 100 PEs) with bars H^r_i.  How close is H^r_i / H^o_i to 1 on average?
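The quality measure above can be sketched numerically. This is an illustrative reading of the slide, not the authors' code: the reduced-set histogram is scaled by the processor-count ratio (e.g. 1000/100 = 10), the per-bar ratios H^r_i / H^o_i are taken, and their mean and spread summarize how faithfully the reduced set reproduces the original distribution. The function name and signature are hypothetical.

```python
# Sketch: compare a histogram built from the reduced processor set
# against the full-data histogram. A mean ratio near 1.0 with a small
# standard deviation indicates the reduced set is representative.
def histogram_quality(h_orig, h_reduced, scale):
    # scale: num_original_procs / num_reduced_procs, e.g. 1000/100 = 10
    ratios = [(r * scale) / o
              for o, r in zip(h_orig, h_reduced) if o > 0]
    avg = sum(ratios) / len(ratios)
    var = sum((x - avg) ** 2 for x in ratios) / len(ratios)
    return avg, var ** 0.5  # mean of H^r_i/H^o_i and its std dev
```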

Results: Data Reduction

 Orig # Cores   Original Data   Reduced # Cores   Reduced Dataset
      512          2,800 MB            51              275 MB
     1024          3,900 MB            …                … MB
     2048          4,800 MB            …                … MB
     4096          5,700 MB            …                … MB

Results: Data Reduction [chart]

Results: Quality [Table: P_o, P_r, the ratio P_r/P_o, the average of H, and its standard deviation]

Conclusion  Our approach offers a way to control the volume of performance data generated.  Our heuristics have been reasonably good at capturing the performance characteristics of the NAMD grain-size problem.

Future Work  Conduct experiments on more problem types and classes for verification.  Find better (more practical) ways for equivalence class discovery.