Discussion: How to Address Tools Scalability
Allen D. Malony
Department of Computer and Information Science
TAU Performance Research Laboratory
University of Oregon
IBM Petascale Tools

Scale and Scaling
- What is meant by scale?
  - Processors: execution concurrency / parallelism
  - Memory: memory behavior, problem size
  - Network: concurrent communications
  - File system: parallel file operations / data size
- Scaling in the physical size / concurrency of the system
- What else?
  - Program: code size / interacting modules
  - Power: electrical power consumption
  - Performance: potential computational power
- Dimension: terascale … petascale … and beyond

Tools Scalability
- Types of tools
  - Performance: analytical / simulation / empirical
  - Debugging: detect / correct concurrency errors
  - Programming: parallel languages / computation
  - Compiling: parallel code / libraries
  - Scheduling: system allocation and launching
- What does it mean for a tool to be scalable?
  - Tool dependent (different problems and scaling aspects)
  - What changes about the tool?
  - Naturally scalable vs. change in function / operation
  - Is a paradigm shift required?
  - To what extent is portability important?
- What tools would you say are scalable? How? Why?

Focus: Parallel Performance Tools / Technology
- Tools for performance problem solving
  - Empirical-based performance optimization process
  - Performance technology concerns
- [Diagram: performance optimization cycle of performance tuning, performance diagnosis (hypotheses), performance experimentation (characterization, properties), and performance observation, supported by performance technology: instrumentation, measurement, analysis, visualization, experiment management, and performance data storage]

Large-Scale Performance Problem Solving
- How does our view of this process change when we consider very large-scale parallel systems?
- What are the significant issues that will affect the technology used to support the process?
  - Parallel performance observation is required
  - In general, there is the concern for intrusion
    - Seen as a tradeoff with performance diagnosis accuracy
  - Scaling complicates observation and analysis
  - The nature of application development may change
- What will enhance productive application development?
- Is a paradigm shift in performance process and technology required?

Instrumentation and Scaling
- Make events visible to the measurement system
- Direct instrumentation (code instrumentation)
  - Static instrumentation modifies code prior to execution
    - does not get removed (will always be executed)
    - source instrumentation may alter optimization
  - Dynamic instrumentation modifies code at runtime
    - can be inserted and deleted at runtime
    - incurs runtime cost
- Indirect instrumentation generates events outside of the code
- Does scale affect the number of events?
- Runtime instrumentation is more difficult with scale
  - Affected by increased parallelism
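The static/dynamic distinction above can be illustrated with a small sketch. At source level, a probe is simply compiled into the code path and always fires; dynamic instrumentation would instead patch the running code so the probe can be removed. The names `EVENT_LOG`, `instrument`, and `compute` below are hypothetical, chosen only for illustration:

```python
import time

# Minimal sketch of static source instrumentation: the probe is part of the
# code path and always executes, whether or not anyone consumes the events.
EVENT_LOG = []

def instrument(func):
    """Wrap a function so entry/exit events become visible to measurement."""
    def wrapper(*args, **kwargs):
        EVENT_LOG.append(("enter", func.__name__, time.perf_counter()))
        try:
            return func(*args, **kwargs)
        finally:
            # The probe cannot be removed at runtime; it always runs.
            EVENT_LOG.append(("exit", func.__name__, time.perf_counter()))
    return wrapper

@instrument
def compute(n):
    return sum(range(n))

compute(10)
# EVENT_LOG now holds one enter and one exit event for `compute`
```

A dynamic-instrumentation tool (e.g., binary patching) would achieve the same event visibility without touching the source, at the cost of runtime insertion machinery.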

Measurement and Scaling
- What makes performance measurement not scalable?
  - More parallelism means more performance data
    - overall performance data
    - performance data specific to each thread of execution
    - possible increase in the number of interactions between threads
  - Harder to manage the data (memory, transfer, storage)
  - Issues of performance intrusion
- Performance data size
  - Number of events generated x metrics per event
- Are there really more events? Which are important?
  - Control the number of events generated
  - Control what is measured (to a point)
- Need for performance data versus cost of obtaining it
- Portability!
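The "events x metrics" growth above is easy to make concrete with a back-of-envelope estimate. All the numbers below are illustrative assumptions, not measurements from any particular system:

```python
# Raw trace volume grows multiplicatively:
# threads x events/thread/second x metrics/event x bytes/metric x runtime.
def trace_volume_bytes(threads, events_per_sec, metrics_per_event,
                       bytes_per_metric, seconds):
    return (threads * events_per_sec * metrics_per_event
            * bytes_per_metric * seconds)

# Assumed scenario: 16K threads, a modest 10K events/s each,
# 4 metrics of 8 bytes per event, 100-second run.
vol = trace_volume_bytes(16_384, 10_000, 4, 8, 100)
print(vol / 2**30)  # 488.28125 GiB of raw data from a short run
```

Even under these conservative assumptions the volume approaches half a terabyte, which is why controlling event generation and what is measured dominates the scaling discussion.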

Measurement and Scaling (continued)
- Consider "traditional" measurement methods
  - Profiling: summary statistics calculated during execution
  - Tracing: time-stamped sequence of execution events
  - Statistical sampling: indirect triggers, PC + metrics
  - Monitoring: access to performance data at runtime
- How does the performance data grow?
  - How does per-thread profile / trace size grow?
  - Consider communication
- Strategies for scaling
  - Control performance data production and volume
  - Change in measurement type or approach
  - Event and/or measurement control
    - Filtering, throttling, and sampling
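The contrast between the methods above, and the throttling strategy, can be sketched in a few lines: a profile stays fixed-size per event type while a trace grows with every occurrence, and a throttle caps measurement cost per event. `THROTTLE_LIMIT` and `measure` are illustrative names, not the interface of any real tool:

```python
import time
from collections import defaultdict

# Illustrative knob: stop measuring an event after this many occurrences,
# similar in spirit to event throttling in measurement systems.
THROTTLE_LIMIT = 3

profile = defaultdict(lambda: {"calls": 0, "time": 0.0})  # O(event types)
trace = []                                                # O(event count)

def measure(name, work):
    entry = profile[name]
    if entry["calls"] >= THROTTLE_LIMIT:
        return work()                 # throttled: no measurement overhead
    t0 = time.perf_counter()
    result = work()
    dt = time.perf_counter() - t0
    entry["calls"] += 1               # profiling: update summary statistics
    entry["time"] += dt
    trace.append((t0, name, dt))      # tracing: one record per occurrence
    return result

for _ in range(10):
    measure("solve", lambda: sum(range(1000)))
# profile["solve"]["calls"] and len(trace) both stop at THROTTLE_LIMIT
```

The design point is visible directly in the data structures: per-thread profile size is bounded by the number of event types, while trace size is bounded only by whatever filtering, throttling, or sampling policy is in force.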

Concern for Performance Measurement Intrusion
- Performance measurement can affect the execution
  - Perturbation of "actual" performance behavior
  - Minor intrusion can lead to major execution effects
  - Problems exist even with a small degree of parallelism
- Intrusion is an accepted consequence of standard practice
  - Consider the intrusion (perturbation) of trace buffer overflow
- Scale exacerbates the problem … or does it?
  - Traditional measurement techniques tend to be localized
  - Suggests scale may not compound local intrusion globally
  - Measuring parallel interactions likely will be affected
- Use accepted measurement techniques intelligently

Analysis and Visualization Scalability
- How to understand all the performance data collected?
- Objectives
  - Meaningful performance results in meaningful forms
  - Want tools to be reasonably fast and responsive
  - Integrated, interoperable, portable, …
- What does "scalability" mean here?
  - Performance data size
    - Large data size should not impact analysis tool use
    - Data complexity should not overwhelm interpretation
    - Results presentation should be understandable
  - Tool integration and usability

Analysis and Visualization Scalability (continued)
- Online analysis and visualization
  - Potential interference with execution
- Single-experiment analysis versus multiple experiments
- Strategies
  - Statistical analysis
    - data dimension reduction, clustering, correlation, …
  - Scalable and semantic presentation methods
    - statistical, 3D
    - relate metrics to the physical domain
  - Parallelization of analysis algorithms (e.g., trace analysis)
  - Increase system resources for analysis / visualization tools
  - Integration with performance modeling
  - Integration with the parallel programming environment
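The clustering strategy listed above can be sketched concretely: treat each thread's profile as a metric vector and let k-means replace thousands of per-thread profiles with a few representative behaviors. The data, the starting centroids, and the two-metric profiles below are synthetic, chosen only to illustrate the idea:

```python
import math

def kmeans(points, centroids, iters=20):
    """Plain k-means; `centroids` supplies the initial guesses."""
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Synthetic per-thread profiles: (compute time, communication time).
profiles = [(9.0, 1.0), (8.5, 1.5), (9.2, 0.8),   # compute-bound threads
            (2.0, 8.0), (1.5, 8.5), (2.2, 7.8)]   # communication-bound threads
centroids, clusters = kmeans(profiles, centroids=[(9.0, 1.0), (2.0, 8.0)])
print(len(clusters[0]), len(clusters[1]))  # 3 3
```

Two centroids now summarize the behavior of all six "threads"; at 16K threads the same reduction is what makes the data presentable at all.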

Role of Intelligence and Specificity
- How to make the process more effective (productive)?
- Scale forces performance observation to be intelligent
  - Standard approaches deliver a lot of data with little value
  - What are the important performance events and data?
    - Tied to application structure and computational model
- Tools have poor support for application-specific aspects
- The process and tools can be more application-aware
  - Will allow scalability issues to be addressed in context
  - More control and precision of performance observation
  - More guided performance experimentation / exploration
  - Better integration with application development

Role of Automation and Knowledge Discovery
- Even with intelligent and application-specific tools, the decisions of what to analyze may become intractable
- Scale forces the process to become more automated
  - Performance extrapolation must be part of the process
- Build autonomic capabilities into the tools
  - Support broader experimentation methods and refinement
  - Access and correlate data from several sources
  - Automate performance data analysis / mining / learning
  - Include predictive features and experiment refinement
- Knowledge-driven adaptation and optimization guidance
- Address scale issues through increased expertise

ParaProf: Histogram View (Miranda)
- [Figures: histogram views at 8K and 16K processors]

ParaProf: 3D Full Profile (Miranda)
- [Figure: 3D full profile at 16K processors]

ParaProf: 3D Scatterplot (Miranda)
- Each point is a "thread" of execution
- A total of four metrics shown in relation
- ParaVis: 3D profile visualization library (JOGL)

Hierarchical and K-means Clustering (sPPM)
- [Figures: hierarchical and k-means clustering results]

Vampir Next Generation (VNG) Architecture
- Classic analysis: monolithic, sequential
- VNG: parallel analysis server (master plus workers 1 … m) processes merged traces (Trace 1 … Trace N) from the file system
- Visualization client connects to the analysis server over the Internet
- Event streams flow from the monitor system of the running parallel program
- [Figure: VNG timeline display of 768 processes: thumbnail, segment indicator, timeline with 16 visible traces, showing process activity, parallel I/O, and message passing]