
1 Summary Presentation (3/24/2005) UPC Group HCS Research Laboratory University of Florida

2 Performance Analysis Strategies

3 Methods
Three general performance analysis approaches:
- Performance modeling
  - Mostly predictive methods
  - Useful to examine in order to extract important performance factors
  - Could also be used in conjunction with experimental performance measurement
- Experimental performance measurement
  - Strategy used by most modern PATs
  - Uses actual event measurement to perform the analysis
- Simulation
  - We will probably not use this approach
See Combined Report - Section 4 for details

4 Performance Modeling

5 Purpose and Method
Purpose
- Review existing performance models and their applicability to our project
Method
- Perform a literature search on performance modeling techniques
- Categorize the techniques
  - Give weight to methods that are easily adapted to UPC & SHMEM
  - Also more heavily consider methods that give accurate performance predictions or that help the user choose between different design strategies
See Combined Report - Section 5 for details

6 Performance Modeling Overview
Why do performance modeling? Several reasons:
- Grid systems: need a way to estimate how long a program will take (billing/scheduling issues)
- Could be used in conjunction with optimization methods to suggest improvements to the user
  - Can also guide the user on what kind of benefit to expect from optimizing aspects of the code
- Figure out how far code is from optimal performance
  - Indirectly detect problems: if a section of code is not performing as predicted, it probably has cache locality problems, etc.
Challenge
- Many models already exist, with varying degrees of accuracy and speed
- Choose the best model to fit in the UPC/SHMEM PAT
Existing performance models fall into different categories:
- Formal models (process algebras, Petri nets)
- General models that provide "mental pictures" of hardware/performance
- Predictive models that try to estimate timing information

7 Formal Performance Models
Least useful for our purposes
- Formal methods are strongly rooted in math
- Can make strong statements and guarantees
- However, difficult to adapt and automate for new programs
Examples include
- Petri nets (specialized graphs that represent processes and systems)
- Process algebras (formal algebras for specifying how parallel processes interact)
- Queuing theory (strongly rooted in math)
- PAMELA (C-style language to model concurrency and time-related operations)
For our purposes, formal models are too abstract to be directly useful

8 General Performance Models
Provide the user with a "mental picture"
- Rules of thumb for the cost of operations
- Guides strategies used while creating programs
- Usually analytical in nature
Examples include
- PRAM (classical model, unit-cost operations)
- BSP (breaks execution into communication and computation phases)
- LogP (analytical model of network operations; see the sketch below)
- Many more... (see report for details)
For our purposes, general models can be useful
- Created to be easily understood by the programmer
- But may need lots of adaptation (and model fitting) to be directly useful
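As a concrete illustration of the "rules of thumb" such models give, the sketch below estimates the cost of a burst of point-to-point messages under LogP. The parameter values are made-up placeholders that would have to be fitted to a real interconnect; this is our own illustration, not material from the report.

/* Rough LogP cost estimate for sending n back-to-back messages to one peer.
 * L = network latency, o = per-message CPU overhead, g = gap (1/injection rate),
 * all in microseconds.  Parameter values in main() are assumed placeholders. */
#include <stdio.h>

double logp_send_time(int n, double L, double o, double g) {
    double per_msg = (o > g) ? o : g;          /* injection limited by max(o, g) */
    return (n - 1) * per_msg + o + L + o;      /* last msg: send oh + latency + recv oh */
}

int main(void) {
    printf("100 small msgs: %.1f us\n", logp_send_time(100, 6.0, 1.5, 2.0));
    return 0;
}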

9 Predictive Performance Models
Models that specifically predict the performance of parallel codes
- Similar to general models, except meant to be used with existing systems
- Usually a combination of mathematical models/equations and very simple simulation
Examples include
- Lost cycles (samples program state to see if useful work is being done)
- Task graphs (algorithm structure represented with graphs)
- Vienna Fortran Compilation System (uses an analytical model to parallelize code by examining the "cost" of operations)
- PACE (geared towards grid applications)
- Convolution (Snavely's method; uses a combination of existing tools to predict performance based on memory traces and network traces)
- Many more... (see report, Section 5 for details)
Lost cycles is very promising (sketched below)
- Provides a very easy way to quantify performance & scalability
- Needs extension for greater correlation with source code
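The lost-cycles idea is easiest to see as state sampling: periodically record which category of work (or non-work) the program is in, then report how much time was lost to each category. Below is a minimal sketch; the category names and the 10 ms sampling period are our assumptions, not details taken from the lost-cycles literature.

/* Minimal state-sampling sketch in the spirit of lost-cycles analysis.
 * The instrumented program sets current_state around communication/idle
 * regions; a profiling timer tallies samples per category. */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

typedef enum { ST_COMPUTE, ST_COMMUNICATION, ST_IDLE, ST_COUNT } prog_state;

static volatile sig_atomic_t current_state = ST_COMPUTE;
static volatile long samples[ST_COUNT];

static void take_sample(int signo) {
    (void)signo;
    samples[current_state]++;               /* attribute this tick to the current category */
}

void sampler_start(void) {
    struct itimerval iv = { {0, 10000}, {0, 10000} };   /* sample every 10 ms (assumed) */
    signal(SIGPROF, take_sample);
    setitimer(ITIMER_PROF, &iv, NULL);
}

void sampler_report(void) {
    long total = 0;
    for (int s = 0; s < ST_COUNT; s++) total += samples[s];
    for (int s = 0; s < ST_COUNT; s++)
        printf("state %d: %.1f%% of samples\n", s,
               total ? 100.0 * samples[s] / total : 0.0);
}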

10 Experimental Performance Measurement

11 Overview
- Instrumentation – insertion of instrumentation code (in general)
- Measurement – actual measuring stage
- Analysis – filtering, aggregation, and analysis of the gathered data
- Presentation – display of analyzed data to the user; the only phase that deals directly with the user
- Optimization – process of resolving bottlenecks

12 Profiling/Tracing Methods

13 Purpose and Method
Purpose
- Review existing profiling and tracing methods (instrumentation stage) based on experimental performance measurement
- Evaluate the various methods and their applicability to our PAT
Method
- Literature search on profiling and tracing (includes some review of existing tools)
- Categorize the methods
- Evaluate the applicability of each method toward the design of the UPC/SHMEM PAT
A quick overview of the methods and recommendations is included here
See Combined Report - Section 6.1 for the complete description and recommendations

14 Summary (1)
Overhead
- Manual – amount of work needed from the user
- Performance – overhead added by the tool to the program
Profiling / Tracing (contrasted in the sketch below)
- Profiling – collection of statistical event data; generally refers to filtering and aggregating a subset of event data after the program terminates
- Tracing – used to record the majority of events possible in logical order (generally with timestamps); can be used to reconstruct accurate program behavior; requires a large amount of storage
  - Two ways to lower tracing cost: (1) a compact trace file format, (2) a smart tracing system that turns itself on and off
Manual vs. Automatic – whether the user or the tool is responsible for instrumenting the original code; a categorization of which events are better suited for which method is desirable
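The profiling/tracing distinction above (aggregate statistics vs. timestamped event records) can be summarized in a few lines of code. The event structure, function names, and text trace format below are illustrative assumptions, not taken from any of the surveyed tools.

/* Contrast between profiling (aggregate counters) and tracing (per-event,
 * timestamped records).  Illustrative only: real tools use binary trace
 * formats and record far more event attributes. */
#include <stdio.h>
#include <time.h>

/* --- Profiling: one counter/accumulator per event type, tiny memory footprint --- */
static long   call_count[16];
static double total_time[16];

void profile_event(int event_id, double elapsed) {
    call_count[event_id]++;
    total_time[event_id] += elapsed;        /* only aggregates survive the run */
}

/* --- Tracing: every event appended with a timestamp; storage grows with run length --- */
void trace_event(FILE *trace, int event_id, int thread) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    fprintf(trace, "%ld.%09ld %d %d\n",     /* timestamp, event id, thread id */
            (long)ts.tv_sec, ts.tv_nsec, event_id, thread);
}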

15 Summary (2)
Number of passes – the number of times a program needs to be executed to get performance data
- One pass is desirable for long-running programs, but multi-pass can provide more accurate data (e.g., first pass = profiling, later passes = tracing, using the profiling data to turn tracing on and off; see the sketch below); a hybrid method is available but might not be as accurate as multi-pass
Levels – need at least the source and binary levels to be useful (some events are more suited to the source level and others to the binary level)
- Source level – manual, pre-compiler, instrumentation language
- System level – library or compiler
- Operating system level
- Binary level – static or dynamic
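One plausible way to realize the multi-pass idea: a first profiling pass ranks routines by their share of run time, and the second pass enables tracing only for routines above a threshold. The sketch shows just that on/off decision; the 5% cutoff and the data structures are our own assumptions.

/* Sketch of a two-pass instrumentation policy: tracing in the second pass is
 * enabled only for routines that the first (profiling) pass showed to be
 * significant.  The 5% threshold is an arbitrary assumed cutoff. */
#include <string.h>

#define MAX_ROUTINES 64

struct profile_entry {              /* filled in from the first, profiling pass */
    const char *name;
    double      fraction_of_runtime;
};

static struct profile_entry pass1[MAX_ROUTINES];
static int n_routines;

/* Called by second-pass instrumentation to decide whether to emit trace records. */
int tracing_enabled(const char *routine)
{
    for (int i = 0; i < n_routines; i++)
        if (strcmp(pass1[i].name, routine) == 0)
            return pass1[i].fraction_of_runtime >= 0.05;   /* trace only "hot" routines */
    return 0;                       /* unknown routine: keep tracing off */
}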

16 Performance Factors

17 Purpose and Method
Purpose
- Provide a formal definition of the term "performance factor"
- Present motivation for calculating performance factors
- Discuss what constitutes a good performance factor
- Introduce a three-step approach to determine if a factor is good
Method
- Review and provide a concise summary of the literature in the area of performance factors for parallel systems
See Combined Report - Section 6.2 for more details

18 Features of Good Performance Factors
Characteristics of a good performance factor
- Reliability
- Repeatability
- Ease of measurement
- Consistency
Testing
- On each platform, determine ease of measurement
- Determine repeatability (see the sketch below)
- Determine reliability and consistency by one of the following
  - Modify the factor using real hardware
  - Find justification in the literature
  - Derive the information from performance models
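One simple way to test the repeatability criterion is to measure a candidate factor over several identical runs and look at its run-to-run variation; a low coefficient of variation suggests the factor is repeatable. The sketch, the sample data, and the 5% cutoff are our own illustration, not values from the report.

/* Repeatability check for a candidate performance factor: coefficient of
 * variation (stddev/mean) across repeated, identical runs.  The 5% cutoff
 * is an assumed rule of thumb. */
#include <math.h>
#include <stdio.h>

double coefficient_of_variation(const double *x, int n) {
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= n;
    return sqrt(var) / mean;
}

int main(void) {
    double measured_latency_us[5] = {4.1, 4.3, 4.0, 4.2, 4.1};   /* five identical runs */
    double cv = coefficient_of_variation(measured_latency_us, 5);
    printf("CV = %.3f -> %s\n", cv, cv < 0.05 ? "repeatable" : "not repeatable");
    return 0;
}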

19 Analysis Strategies

20 Purpose and Method
Purpose
- Review existing analysis and bottleneck detection methods
Method
- Literature search on existing analysis strategies
- Categorize the strategies
  - Examine methods that are applied before, during, or after execution
  - Weight post-mortem & runtime analysis (most useful for a PAT)
- Evaluate the applicability of each method toward the design of the UPC/SHMEM PAT
See the Analysis Strategies report for details

21 Analysis Methods
Performance analysis methods
- The "why" of performance tools
- Make sense of data collected from tracing or profiling
- Classically performed after trace collection, before visualization
  - But some strategies choose to do it at other times and in different ways
Bottleneck detection
- Another form of analysis!
- Bottleneck detection methods are also covered in this report
- Optimizations are also closely related, but discussed in the combined report
See Combined Report - Section 6.5

22 When/How to Perform Analysis
Can be done at different times
- Post-mortem: after the program runs
  - Usually performed in conjunction with tracing
- During runtime: must be quick, but can guide data collection
- Beforehand: work on abstract syntax trees from parsing the source code
  - But hard to know what will happen at runtime! Only one existing strategy fits in this category
Also manual vs. automatic
- Manual: rely on the user to perform actions
  - e.g., manual post-mortem analysis = look at visualizations and manually determine bottlenecks
  - The user is clever, but this analysis technique is hard to scale
- Semi-automatic: perform some work to make the user's job easier
  - e.g., filtering, aggregation, pattern matching
  - Can also be used to guide data collection at runtime
  - Most techniques try to strike a balance
    - Too automated: can miss things (the computer is dumb)
    - Too manual: high overhead for the user
- Automatic: no existing systems are really fully automatic

23 Post-mortem
Manual techniques
- Types
  - Let the user figure it out based on visualizations
    - Data can be very overwhelming!
  - Simulation based on data collected at runtime
  - "Traditional" analysis techniques (Amdahl's law, isoefficiency)
- De-facto standard for most existing tools
- Tools: Jumpshot, Paraver, VampirTrace, mpiP, SvPablo
Semi-automated techniques
- Let the machine do the hard work
- Types
  - Critical path analysis (sketched below), phase analysis (IPS-2)
  - Sensitivity analysis (S-Check)
  - Automatic event classification (machine learning)
  - Record overheads + predict effect of removing them (Scal-tool, SCALEA)
  - Knowledge based (Poirot, KAPPA-PI, FINESSE, KOJAK/EXPERT)
    - Knowledge representation techniques (ASL, EDL, EARL)
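To make one of these semi-automated techniques concrete, the sketch below computes a critical path over a task graph: the longest chain of dependent tasks, which bounds the parallel run time no matter how many processors are used. The tiny hard-coded graph and costs are illustrative only and the tasks are assumed to be numbered in topological order.

/* Sketch of critical-path analysis over a task graph. */
#include <stdio.h>

#define NTASKS 5

/* dep[i][j] != 0 means task j depends on task i (edge i -> j). */
static const int dep[NTASKS][NTASKS] = {
    {0,1,1,0,0},
    {0,0,0,1,0},
    {0,0,0,1,0},
    {0,0,0,0,1},
    {0,0,0,0,0},
};
static const double cost[NTASKS] = {2.0, 3.0, 1.0, 4.0, 2.0};

int main(void) {
    double finish[NTASKS];
    double longest = 0.0, total = 0.0;
    /* Tasks are in topological order, so one forward sweep suffices. */
    for (int j = 0; j < NTASKS; j++) {
        double ready = 0.0;
        for (int i = 0; i < j; i++)
            if (dep[i][j] && finish[i] > ready) ready = finish[i];
        finish[j] = ready + cost[j];
        if (finish[j] > longest) longest = finish[j];
        total += cost[j];
    }
    printf("critical path length = %.1f (vs. total work %.1f)\n", longest, total);
    return 0;
}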

24 On-line
Manual techniques
- Make the user perform analysis during execution
- Not a good idea! Too many things going on
Semi-automated techniques
- Try to reduce the overhead of full tracing
  - Look at a few metrics at a time
  - Most use dynamic, binary instrumentation
- Types
  - "Paradyn-like" approach
    - Start with hypotheses
    - Use refinements based on data collected at runtime
    - Paradyn, Peridot (not implemented?), OPAL (incremental approach)
  - Lost cycles (sample program state at runtime)
  - Trace file clustering

25 Pre-execution
Manual techniques
- Simulation & modeling (FASE approach at UF, etc.)
- Can be powerful, but
  - Computationally expensive to do accurately
  - High user overhead in creating models
Semi-automated techniques
- Hard to analyze a program automatically!
- One existing system: PPA
  - Parallel program analyzer
  - Works on the source code's abstract syntax tree
  - Requires compiler/parsing support
  - Vaporware?

26 Presentation Methodology

27 Purpose and Method
Purpose
- Discuss visualization concepts
- Present general approaches for performance visualization
- Summarize a formal user interface evaluation technique
- Discuss the integration of user feedback into a graphical interface
Methods
- Review and provide a concise summary of the literature in the area of visualization for parallel performance data
See the Presentation Methodology report for details

28 Summary of Visualizations
(columns: Visualization Name | Advantages | Disadvantages | Include in the PAT? | Used For)
- Animation | Adds another dimension to visualizations | CPU intensive | Yes | Various
- Program Graphs (N-ary tree) | Built-in zooming; integration of high- and low-level data | Difficult to see inter-process data | Maybe | Comprehensive program visualization
- Gantt Charts (time histogram; timeline) | Ubiquitous; intuitive | Not as applicable to shared memory as to message passing | Yes | Communication graphs
- Data Access Displays (2D array) | Provide detailed information regarding the dynamics of shared data | Narrow focus; users may not be familiar with this type of visualization | Maybe | Data structure visualization
- Kiviat Diagrams | Provide an easy way to represent statistical data | Can be difficult to understand | Maybe | Various statistical data (processor utilization, cache miss rates, etc.)
- Event Graph Displays (timeline) | Can be used to display multiple data types (event-based) | Mostly provides only high-level information | Maybe | Inter-process dependency

29 Evaluation of User Interfaces
General Guidelines
- Visualization should guide, not rationalize
- Scalability is crucial
- Color should inform, not entertain
- Visualization should be interactive
- Visualizations should provide meaningful labels
- The default visualization should provide useful information
- Avoid showing too much detail
- Visualization controls should be simple
GOMS
- Goals, Operators, Methods, and Selection Rules
- Formal user interface evaluation technique
- A way to characterize a set of design decisions from the point of view of the user
- A description of what the user must learn; may be the basis for reference documentation
- The knowledge is described in a form that can actually be executed (there have been several fairly successful attempts to implement GOMS analysis in software, e.g., GLEAN)
- There are various incarnations of GOMS with different assumptions useful for more specific analyses (KLM, CMN-GOMS, NGOMSL, CPM-GOMS, etc.)

30 Conclusion
Plan for development
- Develop a preliminary interface that provides the functionality required by the user while conforming to the visualization guidelines presented previously
- After the preliminary design is complete, elicit user feedback
- During periods where user contact is unavailable, we may be able to use GOMS analysis or another formal interface evaluation technique

31 Usability

32 Purpose and Method
Purpose
- Provide a discussion of the factors influencing the usability of performance tools
- Outline how to incorporate user-centered design into the PAT
- Discuss common problems seen in performance tools
- Present solutions to these problems
Method
- Review and provide a concise summary of the literature in the area of usability for parallel performance tools
See Combined Report - Section 6.4.1 for the complete description and reasons behind inclusion of various criteria

33 Usability Factors
Ease-of-learning
- Discussion
  - Important for attracting new users
  - A tool's interface shapes the user's understanding of its functionality
  - Inconsistency leads to confusion (example: providing defaults for some objects but not all)
- Conclusions
  - We should strive for an internally and externally consistent tool
  - Stick to established conventions
  - Provide as uniform an interface as possible
  - Target as many platforms as possible so the user can amortize the time invested over many uses
Ease-of-use
- Discussion
  - Amount of effort required to accomplish work with the tool
- Conclusions
  - Don't force the user to memorize information about the interface; use menus, mnemonics, and other mechanisms
  - Provide a simple interface
  - Make all user-required actions concrete and logical
Usefulness
- Discussion
  - How directly the tool helps the user achieve their goal
- Conclusion
  - Make the common case simple, even if that makes the rare case complex
Throughput
- Discussion
  - How the tool contributes to user productivity in general
- Conclusions
  - Keep in mind that the inherent goal of the tool is to increase user productivity

34 User-Centered Design
General Principles
- Usability will be achieved only if the software design process is user-driven
- Understand the target users
- Usability should be the driving factor in tool design
Four-step model to incorporate user feedback (chronological)
1. Ensure initial functionality is based on user needs
   - Solicit input directly from the users: MPI users, UPC/SHMEM users, meta-user
   - We can't just go by what we think is useful
2. Analyze how users identify and correct performance problems
   - UPC/SHMEM users primarily
   - Gain a better idea of how the tool will actually be used on real programs
   - Information from users is then presented to the meta-user for critique/feedback
3. Develop incrementally
   - Organize the interface so that the most useful features are the best supported
   - User evaluation of preliminary/prototype designs
   - Maintain a strong relationship with the users to whom we have access
4. Have users evaluate every aspect of the tool's interface structure and behavior
   - Alpha/beta testing
   - User tests should be performed at many points along the way
   - Feature-by-feature refinement in response to specific user feedback

35 UPC/SHMEM Language Analysis

36 Purpose and Method
Purpose
- Determine performance factors purely from the language's perspective
- Correlate performance factors to individual UPC/SHMEM constructs (see the sketch below)
Method
- Come up with a complete and minimal factor list
- Analyze the UPC and SHMEM (Quadrics and SGI) specs
- Analyze the various implementations
  - Berkeley & Michigan UPC: translated file + system code
  - HP UPC: pending until the NDA process is completed
  - GPSHMEM: based on system code
See the Language Analysis report for complete details
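As an example of tying a performance factor to a language construct, consider shared-array affinity in UPC (a C extension): whether a reference to a shared element is local or remote is exactly the kind of factor the analysis has to attach to an individual construct. The fragment below is a generic UPC illustration, not code drawn from any of the analyzed implementations.

/* UPC fragment: the affinity clause of upc_forall keeps each iteration on the
 * thread that owns a[i], so the access below is local.  Dropping the clause
 * (or indexing a[(i + 1) % N]) would turn many accesses into remote gets —
 * a language-level performance factor the PAT should attribute to this loop. */
#include <upc.h>

#define N 1024
shared int a[N];                 /* default block size 1: elements distributed cyclically */

long local_sum(void)
{
    long sum = 0;
    int i;
    upc_forall (i = 0; i < N; i++; &a[i])    /* iteration i runs on the owner of a[i] */
        sum += a[i];
    return sum;                  /* each thread returns the sum of its own elements */
}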

37 Tool Evaluation Strategy

38 Purpose and Method
Purpose:
- Provide the basis for evaluation of existing tools
Method:
- Literature search on existing evaluation methods
- Categorize, add, and filter applicable criteria
- Evaluate the importance of these criteria
- Summary table of the final 23 criteria
See Combined Report - Section 9 for the complete description and reasons behind inclusion of various criteria

39 Feature (section) | Description | Information to gather | Categories | Importance Rating
- Available metrics (9.2.1.3) | Kinds of metrics/events the tool can track (e.g., function, hardware, synchronization) | Metrics it can provide (function, hw, ...) | Productivity | Critical
- Cost (9.1.1) | Physical cost of obtaining the software, license, etc. | How much? | Miscellaneous | Average
- Documentation quality (9.3.2) | Helpfulness of the documentation in terms of understanding the tool's design and its usage (usage more important) | Clear documentation? Helpful documentation? | Miscellaneous | Minor
- Extendibility (9.3.1) | Ease of (1) adding new metrics, (2) extending to new languages, particularly UPC/SHMEM | (1) Estimate of how easy it is to extend to UPC/SHMEM; (2) How easy is it to add new metrics? | Miscellaneous | Critical
- Filtering and aggregation (9.2.3.1) | Filtering is the elimination of "noise" data; aggregation is the combining of data into a single meaningful event | Does it provide filtering? Aggregation? To what degree? | Productivity, Scalability | Critical
- Hardware support (9.1.4) | Hardware support of the tool | Which platforms? | Usability, Portability | Critical
- Heterogeneity support (9.1.5) | Ability to run the tool in a system where nodes have different HW/SW configurations | Supports running in a heterogeneous environment? | Miscellaneous | Minor

40 (evaluation criteria, continued)
- Installation (9.1.2) | Ease of installing the tool | (1) How to get the software; (2) How hard is it to install; (3) Components needed; (4) Estimated number of hours needed for installation | Usability | Minor
- Interoperability (9.2.2.2) | Ease of viewing the tool's results with another tool, using other tools in conjunction with this tool, etc. | List of other tools that can be used with this one | Portability | Average
- Learning curve (9.1.6) | Learning time required to use the tool | Estimated learning time for the basic set of features and the complete set of features | Usability, Productivity | Critical
- Manual overhead (9.2.1.1) | Amount of work needed by the user to instrument their program | (1) Method for manual instrumentation (source code, instrumentation language, etc.); (2) Automatic instrumentation support | Usability, Productivity | Average
- Measurement accuracy (9.2.2.1) | Accuracy level of the measurement | Evaluation of the measuring method | Productivity, Portability | Critical
- Multiple analyses (9.2.3.2) | Amount of post-measurement analysis the tool provides; generally good to have different analyses for the same set of data | Provides multiple analyses? Useful analyses? | Usability | Average
- Multiple executions (9.3.5) | Tool support for executing multiple programs at once | Supports multiple executions? | Productivity | Minor-Average
- Multiple views (9.2.4.1) | Tool's ability to provide different views/presentations of the same set of data | Provides multiple views? Intuitive views? | Usability, Productivity | Critical

41 (evaluation criteria, continued)
- Performance bottleneck identification (9.2.5.1) | Tool's ability to identify the point of a performance bottleneck and to help resolve the problem | Supports automatic bottleneck identification? How? | Productivity | Minor-Average
- Profiling / tracing support (9.2.1.2) | Method of profiling/tracing the tool utilizes | (1) Profiling? Tracing? (2) Trace format; (3) Trace strategy; (4) Mechanism for turning tracing on and off | Productivity, Portability, Scalability | Critical
- Response time (9.2.6) | Amount of time needed before any useful information is fed back to the user after program execution | How long does it take to get useful information back? | Productivity | Average
- Searching (9.3.6) | Tool support for searching for a particular event or set of events | Supports data searching? | Productivity | Minor
- Software support (9.1.3) | Software support of the tool | (1) Libraries it supports; (2) Languages it supports | Usability, Productivity | Critical
- Source code correlation (9.2.4.2) | Tool's ability to correlate event data back to the source code | Able to correlate performance data to source code? | Usability, Productivity | Critical
- System stability (9.3.3) | Stability of the tool | Crash rate | Usability, Productivity | Average
- Technical support (9.3.4) | Responsiveness of the tool developer | (1) Time to get a response from the developer; (2) Quality/usefulness of system messages | Usability | Minor-Average

42 Tool Evaluations

43 Purpose and Method
Purpose
- Evaluation of existing tools
Method
- Pick a set of modern performance tools to evaluate
  - Try to pick the most popular tools
  - Also pick tools that are innovative in some form
- For each tool, evaluate and score using the standard set of criteria
- Also
  - Evaluate against a set of programs with known bottlenecks to test how well each tool helps improve performance
  - Attempt to find out which metrics are recorded by a tool and why
Tools: TAU, PAPI, Paradyn, MPE/Jumpshot-4, mpiP, Vampir/VampirTrace (now Intel cluster tools), Dynaprof, KOJAK, SvPablo [in progress], MPICL/ParaGraph [in progress]
See the Tool Evaluation presentations for the complete evaluation of each tool

44 Instrumentation Methods
Instrumentation methodology
- Most tools use the MPI profiling interface (see the sketch below)
  - Reduces instrumentation overhead for the user and the tool developer
  - We are exploring ways to create and use something similar for UPC and SHMEM
- A few tools use dynamic, binary instrumentation
  - Paradyn and Dynaprof are examples
  - Makes things very easy for the user, but very complicated for the tool developer
- Tools that rely entirely on manual instrumentation can be very frustrating to use!
  - We should avoid this by using existing instrumentation libraries and code from other projects
Instrumentation overhead
- Most tools achieved less than 20% overhead for the default set of instrumentation
- Seems to be a likely target we should aim for in our tool
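The MPI profiling interface works by letting the tool define the MPI_* entry points itself and forward to the real implementation through the name-shifted PMPI_* equivalents, which is what keeps instrumentation overhead low for both the user and the tool writer. Below is a minimal MPI_Send wrapper as a sketch of the mechanism; the timing and the printed record are our own illustration, and a UPC/SHMEM analogue would need comparable hooks in those runtimes.

/* Minimal MPI profiling-interface wrapper: the tool's library provides
 * MPI_Send, records whatever it wants, and forwards to the name-shifted
 * PMPI_Send in the MPI library.  Linked ahead of the MPI library, it
 * intercepts every MPI_Send without touching user source code.
 * (Older MPI versions declare buf without const.) */
#include <mpi.h>
#include <stdio.h>

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);   /* the real send */
    double elapsed = MPI_Wtime() - t0;

    int size;
    PMPI_Type_size(datatype, &size);
    fprintf(stderr, "MPI_Send: %d bytes to rank %d in %.6f s\n",
            count * size, dest, elapsed);                        /* simple trace record */
    return rc;
}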

45 Visualizations
Many tools provide only one way of looking at things
- "Do one thing, but do it well"
- Can cause problems if performance is hindered by something not being shown
Gantt-chart/timeline visualizations are most prevalent
- Especially in MPI-specific tools
Tools that allow multiple ways of looking at things can ease analysis
- However, too many methods can become confusing
- Best to use a few visualizations that display different information
In general, creating good visualizations is not trivial
- Some visualizations that look neat aren't necessarily useful
- We should try to export to known formats (Vampir, etc.) to leverage existing tools and code

46 Bottleneck Detection
To test, used PPerfMark
- Extension of the GrindStone benchmark suite for MPI applications
- Contains short (<100 lines of C code) applications with "obvious" bottlenecks
Most tools rely on the user to pick out bottlenecks from visualizations
- This affects the scalability of the tool as the size of the system increases
- Notable exceptions: Paradyn, KOJAK
In general, most tools fared well
- The "system time" benchmark was the hardest to pick out
- Tools that lack source code correlation also make it hard to track down where a bottleneck occurs
The best strategy seems to be a combination of trace visualization and automatic analysis

47 Conclusions and Status [1]
Completed tasks
- Programming practices
  - Mod 2^n inverse, convolution, CAMEL cipher, concurrent wave equation, depth-first search
- Literature searches/preliminary research
  - Experimental performance measurement techniques
  - Language analysis for UPC (spec, Berkeley, Michigan) and SHMEM (spec, GPSHMEM, Quadrics SHMEM, SGI SHMEM)
  - Optimizations
  - Performance analysis strategies
  - Performance factors
  - Presentation methodologies
  - Performance modeling and prediction
- Creation of tool evaluation strategy
- Tool evaluations
  - Paradyn, TAU, PAPI/Perfometer, MPE/Jumpshot, Dimemas/Paraver/MPITrace, mpiP, Intel cluster tools, Dynaprof, KOJAK

48 Conclusions and Status [2]
Tasks currently in progress
- Finish tool evaluations
  - SvPablo and MPICL/ParaGraph
- Finish up language analysis
  - Waiting on NDAs for HP UPC
  - Also on access to a Cray machine
- Write tool evaluation and language analysis reports
- Creation of high-level PAT design documents (starting week of 3/28/2005)
  - Creating a requirements list
  - Generating a specification for each requirement
  - Creating a design plan based on the specifications and requirements
For more information, see the PAT Design Plan on the project website
