Presentation is loading. Please wait.

Presentation is loading. Please wait.

Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony, Department of Computer and Information.

Similar presentations


Presentation on theme: "Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony, Department of Computer and Information."— Presentation transcript:

1 Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony, matt}@cs.uoregon.edu Department of Computer and Information Science University of Oregon beckman@mcs.anl.gov l Mathematics and Computer Science Division Argonne National Laboratory The Ghost in the Machine Observing the Effects of Kernel Operation on Parallel Application Performance

2 2 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Outline of the Talk  Part A: Operating System Interference & Us  Introduction to OS/R Interference  Research Questions & Our Contribution  Contrasting to State-of-the-Art Noise Measurement  Part B  The (K)TAU Performance System  Fine-grained OS/App performance Correlation  The Noise-Effect Estimation Technique Described  Part C  Evaluating accuracy of Noise-Estimation @ Scale  Demonstrating Noise-Estimation  Conclusion & Future Plans

3 3 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Parallel Applications’ Noise Propagation Introduction to Operating System Interference Waiting time due to OS Overhead accumulates Local Cycles: 3 Global Cycles: 9 Compute w w Collective Compute w w Collective Compute w w Collective os Simple Parallel Application Phases of Alternating Communication & Reduction What is the cause of variability?

4 4 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Is OS Noise a problem? When?  Earlier work has shown significant OS/R interference problems  Petrini et. al. SC’03  ASCI Q, P=4,096 | SAGE took twice as long predicted by performance model  Jones et. al. SC’03,  IBM SP | Degraded linear scaling performance of allreduce()  Cause identified as OS/R Interference  Noise Distribution Matters  Agarwal et. al. HIPC’05 - Use theoretical approach  Heavy-tailed & Bernoulli noise most detrimental  Beckman et. al. CCJ’07 - Noise Emulation (Injection)  Max Noise duration, for some noise-ratio, determines noise effects on collectives  Our own noise-injection experiments show noise can be a problem Introduction to Operating System Interference

5 5 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Is OS Noise a problem? When? Introduction to Operating System Interference Effect of Noise Duration on Allreduce() P. Beckman, K. Iskra, K. Yoshii, S. Coghlan, A. Nataraj “Benchmarking the Effects of OS Interference on Extreme-Scale Parallel Machines” Cluster Computing Journal ’07, Special Issue from CLUSTER’06 (to appear). Noise Ratio (or %Noise) Not good indicator by itself Noise-event Duration Significantly influences effect => Allreduce() tight-loop => %Noise Fixed: 0.15% => Increase Noise-Event Duration

6 6 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Is OS Noise a problem? When? Introduction to Operating System Interference Parallel Ocean Program (POP) - Effect of Noise at Scale Yes. Noise does affect (some) real applications at scale. Different parts of application have different sensitivities to noise Total: 23% @ 1024 | 32% @ 2048 Baroclinic: 1% @ 1024 | 3% @ 2048 Barotropic: 75% @ 1024 | 86% @ 2048 POP, 1x Problem BG/L, VN Mode P = 1024, 2048 Noise: 1000HZ, 16 usec 1.6% Noise Total = Baroclinic + Barotropic Percent Increase in Runtime No. of Processors

7 7 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Motivating a Direct Measurement Approach to Noise Estimation  OS/R Interference of Parallel Applications is a complex phenomenon  Depends on the underlying platform (e.g. interconnect latencies, [*others?])  Depends on the OS/R configuration & resulting noise distributions  Depends on parallel application behavior (comp. grain, global interactions)  Simulation, Emulation, Micro-benchmarking, Modeling [*what else?]  All important parts of the performance tool-box  But need to be carefully parameterized to match the above dependencies  Not all complex parallel applications have been modeled  And models/simulations can be too simple or wrongly parameterized  Micro-benchmarks do not capture application complexity (intentionally)  What is App.X’s delay due to existing/intrinsic noise on Platform.Y?  Need general measurement technique to answer that question  Complementary to the other approaches in the tool-box Introduction to Operating System Interference

8 8 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Who cares about noise? Their concerns?  System administrators / Site Managers:  What are noise characteristics of new tera/peta-scale platform we are purchasing?  Do the site’s “champion” applications suffer significant noise-slowdowns on it?  Can these effects be mitigated by changes to configuration (OS/R, H/W)?  Application Developers:  Is my parallel application sensitive to noise? On a particular platform?  Can I change application characteristics to reduce that sensitivity?  What parts and call-sequences are most sensitive?  OS/R Developers:  What are the noise sources in the OS/R?  What are the effects of different sources? Do all need to be removed?  Not all are comparative - cannot simply look at difference in runtime  No “normal” (or baseline) noise-less candidate to compare with Introduction to Operating System Interference

9 9 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Our Key Contribution  A general noise-effect estimation technique based on direct measurement  To answer the performance questions  Q.1: Where is the OS/R noise on Platform.Y originating from (sources)?  Q.2: What is App.X’s delay D, due to existing OS/R Interference on Platform.Y?  Q.3: How do different OS/R noise sources contribute to delay D?  Q.4: Which parts (operations, call-seq) of the applications are sensitive?  Applies to any X & Y, (as long as X & Y allow the needed direct measurement)  Without requiring “baseline” or “noise-less” candidate. Why?  If “noiseless” candidate exists ==> then where is the noise-problem?  A “baseline” will make answering Q.2 trivial. No special technique. Just Runtime.  Can only answer comparative questions - effect of adding/removing noise.  Wait a sec... Can’t existing work/methods already answer these questions?  For Q.1, yes. But no general approach for Q.2 - Q.4. Lets look at some. Introduction to Operating System Interference

10 10 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Contrasting with the State-of-the-Art  Petrini et. al. SC’03 - Micro-benchmark, simulation, modeling to close the loop  Very specific to application (SAGE). Accurate model difficult to produce.  Gioiosa et. al. ISSIPIT’04 - Microbenchmark & measurement  Identify noise sources using OProfile sampling. Quantifies only local-noise.  Agarwal et. al. HiPC’05 - Modeling  Theoretical model used to study noise effects under different distributions  Assumptions include: Balanced Load, Stationary, Balanced Noise, Identical noise  Beckman et. al. CLUSTER’06 - Noise emulation  Injected artificial noise into micro-benchmarks to understand effects  Approach not applicable to measurement of real, existing noise  Sottile et. al. IPDPS’06 - Trace-based Noise Simulation  Modify application traces by adding artificial noise. Compare effect on runtime.  Nataraj et. al. CLUSTER’06 / Europar’06 - Integrated Measurement of OS / Application using KTAU / TAU. Detect & Quantify Per-Process OS interactions. Introduction to Operating System Interference

11 11 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Part B

12 12 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 The (K)TAU Performance System [* TODO]  What is TAU? Features? Who uses it?  What is KTAU? Why? ZeptoOS?  Tight Integration with TAU - why? Introduction to Operating System Interference

13 13 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 KTAU OS Metrics - How it works... schedule timer_interrupt schedule timer_interrupt sys_read MPI Rank 0 Application User Kernel KTAU Performance State PID: 1423 User-level Double-Buffered Metric Container On schedule() update counter get_shared_counter() 14 schedule timer_interrupt schedule timer_interrupt sys_read MPI Rank 1 Application User KTAU Performance State PID: 1430 User-level Double-Buffered Metric Container On timer_interrupt update counter get_shared_counter() sys_write do_IRQ

14 14 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Performance of KTAU OS Metrics & TAU MPI Tracing

15 15 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Integrated User/OS Profiling Virualized OS Metrics Per-MPI Rank | Detects & Quantifies OS/R Interactions

16 16 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Integrated User/OS Phase Profiling [* TODO]  Can attribute OS Noise to specific Phases in MPI Ranks  Again Can detect & quantify, but at finer grain within phases.  ToDo collect data on garuda (or neuronic) easy to do.

17 17 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Noise-Effect Estimation Technique Outlined - Example A time Measured Timeline Approximated Timeline Noise Amplification with Wait time Process 1 Sends; Process 2 Receives Noise-Effect Estimation Technique Outlined

18 18 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Noise-Effect Estimation Technique Outlined - Example B time Measured Timeline Approximated Timeline Noise Absorption with Wait time Process 1 Sends; Process 2 Receives Noise-Effect Estimation Technique Outlined

19 19 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Noise-Effect Estimation Technique Outlined - Example C  Multiple Sources - Combined Noise-Effect  Have the pic. must place here. ToDo

20 20 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 Prior Work on Trace-based Timeline Approximation  Wolf, Malony, Shende, Morris | HPCC’06  Use timeline approximation from application traces.  Context: Measurement perturbation compensation, not noise-effect estimation.  Sottile, Chandu, Bader | IPDPS’06  Trace-based simulation  Inject artificial noise into existing application traces  Reverse of our current work - we remove real noise from the trace.  Current work inspired by and builds upon the above two works

21 21 Observing the Effects of Kernel Operation on Parallel Application PerformanceSC 2007 [* ToDo] Part C


Download ppt "Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman l {anataraj, amorris, malony, Department of Computer and Information."

Similar presentations


Ads by Google