Published by Corey Webb. Modified over 9 years ago.
1
The Ghost in the Machine
Observing the Effects of Kernel Operation on Parallel Application Performance
Aroon Nataraj, Alan Morris, Allen Malony, Matthew Sottile, Pete Beckman
{anataraj, amorris, malony, matt}@cs.uoregon.edu
Department of Computer and Information Science, University of Oregon
beckman@mcs.anl.gov
Mathematics and Computer Science Division, Argonne National Laboratory
2
Observing the Effects of Kernel Operation on Parallel Application Performance, SC 2007
Outline of the Talk
Part A: Operating System Interference & Us
  Introduction to OS/R Interference
  Research Questions & Our Contribution
  Contrasting with State-of-the-Art Noise Measurement
Part B
  The (K)TAU Performance System
  Fine-grained OS/App Performance Correlation
  The Noise-Effect Estimation Technique Described
Part C
  Evaluating Accuracy of Noise-Estimation @ Scale
  Demonstrating Noise-Estimation
  Conclusion & Future Plans
3
Parallel Applications’ Noise Propagation
Introduction to Operating System Interference
Simple parallel application: phases of alternating communication & reduction
What is the cause of variability?
Waiting time due to OS overhead accumulates
Local cycles: 3; global cycles: 9
[Figure: timeline of three Compute phases, each followed by a Collective, with wait (w) segments and OS noise (os) marked]
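A minimal sketch of how waiting time can accumulate, under one reading of the slide's figures: three ranks, three phases, and a 3-cycle noise event that hits a different rank in each phase. The compute grain of 10 cycles and the placement of the noise are assumptions for illustration, not values from the talk. Because each phase ends in a collective, the phase lasts as long as its slowest rank.

```python
def run_phases(compute, noise):
    """Total runtime of a bulk-synchronous run.

    compute: cycles of useful work per rank per phase (same for all ranks).
    noise[phase][rank]: OS noise cycles suffered by that rank in that phase.
    Every phase ends with a collective, so it takes as long as the slowest rank.
    """
    total = 0
    for phase_noise in noise:
        total += compute + max(phase_noise)
    return total


COMPUTE = 10  # assumed compute grain, in cycles

# Assumed noise placement: one 3-cycle event per phase, each on a different rank.
noise = [
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 3],
]
quiet = [[0, 0, 0] for _ in noise]

# Noise each rank experiences locally, summed over all phases.
local_noise = [sum(phase[rank] for phase in noise) for rank in range(3)]

# Delay of the whole application relative to a noise-free run.
global_delay = run_phases(COMPUTE, noise) - run_phases(COMPUTE, quiet)

print("local noise per rank:", local_noise)
print("global delay:", global_delay)
```

In this model every rank loses only 3 cycles to its own noise, yet the run as a whole is delayed by 9 cycles, because in each phase the other ranks wait for the one that was interrupted.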
4
Is OS Noise a problem? When?
Earlier work has shown significant OS/R interference problems
  Petrini et al., SC’03: ASCI Q, P=4,096 | SAGE took twice as long as predicted by the performance model
  Jones et al., SC’03: IBM SP | Degraded linear scaling performance of allreduce()
  Cause identified as OS/R interference
Noise distribution matters
  Agarwal et al., HiPC’05: theoretical approach; heavy-tailed & Bernoulli noise most detrimental
  Beckman et al., CCJ’07: noise emulation (injection); max noise duration, for some noise ratio, determines noise effects on collectives
Our own noise-injection experiments show noise can be a problem
5
Is OS Noise a problem? When?
Effect of Noise Duration on Allreduce()
  Allreduce() tight loop
  %Noise fixed: 0.15%
  Increase noise-event duration
Noise ratio (or %Noise): not a good indicator by itself
Noise-event duration: significantly influences the effect
P. Beckman, K. Iskra, K. Yoshii, S. Coghlan, A. Nataraj, “Benchmarking the Effects of OS Interference on Extreme-Scale Parallel Machines,” Cluster Computing Journal ’07, Special Issue from CLUSTER’06 (to appear).
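The experiment holds the noise ratio at 0.15% while lengthening each noise event, which implies that the events must become rarer as they become longer. A small sketch of that relationship, assuming periodic noise so that ratio = frequency x duration; the three event durations are hypothetical and chosen only for illustration.

```python
def events_per_second(noise_ratio, event_duration_s):
    """Event frequency needed to hold a noise ratio at a given event duration."""
    return noise_ratio / event_duration_s


RATIO = 0.0015  # 0.15% noise, the fixed value on the slide

# Hypothetical event durations in microseconds (not taken from the talk).
durations_us = [10, 100, 1000]
schedule = {d: events_per_second(RATIO, d * 1e-6) for d in durations_us}

for d, freq in schedule.items():
    print(f"{d:>5} us events -> {freq:7.1f} events/s (one every {1e3 / freq:.2f} ms)")
```

All three settings steal the same fraction of CPU time, so any difference in their effect on Allreduce() has to come from the event duration, which is the point the slide makes about %Noise not being a good indicator by itself.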
6
Is OS Noise a problem? When?
Parallel Ocean Program (POP) - Effect of Noise at Scale
Setup: POP, 1x problem; BG/L, VN mode; P = 1024, 2048; noise: 1000 Hz, 16 usec (1.6% noise)
[Figure: percent increase in runtime vs. number of processors; Total = Baroclinic + Barotropic]
  Total: 23% @ 1024 | 32% @ 2048
  Baroclinic: 1% @ 1024 | 3% @ 2048
  Barotropic: 75% @ 1024 | 86% @ 2048
Yes. Noise does affect (some) real applications at scale.
Different parts of the application have different sensitivities to noise
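The arithmetic behind this slide can be checked in a few lines. The sketch below assumes periodic noise (ratio = frequency x duration) and assumes that each percentage is the increase of that component over its own noise-free runtime. Under those assumptions the total slowdown is a weighted average of the two component slowdowns, so the noise-free share of the barotropic part can be backed out. The derived shares are an inference from the slide's rounded numbers, not figures reported in the talk.

```python
def noise_ratio(frequency_hz, duration_s):
    """Fraction of time taken by periodic noise events."""
    return frequency_hz * duration_s


def sensitive_share(total_pct, low_pct, high_pct):
    """Noise-free runtime share s of the more sensitive part.

    Solves total = low * (1 - s) + high * s for s.
    """
    return (total_pct - low_pct) / (high_pct - low_pct)


ratio = noise_ratio(1000, 16e-6)          # 1000 Hz, 16 usec
share_1024 = sensitive_share(23, 1, 75)   # Total, Baroclinic, Barotropic @ 1024
share_2048 = sensitive_share(32, 3, 86)   # Total, Baroclinic, Barotropic @ 2048

print(f"injected noise ratio: {ratio:.1%}")
print(f"implied barotropic share @ 1024: {share_1024:.2f}")
print(f"implied barotropic share @ 2048: {share_2048:.2f}")
```

Read this way, the barotropic phase would account for roughly 30% to 35% of the noise-free runtime while contributing almost all of the slowdown, consistent with the slide's remark that different parts of the application differ in sensitivity.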
7
Motivating a Direct Measurement Approach to Noise Estimation
OS/R interference of parallel applications is a complex phenomenon
  Depends on the underlying platform (e.g. interconnect latencies, [*others?])
  Depends on the OS/R configuration & resulting noise distributions
  Depends on parallel application behavior (comp. grain, global interactions)
Simulation, emulation, micro-benchmarking, modeling [*what else?]
  All important parts of the performance tool-box
  But need to be carefully parameterized to match the above dependencies
  Not all complex parallel applications have been modeled
  And models/simulations can be too simple or wrongly parameterized
  Micro-benchmarks do not capture application complexity (intentionally)
What is App.X’s delay due to existing/intrinsic noise on Platform.Y?
  Need a general measurement technique to answer that question
  Complementary to the other approaches in the tool-box
8
Who cares about noise? Their concerns?
System administrators / site managers:
  What are the noise characteristics of the new tera/peta-scale platform we are purchasing?
  Do the site’s “champion” applications suffer significant noise slowdowns on it?
  Can these effects be mitigated by changes to configuration (OS/R, H/W)?
Application developers:
  Is my parallel application sensitive to noise? On a particular platform?
  Can I change application characteristics to reduce that sensitivity?
  What parts and call-sequences are most sensitive?
OS/R developers:
  What are the noise sources in the OS/R?
  What are the effects of different sources? Do all need to be removed?
Not all of these questions are comparative - cannot simply look at the difference in runtime
  No “normal” (or baseline) noise-less candidate to compare with
9
Our Key Contribution
A general noise-effect estimation technique based on direct measurement
To answer the performance questions:
  Q.1: Where is the OS/R noise on Platform.Y originating from (sources)?
  Q.2: What is App.X’s delay D, due to existing OS/R interference on Platform.Y?
  Q.3: How do different OS/R noise sources contribute to delay D?
  Q.4: Which parts (operations, call-seq) of the application are sensitive?
Applies to any X & Y (as long as X & Y allow the needed direct measurement)
Without requiring a “baseline” or “noise-less” candidate. Why?
  If a “noiseless” candidate exists ==> then where is the noise problem?
  A “baseline” will make answering Q.2 trivial. No special technique. Just runtime.
  Can only answer comparative questions - effect of adding/removing noise.
Wait a sec... Can’t existing work/methods already answer these questions?
  For Q.1, yes. But no general approach for Q.2 - Q.4. Let’s look at some.
10
Contrasting with the State-of-the-Art
Petrini et al., SC’03 - Micro-benchmark, simulation, modeling to close the loop
  Very specific to application (SAGE). Accurate model difficult to produce.
Gioiosa et al., ISSPIT’04 - Micro-benchmark & measurement
  Identify noise sources using OProfile sampling. Quantifies only local noise.
Agarwal et al., HiPC’05 - Modeling
  Theoretical model used to study noise effects under different distributions
  Assumptions include: balanced load, stationary, balanced noise, identical noise
Beckman et al., CLUSTER’06 - Noise emulation
  Injected artificial noise into micro-benchmarks to understand effects
  Approach not applicable to measurement of real, existing noise
Sottile et al., IPDPS’06 - Trace-based noise simulation
  Modify application traces by adding artificial noise. Compare effect on runtime.
Nataraj et al., CLUSTER’06 / Euro-Par’06 - Integrated measurement of OS / application using KTAU / TAU
  Detect & quantify per-process OS interactions.
11
Part B
12
The (K)TAU Performance System [* TODO]
What is TAU? Features? Who uses it?
What is KTAU? Why? ZeptoOS?
Tight integration with TAU - why?
13
KTAU OS Metrics - How it works...
[Figure: two MPI ranks, each drawn as an application spanning user and kernel levels]
MPI Rank 0 (PID: 1423): kernel events schedule, timer_interrupt, sys_read; KTAU performance state; user-level double-buffered metric container; on schedule(), update counter; get_shared_counter()
MPI Rank 1 (PID: 1430): kernel events schedule, timer_interrupt, sys_read; KTAU performance state; user-level double-buffered metric container; on timer_interrupt, update counter; get_shared_counter()
Other kernel events shown: sys_write, do_IRQ
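A toy model of what a double-buffered metric container could look like, to make the slide's flow concrete: a kernel-side event updates a counter, and the application reads it through get_shared_counter(). Only that function name comes from the slide. The class, its layout, the update() method and the publish-by-flip scheme are assumptions made for illustration and do not describe KTAU's actual implementation.

```python
class DoubleBufferedCounters:
    """Illustrative model only: two copies of the counters plus an index
    saying which copy readers may use. The writer changes the other copy,
    then flips the index, so readers never see a half-updated set."""

    def __init__(self, names):
        self._buffers = [dict.fromkeys(names, 0), dict.fromkeys(names, 0)]
        self._active = 0  # buffer currently visible to readers

    def update(self, name, delta=1):
        """Writer side (stands in for the kernel on schedule() or timer_interrupt)."""
        back = 1 - self._active
        self._buffers[back] = dict(self._buffers[self._active])  # start from the published state
        self._buffers[back][name] += delta
        self._active = back  # publish the new state

    def get_shared_counter(self, name):
        """Reader side (stands in for the user-level call named on the slide)."""
        return self._buffers[self._active][name]


# Hypothetical per-process container for the rank with PID 1423.
rank0 = DoubleBufferedCounters(["schedule", "timer_interrupt"])

before = rank0.get_shared_counter("schedule")
rank0.update("schedule")          # two scheduling events occur while a region runs
rank0.update("schedule")
rank0.update("timer_interrupt")
after = rank0.get_shared_counter("schedule")

print("schedule events during region:", after - before)
```

Reading a counter before and after a region of application code, as the last lines do, is one way kernel activity could be attributed to that region without a system call per read.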
14
Performance of KTAU OS Metrics & TAU MPI Tracing
15
Integrated User/OS Profiling
Virtualized OS Metrics Per-MPI Rank | Detects & Quantifies OS/R Interactions
16
Integrated User/OS Phase Profiling [* TODO]
Can attribute OS noise to specific phases in MPI ranks
Again, can detect & quantify, but at a finer grain within phases.
ToDo: collect data on garuda (or neuronic); easy to do.
17
Noise-Effect Estimation Technique Outlined - Example A
Noise Amplification with Wait Time
Process 1 sends; Process 2 receives
[Figure: measured timeline vs. approximated timeline, along a time axis]
18
Noise-Effect Estimation Technique Outlined - Example B
Noise Absorption with Wait Time
Process 1 sends; Process 2 receives
[Figure: measured timeline vs. approximated timeline, along a time axis]
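Examples A and B can be sketched with a two-process model in which Process 1 sends and Process 2 receives. This is a simplified illustration of comparing a measured timeline with an approximated, noise-free one. The function names, the fixed message latency and all numeric values are assumptions, and the model is not the authors' actual algorithm.

```python
def receive_completion(sender_work, sender_noise, receiver_work, receiver_noise, latency):
    """Time at which Process 2's receive completes.

    Process 1 works (plus any noise), then sends. Process 2 works (plus any
    noise), then posts its receive, which completes when both the receive has
    been posted and the message has arrived.
    """
    send_time = sender_work + sender_noise
    post_time = receiver_work + receiver_noise
    return max(post_time, send_time + latency)


def noise_effect(sender_work, sender_noise, receiver_work, receiver_noise, latency=1):
    """Delay at the receiver: measured timeline minus approximated (noise removed)."""
    measured = receive_completion(sender_work, sender_noise,
                                  receiver_work, receiver_noise, latency)
    approximated = receive_completion(sender_work, 0, receiver_work, 0, latency)
    return measured - approximated


# Case 1 (assumed values): 3 units of noise hit the sender. The receiver has no
# local noise, but its wait for the message grows by the full 3 units.
propagated = noise_effect(sender_work=10, sender_noise=3,
                          receiver_work=8, receiver_noise=0)

# Case 2 (assumed values): 3 units of noise hit the receiver while the message is
# still far away. The receiver would have waited anyway, so the noise is absorbed.
absorbed = noise_effect(sender_work=10, sender_noise=0,
                        receiver_work=5, receiver_noise=3)

print("delay when the sender is interrupted:", propagated)
print("delay when an early receiver is interrupted:", absorbed)
```

The point of the comparison is that the delay a process sees is not simply its own local noise: wait time can pass noise from one process to another, or hide it completely.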
19
Noise-Effect Estimation Technique Outlined - Example C
Multiple Sources - Combined Noise-Effect
Have the pic. Must place here. ToDo
20
Prior Work on Trace-based Timeline Approximation
Wolf, Malony, Shende, Morris | HPCC’06
  Use timeline approximation from application traces.
  Context: measurement perturbation compensation, not noise-effect estimation.
Sottile, Chandu, Bader | IPDPS’06
  Trace-based simulation
  Inject artificial noise into existing application traces
  Reverse of our current work - we remove real noise from the trace.
Current work is inspired by and builds upon the above two works
21
[* ToDo] Part C