XDMoD Overview
Tom Furlani, Director
September 19, 2015
Outline
- Motivation
- XDMoD Portal
- Measuring QoS
- Job-Level Performance
XDMoD: Comprehensive HPC Management
- 5-year NSF grant (XD Net Metrics Service, XMS)
- XDMoD: XD Metrics on Demand*
  - Analytics framework developed for XSEDE
  - On-demand, responsive access to job accounting data
- Comprehensive framework for HPC management
  - Support for several resource managers (Slurm, PBS, LSF, SGE)
  - Utilization metrics across multiple dimensions
  - Measure QoS of the HPC infrastructure (application kernels)
  - Job-level performance data (SUPReMM)
- Open XDMoD: open-source version for HPC centers
  - 100+ academic and industrial installations worldwide
1. J.T. Palmer, S.M. Gallo, T.R. Furlani, M.D. Jones, R.L. DeLeon, J.P. White, N. Simakov, A.K. Patra, J. Sperhac, T. Yearke, R. Rathsam, M. Innus, C.D. Cornelius, J.C. Browne, W.L. Barth, and R.T. Evans, "Open XDMoD: A Tool for the Comprehensive Management of High Performance Computing Resources," Computing in Science and Engineering, vol. 17, no. 4, pp. 52-62, July-August 2015, DOI: 10.1109/MCSE.2015.68
Motivation: Improve User Experience
- The user shouldn't be the "canary in the coal mine" who identifies problems
- Example: log file analysis discovers two malfunctioning nodes
Motivation: Improve User Throughput
- Software tools to automatically identify poorly performing jobs
- Example: a job ran very inefficiently (CPU efficiency below 35%); after help from an HPC specialist in user support, a similar job ran at near 100% CPU efficiency
XDMoD Portal: XD Metrics on Demand
- Display metrics via a GUI interface: utilization, performance, publications
- Role based: view tailored to the role of the user (public, end user, PI, center director, program officer)
- Custom report builder
- Multiple file export capability: Excel, PDF, XML, RSS, etc.
CPU Hours Delivered by Decanal Unit
Drill Down: CPU Hours and Jobs by Engineering Dept
QoS: Application Kernels
- Computationally lightweight; run continuously and on demand to actively measure performance
- Utilize open-source codes such as GAMESS, NWChem, NAMD, OpenFOAM, etc., as well as customized kernels
- Measure system performance from the user's perspective: local scratch and global file-system performance, local processor-memory bandwidth, allocatable shared memory, processing speed, network latency and bandwidth
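To make the idea of a lightweight, regularly run probe concrete, here is a minimal sketch of an application-kernel harness, assuming each kernel can be launched as an ordinary command on a node. The kernel names, command lines, and the JSON-lines log file are illustrative assumptions, not part of XDMoD's actual app-kernel toolkit.

```python
import datetime
import json
import subprocess
import time

# Illustrative kernel commands only; a real deployment would launch packaged
# inputs for GAMESS, NWChem, NAMD, IOR, etc. through the site's scheduler.
APP_KERNELS = {
    "namd.apoa1.8core": ["mpirun", "-np", "8", "namd2", "apoa1.namd"],
    "ior.local_scratch": ["ior", "-w", "-r", "-o", "/scratch/ak_testfile"],
}

def run_app_kernel(name, cmd):
    """Run one kernel and record its wall time and exit status."""
    start = time.time()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "kernel": name,
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "wall_seconds": round(time.time() - start, 1),
        "exit_code": proc.returncode,
    }

if __name__ == "__main__":
    # Append each result to a history file that a dashboard could ingest.
    with open("app_kernel_history.jsonl", "a") as log:
        for name, cmd in APP_KERNELS.items():
            log.write(json.dumps(run_app_kernel(name, cmd)) + "\n")
```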
Application Kernel Use Case
- Application kernels helped detect a user-environment anomaly at CCR
- Example: performance variation of NWChem traced to a bug in a commercial parallel file system (PanFS); performance recovered after the vendor's patch was installed
Application Kernel Use Case
- Uncovered a performance issue with CCR's Panasas parallel file system
- Timing coincided with a recent core-switch software upgrade
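One simple way to flag runs like the NWChem and Panasas cases above is a rolling z-score over recent application-kernel wall times. The sketch below illustrates that idea only; XDMoD's own control-chart logic may differ, and the window and threshold values are assumptions.

```python
import statistics

def flag_anomalies(run_times, window=30, threshold=2.0):
    """Return indices of runs whose wall time deviates from a rolling baseline."""
    anomalies = []
    for i in range(window, len(run_times)):
        baseline = run_times[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1e-9   # avoid division by zero
        if abs(run_times[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# A sustained jump in, say, NWChem wall time after a file-system or switch
# change would show up as a run of consecutive flagged indices.
```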
Measuring Job-Level Performance
- Collaboration with the Texas Advanced Computing Center (TACC)
- Integration of XDMoD with monitoring frameworks: TACC_Stats/Lariat, Performance Co-Pilot, Ganglia, etc.
- Supply XDMoD with job performance data: applications run, memory, local I/O, network, file-system, and CPU usage
- Identify poorly performing jobs (users) and applications
Metrics Gathered
- Metrics gathered per node: anything available, really (CPU, I/O, memory, file system); see the collection sketch below
- Extensible: additional measurable quantities can be included with some development work (e.g., CUDA, MIC, PanFS, GPFS, script capture, etc.)
- Overhead: so far we have not been able to measure it against the variability inherent in running jobs (on the order of a percent), but keep the potential for overhead in mind when extending metrics
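As a concrete illustration of per-node collection, the sketch below samples CPU and memory use from /proc at a coarse interval, in the spirit of the 10-minute snapshots mentioned later in this deck. It is not the tacc_stats/SUPReMM collector; it is Linux-only and deliberately simplified.

```python
import time

def read_cpu_jiffies():
    """Read aggregate CPU jiffies (user, nice, system, idle, ...) from /proc/stat."""
    with open("/proc/stat") as fh:
        fields = fh.readline().split()[1:]          # first line is the "cpu" total
    return list(map(int, fields))

def read_meminfo():
    """Return MemTotal and MemAvailable (kB) from /proc/meminfo."""
    info = {}
    with open("/proc/meminfo") as fh:
        for line in fh:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])
    return info["MemTotal"], info.get("MemAvailable", 0)

def sample(interval=600):
    """Take one coarse sample per interval (600 s mirrors a 10-minute cadence)."""
    before = read_cpu_jiffies()
    time.sleep(interval)
    after = read_cpu_jiffies()
    deltas = [b - a for a, b in zip(before, after)]
    busy = sum(deltas) - deltas[3]                  # field 3 of /proc/stat is idle
    total_kb, avail_kb = read_meminfo()
    return {
        "cpu_busy_fraction": busy / max(sum(deltas), 1),
        "mem_used_fraction": 1.0 - avail_kb / total_kb,
    }
```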
Why Collect Job-Level Performance Data?
- User report card
- Identify underperforming user codes
  - Need an automated process: thousands of jobs run per day, so it is not possible to manually search for poorly performing codes
  - Jobs can be flagged for: idle nodes, node failure, high cycles per instruction (CPI); see the flagging sketch after this slide
- Performance plots and data available from the Web interface and the command line
- HPC consultants can use the tools to identify and diagnose problems
- Single job viewer
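A hypothetical version of the flagging rules named above (idle nodes, node failure, high CPI) might look like the following. The job-record fields and thresholds are assumptions for illustration, not the SUPReMM schema or XDMoD's actual criteria.

```python
def flag_job(job):
    """Return the list of flag categories that apply to one job summary dict."""
    flags = []
    if job["max_cpu_user_fraction"] < 0.05:
        flags.append("idle nodes")          # hardly any user CPU on any node
    if job["nodes_reporting"] < job["nodes_allocated"]:
        flags.append("node failure")        # at least one node stopped reporting
    if job.get("cpi") is not None and job["cpi"] > 1.0:
        flags.append("high CPI")            # poor instruction throughput
    return flags

# Toy usage with two fabricated job summaries.
jobs = [
    {"id": "a", "max_cpu_user_fraction": 0.02,
     "nodes_reporting": 7, "nodes_allocated": 7, "cpi": 0.6},
    {"id": "b", "max_cpu_user_fraction": 0.95,
     "nodes_reporting": 44, "nodes_allocated": 45, "cpi": 1.3},
]
for job in jobs:
    print(job["id"], flag_job(job))
```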
Single Job Viewer
- Job information for Stampede job (# ): displays accounting data, application classification, and SUPReMM metrics with custom analysis
Improving Job-Level Performance: Success Stories (see the CPI measurement sketch after this slide)
- MILC code
  - One project using MILC was found to be running with a higher than expected CPI (1.1 vs. 0.7)
  - Members were not using the available vectorized intrinsics
  - 11% reduction in runtime
- DNS turbulence code
  - CPI of 1.1 with a large SU usage
  - Line-level profiling revealed MPI/OpenMP hot spots
  - Converted an OpenMP workshare block to a parallel do block
  - Improvement: 7% overall, 10% in the main loop, 76% in the code block
- Singularity code (general relativity)
  - CPI of 1.15 with a large SU usage: 239,631 SUs for the 1st quarter of 2014
  - Code was making many extraneous calls to cat and rm
  - Code was not using any optimization flags (-O3 or -xhost)
  - 26% decrease in run time after simple changes were made
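For reference, CPI is simply hardware cycles divided by retired instructions. The sketch below gathers both counters with Linux `perf stat` for a single command, which is a convenient way to reproduce CPI numbers like those above on a workstation; it is not how tacc_stats/SUPReMM collects counters on compute nodes, and it assumes permission to read hardware counters.

```python
import subprocess

def measure_cpi(cmd):
    """Run `cmd` (a list of argv strings) under perf stat and return cycles/instructions."""
    result = subprocess.run(
        ["perf", "stat", "-e", "cycles,instructions", "-x", ","] + cmd,
        capture_output=True, text=True,
    )
    counts = {}
    for line in result.stderr.splitlines():      # perf's CSV output goes to stderr
        parts = line.split(",")
        if len(parts) >= 3 and parts[0].isdigit():
            counts[parts[2].split(":")[0]] = int(parts[0])
    return counts["cycles"] / counts["instructions"]

# Example: print(measure_cpi(["./my_app", "input.dat"]))
# A CPI well above roughly 0.7-0.8 on modern x86 often signals memory stalls
# or unvectorized code worth profiling further.
```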
Recovering Wasted CPU Cycles: TACC Stampede
- Job ran for 48 hours on one node out of seven
Recovering Wasted CPU Cycles: TACC Stampede (via TAS Single Job Viewer)
- Job ran effectively on 45 nodes but did a serial write for 3 hours
- After user support, a similar job ( ) using a parallel write fixed this
- Savings: 3 hours on 45 nodes recovered
- CPU user plots: before (serial write) vs. after (parallel write)
Underperforming Job Notification
1. All jobs running on the cluster
2. XDMoD/SUPReMM collects all job data (prolog, epilog, and 10-minute intervals)
3. All jobs are automatically processed and analyzed for low CPU usage, drop in cache use, etc.
4. "Bad" jobs are flagged by category
5. Notify: alert user support and send a message to the user
6. The user or HPC support can analyze and improve job performance through the Single Job Viewer
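The flag-and-notify steps of this pipeline could be glued together roughly as sketched below. Everything here (job-record fields, thresholds, email addresses, SMTP host) is assumed for illustration and is not XDMoD/SUPReMM code.

```python
import smtplib
from email.message import EmailMessage

def categorize(job):
    """Return the flag categories that apply to one completed-job summary."""
    flags = []
    if job["cpu_user_fraction"] < 0.05:
        flags.append("low CPU usage")
    if job["cache_miss_ratio"] > 0.20:
        flags.append("drop in cache use")
    return flags

def notify(job, flags, smtp_host="localhost"):
    """Email the user a short note pointing at the Single Job Viewer."""
    msg = EmailMessage()
    msg["Subject"] = f"Job {job['id']} flagged: {', '.join(flags)}"
    msg["From"] = "hpc-support@example.edu"
    msg["To"] = job["user_email"]
    msg.set_content(
        "Your recent job was flagged for possible performance problems.\n"
        f"Categories: {', '.join(flags)}\n"
        "Please review it in the Single Job Viewer or contact user support."
    )
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

def process(jobs):
    """Flag each completed job and notify the user when categories apply."""
    for job in jobs:
        flags = categorize(job)
        if flags:
            notify(job, flags)
```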
Broader Impact: Open XDMoD
- Targeted at academic and industrial HPC centers; based on the XDMoD source code
- NCAR: beta test site; collaborating on developing a storage reporting schema
- Blue Waters: NSF's largest supercomputer
- Many academic HPC centers: Rice, Cambridge, UGA, UF, Chicago, CERN, New Mexico, Southampton, Manitoba, Leibniz U, Univ Medical Center Utrecht, Liverpool, Illinois, Kansas, RIT, U Leuven (Belgium), U Geneva, SciNet (U Toronto), Case Western, NY Genome Center, U Buffalo, and others
- Known industrial HPC centers: Rolls Royce, Dow, Lockheed Martin, Hess Energy, and others
Acknowledgements: TAS/SUPReMM
- UB: Tom Furlani, Matt Jones, Steve Gallo, Bob DeLeon, Ryan Rathsam, Jeff Palmer, Tom Yearke, Joe White, Jeanette Sperhac, Abani Patra, Nikolay Simakov, Cynthia Cornelius, Martins Innus, Ben Plessinger
- Indiana: Gregor von Laszewski, Fugang Wang
- University of Texas: Jim Browne
- TACC: Bill Barth, Todd Evans, Weijia Xu
- NICS: Shiquan Su
- NSF awards: TAS (OCI ), SUPReMM (OCI ), XMS (ACI )
Colella's 7 Dwarfs
- "Seven dwarfs" of algorithms for simulation in the physical sciences
- "Dwarfs" mine compute cycles for golden results
Colella's 7 Dwarfs
Table 1. Algorithms that play a key role within select scientific applications, as characterized according to a seven-dwarfs classification*
- Dwarfs (columns): Structured Grids, Unstructured Grids, FFT, Dense Linear Algebra, Sparse Linear Algebra, N-Body, Monte Carlo
- Application areas (rows): Molecular Physics, Nanoscale Science, Climate, Environment, Combustion, Fusion (uses all seven dwarfs), Nuclear Energy, Astrophysics, Nuclear Physics, Accelerator Physics, QCD, Aerodynamics
*Scientific Application Requirements for Leadership Computing at the Exascale, ORNL/TM-2007/238