Performance of CMAQ on a Mac OS X System. Tracey Holloway, John Bachan, Scott Spak. Center for Sustainability and the Global Environment, University of Wisconsin-Madison. A presentation to the 3rd Annual CMAS Models-3 Conference, October 19, 2004.
Thinking different. Motivation; Methods; Performance; Hardware; Release; Ongoing Improvements
Motivations. Simplified operation; Easier development; Easy clustering; Improved performance
Motivation: Operation. Single platform for all research and academic computing; User-friendly interface; UNIX OS; Open source software, hardware support; Today’s cluster node = tomorrow’s desktop
Motivation: Development. Better Developer Tools: Xcode (Interface Builder); CHUD performance & debugging suite. Distribution Tools: standardized profiles; PackageMaker; FAT binaries; automated installation
Operation & Development.
Motivation: Performance. Unique Hardware Advantages: powerful PPC 970 vector chip; auto-vectorizing compilers; 2000 NASA Langley report. Populist Parallelization: mix dedicated cluster nodes with free cycles on personal & lab machines; off-the-shelf solutions; simple GUI and command-line tools
Methods. IBM XL Fortran v8.1 compiler: auto-vectorization; equivalent to AIX. Modifications: flag conversion; build settings; array passing. > 400 man-hours
Performance. 2 Test Machines: dual 2 GHz G5, 5 GB RAM, 1 GHz bus; stock dual 1 GHz G4, 1.5 GB RAM, 133 MHz bus; Mac OS X. Test Run: First day of CMAQ 4.3 tutorial; 1 day, 32 km x 32 km, 38 x 38, 6 layers; default EBI CB4 chemistry
Benchmarks. Tutorial Runtime by Hardware and Compiler (seconds). IFC = Intel Fortran Compiler 7.1; PGF = Portland Group Compiler. Intel machines running CMAQ 4.22 on 2 processors with mpich parallelization. Source: Gail Tonnesen, “Benchmarks for CPUs and Compilers for the CMAQ release.” Macs running CMAQ 4.3 on 1 processor (XLF) or 2 processors (XLF SMP) with OpenMP parallelization.
Chemistry. Columns: Species; Mean | | from reference; Max | | from reference (% of cells >1 ppb). O3: ppb; 4.52 ppb (0.43). NO: ppb; 0.72 ppb (0). NO: ppb; 2.05 ppb (0.02). NH: ppb; 1.67 ppb (0.0002). SO4 (I + J): µg/m3; µg/m3. Source: ACONC.nc output from Day 1 of CMAQ 4.3 tutorial. Dual 2 GHz G5 running CMAQ 4.3 on 1 processor.
Good Chemistry. Small difference from reference set: greater than difference among Intel machines and compilers. Noise, floating point calculations, initialization: greatest at surface level, early in run; ambient concentrations only; random distribution; no bias; does not propagate in time or space; not correlated to high or low concentrations. Consistent: G4/G5; chemistry modules; compiler flags
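The comparison statistics named on the Chemistry slides can be sketched as follows. This is a minimal illustration and not the authors' actual analysis. It assumes the "| |" columns denote absolute differences from the reference output, and that concentrations have already been extracted from ACONC.nc into flat lists, one value per grid cell and time step. The function name `difference_stats`, the 1 ppb default threshold argument, and the sample values are all hypothetical.

```python
def difference_stats(test, reference, threshold=1.0):
    """Summarize cell-by-cell differences between a test run and a reference run.

    Both inputs are equal-length sequences of concentrations in the same unit.
    `threshold` is the cutoff (same unit) for counting "large" differences.
    """
    if len(test) != len(reference) or len(test) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [t - r for t, r in zip(test, reference)]
    abs_diffs = [abs(d) for d in diffs]
    n = len(diffs)

    return {
        # Mean absolute difference from the reference
        "mean_abs": sum(abs_diffs) / n,
        # Largest absolute difference in any cell
        "max_abs": max(abs_diffs),
        # Percentage of cells whose absolute difference exceeds the threshold
        "pct_over": 100.0 * sum(1 for a in abs_diffs if a > threshold) / n,
        # Mean signed difference; a value near zero suggests no systematic bias
        "mean_signed": sum(diffs) / n,
    }


# Made-up ozone values in ppb, for illustration only (not CMAQ output)
reference = [40.0, 42.0, 38.0, 45.0]
test = [40.5, 41.0, 38.0, 47.0]
stats = difference_stats(test, reference)
print(stats)
```

The signed mean is included alongside the absolute statistics because the "no bias" claim concerns whether differences cancel on average, which the absolute values alone cannot show.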
Better Chemistry. Tutorial Runtime by Chemistry Module (seconds). Dual 2 GHz G5 running CMAQ 4.3 on 1 processor.
Models-3 on Mac, 10/04. Core Platform: MM5 (Fovell); MCIP v2.2; SMOKE v2.1; CMAQ v4.3. Libraries & Add-Ons: netCDF v3.5.1; mpich v; I/O API v2.2; MCPL. Currently no PAVE, but Vis5d, VisAD, GrADS, NCL, and
Hardware.
Dedicated Cluster: Xserve G5, Dual 2 GHz, 2 GB RAM; Xserve RAID, 3.5 TB; 8 Power Mac G5, Dual 2 GHz, 5 GB RAM. Distributed Capacity: student lab eMacs; personal G4 desktops. 60-processor vector cluster; 0 full-time sysadmins; 18 G5 processors; 42 G4 processors
Cost Competitive. Apple Xserve Dual G5 2 GHz < $3500; RAID storage at $3 per GB; G5 Desktop $. Compare to: Dell PowerVault RAID at $5 per GB; Dell Precision dual Xeon 2.8 GHz, $. Sysadmin costs
JOHN SCOTT
Release. Following input from the CMAS Center: alpha code to CMAS by November 2004; CMAS testing; potential support. Following CMAS testing: preliminary code, scripts, binaries, and instructions available for download at Scott Spak will answer questions for early users:
Ongoing improvements. Our planned activities: g95 - GNU compilation; parallel implementations (Condor, Xgrid, Pooch/Appleseed); further optimization; Dual 2.5 GHz benchmarks; CMAQ MADRID. A community effort? CMAQ Unified; MIMS; PAVE
Acknowledgements. Mary Sternitzky, UW; Seth Price, UW; Hans Vahlenkamp and NOAA GFDL; Zac Adelman and the CMAS Help Desk; Dr. Gail Tonnesen and Glen Kaukola, UCR; Models-3 Listserv. All funding provided by the University of Wisconsin-Madison.