Copyright, HiCLAS1

George Delic, Ph.D.
HiPERiSM Consulting, LLC
and
Arney Srackangast, AS1MET Services
& (919)

HiCLAS1
HiPERiSM Consulting, LLC Linking with AS1MET Services

Topics
- Introduction
- Choice of hardware & OS
- Benchmark timings
- Hardware performance events
- Why is AERMOD-HPC faster?
- Conclusions
- Next steps and community responses

Introduction
- HiCLAS1 Mission
- Why AERMOD?
- AERMOD-HPC development process
- QA process
- AERMOD-HPCS v1.8 release

HiCLAS1 Mission
HiCLAS1 is dedicated to bringing High Performance Computing (HPC) capability to Environmental Modeling. The HiCLAS1 mission is to develop (or enhance) software and improve performance on current and future computers for legacy Air Quality Models (AQM).

Why AERMOD?
- Large/dedicated user community
- Long model runs
- Low efficiency
- Regulatory model
- Linux and Windows platforms

AERMOD-HPC development process
- U.S. EPA source as baseline
- Progressive source modification
- Branching structure reduction
- Vector instruction enhancement
- Extensive testing/benchmarking of four case studies
- Parallel potential realized
- Code structure modifications for efficiency only: no changes in the science
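
As a rough illustration of what "branching structure reduction" can look like, the sketch below hoists a loop-invariant test out of a loop and checks that the results are unchanged. It is not taken from the AERMOD-HPC source (AERMOD itself is Fortran, and its code is not shown in these slides); the function names, the `use_log` flag, and the sample values are all hypothetical.

```python
import math


def scale_branchy(values, use_log, factor):
    """Baseline shape: the loop-invariant flag is tested on every iteration."""
    out = []
    for v in values:
        if use_log:  # same answer every time round the loop
            out.append(math.log(v) * factor)
        else:
            out.append(v * factor)
    return out


def scale_hoisted(values, use_log, factor):
    """Restructured shape: the flag is tested once, leaving branch-free loops."""
    if use_log:
        return [math.log(v) * factor for v in values]
    return [v * factor for v in values]


if __name__ == "__main__":
    sample = [1.0, 2.0, 4.0, 8.0]  # hypothetical input values
    for flag in (True, False):
        # The arithmetic is identical, so the results must match exactly.
        assert scale_branchy(sample, flag, 2.5) == scale_hoisted(sample, flag, 2.5)
    print("restructured loop reproduces the baseline results")
```

The equality check mirrors the constraint stated on the slide: the restructuring changes how the work is organized, not what is computed.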

QA process
- A & B team source validation
- Line-by-line code inspection
- Tests with multiple compilers
- Tests on multiple platforms
- Comparison against U.S. EPA version:
  - Line-by-line source inspection
  - Numerical differences inspected
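
One simple way to inspect numerical differences between two model runs is to compute the largest relative difference over paired output values. The sketch below is only an assumption about how such a check might be written; the slides do not describe the actual comparison procedure, and the function name and sample concentrations are hypothetical.

```python
def max_relative_difference(reference, candidate):
    """Return the largest relative difference between paired values.

    Pairs where both values are exactly zero contribute no difference.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of values")
    worst = 0.0
    for ref, new in zip(reference, candidate):
        scale = max(abs(ref), abs(new))
        if scale == 0.0:
            continue  # both values are zero: identical
        worst = max(worst, abs(ref - new) / scale)
    return worst


if __name__ == "__main__":
    # Hypothetical concentrations from a baseline run and a modified run.
    baseline_run = [12.5, 0.0, 3.2e-3, 840.0]
    modified_run = [12.5, 0.0, 3.2e-3, 840.0]
    print(max_relative_difference(baseline_run, modified_run))
```

A relative measure is used here so that very small and very large values are judged on the same footing; an absolute tolerance would be a reasonable alternative.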

AERMOD-HPCS v1.8 release
- Windows 2K and XP in three steps:
  - Run installer package
  - Request a license
  - Run license extractor application
- Linux
  - Available but not yet shipping
- Download pages at

Choice of hardware & OS
- 32-bit Linux
- 64-bit Linux
- 32-bit MS Windows
- Pentium 4 Xeon (or AMD)

Benchmark timings: vs EPA executable

Benchmark timings: vs EPA source

Hardware performance events
- Operations and instructions
- Memory footprint
- Branching instructions
- TLB cache usage
- L1 cache usage

Mflops

Vector Mips

Memory footprint: Mem instructions per flop

Branching instructions

TLB cache misses: Data (DM) vs. Instr. (IM)

L1 cache misses: Data (DM) vs. Instr. (IM)

Why is AERMOD-HPC faster?
- Higher Mflops rates
- Lower number of memory instructions per floating point instruction
- Lower mispredicted branch instruction rates
- Lower instruction TLB miss rates
- Lower L1 instruction cache miss rates
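
The metrics listed above are ratios derived from raw hardware event counts. The sketch below shows one plausible way to compute them; the counter names, the choice of denominators (for example, misses per retired instruction), and the sample counts are assumptions made for illustration, not the definitions or measurements behind the charts in this talk.

```python
def derived_metrics(counts, seconds):
    """Turn raw event counts (hypothetical key names) into rate metrics."""
    flops = counts["flops"]
    instructions = counts["instructions"]
    return {
        # Millions of floating point operations per second.
        "mflops": flops / seconds / 1.0e6,
        # Memory instructions issued per floating point operation.
        "mem_instr_per_flop": counts["mem_instructions"] / flops,
        # Fraction of branch instructions that were mispredicted.
        "branch_mispredict_rate": counts["branches_mispredicted"] / counts["branches"],
        # Misses per retired instruction (an assumed normalization).
        "itlb_miss_rate": counts["itlb_misses"] / instructions,
        "l1i_miss_rate": counts["l1i_misses"] / instructions,
    }


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the arithmetic.
    sample = {
        "flops": 2_000_000,
        "instructions": 10_000_000,
        "mem_instructions": 3_000_000,
        "branches": 1_000,
        "branches_mispredicted": 50,
        "itlb_misses": 1_000,
        "l1i_misses": 20_000,
    }
    for name, value in derived_metrics(sample, seconds=0.5).items():
        print(f"{name}: {value:g}")
```

Computing the same ratios for two builds of a program on identical input gives a like-for-like comparison of the kind summarized on this slide.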

Conclusions
- A much faster AERMOD is available as AERMOD-HPCS
- Current serial performance is 1.9 to 3.4 times faster than the EPA distribution
- Simple code transformations gave improved efficiency
- Much more left to do

Next steps at HiCLAS1
- Next release v1.9 features:
  - Streamlined memory model
  - More serial code speed-up
- Parallel version in progress
  - Target is the quad-core CPU
- 10x speed-up is feasible:
  - ~ 3x from serial improvements
  - ~ 3x from parallelization
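
The arithmetic behind the 10x estimate can be sketched as follows, assuming the serial and parallel gains are independent and therefore multiply: 3 x 3 = 9, which is roughly the quoted 10x. The function names are hypothetical, and the efficiency figure simply restates a 3x parallel gain relative to the four cores of the quad-core target.

```python
def combined_speedup(serial_gain, parallel_gain):
    """Overall speed-up when two independent gains compound multiplicatively."""
    return serial_gain * parallel_gain


def parallel_efficiency(parallel_gain, cores):
    """Fraction of ideal linear scaling achieved on the given core count."""
    return parallel_gain / cores


if __name__ == "__main__":
    total = combined_speedup(3.0, 3.0)        # ~3x serial, ~3x parallel
    efficiency = parallel_efficiency(3.0, 4)  # quad-core target
    print(f"combined speed-up: {total:.1f}x")
    print(f"parallel efficiency on 4 cores: {efficiency:.0%}")
```

A 3x gain on four cores corresponds to 75% parallel efficiency, so the parallel half of the estimate does not require perfect scaling.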

Community responses

“Let me be one of the first air dispersion modelers to congratulate you on this achievement. I most sincerely hope that you succeed on this important speed improvement on AERMOD.”
–CEO of a major environmental software company

“Modifying air quality models to make use of parallel processing is a much needed improvement to the air quality community, and I commend the staff at High Performance Algorism Consulting that have made this possible.”
–Group leader of a State Department of Environmental Quality

A major hardware & software vendor has offered services and support to HiCLAS1 for the AERMOD-HPC initiative.