Copyright, HiPERiSM Consulting, LLC, George Delic, Ph.D. HiPERiSM Consulting, LLC (919) P.O. Box 569, Chapel Hill, NC HiPERiSM Consulting, LLC.

CHOOSING A COMPILER FOR AQM APPLICATIONS ON LINUX
George Delic, Ph.D.
Models-3 User’s Workshop, October 27-29, 2003, RTP, NC

Overview
1. Introduction
2. Choice of Hardware
3. Choice of Compilers
4. Choice of Benchmarks
5. Comparing Execution Times
6. Evaluation of SSE Results
7. Tests for AQMs
8. Conclusions

Introduction
Motivation
- AQMs are migrating to COTS hardware
- Linux is preferred
- A rich choice of compilers is now available
- Need to learn about portability issues
What is known about compilers for IA-32?
- CMAQ releases switch compilers without comment
- Where is the analysis of differences in:
  - Performance?
  - Numerical accuracy & stability?
  - Portability problems?

Choice of Hardware & Compilers
Hardware
- Intel Pentium III (933 MHz, dual processor) with SSE and 256 KB L2 cache
- Linux kernel
Fortran compilers for IA-32
- Absoft 8.0
- Intel 7.1
- Lahey 5.6
- Portland CDK 4.0

Choice of Benchmarks
Kallman Integer and Logical Algorithm
- Uses only integer (I) and logical (L) operations with bit intrinsics
- Negligible I/O and memory operations
- Six cases with problem size scaling
Stommel Ocean Model (SOM): single-precision (sp) floating-point algorithm
- Jacobi iteration sweep over a 2-D physical domain
- Regular loops, optimal for testing vectorization
- Six cases in the range N = 2x10^3 to 7x10^3, with N^2 = 4 to 49 million data points
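
To make the "Jacobi iteration sweep over a 2-D domain" concrete, here is a minimal sketch in plain Python. It is not the SOM benchmark source: it assumes a generic five-point Poisson stencil, and the names jacobi_sweep, solve, f, and h are hypothetical. The point is only the loop shape, a regular doubly nested loop with no dependence between iterations, which is the kind of loop a vectorizing compiler handles well.

```python
def jacobi_sweep(u, f, h):
    """Return a new grid after one Jacobi sweep of the five-point stencil.

    u is the current 2-D grid (list of rows), f the forcing term on the same
    grid, and h the grid spacing. Boundary values are copied unchanged.
    """
    n_rows, n_cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            # Every update reads only the OLD grid, so iterations are independent.
            new[i][j] = 0.25 * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                - h * h * f[i][j]
            )
    return new


def solve(u, f, h, tol=1e-8, max_sweeps=10_000):
    """Repeat sweeps until the largest pointwise change drops below tol."""
    for sweep in range(1, max_sweeps + 1):
        new = jacobi_sweep(u, f, h)
        change = max(
            abs(new[i][j] - u[i][j])
            for i in range(len(u))
            for j in range(len(u[0]))
        )
        u = new
        if change < tol:
            return u, sweep
    return u, max_sweeps


if __name__ == "__main__":
    n = 5  # tiny illustrative grid; the benchmark cases use N in the thousands
    grid = [[1.0] * n] + [[1.0] + [0.0] * (n - 2) + [1.0] for _ in range(n - 2)] + [[1.0] * n]
    forcing = [[0.0] * n for _ in range(n)]
    result, sweeps = solve(grid, forcing, 1.0)
    print("sweeps:", sweeps, "centre value:", result[n // 2][n // 2])
```

Because each sweep writes into a separate grid, the inner loop carries no dependence between iterations, which is what makes this pattern a convenient test of compiler vectorization.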

Choice of Benchmarks (cont.)
Princeton Ocean Model (POM): double-precision (dp) floating-point algorithm
- Example of “real-world” code that is numerically unstable with sp arithmetic!
- 500+ vectorizable loops to exercise compilers
- 9 procedures account for 85% of CPU time
- 2-day simulation for two cases:
  - Small problem: 65 x 49 x 21
  - Large problem: 100 x 40 x 15
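
The remark about sp arithmetic can be illustrated with a small experiment. The sketch below does not reproduce the POM instability; it only shows, under simple assumptions, how rounding error grows when a long accumulation is carried in single precision rather than double precision. The helper names to_sp and accumulate are hypothetical, and single precision is emulated by rounding through a 4-byte IEEE float after every addition.

```python
import struct


def to_sp(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, n, single):
    """Add step to a running total n times, optionally rounding to sp each time."""
    if single:
        step = to_sp(step)
    total = 0.0
    for _ in range(n):
        total = total + step
        if single:
            total = to_sp(total)  # emulate storing the result in a 4-byte REAL
    return total


if __name__ == "__main__":
    n = 100_000
    exact = 0.1 * n
    dp = accumulate(0.1, n, single=False)
    sp = accumulate(0.1, n, single=True)
    print("dp error:", abs(dp - exact))
    print("sp error:", abs(sp - exact))
```

A time-stepping model performs many such accumulations per grid point, so a code that is sensitive to this kind of error may need double precision throughout, as the POM runs here do.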

Comparing Execution Times: Kallman compiler switches

Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -O3 -ffixed
Intel 7.1               ifc -O3 -tpp6 -FI
Lahey 5.6               lf95 -tpp -fix
Portland 4.0            pgf90 -fast

Comparing Execution Times: Kallman (seconds)
Table columns: N, Absoft, Intel, Lahey, Portland

Comparing Execution Times: Kallman (log10 seconds)

Comparing Execution Times: Kallman (ratio to Absoft time)

Comparing Execution Times: SOM (POM) compiler switches (without SSE)

Compiler and version    Compiler command and selected switches
Absoft 8.0              f90 -s -cpu:p6 -O3 (-N113) -ffixed
Intel 7.1               ifc -O3 (-r8) -tpp6 -FI
Lahey 5.6               lf95 -tpp (-dbl) -fix
Portland 4.0            pgf90 -fast (-r8) -Mvect
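
For anyone scripting these builds, the switch table can be captured as data. The sketch below rests on one assumption that the slide does not state outright: the parenthesised switch on each row is the double-precision promotion option, applied for POM (dp) and omitted for SOM (sp). The function name build_command and the source file names are hypothetical, and the optional switch is appended after the other switches rather than kept in its position on the slide.

```python
# Switches transcribed from the table above. "dp" holds the parenthesised
# switch, ASSUMED here to be the double-precision option used for POM only.
COMPILERS = {
    "Absoft 8.0": {"command": ["f90", "-s", "-cpu:p6", "-O3", "-ffixed"], "dp": "-N113"},
    "Intel 7.1": {"command": ["ifc", "-O3", "-tpp6", "-FI"], "dp": "-r8"},
    "Lahey 5.6": {"command": ["lf95", "-tpp", "-fix"], "dp": "-dbl"},
    "Portland 4.0": {"command": ["pgf90", "-fast", "-Mvect"], "dp": "-r8"},
}


def build_command(compiler, source_file, double_precision=False):
    """Return the compile command as an argument list for the given compiler."""
    entry = COMPILERS[compiler]
    command = list(entry["command"])  # copy so the table itself is not modified
    if double_precision:
        command.append(entry["dp"])
    command.append(source_file)
    return command


if __name__ == "__main__":
    # "som.f" and "pom.f" are placeholder file names, not the benchmark sources.
    for name in COMPILERS:
        print(name, "SOM:", " ".join(build_command(name, "som.f")))
        print(name, "POM:", " ".join(build_command(name, "pom.f", double_precision=True)))
```

Keeping the switches in one table makes it harder for the four builds to drift apart between benchmark runs.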

Comparing Execution Times: SOM without SSE (seconds)
Table columns: N, Absoft, Intel, Lahey, Portland

Comparing Execution Times: SOM (without SSE)

Statistics for four compilers: SOM (without SSE)

Comparing Execution Times: POM (without SSE)
Table columns: Case, Absoft, Intel, Lahey, Portland

Statistics for four compilers: Variability vs. problem size

Evaluation of SSE Results
IA-32 hardware
- Intel Pentium III+ supports Streaming SIMD (Single-Instruction-Multiple-Data) Extensions (SSE)
- Linux kernel supports SSE
Fortran compilers that enable SSE
- Intel 7.1
- Portland CDK 4.0

Comparing Execution Times: SOM (POM) compiler switches (with SSE)

Compiler and version    Compiler command and selected switches
Intel 7.1               ifc -O3 -xK (-r8) -tpp6 -FI
Portland 4.0            pgf90 -fast (-r8) -Mvect=sse

Comparing Execution Times: SOM (with SSE)

Comparing Execution Times: POM (with SSE)

Evaluation of SSE Results
Fortran compilers with SOM (sp)
- Intel 7.1: average speedup of 1.44
- Portland CDK 4.0: average speedup of 1.70
Fortran compilers with POM (dp)
- Intel 7.1: average speedup of 1.25
- Portland CDK 4.0: average speedup of 1.19
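
The slides do not say how the average speedup was computed. One plausible reading, sketched below, is the arithmetic mean over benchmark cases of the ratio (time without SSE) / (time with SSE). The function names and the timings in the demo are hypothetical and are not the measured results.

```python
def speedups(times_without, times_with):
    """Per-case speedup: time without SSE divided by time with SSE."""
    if len(times_without) != len(times_with):
        raise ValueError("both timing lists must cover the same cases")
    return [t0 / t1 for t0, t1 in zip(times_without, times_with)]


def average_speedup(times_without, times_with):
    """Arithmetic mean of the per-case speedups."""
    ratios = speedups(times_without, times_with)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up timings in seconds for two problem sizes, for illustration only.
    no_sse = [10.0, 20.0]
    with_sse = [5.0, 16.0]
    print(speedups(no_sse, with_sse))
    print(average_speedup(no_sse, with_sse))
```

Averaging the per-case ratios weights every problem size equally; dividing the summed times instead would let the largest cases dominate, so the two conventions can give different figures.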

Tests for AQMs
Next steps for CMAQ with four compilers:
- Report on portability issues
- Re-compilation of all libraries
- Performance instrumentation & analysis
- Numerical & stability analysis
- OpenMP performance study
Please propose scenarios worth using for these tests!

Conclusions
- Hardware: COTS is the way to go, but ...
- Linux: the operating system is popular, but ...
- Programming environment: rich in choices
- Consequences for AQM: the combination of hardware, Linux, and programming environment needs careful, ongoing evaluation.
- HiPERiSM is ready for this task!

HiPERiSM’s URL
Talk to us about your requirements