Copyright, HiCLAS1. George Delic, Ph.D., HiPERiSM Consulting, LLC, and Arney Srackangast, AS1MET Services


1 Copyright, HiCLAS1 http://www.hiclas1.com
George Delic, Ph.D., HiPERiSM Consulting, LLC, and Arney Srackangast, AS1MET Services
george@hiclas1.com & arney@hiclas1.com
(919)484-9803
HiCLAS1: HiPERiSM Consulting, LLC Linking with AS1MET Services

2 Topics
- Introduction
- Choice of hardware & OS
- Benchmark timings
- Hardware performance events
- Why is AERMOD-HPC faster?
- Conclusions
- Next steps and community responses

3 Introduction
- HiCLAS1 Mission
- Why AERMOD?
- AERMOD-HPC development process
- QA process
- AERMOD-HPCS v1.8 release

4 HiCLAS1 Mission
HiCLAS1 is dedicated to bringing High Performance Computing (HPC) capability to Environmental Modeling. The HiCLAS1 mission is to develop (or enhance) software and improve performance on current and future computers for legacy Air Quality Models (AQM).

5 Why AERMOD?
- Large/dedicated user community
- Long model runs
- Low efficiency
- Regulatory model
- Linux and Windows platforms

6 AERMOD-HPC development process
- U.S. EPA source as baseline
- Progressive source modification
- Branching structure reduction
- Vector instruction enhancement
- Extensive testing/benchmarking of four case studies
- Parallel potential realized
- Code structure modifications for efficiency only: no changes in the science

7 QA process
- A & B team source validation
- Line-by-line code inspection
- Tests with multiple compilers
- Tests on multiple platforms
- Comparison against U.S. EPA version:
  - Line-by-line source inspection
  - Numerical differences inspected

8 AERMOD-HPCS v1.8 release
- Windows 2K and XP in three steps:
  - Run installer package
  - Request a license
  - Run license extractor application
- Linux
  - Available but not yet shipping
- Download pages at http://www.hiclas1.com

9 Choice of hardware & OS
- 32-bit Linux
- 64-bit Linux
- 32-bit MS Windows
- Pentium 4 Xeon (or AMD)

10 Benchmark timings: vs EPA executable

11 Benchmark timings: vs EPA source

12 Hardware performance events
- Operations and instructions
- Memory footprint
- Branching instructions
- TLB cache usage
- L1 cache usage

13 Mflops

14 Vector Mips

15 Memory footprint: Mem instructions per flop

16 Branching instructions

17 TLB cache misses: Data (DM) vs. Instr. (IM)

18 L1 cache misses: Data (DM) vs. Instr. (IM)

19 Why is AERMOD-HPC faster?
- Higher Mflops rates
- Lower number of memory instructions per floating point instruction
- Lower mispredicted branch instruction rates
- Lower instruction TLB miss rates
- Lower L1 instruction cache miss rates
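The factors on this slide are ratios derived from raw hardware event counts. As a rough sketch of how such ratios might be computed, the Python below uses a hypothetical `CounterSample` record whose field names and values are made up for illustration. The slides do not say which counters were read or how the miss rates were normalized, so the per-instruction normalization here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts for one model run (hypothetical fields, not from the slides)."""
    seconds: float               # wall-clock time of the run
    instructions: int            # total instructions retired
    flops: int                   # floating point operations
    mem_instructions: int        # load/store instructions
    branches: int                # branch instructions
    branches_mispredicted: int   # mispredicted branches
    itlb_misses: int             # instruction TLB misses
    l1i_misses: int              # L1 instruction cache misses


def derived_metrics(s: CounterSample) -> dict:
    """Turn raw counts into the kinds of ratios named on the slide."""
    return {
        # Millions of floating point operations per second
        "mflops": s.flops / s.seconds / 1e6,
        # Memory instructions per floating point operation
        "mem_per_flop": s.mem_instructions / s.flops,
        # Fraction of branches that were mispredicted
        "branch_mispredict_rate": s.branches_mispredicted / s.branches,
        # Misses per instruction (assumed normalization)
        "itlb_miss_rate": s.itlb_misses / s.instructions,
        "l1i_miss_rate": s.l1i_misses / s.instructions,
    }


# Made-up counts, used only to exercise the function.
sample = CounterSample(
    seconds=2.0,
    instructions=10_000,
    flops=4_000_000,
    mem_instructions=8_000_000,
    branches=1_000,
    branches_mispredicted=50,
    itlb_misses=10,
    l1i_misses=20,
)
metrics = derived_metrics(sample)
```

Comparing two such dictionaries, one for the baseline build and one for the optimized build, would show whether each ratio moved in the direction the slide describes.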

20 Conclusions
- A much faster AERMOD is available as AERMOD-HPCS
- Current serial performance is 1.9 to 3.4 times faster than the EPA distribution
- Simple code transformations gave improved efficiency
- Much more left to do

21 Next steps at HiCLAS1
- Next release v1.9 features:
  - Streamlined memory model
  - More serial code speed-up
- Parallel version in progress
  - Target is the quad-core CPU
- 10x speed-up is feasible:
  - ~ 3x from serial improvements
  - ~ 3x from parallelization
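The 10x estimate follows from multiplying the two factors: roughly 3 x 3 = 9, which rounds to about 10. The sketch below works through that arithmetic, assuming the serial and parallel gains are independent and compose multiplicatively. The function names are illustrative, and the 75% parallel efficiency is simply the value that yields 3x on four cores, not a figure from the slides.

```python
def parallel_gain(cores: int, efficiency: float) -> float:
    """Speed-up from running on `cores` cores at a given parallel efficiency (0 to 1)."""
    return cores * efficiency


def combined_speedup(serial_gain: float, par_gain: float) -> float:
    """Overall speed-up, assuming the two gains multiply."""
    return serial_gain * par_gain


# ~3x from serial improvements (from the slide)
serial = 3.0
# ~3x from parallelization on a quad-core CPU implies about 75% efficiency
par = parallel_gain(cores=4, efficiency=0.75)

total = combined_speedup(serial, par)
print(f"serial {serial:.1f}x * parallel {par:.1f}x = {total:.1f}x overall")
```

Under these assumptions the product is 9x, consistent with the slide's rounded claim of a 10x speed-up.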

22 Community responses
- “Let me be one of the first air dispersion modelers to congratulate you on this achievement. I most sincerely hope that you succeed on this important speed improvement on AERMOD.” – CEO of a major environmental software company
- “Modifying air quality models to make use of parallel processing is a much needed improvement to the air quality community, and I commend the staff at High Performance Algorism Consulting that have made this possible” – Group leader of a State Department of Environmental Quality
- A major hardware & software vendor has offered services and support to HiCLAS1 for the AERMOD-HPC initiative

