So you want to use GCHP?
IGC9 GEOS-Chem High Performance Model Clinic
Sebastian Eastham and Lizzie Lundgren
May 6, 2019
A note about this presentation
- Check the notes! Anything said during this presentation which isn't on the slides should be in the presentation notes.
- A glossary of terms is included at the end!
- This presentation assumes use of GCHP 12.3.2. Always check the GCHP documentation for the most up-to-date information: http://wiki.seas.harvard.edu/geos-chem/index.php/GCHP_Main_Page
GCHP works now!
This presentation will give you information on where we are, where we are going, what you need, and how to get started. But remember: you can use GCHP today:
- GCHP undergoes the same version control as GCC (stable)
- GCHP is benchmarked by the GCST (every X.Y.0 version)
GCHP Today is GCC but with...
- Grid resolution flexibility, without recompiling
- Cubed-sphere transport
- Improved scalability/speed
Takes advantage of:
- FV3 offline advection
- ESMF for core infrastructure
- MAPL layer between GEOS-Chem and ESMF
Eastham et al. (2018, GMD)
GCHP Tomorrow
Timeline: Now -> Mid-2019 -> 2020 -> 2021? -> TBD
- New MAPL: gfortran compatibility, faster I/O, lower memory footprint
- Cloud GCHP
- CMake
- Improved error handling
- Containers
- Stretched ("nested") grids
- Flux-based transport
Ongoing integration of GMAO improvements throughout...
What we will cover in this clinic
- What GCHP needs: hardware, software
- Workflow, GCHP versus GCC:
  - Download and create run directory
  - Compile
  - Configure run
  - Run
  - Develop
  - Analyze
- Resources
What we will NOT cover in this clinic
- Software library installation
- Environment setup
...but basic information is covered in the GCHP wiki tutorial: http://wiki.seas.harvard.edu/geos-chem/index.php/Getting_Started_With_GCHP
For any of this information, stop by the GCST Help Desk during IGC9: M/Tu/W 1-3 pm, Geological Museum 103A
What GCHP needs: hardware concepts
GCHP can be run across a network, using multiple nodes.
- Memory requirements grow with resolution*
- All requirements can be distributed across nodes...
- ...but bigger simulations will want better network fabric
- Think carefully about data needs - the closer your I/O, the better
[Diagram: Node A (24 cores, 128 GB memory) connected to Node B (24 cores, 128 GB memory) via network fabric: TCP-IP at 1 or 10 Gb/s, or Infiniband at 56+ Gb/s]
What GCHP needs: hardware concepts (notes)
GCHP can be run across a network, using multiple nodes. There are two main considerations:
1. Do I have enough cores and memory for this resolution? In the current version of GCHP, memory requirements grow with both resolution and the number of cores. You might find yourself "memory limited" rather than "core limited" - asking for more cores than you need so that you can get access to more memory. This is expected to improve with the update to the new version of MAPL.
2. How fast is my network? Your network fabric may be limiting you. If you are thinking about upgrading your cluster or buying a new one, consider a high-speed network fabric. This could be a higher-performance ethernet (TCP-IP) or one of the Infiniband versions. You really do want at least 10 Gb/s. Note that faster networks are generally also good for GC-Classic! If your compute nodes can't talk to your data nodes because of slow network speeds, your simulation speed could be affected. However, you are more likely to fall foul of slow read/write at the disk level.
(A quick shell check of your hardware is sketched below.)
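A minimal sketch for checking a node's cores, memory, and network from the shell, assuming standard Linux tools; ibstat is only present on Infiniband-equipped systems:

    nproc                  # cores on this node
    free -g                # memory in GB
    ip link show           # network interfaces
    ibstat 2>/dev/null || echo "No Infiniband tools found"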
What GCHP needs: hardware examples

Resolution     | Minimum                           | Recommended*
C24 (~4°)      | 6 cores (1 node), 60 GB memory    | 24 cores (1 node), 80 GB memory
...            |                                   |
C90 (~1°)      | 48 cores (2 nodes), 256 GB memory | 96 cores (4 nodes), 512 GB memory
C360 (~0.25°)  | 100-1000 cores (10+ nodes), 2048 GB memory

The number in parentheses is the number of nodes. These numbers are recommendations only.
*Recommended values based on 12.3.2, where memory requirements grow with core count
What GCHP needs: software

Requirement                   | ...for GC-Classic? | ...for GCHP?
GNU or Intel Fortran compiler | Yes                | Yes
HDF-5 and zlib                | Yes                | Yes
NetCDF-C                      | Yes                | Yes
NetCDF-Fortran                | Yes                | Yes
MPI implementation            | No                 | Yes
ESMF v7+, MAPL, FV3           | No                 | Included in GCHP

(A quick environment check is sketched below.)
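A hedged way to confirm the main dependencies are visible in your environment; the command names assume typical installations, and your module system may differ:

    gfortran --version     # or: ifort --version
    nc-config --version    # netCDF-C
    nf-config --version    # netCDF-Fortran
    mpirun --version       # MPI implementation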
Workflow Comparison Step 1: Download and create run directory
- Two nested source code repositories: geos-chem and gchp (includes MAPL, ESMF, FV3)
- Create the GCHP run directory from the source code, not the Unit Tester
- Use an interactive script rather than editing a text file (GCC will do this in a future update)
See slides at end of presentation for more detailed explanations of steps. (A minimal sketch follows.)
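A minimal sketch of this step, based on the detailed workflow slide at the end of this deck; directory names and the environment file path are placeholders:

    # Clone GEOS-Chem, then clone GCHP into it as a subdirectory named 'GCHP'
    git clone https://github.com/geoschem/geos-chem.git Code.GCHP
    cd Code.GCHP
    git clone https://github.com/geoschem/gchp.git GCHP

    # Create a run directory interactively
    cd GCHP
    ./createRunDir.sh

    # From the new run directory, register your environment file
    cd /path/to/new/run/directory
    ./setEnvironment /path/to/envfile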
Workflow Comparison Step 2: Compile
- Like GCC, build from the run directory
- Multiple build options; an initial full build is required, and subsequent builds depend on what changed
- Type make or make help to display options and use cases
- make build_all rebuilds everything: ESMF, MAPL, FV3, and GEOS-Chem core
- make build_mapl and make build_core rebuild subsets; run make help for exactly what each covers
(A compile sketch follows.)
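A compile sketch using the commands above; capturing the build output to compile.log (illustrative name) makes tip 9 at the end of this deck usable:

    cd /path/to/run/directory
    source /path/to/envfile

    # First build: compile everything (ESMF, MAPL, FV3, GEOS-Chem core)
    make build_all 2>&1 | tee compile.log

    # Later builds: only rebuild what you changed, e.g. GEOS-Chem core
    make build_core

    # List all build targets and their use cases
    make help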
Workflow Comparison Step 3: Configure run
- More config files in GCHP than in GCC
- Set common run-time settings in the driver file runConfig.sh:
  - Simulation start/end time
  - Cubed-sphere grid resolution
  - Diagnostics collection frequency, duration, mode
  - # cores
- Configure emissions in HEMCO_Config.rc, but also ExtData.rc
- Unlike GCC diagnostics:
  - GCHP HISTORY.rc includes emissions
  - Diagnostic frequency is HHmmSS, not YYYYMMDD HHmmSS
(An illustrative runConfig.sh excerpt follows.)
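An illustrative excerpt of the kinds of settings runConfig.sh drives; the variable names below are assumptions based on 12-series run directories, so treat your own runConfig.sh as authoritative:

    # Compute resources: total cores = NUM_NODES x NUM_CORES_PER_NODE
    NUM_NODES=1
    NUM_CORES_PER_NODE=6

    # Cubed-sphere resolution: 24 -> C24 (~4 degrees)
    CS_RES=24

    # Simulation start/end use YYYYMMDD HHmmSS
    Start_Time="20160701 000000"
    End_Time="20160708 000000"

    # Diagnostic frequency uses HHmmSS, not YYYYMMDD HHmmSS
    # (hypothetical variable name; check your runConfig.sh)
    Diag_Frequency="010000"   # hourly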
Workflow Comparison Step 4: Run
- Assess resource needs: # cores, # nodes, memory
- Adapt a run script from runScriptSamples/:
  - Non-system-specific: gchp.local.run
  - SLURM-specific: gchp.run
- Advanced multi-run option for monthly diagnostics
- Resource management tip: # cores used (runConfig.sh) can be less than # cores requested (gchp.run) to maximize memory per core
(A sketch of a SLURM script follows.)
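A hedged sketch of a SLURM submission script in the spirit of gchp.run; the partition name, resource numbers, and paths are placeholders, so adapt the examples in runScriptSamples/ rather than copying this:

    #!/bin/bash
    #SBATCH -N 1              # nodes requested
    #SBATCH -n 24             # cores requested; may exceed cores used, for extra memory
    #SBATCH --mem=80G         # memory per node
    #SBATCH -t 0-12:00        # wall time
    #SBATCH -p my_partition   # placeholder queue name

    source /path/to/envfile   # same environment used to compile
    source runConfig.sh       # pushes runConfig.sh settings into the other config files
    mpirun -np 6 ./geos       # cores used here must match runConfig.sh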
Workflow Comparison Step 5: Develop
Develop source code the same as in GCC, with key exceptions:
- main.F: the GCHP driver files are Chem_GridCompMod and gigc_chunk_mod
- HEMCO I/O and History/: I/O handling is in MAPL (ExtData, History)
- tpcore: advection is in FVdycoreCubed_GridComp/
Key concepts:
- Same GEOS-Chem states as GCC (State_Chm/Met/Diag)
- State array grids differ per core (each core holds a regional subset)
- GCHP also has MAPL states (Imports, Exports, Internal State)
- GEOS-Chem and MAPL states exchange data via pointers
Workflow Comparison Step 6: Analyze
- Diagnostics data in OutputDir/
- Restart and emissions output are vertically flipped
- Use gcpy benchmark code for data comparisons:
  - Same functions for GCC and GCHP
  - Plot on the cubed sphere, or regrid to any grid/resolution
  - See gcpy/examples/compare_diagnostics.ipynb
- Use the NASA GISS Panoply netCDF viewer for raw data
  - Format not compatible until after the upcoming MAPL version update
(A quick sanity check of output files is sketched below.)
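A quick sanity check of the output, assuming the standard netCDF command-line tools are installed; the diagnostic file name is illustrative (yours will follow the GCHP.collection.*.nc4 pattern):

    cd /path/to/run/directory
    ls OutputDir/

    # Inspect dimensions and variables without loading the data
    ncdump -h OutputDir/GCHP.SpeciesConc.20160701_0000z.nc4 | head -n 40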
Relax, you are not alone!
- Create issues on GitHub to get help from the GCST and beyond: https://github.com/geoschem/gchp/issues
- Join the GCHP Working Group and post your project: http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_High_Performance_Working_Group
- Join the Slack workspace to chat with users
- Read and contribute to the GCHP wiki: http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_HP
- Contact the GEOS-Chem Support Team: http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_Support_Team
Questions/Comments/Discussion
Top 10 list of time-saving tips
1. Only recompile code you changed (use make help)
2. Changing simulations does not require recompilation (copy the geos executable)
3. runConfig.sh overwrites settings in the other config files
4. Do not simply run the executable; use or adapt the run scripts provided
5. Run-time MAPL/ESMF errors are nearly always from bad configuration
6. A CAP error usually means bad simulation dates; check runConfig.sh
7. An ExtData error is an input problem; set MAPL_DEBUG_LEVEL to 20 in runConfig.sh
8. A MAPL History error is a diagnostics problem; check HISTORY.rc
9. Search for "making install" in compile.log to find MAPL build errors (sketched below)
10. Create a GCHP GitHub issue if you think you need to edit MAPL or ESMF
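For tip 9, a one-liner for locating MAPL build failures, assuming the build output was captured to compile.log (e.g. with tee as in the compile sketch earlier):

    # Show each "making install" stage with context; the stage where
    # output stops or errors usually marks the failing MAPL component
    grep -n -A 2 "making install" compile.log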
Glossary of terms
GCHP - GEOS-Chem High Performance
GCC - GEOS-Chem Classic
FV3 - Finite-Volume Cubed-Sphere dynamical core. Performs transport calculations on a cubed-sphere grid. From NASA GMAO.
MAPL - Model Analysis and Prediction Layer; connects components (e.g. transport, I/O, chemistry) together. From NASA GMAO.
MPI - Message Passing Interface. System underlying high-speed communication between computational nodes.
ESMF - Earth System Modeling Framework. The code framework on which MAPL is based. Maintained independently.
SLURM - Open-source job scheduler for Linux and Unix-like kernels.
Network fabric - The hardware on which your network is based. Can be ethernet (typically 1-10 Gb/s) or Infiniband (typically 56 Gb/s or higher).
gcpy - Python package for GEOS-Chem data analysis. https://github.com/geoschem/gcpy/
Detailed Workflow Comparison Step 1: Download and create run directory

GCC:
- clone geoschem/geos-chem
- clone geoschem/geos-chem-unittest
- go to UT/perl directory
- edit CopyRunDirs.input: source path, target, etc.
- type ./gcCopyRunDir

GCHP:
- clone geoschem/geos-chem
- clone geoschem/gchp (as subdirectory 'GCHP')
- go to GCHP directory
- type ./createRunDir.sh for interactive run directory creation
- go to run directory
- type ./setEnvironment path/to/envfile
Detailed Workflow Comparison Step 2: Compile

GCC:
- go to run directory
- type make command, e.g. 'make -j4 mpbuild'

GCHP:
- go to run directory
- source path/to/envfile
- type 'make build_all'
Detailed Workflow Comparison Step 3: Configure run

GCC:
- go to run directory
- edit input.geos
- edit HEMCO_Config.rc
- edit HISTORY.rc for non-emissions diagnostics -> collections, fields, frequency, duration, etc.
- edit HEMCO_Diagn.rc for emissions diagnostics -> all uncommented entries are output

GCHP:
- go to run directory
- edit runConfig.sh -> when sourced at run-time, updates many files, including input.geos
- edit HEMCO_Config.rc and ExtData.rc
- edit HISTORY.rc for ALL diagnostics -> collections and fields only! (freq/dur/mode set from runConfig.sh!)
- edit HEMCO_Diagn.rc for emissions diagnostics -> only entries also in HISTORY.rc are output
Detailed Workflow Comparison Step 4: Run

Both: go to run directory; check that config files are correct.

Option to run interactively:
- GCC: source environment file; type ./geos
- GCHP: copy runScriptSamples/gchp.local.run to run directory; type ./gchp.local.run
  NOTE: You must have enough memory and cores available!

Option to submit a job:
- GCC: copy an existing run script to run directory; edit as needed to request resources; submit to your local cluster job resource manager, e.g. sbatch gcc.run
- GCHP: copy an existing run script to run directory, or adapt an example from runScriptSamples/; submit to your local cluster job resource manager, e.g. sbatch gchp.run
Detailed Workflow Comparison Step 5: Develop

High-level calls and order of components:
- GCC: main.F
- GCHP: GCHP/Chem_GridCompMod.F90 and GCHP/gigc_chunk_mod.F90

Edit state variables and non-transport components:
- GCC: in GEOS-Chem repository
- GCHP: same as GCC

Edit transport:
- GCC: in GEOS-Chem repository
- GCHP: in GCHP/FVdycoreCubed_GridComp, main file AdvCore_GridCompMod.F90

Edit core I/O handling:
- GCC: in GEOS-Chem repository
- GCHP: in GCHP/Shared/MAPL_Base
Detailed Workflow Comparison Step 6: Analyze

Verify output is present:
- GCC: all output in run directory; restart: GEOSChem.Restart.*.nc4; emissions: HEMCO_diagnostics.*.nc; bpch: trac_avg.*; netcdf: GEOSChem.collection.*.nc4
- GCHP: restart in run directory: gcchem*.nc; all diagnostic output in OutputDir/; emissions: GCHP.Emissions.*.nc4; other diagnostics: GCHP.collection.*.nc4

Analyze output:
- GCC: analyze output data (bpch or nc) with python; IDL GAMAP is no longer supported
- GCHP: analyze output data with python; IDL GAMAP does not support the cubed sphere

Use gcpy benchmark comparison tools:
- GCC: regrid lat-lon to any resolution; level maps and zonal means
- GCHP: regrid cubed-sphere to any resolution, or to lat-lon

Many python libraries are available for data analysis; the same libraries can be used for GCHP data.