An Inconvenient Question: Are We Going to Get the Algorithms and Computing Technology We Need to Make Critical Climate Predictions in Time?


An Inconvenient Question: Are We Going to Get the Algorithms and Computing Technology We Need to Make Critical Climate Predictions in Time?
Rich Loft
Director, Technology Development
Computational and Information Systems Laboratory
National Center for Atmospheric Research

Main Points
–Nature of the climate system makes it a grand challenge computing problem.
–We are at a critical juncture: we need regional climate prediction capabilities!
–Computer clock/thread speeds are stalled: massive parallelism is the future of supercomputing.
–Our best algorithms, parallelization strategies and architectures are inadequate to the task.
–We need model acceleration improvements in all three areas if we are to meet the challenge.

Options for Application Acceleration
Scalability
–Eliminate bottlenecks
–Find more parallelism
–Load-balancing algorithms
Algorithmic Acceleration
–Bigger timesteps: semi-Lagrangian transport; implicit or semi-implicit time integration (solvers)
–Fewer points: adaptive mesh refinement (AMR) methods
Hardware Acceleration
–More threads: CMP, GP-GPUs
–Faster threads: device innovations (high-k)
–Smarter threads: architecture – old tricks, new tricks… magic tricks (vector units, GPUs, FPGAs)

A Very Grand Challenge: Coupled Models of the Earth System
Typical model computation:
–15-minute time steps
–1 peta-flop per model year
–~150 km resolution
There are 3.5 million timesteps in a century.
(Figure: air column / water column schematic; Viner, 2002)
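A quick arithmetic check of the timestep count quoted above (a sketch I added; the 15-minute step length comes from the slide, the rest is routine bookkeeping):

```python
# Check of the "3.5 million timesteps in a century" figure for 15-minute steps.
steps_per_day = 24 * 60 // 15          # 96 steps per model day
steps_per_year = steps_per_day * 365   # 35,040 steps per model year
steps_per_century = steps_per_year * 100

print(f"{steps_per_century:,} timesteps per century")  # 3,504,000, i.e. ~3.5 million
```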

Multicomponent Earth System Model
Components: Atmosphere, Ocean, Coupler, Sea Ice, Land, C/N Cycle, Dyn. Veg., Ecosystem & BGC, Gas Chem., Prognostic Aerosols, Upper Atm., Land Use, Ice Sheets
Software challenges:
–Increasing complexity
–Validation and verification
–Understanding the output
Key concept: A flexible coupling framework is critical!

Climate Change
(Figure credit: Caspar Amman, NCAR)

IPCC AR4
–"Warming of the climate system is unequivocal"…
–…and it is "very likely" caused by human activities.
–Most of the observed changes over the past 50 years are now simulated by climate models, adding confidence to future projections.
–Model resolutions: O(100 km)

Climate Change Research Epochs
Before IPCC AR4 (curiosity driven):
–Reproduce historical trends
–Investigate climate change
–Run IPCC scenarios
After IPCC AR4 (policy driven):
–Assess regional impacts
–Simulate adaptation strategies
–Simulate geoengineering solutions

ESSL - The Earth & Sun Systems Laboratory
Where we want to go: the Exascale Earth System Model vision
Coupled ocean-land-atmosphere model:
–Atmosphere: ~1 km x ~1 km (cloud-resolving), 100 levels, whole atmosphere, unstructured adaptive grids
–Land: ~100 m, 10 levels, landscape-resolving
–Ocean: ~10 km x ~10 km (eddy-resolving), 100 levels, unstructured adaptive grids
Requirement: computing power enhancement by as much as a factor of … YIKES!

Compute Factors for an Ultra-High Resolution Earth System Model
–Spatial resolution (provide regional details)
–Model completeness (add "new" science): 10^2
–New parameterizations (upgrade to "better" science): 10^2
–Run length (long-term implications): 10^2
–Ensembles, scenarios (range of model variability): 10
–Total compute factor
(Courtesy of John Drake, ORNL)
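The factors above compound multiplicatively. A minimal sketch of that bookkeeping, using only the multipliers that survived in the transcript (the spatial-resolution entry and the slide's total were not captured), so the product below is a partial figure rather than the slide's total:

```python
# Partial product of the compute factors listed above. The spatial-resolution
# multiplier was not captured, so this is a lower bound on the slide's total.
factors = {
    "model completeness": 1e2,
    "new parameterizations": 1e2,
    "run length": 1e2,
    "ensembles, scenarios": 1e1,
}

partial = 1.0
for multiplier in factors.values():
    partial *= multiplier

print(f"partial compute factor (excluding spatial resolution): {partial:.0e}")  # 1e+07
```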

Why run-length: global thermohaline circulation timescale: 3,000 years

Why resolution: atmospheric convective (cloud) scales are O(1 km)

Why High Resolution in the Ocean?
(Figure: ocean component of CCSM at 1˚ (Collins et al., 2006) vs. eddy-resolving POP at 0.1˚ (Maltrud & McClean, 2005))

High Resolution and the Land Surface

Performance improvements are not coming fast enough!
…the trend suggests the needed improvement will take 40 years.

ITRS Roadmap: feature size dropping 14%/year
By 2050 it reaches the size of an atom – oops!
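A rough check of that extrapolation (my numbers, not the slide's: I assume a ~45 nm feature size in 2008 and ~0.1 nm as a stand-in for atomic scale):

```python
# How long a 14%/year shrink takes to go from ~45 nm to atomic scale.
import math

feature_nm = 45.0        # assumed 2008 process node
shrink_per_year = 0.14   # from the slide
atom_nm = 0.1            # rough atomic radius

years = math.log(atom_nm / feature_nm) / math.log(1.0 - shrink_per_year)
print(f"~{years:.0f} years, i.e. around {2008 + round(years)}")  # ~41 years, around 2049
```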

National Security Agency: "The power consumption of today's advanced computing systems is rapidly becoming the limiting factor with respect to improved/increased computational ability."

Chip-Level Trends: Stagnant Clock Speed
(Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond))
Chip density is continuing to increase ~2x every 2 years
–Clock speed is not
–The number of cores is doubling instead
There is little or no additional hidden parallelism (ILP)
Parallelism must be exploited by software

Moore's Law -> More's Law: Speed-Up Through Increasing Parallelism
How long can we keep doubling the number of cores per chip?

NCAR and the University of Colorado Partner to Experiment with Blue Gene/L
Dr. Henry Tufo and myself with "frost" (2005)
Characteristics:
–2048 processors / 5.7 TF
–PPC 440 (750 MHz)
–Two processors/node
–512 MB memory per node
–6 TB file system

Status and Immediate Plans for High Resolution Earth System Modeling

Current High Resolution CCSM Runs
0.25° ATM,LND + OCN,ICE [ATLAS/LLNL]
–3280 processors
–0.42 simulated years/day (SYPD)
–187K CPU hours/year
0.50° ATM,LND + OCN,ICE [FRANKLIN/NERSC]
–Current: 5416 processors, 1.31 SYPD, 99K CPU hours/year
–Efficiency goal: 4932 processors, 1.80 SYPD, 66K CPU hours/year
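The SYPD and CPU-hour figures above are mutually consistent; a small sanity check (my sketch, using only the numbers on the slide):

```python
# CPU-hours per simulated year = processors * 24 hours / (simulated years per day).
runs = [
    ("0.25 deg on ATLAS",        3280, 0.42),
    ("0.50 deg on FRANKLIN",     5416, 1.31),
    ("0.50 deg efficiency goal", 4932, 1.80),
]

for name, procs, sypd in runs:
    cpu_hours = procs * 24.0 / sypd
    print(f"{name}: ~{cpu_hours/1e3:.0f}K CPU-hours per simulated year")
# -> ~187K, ~99K, ~66K, matching the slide
```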

Current 0.5° CCSM "fuel efficient" configuration [franklin]: 5416 processors
–OCN [np=3600]: 168 sec.
–ATM [np=1664]: 120 sec.
–ICE [np=1800]: 91 sec.
–CPL [np=384]: 52 sec.
–LND [np=16]: 21 sec.

Efficiency issues in current 0.5° CCSM configuration (5416 processors)
–OCN [np=3600]: 168 sec., ATM [np=1664]: 120 sec., ICE [np=1800]: 91 sec., CPL [np=384]: 52 sec., LND [np=16]: 21 sec.
–Fix: use space-filling curves (SFC) in POP to reduce the processor count by 13%.

Load Balancing: Partitioning with Space Filling Curves
(Figure: partition for 3 processors)

Space-Filling Curve Partitioning for the Ocean Model Running on 8 Processors
Key concept: no need to compute over land!
Static load balancing…
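To make the idea concrete, here is a small sketch of SFC partitioning in this spirit: order the ocean blocks along a space-filling curve, drop the all-land blocks, and cut the curve into equal pieces. A Morton (Z-order) curve stands in for whatever curve POP actually uses, and the 8x8 land/ocean mask is invented for illustration:

```python
def morton_index(i, j, bits=8):
    """Interleave the bits of (i, j) to get the block's position on a Z-order curve."""
    z = 0
    for b in range(bits):
        z |= ((i >> b) & 1) << (2 * b + 1)
        z |= ((j >> b) & 1) << (2 * b)
    return z

def sfc_partition(is_ocean, nprocs):
    """Assign each ocean block (i, j) to a processor; land blocks are skipped entirely."""
    n = len(is_ocean)
    ocean_blocks = sorted(
        ((i, j) for i in range(n) for j in range(n) if is_ocean[i][j]),
        key=lambda ij: morton_index(*ij),
    )
    # Cut the 1-D curve of ocean blocks into nprocs nearly equal pieces.
    return {
        block: position * nprocs // len(ocean_blocks)
        for position, block in enumerate(ocean_blocks)
    }

# Toy 8x8 mask: True = ocean block, False = land block (purely illustrative).
mask = [[(i + j) % 5 != 0 for j in range(8)] for i in range(8)]
owners = sfc_partition(mask, nprocs=8)
counts = [list(owners.values()).count(p) for p in range(8)]
print("ocean blocks per processor:", counts)  # nearly equal, and no land blocks assigned
```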

Ocean Model 1/10-Degree Performance
Key concept: you need routine access to >1K processors to discover true scaling behaviour!

Efficiency issues in current 0.5° CCSM configuration (5416 processors)
–OCN [np=3600]: 168 sec., ATM [np=1664]: 120 sec., ICE [np=1800]: 91 sec., CPL [np=384]: 52 sec., LND [np=16]: 21 sec.
–Fix: use weighted SFCs (wSFC) in CICE to reduce execution time by 2x.

Static, Weighted Load Balancing
Example: sea ice model at 1° on 20 processors
(Figure: small blocks at high latitudes, large blocks at low latitudes. Courtesy of John Dennis)
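A sketch of the weighted variant (my illustration, with invented weights): blocks are still ordered along the space-filling curve, but each carries a work estimate, and the curve is cut into pieces of roughly equal total weight rather than equal block counts:

```python
def weighted_sfc_partition(weights, nprocs):
    """Assign SFC-ordered blocks to processors in proportion to cumulative weight."""
    total = sum(weights)
    owner, prefix = [], 0.0
    for w in weights:
        # The processor index grows with the weighted position along the curve.
        owner.append(min(nprocs - 1, int(prefix * nprocs / total)))
        prefix += w
    return owner

# 20 blocks along the curve: the first 8 are "heavy" high-latitude (ice-covered) blocks.
weights = [4.0] * 8 + [1.0] * 12
nprocs = 5

weighted = weighted_sfc_partition(weights, nprocs)
equal_count = [i * nprocs // len(weights) for i in range(len(weights))]

def work_per_rank(owner):
    return [sum(w for w, o in zip(weights, owner) if o == r) for r in range(nprocs)]

print("weighted split, work per proc:   ", work_per_rank(weighted))     # [12.0, 8.0, 8.0, 8.0, 8.0]
print("equal-count split, work per proc:", work_per_rank(equal_count))  # [16.0, 16.0, 4.0, 4.0, 4.0]
```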

Efficiency issues in current 0.5° CCSM configuration: coupler (5416 processors)
–OCN [np=3600]: 168 sec., ATM [np=1664]: 120 sec., ICE [np=1800]: 91 sec., CPL [np=384]: 52 sec., LND [np=16]: 21 sec.
–Unresolved scalability issues in the coupler. Options: better interconnect, nested grids, PGAS language paradigm.

Efficiency issues in current 0.5° CCSM configuration: atmospheric component (5416 processors)
–OCN [np=3600]: 168 sec., ATM [np=1664]: 120 sec., ICE [np=1800]: 91 sec., CPL [np=384]: 52 sec., LND [np=16]: 21 sec.
–Scalability limitation in 0.5° fv-CAM [MPI]: shift to the hybrid OpenMP/MPI version.

Projected 0.5° CCSM "capability" configuration: 3.8 years/day
–OCN [np=6100]: 62 sec., ATM [np=5200]: 62 sec., CPL [np=384]: 31 sec., LND [np=40]: 21 sec., ICE [np=8120]: 10 sec.
–Action: run the hybrid atmospheric model.
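The 3.8 years/day headline is consistent with reading the per-component times as wall-clock seconds per simulated day, gated by the slowest component; that reading is my assumption, not stated on the slide:

```python
# Slowest components above (OCN and ATM) take ~62 s per simulated day.
slowest_sec_per_sim_day = 62.0
wall_seconds_per_day = 86400.0

sim_years_per_day = (wall_seconds_per_day / slowest_sec_per_sim_day) / 365.0
print(f"~{sim_years_per_day:.1f} simulated years per wall-clock day")  # ~3.8
```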

Projected 0.5° CCSM "capability" configuration - version 2: 3.8 years/day
–OCN [np=6100]: 62 sec., ATM [np=5200]: 62 sec., CPL [np=384]: 31 sec., LND [np=40]: 21 sec., ICE [np=8120]: 10 sec.
–Action: thread the ice model.

Scalable Geometry Choice: Cube-Sphere
–The sphere is decomposed into 6 identical regions using a central projection (Sadourny, 1972) with an equiangular grid (Rancic et al., 1996).
–Avoids pole problems, quasi-uniform.
–Non-orthogonal curvilinear coordinate system with identical metric terms.
(Figure: Ne=16 cube and sphere, showing the degree of non-uniformity)

Scalable Numerical Method: High-Order Methods
Algorithmic advantages of high-order methods
–h-p element-based method on quadrilaterals (Ne x Ne)
–Exponential convergence in polynomial degree (N)
Computational advantages of high-order methods
–Naturally cache-blocked N x N computations
–Nearest-neighbor communication between elements (explicit)
–Well suited to parallel microprocessor systems

HOMME: Computational Mesh
Elements:
–A quadrilateral "patch" of N x N gridpoints
–Gauss-Lobatto grid
–Typically N = 4-8
Cube:
–Ne = elements on an edge of a cube face
–6 x Ne x Ne elements total
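For scale, a quick count of elements and gridpoints (my pairing of Ne=16 from the cube-sphere slide with N=4 from the range above; the unique-point formula assumes a continuous-Galerkin mesh where neighboring elements share edge points):

```python
Ne = 16   # elements along one edge of each cube face
N = 4     # Gauss-Lobatto points along one edge of each element

elements = 6 * Ne * Ne                      # 1536 elements on the cubed sphere
points_per_element = N * N                  # 16 points: the cache-blocked unit of work
unique_points = 6 * Ne**2 * (N - 1)**2 + 2  # shared edge/corner points counted once

print(elements, points_per_element, unique_points)  # 1536 16 13826
```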

Partitioning a cube-sphere on 8 processors

Partitioning a cubed-sphere on 8 processors

Aqua-Planet CAM/HOMME Dycore
–Full CAM physics / HOMME dycore
–Parallel I/O library used for physics aerosol input and input data (this work COULD NOT have been done without parallel I/O)
–Work underway to couple to other CCSM components
–5 years/day

Projected 0.25° CCSM "capability" configuration - version 2: 4.0 years/day
–OCN [np=6000]: 60 sec., HOMME ATM [np=24000]: 60 sec., CPL [np=3840]: 47 sec., LND [np=320]: 8 sec., ICE [np=16240]: 5 sec.
–Action: insert the scalable atmospheric dycore.

Using a bigger parallel machine can't be the only answer
Progress in the Top 500 list is not fast enough
Amdahl's Law is a formidable opponent
The dynamical timestep goes like N^-1
–Merciless effect of the Courant limit
–The cost of dynamics relative to physics increases as N
–e.g. if dynamics takes 20% of the time at 25 km, it will take 86% at 1 km (see the worked check below)
Traditional parallelization of the horizontal leaves an N^2 per-thread cost (vertical x horizontal)
–Must inevitably slow down with stalled thread speeds
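A worked check of the 20% -> 86% figure, assuming (as the slide states) that the dynamics-to-physics cost ratio grows linearly with the resolution factor:

```python
dyn_fraction_25km = 0.20
ratio_25km = dyn_fraction_25km / (1.0 - dyn_fraction_25km)  # dynamics/physics = 0.25

resolution_factor = 25.0 / 1.0              # going from 25 km to 1 km grid spacing
ratio_1km = ratio_25km * resolution_factor  # 6.25

dyn_fraction_1km = ratio_1km / (1.0 + ratio_1km)
print(f"dynamics share of runtime at 1 km: {dyn_fraction_1km:.0%}")  # ~86%
```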

Options for Application Acceleration
Scalability
–Eliminate bottlenecks
–Find more parallelism
–Load-balancing algorithms
Algorithmic Acceleration
–Bigger timesteps: semi-Lagrangian transport; implicit or semi-implicit time integration (solvers)
–Fewer points: adaptive mesh refinement (AMR) methods
Hardware Acceleration
–More threads: CMP, GP-GPUs
–Faster threads: device innovations (high-k)
–Smarter threads: architecture – old tricks, new tricks… magic tricks (vector units, GPUs, FPGAs)

Accelerator Research
Graphics cards – Nvidia 9800 / CUDA
–Measured 109x on WRF microphysics on a 9800GX2
FPGA – Xilinx (data-flow model)
–21.7x simulated on sw-radiation code
IBM Cell processor – 8 cores
Intel Larrabee

DG + NH + AMR
–Curvilinear elements
–Overhead of parallel AMR at each timestep: less than 1%
–Idea based on Fischer, Kruse, Loth (2002)
(Courtesy of Amik St. Cyr)

SLIM Ocean Model (Louvain-la-Neuve University)
–DG, implicit, AMR, unstructured
–To be coupled to a prototype unstructured ATM model
(Courtesy of J-F Remacle, LNU)

NCAR Summer Internships in Parallel Computational Science (SIParCS)
Open to:
–Upper-division undergrads
–Graduate students
In disciplines such as:
–CS, software engineering
–Applied math, statistics
–ES science
Support:
–Travel, housing, per diem
–10 weeks salary
Number of interns selected:
–7 in 2007
–11 in

Meanwhile - the clock is ticking

The Size of the Interdisciplinary/Interagency Team Working on Climate Scalability
Contributors: D. Bailey (NCAR), F. Bryan (NCAR), T. Craig (NCAR), A. St. Cyr (NCAR), J. Dennis (NCAR), J. Edwards (IBM), B. Fox-Kemper (MIT, CU), E. Hunke (LANL), B. Kadlec (CU), D. Ivanova (LLNL), E. Jedlicka (ANL), E. Jessup (CU), R. Jacob (ANL), P. Jones (LANL), S. Peacock (NCAR), K. Lindsay (NCAR), W. Lipscomb (LANL), R. Loy (ANL), J. Michalakes (NCAR), A. Mirin (LLNL), M. Maltrud (LANL), J. McClean (LLNL), R. Nair (NCAR), M. Norman (NCSU), T. Qian (NCAR), M. Taylor (SNL), H. Tufo (NCAR), M. Vertenstein (NCAR), P. Worley (ORNL), M. Zhang (SUNYSB)
Funding:
–DOE-BER CCPP Program grants DE-FC03-97ER62402, DE-PS02-07ER07-06, DE-FC02-07ER64340, B&R KP
–DOE-ASCR B&R KJ
–NSF Cooperative Grant NSF01
–NSF PetaApps award
Computer time:
–Blue Gene/L: NSF MRI Grant, NCAR, University of Colorado, IBM (SUR) program, BGW Consortium Days, IBM Research (Watson), LLNL, Stony Brook & BNL
–Cray XT3/4: ORNL, Sandia

Thanks! Any questions?

Q: If you had a petascale computer, what would you do with it?
A: Use it as a prototype of an exascale computer.