Scaling of the Community Atmospheric Model to ultrahigh resolution
Michael F. Wehner, Lawrence Berkeley National Laboratory, Computational Research Division
with Pat Worley (ORNL), Art Mirin (LLNL), Lenny Oliker (LBNL), John Shalf (LBNL)

Motivations
- First meeting of the WCRP Modeling Panel (WMP)
  - Convened at the UK Met Office in October 2005 by Shukla
  - Discussion focused on the benefits and costs of climate and weather models approaching 1 km horizontal resolution
  - Eventual white paper by Shukla and Shapiro for the WMO JSC
- "Counting the Clouds", a presentation by Dave Randall (CSU) to DOE SciDAC (June 2005)
  - Dave presents a compelling argument for global atmospheric models that resolve cloud systems rather than parameterize them.
  - The presentation is available on the web.

fvCAM
- NCAR Community Atmospheric Model version 3.1
- Finite Volume hydrostatic dynamics (Lin-Rood)
- Parameterized physics is the same as in the spectral version
- Our previous studies focused on the performance of fvCAM with a 0.5° × 0.625° × 28L mesh on a wide variety of platforms (see Pat Worley's talk this afternoon)
- In the present discussion, we consider the scaling behavior of this model over a range of existing mesh configurations and extrapolate to ultrahigh horizontal resolution.

Operations count
- Exploit three existing horizontal resolutions to establish the scaling behavior of the number of operations per fixed simulation period.
- Existing resolutions (all with 28 vertical levels):
  - "B": 2° × 2.5°
  - "C": 1° × 1.25°
  - "D": 0.5° × 0.625°
- Define: m = number of longitudes, n = number of latitudes

Operations Count (Scaling)
- Parameterized physics
  - The time step can remain constant: Ops ∝ m * n
- Dynamics
  - The time step is set by the Courant condition: Ops ∝ m * n * n
- Filtering
  - Allows violation of an otherwise overly restrictive Courant condition near the poles: Ops ∝ m * log(m) * n * n
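To make these proportionalities concrete, here is a back-of-the-envelope sketch (not part of the original analysis; the grid dimensions are nominal values for these resolutions, and the 1 km-class mesh is purely illustrative) that evaluates the three terms relative to the B mesh:

```python
# Back-of-the-envelope scaling of operation counts with horizontal resolution.
# Assumes physics ~ m*n, dynamics ~ m*n^2, filtering ~ m*log(m)*n^2, per the
# proportionalities on the previous slide; constant factors are ignored.
import math

meshes = {
    "B (2 x 2.5 deg)":     (144, 91),       # (m longitudes, n latitudes), nominal
    "C (1 x 1.25 deg)":    (288, 181),
    "D (0.5 x 0.625 deg)": (576, 361),
    "~1 km-class":         (18000, 12001),  # illustrative only
}

def relative_ops(m, n, m0, n0):
    physics   = (m * n) / (m0 * n0)
    dynamics  = (m * n * n) / (m0 * n0 * n0)
    filtering = (m * math.log(m) * n * n) / (m0 * math.log(m0) * n0 * n0)
    return physics, dynamics, filtering

m0, n0 = meshes["B (2 x 2.5 deg)"]
for name, (m, n) in meshes.items():
    p, d, f = relative_ops(m, n, m0, n0)
    print(f"{name:22s} physics x{p:10.1f}  dynamics x{d:12.1f}  filtering x{f:12.1f}")
```

Each halving of the grid spacing multiplies the physics cost by roughly 4 but the dynamics and filtering costs by roughly 8, which is why the physics term becomes negligible at high resolution.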

C O M P U T A T I O N A L R E S E A R C H D I V I S I O N Operations Count (Physics)

C O M P U T A T I O N A L R E S E A R C H D I V I S I O N Operations Count (Dynamics)

C O M P U T A T I O N A L R E S E A R C H D I V I S I O N Operations Count (Filters)

Sustained computation rate requirements
- A reasonable metric in climate modeling is that the model must run 1000 times faster than real time:
  - Millennium-scale control runs complete in about a year.
  - Century-scale transient runs complete in about a month.
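A quick arithmetic check of that metric (mine, not taken from the slide):

```python
# Quick check of the "1000x real time" throughput metric.
DAYS_PER_YEAR = 365.25
speedup = 1000  # simulated time / wall-clock time

for sim_years in (1000, 100):
    wall_days = sim_years * DAYS_PER_YEAR / speedup
    print(f"{sim_years:5d} simulated years -> {wall_days:6.1f} wall-clock days")
# 1000 simulated years -> ~365 days (a year); 100 years -> ~36 days (about a month)
```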

Can this code scale to these speeds?
- Domain decomposition strategies
  - Np = number of subdomains, Ng = number of grid points
  - The existing strategy is 1D in the horizontal.
  - A better strategy is 2D in the horizontal.
- Note: fvCAM also uses a vertical decomposition as well as OpenMP parallelism to increase processor utilization.
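To make the 1D-versus-2D distinction concrete, here is a minimal sketch assuming simple block partitioning; fvCAM's actual decomposition, its vertical split, and the OpenMP layer are not represented:

```python
# Minimal sketch of 1D (latitude-only) vs 2D (longitude x latitude) horizontal
# decompositions of an m x n grid into Np subdomains. Illustrative only.
import math

def decompose_1d(m, n, Np):
    # Each subdomain owns all m longitudes and a contiguous band of latitudes.
    rows = n // Np
    return [(m, rows) for _ in range(Np)]   # (longitudes, latitudes) per block

def decompose_2d(m, n, Np):
    # Factor Np into a near-square Px x Py process grid.
    Px = int(math.sqrt(Np))
    while Np % Px:
        Px -= 1
    Py = Np // Px
    return [(m // Px, n // Py) for _ in range(Np)]

m, n, Np = 576, 361, 64   # "D" mesh, 64 subdomains (illustrative)
print("1D block size:", decompose_1d(m, n, Np)[0])   # (576, 5)  -> long, thin strips
print("2D block size:", decompose_2d(m, n, Np)[0])   # (72, 45)  -> compact blocks
```

The 1D strategy produces long, thin latitude strips and caps the number of subdomains at the number of latitudes, whereas the 2D strategy produces compact blocks whose count can grow with the total number of horizontal cells.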

Processor scaling
- The performance data from fvCAM fit the first model well but tell us little about future technologies.
- A practical constraint is that the number of subdomains is limited to at most the number of horizontal cells.
- At three cells across per subdomain, complete communication of the model's data is required.
- This constraint provides an estimate of the maximum number of subdomains (and hence processors) as well as the minimum per-processor performance required to achieve the 1000× real time metric (in the absence of communication costs).
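The sketch below illustrates that constraint, assuming a minimum of three cells across per subdomain and borrowing the roughly 0.015° × 0.02° mesh and ~10 PFlop/s sustained target from the strawman slide later in the talk; the exact numbers are illustrative and communication costs are ignored:

```python
# Illustrative limits on a horizontal domain decomposition, assuming each
# subdomain must be at least 3 cells across. Grid size and sustained-rate
# target are taken from the strawman slide; everything else is a sketch.
MIN_CELLS_ACROSS = 3

def max_subdomains_1d(m, n):
    # 1D horizontal decomposition: split latitudes only.
    return n // MIN_CELLS_ACROSS

def max_subdomains_2d(m, n):
    # 2D horizontal decomposition: split both longitudes and latitudes.
    return (m // MIN_CELLS_ACROSS) * (n // MIN_CELLS_ACROSS)

m, n = 18000, 12001          # ~0.02 deg x 0.015 deg horizontal mesh (assumed)
sustained_flops = 10e15      # ~10 PFlop/s sustained for 1000x real time

for label, max_sub in (("1D", max_subdomains_1d(m, n)),
                       ("2D", max_subdomains_2d(m, n))):
    per_proc = sustained_flops / max_sub  # no vertical decomposition or OpenMP
    print(f"{label}: max {max_sub:,} subdomains -> "
          f">= {per_proc / 1e9:.1f} GFlop/s sustained per processor")
```

Under a 1D decomposition, each of the few thousand subdomains would need multi-TFlop/s sustained performance, while a 2D decomposition brings the per-processor requirement down to a few hundred MFlop/s, consistent with the conclusions slide.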

Maximum number of horizontal subdomains (chart)

Minimum processor speed to achieve 1000× real time
- Assumes no vertical decomposition and no OpenMP

C O M P U T A T I O N A L R E S E A R C H D I V I S I O N Total memory requirements

Memory scales more slowly than processor speed due to the Courant condition: memory grows with the number of grid points (∝ m * n), while the required sustained speed grows as m * n * n because the shrinking time step adds another factor of n.

Strawman 1 km climate computer
- "I" mesh at 1000× real time: 0.015° × 0.02° × 100L
- ~10 PFlop/s sustained
- ~100 TB total memory
- ~2 million horizontal subdomains
- ~10 vertical domains
- ~20 million processors at 500 MFlop/s each sustained, including communication costs
  - 5 MB memory per processor
  - ~20,000 nearest-neighbor send-receive pairs per subdomain per simulated hour, of ~10 KB each
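As a quick consistency check of these figures (my arithmetic, not from the slide):

```python
# Consistency check of the strawman machine numbers quoted on the slide.
horiz_subdomains = 2_000_000
vert_domains = 10
per_proc_mflops = 500            # sustained MFlop/s per processor
total_memory_tb = 100

processors = horiz_subdomains * vert_domains
sustained_pflops = processors * per_proc_mflops * 1e6 / 1e15
memory_per_proc_mb = total_memory_tb * 1e12 / processors / 1e6

print(f"processors:            {processors:,}")                  # ~20 million
print(f"sustained performance: {sustained_pflops:.0f} PFlop/s")  # ~10 PFlop/s
print(f"memory per processor:  {memory_per_proc_mb:.0f} MB")     # ~5 MB
```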

Conclusions
- fvCAM could probably be scaled up to a 1.5 km mesh
  - The dynamics would have to be changed to be fully non-hydrostatic.
- The scaling of the operations count is superlinear in horizontal resolution because of the Courant condition.
  - Surprisingly, filtering does not dominate the calculation. The physics cost is negligible.
- A one-dimensional horizontal domain decomposition strategy will likely not work: the limits on processor number and per-processor performance are too severe.
- A two-dimensional horizontal domain decomposition strategy would be favorable but requires a code rewrite.
- It's not as crazy as it sounds.