From Athena to Minerva: COLA's Experience in the NCAR Advanced Scientific Discovery Program
Ben Cash, COLA
SC13, 19-20 November 2013
Animation courtesy of CIMSS

Why does climate research need HPC and Big Data?
– Societal demand for information about weather-in-climate and climate impacts on weather on regional scales
– Seamless days-to-decades prediction and unified weather/climate modeling
– Multi-model ensembles and Earth system prediction
– Requirements for data assimilation

Balancing Demands on Resources
[Figure: competing demands on data and HPC resources – resolution (down to 1/12°), duration and/or ensemble size, complexity, and data assimilation]

COLA HPC & Big Data Projects
Project Athena: An International, Dedicated High-End Computing Project to Revolutionize Climate Modeling (dedicated XT4 at NICS)
– Collaborating groups: COLA, ECMWF, JAMSTEC, NICS, Cray
Project Minerva: Toward Seamless, High-Resolution Prediction at Intra-seasonal and Longer Time Scales (dedicated Advanced Scientific Discovery resources on NCAR Yellowstone)
– Collaborating groups: COLA, ECMWF, U. Oxford, NCAR

NICS Resources for Project Athena
The Cray XT4 – Athena – the first NICS machine, in 2008:
– 4512 nodes: AMD 2.3 GHz quad-core CPUs + 4 GB RAM each
– 18,048 cores; ~18 TB aggregate memory
– 165 TFLOPS peak performance
– Dedicated to this project during October 2009 – March 2010: 72 million core-hours! (see the check below)
Other resources made available to the project:
– 85 TB Lustre file system
– 258 TB auxiliary Lustre file system (called Nakji)
– Verne: 16-core, 128-GB system for data analysis during the production phase
– Nautilus: SGI UV with 1024 Nehalem EX cores, 8 GPUs, 4 TB memory, 960 TB GPFS disk, for data analysis
Many thanks to NICS for resources and sustained support!
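As a rough sanity check, the 72 million dedicated core-hours quoted above can be compared against the theoretical ceiling implied by 18,048 cores over the October 2009 – March 2010 window. A minimal back-of-the-envelope sketch in Python; the ~182-day window length is an assumption derived from the stated dates, not a figure from the slides:

    # Back-of-the-envelope check of the Athena dedicated-period core-hours.
    cores = 18_048
    days_dedicated = 182   # assumed length of the Oct 2009 - Mar 2010 window

    ceiling_core_hours = cores * days_dedicated * 24   # theoretical maximum
    quoted_core_hours = 72e6                            # figure cited on the slide

    print(f"theoretical ceiling: {ceiling_core_hours / 1e6:.1f} M core-hours")
    print(f"quoted usage: {quoted_core_hours / 1e6:.1f} M core-hours "
          f"({100 * quoted_core_hours / ceiling_core_hours:.0f}% of the ceiling)")

This gives a ceiling of roughly 79 M core-hours, so the quoted 72 M core-hours corresponds to using most of the machine, most of the time, over the dedicated period.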

Regional Climate Change – Beyond CMIP3 Models' Ability?

Europe Growing Season (Apr–Oct) Precipitation Change: 20th C to 21st C
[Figure panels: T159 (125-km) and T1279 (16-km)]
"Time-slice" runs of the ECMWF IFS global atmospheric model were performed with observed SST for the 20th century and CMIP3 projections of SST for the 21st century, at two different model resolutions.
The continental-scale pattern of precipitation change in April–October (the growing season) associated with global warming is similar at both resolutions, but the regional details are quite different, particularly in southern Europe.

Future Change in Extreme Summer Drought: Late 20th C to Late 21st C
4X probability of extreme summer drought in the Great Plains, Florida, the Yucatán, and parts of Eurasia.
10th-percentile drought: the number of years out of 47 in the simulation of future climate for which the June–August mean rainfall was less than the 5th driest year of 47 in the simulation of current climate.
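As an illustration of the drought metric described above, here is a minimal sketch of how the count could be computed from model output, assuming two NumPy arrays of 47 June–August mean rainfall values per grid point (one per climate simulation); the variable names and synthetic data are purely illustrative, not the project's actual analysis code:

    import numpy as np

    def extreme_drought_count(jja_current, jja_future):
        """Count future years drier than the 5th driest current-climate year.

        Both inputs have shape (47, nlat, nlon): June-August mean rainfall
        for each simulated year at each grid point. Returns an (nlat, nlon)
        field of counts out of 47.
        """
        # Value of the 5th driest of the 47 current-climate years (index 4)
        threshold = np.sort(jja_current, axis=0)[4]
        # Number of future-climate years falling below that threshold
        return np.sum(jja_future < threshold, axis=0)

    # Illustrative call with random data standing in for model output
    rng = np.random.default_rng(0)
    current = rng.gamma(shape=2.0, scale=1.5, size=(47, 10, 10))
    future = rng.gamma(shape=1.8, scale=1.5, size=(47, 10, 10))
    counts = extreme_drought_count(current, future)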

Clouds and Precipitation: Summer 2009 (NICAM 7 km)


Athena Limitations
Athena was a tremendous success, generating a tremendous amount of data and a large number of papers for a six-month project. BUT…
– Limited number of realizations: Athena runs generally consisted of a single realization, so there was no way to assess the robustness of the results
– Uncoupled models
– Multiple, dissimilar models: resources were split between IFS and NICAM, and differences in performance meant very different experiments were performed, making it difficult to compare results directly
– Storage limitations and post-processing demands limited what could be saved for each model

Project Minerva
Goal: explore the impact of increased atmospheric resolution on model fidelity and prediction skill in a coupled, seamless framework, by using a state-of-the-art coupled operational long-range prediction system to systematically evaluate the prediction skill and reliability of a robust set of hindcast ensembles at low, medium and high atmospheric resolutions.
– Part of the NCAR Advanced Scientific Discovery Program to inaugurate Yellowstone
– Allocated 21 M core-hours on Yellowstone; used ~28 M core-hours
Many thanks to NCAR for resources & sustained support!

Project Minerva: Background
NCAR Yellowstone
– In 2012, the NCAR-Wyoming Supercomputing Center (NWSC) debuted Yellowstone, the successor to Bluefire
– IBM iDataPlex, 72,280 cores, 1.5 petaflops peak performance
– #17 on the June 2013 Top500 list
– 10.7 PB disk capacity
– High-capacity HPSS data archive
– Dedicated large-memory and floating-point accelerator clusters (Geyser and Caldera)
Accelerated Scientific Discovery (ASD) program
– NCAR accepted a small number of proposals for early access to Yellowstone, as it has done in the past with new hardware installations
– 3 months of near-dedicated access before the system was opened to the general user community
Opportunity
– Continue the successful Athena collaboration between COLA and ECMWF, and address the limitations of the Athena experiments

Project Minerva: Timeline
March 2012 – ASD proposal submitted
– 31 million core-hours requested
April 2012 – proposal approved
– 21 million core-hours approved
October 5, 2012
– First login to Yellowstone: bcash = user #1 (Ben Cash)
November 21 – December 1, 2012
– Minerva production code finalized
– Yellowstone system instability due to "cable cancer"
– Minerva's low core-count jobs avoid the problem; user accounts are not charged for jobs at this time
– Minerva benefits by using ~7 million free core-hours
– Minerva jobs occupy as many as … cores (!)
– Minerva sets a record: "Most IFS FLOPs in 24 hours"
December 1 – project end
– Failure rate falls to 1%, then to 0.5%; production computing tailed off in March 2013
– Data management becomes by far the greatest challenge
– Project Minerva consumption: ~28 million core-hours total
– 800+ TB generated
Many thanks to NCAR for resources & sustained support!

Minerva Catalog
Resolution | Start Dates  | Ensembles  | Length                       | Period of Integration
T319       | May …        | …          | … months (total)             | …
T639       | May …        | …          | … months (total)             | …
T639       | May 1, Nov 1 | 51 (total) | 5 and 4 months, respectively | …

Minerva Catalog: Extended Experiments
Resolution | Start Dates  | Ensembles | Length   | Period of Integration
T319       | May 1, Nov 1 | …         | … months | …
T639       | May 1, Nov 1 | …         | … months | …
T1279      | May 1        | 15        | 7 months | …

Project Minerva: Selected Results
– Simulated precipitation
– Tropical cyclones
– SST – ENSO

Precipitation: Summer 2010 (IFS 16 km)


Minerva: Coupled Prediction of Tropical Cyclones
June 2005 hurricane off the west coast of Mexico: precipitation in mm/day every 3 hours (T1279 coupled forecast initialized on 1 May 2005).
The predicted maximum rainfall rate reaches 725 mm/day (30 mm/hr).
Based on TRMM global TC rainfall observations, the frequency of rainfall rates exceeding 30 mm/hr is roughly 1% (see the sketch below).
Courtesy Julia Manganello, COLA
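The ~1% figure is an exceedance frequency, i.e. the fraction of observed TC rain-rate samples above 30 mm/hr. A hedged sketch of that calculation, with a synthetic sample standing in for the TRMM observations (the distribution and names are illustrative assumptions, not the actual TRMM processing):

    import numpy as np

    def exceedance_frequency(rain_rates_mm_per_hr, threshold=30.0):
        """Fraction of rain-rate samples exceeding the given threshold (mm/hr)."""
        rates = np.asarray(rain_rates_mm_per_hr)
        return float(np.mean(rates > threshold))

    # Synthetic 3-hourly TC rain rates standing in for TRMM observations
    rng = np.random.default_rng(1)
    sample = rng.exponential(scale=6.0, size=100_000)   # mm/hr, illustrative only
    print(f"P(rate > 30 mm/hr) = {exceedance_frequency(sample):.2%}")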

Minerva vs. Athena – TC Frequency (NH; JJASON; T1279)
9-Year Mean:
  OBS      49.9
  Athena   59.1
  Minerva  48.9 (all members)
[Figure legend: Athena, Minerva, IBTrACS]
Courtesy Julia Manganello, COLA

[Figure panels: Jul, Sep, Nov]

Project Minerva: Lessons Learned
More evidence that dedicated usage of a relatively big supercomputer greatly enhances productivity:
– Experience with the ASD period demonstrates that tremendous progress can be made with dedicated access
– Dedicated computing campaigns provide demonstrably more efficient utilization
– There was a noticeable decrease in efficiency once scheduling of multiple jobs of multiple sizes was turned over to a scheduler
In-depth exploration:
– Data saved at much higher frequency
– Multiple ensemble members, increased vertical levels, etc.

Project Minerva: Lessons Learned
Dedicated simulation projects like Athena and Minerva generate enormous amounts of data to be archived, analyzed and managed. Data management is a big challenge.
Other than machine instability, data management and post-processing were solely responsible for halts in production.

Data Volumes
Project Athena:
– Total data volume 1.2 PB (~500 TB unique)*
– Spinning disk: 40 TB at COLA, 0 TB at NICS (was 340 TB)
  * no home after April 2014
Project Minerva:
– Total data volume 1.0 PB (~800 TB unique)
– Spinning disk: 100 TB at COLA, 500 TB at NCAR (for now)
That much data breaks everything: H/W, systems management policies, networks, apps S/W, tools, and shared archive space.
NB: Generating 800 TB using 28 M core-hours took ~3 months; this would take about a week using a comparable fraction of a system with 1M cores!
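The "about a week" estimate follows from scaling the production rate with machine size; a sketch of that arithmetic, assuming the project would use the same fraction of a hypothetical 1M-core system and that the data-generation rate scales with the cores delivered:

    # Scale the ~3-month Minerva production period from Yellowstone to a
    # hypothetical 1M-core system, assuming the same fraction of the machine.
    yellowstone_cores = 72_280
    future_cores = 1_000_000
    months_on_yellowstone = 3.0     # time to generate ~800 TB / ~28 M core-hours

    speedup = future_cores / yellowstone_cores          # ~13.8x more cores
    days_on_future_system = months_on_yellowstone * 30 / speedup
    print(f"~{speedup:.1f}x scaling -> ~{days_on_future_system:.1f} days")  # about a week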

Challenges and Tensions
– Making effective use of large allocations – it takes a village
– Exaflood of data
– Resolution vs. parameterization
– Sampling (e.g. extreme events)
TENSIONS:
  HPC capability  vs.  Data analysis capacity
  Automation/abstraction  vs.  Human control
  Data-driven development  vs.  Science-driven development
  Small, portable code  vs.  End-to-end tools
  Tight, local control of data  vs.  Distributed data
"Having more data won't substitute for thinking hard, recognizing anomalies, and exploring deep truths." – Samuel Arbesman, Wash. Post (18 Aug. 2013)

Athena and Minerva: Harbingers of the Exaflood
– Even on a system designed for big projects like these, HPC production capabilities overwhelm storage and processing – a particularly acute problem for "rapid burn" projects such as Athena and Minerva
– Familiar diagnostics are hard to do at very high resolution
– Can't "just recompute" – years of data analysis and mining remain after the production phase
– Have we wrung all the "science" out of the data sets, given that we can only keep a small percentage of the total data volume on spinning disk? How can we tell?
– Must move from ad hoc problem solving to systematic, repeatable workflows, e.g. incorporating post-processing and data management into the production stream (transforming Noah's Ark into a shipping industry) – see the sketch below
"We need exaflood insurance." – Jennifer Adams
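One reading of "incorporate post-processing and data management into the production stream" is a driver in which post-processing, archiving and cleanup are first-class steps of every production chunk rather than afterthoughts. A minimal sketch of such a workflow driver; the script names and chunk structure are placeholders, not the actual COLA/ECMWF production system:

    import subprocess

    def run(cmd):
        """Run one workflow step; fail loudly so the chunk can be retried."""
        print(f"[workflow] {cmd}")
        subprocess.run(cmd, shell=True, check=True)

    def produce_chunk(chunk_id):
        """Treat integration, post-processing and archiving as one repeatable unit."""
        run(f"./run_model.sh --chunk {chunk_id}")         # placeholder model integration
        run(f"./postprocess.sh --chunk {chunk_id}")       # regrid / subset / compress output
        run(f"./archive.sh --chunk {chunk_id}")           # push to tape (e.g. HPSS)
        run(f"./verify_and_clean.sh --chunk {chunk_id}")  # checksum, then free scratch space

    if __name__ == "__main__":
        for chunk in range(1, 4):   # e.g. three hindcast segments
            produce_chunk(chunk)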

COLA ASD: Project Minerva – SC13 − November 2013 – Ben Cash ANY QUESTIONS?
