From Athena to Minerva: COLA's Experience in the NCAR Advanced Scientific Discovery Program
Ben Cash, COLA
SC13, 19-20 November 2013
Animation courtesy of CIMSS
Why does climate research need HPC and Big Data?
- Societal demand for information about weather-in-climate and climate impacts on weather on regional scales
- Seamless days-to-decades prediction and unified weather/climate modeling
- Multi-model ensembles and Earth system prediction
- Requirements for data assimilation
Balancing Demands on Resources
[Figure: competing demands on data and HPC resources: duration and/or ensemble size, resolution (e.g. 1/12°), complexity, and data assimilation.]
COLA HPC & Big Data Projects
- Project Athena: An International, Dedicated High-End Computing Project to Revolutionize Climate Modeling (dedicated Cray XT4 at NICS). Collaborating groups: COLA, ECMWF, JAMSTEC, NICS, Cray.
- Project Minerva: Toward Seamless, High-Resolution Prediction at Intra-seasonal and Longer Time Scales (dedicated Advanced Scientific Discovery resources on NCAR Yellowstone). Collaborating groups: COLA, ECMWF, U. Oxford, NCAR.
NICS Resources for Project Athena
The Cray XT4 "Athena", the first NICS machine, installed in 2008:
- 4,512 nodes: 2.3 GHz quad-core AMD CPUs with 4 GB RAM each
- 18,048 cores and 17.6 TB aggregate memory
- 165 TFLOPS peak performance
- Dedicated to this project during October 2009 - March 2010: 72 million core-hours!
Other resources made available to the project:
- 85 TB Lustre file system
- 258 TB auxiliary Lustre file system (called Nakji)
- Verne: 16-core, 128 GB system for data analysis during the production phase (2009-2010)
- Nautilus: SGI UV with 1,024 Nehalem EX cores, 8 GPUs, 4 TB memory, and 960 TB of GPFS disk, for data analysis in 2010-11
Many thanks to NICS for resources and sustained support!
Regional Climate Change – Beyond CMIP3 Models' Ability?
Europe Growing Season (Apr-Oct) Precipitation Change: 20th Century to 21st Century
T159 (125 km) vs. T1279 (16 km)
"Time-slice" runs of the ECMWF IFS global atmospheric model with observed SST for the 20th century and CMIP3 projections of SST for the 21st century, at two different model resolutions.
The continental-scale pattern of precipitation change in April-October (the growing season) associated with global warming is similar at the two resolutions, but the regional details are quite different, particularly in southern Europe.
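To make the comparison concrete, here is a minimal sketch of how such a growing-season change map could be computed from two time-slice runs. The file names and the variable name "tp" are assumptions (chosen to match common IFS output conventions), not the project's actual data layout:

```python
import xarray as xr

def growing_season_change(ds_20c, ds_21c, var="tp"):
    """Apr-Oct (growing season) mean precipitation difference
    between two time-slice runs, per grid point."""
    def apr_oct_mean(ds):
        in_season = ds["time.month"].isin(range(4, 11))  # Apr..Oct
        return ds[var].where(in_season).mean("time")
    return apr_oct_mean(ds_21c) - apr_oct_mean(ds_20c)

# Hypothetical file names for illustration only.
change = growing_season_change(
    xr.open_dataset("ifs_20c_timeslice.nc"),
    xr.open_dataset("ifs_21c_timeslice.nc"),
)
```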
Future Change in Extreme Summer Drought: Late 20th Century to Late 21st Century
10th-percentile drought: the number of years out of 47 in a simulation of future climate (2071-2117) for which the June-August mean rainfall was less than that of the 5th driest year of 47 in a simulation of current climate (1961-2007).
Result: roughly a 4x increase in the probability of extreme summer drought in the Great Plains, Florida, the Yucatán, and parts of Eurasia.
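As a minimal sketch (synthetic data, not the project's output), the metric above can be computed like this, assuming one JJA-mean rainfall value per year for each simulation:

```python
import numpy as np

def extreme_drought_years(current_jja, future_jja, rank=5):
    """Count future years with JJA-mean rainfall below the rank-th
    driest year of the current-climate run (rank=5 of 47 years is
    roughly the 10th percentile, as on the slide)."""
    threshold = np.sort(current_jja)[rank - 1]  # 5th driest JJA mean
    return int(np.sum(future_jja < threshold))

# Synthetic rainfall (mm/day) for illustration only, not project data.
rng = np.random.default_rng(0)
current = rng.gamma(4.0, 1.5, size=47)   # stand-in for 1961-2007
future = rng.gamma(3.2, 1.5, size=47)    # stand-in for 2071-2117
print(extreme_drought_years(current, future), "of 47 future years")
```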
Clouds and Precipitation: Summer 2009 (NICAM 7 km)
Athena Limitations
Athena was a tremendous success, generating an enormous amount of data and a large number of papers for a six-month project. BUT...
- Limited number of realizations: Athena runs generally consisted of a single realization, so there was no way to assess the robustness of results.
- Uncoupled models.
- Multiple, dissimilar models: resources were split between IFS and NICAM, and differences in performance meant that very different experiments were performed, making it difficult to directly compare results.
- Storage limitations and post-processing demands limited what could be saved for each model.
Project Minerva
Goal: explore the impact of increased atmospheric resolution on model fidelity and prediction skill in a coupled, seamless framework, by using a state-of-the-art coupled operational long-range prediction system to systematically evaluate the prediction skill and reliability of a robust set of hindcast ensembles at low, medium, and high atmospheric resolutions.
- Part of the NCAR Advanced Scientific Discovery program to inaugurate Yellowstone
- Allocated 21 M core-hours on Yellowstone; used ~28 M core-hours
Many thanks to NCAR for resources & sustained support!
Project Minerva: Background
NCAR Yellowstone
- In 2012, the NCAR-Wyoming Supercomputing Center (NWSC) debuted Yellowstone, the successor to Bluefire
- IBM iDataPlex, 72,288 cores, 1.5 petaflops peak performance; #17 on the June 2013 Top500 list
- 10.7 PB disk capacity
- High-capacity HPSS data archive
- Dedicated large-memory and floating-point accelerator clusters (Geyser and Caldera)
Accelerated Scientific Discovery (ASD) program
- NCAR accepted a small number of proposals for early access to Yellowstone, as it has done in the past with new hardware installs
- 3 months of near-dedicated access before the system was opened to the general user community
Opportunity
- Continue the successful Athena collaboration between COLA and ECMWF, and address the limitations of the Athena experiments
Project Minerva: Timeline
- March 2012: ASD proposal submitted; 31 million core-hours requested
- April 2012: proposal approved; 21 million core-hours granted
- October 5, 2012: first login to Yellowstone; bcash (Ben Cash) = user #1
- November 21 - December 1, 2012: Minerva production code finalized
  - Yellowstone system instability due to "cable cancer"; Minerva's low-core-count jobs avoid the problem, and user accounts are not charged for jobs during this period, so Minerva benefits from ~7 million free core-hours
  - Minerva jobs occupy as many as 61,000 cores (!)
  - Minerva sets a record: "Most IFS FLOPs in 24 hours"
- December 1 - project end: failure rate falls to 1%, then to 0.5%; production computing tails off in March 2013
  - Data management becomes by far the greatest challenge
- Project Minerva consumption: ~28 million core-hours total; 800+ TB generated
Many thanks to NCAR for resources & sustained support!
Minerva Catalog

Resolution | Start Dates | Ensembles | Length | Period of Integration
T319 | May 1 | 15 | 24 months (total) | 1980-2011
T639 | May 1 | 15 | 24 months (total) | 1980-2011
T639 | May 1, Nov 1 | 51 (total) | 5 and 4 months, respectively | 2000-2011

Minerva Catalog: Extended Experiments

Resolution | Start Dates | Ensembles | Length | Period of Integration
T319 | May 1, Nov 1 | 51 | 7 months | 1980-2011
T639 | May 1, Nov 1 | 15 | 7 months | 1980-2011
T1279 | May 1 | 15 | 7 months | 2000-2011
Project Minerva: Selected Results
- Simulated precipitation
- Tropical cyclones
- SST and ENSO
Precipitation: Summer 2010 (IFS 16 km)
Minerva: Coupled Prediction of Tropical Cyclones
11-12 June 2005 hurricane off the west coast of Mexico: precipitation in mm/day every 3 hours (T1279 coupled forecast initialized on 1 May 2005).
- The predicted maximum rainfall rate reaches 725 mm/day (about 30 mm/hr).
- Based on TRMM global TC rainfall observations (1998-2000), the frequency of rainfall rates exceeding 30 mm/hr is roughly 1%.
Courtesy Julia Manganello, COLA
Minerva vs. Athena – TC Frequency (NH; JJASON; T1279)

9-Year Mean (2000-2008)
OBS (IBTrACS): 49.9
Athena: 59.1
Minerva: 48.9 (all members)

[Figure: Northern Hemisphere TC counts for Athena, Minerva, and IBTrACS.]
Courtesy Julia Manganello, COLA
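For orientation only, a minimal sketch of how an "all members" mean like the one in this table can be formed. The arrays below are synthetic stand-ins, not Minerva tracks or IBTrACS data:

```python
import numpy as np

# Synthetic stand-ins: NH JJASON TC counts per ensemble member (rows)
# for each year 2000-2008 (columns); obs would come from IBTrACS.
rng = np.random.default_rng(1)
minerva_counts = rng.poisson(lam=49, size=(15, 9))
obs_counts = rng.poisson(lam=50, size=9)

# "All members": pool every member equally, then average over years.
print("Minerva 9-year mean:", minerva_counts.mean())
print("OBS 9-year mean:    ", obs_counts.mean())
```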
[Figure panels labeled Jul, Sep, Nov]
Project Minerva: Lessons Learned
More evidence that dedicated usage of a relatively big supercomputer greatly enhances productivity:
- Experience during the ASD period demonstrates that tremendous progress can be made with dedicated access.
- Dedicated computing campaigns provide demonstrably more efficient utilization; there was a noticeable decrease in efficiency once scheduling of multiple jobs of multiple sizes was turned over to a scheduler.
- Dedicated access allows in-depth exploration: data saved at much higher frequency, multiple ensemble members, increased vertical levels, etc.
Project Minerva: Lessons Learned
Dedicated simulation projects like Athena and Minerva generate enormous amounts of data to be archived, analyzed, and managed. Data management is a big challenge: apart from machine instability, data management and post-processing were the only causes of halts in production.
Data Volumes
Project Athena: total data volume 1.2 PB (~500 TB unique)*
- Spinning disk: 40 TB at COLA; 0 TB at NICS (was 340 TB)
- * no home after April 2014
Project Minerva: total data volume 1.0 PB (~800 TB unique)
- Spinning disk: 100 TB at COLA; 500 TB at NCAR (for now)
That much data breaks everything: hardware, systems management policies, networks, applications software, tools, and shared archive space.
NB: Generating 800 TB using 28 M core-hours took ~3 months; it would take about a week using a comparable fraction of a system with 1M cores!
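The NB arithmetic can be checked directly. A quick back-of-envelope sketch, taking the slide's figures and assuming a 72,288-core Yellowstone, 30-day months, and that "comparable fraction" means the same share of the machine:

```python
core_hours = 28e6          # Minerva consumption (from the slide)
machine_cores = 72_288     # Yellowstone total cores (assumed)
months = 3.0               # production period (from the slide)

avg_cores = core_hours / (months * 30 * 24)   # ~13,000 cores in steady use
share = avg_cores / machine_cores             # ~18% of the machine

future_cores = share * 1_000_000              # same share of a 1M-core system
days = core_hours / future_cores / 24         # ~6.5 days: "about a week"
print(f"{avg_cores:.0f} cores (~{share:.0%}); ~{days:.1f} days on 1M cores")
```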
Challenges and Tensions
Challenges:
- Making effective use of large allocations: it takes a village
- The exaflood of data
- Resolution vs. parameterization
- Sampling (e.g. extreme events)
Tensions:
- HPC capability vs. data analysis capacity
- Automation/abstraction vs. human control
- Data-driven development vs. science-driven development
- Small, portable code vs. end-to-end tools
- Tight, local control of data vs. distributed data
"Having more data won't substitute for thinking hard, recognizing anomalies, and exploring deep truths." – Samuel Arbesman, Washington Post (18 Aug. 2013)
Athena and Minerva: Harbingers of the Exaflood
- Even on a system designed for big projects like these, HPC production capabilities overwhelm storage and processing, a particularly acute problem for "rapid burn" projects such as Athena and Minerva.
- Familiar diagnostics are hard to do at very high resolution.
- Can't "just recompute": years of data analysis and mining follow the production phase.
- Have we wrung all the "science" out of the data sets, given that we can only keep a small percentage of the total data volume on spinning disk? How can we tell?
- Must move from ad hoc problem solving to systematic, repeatable workflows, e.g. incorporating post-processing and data management into the production stream (transform Noah's Ark into a shipping industry); a sketch follows.
"We need exaflood insurance." – Jennifer Adams
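As a minimal sketch of what folding post-processing and data management into the production stream could look like: the stage scripts named below (run_model.sh, postprocess.sh, archive_to_hpss.sh) are hypothetical placeholders, and a real campaign would use a workflow manager such as Cylc or ecFlow rather than a flat driver:

```python
import subprocess

def run_segment(year, member):
    """Run one hindcast segment, then post-process and archive it
    before starting the next, so output never piles up on disk."""
    for stage in ("run_model.sh", "postprocess.sh", "archive_to_hpss.sh"):
        subprocess.run(
            [stage, f"--year={year}", f"--member={member}"],
            check=True,  # fail fast: production must not outrun the archive
        )

for year in range(1980, 2012):       # hindcast start years, as in the catalog
    for member in range(1, 16):      # 15 ensemble members
        run_segment(year, member)
```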
ANY QUESTIONS?