T511-L60 CY26R3 - 32 MPI tasks and 4 OpenMP threads

Presentation transcript:

T511-L60 CY26R3 - 32 MPI tasks and 4 OpenMP threads

TL1023 ~ 20 km

AFES on the Earth Simulator (T1279L96), T1279 ~ 10 km
Source: Satoru Shingu et al., “A 26.58 Tflops Global Atmospheric Simulation with the Spectral Transform Method on the Earth Simulator”

ECMWF’s recent experience
For the 6 months up to March 2003, ECMWF operated three vector-parallel systems from Fujitsu and two large scalar Cluster 1600s from IBM.
Advantages of vector-based machines include:
- Clear feedback on the efficiency of the codes being run
- Better environmental characteristics than systems built using general-purpose SMP servers
Advantages of scalar-based machines include:
- Front-end machines for scalar-only tasks are not required
- Being more forgiving if a medium level of efficiency is acceptable
The consensus view of our application programmers is that vector and scalar systems can equally well provide the HPC resources for high-end modelling work.

Which architecture is best suited for the job?
The number-one criterion is cost/performance. For Earth System Modelling, the following do not constitute good measures for cost/performance evaluations:
- Peak performance
- LINPACK results
- Sustained Gflops achieved on the user’s application as measured by hardware counters
- Peak/sustained performance ratio
The most reliable measure of performance is the wall-clock time needed to solve a given task (e.g. a 100-year simulation that is representative of the scientific work planned).

Scalability issues
Earth System codes do not scale linearly if the same-size problem is run on a larger number of processors:
- Load-balancing and subroutine-call overheads increase, etc.
- Serial components (Amdahl’s law)
The planned increases in problem size mitigate this effect.
Generally, a high ratio of the number of parallel instruction streams in the application (and their length) to the number of parallel instruction streams required to keep the hardware busy is advantageous. In this respect, an 8-way multiply-add (M&A) vector pipe requires as many independent instruction streams as 8 scalar M&A units.
The sustainable flop count (in absolute terms) per “hardware thread” is very important for application scalability; the ratio of peak to sustained flops is not.
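To make the Amdahl’s-law point concrete, here is a minimal sketch (not from the slides; the 2% serial fraction and the processor counts are illustrative assumptions, not ECMWF measurements) of how the serial component bounds speed-up:

```python
# Minimal Amdahl's-law sketch; the 2% serial fraction is an illustrative assumption.
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Upper bound on speed-up when a fixed fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# Assume 2% of the runtime is serial (load imbalance, call overheads, I/O).
for n in (32, 288, 960, 1920):
    print(f"{n:5d} PEs -> speed-up <= {amdahl_speedup(0.02, n):6.1f}")
```

Even a 2% serial component caps the speed-up below 50 at 1920 PEs, which is why serial components and short instruction streams matter far more at scale than peak flops.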

ECMWF Model / Assimilation / Computing Status & Plans
Anthony Hollingsworth, Walter Zwieflhoefer, Deborah Salmond

Scope of talk
- ECMWF operational forecast systems 2003
- ECMWF HPC configuration 2003, and planned upgrade
- Operational timings for production model codes
- Planned forecast system upgrades 2004-2007
- Drivers for the 2007 computer upgrade

ECMWF operational assimilation systems 2003
- 4D-Var with a 12-hour assimilation period, inner-loop minimizations at (up to) T159 L60, outer-loop resolution T511 L60
- Short-cut-off analyses (6-hourly 3D-Var + 4 forecasts/day) for Limited Area Modelling in Member States
- Ocean wave assimilation system
- Ocean circulation assimilation system

ECMWF operational prediction systems 2003
- Deterministic forecasts to 10 days, twice per day, with the T511/L60 model
- Ensemble Predictions to 10 days, twice per day, T255/L40, N=51
- Ensemble forecasts to 1 month, using a T159/L40 atmosphere and a 110 km ocean (33 km meridional in the tropics); 2 per month at present, weekly in 2004
- Seasonal forecasts, once per month, based on ensembles using a T95 (210 km) L40 atmospheric model and the same ocean model as used for the one-month forecasts

Performance profile of the contracted IBM solution relative to VPP systems
[Chart: sustained performance on ECMWF codes, 2002-2006, relative to the Fujitsu baseline (= 400 GF sustained), rising through Phase 1, Phase 2 and Phase 3 (Regatta H+ with Federation Switch)]

ECMWF HPC configuration, end 2003
- Two IBM Cluster 1600s, with 30 p690 servers each
- Each p690 is partitioned into four 8-way nodes, giving 120 nodes per cluster
- 12 of the nodes in each cluster have 32 GB of memory; all other nodes have 8 GB
- Processors run at 1.3 GHz
- 960 processors per cluster for user work
- Dual-plane Colony switch with PCI adapters
- 4.2 terabytes of disk space per cluster
- Both clusters are configured identically

Timings for production model codes, IFS Cycle 26r3, October 2003 (D. Salmond)

Resolution | PEs | Layout (MPI x OMP) | Time step | Time for 10-day forecast (=> FC days/day) | Equivalent FC_d/d on 1920 PEs
T511/L60   | 288 | 72 x 4             | 900 s     | 4298 s (~201 FC_d/d on 288 PEs)           | 1340 FC_d/d (3.67 yrs/d)
T255/L40   | 32  | 8 x 4 (1 node)     | 2700 s    | 1800 s (~480 FC_d/d on 32 PEs)            | 28800 FC_d/d (78.9 yrs/d)
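The “forecast days per day” figures in the table follow directly from the wall-clock times; this small sketch (mine, using only the table’s numbers) reproduces them, with the 1920-PE column obtained by the linear extrapolation the slide implies:

```python
# Reproduce the forecast-days-per-day (FC_d/d) figures from the quoted wall-clock times.
SECONDS_PER_DAY = 86400

def fc_days_per_day(forecast_days: float, wallclock_s: float) -> float:
    """Forecast days produced per wall-clock day."""
    return forecast_days * SECONDS_PER_DAY / wallclock_s

t511 = fc_days_per_day(10, 4298)   # T511/L60 on 288 PEs -> ~201 FC_d/d
t255 = fc_days_per_day(10, 1800)   # T255/L40 on 32 PEs  -> ~480 FC_d/d
# 1920-PE equivalents assume linear scaling from the measured PE counts.
print(round(t511), round(t511 * 1920 / 288))   # ~201, ~1340
print(round(t255), round(t255 * 1920 / 32))    # 480, 28800
```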

Relative efficiency of T255 and T511 production runs
To meet the delivery schedule, T255/L40 is run on 1 server (32 PEs) and T511/L60 is run on 9 servers.
Expected speed-up of T255 L40 vs. T511 L60:
- Horizontal resolution: x 4
- Vertical resolution: x 1.5
- Time step: x 3
- OVERALL: x 18
Actual speed-up of daily production on 1920 PEs: 28800/1340 ≈ x 21.5
Benefit of reduced communication, and other scalability issues?
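As a check on the “expected speed-up” arithmetic, this short sketch (mine, using only the factors quoted on the slide) multiplies the resolution and time-step ratios and compares the product with the observed ratio of the 1920-PE equivalents:

```python
# Expected vs. actual speed-up of T255/L40 relative to T511/L60 (slide's numbers).
horizontal = (511 / 255) ** 2   # ~4: grid points scale with the square of the truncation
vertical   = 60 / 40            # 1.5: fewer vertical levels
timestep   = 2700 / 900         # 3: longer time step, so fewer steps
expected   = horizontal * vertical * timestep
actual     = 28800 / 1340       # ratio of the equivalent FC_d/d figures on 1920 PEs
print(f"expected ~x{expected:.0f}, actual ~x{actual:.1f}")   # expected ~x18, actual ~x21.5
```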

Operational timings for production Seasonal Forecast code, October 2003 (D. Salmond)
The coupled system runs on 1 LPAR (8 PEs):
- IFS atmosphere: T95/L40, 3 OMP threads, time step 3600 s
- HOPE ocean: 1 degree (0.3 deg in the equatorial band)
- OASIS coupler: 1 PE, 24-hour time step
Time for a 6-month forecast: 10 hrs (441 FC_d/d on 8 PEs)
Equivalent: 105984 FC_d/d (288 yr/d) on 1920 PEs

Relative efficiency of T255 atmosphere and T95 Seasonal Forecast production runs
Expected speed-up of the Seasonal Forecast model vs. T255 (the cost of the ocean is dominant: a 1-degree ocean with 40 levels is estimated at ~T159 L40, the T95 atmosphere waits for the ocean, and the different costs of physics in ocean and atmosphere are ignored):
- Horizontal resolution: (255/159)**2 ≈ x 2.6
- Vertical resolution: (40/40) = x 1.0
- Time step: (3600/2400) = x 1.5
- OVERALL: ≈ x 3.9
Actual speed-up of daily production on 1920 PEs: 105984/28800 ≈ x 3.67
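The same check works for the coupled seasonal system; this sketch (mine) uses the slide’s figures together with its assumption that a 1-degree, 40-level ocean costs roughly as much as a T159 L40 atmosphere:

```python
# Expected vs. actual speed-up of the seasonal system relative to T255/L40.
horizontal = (255 / 159) ** 2   # ~2.6, treating the ocean as ~T159 L40 equivalent
vertical   = 40 / 40            # 1.0
timestep   = 3600 / 2400        # 1.5
print(f"expected ~x{horizontal * vertical * timestep:.1f}")   # ~x3.9
print(f"actual   ~x{105984 / 28800:.2f}")                     # x3.68 (slide quotes ~x3.67)
```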

Planned Forecast system upgrades 2004-2005 on IBM Phase 3
The expected resolutions are:
- Deterministic forecast & outer loops of 4D-Var: T799 (25 km) L91
- Ensemble Prediction System: T399 (50 km) L65
- Inner loops of 4D-Var: T255 (80 km) L91
- 15-day and monthly forecast system: T255 (80 km)/L65 and T159 (125 km)/L65

Drivers for 2007 Computer Upgrade
Increased computational resources are needed in 2007 to enable:
- 4D-Var inner-loop resolution of T399
- An ensemble component of the data assimilation
- Further improvement of the inner-loop physics
- Increased use of satellite data (both reduced thinning and the introduction of new instruments such as IASI)
- An increase in resolution for the seasonal forecasting system (to T159 L65 for the atmospheric model)

END. Thank you for your attention!

Research Directions and Operational Targets 2004
- Assimilation of MSG data and additional ENVISAT and AIRS data
- Increased vertical resolution, particularly in the vicinity of the tropopause
- Upgrades of inner-loop physics and assimilation of cloud/rainfall information
- Weekly running of the monthly forecasting system
- Preparation for upgrades in horizontal resolution
- High-resolution moist singular vectors for the EPS initial states

Research Directions and Operational Targets 2005
- Final validation and implementation of increases in horizontal resolution
- Validation and assimilation of new satellite data such as SSMIS, AMSR, OMI and HIRDLS
- Enhanced preparations for monitoring and assimilation of METOP data
- Seamless ensemble forecast system for medium-range and monthly forecasts

Research Directions and Operational Targets 2006-2007
- Monitoring and then assimilation of data from the METOP instruments (IASI, AMSU, HIRS, MHS, ASCAT, GRAS, GOME)
- Preparation for NPP
- Increased inner-loop resolution and enhanced inner-loop physics for 4D-Var
- Ensemble component to data assimilation
- Increased resolution and forecast range for seasonal forecasting