DMI Update (www.dmi.dk)
Leif Laursen (ll@dmi.dk), Jan Boerhout (jboerhout@hpce.nec.com)
CAS2K3, September 7-11, 2003, Annecy, France
Danish Meteorological Institute
DMI is the national weather service for Denmark, Greenland and the Faroe Islands.
– Weather forecasting, oceanography, climate research and environmental studies
– Numerical models are used in all areas
– Increased use of automatic products
– Demands high availability of systems
[System diagram: GTS observations (32 KByte/s) and ECMWF boundary files (10 MByte/s) feed the NEC SX-6, which runs preprocessing, analysis, initialisation, forecast and postprocessing; two 4-processor SGI Origin 200 systems handle data processing, graphics, verification and the operational database; output is archived on a mass storage device.]
[Operational schedule: forecast cycles driven by the 00Z, 06Z, 12Z and 18Z ECMWF boundary files.]
Evolution of the RMS error for MSLP
Quality of 24h forecasts of 10m wind speeds >= 8 m/s
Weibull distributions for 24-hour forecasts from the E, D, ECMWF and UKMO models are shown, together with the curve for the observations.
The new NEC SX-6 computer at DMI

              April 2002         March 2003
  CPUs        16 + 2             60 + 2
  Memory      96 GByte           320 GByte
  Peak        128 Gflops + 16    480 Gflops + 16
  Increase    4                  15
  Disc        1 TByte            4 TByte

Front end: 2 AzusA systems with 4 CPUs and 4 GByte each, plus 2 AsamA systems with 4 CPUs and 8 GByte each.
Total front-end peak: 25.6 Gflops (April 2002); 25.6 + 51.2 Gflops (March 2003).
Some events during the migration to the NEC SX-6
– Oct. 01: Signature of contract between NEC and DMI
– April 02: Upgrade (advection scheme for q, CW and TKE)
– May 02: Installation of phase 1 of the SX-6
– May 02: Parallel system on the SX-6
– June 02: DMI-HIRLAM-I (0.014 degree, 602x600 grid) on the SX-6
– July 02: Stability test passed
– Sep. 02: Operational suite on the SX-6; later removal of the SX-4
– Sep. 02: Testing of new developments (diff. and convection)
– Dec. 02: Upgrade: 40 levels, reduced time step, AMSU-A data
– Jan. 03: Revised contract between NEC and DMI
– Mar. 03: Installation of phase 2 of the SX-6
– July 03: Stability test passed
– Sep. 03: Improvements in data assimilation (FGAT, QuikSCAT etc.)
– Early 04: New operational HIRLAM set-up using 6 nodes
HIRLAM Scalability
– Optimization
– Methods
– Implementation
– Performance
Optimization Focus
– Data transposition: from the 2D to the FFT distribution and back, and from the FFT to the TRI distribution and back
– Exchange of halo points: between north and south, and between east and west
– GRIB file I/O
– Statistics
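The halo exchange named above can be sketched in serial with NumPy as a stand-in for the actual Fortran/MPI code; the subdomain sizes and the halo width of one point are assumptions for illustration only:

```python
import numpy as np

# Two neighbouring subdomains along the north-south direction,
# each 3 latitudes x 4 longitudes (sizes assumed), halo width 1.
halo = 1
a = np.arange(1, 13, dtype=float).reshape(3, 4)   # southern subdomain
b = np.arange(13, 25, dtype=float).reshape(3, 4)  # northern subdomain

# Pad each subdomain with a halo row for the neighbour's boundary data.
a_h = np.zeros((3 + halo, 4)); a_h[:3] = a
b_h = np.zeros((3 + halo, 4)); b_h[halo:] = b

# The "exchange": copy the neighbour's boundary row into the halo
# (in the parallel code this is one message per neighbour).
a_h[3] = b[0]    # south receives north's first row
b_h[0] = a[-1]   # north receives south's last row
```

Packing a whole boundary row into one message, rather than sending point by point, is exactly the coarse-granularity principle the later slides argue for.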
Approach
First attempt: a straightforward conversion from SHMEM to MPI-2 put/get calls. It works, but has too much overhead due to the fine granularity of the messages.
Redesign of the transposition and halo-swap routines:
– fewer and larger messages
– independent message-passing process groups
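Why fine granularity hurts can be seen with a toy latency/bandwidth cost model; the latency, bandwidth and message counts below are illustrative assumptions, not SX-6 measurements:

```python
# Toy cost model: time = n_messages * latency + total_bytes / bandwidth.
# Latency 5 us and bandwidth 8 GB/s are assumed round numbers.
def transfer_time(n_messages, total_bytes, latency=5e-6, bandwidth=8e9):
    return n_messages * latency + total_bytes / bandwidth

total = 64 * 2**20   # 64 MiB redistributed in either scheme

fine   = transfer_time(100_000, total)   # many small put/get calls
coarse = transfer_time(100, total)       # few large messages

print(f"fine-grained: {fine:.4f} s, coarse-grained: {coarse:.4f} s")
```

The bandwidth term is identical in both cases; only the per-message latency term changes, so fewer and larger messages win whenever latency is non-negligible.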
2D Sub Grids
[Figure: HIRLAM sub-grid definition in the TWOD data distribution; processors 0-11 arranged in a 3-by-4 layout over the latitude-longitude plane, each holding all levels of its sub grid.]
Original FFT Sub Grids
[Figure: HIRLAM sub-grid definition in the FFT data distribution; each processor handles slabs of full longitude lines.]
2D↔FFT Redistribution
Sub-grid data must be distributed to all processors: (nprocx × nprocy)² send-receive pairs.
2D↔FFT Redistribution
Sub grids in the east-west direction form full longitude lines: nprocy independent sets of nprocx² send-receive pairs, i.e. nprocy × nprocx² pairs in total, which is nprocy times fewer messages.
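The message-count arithmetic, using the 3-by-4 processor layout shown on the slides:

```python
# Send-receive pair counts for the 2D<->FFT redistribution on an
# nprocx-by-nprocy processor grid (3 x 4 = 12 processors as on the slides).
nprocx, nprocy = 3, 4
nproc = nprocx * nprocy

naive = nproc ** 2               # every processor exchanges with every other
grouped = nprocy * nprocx ** 2   # nprocy independent groups of nprocx each

print(naive, grouped)            # 144 vs 36: nprocy times fewer messages
```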
Transpositions 2D↔FFT↔TRI
[Figure: the same 12 processors in the three distributions over the latitude-longitude-levels grid: a 3-by-4 block layout (2D), slabs of full longitude lines (FFT), and columns with the full latitude extent (TRI).]
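The point of each transposition is which grid dimension stays complete on a processor. A minimal NumPy sketch, with toy grid sizes and a two-processor split that are assumptions for illustration:

```python
import numpy as np

# Toy grid: 6 longitudes x 4 latitudes x 2 levels (sizes assumed).
grid = np.arange(6 * 4 * 2).reshape(6, 4, 2)

# FFT distribution: split over latitude, so each part keeps full
# longitude lines for the Fourier transforms.
fft_parts = np.array_split(grid, 2, axis=1)

# TRI distribution: split over longitude, so each part keeps the full
# latitude extent for the tridiagonal solves.
tri_parts = np.array_split(grid, 2, axis=0)

print(fft_parts[0].shape, tri_parts[0].shape)  # (6, 2, 2) (3, 4, 2)
```

A transposition is then the all-to-(row/column)-group exchange that moves the data from one of these splits to the other.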
MPI Methods
Transfer methods:
– Remote Memory Access: mpi_put, mpi_get
– Asynchronous point-to-point: mpi_isend, mpi_irecv
– All-to-all: mpi_alltoallv, mpi_alltoallw
Buffering vs. direct:
– Explicit buffering
– MPI derived types
(Method selection by environment variables)
Test Grid Details

  Parameter        Value   Notes
  Longitudes       602
  Latitudes        568
  Levels           60
  NSTOP            40      steps
  Initialization   none
  Time step        180     seconds

Performance
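For scale, the grid in the table works out as follows; the 8-byte word size used for the memory estimate is an assumption:

```python
# Size of the test grid from the table above.
nlon, nlat, nlev = 602, 568, 60
points = nlon * nlat * nlev
print(points)                 # 20516160 grid points per 3D field

# Rough memory footprint of one such field, assuming 8-byte reals:
mib = points * 8 / 2**20
print(f"{mib:.0f} MiB per 3D field")
```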
Parallel Speedup on the NEC SX-6
Cluster of 8 NEC SX-6 nodes at DMI. Up to 60 processors: 7 nodes with 8 processors per node, plus 1 node with 4 processors. Parallel efficiency: 78% on 60 processors.
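Parallel efficiency is speedup divided by processor count, so the reported figure pins down the speedup; a quick check (the helper function is illustrative):

```python
# Parallel efficiency = speedup / p = T1 / (p * Tp).
def efficiency(t_serial, t_parallel, p):
    return t_serial / (p * t_parallel)

# 78% efficiency on 60 processors implies a speedup of about:
speedup = 0.78 * 60
print(f"{speedup:.1f}")   # 46.8
```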
Performance Observations
– The new data-redistribution method is much more efficient (78% vs. 45% parallel efficiency on 60 processors).
– No performance advantage of RMA (one-sided message passing) or all-to-all over the plain point-to-point method.
– MPI derived types give elegant code, but explicit buffering is faster.
Questions? Thank you!