
Slide 1: Comparison of the Unified Model Version 5.3 on Various Platforms
Zoe Chaplin, Manchester Computing – Supercomputing, Visualization & eScience
CAS2K3, 11 September 2003

Slide 2: Contents
 Why am I talking about the UM?
 The Platforms
 The Model Version
 Results
 – Global
 – Mesoscale
 – Discussion about the IBM
 Conclusions and Further Work

Slide 3: Why am I talking about the UM?

Slide 4: Why am I talking about the UM?
 SVE consists of CSAR (HPC group), MVC (visualisation) and eScience
 CSAR runs a 512-processor SGI Origin 3000 and an 816-processor Cray T3E, and is acquiring a 256-processor SGI Altix (Itanium)
 Many academic users of the UM throughout the UK use the CSAR service (UGAMP – the UK Universities Global Atmospheric Modelling Programme)
 Links between Manchester Computing and the Met Office

Slide 5: The Platforms

Slide 6: The Platforms
 At CSAR:
 – 'Green': SGI Origin 3000, 400 MHz MIPS R12000 processors, 1 GB memory/processor (512 GB in total)
 – 'Turing': Cray T3E, 816 600 MHz processors, 256 MB memory/processor (209 GB in total)
 At ECMWF:
 – IBM P690: 2 × 30 P690 nodes, 32 1.3 GHz processors/node (i.e. 16 POWER4 chips/node). Each node is divided into 4 LPARs; most P690 nodes have 32 GB memory, 2 × 3 have 128 GB. An SP Switch2 connects the LPARs.

Slide 7: The Model Version

Slide 8: The Model Version
 All experiments performed at version 5.3, 'the New Dynamics'
 Semi-Lagrangian dynamics
 Semi-implicit physics
 Non-hydrostatic
 Arakawa C-grid in the horizontal
 Charney–Phillips grid in the vertical
 Must have an even number of processors in the x direction (see the sketch below)
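
Since UM 5.3 requires an even processor count in the x (east-west) direction, only some layouts of a given processor total are legal. The following is a minimal sketch (plain Python, not UM source; the function name and the ceil-division load estimate are illustrative assumptions) of how one might enumerate legal 2-D decompositions:

```python
# Minimal sketch (not UM code): enumerate 2-D processor decompositions
# under the UM 5.3 constraint that the east-west (x) processor count
# must be even. Default grid dimensions are the talk's global N216 run.

def valid_decompositions(nprocs, nx=432, ny=325):
    """Yield (px, py, local_nx, local_ny) with px even, as UM 5.3 requires."""
    for px in range(2, nprocs + 1, 2):   # even east-west counts only
        if nprocs % px:
            continue
        py = nprocs // px
        # ceil-divide: subdomain size on the most heavily loaded processor
        yield px, py, -(-nx // px), -(-ny // py)

if __name__ == "__main__":
    for px, py, lx, ly in valid_decompositions(24):
        print(f"{px:2d} x {py:2d} processors -> subdomains of ~{lx} x {ly} points")
```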

Slide 9: Global Runs
 N216, i.e. 432 × 325 points in the horizontal
 38 vertical levels
 20-minute timestep
 Simulation run for 1 day, i.e. 72 timesteps
 5-point halos in both directions (see the sketch below)
 Up to 256 processors used (144 on the IBM)
 Fast solver used
 MPI used for communication
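
To give a feel for what the halos imply as the processor count grows, here is a hedged sketch (plain Python, not UM code; it assumes '5-point halos' means a halo width of 5 grid points on each side of the subdomain) that sizes a local field and the fraction of it occupied by halo:

```python
# Hypothetical illustration: per-processor subdomain size for the global
# N216 run (432 x 325 x 38), with width-5 halos on every side.

NX, NY, NLEV, HALO = 432, 325, 38, 5

def local_shape(px, py):
    """Interior points per processor plus halo points on every side."""
    lx = -(-NX // px) + 2 * HALO   # ceil division for uneven splits
    ly = -(-NY // py) + 2 * HALO
    return lx, ly, NLEV

for px, py in [(4, 6), (8, 18), (4, 36)]:
    lx, ly, lz = local_shape(px, py)
    halo_frac = 1 - ((lx - 2 * HALO) * (ly - 2 * HALO)) / (lx * ly)
    print(f"{px}x{py}: local field {lx} x {ly} x {lz}, "
          f"{halo_frac:.0%} of points are halo")
```

At 4 × 36 processors roughly half the local points are halo under this assumption, which is one reason scalability tails off at high processor counts.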

Slide 10: Mesoscale Runs
 The mesoscale model covers the UK, parts of northern Europe and Scandinavia
 146 × 182 points in the horizontal
 38 vertical levels
 5-minute timestep
 Simulation run for 36 hours, i.e. 432 timesteps

Slide 11: Mesoscale Runs (continued)
 5-point halos in both directions
 8 points for merging the lateral boundary conditions (LBCs) with the main field
 Up to 120 processors used
 Limited to a maximum of 10 processors east-west and 13 processors north-south
 MPI used for communication

Slide 12: Global Results

Slide 13: Global Results
 The T3E required a minimum of 24 processors, so comparisons are against this value
 At lower processor counts, the Origin 3000 proves the most scalable
 Above ~156 processors, the T3E overtakes the Origin
 The IBM is the least scalable, but was only run up to 144 processors (normally using < 8 CPUs/LPAR; discussed later)
 The curve showing IBM results by LPAR shows scalability tailing off above 4 LPARs

Slide 14: Global Results (figure only)

Slide 15: Global Results (figure only)

Slide 16: General Comments on the Global Results
 Up to 144 processors, it is normally better to use 4 processors in the east-west direction
 Below 32 processors on the IBM, results are more varied – sometimes 2 is better
 The Origin is between 1.33 and 1.63 times faster than the T3E
 The IBM is between 2.36 and 3.07 times faster than the T3E

Slide 17: Global Results (figure only)

Slide 18: Some Timings for the Global Model

No of processors   T3E    Origin   IBM
24                 5251   3511     1709
72                 1930   1199     681
144                1057   679      432
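
The cross-machine ratios quoted on slide 16 can be checked directly against these timings; a small sketch (plain Python) using the numbers above:

```python
# Sketch: cross-machine speed ratios from the global-model timings above.

timings = {  # processors: (T3E, Origin, IBM) elapsed times from slide 18
    24:  (5251, 3511, 1709),
    72:  (1930, 1199, 681),
    144: (1057, 679, 432),
}

for procs, (t3e, origin, ibm) in timings.items():
    print(f"{procs:3d} procs: Origin is {t3e / origin:.2f}x, "
          f"IBM is {t3e / ibm:.2f}x faster than the T3E")
```

This reproduces ratios inside the quoted ranges (e.g. 3.07× for the IBM at 24 processors); the full ranges on slide 16 presumably come from additional processor counts not shown in the table.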

Slide 19: Mesoscale Results

Slide 20: Mesoscale Results
 The T3E required a minimum of 8 processors, so comparisons are against this value
 Up to 48 processors, the IBM outperforms the other two machines (using < 8 CPUs/LPAR; discussed later)
 Above ~64 processors, the T3E has the greatest scalability
 The curve showing IBM results by LPAR shows scalability tailing off above 4 LPARs (using all 8 CPUs/LPAR)

Slide 21: Mesoscale Results (figure only)

Slide 22: Mesoscale Results (figure only)

Slide 23: General Comments on the Mesoscale Results
 On the T3E, it is better to use fewer processors in the east-west direction
 On the Origin, below 36 processors use 2 or 4 processors in the east-west direction; for 36 processors and above, use 6 or even 8
 On the IBM, below 24 processors use more processors in the north-south direction than the east-west; for 24 and above, reverse this

Slide 24: General Comments on the Mesoscale Results (continued)
 The Origin is between 1.30 and 1.65 times faster than the T3E
 The IBM is between 1.93 and 3.82 times faster than the T3E
 The dip in the IBM results at 64 processors is due to having to use 8 processors/LPAR rather than 6 (discussed later)

Slide 25: Mesoscale Results (figure only)

Slide 26: Some Timings for the Mesoscale Model

No of processors   T3E     Origin   IBM
8                  12482   7992     4549
48                 2236    1407     759
120                1069    825      444
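
One way to see the scalability claims in these numbers is parallel efficiency relative to the 8-processor baseline the slides compare against; a short sketch (plain Python) using the timings above:

```python
# Sketch: parallel efficiency for the mesoscale run, relative to the
# 8-processor baseline used for comparison on the slides.

timings = {  # processors: (T3E, Origin, IBM) elapsed times from slide 26
    8:   (12482, 7992, 4549),
    48:  (2236, 1407, 759),
    120: (1069, 825, 444),
}

base = timings[8]
for procs, row in timings.items():
    # efficiency = actual speedup over the baseline / ideal speedup
    effs = [(base[i] / t) / (procs / 8) for i, t in enumerate(row)]
    print(f"{procs:3d} procs: efficiency T3E {effs[0]:.0%}, "
          f"Origin {effs[1]:.0%}, IBM {effs[2]:.0%}")
```

Under this measure the T3E holds the highest efficiency at 120 processors, consistent with slide 20.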

Slide 27: Discussion About the IBM

Slide 28: Discussion about the IBM
 For a given processor configuration, the results indicate that it is better to use < 8 tasks/LPAR
 If you are charged by the number of LPARs used, then it is faster to use all the processors on each LPAR (i.e. increase the configuration)
 E.g. for a 4 × 9 configuration, use 6 LPARs (i.e. 6 tasks/LPAR)
 However, if you are using 6 LPARs, the run will be faster with a 4 × 12 configuration; and 4 × 12 in turn produces faster results on 8 LPARs than on 6 (see the table and sketch below)

Slide 29: Discussion about the IBM – Timings from the Global Model for the IBM

No of procs   Processor configuration   No of LPARs   Tasks/LPAR   Timing
36            4 × 9                     6             6            1199
48            4 × 12                    6             8            1020
48            4 × 12                    8             6            942
72            4 × 18                    12            6            692
96            4 × 24                    12            8            600
96            4 × 24                    16            6            554
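
To make the time-versus-charge trade-off concrete, here is a sketch under an assumed cost model (plain Python; charging proportional to LPARs × elapsed time is an illustrative assumption, not ECMWF's actual scheme) applied to the rows above:

```python
# Hypothetical cost model: charge proportional to LPARs x elapsed time.
# Rows are from the slide-29 table (config, LPARs, tasks/LPAR, timing).

runs = [
    ("4x9",  6,  6, 1199),
    ("4x12", 6,  8, 1020),
    ("4x12", 8,  6, 942),
    ("4x18", 12, 6, 692),
    ("4x24", 12, 8, 600),
    ("4x24", 16, 6, 554),
]

for config, lpars, tasks, t in runs:
    print(f"{config:5s} on {lpars:2d} LPARs ({tasks} tasks/LPAR): "
          f"time {t:4d}, assumed charge {lpars * t:5d}")
```

Under this assumed model the packed 8 tasks/LPAR runs are cheaper while the 6 tasks/LPAR runs are faster – exactly the dependence on the charging mechanism noted in the conclusions.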

Slide 30: Conclusions and Further Work

Slide 31: Conclusions
 The amount of time spent optimising the UM for the T3E is clearly reflected in the results
 Further work is needed to optimise the code adequately for the Origin and the IBM
 The best processor configuration for the IBM may depend on the charging mechanism

Slide 32: Conclusions (continued)
 For a given configuration, using < 8 tasks/LPAR produces faster results
 On all machines, it is generally better to use as few processors in the east-west direction as possible

Slide 33: Further Work
 Optimisations for the Origin and the P690
 Perform similar tests on the SGI Altix (Itanium)

Slide 34: SVE @ Manchester Computing
World Leading Supercomputing Service, Support and Research – Bringing Science and Supercomputers Together
www.man.ac.uk/sve | sve@man.ac.uk
Thanks to ECMWF

