1
Comparison of the Unified Model Version 5.3 on Various Platforms
Zoe Chaplin, Manchester Computing (Supercomputing, Visualization & eScience)
CAS2K3, 11 September 2003
2
Contents
– Why am I talking about the UM?
– The Platforms
– The Model Version
– Results
  – Global
  – Mesoscale
  – Discussion about the IBM
– Conclusions and Further Work
3
Why am I talking about the UM?
4
Why am I talking about the UM?
– SVE consists of CSAR (the HPC group), MVC (visualisation) and eScience
– CSAR has a 512-processor Origin 3000 and an 816-processor Cray T3E, and is getting a 256-processor Altix (Itanium)
– Many academic users of the UM throughout the UK use the CSAR service (UGAMP – the UK Universities Global Atmospheric Modelling Programme)
– There are links between Manchester Computing and the Met Office
5
The Platforms
6
The Platforms
At CSAR:
– 'Green': Origin 3000, 512 × 400 MHz MIPS R12000 processors, 1 GB memory/processor, i.e. 512 GB memory in total
– 'Turing': Cray T3E, 816 × 600 MHz processors, 256 MB memory/processor, i.e. 209 GB memory in total (see the arithmetic check below)
At ECMWF:
– IBM P690: 2 × 30 P690 nodes, 32 × 1.3 GHz processors/node, i.e. 16 POWER4 chips/node. Each node is divided into 4 LPARs; most P690 nodes have 32 GB memory, 2 × 3 have 128 GB. An SP Switch2 connects the LPARs.
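As a quick check on these figures, here is a worked-arithmetic sketch (not from the original slides; decimal megabytes/gigabytes are assumed, since 816 × 256 MB only rounds to 209 GB in decimal units):

```python
# Sanity-check the aggregate memory and chip counts quoted above.
origin_total_gb = 512 * 1          # 512 processors x 1 GB each = 512 GB
t3e_total_gb = 816 * 256 / 1000    # 816 x 256 MB ~= 208.9 GB (decimal units assumed)
power4_chips = 32 // 2             # 32 processors/node, 2 cores per POWER4 chip = 16 chips/node

print(origin_total_gb, round(t3e_total_gb, 1), power4_chips)  # 512 208.9 16
```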
7
The Model Version
8
The Model Version
– All experiments performed at version 5.3, 'the new dynamics'
– Semi-Lagrangian dynamics
– Semi-implicit physics
– Non-hydrostatic
– Arakawa C-grid in the horizontal
– Charney-Phillips grid in the vertical
– Must have an even number of processors in the x direction (see the decomposition sketch below)
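To make the decomposition constraint concrete, here is a minimal sketch (my own illustration, not from the slides) that enumerates the processor grids a run could use while keeping the east-west processor count even:

```python
def valid_decompositions(nprocs):
    """List (nx, ny) processor grids with nx * ny == nprocs and nx even,
    reflecting UM 5.3's even east-west processor-count requirement."""
    return [(nx, nprocs // nx)
            for nx in range(2, nprocs + 1, 2)  # even east-west counts only
            if nprocs % nx == 0]

print(valid_decompositions(24))
# [(2, 12), (4, 6), (6, 4), (8, 3), (12, 2), (24, 1)]
```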
9
Global Runs
– N216, i.e. 432 × 325 points in the horizontal
– 38 vertical levels
– 20-minute timestep
– Simulation run for 1 day, i.e. 72 timesteps
– 5-point halos in both directions (see the subdomain sketch below)
– Up to 256 processors used (144 on the IBM)
– Fast solver used
– MPI used for communication
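The timestep count follows directly from the configuration, and the same arithmetic gives a feel for how large the halos are relative to each processor's subdomain; a minimal sketch (the even split and the 4 x 64 example decomposition are my assumptions; the UM's actual decomposition may distribute remainder points differently):

```python
# Timesteps for the 1-day global run at a 20-minute timestep.
print(24 * 60 // 20)  # 72

# Rough per-processor subdomain for an assumed 4 x 64 decomposition of
# the 432 x 325 global grid with 5-point halos (even split assumed).
nx, ny = 4, 64
local_x = 432 // nx + 2 * 5   # 108 interior + 10 halo points = 118
local_y = 325 // ny + 2 * 5   # ~5 interior + 10 halo points = 15
print(local_x, local_y)       # halos dominate north-south at 256 processors
```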
10
Mesoscale Runs
– The mes covers the UK, parts of Northern Europe and Scandinavia
– 146 × 182 points in the horizontal
– 38 vertical levels
– 5-minute timestep
– Simulation run for 36 hours, i.e. 432 timesteps
11
Mesoscale Runs
– 5-point halos in both directions
– 8 points for merging the lbcs (lateral boundary conditions) with the main field
– Up to 120 processors used (see the check below)
– Limited to a maximum of 10 processors east-west and 13 processors north-south
– MPI used for communication
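These figures can be checked the same way; a short sketch (assuming the even east-west rule from the model-version slide still applies, which makes 10 x 12 the only 120-processor grid within the stated limits):

```python
# Timesteps for the 36-hour mesoscale run at a 5-minute timestep.
print(36 * 60 // 5)  # 432

# The stated limits allow at most 10 x 13 = 130 processors; the runs
# here used at most 120, which the limits force to be a 10 x 12 grid.
print(10 * 13, 10 * 12)  # 130 120
```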
12
Global Results
13
Global Results
– The T3E required a minimum of 24 processors, so comparisons are against this value
– At lower processor numbers, the Origin 3000 proves to be the most scalable
– Above ~156 processors, the T3E overtakes the Origin
– The IBM is the least scalable, but was only run up to 144 processors (normally using < 8 CPUs/LPAR – discussed later)
– The curve showing the IBM results by LPAR shows scalability tailing off above 4 LPARs
14
Global Results [graph]
15
Global Results [graph]
16
General Comments on the Global Results
– Up to 144 processors, normally better to use 4 processors in the east-west direction
– Below 32 processors on the IBM, results are more varied – sometimes 2 is better
– Origin between 1.33 and 1.63 times faster than the T3E
– IBM between 2.36 and 3.07 times faster than the T3E
17
Global Results [graph]
18
Some Timings for the Global Model

No of Processors    T3E     Origin    IBM
24                  5251    3511      1709
72                  1930    1199      681
144                 1057    679       432
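The speed ratios quoted on the previous comments slide can be recovered from this table; a minimal sketch (only these three processor counts are tabulated, so it reproduces a subset of the quoted 1.33–1.63 and 2.36–3.07 ranges):

```python
# Timings from the global-model table above (units as in the slide).
timings = {24: (5251, 3511, 1709), 72: (1930, 1199, 681), 144: (1057, 679, 432)}

for procs, (t3e, origin, ibm) in timings.items():
    print(procs, round(t3e / origin, 2), round(t3e / ibm, 2))
# 24  -> 1.50 3.07   (Origin vs T3E, IBM vs T3E)
# 72  -> 1.61 2.83
# 144 -> 1.56 2.45
```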
19
Mesoscale Results
20
Mesoscale Results
– The T3E required a minimum of 8 processors, so comparisons are against this value
– Up to 48 processors the IBM outperforms the other two machines (using < 8 CPUs/LPAR – discussed later)
– Above ~64 processors, the T3E has the greatest scalability
– The curve showing the IBM results by LPAR shows scalability tailing off above 4 LPARs (using all 8 CPUs/LPAR)
21
Mesoscale Results [graph]
22
Mesoscale Results [graph]
23
General Comments on the Mesoscale Results
– For the T3E, it is better to use fewer processors in the east-west direction
– For the Origin, below 36 processors use 2 or 4 processors in the east-west direction; for 36 processors and above, use 6 or even 8
– For the IBM, below 24 processors use more processors in the north-south direction than the east-west; for 24 and above, reverse this
24
General Comments on the Mesoscale Results
– Origin between 1.30 and 1.65 times faster than the T3E
– IBM between 1.93 and 3.82 times faster than the T3E
– The dip in the IBM results at 64 processors is due to having to use 8 processors/LPAR rather than 6 (discussed later)
25
Mesoscale Results [graph]
26
Some Timings for the Mesoscale Model

No of Processors    T3E      Origin    IBM
8                   12482    7992      4549
48                  2236     1407      759
120                 1069     825       444
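Because comparisons are made against each machine's 8-processor run, relative speedup and parallel efficiency follow directly from this table; a minimal sketch using the T3E column:

```python
# Speedup and parallel efficiency relative to the 8-processor baseline,
# using the T3E mesoscale timings tabulated above.
timings = {8: 12482, 48: 2236, 120: 1069}
base = 8

for procs, t in timings.items():
    speedup = timings[base] / t
    efficiency = speedup / (procs / base)
    print(procs, round(speedup, 1), f"{efficiency:.0%}")
# 8 -> 1.0 100% | 48 -> 5.6 93% | 120 -> 11.7 78%
```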
27
Discussion About the IBM
28
Discussion about the IBM
– For a given processor configuration, results indicate that it is better to use < 8 tasks/LPAR
– If you are charged by the number of LPARs used, it is faster to use all the processors on each LPAR (i.e. increase the configuration): e.g. for a 4x9 configuration, use 6 LPARs (i.e. 6 tasks/LPAR)
– However, if you are using 6 LPARs, the run will be faster with a 4x12 configuration – and 4x12 in turn produces faster results on 8 LPARs than on 6
29
Discussion about the IBM
Timings from the Global Model for the IBM:

No of Procs    Processor Configuration    No of LPARs    Tasks/LPAR    Timing
36             4x9                        6              6             1199
48             4x12                       6              8             1020
48             4x12                       8              6             942
72             4x18                       12             6             692
96             4x24                       12             8             600
96             4x24                       16             6             554
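The charging trade-off described on the previous slide can be read straight off the 48-processor rows; a minimal sketch (LPARs × timing is used here as a stand-in for an LPAR-based charge, which is an assumption about the charging mechanism):

```python
# Elapsed time vs an assumed LPAR-based charge for the two
# 48-processor (4x12) rows of the table above.
runs = [("6 LPARs, 8 tasks/LPAR", 6, 1020),
        ("8 LPARs, 6 tasks/LPAR", 8, 942)]

for label, lpars, timing in runs:
    print(label, "time:", timing, "LPAR-time:", lpars * timing)
# 6 LPARs: slower run (1020) but cheaper by this measure (6120)
# 8 LPARs: faster run (942) but more expensive (7536)
```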
30
Conclusions and Further Work
31
Conclusions
– The amount of time spent optimising the UM for the T3E is clearly reflected in the results
– Further work is needed to optimise the code adequately for the Origin and the IBM
– The best processor configuration for the IBM may depend on the charging mechanism
32
Conclusions
– For a given configuration, using < 8 tasks/LPAR will produce faster results
– On all machines, it is generally better to use as few processors in the east-west direction as possible
33
Further Work
– Optimisations for the Origin and the P690
– Perform similar tests on the SGI Altix (Itanium)
34
SVE @ Manchester Computing
World-leading supercomputing service, support and research – bringing science and supercomputers together
www.man.ac.uk/sve | sve@man.ac.uk
Thanks to ECMWF