Published by Bernice Williams. Modified over 8 years ago.
1
Is a Grid cost-effective?
Ralf Gruber, EPFL-SIC/FSTI-ISE-LIN, Lausanne
4.3.2003, SOS7
2
HPC in Europe
TOP500: 176 systems in Europe, 12 with more than 1 Tflop/s Linpack
- First is CEA-DAM: No. 7
- Germany: 71, UK: 39, France: 22, Italy: 16, Others: 28
- Industry: 108, first (Telecom I) at No. 96
- BMW: 11, Daimler-Chrysler: 5, Car F: 6
- Not one big, but many smaller machines
HPC companies:
- Quadrics
- Scali, SCI-based clusters: No. 51
- SCS: see Toni's presentation
- Beowulf production: Paralline, Dalco, ...
3
Swiss-Tx project
The Swiss-Tx machines (with TNet switch):
- 1998: Prototype Swiss-T0 with 16 Alpha 21164
- 1999: Swiss-T1 (Baby) with 16 Alpha 21264
- 2000: Swiss-T1 with 70 Alpha 21264
Know-how transfer to industry:
- 2001: GeneProt protein sequencing machine with 1420 Alpha 21264
- Peak performance = 1780 Gflop/s
- In June 2001, it would have been No. 12 in the Top500 and 2nd in Europe, and it was world number 1 of industrial computer installations
- It would be No. 48 (= C-Plant) in the Top500 list of November 2002, and it is still number 2 of industrial computer installations
4
Is a grid cost-effective?
NO!
Reasons:
- For 25 years, we have been able to use machines all over the world
- Those who needed good connections installed them (HEPNET, Swissprot, ...)
- Using Java is against HPC
5
Parallel machines at EPFL and CSCS
EPFL-SIC:
- SGI Origin3800 (500 MHz), 128 processors
- HP Alpha ES45/Quadrics (1.25 GHz), 100 processors
Institutes:
- PC clusters (CFD, Chemistry, Mathematics, Physics)
- IBM SP-2 (EFD)
CSCS:
- NEC SX-5 (16 processors)
- IBM Regatta (256 processors, 1.3 GHz)
6
Optimal grid scheduling
Parameterisation of:
- Single processor
- Cluster
- Application
Application tailored Grid scheduling
7
Characteristic single processor parameters $V_a$ and $r_a$
$V_a$ = Operations (Ops) / Memory accesses (LS)
$r_a = \min(R, R \cdot V_a / V_m) = \min(R, M \cdot V_a)$
Examples:
- SAXPY: y = y + a * x, Ops = 2, LS = 3 (2 loads + 1 store), $V_a = 2/3$, $r_a = \frac{2}{3} M$
- Matrix*matrix multiply and add: $V_a = n/2$, $r_a = R$
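The two examples on this slide can be checked numerically. The sketch below is only an illustration of the formula r_a = min(R, M * V_a); the machine values (R = 2000 Mflop/s, M = 333 Mword/s), the matrix dimension n and the function names are assumptions made for the example, not part of the talk.

```python
def arithmetic_intensity(ops: float, mem_accesses: float) -> float:
    """V_a: floating-point operations per memory access (loads + stores)."""
    return ops / mem_accesses


def peak_app_performance(R: float, M: float, V_a: float) -> float:
    """r_a = min(R, M * V_a), in Mflop/s."""
    return min(R, M * V_a)


# Illustrative machine (assumed values): R in Mflop/s, M in Mword/s.
R, M = 2000.0, 333.0

# SAXPY: 2 operations per 3 memory accesses, so the loop is memory bound.
v_saxpy = arithmetic_intensity(ops=2, mem_accesses=3)
r_saxpy = peak_app_performance(R, M, v_saxpy)

# Matrix multiply-add with V_a = n / 2; n is a hypothetical matrix dimension.
n = 1000
v_matmul = n / 2
r_matmul = peak_app_performance(R, M, v_matmul)

print(f"SAXPY:  V_a = {v_saxpy:.3f}, r_a = {r_saxpy:.0f} Mflop/s")
print(f"MATMUL: V_a = {v_matmul:.0f}, r_a = {r_matmul:.0f} Mflop/s")
```

With these numbers SAXPY is limited by memory bandwidth (r_a = 2/3 * M), while the matrix multiply reaches the processor peak R.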
8
Results with MATMULT
$V_m$ = R [Mflop/s] / M [Mword/s]
$V_a = 1$ (double precision)

Machine           P   R      r_a = M   V_m   r     %
NEC SX-5          1   8000   8000      1
Pentium 4 1.5/R   1   1500   400       4     229   57
Alpha 21264       2   2000   333       6     200   60
Pentium 4 1.7/S   1   1700   133       12    92    69
AMD 1.2/S         1   2400   133       18    57    43

R [Mflop/s]: theoretical peak performance
M [Mword/s]: theoretical peak memory bandwidth
r: measured performance
%: 100 * r / $r_a$
/S: slow SDRAM memory
/R: fast Rambus (RDRAM) memory
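The percentage column of the MATMULT table can be recomputed from the other columns. The sketch below assumes the columns read as machine, R, M and measured r, and that V_a = 1 so that r_a = M; this reading of the flattened table, and the function name, are assumptions.

```python
# (machine, R [Mflop/s], M [Mword/s], measured r [Mflop/s]); values read from the slide.
rows = [
    ("Pentium 4 1.5/R", 1500.0, 400.0, 229.0),
    ("Alpha 21264", 2000.0, 333.0, 200.0),
    ("Pentium 4 1.7/S", 1700.0, 133.0, 92.0),
    ("AMD 1.2/S", 2400.0, 133.0, 57.0),
]


def derived_columns(R: float, M: float, r: float, V_a: float = 1.0):
    """Return (V_m, r_a, efficiency in percent) for one table row."""
    V_m = R / M                 # machine balance: flops per word of bandwidth
    r_a = min(R, M * V_a)       # peak performance reachable by the application
    return V_m, r_a, 100.0 * r / r_a


efficiencies = []
for name, R, M, r in rows:
    V_m, r_a, eff = derived_columns(R, M, r)
    efficiencies.append(round(eff))
    print(f"{name:16s} V_m = {V_m:5.1f}  r_a = {r_a:6.0f}  efficiency = {eff:4.1f}%")
```

The rounded efficiencies agree with the % column, which suggests the measured MATMULT rates are being compared against the memory-bound limit M rather than the processor peak R.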
9
Tailoring clusters to applications
$\Gamma > 1$
10
Tailoring clusters to applications
$\Gamma = \gamma_a / \gamma_m$
Application: $\gamma_a = O / S$
Machine: $\gamma_m = r_a / b$
O: number of operations [flops]
S: number of words sent [words]
$r_a$: theoretical peak performance of the application [Mflop/s]
b: peak network bandwidth per processor [Mword/s]
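One way to read this ratio, with $\Gamma$ denoting the quotient of the application parameter $\gamma_a = O/S$ and the machine parameter $\gamma_m = r_a/b$, is as computation time over communication time. The short derivation below is a sketch under two assumptions that the slide does not state: computation and communication do not overlap, and latency is neglected. The names $t_{\mathrm{comp}}$ and $t_{\mathrm{comm}}$ are introduced here for illustration.

```latex
\begin{align*}
t_{\mathrm{comp}} &= \frac{O}{r_a}, \qquad t_{\mathrm{comm}} = \frac{S}{b},\\
\Gamma = \frac{\gamma_a}{\gamma_m} &= \frac{O/S}{r_a/b}
  = \frac{O/r_a}{S/b} = \frac{t_{\mathrm{comp}}}{t_{\mathrm{comm}}},\\
\frac{t_{\mathrm{comm}}}{t_{\mathrm{comp}} + t_{\mathrm{comm}}} &= \frac{1}{1+\Gamma}.
\end{align*}
```

Under these assumptions a ratio above 1 means that more than half of the run time is spent computing.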
11
Cluster characterisation
Table: the $\gamma_m$ values for MATMULT (double precision)

Machine              P      P*r_a [Mflop/s]   C [Mword/s]   [...]   $\gamma_m$
T1 (TNet)            32*2   21333             640           1.25    40
T1 (Fast Ethernet)   32*2   21333             48            1       444
IELNX (P4+FE)        22     8800              34            1       250

$\gamma_m$ = P * $r_a$ [Mflop/s] * [...] / C [Mword/s]
$\gamma_m = r_a / b$, $b = C / P$
12
LAUTREC on Swiss-T1 + TNet
Swiss-T1 (TNet): $r_a$ = 1000 Mflop/s, b = 10 Mword/s, so $\gamma_m = 100$
Water molecules: $\gamma_a = 5 P (0.65 N_{orb} + 4.24 \log_2 V) / (3 (P-1))$
P = 8, $N_{orb}$ = 128, $\log_2 V$ = 20, so $\gamma_a = 330$
$\Gamma = 3.3$ (3.6 measured)
-> 25% of overall time is due to communication, 75% is due to computation
13
LAUTREC on Swiss-T1 + Fast Ethernet
Swiss-T1 (FE): $r_a$ = 2000 Mflop/s, b = 1.5 Mword/s, so $\gamma_m = 1333$
Water molecules: $\gamma_a = 5 P (0.65 N_{orb} + 4.24 \log_2 V) / (3 (P-1))$
P = 8, $N_{orb}$ = 128, $\log_2 V$ = 20, so $\gamma_a = 330$
$\Gamma = 0.25$ (0.25 measured)
-> 20% of overall time is due to computation, 80% is due to communication
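The two LAUTREC estimates (TNet and Fast Ethernet) can be reproduced from the numbers quoted on the slides. The sketch below takes the application value 330 and the machine values as given, and assumes that computation and communication do not overlap, so that the computation share of the run time is ratio / (1 + ratio). The function names are hypothetical.

```python
def gamma_ratio(gamma_a: float, r_a: float, b: float) -> float:
    """Ratio of the application parameter to the machine parameter r_a / b."""
    gamma_m = r_a / b
    return gamma_a / gamma_m


def time_fractions(ratio: float):
    """Return (computation share, communication share), assuming no overlap."""
    computation = ratio / (1.0 + ratio)
    return computation, 1.0 - computation


GAMMA_A = 330.0  # application value quoted on the slides for the water test case

# Swiss-T1 with TNet: r_a = 1000 Mflop/s, b = 10 Mword/s.
ratio_tnet = gamma_ratio(GAMMA_A, r_a=1000.0, b=10.0)
comp_tnet, comm_tnet = time_fractions(ratio_tnet)

# Swiss-T1 with Fast Ethernet: r_a = 2000 Mflop/s, b = 1.5 Mword/s.
ratio_fe = gamma_ratio(GAMMA_A, r_a=2000.0, b=1.5)
comp_fe, comm_fe = time_fractions(ratio_fe)

print(f"TNet: ratio = {ratio_tnet:.2f}, communication share = {comm_tnet:.0%}")
print(f"FE:   ratio = {ratio_fe:.2f}, communication share = {comm_fe:.0%}")
```

Under this simple model the communication share comes out at a little under a quarter on TNet and at about four fifths on Fast Ethernet, in line with the rounded percentages on the slides.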
14
LAUTREC: Effect of latency
TNet/Swiss-T1: L = 13 µs MPI latency, b = 80 MB/s
Break-even message length: beml = L*b = 1000 B
Fast Ethernet: L = 100 µs MPI latency, b = 10 MB/s
Break-even message length: beml = L*b = 1000 B
Average message length in LAUTREC: aml = [...]*V/16*P^2
For test case (V = 96**3, P = 8): aml = 40 kB >> beml
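The break-even message length is the message size at which the latency equals the pure transfer time. The sketch below assumes that the latencies are in microseconds, that 1 MB = 10^6 bytes and 40 kB = 40000 bytes, and that a message costs L + n / b seconds; the linear cost model and the function names are assumptions made for illustration.

```python
def break_even_message_length(latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """beml = L * b: message size in bytes at which latency equals transfer time."""
    return latency_s * bandwidth_bytes_per_s


def latency_share(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Fraction of the message time spent in latency, for the model t = L + n / b."""
    total = latency_s + n_bytes / bandwidth_bytes_per_s
    return latency_s / total


# (latency in seconds, bandwidth in bytes per second), from the slide.
networks = {
    "TNet": (13e-6, 80e6),
    "Fast Ethernet": (100e-6, 10e6),
}

AML_BYTES = 40_000.0  # average LAUTREC message length quoted for the test case

beml = {}
share = {}
for name, (L, b) in networks.items():
    beml[name] = break_even_message_length(L, b)
    share[name] = latency_share(AML_BYTES, L, b)
    print(f"{name}: beml = {beml[name]:.0f} B, latency share at 40 kB = {share[name]:.1%}")
```

Both networks give a break-even length of about 1000 B, so for 40 kB messages latency contributes only a few percent of the message time in this model.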
15
Point-to-point applications
$\gamma_a$ = Operations (O) / Sends (S)
FE/FV: O scales with
- Number of volume nodes
- Number of variables per node, squared
- Number of non-zero matrix elements
- Number of operations per matrix element
FE/FV: S scales with
- Number of surface nodes
- Number of variables per node
FE/FV: $\gamma_a$ depends on
- Number of nodes in one direction
- Number of variables per node
- Number of non-zero matrix elements
- Number of operations per matrix element
- Number of surfaces
$\gamma_a$ (NS/FV/100**3) $\approx$ 2000
$\gamma_a$ (Poisson/FD/100**3) $\approx$ 400
Reminder (Beowulf + Fast Ethernet): $\gamma_m \approx 250$
16
Other quantities
- Memory usage
- Price per 1 h of CPU time
- Engineering salary
- Energy consumption
- Maintenance/servicing/personnel costs
- User convenience
17
Optimal Grid scheduling
Goal: add application-tailored Grid scheduling to RMS.
- Estimate machine and application parameters by counts
- Measure machine and application parameters (PAPI, ...)
- Build up a database of these parameters
- Find the best-suited Grid resource (not always the optimum) and submit to it
- Update the database dynamically
- Perform statistics on decisions and decision failures
18
Optimal Grid scheduling
Establish and apply rules to find the best-suited resource by:
- Match between machine and application (MPI or not MPI)
- Best price/performance ratio based on the parameterisation
- Availability of the resources
- Engineering costs
- Energy consumption
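A minimal sketch of such a selection rule follows, covering only two of the criteria (availability and price/performance). Everything in it is hypothetical: the `Resource` fields, the cost model (run time = computation time * (1 + 1/ratio), with no overlap of computation and communication), the prices and the two example clusters. It is meant to illustrate the idea, not to describe an existing scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    r_a: float             # Mflop/s the application can reach per processor
    b: float               # Mword/s network bandwidth per processor
    price_per_hour: float  # price of one CPU hour (arbitrary currency, assumed)
    available: bool = True


def estimated_cost(ops_mflop: float, gamma_a: float, res: Resource) -> float:
    """Estimated price of a run, assuming computation and communication do not overlap."""
    ratio = gamma_a / (res.r_a / res.b)      # application parameter over machine parameter
    t_comp = ops_mflop / res.r_a             # seconds of pure computation
    t_total = t_comp * (1.0 + 1.0 / ratio)   # add the communication time
    return t_total / 3600.0 * res.price_per_hour


def best_resource(ops_mflop: float, gamma_a: float,
                  resources: List[Resource]) -> Optional[Resource]:
    """Cheapest available resource for this application, or None if none is free."""
    candidates = [r for r in resources if r.available]
    if not candidates:
        return None
    return min(candidates, key=lambda r: estimated_cost(ops_mflop, gamma_a, r))


# Hypothetical pool: a fast-network cluster and a cheaper Fast Ethernet cluster.
pool = [
    Resource("tnet_cluster", r_a=1000.0, b=10.0, price_per_hour=1.0),
    Resource("fe_cluster", r_a=2000.0, b=1.5, price_per_hour=0.6),
]

comm_heavy = best_resource(3.6e6, gamma_a=330.0, resources=pool)
comm_light = best_resource(3.6e6, gamma_a=5000.0, resources=pool)
print(comm_heavy.name, comm_light.name)
```

In this toy pool the communication-heavy job is sent to the fast-network cluster and the communication-light job to the cheaper one, which is the kind of application-dependent decision the rules are meant to produce. Engineering and energy costs could be added as further terms in `estimated_cost`.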
19
Optimal Grid scheduling
Perform statistics to:
- Detect resources that are too often requested but unavailable
- Detect the real costs of an application
- Detect applications that should be parallelised/optimised to reduce costs
- Guide decision making for the next purchase
- Guide decisions on the allocation of R&D money
20
Is a grid cost-effective?
Yes, it can be!
- Minimise overall costs by application-adapted job execution
- Purchase low-cost resources that are in demand but not available
- Parallelise cost-ineffective applications
- Reduce engineering and energy costs
Note: “Cheap” resources do not have to be 90% utilised
Results in:
- More computing resources for the same price
- A more rapid increase of application efficiencies
Questions:
- Do computer manufacturers play the game?
- Do application owners play the game?
- Can we change users, decision makers and computing centres?
21
Reference
R. Gruber, P. Volgers, A. de Vita, M. Stengel, T.-M. Tran, Parameterisation to tailor commodity clusters to applications, Future Generation Computer Systems 19 (2003) 111-120.
See also: http://sawww.epfl.ch/SIC/SA/publications/SCR02/scr13e.html