Technologies for the Future: CLUSTERS


1 Technologies for the Future: CLUSTERS
Anne C. Elster Dept. of Computer & Information Science (IDI) Norwegian Univ. of Science & Tech. (NTNU) Trondheim, Norway NOTUR 2003 November 14, 2018 NOTUR Cluster proj. status

2 Clusters (Networks of PCs/Workstations)
Are they suitable for HPC?
Advantage: Cost-effective hardware, since clusters use COTS (Commercial Off-The-Shelf) parts
BUT: Typically much slower processor interconnects than traditional HPC systems
What about usability?
NTNU IDI’s 40-node AMD 1.46GHz cluster (2GB RAM, 40GB disk, Fast Ethernet)

3 NOTUR Cluster proj. status
Cluster Technologies: NOTUR Emerging Technology project
Collaboration between NTNU & Univ. of Tromsø
Goal: Analyze cluster technologies’ suitability for HPC by looking at some of the most interesting NOTUR applications
The results will provide a foundation for decisions regarding future HPC programs

4 Main Collaborators include
Anne C. Elster (IDI, NTNU) – Project leader
Otto Anshus & Tore Larsen (CS, U of Tromsø)
Tor Johansen & staff (CC, U of Tromsø)
Torbjørn Hallgren (IDI, NTNU)
Einar Rønquist (IMF, NTNU)
Master & Ph.D. students and postdocs at NTNU and Univ. of Tromsø

5 General Issues to Consider:
Why clusters vs. powerful desktops vs. large SMPs?
What are the total costs associated with clusters (hardware, software, support, usability)?
32-bit vs. 64-bit architectures

6 Cluster Project ACTIVITIES:
A.1 Profiling & Tuning Selected Applications:
A.1.a/b Physics and Chemistry Codes (Elster & students, Dept. of Computer Science, NTNU)
A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø)
A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, Dept. of Comp. Sci., U of Tromsø)

7 Cluster Project ACTIVITIES continued:
A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T)
A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU)
A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU)
A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects)

8 NOTUR Cluster proj. status
A.1.a/b Physics & Chemistry Codes (Elster & students, Dept. of CS, NTNU)
Lessons learned so far -- Paul Sack’s work on a Physics application (report available on the Web)
FORTRAN problems: Different FORTRAN implementations have non-standard add-ons (e.g. FORTRAN 90)
This leads to great difficulty in porting code to a different platform with a different Fortran compiler (e.g. by a different vendor)

9 A.1.a/b Physics & Chemistry Codes contin.
Performance of individual programs can vary across machines
Åsmund Østvold wrote a project report on porting PROTOMOL from an SMP with MPI one-sided communication primitives (MPI put/get) to a cluster (available on WWW)
He also did an MS study with SCALI on various MPI broadcast algorithms and benchmarking
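To illustrate why the choice of broadcast algorithm matters on a cluster, here is a minimal sketch that counts communication rounds for two textbook schemes, a linear broadcast and a binomial-tree broadcast. These are not necessarily the algorithms examined in the SCALI study; the function names are hypothetical and no MPI library is involved.

```python
def linear_broadcast_rounds(num_procs: int) -> int:
    """Root sends to every other rank in turn: P - 1 sequential sends."""
    return max(num_procs - 1, 0)


def binomial_broadcast_schedule(num_procs: int) -> list:
    """Return, per round, the (sender, receiver) pairs of a binomial-tree
    broadcast rooted at rank 0 (illustrative model, not an MPI call)."""
    rounds = []
    have_data = 1  # ranks 0 .. have_data-1 already hold the message
    while have_data < num_procs:
        pairs = [(src, src + have_data)
                 for src in range(have_data)
                 if src + have_data < num_procs]
        rounds.append(pairs)
        have_data *= 2  # every rank that holds the message sends once per round
    return rounds


if __name__ == "__main__":
    for p in (8, 32, 40):
        tree_rounds = len(binomial_broadcast_schedule(p))
        print(p, "ranks:", linear_broadcast_rounds(p), "linear sends vs.",
              tree_rounds, "tree rounds")
```

With a high per-message latency, as on Fast Ethernet, the number of rounds tends to dominate for small messages, which is one reason broadcast algorithms are worth benchmarking per interconnect.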

10 A.1.a/b Physics & Chemistry Codes contin.2
Ongoing work with Snorre Boasson & Jan Christian Meyer on porting of PIC code from Pthreads (SMP primitives) to MPI. Preliminary report will be available later this week.
“Recent Trends in Cluster Computing”, presented at ParCo 2003 by Elster et al., includes hardware trends and a survey of libraries and performance tools.

11 NOTUR Cluster proj. status
A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø)
Coordination work
Travel: NOTUR 2003
Porting and testing of Amber and Scali SW

12 NOTUR Cluster proj. status
A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/students, CS, U of Tromsø)
“Ytelsesmålinger gjort på DALTON” (Performance measurements done on DALTON), A Report for the NOTUR Project Emerging Technologies: Cluster. Daniel Stødle, Otto J. Anshus, John Markus Bjørndalen
“Survey of optimizing techniques for parallel programs running on computer clusters”. Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo (September 29, 2003)

13 NOTUR Cluster proj. status
A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED
RESULTS:
Dalton scales pretty well: 25x speedup on 32 nodes
NOTE: Only without caching temporary data. With caching, only 3-5x speedup on 32 nodes!
Even though the 8-way cluster had no local disk (only a network file system), the sequential Dalton code was significantly faster. This indicates that network bandwidth may not be a problem if caching is used in the parallel version.
Communication pattern: master-slave "bag-of-tasks" oriented program with little communication & synchronization and generally good utilization of the slave nodes. The master does relatively little work and is blocked most of the time.
Finally checked whether the master node could be a bottleneck, but could not detect differences in execution time when the master was put on a slow node vs. a fast node.
NOTE: Only tested up to 32 nodes; using a larger number of nodes may limit performance by overloading the master node.
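The "bag-of-tasks" pattern and the speedup figures above can be sketched as follows. This is a single-machine illustration using Python threads, not Dalton's actual implementation (which runs across cluster nodes); `run_bag_of_tasks`, `parallel_efficiency` and the squaring workload are hypothetical names and stand-ins.

```python
import queue
import threading


def run_bag_of_tasks(tasks, work_fn, num_workers=4):
    """Master fills a shared bag; each idle worker pulls the next task."""
    bag = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = bag.get()
            if item is None:  # sentinel: the bag is empty for good
                return
            task_id, payload = item
            value = work_fn(payload)
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(tasks):
        bag.put((task_id, payload))
    for _ in threads:
        bag.put(None)  # one sentinel per worker
    for t in threads:
        t.join()  # the master is blocked here while workers compute
    return [results[i] for i in range(len(results))]


def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / nodes


if __name__ == "__main__":
    print(run_bag_of_tasks(range(8), lambda x: x * x, num_workers=3))
    print(parallel_efficiency(25, 32))  # the 25x-on-32-nodes figure
    print(parallel_efficiency(3, 32), parallel_efficiency(5, 32))
```

Because workers pull tasks on demand, load balances itself as long as tasks are plentiful. The single master handing out work is the component that could become a bottleneck at node counts beyond those tested.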

14 NOTUR Cluster proj. status
A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED 2
Thanks to Kenneth Ruud (Chemistry, UiT) and Roy Dragseth (CC, UiT) for support on the Itanium at U of Tromsø.

15 A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T)
“Survey of execution monitoring tools for computer clusters”. Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo, Sept 03
“Performance Monitoring”. Lars Ailo Bongo, Otto J. Anshus, John Markus Bjørndalen

16 NOTUR Cluster proj. status
A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU)
Ongoing work with Torbjørn Vik
Preliminary report on a survey of how clusters are currently used in visualization.
Two types of cluster usage:
* Off-line (non-real-time rendering). Often called "rendering farms": many nodes that each work on one frame of a larger animation. Typically used in the film industry and other areas where interactivity and/or real-time rendering is not needed. All larger 3D modelling programs, such as Lightwave, 3DStudio and Maya, have functionality for this.
* On-line (real-time). Most interesting from a technical viewpoint...

17 A.3 Visualization servers, etc. - Contin.
Clusters are used in interactive visualization software to increase performance, enable larger data sets, and avoid limitations of local hardware.
Most visualization clusters work, in principle, by having the user sit at a client machine that has little capacity of its own. The cluster handles all computation and sends only the finished images to the client. The client machine also receives input from the user and forwards it to the cluster.
Data sets for such visualization are often very large and, depending on the situation, both polygon-based and voxel-based rendering are used.
The main problem in making clusters usable for interactive visualization programs is delay caused by the network. This is addressed by reducing the time spent transferring images between cluster and client, either by reducing the amount of data (compression methods), by increasing network performance, or both.
Parallelism within the cluster itself is based on independence between different data. There can be independence between different parts of the same data set, or between different frames in a 4D data set.
Load balancing often becomes a problem in such settings and is an important research area. Which load-balancing method is used is usually highly context-dependent.
Cluster software for visualization still lacking??
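The trade-off between compression and network performance can be made concrete with a back-of-the-envelope model of the time to ship one rendered frame from cluster to client. This is a minimal sketch: the frame size, the 100 Mbit/s link (nominal Fast Ethernet), and the 10:1 compression ratio are illustrative assumptions, not measurements from the project, and the function name is hypothetical.

```python
def frame_transfer_seconds(width, height, bytes_per_pixel,
                           bandwidth_bits_per_s,
                           compression_ratio=1.0, latency_s=0.0):
    """Time to send one frame: fixed latency plus size over bandwidth."""
    frame_bits = width * height * bytes_per_pixel * 8 / compression_ratio
    return latency_s + frame_bits / bandwidth_bits_per_s


if __name__ == "__main__":
    fast_ethernet = 100e6  # assumed nominal link rate, bits per second
    raw = frame_transfer_seconds(1024, 768, 3, fast_ethernet)
    packed = frame_transfer_seconds(1024, 768, 3, fast_ethernet,
                                    compression_ratio=10.0)
    print(f"uncompressed: {raw:.4f} s/frame ({1 / raw:.1f} frames/s)")
    print(f"10:1 compressed: {packed:.4f} s/frame ({1 / packed:.1f} frames/s)")
```

The model ignores compression and decompression time, protocol overhead, and contention, so it gives an upper bound on achievable frame rate rather than a prediction.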

18 NOTUR Cluster proj. status
A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU)
Rønquist’s student Staff (now at Simulasenteret) wrote a report based on his summer job
May add in experiences from Elster’s group – fall 2003

19 NOTUR Cluster proj. status
A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects)
Test node established at NTNU: Andreas Botnen (USIT) and Robin Holtet (IDI, now ITEA)
May use IDI’s node cluster in test grid
Meetings between Elster’s and Simonsen’s groups
Robin Holtet and Elster’s student Thorvald Natvig to Linköping meeting this month
Collaborations re. National GRID and EGEE
Students from NTNU and UiO at CERN

20 NOTUR Cluster proj. status
Main cluster issues:
Global operations have a more severe impact on cluster performance than on traditional supercomputers, since communication between processors takes relatively more of the total execution time
SCALABILITY!!
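A simple cost model shows why global operations hurt scalability more on a cluster. The sketch below assumes each step does a fixed amount of perfectly divisible work followed by one tree-based global operation costing ceil(log2 P) message latencies. The latency values (100 µs for a commodity network, 5 µs for an HPC interconnect) and the 10 ms of work per step are illustrative assumptions only, and the function names are hypothetical.

```python
import math


def step_time(work_s, procs, latency_s):
    """Compute time shrinks with P; the global operation grows with log2 P."""
    compute = work_s / procs
    comm = math.ceil(math.log2(procs)) * latency_s if procs > 1 else 0.0
    return compute + comm


def speedup(work_s, procs, latency_s):
    """Speedup relative to one processor doing all the work alone."""
    return work_s / step_time(work_s, procs, latency_s)


if __name__ == "__main__":
    work = 0.01  # assumed 10 ms of computation per step
    for procs in (1, 8, 32, 128):
        commodity = speedup(work, procs, 100e-6)  # assumed 100 us latency
        hpc = speedup(work, procs, 5e-6)          # assumed 5 us latency
        print(f"P={procs:4d}  commodity: {commodity:6.1f}x  hpc: {hpc:6.1f}x")
```

Under these assumptions the higher-latency network falls well short of linear speedup already at 32 processors, because the communication term does not shrink as processors are added.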

21 NOTUR Cluster proj. status
Lessons learned
Clusters generally have cheap hardware, but may cause increased “hidden” costs regarding:
More incompatible compilers, especially Fortran 90 (also C++)
Some applications are non-trivial to port from a shared-memory paradigm to a distributed-memory paradigm
Some applications require high-bandwidth interconnects, which drive up costs (e.g. SGI Altix)
Power and cooling costs (ref. Brian Vinter)
Stability, recovery
Overall costs and scalability should be further studied

22 The “Ideal” Cluster -- Hardware
High-bandwidth network
Low-latency network
Low operating system overhead (TCP causes “slow start”)
Great floating-point performance (64-bit processors or more?)
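The "slow start" remark can be illustrated with a deliberately simplified model of TCP's congestion window: the sender starts with a small window and doubles it every round trip. The sketch below ignores packet loss, the slow-start threshold, receiver window limits and delayed ACKs. The initial window of one segment and the 45-segment message are assumptions, and the function name is hypothetical.

```python
def slow_start_round_trips(num_segments, initial_cwnd=1):
    """Round trips needed to send num_segments when the congestion
    window starts at initial_cwnd and doubles each round trip."""
    sent = 0
    cwnd = initial_cwnd
    rtts = 0
    while sent < num_segments:
        sent += cwnd  # a full window goes out this round trip
        cwnd *= 2     # idealized exponential growth, no loss
        rtts += 1
    return rtts


if __name__ == "__main__":
    # Assumed example: a 64 KB message split into about 45 segments.
    print(slow_start_round_trips(45))
    print(slow_start_round_trips(45, initial_cwnd=4))
```

For the short, frequent messages typical of parallel codes, these extra round trips are pure overhead on top of network latency, which is why low operating-system and protocol overhead appears on the wish list.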

23 The “Ideal” Cluster -- Software
Compiler that is: portable, optimizing
Do extra work to save communication
Self-tuning / load-balanced
Automatic selection of best algorithm
One-sided communication support?
Optimized middleware

24 NOTUR Cluster proj. status
For more information: A dozen or more reports associated with this project will be made available on the web at:

