Technologies for the Future: CLUSTERS


Technologies for the Future: CLUSTERS
Anne C. Elster
Dept. of Computer & Information Science (IDI), Norwegian Univ. of Science & Tech. (NTNU), Trondheim, Norway
NOTUR 2003
November 14, 2018 NOTUR Cluster proj. status

Clusters (Networks of PCs/Workstations)
Are they suitable for HPC?
* Advantage: Cost-effective hardware, since they use COTS (Commercial Off-The-Shelf) parts
* BUT: Typically much slower processor interconnects than traditional HPC systems
* What about usability?
NTNU IDI’s 40-node AMD 1.46GHz cluster: 2GB RAM, 40GB disk, Fast Ethernet

Cluster Technologies: NOTUR Emerging Technology project
* Collaboration between NTNU & Univ. of Tromsø
* Goal: Analyze cluster technologies’ suitability for HPC by looking at some of the most interesting NOTUR applications
* The results will provide a foundation for decisions regarding future HPC programs

Main Collaborators include
* Anne C. Elster (IDI, NTNU), project leader
* Otto Anshus & Tore Larsen (CS, U of Tromsø)
* Tor Johansen & staff (CC, U of Tromsø)
* Torbjørn Hallgren (IDI, NTNU)
* Einar Rønquist (IMF, NTNU)
* Master & Ph.D. students and postdocs at NTNU and Univ. of Tromsø

General Issues to Consider:
* Why a cluster vs. a powerful desktop vs. a large SMP?
* What are the total costs associated with clusters (hardware, software, support, usability)?
* 32-bit vs. 64-bit architectures

Cluster Project ACTIVITIES:
A.1 Profiling & Tuning Selected Applications:
* A.1.a/b Physics and Chemistry Codes (Elster & students, Dept. of Computer Science, NTNU)
* A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø)
* A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, Dept. of Comp. Sci., U of Tromsø)

Cluster Project ACTIVITIES continued:
* A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T)
* A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU)
* A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU)
* A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects)

A.1.a/b Physics & Chemistry Codes (Elster & students, Dept. of CS, NTNU)
Lessons learned so far, from Paul Sack’s work on a physics application (report available on the Web):
* FORTRAN problems: Different FORTRAN implementations have non-standard add-ons (e.g. FORTRAN 90)
* This leads to great difficulty in porting code to a different platform with a different Fortran compiler (e.g. by a different vendor)

A.1.a/b Physics & Chemistry Codes contin.
* The performance of an individual program can vary from machine to machine
* Åsmund Østvold wrote a project report on porting PROTOMOL from an SMP with MPI one-sided communication primitives (MPI put/get) to a cluster (available on the WWW)
* He also did an MS study with SCALI on various MPI broadcast algorithms and benchmarking

A.1.a/b Physics & Chemistry Codes contin. 2
* Ongoing work with Snorre Boasson & Jan Christian Meyer on porting PIC code that uses Pthreads (SMP primitives) to MPI. A preliminary report will be available later this week.
* “Recent Trends in Cluster Computing”, presented at ParCo 2003 by Elster et al., includes hardware trends and a survey of libraries and performance tools.

A.1.2a Profiling & User-Analysis of Amber, Dalton & Gaussian (Tor Johansen & staff, Comp. Center, U of Tromsø)
* Coordination work
* Travel: NOTUR 2003
* Porting and testing of Amber and Scali SW

A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/students, CS, U of Tromsø)
* “Ytelsesmålinger gjort på DALTON” (“Performance measurements made on DALTON”), a report for the NOTUR Project Emerging Technologies: Cluster. Daniel Stødle, Otto J. Anshus, John Markus Bjørndalen
* “Survey of optimizing techniques for parallel programs running on computer clusters”. Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo (September 29, 2003)

A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED
RESULTS:
* Dalton scales pretty well: 25x speedup on 32 nodes
* NOTE: Only without caching of temp. data. If the cache is used, only 3-5x speedup on 32 nodes!
* Even though the 8-way cluster had no local disk (only a network file system), the sequential Dalton code was significantly faster. This indicates that network bandwidth may not be a problem if caching is used in the parallel code
* Communication pattern: master-slave "bag-of-tasks" oriented programs with little communication & synchronization and generally good utilization of the slave nodes. The master does relatively little work and is blocked most of the time
* Finally, checked whether the master node could be a bottleneck, but could not detect differences in execution time when the master was put on a slow node vs. a fast node
* NOTE: Only tested up to 32 nodes; using a larger number of nodes may limit performance by overloading the master node
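The master-slave "bag-of-tasks" pattern described above can be illustrated with a minimal sketch. It is not taken from Dalton or from the project reports: it uses Python threads on one machine rather than MPI processes on cluster nodes, and the names run_bag_of_tasks, work_fn and parallel_efficiency are hypothetical. The last function only restates the reported speedups as parallel efficiency (speedup divided by node count).

```python
import queue
import threading


def run_bag_of_tasks(tasks, num_workers, work_fn):
    """Master fills a shared bag of tasks; workers pull until it is empty."""
    tasks = list(tasks)
    bag = queue.Queue()
    results = queue.Queue()
    for index, task in enumerate(tasks):
        bag.put((index, task))

    def worker():
        while True:
            try:
                index, task = bag.get_nowait()
            except queue.Empty:
                return  # bag is empty, so this worker is done
            results.put((index, work_fn(task)))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # The master does little work itself and mostly blocks here.
    for w in workers:
        w.join()

    # Collect results back into task order.
    ordered = [None] * len(tasks)
    while not results.empty():
        index, value = results.get()
        ordered[index] = value
    return ordered


def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / nodes


if __name__ == "__main__":
    print(run_bag_of_tasks(range(8), num_workers=4, work_fn=lambda x: x * x))
    print(parallel_efficiency(25, 32))  # reported case without caching
    print(parallel_efficiency(5, 32))   # upper end of the reported cached case
```

Because workers only touch the shared bag when they need a new task, communication stays low. The single bag is also the point where a master could become overloaded as the node count grows, which is the concern raised in the final note above.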

A.1.2b Optimization & tool analysis of Dalton (Anshus & PostDoc/student, IFI, U of Tromsø) CONTINUED 2
Thanks to:
* Kenneth Ruud, Chemistry, UiT
* Roy Dragseth, CC UiT, for support on the Itanium at U of Tromsø

A.2 Execution Monitoring (Anshus, Tore Larsen & students, CS, U of T)
* “Survey of execution monitoring tools for computer clusters”. Espen S. Johnsen, Otto J. Anshus, John Markus Bjørndalen, Lars Ailo Bongo, Sept 03
* “Performance Monitoring”. Lars Ailo Bongo, Otto J. Anshus, John Markus Bjørndalen

A.3 Visualization servers, etc. (Hallgren, Elster & students, CS, NTNU)
Ongoing work with Torbjørn Vik
Preliminary report on a survey of how clusters are currently used in visualization. Two types of cluster usage:
* Off-line (non-real-time rendering). Often called "rendering farms", with lots of nodes that each work on one frame of a larger animation. Typically used in the film industry and other areas where interactivity and/or real-time rendering is not needed. All larger 3D modelling programs, such as Lightwave, 3DStudio and Maya, have functionality for this.
* On-line (real-time). Most interesting from a technical viewpoint...

A.3 Visualization servers, etc. - Contin.
* Clusters are used in interactive visualization software to increase performance, enable larger data sets, and avoid limitations in local hardware.
* Most visualization clusters work in principle by having the user sit at a client machine that has little capacity of its own. The cluster takes care of all computation and sends only the finished images to the client. The client machine also receives input from the user and sends it to the cluster.
* Data sets for such visualization are often very large and, depending on the situation, both polygon-based and voxel-based rendering are used.
* The main problem in making clusters usable for interactive visualization programs is network delay. This is addressed by reducing the time spent transferring images between cluster and client, either by reducing the amount of data (compression methods), by increasing network performance, or both.
* Parallelism within the cluster itself is based on independence between different data. There may be independence between different parts of the same data set, or between different frames of a 4D data set.
* Load balancing often becomes a problem in such settings and is an important research area. Which load-balancing method is used is usually highly context-dependent.
* Cluster software for visualization still lacking ??
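The trade-off between image size, compression and network speed can be made concrete with a rough back-of-the-envelope sketch. The frame size (1024x768, 24-bit colour) and the 10:1 compression ratio are illustrative assumptions, not figures from the project. The 100 Mbit/s link is the nominal Fast Ethernet rate. Latency and protocol overhead are ignored, so real frame rates would be lower.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, link_bits_per_second, compression_ratio=1.0):
    """Time to push one frame over the link, ignoring latency and overhead."""
    return (num_bytes / compression_ratio) * 8 / link_bits_per_second


def max_frames_per_second(num_bytes, link_bits_per_second, compression_ratio=1.0):
    """Upper bound on frame rate imposed by the link alone."""
    return 1.0 / transfer_seconds(num_bytes, link_bits_per_second, compression_ratio)


if __name__ == "__main__":
    FAST_ETHERNET = 100e6                 # nominal bits per second
    size = frame_bytes(1024, 768)         # assumed frame size
    print(size, "bytes per frame")
    print(max_frames_per_second(size, FAST_ETHERNET), "fps uncompressed")
    print(max_frames_per_second(size, FAST_ETHERNET, 10.0), "fps at an assumed 10:1 compression")
```

Under these assumptions an uncompressed frame takes roughly 0.19 s to transfer, which is why both remedies named above (less data or a faster network) matter for interactive use.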

A.4 Impact of future numerical algorithms (Rønquist & student, Dept. of Mathematics, NTNU)
* Rønquist’s student Staff (now at Simulasenteret) wrote a report based on his summer job
* May add in experiences from Elster’s group, fall 2003

A.5 Interface with NOTUR ET – Grid Project (Elster, Harald Simonsen and colleagues, staff & students associated with the NOTUR ET Cluster & Grid projects)
* Test node established at NTNU: Andreas Botnen (USIT) and Robin Holtet (IDI, now ITEA)
* May use IDI’s 30-40-node cluster in the test grid
* Meetings between Elster’s and Simonsen’s groups
* Robin Holtet and Elster’s student Thorvald Natvig to attend the Linköping meeting this month
* Collaborations re. National GRID and EGEE
* Student from NTNU and UiO at CERN

Main cluster issues:
* Global operations have a more severe impact on performance on clusters than on traditional supercomputers, since communication between processors takes relatively more of the total execution time
* SCALABILITY!!
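Why global operations hurt more on a slow interconnect can be sketched with the common latency/bandwidth ("alpha-beta") cost model. This is a textbook simplification, not a model taken from the project reports, and every number in the example (latencies, per-byte times, compute time) is a hypothetical placeholder. The sketch compares a naive linear broadcast with a binomial-tree broadcast and shows what share of total run time the broadcast consumes.

```python
def message_time(nbytes, alpha, beta):
    """Cost of one point-to-point message: startup latency plus per-byte time."""
    return alpha + nbytes * beta


def linear_broadcast_time(nodes, nbytes, alpha, beta):
    """Root sends to each of the other nodes in turn: (p - 1) messages."""
    return (nodes - 1) * message_time(nbytes, alpha, beta)


def tree_broadcast_time(nodes, nbytes, alpha, beta):
    """Binomial tree: ceil(log2(p)) rounds, one message per round on the critical path."""
    rounds = (nodes - 1).bit_length()  # integer ceil(log2(nodes)) for nodes >= 1
    return rounds * message_time(nbytes, alpha, beta)


def communication_fraction(compute_time, comm_time):
    """Share of total execution time spent communicating."""
    return comm_time / (compute_time + comm_time)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    slow_net = dict(alpha=100e-6, beta=1 / 12.5e6)  # e.g. a commodity Ethernet-class link
    fast_net = dict(alpha=5e-6, beta=1 / 1e9)       # e.g. a dedicated HPC interconnect
    compute = 0.01                                  # seconds of computation per step
    for name, net in (("slow", slow_net), ("fast", fast_net)):
        t = tree_broadcast_time(32, 8192, **net)
        print(name, t, communication_fraction(compute, t))
```

With the same amount of computation per step, the slower network spends a much larger fraction of each step inside the broadcast, and that fraction grows with node count, which is the scalability concern on the slide.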

Lessons learned
Clusters generally have cheap hardware, but may cause increased “hidden” costs regarding:
* More incompatible compilers, especially Fortran 90 (also C++)
* Some applications are non-trivial to port from a shared-memory paradigm to a distributed-memory paradigm
* Some applications require high-bandwidth interconnects, which drive up costs (e.g. SGI Altix)
* Power and cooling costs (ref. Brian Vinter)
* Stability, recovery
Overall costs and scalability should be further studied

The “Ideal” Cluster -- Hardware
* High-bandwidth network
* Low-latency network
* Low operating system overhead (TCP causes “slow start”)
* Great floating-point performance (64-bit processors or more?)

The “Ideal” Cluster -- Software
* Compiler that is:
  * Portable
  * Optimizing
* Do extra work to save communication
* Self-tuning / load-balanced
* Automatic selection of best algorithm
* One-sided communication support?
* Optimized middleware

NOTUR Cluster proj. status For more information: A dozen or more reports associated with this project will be made available on the web at: http://www.idi.ntnu.no/~elster Email: elster@idi.ntnu.no November 14, 2018 NOTUR Cluster proj. status