Cluster Computers

Introduction

Cluster computing:
–Standard PCs or workstations connected by a fast network
–Good price/performance ratio
–Exploit existing (idle) machines or use (new) dedicated machines

Cluster computers versus supercomputers:
–Processing power is similar: both are based on microprocessors
–Communication performance was the key difference
–Modern networks (Myrinet, SCI, Infiniband) may bridge this gap

Overview

Cluster computers at our department:
–128-node Pentium-Pro/Myrinet cluster
–72-node dual-Pentium-III/Myrinet-2000 cluster
–Part of a wide-area system: the Distributed ASCI Supercomputer

Network interface protocols for Myrinet:
–Low-level systems software
–Partly runs on the network interface card (firmware)

Distributed ASCI Supercomputer (DAS-1)

Node configuration
–200 MHz Pentium Pro
–128 MB memory
–2.5 GB disk
–Fast Ethernet (100 Mbit/s)
–Myrinet (1.28 Gbit/s, full duplex)
–Operating system: Red Hat Linux

DAS-2 Cluster (2002-now)
–72 nodes, each with 2 CPUs (144 CPUs in total)
–1 GHz Pentium-III
–1 GB memory per node
–20 GB disk
–Fast Ethernet (100 Mbit/s)
–Myrinet-2000 (2 Gbit/s, crossbar)
–Operating system: Red Hat Linux
–Part of the wide-area DAS-2 system (5 clusters with 200 nodes in total)

Myrinet

Components:
–8-port switches
–Network interface card for each node (on the PCI bus)
–Electrical cables: reliable links

Myrinet switches:
–8 x 8 crossbar switch
–Each port connects to a node (network interface) or another switch
–Source-based, cut-through routing
–Less than 1 microsecond switching delay
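With source-based routing, the sender decides the whole path in advance: it prepends one routing byte per switch hop, and each switch consumes the leading byte to choose its output port. The sketch below illustrates the idea in Python; the byte layout and helper names are illustrative, not Myrinet's actual wire format.

```python
# Sketch of source-based routing (illustrative, not Myrinet's real format):
# the sender prepends one output-port byte per switch on the path, and each
# switch strips the first byte to select its crossbar output port.

def build_header(route):
    """route: list of output-port numbers, one per switch on the path."""
    return bytes(route)

def switch_forward(packet):
    """A switch reads and strips the first routing byte; returns (port, rest)."""
    port, rest = packet[0], packet[1:]
    return port, rest

# A packet crossing two switches via ports 3 and then 5, followed by payload:
packet = build_header([3, 5]) + b"payload"
port1, packet = switch_forward(packet)   # first switch forwards on port 3
port2, packet = switch_forward(packet)   # second switch forwards on port 5
assert (port1, port2, packet) == (3, 5, b"payload")
```

Because each switch only inspects one byte, it can start forwarding (cut-through) before the rest of the packet has arrived, which is what keeps the switching delay under a microsecond.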

24-node DAS-1 cluster

128-node DAS-1 cluster

A ring topology would have:
–22 switches
–Poor diameter: 11
–Poor bisection width: 2

Topology of the 128-node cluster

4 x 8 grid of switches with wrap-around:
–Each switch is connected to 4 other switches and 4 PCs
–32 switches (128/4)
–Diameter: 6
–Bisection width: 8
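The diameter figure can be checked mechanically: build the 4 x 8 wrap-around switch grid as a graph and take the longest shortest path between any two switches. A small sketch (standard-library Python only):

```python
# Sketch: verify the diameter claim for the 4 x 8 switch grid with
# wrap-around links (a torus). Each switch also serves 4 PCs, so
# 32 switches * 4 PCs = 128 nodes.
from collections import deque

ROWS, COLS = 4, 8
switches = [(r, c) for r in range(ROWS) for c in range(COLS)]

def neighbors(node):
    r, c = node
    return [((r + 1) % ROWS, c), ((r - 1) % ROWS, c),
            (r, (c + 1) % COLS), (r, (c - 1) % COLS)]

def eccentricity(start):
    """Longest shortest-path distance from 'start', via BFS."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in neighbors(node):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())

diameter = max(eccentricity(s) for s in switches)
assert len(switches) == 128 // 4   # 32 switches, 4 PCs each
assert diameter == 6               # matches the slide's claim
```

The wrap-around links are what bring the diameter down from 10 (for a plain 4 x 8 mesh) to 6, and they also raise the bisection width compared with the ring alternative.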

Myrinet interface board

Hardware:
–40 MHz custom CPU (LANai 4.1)
–1 MByte SRAM
–3 DMA engines (send, receive, to/from host)
–Full-duplex Myrinet link
–PCI bus interface

Software:
–LANai Control Program (LCP)

Properties of Myrinet

Programmable processor on the network interface
–Slow (40 MHz)

NI on the I/O bus, not the memory bus
–Synchronization between host and NI is expensive

Messages are staged through NI memory

Network interface protocols for Myrinet

Myrinet has a programmable Network Interface processor
–Gives much flexibility to the protocol designer

NI protocol: low-level software running on the NI and the host
–Used to implement higher-level programming languages and libraries
–Critical for performance: want a few μsec latency and tens of MB/sec throughput

Goal: give clusters the communication performance of supercomputers

Basic Network Interface protocol for Myrinet

Implement a simple interface:
–send (destination, buffer);
–poll ();
–handle_packet (buffer);

Map the network interface (NI) into user space to avoid OS overhead
–No protection (or sharing)

No flow control
–Drop messages if buffers overrun
–Unreliable communication
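The consequence of "no flow control" is easy to see in a small simulation: if the sender outruns the receiver's buffers, packets are silently lost and it is up to a higher layer to notice. The sketch below is a hypothetical model of the interface above, not the real Myrinet/LFC implementation; the queue depth is an assumption.

```python
# Sketch (hypothetical, not the real Myrinet API): the minimal unreliable
# send/poll interface from the slide. With no flow control, a packet that
# arrives at a full receive queue is simply dropped.
from collections import deque

RECV_QUEUE_DEPTH = 4   # assumed NI receive-buffer capacity

class NetworkInterface:
    def __init__(self):
        self.recv_queue = deque()
        self.dropped = 0

    def deliver(self, buffer):
        """Called by the 'network' when a packet arrives at this NI."""
        if len(self.recv_queue) >= RECV_QUEUE_DEPTH:
            self.dropped += 1          # buffer overrun: drop, no retransmit
        else:
            self.recv_queue.append(buffer)

    def poll(self):
        """Host polls the NI; returns one packet, or None if none pending."""
        return self.recv_queue.popleft() if self.recv_queue else None

def send(destination, buffer):
    destination.deliver(buffer)        # no acknowledgement, no flow control

ni = NetworkInterface()
for i in range(6):                     # 6 sends into a 4-deep queue
    send(ni, f"packet-{i}")

received = []
pkt = ni.poll()
while pkt is not None:
    received.append(pkt)
    pkt = ni.poll()

assert received == ["packet-0", "packet-1", "packet-2", "packet-3"]
assert ni.dropped == 2                 # the last two packets were lost
```

Dropping the OS from the data path (user-space mapped NI) is what buys the low latency; reliability then has to be added back explicitly, which is the flow-control issue discussed next.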

Basic NI protocol - Overview

Basic NI protocol - Sending packets

Issues
–Optimizing throughput: Programmed I/O instead of DMA
–Making communication reliable: flow control
–How to receive messages: polling overhead => interrupts vs. polling
–Efficient multicast communication

Control transfers: polling versus interrupts

Interrupts:
–User-level signal handlers are very expensive (24 μsec on BSD/OS)

Polling:
–Hard to determine the optimal polling rate
–Burden on the programmer or compiler

Combine polling and interrupts:
–Host polls when idle, else it enables interrupts
–Requires integration with the thread scheduler

Polling watchdog (LFC):
–Generate an interrupt only if the host does not poll within T μsec
–Implemented using a timer on the NI
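The watchdog's decision rule is simple: on packet arrival the NI arms a timer for T μsec; a host poll before expiry cancels it, otherwise the NI raises an interrupt. A minimal sketch of that rule, with simulated event timestamps rather than a real NI timer (the value of T is an assumption):

```python
# Sketch of the polling-watchdog decision rule: interrupt the host only if
# it does not poll within T microseconds of a packet's arrival. Timestamps
# are simulated; T is an assumed timeout value.

T = 50  # assumed watchdog timeout in microseconds

def needs_interrupt(arrival_time, next_poll_time):
    """True if the host's next poll comes too late (or never)."""
    return next_poll_time is None or next_poll_time - arrival_time > T

# Host polling frequently (e.g. busy in a communication loop): no interrupt.
assert needs_interrupt(arrival_time=100, next_poll_time=120) is False
# Host off computing, next poll far away: the watchdog fires an interrupt.
assert needs_interrupt(arrival_time=100, next_poll_time=400) is True
# Host blocked and never polling: interrupt as well.
assert needs_interrupt(arrival_time=100, next_poll_time=None) is True
```

This gives polling-level latency on the common path (an attentive host) while bounding the delivery delay to T μsec when the host is busy elsewhere.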

Multicast

Implement a spanning-tree forwarding protocol on the NIs:
–Reduces forwarding latency
–No interrupts on the hosts
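The latency win comes from parallelism: with plain unicasts the root NI serializes N-1 sends, whereas in a spanning tree every NI that receives the packet re-forwards it to its children, so the number of serialized steps grows with the tree depth instead. A rough comparison, assuming a balanced binary tree and unit cost per send (both assumptions for illustration):

```python
# Sketch: serialized send steps for a multicast to n nodes, comparing
# sequential unicast from the root against balanced-binary-tree forwarding.
# Tree shape and unit send cost are simplifying assumptions.
import math

def sequential_steps(n):
    """Root sends n-1 unicasts one after another."""
    return n - 1

def tree_depth_steps(n):
    """Roughly the depth of a balanced binary tree over n nodes."""
    return math.ceil(math.log2(n))

assert sequential_steps(64) == 63
assert tree_depth_steps(64) == 6    # far fewer serialized steps
```

Running the forwarding on the NIs (not the hosts) means intermediate nodes relay packets without taking an interrupt or waiting for their host to poll.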

Performance on DAS
–9.6 μsec one-way null-latency
–57.7 MB/sec point-to-point throughput
–48.0 μsec multicast null-latency
–11.0 MB/sec multicast throughput