Networking Options for Beowulf Clusters (Dr. Thomas Sterling, Caltech, March 22, 2000)

Presentation transcript:

Networking Options for Beowulf Clusters
Dr. Thomas Sterling
California Institute of Technology and NASA Jet Propulsion Laboratory
March 22, 2000
Presentation to the American Physical Society

Slide 5: Points of Inflection
Computing Heroic Era (1950)
  – technology: vacuum tubes, mercury delay lines, pulse transformers
  – architecture: accumulator based
  – model: von Neumann, sequential instruction execution
  – examples: Whirlwind, EDSAC
Mainframe (1960)
  – technology: transistors, core memory, disk drives
  – architecture: register bank based
  – model: reentrant concurrent processes
  – examples: IBM 7042, 7090, PDP-1
Scientific Computer (1970)
  – technology: earliest SSI logic gate modules
  – architecture: virtual memory
  – model: parallel processing
  – examples: CDC 6600, Goodyear STARAN

Slide 6: Points of Inflection in the History of Computing
Supercomputers (1980)
  – technology: ECL, semiconductor integration, RAM
  – architecture: pipelined
  – model: vector
  – example: Cray-1
Massively Parallel Processing (1990)
  – technology: VLSI, microprocessor
  – architecture: MIMD
  – model: Communicating Sequential Processes, message passing
  – examples: TMC CM-5, Intel Paragon
? (2000)
  – trans-teraflops epoch

Slide 9: Punctuated Equilibrium
Nonlinear dynamics drive to point of inflexion
Drastic reduction in vendor support for HPC
Component technology for PCs matches workstation capability
PC-hosted software environments achieve sophistication and robustness of mainframe O/S
Low-cost network hardware and software enable balanced PC clusters
MPPs establish low level of expectation
Cross-platform parallel programming model

Slide 10: BEOWULF-CLASS SYSTEMS
Cluster of PCs
  – Intel x86
  – DEC Alpha
  – Mac Power PC
Pure M²COTS
Unix-like O/S with source
  – Linux, BSD, Solaris
Message-passing programming model (see the sketch below)
  – PVM, MPI, BSP, homebrew remedies
Single-user environments
Large science and engineering applications
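
The message-passing model named above is what libraries such as MPI provide: every node runs the same program, learns its rank, and exchanges or combines data explicitly. As a minimal sketch, an illustration in C rather than code from the original talk, each process contributes one value and rank 0 prints the sum:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime on every node    */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id within the cluster   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes              */

        double local = (double)rank;            /* stand-in for a per-node partial result */
        double total = 0.0;

        /* combine the per-node results onto rank 0 */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes = %g\n", size, total);

        MPI_Finalize();
        return 0;
    }

On a Beowulf such a program is typically launched across the nodes with something like mpirun -np 16 ./a.out; PVM and BSP offer the same explicit-communication style through different APIs.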

Slide 12: Beowulf-class Systems: A New Paradigm for the Business of Computing
Brings high-end computing to broad-ranged problems
  – new markets
Order-of-magnitude price-performance advantage
Commodity enabled
  – no long development lead times
Low vulnerability to vendor-specific decisions
  – companies are ephemeral; Beowulfs are forever
Rapid-response technology tracking
Just-in-place, user-driven configuration
  – requirement responsive
Industry-wide, non-proprietary software environment

Slide 14: Have to Run Big Problems on Big Machines?
It's work, not peak flops: a user's throughput over the application cycle
Big machines yield little slices
  – due to time and space sharing
But data set memory requirements?
  – wide range of data set needs, spanning three orders of magnitude
  – latency-tolerant algorithms enable out-of-core computation (see the sketch below)
What is the Beowulf breakpoint for price-performance?
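
Out-of-core computation, mentioned above, simply means streaming a data set through memory in fixed-size blocks instead of holding it all at once, so a modest-memory node can process files much larger than its RAM. A minimal sketch in C follows; the file name and block size are placeholders of mine, not details from the talk:

    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK (8 * 1024 * 1024)   /* doubles per block; about 64 MB resident at a time */

    int main(void)
    {
        /* "bigdata.bin" stands in for a data set far larger than physical memory */
        FILE *f = fopen("bigdata.bin", "rb");
        if (!f) { perror("bigdata.bin"); return 1; }

        double *buf = malloc(BLOCK * sizeof(double));
        if (!buf) { fclose(f); return 1; }

        double sum = 0.0;
        size_t n;

        /* stream the file through one fixed-size buffer: memory use stays constant
           no matter how large the file is */
        while ((n = fread(buf, sizeof(double), BLOCK, f)) > 0)
            for (size_t i = 0; i < n; i++)
                sum += buf[i];

        printf("sum = %g\n", sum);
        free(buf);
        fclose(f);
        return 0;
    }

The same blocking idea, combined with latency-tolerant algorithms that overlap I/O with computation, is what lets a cluster of PCs attack problems whose data sets dwarf any single node's memory.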

Slide 15: Throughput Turbochargers
Recurring costs approx. 10% of MPPs
Rapid response to technology advances
Just-in-place configuration, and reconfigurable
High reliability
Easily maintained through low-cost replacement
Consistent, portable programming model
  – Unix, C, Fortran, message passing
Applicable to a wide range of problems and algorithms
Double machine-room throughput at a tenth the cost
Provides super-linear speedup

Slide 16: Beowulf Project - A Brief History
Started in late 1993 at NASA Goddard Space Flight Center
  – NASA JPL, Caltech, academic and industrial collaborators
Sponsored by the NASA HPCC Program
Applications: single-user science station
  – data intensive
  – low cost
General focus:
  – single-user (dedicated) science and engineering applications
  – out-of-core computation
  – system scalability
  – Ethernet drivers for Linux

Slide 17: Beowulf System at JPL (Hyglac)
16 Pentium Pro PCs, each with 2.5 GByte disk, 128 MByte memory, and a Fast Ethernet card
Connected using a 100Base-T network through a 16-way crossbar switch
Theoretical peak performance: 3.2 GFlop/s
Achieved sustained performance: 1.26 GFlop/s
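
The peak figure is consistent with the 200 MHz Pentium Pro parts of that era retiring one floating-point result per cycle; the clock rate is an assumption on my part, since it is not stated on the slide:

\[
16\ \text{nodes} \times 200\,\text{MHz} \times 1\ \tfrac{\text{flop}}{\text{cycle}} = 3.2\ \text{GFlop/s},
\qquad
\frac{1.26\ \text{GFlop/s}}{3.2\ \text{GFlop/s}} \approx 39\%\ \text{of peak sustained}.
\]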

Slide 18: A 10 Gflops Beowulf
California Institute of Technology, Center for Advanced Computing Research
172 Intel Pentium Pro microprocessors

Slide 19: Avalon architecture and price

Slide 20: 1st printing: May; 2nd printing: Aug. (MIT Press)

Slide 21: Beowulf at Work

Slide 22: Beowulf Scalability

Slide 23: Electro-dynamic FDTD Code
All timing data is in CPU seconds per simulated time step, for a global grid of 282 × 362 × 102 cells distributed over 16 processors.
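
For scale, that decomposition works out to roughly 650,000 grid cells per node; this is back-of-the-envelope arithmetic from the caption, not a figure quoted in the talk:

\[
282 \times 362 \times 102 \approx 1.04 \times 10^{7}\ \text{cells},
\qquad
\frac{1.04 \times 10^{7}}{16} \approx 6.5 \times 10^{5}\ \text{cells per processor}.
\]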

Slide 24: Network Topology Scaling Latencies (µs)

Slide 25: Routed Network - Random Pattern

Slide 27: System Area Network Technologies
Fast Ethernet
  – LAN, 100 Mbps, 100 µs
Gigabit Ethernet
  – LAN/SAN, 1000 Mbps, 50 µs
ATM
  – WAN/LAN, 155/620 Mbps
Myrinet
  – SAN, 1250 Mbps, 20 µs
Giganet
  – SAN/VIA, 1000 Mbps, 5 µs
ServerNet II
  – SAN/VIA, 1000 Mbps, 10 µs
SCI
  – SAN, 8000 Mbps, 5 µs
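
A common first-order model for comparing these interconnects is transfer time = latency + message size / bandwidth. The small C program below is an illustration using two rows from the list above; real protocol and software overheads are higher, so treat it as a sketch rather than a measurement:

    #include <stdio.h>

    /* first-order cost model: t = latency + bytes / bandwidth */
    static double xfer_time(double latency_s, double bytes_per_s, double bytes)
    {
        return latency_s + bytes / bytes_per_s;
    }

    int main(void)
    {
        /* figures from the slide: Fast Ethernet (100 Mbps, 100 us)
           and Myrinet (1250 Mbps, 20 us), converted to bytes/s and seconds */
        const double fe_lat = 100e-6, fe_bw = 100e6  / 8.0;
        const double my_lat = 20e-6,  my_bw = 1250e6 / 8.0;

        const double sizes[] = { 64.0, 4096.0, 1048576.0 };   /* message sizes in bytes */
        for (int i = 0; i < 3; i++) {
            double b = sizes[i];
            printf("%9.0f B: Fast Ethernet %10.1f us, Myrinet %10.1f us\n",
                   b,
                   xfer_time(fe_lat, fe_bw, b) * 1e6,
                   xfer_time(my_lat, my_bw, b) * 1e6);
        }
        return 0;
    }

The output makes the comparison concrete: for small messages the fixed latency dominates (Myrinet's 20 µs versus Fast Ethernet's 100 µs), while for megabyte messages the bandwidth term takes over and the gap approaches the ratio of the link speeds.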

Slide 28: 3Com CoreBuilder 9400 Switch and Gigabit Ethernet NIC

Slide 29: Lucent Cajun M770 Multifunction Switch

Slide 30: M2LM-SW16 16-Port Myrinet Switch with 8 SAN ports and 8 LAN ports

Slide 31: Dolphin Modular SCI Switch for System Area Networks

Slide 32: Giganet High Performance Host Adapters

Slide 33: Giganet High Performance Cluster Switch

Slide 40: The Beowulf Delta, looking forward 6 years
Clock rate: X 4
Flops (per chip): X 50 (2-4 proc/chip, 4-8 way ILP/proc)
# processors: 32
Networking: X 32 (Gbps)
Memory: X 10 (4 GBytes)
Disk: X 100
Price-performance: X 50
System performance: 50 Tflops
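
Read as multiplicative factors, the per-chip and processor-count entries compound. Assuming the processor line is also an X 32 factor (the transcript is ambiguous there), raw system capability grows by roughly three orders of magnitude, which is what carries the tens-of-GFlop/s Beowulfs of 2000 toward the 50 Tflops end point on the slide:

\[
\underbrace{50}_{\text{flops per chip}} \times \underbrace{32}_{\text{processor count}} = 1600 \approx 10^{3.2}.
\]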

Slide 41: Million $$ Teraflops Beowulf?
Today: $3M peak Tflops
< year 2002: $1M peak Tflops
Performance efficiency is a serious challenge
System integration
  – does vendor support of massive parallelism have to mean massive markup?
System administration: boring but necessary
Maintenance without vendors; how?
  – new kind of vendors for support
Heterogeneity will become a major aspect
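
In per-unit terms this is simple division of the slide's figures:

\[
\frac{\$3\,\text{M}}{1000\ \text{GFlop/s}} = \$3000\ \text{per peak GFlop/s today},
\qquad
\frac{\$1\,\text{M}}{1000\ \text{GFlop/s}} = \$1000\ \text{per peak GFlop/s by 2002}.
\]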
