NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 1 Cluster Archtectures and.

Slides:



Advertisements
Similar presentations
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Advertisements

Distributed Systems CS
CA 714CA Midterm Review. C5 Cache Optimization Reduce miss penalty –Hardware and software Reduce miss rate –Hardware and software Reduce hit time –Hardware.
High-Performance Clusters part 1: Performance David E. Culler Computer Science Division U.C. Berkeley PODC/SPAA Tutorial Sunday, June 28, 1998.
2. Computer Clusters for Scalable Parallel Computing
Understanding Application Scaling NAS Parallel Benchmarks 2.2 on NOW and SGI Origin 2000 Frederick Wong, Rich Martin, Remzi Arpaci-Dusseau, David Wu, and.
Distributed Processing, Client/Server, and Clusters
Chapter 16 Client/Server Computing Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,
NPACI Panel on Clusters David E. Culler Computer Science Division University of California, Berkeley
Trends in Cluster Architecture Steve Lumetta David Culler University of California at Berkeley Computer Science Division.
A Comparative Study of Network Protocols & Interconnect for Cluster Computing Performance Evaluation of Fast Ethernet, Gigabit Ethernet and Myrinet.
IPPS 981 What’s So Different about Cluster Architectures? David E. Culler Computer Science Division U.C. Berkeley
Introduction to Systems Architecture Kieran Mathieson.
NOW 1 Berkeley NOW Project David E. Culler Sun Visit May 1, 1998.
6/28/98SPAA/PODC1 High-Performance Clusters part 2: Generality David E. Culler Computer Science Division U.C. Berkeley PODC/SPAA Tutorial Sunday, June.
Hitachi SR8000 Supercomputer LAPPEENRANTA UNIVERSITY OF TECHNOLOGY Department of Information Technology Introduction to Parallel Computing Group.
MS 9/19/97 implicit coord 1 Implicit Coordination in Clusters David E. Culler Andrea Arpaci-Dusseau Computer Science Division U.C. Berkeley.
IPPS 981 Berkeley FY98 Resource Working Group David E. Culler Computer Science Division U.C. Berkeley
1 Recap. 2 No. of Processors C.P.I Computational Power Improvement Multiprocessor Uniprocessor.
Lecture 1: Introduction CS170 Spring 2015 Chapter 1, the text book. T. Yang.
Introduction Operating Systems’ Concepts and Structure Lecture 1 ~ Spring, 2008 ~ Spring, 2008TUCN. Operating Systems. Lecture 1.
1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.
Hardware/Software Concepts Tran, Van Hoai Department of Systems & Networking Faculty of Computer Science & Engineering HCMC University of Technology.
DISTRIBUTED COMPUTING
NETWORKING HARDWARE.
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing Kai Hwang, Hai Jin, and Roy Ho.
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
1 In Summary Need more computing power Improve the operating speed of processors & other components constrained by the speed of light, thermodynamic laws,
Computer System Architectures Computer System Software
LECTURE 9 CT1303 LAN. LAN DEVICES Network: Nodes: Service units: PC Interface processing Modules: it doesn’t generate data, but just it process it and.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
UNIX System Administration OS Kernal Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept Kernel or MicroKernel Concept: An OS architecture-design.
B.Ramamurthy9/19/20151 Operating Systems u Bina Ramamurthy CS421.
Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.
CLUSTER COMPUTING STIMI K.O. ROLL NO:53 MCA B-5. INTRODUCTION  A computer cluster is a group of tightly coupled computers that work together closely.
Fall 2000M.B. Ibáñez Lecture 01 Introduction What is an Operating System? The Evolution of Operating Systems Course Outline.
◦ What is an Operating System? What is an Operating System? ◦ Operating System Objectives Operating System Objectives ◦ Services Provided by the Operating.
The Red Storm High Performance Computer March 19, 2008 Sue Kelly Sandia National Laboratories Abstract: Sandia National.
Mainframe (Host) - Communications - User Interface - Business Logic - DBMS - Operating System - Storage (DB Files) Terminal (Display/Keyboard) Terminal.
DISTRIBUTED COMPUTING Introduction Dr. Yingwu Zhu.
Large Scale Parallel File System and Cluster Management ICT, CAS.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
CLUSTER COMPUTING TECHNOLOGY BY-1.SACHIN YADAV 2.MADHAV SHINDE SECTION-3.
1 Public DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Arkady Kanevsky & Peter Corbett Network Appliance Vijay Velusamy.
Operating System Principles And Multitasking
VMware vSphere Configuration and Management v6
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
Distributed Computing Systems CSCI 6900/4900. Review Distributed system –A collection of independent computers that appears to its users as a single coherent.
Lecture 4 Page 1 CS 111 Online Modularity and Virtualization CS 111 On-Line MS Program Operating Systems Peter Reiher.
COMP381 by M. Hamdi 1 Clusters: Networks of WS/PC.
Rehab AlFallaj.  Network:  Nodes: Service units: PC Interface processing Modules: it doesn’t generate data, but just it process it and do specific task.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 1.
CS4315A. Berrached:CMS:UHD1 Introduction to Operating Systems Chapter 1.
Parallel IO for Cluster Computing Tran, Van Hoai.
Major OS Components CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Background Computer System Architectures Computer System Software.
Introduction Goal: connecting multiple computers to get higher performance – Multiprocessors – Scalability, availability, power efficiency Job-level (process-level)
Intro to Distributed Systems Hank Levy. 23/20/2016 Distributed Systems Nearly all systems today are distributed in some way, e.g.: –they use –they.
SYSTEM MODELS FOR ADVANCED COMPUTING Jhashuva. U 1 Asst. Prof CSE
COMP7500 Advanced Operating Systems I/O-Aware Load Balancing Techniques Dr. Xiao Qin Auburn University
Chapter 16 Client/Server Computing Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
Networking Week #10 OBJECTIVES Chapter #6 Questions Review Chapter #8.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
Berkeley Cluster Projects
University of Technology
Results of Prior NSF RI Grant: TITAN
Presentation transcript:

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 1 Cluster Archtectures and the NPACI Berkeley NOW David E. Culler Computer Science Division U.C. Berkeley

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 2 Architectural Drivers Node architecture dominates performance –processor, cache, bus, and memory –design and engineering $ => performance Greatest demand for performance is on large systems –must track the leading edge of technology without lag MPP network technology => mainstream –system area networks System on every node is a powerful enabler –very high speed I/O, virtual memory, schedulings, … Incremental scalability (up, down, and across) Complete software tools Wide class of applications

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 3 Berkeley NOW 100 Sun UltraSparcs –200 disks Myrinet SAN –160 MB/s Fast comm. –AM, MPI,... Ether/ATM switched external net Global OS Self Config

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 4 Basic Components $ P M I/O bus MyriNet P Sun Ultra 170 Myricom NIC 160 MB/s M

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 5 Massive Cheap Storage Cluster Basic unit: 2 PCs double-ending four SCSI chains of 8 disks each Currently serving Fine Art at

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 6 Cluster of SMPs (CLUMPS) Four Sun E5000s –8 processors –4 Myricom NICs each Multiprocessor, Multi- NIC, Multi-Protocol NPACI => Sun 450s

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 7 Millennium PC Clumps Inexpensive, easy to manage Cluster Replicated in many departments Prototype for very large PC cluster

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 8 So What’s So Different? Commodity parts? Communications Packaging? Incremental Scalability? Independent Failure? Intelligent Network Interfaces? Complete System on every node –virtual memory –scheduler –files –...

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 9 Communication Performance  Direct Network Access LogP: Latency, Overhead, and Bandwidth Active Messages: lean layer supporting programming models Latency1/BW

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 10 MPI Performance

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 11 NAS Parallel Benchmarks

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 12 World-Record Disk-to-Disk Sort Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 13 General purpose Parallel System Many timeshared processes –each with direct, protected access –partition it any way you like User and system Client/Server, Parallel clients, parallel servers –they grow, shrink, handle node failures Multiple packages in a process –each may have own internal communication layer Use communication as easily as memory

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 14 Virtual Networks Endpoint abstracts the notion of “attached to the network” Virtual network is a collection of endpoints that can name each other. Many processes on a node can each have many endpoints, each with own protection domain.

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 15 Process 3 How are they managed? How do you get direct hardware access for performance with a large space of logical resources? Just like virtual memory –active portion of large logical space is bound to physical resources Process n Process 2 Process 1 *** Host Memory Processor NIC Mem Network Interface P

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 16 Network Interface Support NIC has endpoint frames Services active endpoints Signals misses to driver –using a system endpont Frame 0 Frame 7 Transmit Receive EndPoint Miss

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 17 Communication under Load Client Server Msg burst work

NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 18 Beyond the Personal Supercomputer Able to timeshare parallel programs –with fast, protected communication Mix with sequential and interactive jobs Use fast communication in OS subsystems –parallel file system, network virtual memory, … Nodes have powerful, local OS scheduler Simple implicit scheduling techniques provide coordinated scheduling => ride workstation/PC nodes and internet server systems technology => focus CS partners on RAS for long running apps