NOW and Beyond
Workshop on Clusters and Computational Grids for Scientific Computing
David E. Culler
Computer Science Division, Univ. of California, Berkeley

NOW Project Goals
Make a fundamental change in how we design and construct large-scale systems
–market reality:
»50%/year performance growth => cannot allow a 1-2 year engineering lag
–technological opportunity:
»single-chip “Killer Switch” => fast, scalable communication
Highly integrated, building-wide system
Explore novel system design concepts in this new “cluster” paradigm

Berkeley NOW
100 Sun UltraSparcs
–200 disks
Myrinet SAN
–160 MB/s
Fast comm.
–AM, MPI, ...
Ether/ATM switched external net
Global OS
Self Config

Landmarks
Top 500 Linpack performance list
MPI, NPB performance on par with MPPs
RSA 40-bit key challenge
World-leading external sort
Inktomi search engine
NPACI resource site

Taking Stock
Surprising successes
–virtual networks
–implicit co-scheduling
–reactive I/O
–service-based applications
–automatic network mapping
Surprising failures
–global system layer
–xFS file system
New directions for Millennium
–paranoid construction
–computational economy
–smart clients

Fast Communication
Fast communication on clusters is obtained through direct access to the network, as on MPPs
The challenge is to make this general purpose
–the system implementation should not dictate how it can be used

Virtual Networks
An endpoint abstracts the notion of being “attached to the network”
A virtual network is a collection of endpoints that can name each other
Many processes on a node can each have many endpoints, each with its own protection domain
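A minimal sketch of the abstraction in C (the type and field names here are invented for illustration; this is not the actual AM-II API): a send is permitted only between endpoints of the same virtual network, so each virtual network forms its own protection domain.

    #include <stdio.h>

    #define MAX_PEERS 16

    /* An endpoint is a process's "attachment to the network". */
    typedef struct endpoint {
        int id;                  /* name of this endpoint within its network */
        int vnet_id;             /* the virtual network it belongs to */
        int peers[MAX_PEERS];    /* endpoint ids this one is allowed to name */
        int npeers;
    } endpoint_t;

    /* Delivery is allowed only within one virtual network: the collection
     * of endpoints that can name each other is the protection domain. */
    int can_send(const endpoint_t *src, const endpoint_t *dst)
    {
        if (src->vnet_id != dst->vnet_id) return 0;
        for (int i = 0; i < src->npeers; i++)
            if (src->peers[i] == dst->id) return 1;
        return 0;
    }

    int main(void)
    {
        endpoint_t a = { .id = 1, .vnet_id = 7, .peers = {2}, .npeers = 1 };
        endpoint_t b = { .id = 2, .vnet_id = 7, .peers = {1}, .npeers = 1 };
        endpoint_t c = { .id = 3, .vnet_id = 9, .npeers = 0 };

        printf("a->b: %d\n", can_send(&a, &b));  /* 1: same virtual network */
        printf("a->c: %d\n", can_send(&a, &c));  /* 0: crosses the boundary */
        return 0;
    }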

7/30/98HPDC Panel8 Process 3 How are they managed? How do you get direct hardware access for performance with a large space of logical resources? Just like virtual memory –active portion of large logical space is bound to physical resources Process n Process 2 Process 1 *** Host Memory Processor NIC Mem Network Interface P

Network Interface Support
NIC has endpoint frames
Services active endpoints
Signals misses to driver
–using a system endpoint
[Figure: NIC with endpoint frames 0-7, transmit/receive queues, and an endpoint-miss path to the driver]
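The virtual-memory analogy can be made concrete with a toy frame table (the table layout, eviction policy, and driver interface below are invented for illustration): sends on a resident endpoint take the fast path, while sends on a non-resident endpoint miss to the driver, which evicts a frame and rebinds it, just as a pager would.

    #include <stdio.h>

    #define NIC_FRAMES 8        /* physical endpoint frames on the NIC */

    static int frame_to_ep[NIC_FRAMES];   /* endpoint held by each frame (-1 = free) */
    static int next_victim = 0;           /* trivial rotation replacement */

    /* Find the NIC frame holding an endpoint; -1 means a miss. */
    static int lookup(int ep)
    {
        for (int f = 0; f < NIC_FRAMES; f++)
            if (frame_to_ep[f] == ep) return f;
        return -1;
    }

    /* Miss handler: the driver evicts a frame and rebinds the endpoint. */
    static int handle_miss(int ep)
    {
        int victim = next_victim;
        next_victim = (next_victim + 1) % NIC_FRAMES;
        frame_to_ep[victim] = ep;
        return victim;
    }

    static void send_on_endpoint(int ep)
    {
        int f = lookup(ep);
        if (f < 0) {                       /* slow path: signalled to driver */
            f = handle_miss(ep);
            printf("miss: endpoint %d bound to frame %d\n", ep, f);
        }
        printf("send via frame %d (endpoint %d)\n", f, ep);
    }

    int main(void)
    {
        for (int f = 0; f < NIC_FRAMES; f++) frame_to_ep[f] = -1;
        send_on_endpoint(3);    /* misses, gets a frame */
        send_on_endpoint(3);    /* hits: direct hardware access */
        send_on_endpoint(42);   /* misses, takes the next frame */
        return 0;
    }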

Communication under Load
[Figure: clients sending message bursts and work to a server]
=> Use of networking resources adapts to demand

Implicit Coscheduling
Problem: parallel programs are designed to run in parallel => huge slowdowns under uncoordinated local scheduling
–gang scheduling is rigid, fault-prone, and complex
Coordinate schedulers implicitly, using the communication already in the program
–very easy to build, robust to component failures
–inherently “service on-demand”, scalable
–the local service component can evolve
[Figure: application processes (A) under per-node local schedulers (LS) vs. gang scheduling (GS)]
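A minimal sketch of the two-phase wait at the heart of the scheme, with pthreads standing in for the real communication layer and illustrative timing constants: spin for roughly one expected round trip, and block only if the reply is late.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
    static volatile int reply_arrived = 0;

    static long now_us(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000L + tv.tv_usec;
    }

    /* Two-phase wait: spin for about one round trip (a fast response means
     * the partner is scheduled), then block (a delayed response means it
     * is not, so release the processor). */
    static void wait_for_reply(long spin_us)
    {
        long start = now_us();
        while (now_us() - start < spin_us)     /* phase 1: spin */
            if (reply_arrived) return;

        pthread_mutex_lock(&lock);             /* phase 2: block */
        while (!reply_arrived)
            pthread_cond_wait(&cv, &lock);
        pthread_mutex_unlock(&lock);
    }

    static void *partner(void *arg)
    {
        usleep(*(int *)arg);                   /* simulated remote delay */
        pthread_mutex_lock(&lock);
        reply_arrived = 1;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)   /* compile with -pthread */
    {
        int delay_us = 5000;                   /* a descheduled partner */
        pthread_t t;
        pthread_create(&t, NULL, partner, &delay_us);
        wait_for_reply(200);                   /* spin ~1 RTT, then block */
        pthread_join(t, NULL);
        printf("reply received\n");
        return 0;
    }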

Why it works
Infer non-local state from local observations; react to maintain coordination.

observation         implication               action
fast response       partner scheduled         spin
delayed response    partner not scheduled     block

[Figure: WS 1 runs Job A while WS 2-4 run Jobs A and B; a requester spins while awaiting a fast response, then sleeps until the response arrives]

Example
Across a range of granularity and load imbalance:
–spin wait: 10x slowdown

I/O Lessons from NOW sort
A complete system on every node is a powerful basis for data-intensive computing
–complete disk subsystem
–independent file systems
»mmap (not read), plus madvise
–full OS => threads
Remote I/O (with fast communication) provides the same bandwidth as local I/O
I/O performance is very temperamental
–variations in disk speeds
–variations within a disk
–variations in processing, interrupts, messaging, ...
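The “mmap (not read), plus madvise” lesson as a small stand-alone scan (a sketch, not the NOW-Sort code): map the file and declare the access pattern so the kernel can prefetch, instead of issuing read() calls into a private buffer.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the file instead of read()ing it. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Declare the sequential scan so the kernel prefetches ahead. */
        if (madvise(p, st.st_size, MADV_SEQUENTIAL) < 0) perror("madvise");

        long records = 0;                  /* e.g., newline-delimited records */
        for (off_t i = 0; i < st.st_size; i++)
            if (p[i] == '\n') records++;

        printf("%ld records\n", records);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }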

Reactive I/O
Loosen data semantics
–e.g., an unordered bag of records
Build flows from producers (e.g., disks) to consumers (e.g., summation)
Flow data to where it can be consumed
[Figure: disks (D) feeding aggregators (A): static parallel aggregation vs. adaptive parallel aggregation through a distributed queue]
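A single-process stand-in for the adaptive path (the record counts and consumer speeds are made up): producers fill an unordered bag of records, each consumer pulls at its own rate, and the faster consumer automatically ends up with more of the data.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define RECORDS 1000

    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static int next_record = 0;       /* the shared bag of records */
    static int consumed[2];           /* per-consumer tallies */

    /* Pull one record from the bag; -1 when it is empty. */
    static int pull(void)
    {
        pthread_mutex_lock(&qlock);
        int r = (next_record < RECORDS) ? next_record++ : -1;
        pthread_mutex_unlock(&qlock);
        return r;
    }

    static void *consumer(void *arg)
    {
        int id = *(int *)arg;
        while (pull() >= 0) {
            consumed[id]++;
            usleep(id == 0 ? 100 : 500);   /* consumer 0 is 5x faster */
        }
        return NULL;
    }

    int main(void)   /* compile with -pthread */
    {
        pthread_t t[2];
        int ids[2] = { 0, 1 };
        for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, consumer, &ids[i]);
        for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
        /* The fast consumer ends up with most of the data. */
        printf("fast: %d records, slow: %d records\n", consumed[0], consumed[1]);
        return 0;
    }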

Performance Scaling
Allows more data to go to the faster consumer

Service-Based Applications
The application provides services to clients
It grows/shrinks according to demand, availability, and faults
[Figure: the Transcend transcoding proxy: service requests flow through front-end service threads to caches, a user-profile database, and a manager, mapped onto physical processors]
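A toy sketch of the grow/shrink behavior (the thresholds and load curve are invented): a manager sizes the pool of front-end service threads to track the request backlog.

    #include <stdio.h>

    /* Size the pool of service threads to the backlog: roughly one worker
     * per TARGET_BACKLOG queued requests, clamped to [MIN, MAX]. */
    static int resize_pool(int workers, int backlog)
    {
        const int TARGET_BACKLOG = 4, MIN_WORKERS = 1, MAX_WORKERS = 32;
        int want = backlog / TARGET_BACKLOG + 1;
        if (want < MIN_WORKERS) want = MIN_WORKERS;
        if (want > MAX_WORKERS) want = MAX_WORKERS;
        if (want != workers)
            printf("backlog %3d: pool %2d -> %2d workers\n", backlog, workers, want);
        return want;
    }

    int main(void)
    {
        int demand[] = { 2, 10, 40, 120, 60, 8, 0 };  /* simulated load curve */
        int workers = 1;
        for (int i = 0; i < 7; i++)
            workers = resize_pool(workers, demand[i]);
        return 0;
    }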

On the other hand
GLUnix
–offered much that was not available elsewhere
»interactive use, load balancing, (partial) transparency, ...
–straightforward master-slave architecture
–millions of jobs served, reasonable scalability, flexible partitioning
–crash-prone, inscrutable, unaware, ...
xFS
–very sophisticated cooperative caching + network RAID
–integrated at the vnode layer
–never robust enough for real use
Both are hard, outstanding problems

Lessons
The strength of clusters comes from
–complete, independent components
–incremental scalability (up and down)
–nodal isolation
Performance heterogeneity and change are fundamental
Subsystems and applications need to be reactive and self-tuning
Local intelligence + simple, flexible composition

Millennium
Campus-wide cluster of clusters
PC-based (Solaris/x86 and NT)
Distributed ownership and control
Computational science and Internet systems testbed
[Figure: campus sites (SIMS, C.S., E.E., M.E., BMRC, N.E., IEOR, C.E., MSME, NERSC, Transport, Business, Chemistry, Astro, Physics, Biology, Economy, Math) linked by Gigabit Ethernet]

Paranoid Construction
What must work for RSH, DCOM, RMI, read, ...?
A page of C to safely read a line from a socket!
=> carefully controlled set of cluster system operations
=> non-blocking, with timeouts and full error checking
–even if that requires a watcher thread
=> optimistic, with fail-over of the implementation
=> global capability at the physical level
=> indirection used for transparency must track the fault envelope, not just provide a mapping
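In that spirit, a compressed sketch of such a paranoid read (illustrative, not the actual page of C from the talk): every call is checked, interrupted calls are retried, and a select()-based timeout bounds how long a wedged peer can hold us up.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Read one '\n'-terminated line from fd into buf (size len), waiting at
     * most timeout_ms for each byte.  Returns the line length, 0 on EOF,
     * -1 on error or timeout.  Every call is checked. */
    int read_line_timeout(int fd, char *buf, size_t len, int timeout_ms)
    {
        size_t n = 0;
        while (n + 1 < len) {
            fd_set rfds;
            struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);

            int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
            if (rc < 0) {
                if (errno == EINTR) continue;   /* interrupted: retry */
                return -1;                      /* real error */
            }
            if (rc == 0) return -1;             /* timeout: peer may be wedged */

            char c;
            ssize_t r = read(fd, &c, 1);
            if (r < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            if (r == 0) return 0;               /* EOF: peer closed */

            buf[n++] = c;
            if (c == '\n') break;               /* got a full line */
        }
        buf[n] = '\0';
        return (int)n;
    }

    int main(void)
    {
        char line[256];
        int n = read_line_timeout(0, line, sizeof line, 2000);  /* stdin, 2 s */
        if (n > 0) printf("got: %s", line);
        else printf(n == 0 ? "eof\n" : "timeout or error\n");
        return 0;
    }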

Computational Economy Approach
The system has a supply of various resources
Demand for resources is revealed in price
–distinct from the cost of acquiring the resources
Each user has a unique assessment of value
A client agent negotiates for system resources on the user's behalf
–submits requests, receives bids, or participates in auctions
–selects the resources of highest value at least cost

Advantages of the Approach
Decentralized load balancing
–according to the user's perception of importance, not the system's
–adapts to system and workload changes
Creates incentives to adopt efficient modes of use
–maintain resources in usable form
–avoid excessive usage when resources are needed by others
–exploit under-utilized resources
–maximize flexibility (e.g., migratable, restartable applications)
Establishes user-to-user feedback on resource usage
–the basis for an exchange rate across resources
A powerful framework for system design
–natural for the client to be watchful, proactive, and wary
–generalizes from resources to services
A rich body of theory is ready for application

Resource Allocation
The traditional approach allocates requests to resources to optimize some system-wide utility function
–e.g., put work on the least loaded node, the most free memory, the shortest queue, ...
The economic approach views each user as having a distinct utility function
–e.g., two users can exchange resources and both be happy!
[Figure: an allocator mediating a stream of (incomplete) client requests against a stream of (partial, delayed, or incomplete) resource status information]

Pricing and all that
What's the value of a CPU-minute, a MB-second, a GB-day?
Many iterative market schemes
–raise the price until load drops
Auctions avoid setting a price
–Vickrey (second-price sealed-bid) auctions send resources to where they are most valued, at the lowest price
–it is in each bidder's self-interest to reveal their true utility function!
Small problem: auctions are awkward for most real allocation problems
Big problem: people (and their surrogates) don't know what value to place on computation and storage!
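A toy Vickrey allocation for a single resource slot (the agents and bids are invented): the highest bidder wins but pays the second-highest bid, which is exactly why revealing one's true value is the best strategy.

    #include <stdio.h>

    typedef struct { const char *agent; double bid; } bid_t;

    /* Second-price sealed-bid: the highest bidder wins but pays the
     * second-highest bid.  Returns the winner's index. */
    int vickrey(const bid_t *bids, int n, double *price)
    {
        int win = 0, second = -1;
        for (int i = 1; i < n; i++) {
            if (bids[i].bid > bids[win].bid) { second = win; win = i; }
            else if (second < 0 || bids[i].bid > bids[second].bid) second = i;
        }
        *price = (second >= 0) ? bids[second].bid : 0.0;
        return win;
    }

    int main(void)
    {
        /* Client agents reveal their true value for a CPU-minute. */
        bid_t bids[] = { {"simulation", 0.40}, {"render", 0.90}, {"backup", 0.10} };
        double price;
        int w = vickrey(bids, 3, &price);
        printf("%s wins, pays %.2f\n", bids[w].agent, price);  /* render pays 0.40 */
        return 0;
    }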

Smart Clients
Adopt the NT view that "everything is two-tier, at least"
–the UI stays on the desktop and interacts with computation "in the cluster of clusters" via distributed objects
–a single-system image is provided by a wrapper
The client can provide complete functionality
–resource discovery, load balancing
–requesting remote execution service
Flexible applications will monitor availability and adapt
Higher-level services are a 3-tier optimization
–directory service, membership, parallel startup
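A sketch of that client-side logic (the hostnames and load figures are hypothetical): the client discovers candidate servers, picks the least loaded live one, and falls back to local execution when none respond.

    #include <stdio.h>

    typedef struct { const char *host; double load; int alive; } server_t;

    /* Pick the least-loaded live server; -1 means run locally. */
    int choose_server(const server_t *s, int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (s[i].alive && (best < 0 || s[i].load < s[best].load))
                best = i;
        return best;
    }

    int main(void)
    {
        server_t cluster[] = {
            { "node1.millennium", 0.8, 1 },
            { "node2.millennium", 0.2, 1 },
            { "node3.millennium", 0.1, 0 },   /* down: route around it */
        };
        int c = choose_server(cluster, 3);
        if (c < 0) printf("no live servers: falling back to local execution\n");
        else printf("submit job to %s (load %.1f)\n", cluster[c].host, cluster[c].load);
        return 0;
    }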

Everything is a service
Load balancing, brokering, replication, directories
=> these need to be cost-effective, or the client will fall back to "self-support"
–if they are cost-effective, competitors may arise
Useful applications should be packaged as services
–their value may be greater than the cost of the resources they consume

Conclusions
We've got the building blocks for very interesting clustered systems
–fast communication, authentication, directories, distributed object models
Transparency and uniform access are convenient, but...
It is time to focus on exploiting the new characteristics of these systems in novel ways
We need to get really serious about availability
Agility (wary, reactive, adaptive) is fundamental
Gronky "F77 + MPI and no I/O" codes will seriously hold us back
We need to provide a better framework for cluster applications