SimMillennium Systems Requirements and Challenges
David E. Culler
Computer Science Division, U.C. Berkeley
NSF Site Visit, March 2, 1998

Research Issues
Bottom-up node design
Cluster network, API, and programming model
Inter-cluster network
Remote execution
Foundations of a computational economy
Design on the crest of technology transformation
Design for scale

Node Design for a Large Cluster
Classic architecture problem “in the large”
Basic node has several degrees of freedom
  – processors per node (4, 2, 1)
  – memory capacity
  – PCI busses
  – disks
  – space, volume
  – power
Cost is well-defined (Intel)
Workload is defined by real applications
Design against technology change
  – Quad PPro, Dual P II, P II, … Merced
  – the processor is predictable; system aspects are more difficult

Cluster Design
Adds further degrees of freedom
  – network
  – network interfaces
Given a fixed budget, what is the best partitioning of resources between group clusters and the campus cluster?
  – spectrum of workloads
  – advancing application experience
  – effectiveness of sharing
  – technology
The infrastructure is itself a research question.

Cluster Interconnect Design
Proposed design based on Myrinet
  – 16+8 port switch in a fat-tree variant
  – today offers the best latency, bandwidth, simplicity, flexibility, and cost
    » source-based packet routing, open to the metal
  – link-by-link flow control with cut-through routing
  – almost reliable
System Area Network (SAN) revolution
  – Tandem/Compaq ServerNet
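
Source-based routing means the sender computes the whole path and prepends it to the packet; each switch consumes the leading route entry to pick an output port and forwards the remainder, so switches need no routing tables. The sketch below models only that idea. The two-switch topology, port numbers, and the `Packet`/`Switch`/`Host` names are hypothetical, and this is not Myrinet's actual packet format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Packet:
    route: list[int]  # output port to take at each successive switch
    payload: bytes


class Host:
    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[bytes] = []

    def forward(self, pkt: Packet) -> str:
        # A packet that reaches a host should have consumed its whole route.
        if pkt.route:
            raise ValueError("route is longer than the path")
        self.received.append(pkt.payload)
        return self.name


class Switch:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ports: dict[int, object] = {}  # port number -> Switch or Host

    def connect(self, port: int, neighbor: object) -> None:
        self.ports[port] = neighbor

    def forward(self, pkt: Packet) -> str:
        # Strip the leading route entry and use it as the output port.
        port, rest = pkt.route[0], pkt.route[1:]
        return self.ports[port].forward(Packet(rest, pkt.payload))


# Usage: host a and switch s2 hang off s1; host b hangs off s2.
s1, s2 = Switch("s1"), Switch("s2")
a, b = Host("a"), Host("b")
s1.connect(0, a)
s1.connect(5, s2)
s2.connect(3, b)
dest = s1.forward(Packet([5, 3], b"hello"))  # port 5 at s1, then port 3 at s2
```

The sender must know the topology to build `[5, 3]`; in exchange, each switch does a trivial, table-free lookup, which suits cut-through forwarding.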

Communication Interface Revolution
Low-overhead communication “happens”
Academic research put it on the map
  – Active Messages (AM), FM, PM, …, U-Net
  – memory messaging (Get/Put, Reflective Memory, VMMC, Memory Channel)
Intel / Microsoft / Compaq recognized it
  – Virtual Interface Architecture 1.0 released 12/16/97
Apply UCB virtual networks to VIA
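
In the Active Messages model, each message names a handler that the receiver runs on arrival with the message's arguments, so data goes straight to where it is needed instead of being buffered and matched against receives. A minimal in-process sketch follows. `Endpoint`, `am_request`, and the integer handler indices are hypothetical names for illustration, not the Berkeley AM API.

```python
from collections import deque
from typing import Callable


class Endpoint:
    """Hypothetical endpoint: a handler table plus a queue of arrived messages."""

    def __init__(self) -> None:
        self.handlers: list = []
        self.inbox: deque = deque()

    def register(self, fn: Callable[..., None]) -> int:
        # The returned index is what senders place in their messages.
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def poll(self) -> int:
        """Run the named handler for every pending message; return how many ran."""
        n = 0
        while self.inbox:
            idx, args = self.inbox.popleft()
            self.handlers[idx](self, *args)
            n += 1
        return n


def am_request(dest: Endpoint, handler_idx: int, *args) -> None:
    # Stand-in for the network: deliver (handler index, arguments) to dest.
    dest.inbox.append((handler_idx, args))


# Usage: a remote "put" implemented as a handler that writes into memory.
memory: dict = {}

def put_handler(ep: Endpoint, addr: int, value: int) -> None:
    memory[addr] = value

node = Endpoint()
PUT = node.register(put_handler)
am_request(node, PUT, 0x10, 42)
am_request(node, PUT, 0x18, 7)
handled = node.poll()
```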

Multiprotocol Communication
Hardware has two fundamental protocols: shared memory access and network transaction
Communication between a data producer and a data consumer may involve either
At what level is this exposed?
  – Who must cope with it?
Uniform programming model
  – message passing (MPI)
    » multiprotocol run-time
  – shared address space
    » shared virtual memory
    » multiprotocol code generation
Hybrid programming model
  – MPI + threads = performance * complexity
[Figure: a data producer and a data consumer linked either by a shared memory access or by a network transaction]
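
A multiprotocol run-time keeps the uniform model by hiding the choice of mechanism behind one send call: if the destination process is on the same SMP node, the message travels through shared memory; otherwise it becomes a network transaction. The sketch assumes a hypothetical static map from process rank to node, and uses plain queues to stand in for both transports.

```python
from __future__ import annotations

from collections import deque


class MultiprotocolRuntime:
    """Hypothetical run-time: one send() call, two underlying transports."""

    def __init__(self, node_of: dict[int, int]) -> None:
        self.node_of = node_of                               # rank -> SMP node id
        self.shm_queues = {r: deque() for r in node_of}      # intra-node path
        self.nic_queues = {r: deque() for r in node_of}      # inter-node path
        self.stats = {"shm": 0, "net": 0}

    def send(self, src: int, dst: int, msg) -> str:
        if self.node_of[src] == self.node_of[dst]:
            self.shm_queues[dst].append((src, msg))  # shared memory access
            path = "shm"
        else:
            self.nic_queues[dst].append((src, msg))  # network transaction
            path = "net"
        self.stats[path] += 1
        return path

    def recv_all(self, rank: int) -> list:
        # The receiver drains both paths; the application sees one stream.
        out = []
        for q in (self.shm_queues[rank], self.nic_queues[rank]):
            while q:
                out.append(q.popleft())
        return out


# Usage: ranks 0 and 1 share node 0; rank 2 is on node 1.
rt = MultiprotocolRuntime({0: 0, 1: 0, 2: 1})
p1 = rt.send(0, 1, "a")
p2 = rt.send(0, 2, "b")
p3 = rt.send(2, 1, "c")
inbox_1 = rt.recv_all(1)
```

This sketch does not preserve ordering across the two paths; a real run-time that promises MPI-style ordering between a pair of ranks would have to, which is part of why the slide argues applications should not see this split.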

Example: Multiprotocol AM
Careful shared-memory programming to get bandwidth within an SMP
  – cache alignment, special copy routine
Novel concurrent access algorithm for the shared message queue object
  – lock-free techniques borrowed from the non-blocking literature
  – depends on the synchronization operations of the instruction set and on system timing
Attention to the network protocol impacts the memory protocol
  – adaptive fractional polling
Applications should not be exposed to this
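
As a much-simplified illustration of lock-free queueing (not the algorithm from this work, which handled multiple senders and depended on specific instruction-set synchronization), here is the single-producer/single-consumer ring discipline: the producer is the only writer of `tail` and the consumer is the only writer of `head`, so neither side needs a lock. This Python sketch assumes CPython's GIL makes each store visible in program order; a C version would need release/acquire atomics.

```python
import threading


class SPSCRing:
    """Lock-free single-producer/single-consumer ring buffer (sketch).

    Only the producer writes `tail`; only the consumer writes `head`.
    One slot is kept empty so that head == tail always means "empty".
    """

    def __init__(self, capacity: int) -> None:
        self.buf = [None] * (capacity + 1)
        self.head = 0  # next slot to read (written only by the consumer)
        self.tail = 0  # next slot to write (written only by the producer)

    def try_enqueue(self, item) -> bool:
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:        # full: would overrun the consumer
            return False
        self.buf[self.tail] = item  # fill the slot first...
        self.tail = nxt             # ...then publish it to the consumer
        return True

    def try_dequeue(self):
        """Return (True, item), or (False, None) if the ring is empty."""
        if self.head == self.tail:
            return False, None
        item = self.buf[self.head]
        self.buf[self.head] = None  # drop the reference before freeing the slot
        self.head = (self.head + 1) % len(self.buf)
        return True, item


# Usage: one producer thread and one consumer thread, no locks.
N = 2000
ring = SPSCRing(capacity=64)
received = []

def producer() -> None:
    for i in range(N):
        while not ring.try_enqueue(i):
            pass  # spin until the consumer frees a slot

def consumer() -> None:
    while len(received) < N:
        ok, item = ring.try_dequeue()
        if ok:
            received.append(item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```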

Inter-Cluster Networking
Gigabit Ethernet: what was the question?
  – ATM, Fibre Channel, HIPPI, Serial HIPPI, HIPPI-6400, SCI, P1394, … fading fast
  – standard due in April
Not the Ethernet you remember
  – switched, full duplex
  – multiframe bursts
  – broadcast, multicast trees
  – level 3 switching
  – flow control
  – QoS support
Network interfaces
  – vastly simpler and more flexible (already 2nd generation)
Switches are clean and fast
Clearly the storage and video transport
Is it also the cluster solution?
  – VIA/IP

Remote Execution
NOW lessons
  – the UNIX syscall / command interface does not virtualize well
    » interpositioning helps
  – global support is more error-prone than individual nodes
    » good design helps
    » watchdogs and fast restart help
  – explicit coordination tends to be very fragile
  – complex system interactions
  – no allocation policy pleases all
=> Need looser, more robust design techniques
Key developments
  – Smart Clients: decision making close to the user
  – Implicit Coordination: use naturally occurring events to schedule resources
  – Virtual Networks: fast communication with multiprogramming
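
“Watchdogs and fast restart” can be as simple as a supervisor that records each service's last heartbeat and restarts any service whose heartbeat is older than a timeout. The sketch below assumes hypothetical service names and a caller-supplied `restart` callback; the clock is injectable so the behavior can be checked deterministically.

```python
from __future__ import annotations

import time
from typing import Callable


class Watchdog:
    """Restart any service whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout: float, restart: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.restart = restart
        self.clock = clock
        self.last_beat: dict[str, float] = {}

    def heartbeat(self, service: str) -> None:
        self.last_beat[service] = self.clock()

    def check(self) -> list[str]:
        now = self.clock()
        stale = [s for s, t in self.last_beat.items() if now - t > self.timeout]
        for s in stale:
            self.restart(s)
            self.last_beat[s] = now  # fresh grace period after a restart
        return stale


# Usage with a fake clock (seconds) and hypothetical per-node daemons.
now = [0.0]
restarted: list[str] = []
wd = Watchdog(timeout=5.0, restart=restarted.append, clock=lambda: now[0])
wd.heartbeat("nodeA-daemon")
wd.heartbeat("nodeB-daemon")
now[0] = 3.0
wd.heartbeat("nodeA-daemon")  # only A keeps reporting
now[0] = 7.0
stale = wd.check()
```

Restarting quickly and locally, rather than coordinating a global recovery, is in the spirit of the slide's preference for looser, more robust designs.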

SimMillennium “Smart Client”
Adopt the NT view that “everything is two-tier, at least”
  – the UI stays on the desktop and interacts with computation “in the cluster” via distributed objects
  – single-system image provided by a wrapper
Client can provide complete functionality
  – resource discovery, load balancing
  – request remote execution service
Higher-level services: 3-tier optimization
  – directory service, membership, parallel startup
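
A smart client makes the placement decision itself: it obtains candidate nodes and their recently reported loads (here assumed to come from a hypothetical directory service as a name-to-load map), then requests remote execution on the least-loaded eligible node. The `exclude` set lets a wary client skip nodes that just failed and retry elsewhere. The load metric and random tie-break are assumptions of the sketch.

```python
from __future__ import annotations

import random
from typing import Optional


def choose_node(load_by_node: dict[str, float],
                exclude: frozenset = frozenset(),
                rng: Optional[random.Random] = None) -> str:
    """Pick the least-loaded node not in `exclude`, breaking ties at random.

    `load_by_node` maps node name -> recently reported load
    (e.g. run-queue length), as a directory lookup might return it.
    """
    rng = rng or random.Random()
    candidates = {n: load for n, load in load_by_node.items() if n not in exclude}
    if not candidates:
        raise RuntimeError("no eligible nodes")
    best = min(candidates.values())
    tied = sorted(n for n, load in candidates.items() if load == best)
    return rng.choice(tied)  # random tie-break spreads clients across equals


# Usage
loads = {"n1": 2.0, "n2": 0.5, "n3": 0.5, "n4": 3.0}
pick = choose_node(loads, rng=random.Random(0))
# Suppose both lightly loaded nodes refused the job; the client retries:
retry = choose_node(loads, exclude=frozenset({"n2", "n3"}))
```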

What about NT?
In many ways a better framework
  – COM -> DCOM -> cluster components
  – cleaner internal structure
  – better tools
  – Active Directory is a powerful tool
  – Wolfpack can be leveraged
Most of the basic problems are the same
The community is in transition
Cross-system support is moving very fast
  – Java Beans / DCOM
Strong support from both Sun and Microsoft

SimMillennium Resource Allocation
User behavior drives resource allocation
  – makes a series of requests and is reactive to load
  – interested in the “whole study”
Property rights establish “fair share”
  – each brings resources to the cluster
Price determined by competition for the resource
Incentive to adopt efficient modes of use
  – exploit under-utilized resources
  – maximize flexibility (e.g., migratable, restartable applications)
Natural for the client to be watchful, proactive, and wary
  – tends to stabilize load
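
One simple mechanism consistent with “price determined by competition” is proportional-share allocation: each user spends some currency on a resource, the resource is divided in proportion to spending, and the effective price per unit is total spending divided by capacity. This is a swapped-in illustration, not the policy SimMillennium adopted; user names, currency units, and capacity units are hypothetical.

```python
def proportional_share(bids: dict, capacity: float):
    """Split `capacity` (> 0) in proportion to each user's bid.

    Returns (allocation per user, price per unit of capacity).
    """
    total = sum(bids.values())
    if total <= 0:
        return {user: 0.0 for user in bids}, 0.0
    price = total / capacity  # more competing spending -> higher unit price
    alloc = {user: bid / price for user, bid in bids.items()}
    return alloc, price


# Usage: 8 CPUs, first with two users, then with a third competing user.
alloc_2, price_2 = proportional_share({"alice": 30.0, "bob": 10.0}, capacity=8.0)
alloc_3, price_3 = proportional_share(
    {"alice": 30.0, "bob": 10.0, "carol": 40.0}, capacity=8.0)
```

With two users the price is 5 per CPU (alice gets 6, bob 2); when carol joins, the price rises to 10 and everyone's share shrinks, which is the incentive to move work toward under-utilized resources.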

Primitives for a Computational Economy
Server side
  – monitoring of resource usage, enforcement of contracts
  – major challenge in Unix
    » build a parallel thread structure and interpose on calls
    » fundamentally the same machinery as for redirection
  – supposedly solved in NT 5.0
Client side
  – agents, protocols, UI
Bidding, negotiation, brokering (=> Varian)
  – RFQs and auctions have very different requirements
  – “lowest bid” is not well-defined; use “highest value”
Banking (=> Brewer)
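
As one concrete auction format (a swapped-in example, not necessarily what the project used), a sealed-bid second-price (Vickrey) auction awards a resource to the highest-value bidder but charges the second-highest bid, which makes bidding one's true value a dominant strategy. The job names and the `reserve` price are hypothetical.

```python
def vickrey_auction(bids: dict, reserve: float = 0.0):
    """Sealed-bid second-price auction for a single resource.

    Returns (winner, price), or (None, None) if no bid meets the reserve.
    Equal bids are broken by bidder name so the result is deterministic.
    """
    valid = sorted(((bid, user) for user, bid in bids.items() if bid >= reserve),
                   key=lambda t: (-t[0], t[1]))
    if not valid:
        return None, None
    _, winner = valid[0]
    # The winner pays the runner-up's bid, or the reserve if unopposed.
    price = valid[1][0] if len(valid) > 1 else reserve
    return winner, price


# Usage
result = vickrey_auction({"job-a": 12.0, "job-b": 20.0, "job-c": 15.0})
lone = vickrey_auction({"job-a": 8.0}, reserve=5.0)
none = vickrey_auction({"job-a": 3.0}, reserve=5.0)
```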

System Administration
Uniformity is key
Clusters evolve and are constantly changing
Administrative domains matter
=> create incentive to simplify administration
  – more uniform, higher value (=> Joseph)

Systems of Systems Design
It is about making things work at large scale
  – things change, things break, demands are extreme
Make all components wary, reactive, and self-tuning
Use implicit information whenever possible
User behavior is critical to closing the loop
  – when there is personal responsibility
SimMillennium is a good model of large-scale systems challenges
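
A wary, self-tuning component typically derives its timeouts from observed behavior rather than fixed constants. One well-known technique, borrowed here from TCP's retransmission timer (the Jacobson/Karels estimator as specified in RFC 6298), tracks a smoothed mean and mean deviation of response times and sets the timeout a few deviations above the mean. The gains 1/8 and 1/4 and the factor 4 come from RFC 6298; applying them to cluster requests, and the `min_timeout` floor, are assumptions of this sketch.

```python
from __future__ import annotations

from typing import Optional


class AdaptiveTimeout:
    """RFC 6298-style timeout estimator applied to request latencies."""

    ALPHA = 1 / 8  # gain for the smoothed mean
    BETA = 1 / 4   # gain for the mean deviation
    K = 4          # number of deviations of headroom

    def __init__(self, initial: float, min_timeout: float = 0.0) -> None:
        self.initial = initial  # used until the first sample arrives
        self.min_timeout = min_timeout
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def observe(self, sample: float) -> None:
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the deviation first, using the old mean, as RFC 6298 does.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample

    def timeout(self) -> float:
        if self.srtt is None:
            return self.initial
        return max(self.min_timeout, self.srtt + self.K * self.rttvar)


# Usage (milliseconds)
t = AdaptiveTimeout(initial=1000.0)
before = t.timeout()     # no samples yet: the initial value
t.observe(8.0)
after_one = t.timeout()  # 8 + 4 * 4
t.observe(16.0)
after_two = t.timeout()  # mean 9, deviation 5 -> 9 + 4 * 5
```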