Cluster Computing Overview CS241 Winter 01 © Armando Fox

Today's Outline
- Clustering: the Holy Grail
  - The Case for NOW
  - Clustering and Internet Services
  - Meeting the Cluster Challenges
- Cluster case studies
  - GLUnix
  - SNS/TACC
  - DDS?

Cluster Prehistory: Tandem NonStop
- Early (1974) foray into transparent fault tolerance through redundancy
  - Mirror everything (CPUs, storage, power supplies, ...); any single fault can be tolerated (later: processor duplexing)
  - "Hot standby" process-pair approach
  - What's the difference between high availability and fault tolerance?
- Noteworthy
  - "Shared nothing" -- why?
  - Performance and efficiency costs?
  - Later evolved into Tandem Himalaya, which used clustering for both higher performance and higher availability
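
The process-pair idea can be summarized in a few lines: the primary does the work and streams checkpoints to a hot standby, and the standby takes over from the last checkpoint when the primary goes silent. The sketch below is only illustrative -- the class, message format, and timeouts are invented, and the real NonStop did this with message-based checkpointing and hardware support.

```python
# Hypothetical sketch of a "hot standby" process pair (all names invented).
import queue, threading, time

class ProcessPair:
    def __init__(self):
        self.checkpoints = queue.Queue()   # primary -> backup state transfers
        self.last_heartbeat = time.time()

    def primary(self, n_ops=5, fail_after=3):
        state = 0
        for i in range(n_ops):
            if i == fail_after:            # simulate a crash of the primary
                return
            state += 1                     # do one unit of work...
            self.checkpoints.put(state)    # ...then checkpoint it to the backup
            time.sleep(0.1)

    def backup(self, silence_limit=0.5):
        state = 0
        while True:
            try:
                state = self.checkpoints.get(timeout=0.1)   # absorb checkpoints
                self.last_heartbeat = time.time()           # checkpoints double as heartbeats
            except queue.Empty:
                pass
            if time.time() - self.last_heartbeat > silence_limit:
                print(f"backup takes over from checkpointed state={state}")
                return

pair = ProcessPair()
threading.Thread(target=pair.primary, daemon=True).start()
pair.backup()   # prints: backup takes over from checkpointed state=3
```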

Pre-NOW Clustering in the 90's
- IBM Parallel Sysplex and DEC OpenVMS Cluster
  - Targeted at conservative (read: mainframe) customers
  - Shared disks allowed under both (why?)
  - All devices have cluster-wide names (shared everything?)
  - 1500 installations of Sysplex, 25,000 of OpenVMS Cluster
- Programming the clusters
  - All System/390 and/or VAX VMS subsystems were rewritten to be cluster-aware
  - OpenVMS: cluster support exists even in the single-node OS!
  - An advantage of locking into a proprietary interface

Networks of Workstations: Holy Grail
Use clusters of workstations instead of a supercomputer.
- The case for NOW
  - Difficult for custom designs to track technology trends (e.g. uniprocessor performance increases at 50%/yr, but design cycles are 2-4 yrs)
  - No economy of scale in runs of 100s => +$
  - Software incompatibility (OS & apps) => +$$$$
  - "Scale makes availability affordable" (Pfister)
  - "Systems of systems" can aggressively use off-the-shelf hardware and OS software
- New challenges ("the case against NOW"):
  - Performance and bug-tracking vs. a dedicated system
  - The underlying system is changing underneath you
  - The underlying system is poorly documented

Clusters: "Enhanced Standard Litany"
- Hardware redundancy
- Aggregate capacity
- Incremental scalability
- Absolute scalability
- Price/performance sweet spot
- Software engineering
- Partial failure management
- Incremental scalability
- System administration
- Heterogeneity

Clustering and Internet Services
- Aggregate capacity
  - TBs of disk storage, THz of compute power (if we can harness it in parallel!)
- Redundancy
  - Partial failure behavior: only a small fractional degradation from the loss of one node
  - Availability: the industry average across "large" sites during the 1998 holiday season was 97.2% (source: CyberAtlas)
  - Compare: mission-critical systems have "four nines" (99.99%)
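
To make the availability gap concrete, here is the downtime each figure implies over a year (simple arithmetic, not from the original slides):

```python
# Downtime per year implied by the availability numbers quoted above.
hours_per_year = 365 * 24

for availability in (0.972, 0.9999):
    downtime_hours = (1 - availability) * hours_per_year
    print(f"{availability:.2%} available => ~{downtime_hours:.1f} hours of downtime per year")

# 97.20% available => ~245.3 hours of downtime per year (more than ten days)
# 99.99% available => ~0.9 hours of downtime per year (under an hour)
```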

Clustering and Internet Workloads
- Internet vs. "traditional" workloads
  - e.g. database workloads (TPC benchmarks)
  - e.g. traditional scientific codes (matrix multiply, simulated annealing and related simulations, etc.)
- Some characteristic differences
  - Read-mostly
  - Quality of service (best-effort vs. guarantees)
  - Task granularity
- "Embarrassingly parallel"...why?
  - HTTP is stateless, with short-lived requests
  - The Web's architecture has already forced app designers to work around this! (not obvious in 1990)

Meeting the Cluster Challenges
- Software & programming models
- Partial failure and application semantics
- System administration
- Two case studies to contrast programming models
  - GLUnix goal: support "all" traditional Unix apps, providing a single system image
  - SNS/TACC goal: a simple programming model for Internet services (caching, transformation, etc.), with good robustness and easy administration

Software Challenges
- What is the programming model for clusters?
  - Explicit message passing (e.g. Active Messages)
  - RPC (but remember the problems that make RPC hard)
  - Shared memory/network RAM (e.g. the Yahoo! directory)
  - Traditional OOP with object migration ("network transparency"): not relevant for Internet workloads?
- The programming model should support decent failure semantics and exploit the inherent modularity of clusters
  - Traditional uniprocessor programming idioms/models don't seem to scale up to clusters
  - Question: is there a "natural to use" cluster model that scales down to uniprocessors, at least for Internet-like workloads?
  - Later in the quarter we'll take a shot at this
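
As a point of reference for the first bullet, explicit message passing makes both the communication and the failure visible to the programmer. The sketch below is a toy, not Active Messages: the port number, message format, and "service" are all invented.

```python
# Toy explicit-message-passing exchange between a client and one worker node.
import json, socket, threading, time

def worker(port=9100):
    """One cluster node: receive a request message, send back a reply message."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    request = json.loads(conn.recv(4096))
    conn.sendall(json.dumps({"result": sum(request["args"])}).encode())
    conn.close()

threading.Thread(target=worker, daemon=True).start()
time.sleep(0.2)                                     # let the worker start listening

client = socket.create_connection(("127.0.0.1", 9100), timeout=1.0)
client.sendall(json.dumps({"op": "add", "args": [1, 2, 3]}).encode())
print(json.loads(client.recv(4096)))                # {'result': 6}
# A dead worker surfaces here as a connection error or timeout: failure is
# explicit in this model, whereas RPC hides the messages behind a call that
# merely *looks* local.
```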

Partial Failure Management
- What does partial failure mean for...
  - a transactional database?
  - a read-only database striped across cluster nodes?
  - a compute-intensive shared service?
- What are appropriate "partial failure abstractions"?
  - Incomplete/imprecise results?
  - Longer latency?
- What current programming idioms make partial failure hard?
  - Hint: remember the original RPC papers?
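
One possible "incomplete results" abstraction, sketched below for the striped read-only database case: give every stripe a deadline, tolerate the stripes that fail or miss it, and report what fraction of the data the answer reflects. This is a preview of the harvest/yield discussion, not code from any of the papers; the stripe query and the failure are simulated.

```python
# Hypothetical partial-failure-tolerant query over a striped, read-only dataset.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def query_stripe(stripe_id, term):
    # Stand-in for a per-node search; pretend stripe 2's node has crashed.
    if stripe_id == 2:
        raise ConnectionError("node down")
    return [f"doc{stripe_id}-{term}"]

def partial_search(term, n_stripes=4, deadline_s=0.5):
    results, answered = [], 0
    with ThreadPoolExecutor(max_workers=n_stripes) as pool:
        futures = [pool.submit(query_stripe, i, term) for i in range(n_stripes)]
        for f in futures:
            try:
                results += f.result(timeout=deadline_s)
                answered += 1
            except (FutureTimeout, ConnectionError):
                pass                          # tolerate the partial failure
    harvest = answered / n_stripes            # fraction of the data behind the answer
    return results, harvest

hits, harvest = partial_search("cluster")
print(f"{len(hits)} hits, harvest={harvest:.0%}")   # 3 hits, harvest=75%
```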

System Administration on a Cluster
Thanks to Eric Anderson (1998) for some of this material.
- Total cost of ownership (TCO) is very high for clusters, largely due to administration costs
- Previous solutions
  - Pay someone to watch
  - Ignore it, or wait for someone to complain
  - "Shell Scripts From Hell" (not general => vast amounts of repeated work)
- Need an extensible and scalable way to automate the gathering, analysis, and presentation of data

System Administration, cont'd.
Extensible Scalable Monitoring for Clusters of Computers (Anderson & Patterson, UC Berkeley)
- Relational tables allow the properties & queries of interest to evolve as the cluster evolves
- Extensive visualization support allows humans to make sense of masses of data
- Multiple levels of caching decouple data collection from aggregation
- Data updates can be "pulled" on demand or triggered by push
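
The appeal of the relational-table approach is that a new administrative question is just a new query, not a new script. Below is a minimal sketch of that idea using SQLite as a stand-in; the schema, node names, and metrics are invented and are not the actual tables from Anderson & Patterson's system.

```python
# Minimal sketch: per-node samples land in one relational table; "properties of
# interest" are expressed as queries, so they can evolve with the cluster.
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (node TEXT, metric TEXT, value REAL, ts REAL)")

def collect(node, metrics):
    """A node daemon pushes (or a central poller pulls) one batch of samples."""
    now = time.time()
    db.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                   [(node, m, v, now) for m, v in metrics.items()])

collect("now23", {"load1": 3.7, "disk_free_gb": 1.2})
collect("now47", {"load1": 0.4, "disk_free_gb": 9.8})

# Today's question of interest, written as a query rather than a new shell script:
for node, load in db.execute(
        "SELECT node, value FROM samples WHERE metric = 'load1' AND value > 2.0"):
    print(f"{node} looks overloaded: load1={load}")
```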

Visualizing Data: Example
- Display aggregates of various interesting machine properties across the NOW machines
- Note the use of aggregation and color

Case Study: The Berkeley NOW
- History (and pictures) of an early research cluster
  - NOW-0: four HP-735's
  - NOW-1: 32 headless Sparc-10's and Sparc-20's
  - NOW-2: 100 UltraSparc 1's, Myrinet interconnect
  - inktomi.berkeley.edu: four Sparc-10's, later Ultras; 200 CPUs total
  - NOW-3: eight 4-way SMPs
- Myrinet interconnect
  - In addition to commodity switched Ethernet
  - Originally Sparc SBus, now available on PCI bus

The Adventures of NOW: Applications
- AlphaSort: 8.41 GB sorted in one minute on 95 UltraSparcs
  - Runner-up: Ordinal Systems' nSort on an SGI Origin (5 GB)
  - Pre-1997 record: 1.6 GB on an SGI Challenge
- 40-bit DES key cracked in 3.5 hours
  - "NOW+": headless plus some headed machines
- inktomi.berkeley.edu (now inktomi.com)
  - At the time, the fastest search engine with the largest aggregate capacity
- TranSend proxy & Top Gun Wingman Pilot browser
  - ~15,000 users on 3-10 machines

NOW: GLUnix
- Original goals:
  - High availability through redundancy
  - Load balancing, self-management
  - Binary compatibility
  - Both batch and parallel-job support
- I.e., a single system image for NOW users
  - Cluster abstractions == Unix abstractions
  - This is both good and bad... what's missing compared to the early-90's proprietary cluster systems?
- For portability and rapid development, build on top of an off-the-shelf OS (Solaris)

GLUnix Architecture
- The master collects load, status, etc. from the per-node daemons
  - Repository of cluster state; centralized resource allocation
  - Pros/cons of this approach?
- The Glib app library talks to the GLUnix master as the app's proxy
  - Signal catching, process management, I/O redirection, etc.
  - Death of a daemon is treated as a SIGKILL by the master
[Figure: one GLUnix master per cluster, with a glud daemon on every NOW node]
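
The centralized design is easy to caricature in code: daemons report load, the master keeps the only copy of the cluster state table, and anything that stops reporting is presumed dead. The sketch below is hypothetical (class name, message shape, and timeout are invented); the real GLUnix master and glud daemons spoke their own protocol.

```python
# Hypothetical sketch of a GLUnix-style centralized master.
import time

class GlunixMaster:
    def __init__(self, daemon_timeout_s=5.0):
        self.nodes = {}                     # node -> (reported load, last report time)
        self.timeout = daemon_timeout_s

    def report(self, node, load):
        """Called (conceptually over the network) by each node's glud daemon."""
        self.nodes[node] = (load, time.time())

    def alive_nodes(self):
        now = time.time()
        # A daemon that has gone silent is treated as dead -- and, per the slide,
        # its processes get the SIGKILL treatment.
        return {n: load for n, (load, ts) in self.nodes.items()
                if now - ts < self.timeout}

    def pick_node_for(self, command):
        candidates = self.alive_nodes()
        if not candidates:
            raise RuntimeError("no live nodes in the cluster")
        node = min(candidates, key=candidates.get)     # least-loaded live node
        print(f"a glurun-style tool would start {command!r} on {node}")
        return node

master = GlunixMaster()
master.report("now12", load=2.5)
master.report("now40", load=0.3)
master.pick_node_for("make -j")     # -> now40
```

The sketch also makes one of the "cons" visible: the master holds the only copy of the state, which foreshadows the retrospective's note that there was no redundancy for the GLUnix master itself.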

GLUnix Retrospective
- Trends that changed the assumptions
  - SMPs have replaced MPPs, and are tougher to compete with for MPP workloads
  - Kernels have become extensible
- Final features vs. initial goals
  - Tools: glurun, glumake (2nd most popular use of NOW!), glups/glukill, glustat, glureserve
  - Remote execution -- but not total transparency
  - Load balancing/distribution -- but not transparent migration/failover
  - Redundancy for high availability -- but not for the "GLUnix master" node
- Philosophy: did GLUnix ask the right question (for our purposes)?

TACC/SNS
- A specialized cluster runtime to host Web-like workloads
  - TACC: transformation, aggregation, caching and customization -- the elements of an Internet service
  - Build apps from composable modules, Unix-pipeline-style
- Goal: complete separation of *ility concerns from application logic
  - Legacy code encapsulation, multiple language support
  - Insulate programmers from nasty engineering
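
To illustrate the "composable modules, Unix-pipeline-style" point, here is a deliberately tiny sketch in which each TACC stage is a plain function and a service is just their composition. The worker bodies are invented; in the real system, SNS placed workers on cluster nodes and handled load balancing and restart, which is what lets application code stay this simple.

```python
# Toy TACC-style composition: Transformation, Aggregation, Customization.
def transform(page):                 # T: e.g. distill a page for a weak client
    return {"text": page["html"].upper()}

def aggregate(items):                # A: combine results from several sources
    return {"text": " | ".join(x["text"] for x in items)}

def customize(result, profile):      # C: apply per-user preferences
    return result["text"][: profile["max_len"]]

def service(pages, profile):
    """One request path, composed pipeline-style from the three workers."""
    return customize(aggregate([transform(p) for p in pages]), profile)

print(service([{"html": "hello"}, {"html": "world"}], {"max_len": 10}))  # HELLO | WO
```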

TACC Examples
- Simple search engine
  - Query the crawler's DB
  - Cache recent searches
  - Customize UI/presentation
- Simple transformation proxy
  - On-the-fly lossy compression of inline images (GIF, JPG, etc.)
  - Cache both the original and the transformed versions
  - User specifies aggressiveness, "refinement" UI, etc.
[Figure: dataflow for the two examples -- T, A, and C workers with caches ($), the crawler DB feeding the search path, and HTML output from the proxy path]
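
The transformation proxy's caching policy is worth spelling out: cache the original object once, and cache each transformed variant keyed by the user's chosen aggressiveness. The sketch below is hypothetical; fetch() and compress() are stand-ins for the HTTP fetch and the lossy GIF/JPEG transcoding the real TranSend workers performed.

```python
# Hypothetical caching transformation proxy: originals plus per-quality variants.
original_cache, transformed_cache = {}, {}

def fetch(url):
    return b"\x89" * 10_000              # pretend this is a 10 KB image from the origin

def compress(data, quality):
    keep = max(1, int(len(data) * quality / 100))
    return data[:keep]                   # stand-in for lossy transcoding

def proxy_request(url, quality=25):
    key = (url, quality)
    if key in transformed_cache:                          # hit on the transformed variant
        return transformed_cache[key]
    if url not in original_cache:
        original_cache[url] = fetch(url)                  # fill the original cache on miss
    transformed_cache[key] = compress(original_cache[url], quality)
    return transformed_cache[key]

img = proxy_request("http://example.com/logo.gif", quality=25)
print(len(img), "bytes after transformation")             # 2500 bytes instead of 10000
```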

Cluster-Based TACC Server
- Component replication for scaling and availability
- High-bandwidth, low-latency interconnect
- Incremental scaling with commodity PCs
[Figure: GUI front ends (FE), caches ($), workers (W, T, A), a user-profile database, the load balancing & fault tolerance module (LB/FT), and an administration interface, all attached to the cluster interconnect]

"Starfish" Availability: LB Death
- The FE detects the load balancer's death via a broken pipe/timeout and restarts the LB
- The new LB announces itself (multicast), is contacted by the workers, and gradually rebuilds its load tables
- If a partition heals, the extra LBs commit suicide
- FEs operate using cached LB info during the failure
[Figure: three animation steps over the cluster diagram (FE, caches, workers, interconnect) showing the LB/FT component dying and being replaced]

SNS Availability Mechanisms
- Soft state everywhere
  - Multicast-based announce/listen to refresh the state
  - Idea stolen from multicast routing in the Internet!
- Process peers watch each other
  - Because there is no hard state, "recovery" == "restart"
  - Because of the multicast level of indirection, no location directory for resources is needed
- Load balancing, hot updates, and migration are "easy"
  - Shoot down a worker, and it will recover
  - Upgrade == install the new software, shoot down the old
  - Mostly graceful degradation
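
The announce/listen discipline is simple enough to sketch: live components re-announce themselves periodically, listeners expire anything that goes silent, and there is no deregistration or recovery protocol at all. The registry class, names, and intervals below are invented; the real SNS beacons travelled over IP multicast.

```python
# Minimal soft-state registry built on periodic announcements and timeouts.
import time

ANNOUNCE_INTERVAL = 1.0            # how often a live component re-announces itself
EXPIRY = 3 * ANNOUNCE_INTERVAL     # a listener forgets anything silent this long

class SoftStateRegistry:
    def __init__(self):
        self.members = {}                          # name -> time of last announcement

    def on_announce(self, name):
        self.members[name] = time.time()           # refresh, or (re)learn after a restart

    def live(self):
        now = time.time()
        # There is no explicit "deregister" message: death is simply silence.
        self.members = {n: ts for n, ts in self.members.items()
                        if now - ts < EXPIRY}
        return sorted(self.members)

registry = SoftStateRegistry()
registry.on_announce("worker-T-1")
registry.on_announce("lb-1")
print(registry.live())      # ['lb-1', 'worker-T-1']
# If lb-1 crashes, it simply ages out; when its replacement restarts and
# announces itself, the registry repopulates -- "recovery" == "restart".
```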

SNS Availability Mechanisms, cont'd.
- Orthogonal mechanisms
  - Composition without interfaces
  - Example: Scalable Reliable Multicast (SRM) group state management composed with SNS
  - Eliminates the O(n^2) complexity of composing modules pairwise
  - The state space of the failure mechanisms is easy to reason about
- What's the cost?
- More on orthogonal mechanisms later

Administering SNS
- Multicast means the monitor can run anywhere on the cluster
- Extensible via self-describing data structures and mobile code in Tcl

Clusters Summary
- Many approaches to clustering, software transparency, and failure semantics
  - An end-to-end problem that is often application-specific
  - We'll see this again at the application level in the harvest vs. yield discussion
- Internet workloads are a particularly good match for clusters
  - What software support is needed to mate these two things?
  - What new abstractions do we want for writing failure-tolerant applications in light of these techniques?