Approaches to Clustering CS444I Internet Services Winter 00 © 1999-2000 Armando Fox

© 1999, Armando Fox

Outline
- Non-cluster approaches to bigness
- Approaches to clustering
- Cluster case studies
  - Berkeley NOW/GLUnix
  - SNS/TACC
  - Microsoft Wolfpack

Approaches to Bigness
- One Big Mongo Server
- DNS Round Robin
- Magic Routers (a/k/a L4/L5 load balancing)
- Application-Level Replication
- True Clustering (case studies)
  - NOW/GLUnix: single-system Unix image
  - Microsoft Wolfpack: virtualize every service
  - SNS/TACC: fixed Internet-service programming model

One Big Mongo Server
- Example: AltaVista
  - Scaling: what if you can't get a server with enough main memory?
  - Availability
  - Growth path and cost
- Advantages of one big mongo server?
  - Many agencies are now using their (old?) mainframes (e.g., IBM 390)
  - Putting a Web front end on legacy DBs/apps
- What if the application is (say) I/O bound?

DNS Round Robin
- Benefits
  - Software-transparent all the way to the network level
  - Expand the farm by updating DNS servers
- Costs
  - Coarse grain
  - Ad hoc
  - Effect of node failure
- Some apps can't easily be replicated
  - Databases
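The round-robin mechanism can be sketched as follows. This is a toy model, not real DNS: the addresses are hypothetical, and in practice the DNS server rotates the A-record list on each query while clients simply take the first answer. The coarse grain and node-failure problems fall out of resolver caching, noted in the comments.

```python
# Hypothetical A records a round-robin DNS server would return for one
# service name; each address is one node of the server farm.
a_records = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def rotate(records):
    """Return the list rotated by one position, as an RR-DNS server
    does between successive queries."""
    return records[1:] + records[:1]

# Successive clients each take the first record of a rotated list, so
# load spreads across the farm. Coarse grain: a client's resolver caches
# its answer for the record's TTL, so a dead node keeps receiving its
# share of new connections until caches expire.
answers = []
recs = a_records
for _ in range(6):
    answers.append(recs[0])
    recs = rotate(recs)
```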

Approaches to True Clustering
- NOW/GLUnix: single Unix system image
- Microsoft Wolfpack: off-the-shelf support for commodity apps
- SNS/TACC: fixed Internet-service programming model

NOW: GLUnix
- Original goals:
  - High availability through redundancy
  - Load balancing, self-management
  - Binary compatibility
  - Both batch and parallel-job support
- I.e., a single system image for NOW users
  - Cluster abstractions == Unix abstractions
  - This is both good and bad... what's missing?
- For portability and rapid development, built on top of an off-the-shelf OS (Solaris)

GLUnix Architecture
- Master collects load, status, etc. info from daemons
  - Repository of cluster state; centralized resource allocation
  - Pros/cons of this approach?
- Glib app library talks to the GLUnix master as an app proxy
  - Signal catching, process management, I/O redirection, etc.
  - Death of a daemon is treated as a SIGKILL by the master

[Diagram: one GLUnix Master per cluster, with a glud daemon on each NOW node]
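A minimal sketch of the centralized state collection described above. The class and message shapes are invented for illustration, not the actual GLUnix protocol; the key ideas it shows are the master as the single repository of per-node load and the treatment of a silent daemon as a dead node.

```python
class Master:
    """Toy cluster master: single repository of per-node load,
    in the spirit of the GLUnix master (one per cluster)."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout     # seconds without a report => node dead
        self.state = {}            # node -> (load, last_report_time)

    def report(self, node, load, now):
        """A glud-like daemon periodically reports its node's load."""
        self.state[node] = (load, now)

    def least_loaded(self, now):
        """Centralized placement: pick the live node with the lowest
        load. A node whose daemon stops reporting is considered dead
        (cf. daemon death being treated as SIGKILL by the master)."""
        live = {n: l for n, (l, t) in self.state.items()
                if now - t < self.timeout}
        return min(live, key=live.get) if live else None

m = Master()
m.report("node1", 0.9, now=0.0)
m.report("node2", 0.2, now=0.0)
m.report("node3", 0.1, now=0.0)
# node3's daemon dies and stops reporting; the others keep reporting.
m.report("node1", 0.9, now=8.0)
m.report("node2", 0.2, now=8.0)
choice = m.least_loaded(now=10.0)   # node3 is stale, so node2 wins
```

The obvious con of this design is the one the slide asks about: the master is a single point of failure and a scalability bottleneck.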

GLUnix Retrospective
- Trends that changed the assumptions
  - SMPs have replaced MPPs, and are tougher to compete with
  - Kernels have become extensible
- Final features vs. initial goals
  - Tools: glurun, glumake (2nd most popular use of NOW!), glups/glukill, glustat, glureserve
  - Remote execution, but not total transparency
  - Load balancing/distribution, but not transparent migration/failover
  - Redundancy for high availability, but not for the GLUnix master node

GLUnix Interesting Problems
- Glumake and NFS "consistency"
- Support for benchmark-style batch jobs
  - Many instantiations, different parameters
  - Embarrassingly parallel
- Social considerations
  - User-initiated unnecessary (malicious?) restarts
  - Lack of migration: an obstacle to harnessing desktop idle cycles (why?)
- Philosophy: did GLUnix ask the right question?

Scalability Limits
- Centralized resource management
- TCP connections! (file descriptors)
- Interconnect latency and bandwidth (HW level)
  - Myrinet: ~10 µs latency, 640 Mbit/s throughput
  - Ethernet: ~400 µs latency, 100 Mbit/s throughput
  - ATM: ~600 µs latency, 78 Mbit/s throughput (ATM was the initial target of the NOW!)
- Thoughts about the interconnect
  - What's more important, latency or bandwidth?
  - Why else might we want a secondary interconnect?

Microsoft Wolfpack
- Goal: clustering support for a "commodity" OS & apps (NT)
  - Clustering DLLs
  - Limited support for existing applications
- Elements of a Wolfpack cluster
  - Cluster leader & quorum resource
  - Other cluster members
  - Failover managers
  - Virtualized services

Wolfpack Operation
- Cluster leader and quorum resource
  - The quorum (cluster configuration DB) defines the cluster
  - The quorum had better be robust/highly available!
  - Prevents the "split brain" problem resulting from partitioning
- Heartbeats are used to obtain membership info
- Services can be virtualized to run on one or more nodes while sharing a single network name
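A toy illustration of why a single quorum resource prevents split brain; the node names are hypothetical. After a partition, only the side that can reach the quorum resource may continue as "the" cluster, so two half-clusters can never both run the same virtualized services.

```python
def surviving_side(partitions, quorum_owner):
    """After a network partition, only the side holding the quorum
    resource keeps operating; every other side must stop. Because the
    quorum resource lives on exactly one side, at most one partition
    survives: no split brain."""
    survivors = [side for side in partitions if quorum_owner in side]
    assert len(survivors) <= 1, "quorum resource cannot be on two sides"
    return survivors[0] if survivors else None

# A 4-node cluster partitions 2/2; the quorum resource is reachable
# only from the side containing nodeA, so that side wins.
side = surviving_side([{"nodeA", "nodeB"}, {"nodeC", "nodeD"}], "nodeA")
```

This also makes the slide's warning concrete: if the quorum resource itself is unavailable, no side survives, which is why it had better be robust.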

Wolfpack: Failover
- Failover managers negotiate among themselves to determine when/where/whether to restart a failed service
- Degenerate case: can restart legacy apps
  - Cluster-aware DLLs provided for writing your own apps
- No guarantees on integrity/consistency
- Pfister: "...a means of simply providing transactional semantics for data, without necessarily having to buy an entire relational database in the bargain, would make it significantly easier for applications to be highly available in a cluster."
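The restart decision can be sketched as a simple reassignment, with invented service and node names; real failover managers negotiate the new placement among themselves, which this single-function sketch elides. Note what the sketch deliberately lacks, per the slide: any integrity or consistency guarantee for the restarted service's data.

```python
def plan_failover(services, nodes_up, placement):
    """Toy failover plan: keep each service whose node is alive, and
    reassign each service whose node died to some live node (round
    robin). This is the 'degenerate case' of simply restarting a
    legacy app elsewhere: state integrity is the app's own problem."""
    live = sorted(nodes_up)
    new_placement = {}
    for i, svc in enumerate(sorted(services)):
        node = placement.get(svc)
        new_placement[svc] = node if node in nodes_up else live[i % len(live)]
    return new_placement

placement = {"web": "n1", "db": "n2", "mail": "n2"}
# n2 fails; its services are restarted on the surviving nodes.
after = plan_failover({"web", "db", "mail"}, {"n1", "n3"}, placement)
```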

TACC/SNS
- Specialized cluster runtime to host Web-like workloads
  - TACC: transformation, aggregation, caching and customization, the elements of an Internet service
  - Build apps from composable modules, Unix-pipeline-style
- Goal: complete separation of *ility concerns from application logic
  - Legacy code encapsulation, multiple-language support
  - Insulate programmers from nasty engineering
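The Unix-pipeline-style composition can be sketched as function composition over the data flowing through the service. The worker bodies here are hypothetical stand-ins; the point is only that each module sees data in and data out, with all *ility concerns (replication, restart, load balancing) living outside the modules.

```python
def compose(*workers):
    """Unix-pipeline-style composition of TACC-like workers: each
    worker is a function on the data flowing through, and the runtime
    chains them. Workers stay oblivious to clustering concerns."""
    def pipeline(data):
        for worker in workers:
            data = worker(data)
        return data
    return pipeline

# Hypothetical workers: aggregate several sources, transform the
# result, then customize it for one user.
aggregate = lambda urls: " ".join(urls)            # A: combine inputs
transform = lambda text: text.upper()              # T: e.g., distillation
customize = lambda text: f"[user-pref] {text}"     # C: per-user presentation

service = compose(aggregate, transform, customize)
out = service(["a.html", "b.html"])
```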

TACC Examples
- HotBot search engine
  - Query the crawler's DB
  - Cache recent searches
  - Customize UI/presentation
- TranSend transformation proxy
  - On-the-fly lossy compression of inline images (GIF, JPG, etc.)
  - Cache originals & transformed versions
  - User specifies aggressiveness, "refinement" UI, etc.

[Diagram: TACC worker/cache pipelines for the two examples, drawing on the crawler DB and HTML sources]

Cluster-Based TACC Server
- Component replication for scaling and availability
- High-bandwidth, low-latency interconnect
- Incremental scaling: commodity PCs

[Diagram: GUI front ends, caches, workers, user profile database, load balancing & fault tolerance manager, and administration interface, all connected by the interconnect]

"Starfish" Availability: LB Death
- FE detects the LB's death via broken pipe/timeout, restarts the LB
- The new LB announces itself (multicast), is contacted by workers, and gradually rebuilds its load tables
- If a partition heals, extra LBs commit suicide
- FEs operate using cached LB info during the failure

[Diagram, shown over three animation steps: the cluster with front end, caches, workers, and the LB/FT process dying and being restarted]

SNS Availability Mechanisms
- Soft state everywhere
  - Multicast-based announce/listen to refresh the state
  - Idea stolen from multicast routing in the Internet!
- Process peers watch each other
  - Because there is no hard state, "recovery" == "restart"
  - Because of the multicast level of indirection, no location directory for resources is needed
- Load balancing, hot updates, and migration are "easy"
  - Shoot down a worker, and it will recover
  - Upgrade == install new software, shoot down the old
  - Mostly graceful degradation
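The announce/listen refresh discipline can be sketched as a table of timestamps with TTL expiry; the class and names are invented for illustration. The crucial property is that nothing ever needs to be explicitly deleted or repaired: state that is not refreshed simply ages out, so crash recovery really is just restart-and-reannounce.

```python
class SoftStateTable:
    """Soft state refreshed by periodic (multicast) announcements.
    Entries not re-announced within `ttl` seconds expire on their own,
    so 'recovery' after a crash is just restarting the announcer."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}   # worker -> time of last announcement

    def announce(self, worker, now):
        """Called on every beacon heard on the announce/listen channel."""
        self.entries[worker] = now

    def live_workers(self, now):
        """Workers whose last beacon is fresher than the TTL."""
        return {w for w, t in self.entries.items() if now - t < self.ttl}

table = SoftStateTable(ttl=3.0)
table.announce("worker-a", now=0.0)
table.announce("worker-b", now=0.0)
table.announce("worker-a", now=2.0)   # worker-a refreshes; worker-b died
live = table.live_workers(now=4.0)    # worker-b has silently aged out
```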

SNS Availability Mechanisms, cont'd.
- Orthogonal mechanisms
  - Composition without interfaces
  - Example: Scalable Reliable Multicast (SRM) group state management with SNS
  - Eliminates the O(n²) complexity of composing modules
  - The state space of failure mechanisms is easy to reason about
- What's the cost?
- More on orthogonal mechanisms later

Administering SNS
- Multicast means the monitor can run anywhere on the cluster
- Extensible via self-describing data structures and mobile code in Tcl

Comparing SNS & Wolfpack
- Somewhat different targets
- Quorum Resource vs. Load Balancer/FT manager
  - But the latter is soft state, and the cluster can (temporarily) function without it
  - Better partition resilience
- Failover
  - Wolfpack's Failover Manager is slightly more flexible
  - Neither system itself provides any integrity/consistency guarantees
- Multicast heartbeats detect membership, failures, and locations of things

What We Really Learned From TACC
- Design for failure
  - It will fail anyway
  - The end-to-end argument applied to high availability
- Orthogonality is even better than layering
  - Narrow interface vs. no interface
  - A great way to manage system complexity
  - The price of orthogonality
  - Techniques: refreshable soft state; watchdogs/timeouts; sandboxing
- Software compatibility is hard, but valuable

Clusters Summary
- Many approaches to clustering, software transparency, failure semantics
  - An end-to-end problem that is often application-specific
  - We'll see this again at the application level in the harvest-vs.-yield discussion
- Internet workloads are a particularly good match for clusters
  - What software support is needed to mate these two things?
  - What new abstractions do we want for writing failure-tolerant applications in light of these techniques?
  - What about Pfister's comment about transactional semantics?