Approaches to Clustering
CS444I Internet Services, Winter 00
© 1999-2000 Armando Fox, fox@cs.stanford.edu
Outline
- Non-cluster approaches to bigness
- Approaches to clustering
- Cluster case studies
  - Berkeley NOW/GLUnix
  - SNS/TACC
  - Microsoft Wolfpack
Approaches to Bigness
- One Big Mongo Server
- DNS Round Robin
- Magic Routers (a/k/a L4/L5 load balancing)
- Application-Level Replication
- True Clustering (case studies)
  - NOW/GLUnix: single-system Unix image
  - Microsoft Wolfpack: virtualize every service
  - SNS/TACC: fixed Internet-service programming model
One Big Mongo Server
- Example: AltaVista
  - Scaling: what if you can't get a server with enough main memory?
  - Availability
  - Growth path and cost
- Advantages of one big mongo server?
  - Many agencies now using their (old?) mainframes (e.g., IBM 390)
  - Putting a Web front end on legacy DBs/apps
- What if the application is (say) I/O bound?
DNS Round Robin
- Benefits
  - Software-transparent all the way down to the network level
  - Expand the farm by updating the DNS servers (sketched below)
- Costs
  - Coarse grain
  - Ad hoc
  - Effect of node failure
- Some apps can't be easily replicated
  - Databases
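A minimal client-side sketch of the effect, assuming a hypothetical name "www.example.com" that resolves to several A records: each resolution hands back the whole list, and connections rotate across the servers in the farm.

```python
import itertools
import socket

def resolve_all(hostname, port=80):
    """Return every IPv4 address currently advertised for the name."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return [sockaddr[0] for *_, sockaddr in infos]

# Hypothetical service name: a round-robin DNS entry lists several A records.
addresses = resolve_all("www.example.com")

# Successive connections end up spread across whatever the DNS answer contained.
servers = itertools.cycle(addresses)
for _ in range(4):
    print("connect to", next(servers))
```

Note the coarse grain: a client that caches the answer keeps returning to the same node, and a dead node stays in the rotation until the DNS entry is updated.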
Approaches to True Clustering
- NOW/GLUnix: single Unix system image
- Microsoft Wolfpack: off-the-shelf support for commodity apps
- SNS/TACC: fixed Internet-service programming model
NOW: GLUnix
- Original goals:
  - High availability through redundancy
  - Load balancing, self-management
  - Binary compatibility
  - Both batch and parallel-job support
- I.e., a single system image for NOW users
  - Cluster abstractions == Unix abstractions
  - This is both good and bad... what's missing?
- For portability and rapid development, build on top of an off-the-shelf OS (Solaris)
GLUnix Architecture
- Master collects load, status, etc. info from the per-node daemons (sketched below)
  - Repository of cluster state, centralized resource allocation
  - Pros/cons of this approach?
- The Glib app library talks to the GLUnix master as the app's proxy
  - Signal catching, process management, I/O redirection, etc.
  - Death of a daemon is treated as a SIGKILL by the master
[Diagram: one GLUnix master per cluster, talking to a glud daemon on each NOW node]
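An illustrative sketch (not the actual GLUnix protocol or code) of the centralized-state idea: each node's daemon periodically reports its load to the master, the master allocates work from that picture of the cluster, and a daemon whose reports stop is treated as dead.

```python
import time

REPORT_INTERVAL = 2.0             # illustrative: seconds between daemon load reports
DEAD_AFTER = 3 * REPORT_INTERVAL  # illustrative: silence this long == node is dead

class ClusterMaster:
    """Centralized repository of cluster state, in the spirit of the GLUnix master."""

    def __init__(self):
        self.state = {}           # node name -> (load average, time of last report)

    def report(self, node, load):
        """Called for each load/status report arriving from a glud-style daemon."""
        self.state[node] = (load, time.time())

    def least_loaded_node(self):
        """Centralized resource allocation: place new work on the least-loaded live node."""
        now = time.time()
        live = {n: load for n, (load, t) in self.state.items() if now - t < DEAD_AFTER}
        return min(live, key=live.get) if live else None

    def dead_nodes(self):
        """Nodes whose daemons have gone silent; their processes count as SIGKILLed."""
        now = time.time()
        return [n for n, (_, t) in self.state.items() if now - t >= DEAD_AFTER]
```

The single master keeps allocation simple, but as the retrospective below notes, it is itself not made redundant.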
GLUnix Retrospective
- Trends that changed the assumptions
  - SMPs have replaced MPPs, and are tougher to compete with
  - Kernels have become extensible
- Final features vs. initial goals
  - Tools: glurun, glumake (2nd most popular use of NOW!), glups/glukill, glustat, glureserve
  - Remote execution, but not total transparency
  - Load balancing/distribution, but not transparent migration/failover
  - Redundancy for high availability, but not for the GLUnix master node
GLUnix Interesting Problems
- glumake and NFS "consistency"
- Support for benchmark-style batch jobs
  - Many instantiations with different parameters
  - Embarrassingly parallel
- Social considerations
  - User-initiated unnecessary (malicious?) restarts
  - Lack of migration: an obstacle to harnessing desktop idle cycles (why?)
- Philosophy: did GLUnix ask the right question?
Scalability Limits
- Centralized resource management
- TCP connections! (file descriptors)
- Interconnect latency and bandwidth (HW level)
  - Myrinet: ~10 usec latency, 640 Mbits/s throughput
  - Ethernet: ~400 usec latency, 100 Mbits/s throughput
  - ATM: ~600 usec latency, 78 Mbits/s throughput (ATM was the initial target of the NOW!)
- Thoughts about the interconnect
  - What's more important, latency or bandwidth? (see the worked comparison below)
  - Why else might we want a secondary interconnect?
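To make the latency-vs-bandwidth question concrete, a back-of-the-envelope comparison using the slide's figures: time to move one message is roughly latency + size / bandwidth. The message sizes below are arbitrary illustrative choices.

```python
# Back-of-the-envelope: time per message = latency + message size / bandwidth.
# Interconnect numbers are the ones from the slide; message sizes are arbitrary.
interconnects = {
    "Myrinet":  (10e-6,  640e6),   # (latency in seconds, bandwidth in bits/sec)
    "Ethernet": (400e-6, 100e6),
    "ATM":      (600e-6,  78e6),
}

for size_bytes in (1_000, 1_000_000):      # a small RPC vs. a bulk transfer
    print(f"\n{size_bytes}-byte message:")
    for name, (latency, bandwidth) in interconnects.items():
        total = latency + (size_bytes * 8) / bandwidth
        print(f"  {name:9s} {total * 1e6:9.0f} usec")
```

For the 1 KB message the fixed latency dominates: Ethernet comes out roughly 20x slower than Myrinet even though the bandwidth gap is only about 6x. For the megabyte transfer, bandwidth dominates. Cluster control traffic tends to look like the former, which is one reason the low-latency interconnect matters.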
Microsoft Wolfpack
- Goal: clustering support for a "commodity" OS & apps (NT)
  - Clustering DLLs
  - Limited support for existing applications
- Elements of a Wolfpack cluster
  - Cluster leader & quorum resource
  - Other cluster members
  - Failover managers
  - Virtualized services
Wolfpack Operation
- Cluster leader and quorum resource
  - The quorum (cluster configuration DB) defines the cluster
  - The quorum had better be robust/highly available!
  - Prevents the "split brain" problem resulting from partitioning
- Heartbeats are used to obtain membership info (a sketch follows below)
- Services can be virtualized to run on one or more nodes while sharing a single network name
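A minimal sketch of heartbeat-driven membership with a split-brain guard. This is illustrative only: the timeout and the majority test are invented, and Wolfpack itself resolves split brain through ownership of the quorum resource rather than by counting votes.

```python
import time

HEARTBEAT_TIMEOUT = 5.0   # invented value: silence this long drops a node from membership

class Membership:
    """Track cluster membership from periodic heartbeat messages."""

    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.last_seen = {}               # node -> time of last heartbeat

    def heartbeat(self, node):
        """Called whenever a heartbeat arrives from a node."""
        self.last_seen[node] = time.time()

    def members(self):
        """Nodes whose heartbeats are recent enough to count as members."""
        now = time.time()
        return [n for n, t in self.last_seen.items() if now - t < HEARTBEAT_TIMEOUT]

    def safe_to_act(self):
        """Split-brain guard: only the side that can see a majority may run services.

        (Wolfpack uses the quorum resource for this decision; majority voting is
        just one generic way to get the same effect.)
        """
        return len(self.members()) > self.cluster_size // 2
```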
Wolfpack: Failover
- Failover managers negotiate among themselves to determine when/where/whether to restart a failed service
- Degenerate case: can restart legacy apps (a restart-only sketch follows below)
  - Cluster-aware DLLs are provided for writing your own apps
- No guarantees on integrity/consistency
  - Pfister: "...a means of simply providing transactional semantics for data, without necessarily having to buy an entire relational database in the bargain, would make it significantly easier for applications to be highly available in a cluster."
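The degenerate case amounts to a watchdog that restarts the process when it dies. A minimal sketch, with a hypothetical binary path and restart policy:

```python
import subprocess
import time

def keep_alive(cmd, max_restarts=5, backoff=2.0):
    """Degenerate failover for a legacy app: just restart it when it exits.

    Note what this does NOT give you: if the app left its data inconsistent,
    restarting won't repair it -- exactly the gap Pfister's comment points at.
    """
    restarts = 0
    while restarts <= max_restarts:
        proc = subprocess.Popen(cmd)
        proc.wait()                          # block until the service exits or crashes
        restarts += 1
        print(f"service exited (code {proc.returncode}); restart #{restarts}")
        time.sleep(backoff)                  # crude damping against crash loops

# Hypothetical legacy service binary:
# keep_alive(["/usr/local/bin/legacy-db-frontend"])
```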
TACC/SNS
- Specialized cluster runtime to host Web-like workloads
  - TACC: transformation, aggregation, caching and customization -- the elements of an Internet service
  - Build apps from composable modules, Unix-pipeline-style (see the sketch below)
- Goal: complete separation of *ility concerns from application logic
  - Legacy code encapsulation, multiple-language support
  - Insulate programmers from nasty engineering
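A sketch of the composition idea, with invented module names and signatures (this is not the TACC API): each request flows through small transformation, aggregation, caching, and customization stages the way data flows through a Unix pipeline.

```python
# Illustrative only: compose TACC-style modules the way a Unix pipeline composes
# programs. The module names and signatures are invented, not the actual TACC API.

cache = {}                                   # the caching 'C' in TACC

def transform(page):
    """T: per-object work, e.g. lossy image compression or HTML distillation."""
    return page[:1024]                       # stand-in for a real transformation

def aggregate(pages):
    """A: merge results from several back-end fetches into one response."""
    return b"\n".join(pages)

def customize(page, profile):
    """C(ustomization): tailor the result to the user's device or preferences."""
    return page.upper() if profile.get("shout") else page

def service(urls, profile, fetch):
    """One request = fetch | transform | aggregate | customize, with caching."""
    results = []
    for url in urls:
        if url not in cache:                 # reuse transformed objects when possible
            cache[url] = transform(fetch(url))
        results.append(cache[url])
    return customize(aggregate(results), profile)

# Example: service(["http://a/x.gif", "http://b/y.gif"], {"shout": False}, my_fetch)
```

The point of the fixed model is that the runtime, not the application, can then worry about where each stage runs, how many replicas exist, and what happens when one dies.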
TACC Examples
- HotBot search engine
  - Query the crawler's DB
  - Cache recent searches
  - Customize UI/presentation
- TranSend transformation proxy
  - On-the-fly lossy compression of inline images (GIF, JPG, etc.)
  - Cache originals & transformed versions
  - User specifies aggressiveness, "refinement" UI, etc.
[Diagram: request paths through caches ($), transformers (T), aggregators (A), and customization (C), backed by the DB and producing HTML]
Cluster-Based TACC Server
- Component replication for scaling and availability (a load-balancing dispatch sketch follows below)
- High-bandwidth, low-latency interconnect
- Incremental scaling: commodity PCs
[Diagram: GUI front ends (FE), caches ($), workers (W/T/A), a user-profile database, a load balancing & fault tolerance manager (LB/FT), and an administration interface, all attached to the interconnect]
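An illustrative dispatch policy for the LB/FT manager (the names and the pick-least-loaded rule are invented, not the SNS implementation): workers report their queue lengths, and front ends ask the manager where to send each task.

```python
import random

class LoadBalancer:
    """Illustrative LB/FT manager: front ends ask it which worker should get a task."""

    def __init__(self):
        self.load = {}                # worker id -> last reported queue length

    def report_load(self, worker, queue_length):
        """Workers periodically report how much work they have queued."""
        self.load[worker] = queue_length

    def pick_worker(self):
        """Send the next task to the least-loaded worker, breaking ties randomly."""
        if not self.load:
            return None
        least = min(self.load.values())
        candidates = [w for w, q in self.load.items() if q == least]
        return random.choice(candidates)
```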
"Starfish" Availability: LB Death
- FE detects the failure via broken pipe/timeout and restarts the LB
- The new LB announces itself (multicast), is contacted by the workers, and gradually rebuilds its load tables
- If a partition heals, extra LBs commit suicide
- FEs operate using cached LB info during the failure
[Diagram: the cluster from the previous slide with the LB/FT process failed]
SNS Availability Mechanisms
- Soft state everywhere (a sketch follows below)
  - Multicast-based announce/listen to refresh the state
  - Idea stolen from multicast routing in the Internet!
- Process peers watch each other
  - Because there is no hard state, "recovery" == "restart"
  - Because of the multicast level of indirection, no location directory for resources is needed
- Load balancing, hot updates, and migration are "easy"
  - Shoot down a worker, and it will recover
  - Upgrade == install new software, shoot down the old
  - Mostly graceful degradation
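A minimal sketch of announce/listen soft state (the intervals and names are invented for illustration): components periodically multicast beacons, listeners keep each entry only until its TTL expires, and a restarted component simply re-learns the picture from the next round of announcements.

```python
import time

ANNOUNCE_INTERVAL = 1.0                  # illustrative beacon period
STATE_TTL = 3 * ANNOUNCE_INTERVAL        # an entry vanishes if it isn't refreshed

class SoftStateRegistry:
    """Announce/listen-style soft state, in the spirit of SNS (illustrative only)."""

    def __init__(self):
        self.entries = {}                # component id -> (payload, expiry time)

    def on_announce(self, component, payload):
        """Listener side: every beacon refreshes (or re-creates) the entry."""
        self.entries[component] = (payload, time.time() + STATE_TTL)

    def live(self):
        """Only entries refreshed recently enough to trust; stale ones just expire."""
        now = time.time()
        self.entries = {c: v for c, v in self.entries.items() if v[1] > now}
        return {c: payload for c, (payload, _) in self.entries.items()}
```

Because nothing here needs to survive a crash, "recovery" really is just "restart and wait one announcement interval".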
SNS Availability Mechanisms, cont'd.
- Orthogonal mechanisms
  - Composition without interfaces
  - Example: Scalable Reliable Multicast (SRM) group state management with SNS
  - Eliminates the O(n^2) complexity of composing modules
  - The state space of the failure mechanisms is easy to reason about
- What's the cost?
- More on orthogonal mechanisms later
Administering SNS
- Multicast means the monitor can run anywhere on the cluster
- Extensible via self-describing data structures and mobile code in Tcl
Comparing SNS & Wolfpack
- Somewhat different targets
- Quorum Resource (Wolfpack) vs. Load Balancer/FT manager (SNS)
  - But the LB/FT manager is soft state, and the cluster can (temporarily) function without it
  - Better partition resilience
- Failover
  - The Wolfpack Failover Manager is slightly more flexible
  - Neither system itself provides any integrity/consistency guarantees
- Multicast heartbeats detect membership, failures, and the locations of things
What We Really Learned From TACC
- Design for failure
  - It will fail anyway
  - The end-to-end argument applied to high availability
- Orthogonality is even better than layering
  - Narrow interface vs. no interface
  - A great way to manage system complexity
  - The price of orthogonality
  - Techniques: refreshable soft state; watchdogs/timeouts; sandboxing
- Software compatibility is hard, but valuable
Clusters Summary
- Many approaches to clustering, software transparency, and failure semantics
  - An end-to-end problem that is often application-specific
  - We'll see this again at the application level in the harvest vs. yield discussion
- Internet workloads are a particularly good match for clusters
  - What software support is needed to mate these two things?
  - What new abstractions do we want for writing failure-tolerant applications in light of these techniques?
  - What about Pfister's comment about transactional semantics?