1 An Efficient Topology-Adaptive Membership Protocol for Large-Scale Cluster-Based Services
Jingyu Zhou*§, Lingkun Chu*, Tao Yang*§
* Ask Jeeves
§ University of California at Santa Barbara

2 Outline
- Background & motivation
- Membership protocol design
- Implementation
- Evaluation
- Related work
- Conclusion

3 Background
Large-scale 24x7 Internet services
- Thousands of machines connected by many level-2 and level-3 switches (e.g., 10,000 at Ask Jeeves)
- Multi-tiered architecture with data partitioning and replication
- Some machines are frequently unavailable due to failures, operational errors, and scheduled service updates

4 Network Topology in Service Clusters
Multiple hosting centers across the Internet
In a hosting center
- Thousands of nodes
- Many level-2 and level-3 switches
- Complex switch topology

5 Motivation
Membership protocol
- Yellow page directory – discovery of services and their attributes
- Server aliveness – quick fault detection
Challenges
- Efficiency
- Scalability
- Fast detection

6 Fast Failure Detection is Crucial
Online auction service, even with replication
- Failure of one replica: 7s - 12s
- Service unavailable: 10s - 13s

7 Communication Cost for Fast Detection
Communication requirement
- Propagate to all nodes
- Fast detection needs a higher packet rate
- High bandwidth
Higher hardware cost
More chances of failures

8 Design Requirements of Membership Protocol for Large-scale Clusters
- Efficient: bandwidth, # of packets
- Topology-adaptive: localize traffic within switches
- Scalable: scale to tens of thousands of nodes
- Fast failure detection and information propagation

9 Approaches
Centralized
- Easy to implement
- Single point of failure, not scalable, extra delay
Distributed
- All-to-all broadcast [Shen’01]: doesn’t scale well
- Gossip [Renesse’98]: probabilistic guarantee
- Ring: slow to handle multi-failures
These approaches don’t consider network topology

10 TAMP: Topology-Adaptive Membership Protocol
Topology-awareness
- Form a hierarchical tree according to network topology
Topology-adaptiveness
- Network changes: add/remove/move switches
- Service changes: add/remove/move nodes
- Exploit TTL field in IP packet
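To illustrate how the TTL field can bound the scope of a multicast group, the sketch below opens a UDP socket whose multicast packets carry a chosen TTL, using Python's standard `socket` module. The function name `open_group_sender` is hypothetical and not part of TAMP. The slides do not state which multicast addresses or TTL values the protocol actually uses.

```python
import socket

def open_group_sender(ttl):
    """Return a UDP socket whose outgoing multicast packets carry the given TTL.

    Each router decrements the TTL, so a packet sent with TTL 1 stays on the
    local subnet, while larger values let it cross more routers.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

if __name__ == "__main__":
    sender = open_group_sender(ttl=1)
    # Read the option back to confirm the scope that was set.
    print(sender.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL))
    sender.close()
```

A low TTL keeps heartbeat traffic behind the nearest switch, which is one plausible way to localize traffic as the design requirements ask.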

11 Hierarchical Tree Formation Algorithm
1. Form small multicast groups with low TTL values;
2. Each multicast group performs elections;
3. Group leaders form higher-level groups with larger TTL values;
4. Stop when the max. TTL value is reached; otherwise, go to Step 2.
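The four steps above can be sketched as a small offline simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The `hops(a, b)` function (number of routers between two nodes) is a hypothetical stand-in for the real network, a packet with TTL t is assumed to reach nodes fewer than t router hops away, reachability is assumed to be symmetric within a group, and the election simply picks the highest node ID.

```python
def form_tree(nodes, hops, max_ttl):
    """Return one list of groups per TTL level, from TTL 1 up to max_ttl.

    nodes: iterable of comparable node IDs.
    hops:  hypothetical function giving the router hops between two nodes.
    """
    levels = []
    candidates = sorted(nodes)
    for ttl in range(1, max_ttl + 1):
        groups, unassigned = [], list(candidates)
        while unassigned:
            seed = unassigned[0]
            # Step 1: everyone the seed can reach with this TTL shares a group.
            group = [n for n in unassigned if hops(seed, n) < ttl]
            unassigned = [n for n in unassigned if n not in group]
            groups.append(group)
        levels.append(groups)
        # Steps 2-3: each group elects a leader (assumed: highest ID),
        # and only the leaders take part in the next, wider level.
        candidates = sorted(max(g) for g in groups)
    # Step 4: the loop ends once the maximum TTL has been used.
    return levels

if __name__ == "__main__":
    # Assumed topology: 9 nodes, 3 per switch, 2 router hops between switches.
    hops = lambda a, b: 0 if a // 3 == b // 3 else 2
    for ttl, groups in enumerate(form_tree(range(9), hops, 3), start=1):
        print(ttl, groups)
```

With this assumed topology, TTL 1 yields three groups of three nodes, TTL 2 leaves each leader alone, and TTL 3 joins the three leaders into one top-level group.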

12 An Example
3 level-3 switches with 9 nodes

13 Node Joining Procedure
Purpose
- Find/elect a leader
- Exchange membership information
Process
1. Join a channel and listen;
2. If a leader exists, stop and bootstrap with the leader;
3. Otherwise, elect a leader (bully algorithm);
4. If the node is the leader, increase channel ID & TTL, go to 1.
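The joining process can be sketched as follows. This is a simplified, sequential simulation under stated assumptions. The `channels` dictionary stands in for multicast channels, `channel_key(node, level)` is a hypothetical function naming the channel a node can hear at a given level, and the election picks the highest ID among current listeners, which is the usual bully convention but is not confirmed by the slides.

```python
def join(node, channels, channel_key, max_level):
    """Walk a node up the hierarchy; return the (channel, leader) pairs it saw."""
    path = []
    for level in range(max_level + 1):
        key = channel_key(node, level)
        channel = channels.setdefault(key, {"leader": None, "members": set()})
        channel["members"].add(node)           # Step 1: join and listen.
        if channel["leader"] is None:          # Step 3: no leader, so elect one.
            channel["leader"] = max(channel["members"])
        path.append((key, channel["leader"]))
        if channel["leader"] != node:          # Step 2: bootstrap with the leader.
            break
        # Step 4: this node leads, so it continues on the next channel.
    return path

if __name__ == "__main__":
    # Assumed layout: 3 nodes per level-0 channel, one shared level-1 channel.
    key = lambda node, level: (0, node // 3) if level == 0 else (1, 0)
    channels = {}
    for node in range(9):
        print(node, join(node, channels, key, max_level=1))
```

Because nodes join one at a time here, each election is trivial and the first node on a channel becomes its leader. With concurrent joiners the election would have to compare several candidates.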

14 Properties of TAMP
Upward propagation guarantee
- A node is always aware of its leader
- Messages can always be propagated to nodes in the higher levels
Downward propagation guarantee
- A node at level i must be a leader at levels i-1, i-2, …, 0
- Messages can always be propagated to lower-level nodes
Eventual convergence
- The view of every node converges

15 Update protocol when cluster structure changes
- Heartbeat for failure detection
- When a leader receives an update, it multicasts the update up & down

16 Fault Tolerance Techniques
Leader failure: backup leader or election
Network partition failure
- Time out all nodes managed by a failed leader
- Hierarchical timeout: longer timeout for higher levels
Packet loss
- Leaders exchange deltas since last update
- Piggyback last three changes
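The packet-loss bullets can be illustrated with a small sketch in which every update packet also carries the previous changes, so a receiver that misses up to two consecutive packets still catches up from the next one. The class names, the sequence numbering, and the boolean "needs a delta exchange" signal are all assumptions made for illustration. The slides do not describe TAMP's actual message format.

```python
from collections import deque

class UpdateSender:
    """Numbers each change and piggybacks the most recent ones on every packet."""
    def __init__(self, piggyback=3):
        self.seq = 0
        self.recent = deque(maxlen=piggyback)

    def publish(self, change):
        self.seq += 1
        self.recent.append((self.seq, change))
        return list(self.recent)  # packet payload: the last three changes

class UpdateReceiver:
    """Applies changes in order; reports when a gap is too large to repair."""
    def __init__(self):
        self.last_seq = 0
        self.applied = []

    def receive(self, packet):
        for seq, change in packet:
            if seq == self.last_seq + 1:
                self.applied.append(change)
                self.last_seq = seq
        # False means changes were missed and a delta exchange is needed.
        return packet[-1][0] == self.last_seq

if __name__ == "__main__":
    sender, receiver = UpdateSender(), UpdateReceiver()
    packets = [sender.publish(f"change-{i}") for i in range(1, 5)]
    receiver.receive(packets[0])         # packets 2 and 3 are lost
    print(receiver.receive(packets[3]))  # packet 4 carries changes 2, 3 and 4
    print(receiver.applied)
```

When more than two packets in a row are lost, `receive` returns `False`, which is where a fallback such as the delta exchange between leaders would apply.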

17 Scalability Analysis
Protocols: all-to-all, gossip, and TAMP
Basic performance factors
- Failure detection time (T_fail_detect)
- View convergence time (T_converge)
- Communication cost in terms of bandwidth (B)

18 Scalability Analysis (Cont.)
Two metrics
- BDP = B * T_fail_detect; low failure detection time with low bandwidth is desired
- BCP = B * T_converge; low convergence time with low bandwidth is desired
Protocol     BDP             BCP
All-to-all   O(n^2)          O(n^2)
Gossip       O(n^2 log n)    O(n^2 log n)
TAMP         O(n)            O(n) + O(B * log_k n)
n: total # of nodes
k: each group size, a constant
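One way to see why a hierarchy of constant-size groups gives linear cost is to count heartbeat receptions per period. The sketch below is a back-of-the-envelope model, not the analysis from the talk. It assumes every node's heartbeat is received by every other member of its group, that each group of up to k members promotes one leader to the next level, and it ignores packet sizes and update traffic. Both function names are hypothetical.

```python
def all_to_all_receptions(n):
    """Every node hears every other node's heartbeat once per period."""
    return n * (n - 1)

def hierarchical_receptions(n, k):
    """Heartbeat receptions per period with groups of size k at every level."""
    assert k >= 2, "groups need at least two members to shrink the hierarchy"
    total, members = 0, n
    while members > 1:
        full_groups, rest = divmod(members, k)
        # Within each group, every member hears every other member.
        total += full_groups * k * (k - 1) + rest * max(rest - 1, 0)
        # One leader per group moves up to the next level.
        members = full_groups + (1 if rest else 0)
    return total

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, all_to_all_receptions(n), hierarchical_receptions(n, k=10))
```

Under these assumptions the hierarchical count stays below n * k, so it grows linearly in n for constant k, while the all-to-all count grows quadratically.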

19 Implementation
- Inside Neptune middleware [Shen’01] – programming and runtime support for building cluster-based Internet services
- Can be easily coupled into other clustering frameworks

20 Evaluation: Objectives & Settings
Metrics
- Bandwidth
- Failure detection time
- View convergence time
Hardware settings
- 100 dual PIII 1.4GHz nodes
- 2 switches connected by a Gigabit switch
Protocol-related settings
- Frequency: 1 packet/s
- A node is deemed dead after 5 consecutive losses
- Gossip mistake probability: 0.1%
- # of nodes: 20 – 100 in steps of 20
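The protocol settings above imply a simple timeout rule: with one heartbeat per second and five tolerated losses, a node is declared dead once more than five seconds pass without a heartbeat. The sketch below shows that rule in isolation. The `FailureDetector` class and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be followed without real clocks or sockets.

```python
class FailureDetector:
    """Marks a node dead after `max_losses` consecutive missed heartbeats."""
    def __init__(self, period=1.0, max_losses=5):
        self.timeout = period * max_losses
        self.last_seen = {}

    def heartbeat(self, node, now):
        # Record the arrival time of the latest heartbeat from this node.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # A node is dead once more than `timeout` seconds have elapsed
        # since its last heartbeat.
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)

if __name__ == "__main__":
    detector = FailureDetector()
    detector.heartbeat("a", now=0.0)
    detector.heartbeat("b", now=0.0)
    detector.heartbeat("b", now=3.0)      # "a" has gone silent
    print(detector.dead_nodes(now=4.5))   # nobody has missed 5 heartbeats yet
    print(detector.dead_nodes(now=5.5))   # "a" is now past the timeout
```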

21 Bandwidth Consumption
- All-to-All & Gossip: quadratic increase
- TAMP: close to linear

22 Failure Detection Time
- Gossip: log(N) increase
- All-to-All & TAMP: constant

23 View Convergence Time
- Gossip: log(N) increase
- All-to-All & TAMP: constant

24 Related Work
Membership & failure detection
- [Chandra’96], [Fetzer’99], [Fetzer’01], [Neiger’96], and [Stok’94]
Gossip-style protocols
- SCAMP, [Kempe’01], and [Renesse’98]
High-availability systems (e.g., HA-Linux, Linux Heartbeat)
Cluster-based network services
- TACC, Porcupine, Neptune, Ninja
Resource monitoring: Ganglia, NWS, MDS2

25 Contributions & Conclusions
- TAMP is a highly efficient and scalable protocol for giant clusters
- Exploits the TTL count in IP packets for a topology-adaptive design
- Verified through property analysis and experimentation
- Deployed at Ask Jeeves clusters with thousands of machines

26 Questions?
