1 A Framework for Highly Available Services Based on Group Communication
Alan Fekete (University of Sydney), Idit Keidar (MIT)
2 Highly Available Services
Availability through replication
Dynamic set of servers
  – For load-balancing, adding new servers
  – For fault-tolerance, when servers fail / detach
Clients connect to ‘abstract’ service
Preemptive migration: client can be migrated in on-going session / transaction
3 Inspired by
Highly available Video-on-Demand (VoD) [Anker, Keidar, Dolev, ICDCS 99]
  – Uses group communication
  – Server written in 2500 lines of C++ code, including all availability logic
Use of group communication not obvious
4 Framework
Servers store content units
  – Partially replicated among servers
  – Static: no updates to content
Client-service interaction in sessions
Service is stateful during session
  – Service stores changing context for client
Content served according to client context
5 Examples
VoD
  – Content unit = movie, partially replicated
  – Context: location in movie, transmission rate, ...
  – Movie frames sent to client depend on context
  – Client can use random access, which changes context
Courseware
Interactive queries
6 Design Goals
Client request leads to appropriate response
Availability in face of failures
  – Need replication (can be partial)
  – Need preemptive migration
Availability for varying number of clients
  – Need to vary number of servers
Simple clients, flexible service
  – Availability is servers’ responsibility
7 Possible Problems at Migration
Lost request (sent to dead server)
  – “Stale” context
  – Irrelevant responses
Lost response (sent by dying server)
Duplicate response
Study which failure patterns cause problems, and the costs of minimizing them
8 Our Solution: The Basics
Framework, not service
Configurable in several parameters
  – Support for different policies
Primary server assigned to session
Preemptive migration to backup server
Backups mirror session context
  – Context freshness at backup is configurable
  – 3 levels
9 Replication of Context Info: Three Levels of Freshness
Unit database per content unit
  – Replicated among all servers of content unit
  – Periodic updates, frequency configurable
Context reflecting all client requests
  – at primary and 1st level backups
Context reflecting server responses too
  – at primary
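The three freshness levels can be pictured as three copies of one session's context that fall behind at different rates. The sketch below is only an illustration: the class and field names are hypothetical, and it assumes that a periodic update copies the primary's current context into the unit database, which the slides do not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical per-client context; field names are illustrative only."""
    requests_applied: int = 0   # client requests reflected in this copy
    responses_sent: int = 0     # server responses reflected in this copy


@dataclass
class Replicas:
    primary: SessionContext = field(default_factory=SessionContext)
    first_level_backup: SessionContext = field(default_factory=SessionContext)
    unit_database: SessionContext = field(default_factory=SessionContext)

    def on_client_request(self):
        # Client requests are reflected at the primary and 1st level backups.
        self.primary.requests_applied += 1
        self.first_level_backup.requests_applied += 1

    def on_server_response(self):
        # Only the primary reflects the responses it has sent.
        self.primary.responses_sent += 1

    def periodic_update(self):
        # Assumption: the periodic update snapshots the primary's context.
        self.unit_database = SessionContext(
            self.primary.requests_applied, self.primary.responses_sent
        )
```

After a crash of the primary, a 1st level backup would know every request but not which responses already went out, while a server holding only the unit database would resume from the last periodic snapshot.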
10 Group Communication (GC)
Processes organized in groups, communication addressed to group
Groups are dynamic (join, leave, crash, ...)
Groups can partition (“partitionable GC”)
[Figure: Send(G) addressed to group G]
11 GC Service Interface
Multicast
  – Input send( group, message )
  – Output receive( message )
Membership
  – Input join( group )
  – Input leave( group )
  – Output view( group, members, id )
  – view id is increasing
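To make the input/output split concrete, here is a minimal in-process sketch of the interface. It is a toy, not the API of any real group communication toolkit: `ToyGroupCommunication` and `RecordingProcess` are hypothetical names, there is no network, no failure detection, and no partitioning. Inputs are methods a process calls on the service; outputs are callbacks the service invokes on each member.

```python
from collections import defaultdict


class ToyGroupCommunication:
    """In-process stand-in for a GC service; not a real library API."""

    def __init__(self):
        self.members = defaultdict(list)   # group name -> member list
        self.view_ids = defaultdict(int)   # group name -> last view id

    def join(self, group, process):
        self.members[group].append(process)
        self._install_view(group)

    def leave(self, group, process):
        self.members[group].remove(process)
        self._install_view(group)

    def send(self, group, message):
        # Multicast: every current member receives the message.
        for process in list(self.members[group]):
            process.receive(message)

    def _install_view(self, group):
        # Membership: the view id increases with every change.
        self.view_ids[group] += 1
        current = tuple(self.members[group])
        for process in current:
            process.view(group, current, self.view_ids[group])


class RecordingProcess:
    """Hypothetical process that logs the output events it is given."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def receive(self, message):
        self.events.append(("receive", message))

    def view(self, group, members, view_id):
        names = tuple(m.name for m in members)
        self.events.append(("view", group, names, view_id))
```

A process that joins later only sees views from its own join onward, and a process that leaves does not see the view that excludes it.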
12 Semantics: “Virtual Synchrony” [Birman et al. 87]
Group members that remain connected see events in same order
  – events: messages, views (totally ordered mcast)
Framework for “state-machine” replication with fault tolerance, local consistency
  – Connected members go through same states
  – New members get state transfer
Used to replicate the unit database
13 Highly-Available Service: Multicast Groups
[Figure: one Service Group; Content Groups labeled “Crouching”, “Gladiator”, and “Spy Kids”; three Session Groups]
14 Messages to Groups
[Figure labels: Content?, start, control, update]
15 Server Session Setup
When start-session arrives, use local unit database to choose primary and 1st level backups
Primary and 1st level backups join session group (thus creating it)
Primary sends session group name to client, serves content to client
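The choice of primary and backups has to come out the same at every server, since each one consults its own copy of the unit database. A minimal sketch of such a deterministic choice follows. The selection policy (least-loaded first, ties broken by server name), the shape of `unit_database`, and the session group naming scheme are all assumptions made for illustration; the slides do not specify them.

```python
def choose_servers(unit_database, content_unit, num_backups):
    """Pick a primary and 1st level backups for a new session.

    Assumed policy: least-loaded server first, ties broken by name, so that
    every server holding the same unit database makes the same choice.
    """
    load = unit_database[content_unit]           # server name -> session count
    ranked = sorted(load, key=lambda s: (load[s], s))
    return ranked[0], ranked[1:1 + num_backups]


def session_group_name(content_unit, client_id):
    # Hypothetical naming scheme for the session group.
    return f"session/{content_unit}/{client_id}"
```

For example, with loads `{"s1": 3, "s2": 1, "s3": 1}` and one backup, every server would select `s2` as primary and `s3` as its 1st level backup.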
16 Migration
Triggered by view in content group
State transfer to new members
Use local unit database to choose new primary and backups per client
  – Choose 1st level backup as primary if possible
  – By virtual synchrony, same decision made
Chosen primaries and backups join session group, primary sends content, ...
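The migration decision can be sketched as a pure function of the new view and the replicated session records, which is what lets every surviving server reach the same answer without extra coordination. The function below is an illustrative sketch only. `sessions` stands in for the per-client records of the local unit database, and the details are assumptions: a primary that is still in the view keeps its role, and when no 1st level backup survives, the first live server by name takes over.

```python
def migrate(sessions, view_members, num_backups):
    """Reassign primary and backups for every session after a new view.

    `sessions` maps client id -> (primary, [1st level backups]).
    `view_members` is the set of servers in the new content-group view.
    """
    alive = sorted(view_members)
    new_assignment = {}
    for client, (primary, backups) in sorted(sessions.items()):
        if not alive:
            # All replicas of the content unit are gone: no service possible.
            new_assignment[client] = (None, [])
            continue
        live_backups = [b for b in backups if b in view_members]
        if primary in view_members:
            new_primary = primary               # assumption: primary stays put
        elif live_backups:
            new_primary = live_backups.pop(0)   # prefer a 1st level backup
        else:
            new_primary = alive[0]              # assumed fallback rule
        spares = [s for s in alive
                  if s != new_primary and s not in live_backups]
        new_backups = (live_backups + spares)[:num_backups]
        new_assignment[client] = (new_primary, new_backups)
    return new_assignment
```

Because the inputs are identical at all members that install the same view, each server can run this locally and then join only the session groups in which it was chosen.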
17 Configurable Parameters
Replicas per content unit
1st level backups
Frequency of periodic updates
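The three parameters could be grouped into one validated configuration object, as in the sketch below. The class name, field names, units, and the validation rules are assumptions for illustration. In particular, the rule that the primary plus its 1st level backups must fit within the replicas of a content unit is inferred from the design, not stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkConfig:
    """Hypothetical container for the three configurable parameters."""
    replicas_per_content_unit: int
    first_level_backups: int
    update_period_seconds: float   # assumed unit for the update frequency

    def __post_init__(self):
        if self.replicas_per_content_unit < 1:
            raise ValueError("need at least one replica per content unit")
        # Assumption: primary and backups are drawn from the unit's replicas.
        if not 0 <= self.first_level_backups < self.replicas_per_content_unit:
            raise ValueError("backups must fit within the replicas")
        if self.update_period_seconds <= 0:
            raise ValueError("update period must be positive")
```

More backups and a shorter update period lower the risk of losing context in migration, at the cost of more load on the servers.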
18 Availability Analysis: Bad Scenarios
Membership service not live, or does not give servers consistent views
  – Consistent migration decisions not made
  – Can lead to no service or duplicate service
All content unit replicas crash
  – No service possible
Context lost in migration
  – Risk depends on configurable parameters
19 Conclusions
High availability by replication of content and context
Group communication facilitates context replication
Risk vs. load tradeoff
  – 3 levels of freshness
  – Configurable freshness