Fine-Grained Failover Using Connection Migration
  Alex C. Snoeren, David G. Andersen, Hari Balakrishnan
  MIT Laboratory for Computer Science
The Problem
  Servers fail. More often than users want to know…
  [Figure: Client, Content server]
Solution: Server Redundancy
  Use a healthy one at all times.
Failover Components
  1. Health Monitoring
  2. Server Selection
  3. Connection Resumption
Today’s Replication Technology
  DNS/Content Routing
    Wide-area replication
    Need client awareness
  Layer 4/Web Switches
    Transparent, possibly mid-stream failover
    Requires co-location
  [Figure: DNS, Web Switch]
Ideal Technology
  Wide area replication
    Yet somehow synchronize replica servers
  Transparent failover
    Enable other servers to continue connections
Migrate Architecture
  Stream Mapping
    Infer application state from transport layer information
  Connection Migration
    Transparently hand off sessions between servers
  [Figure: Stream Mapper]
Stream Mapping
  Client: GET /StreamingContent.mpg HTTP/1.1
  Server Response: HTTP OK, Content-Length: , Content-Type: video/mpeg
  Stream Map:
    Client | Object (URL) | Offset (TCP SeqNo)
    :4234 | /StreamingContent.mpg | 083346
  [Figure labels: TCP SeqNo, TCP ISS]
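The stream map idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the map records the TCP sequence number at which the object body begins, so that any later sequence number converts to a byte offset into the object. The names `StreamMapEntry`, `object_offset`, and `start_seq`, the client address, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass(frozen=True)
class StreamMapEntry:
    client: str       # client endpoint (hypothetical "host:port" format)
    object_url: str   # object requested by the client
    start_seq: int    # assumed: TCP seq no of the first byte of the object body


def object_offset(entry: StreamMapEntry, seq: int) -> int:
    """Convert a TCP sequence number into a byte offset within the object."""
    # Modular subtraction keeps the result correct across sequence wraparound.
    return (seq - entry.start_seq) % SEQ_SPACE


# Illustrative values only: 1460 bytes of the object have been acknowledged.
entry = StreamMapEntry("client.example:4234", "/StreamingContent.mpg", 5000)
resume_at = object_offset(entry, 5000 + 1460)
```

A server taking over the connection would, under these assumptions, resume sending the object from byte `resume_at`.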
Anatomy of Failover
  [Figure: Client, Support Group, Initial Connection, Migrated Connection]
Support Groups
  Set of partially mirrored servers
  All servers able to provide same content
  Can be topologically diverse
  Synchronize on per-connection basis
  Servers need not be complete mirrors
  Connections from a failed server can be handled by a different support server
  Connections may have distinct support groups
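To make the per-connection nature of support groups concrete, here is a small sketch. The slides do not say how a replacement server is chosen, so the first-healthy-in-order policy below is an assumption, and `support_groups`, `choose_failover_server`, and the server and connection names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Each connection has its own support group; groups may overlap or differ.
support_groups: Dict[str, List[str]] = {
    "conn-1": ["A", "B", "C"],
    "conn-2": ["A", "D"],
}


def choose_failover_server(
    conn_id: str, groups: Dict[str, List[str]], healthy: Set[str]
) -> Optional[str]:
    """Return the first healthy server in the connection's support group."""
    # Assumed policy: walk the group in order and take the first healthy member.
    for server in groups.get(conn_id, []):
        if server in healthy:
            return server
    return None  # no healthy support server can take this connection


# Server A has failed; each connection fails over within its own group.
healthy_now = {"B", "C", "D"}
new_server_1 = choose_failover_server("conn-1", support_groups, healthy_now)
new_server_2 = choose_failover_server("conn-2", support_groups, healthy_now)
```

Because the lookup is keyed by connection, two connections that shared the failed server can end up on different replacements.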
Soft State Synchronization
  Synchronize within support groups
  Periodic advertisements
  Advertise client application object requests
  Communicate initial transport layer state
  Only initial state need be communicated
  Current info inferred from transport layer
  Clients will reject redundant migrates from stale support servers
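Soft state as described above can be sketched as a table whose entries disappear unless periodic advertisements refresh them. This is an illustrative sketch under stated assumptions: the class `SoftStateTable`, its time-to-live parameter, and the fields stored per connection are hypothetical and not taken from the slides.

```python
from typing import Any, Dict, Optional, Tuple


class SoftStateTable:
    """Per-connection state that expires unless re-advertised in time."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl  # assumed lifetime of one advertisement, in seconds
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def advertise(self, conn_id: str, state: Any, now: float) -> None:
        # Each advertisement (re)installs the state and pushes out its expiry.
        self._entries[conn_id] = (state, now + self.ttl)

    def lookup(self, conn_id: str, now: float) -> Optional[Any]:
        item = self._entries.get(conn_id)
        if item is None or item[1] <= now:
            # Expired or unknown: drop it, as soft state is never torn down explicitly.
            self._entries.pop(conn_id, None)
            return None
        return item[0]


table = SoftStateTable(ttl=3.0)
# Only the initial state is advertised: the requested object and the initial sequence number.
initial = {"url": "/StreamingContent.mpg", "iss": 1000}
table.advertise("conn-1", initial, now=0.0)
```

Only the initial state is stored; a support server would combine it with transport-layer information to infer how far the transfer has progressed.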
TCP Connection Migration
  1. Initial SYN
  2. SYN/ACK
  3. ACK (with data)
  4. Normal data transfer
  5. Migrate SYN
  6. Migrate SYN/ACK
  7. ACK (with data)
  [Figure: client, server]
TCP Connection Migration
  [Figure: client, server, failover server. Packet labels, in order: ":546414(536) ack", "SYN :533525(0) ack", "current", "SYN :083521(0) (migrate T, R)", "stale"]
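The client-side rejection of stale migrates can be sketched as follows. The slides show a Migrate SYN carrying "(migrate T, R)" but do not spell out the acceptance rule, so this sketch assumes T is a connection token and R a request number that must strictly increase. The class `MigratableConnection` and its method are hypothetical.

```python
from typing import Optional


class MigratableConnection:
    """Client-side view of one connection that servers may migrate."""

    def __init__(self, token: str) -> None:
        self.token = token       # assumed: token T agreed at connection setup
        self.last_request = 0    # assumed: highest request number R accepted so far
        self.server: Optional[str] = None

    def handle_migrate_syn(self, server: str, token: str, request: int) -> bool:
        """Accept a Migrate SYN only if it is authentic and not stale."""
        if token != self.token:
            return False  # wrong token: not a valid migrate for this connection
        if request <= self.last_request:
            return False  # redundant or stale: this migration already happened
        self.last_request = request
        self.server = server
        return True


conn = MigratableConnection(token="T")
first = conn.handle_migrate_syn("failover-B", "T", request=1)   # accepted
second = conn.handle_migrate_syn("failover-C", "T", request=1)  # rejected as stale
```

Under this assumed rule, when several support servers race to take over the same connection, the client ends up attached to exactly one of them.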
Implementation
  [Figure labels: Server App, Client, Stream Mapping, Wedges, Software “Wedge”, Stream Mapping, Synchronization, Wedge]
Wedge Overhead
  [Figure: microseconds per request vs. request size (Kbytes); series: Wedge, Direct]
Experimental Topology
  Client initiates a transfer to A…
  then migrates to B…
  and back to A…
  [Figure labels: Linux/Apache 1.3; 128 Kb/s links]
Varying Oscillation Rates
  [Figure: goodput (bytes) vs. time (secs); series: No Oscillations, 2 sec, 5 sec, 10 sec, 12 sec]
Benefits & Limitations
  Enable wide area server replication
  Low server synchronization overhead
  Infer current state from transport layer
  Robust even under adverse loads
  Health monitors can be overly reactive
  Gracefully handle cascaded failures
  Leverages connection migration
  Requires modern transport stack
Software available on the web: Networks and Mobile Systems