PROVIDING AN SFX FAILOVER SYSTEM USING MYSQL REPLICATION
Anne L. Highsmith
Head of Consortia Systems
Texas A&M University
Failing over gracefully, or, What to do when your computer crashes
Whys and hows of a failover
- SFX is a critical application requiring as much public uptime as possible
- Decided to model SFX failover on our Voyager system
- What does it cost to run a failover? (double your server, double your fun, double your invoice?)
The name game
- Production server (killick.tamu.edu)
- Failover server (bonden.tamu.edu)
- Public name of service (linkresolver.tamu.edu)
- DB update via replication, from production to failover
Failing over our way 1
- The licence request contained:
  - Server names and IPs for the production and failover servers
  - Service name and IP for the “public name” of the service
- The /etc/hosts file on each server contains:
  - Server names and IPs for the production and failover servers
  - Service name and IP for the “public name” of the service
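As an illustration of the /etc/hosts contents described above, here is a minimal sketch that renders the three entries. The host names come from the slides; the IP addresses are placeholders from the RFC 5737 documentation range, and the short aliases are an assumption, since the slides give neither.

```python
# Placeholder addresses (RFC 5737 documentation range); the real IPs are not
# given in the slides. Short aliases are likewise an assumption.
HOSTS = [
    ("192.0.2.11", "killick.tamu.edu", "killick"),            # production server
    ("192.0.2.12", "bonden.tamu.edu", "bonden"),              # failover server
    ("192.0.2.10", "linkresolver.tamu.edu", "linkresolver"),  # public service name
]


def render_hosts(entries):
    """Return /etc/hosts lines: address, canonical name, short alias."""
    return "\n".join(f"{ip}\t{fqdn}\t{alias}" for ip, fqdn, alias in entries)


if __name__ == "__main__":
    # The same block would be placed in /etc/hosts on both servers.
    print(render_hosts(HOSTS))
```

Keeping the same three entries on both machines means each server can resolve its partner and the service name without depending on DNS.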
Failing over our way 2 – ifconfig
- Sysadmin uses ifconfig to configure the name for linkresolver.tamu.edu on the production server
- When switching between servers, the sysadmin uses ifconfig to take down the name on production and bring it up on failover
- Takes about 5 min.
- Avoids DNS reload
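The switch described above might look roughly like the following dry-run sketch, which only builds and prints the commands. It assumes Linux net-tools ifconfig alias syntax; the interface label, address, and netmask are hypothetical, since the slides name the tool but not its arguments.

```python
import shlex

# Hypothetical values: the slides do not give the interface, alias label,
# service address, or netmask.
ALIAS = "eth0:1"
SERVICE_IP = "192.0.2.10"
NETMASK = "255.255.255.0"


def alias_up(alias=ALIAS, ip=SERVICE_IP, netmask=NETMASK):
    """Command that brings the service address up on a host."""
    return ["ifconfig", alias, ip, "netmask", netmask, "up"]


def alias_down(alias=ALIAS):
    """Command that releases the service address on a host."""
    return ["ifconfig", alias, "down"]


def switch_plan(from_host, to_host):
    """Release the address on the old host before claiming it on the new one."""
    return [(from_host, alias_down()), (to_host, alias_up())]


if __name__ == "__main__":
    # Dry run only: print what would be run on each host, in order.
    for host, cmd in switch_plan("killick.tamu.edu", "bonden.tamu.edu"):
        print(f"[{host}] {shlex.join(cmd)}")
```

Taking the address down first avoids a window in which both servers answer for the same IP.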
Initial data load on failover
Production and failover setup
- Install vanilla SFX 4 on failover server
- Verify that the vanilla installation works, then take it down and remove /exlibris/sfx_ver/sfx_4[slot]
- Run a full cold backup on production
- Transfer cold backup files to failover server and unpack
MySQL replication setup & testing
- MySQL documentation: cation-howto.html
Special setup for replication (1)
- Run binary logging on production but not failover
- Set up a unique server id for both source and target
- Create a userid on the source server that the target server can use to query for updates
- DBA finishes MySQL replication setup [i.e. MAGIC HAPPENS HERE]
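To make the three setup items above more concrete, here is a hedged sketch that generates the kind of configuration and SQL involved. It uses the classic MySQL 5.x statements (`CHANGE MASTER TO`, `START SLAVE`); the user name, password, and server id values are hypothetical, and the binary log file and position would come from `SHOW MASTER STATUS` on the source. This is not the DBA's actual procedure, which the slides do not show.

```python
def source_cnf(server_id=1):
    """[mysqld] settings for production: unique id plus binary logging."""
    return f"[mysqld]\nserver-id={server_id}\nlog-bin=mysql-bin\n"


def target_cnf(server_id=2):
    """[mysqld] settings for failover: unique id, no binary logging."""
    return f"[mysqld]\nserver-id={server_id}\n"


def create_repl_user_sql(user, target_host, password):
    """Statements run on the source so the target can read its updates."""
    return [
        f"CREATE USER '{user}'@'{target_host}' IDENTIFIED BY '{password}';",
        f"GRANT REPLICATION SLAVE ON *.* TO '{user}'@'{target_host}';",
    ]


def start_replication_sql(source_host, user, password, log_file, log_pos):
    """Statements run on the target to point it at the source and start."""
    return [
        "CHANGE MASTER TO "
        f"MASTER_HOST='{source_host}', MASTER_USER='{user}', "
        f"MASTER_PASSWORD='{password}', MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    print(source_cnf())
    print(target_cnf())
    # 'repl' and 'change-me' are placeholders, not values from the slides.
    for stmt in create_repl_user_sql("repl", "bonden.tamu.edu", "change-me"):
        print(stmt)
    for stmt in start_replication_sql(
        "killick.tamu.edu", "repl", "change-me", "mysql-bin.000001", 4
    ):
        print(stmt)
```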
Special setup for replication (2)
- Leave reverse proxy Apache down on failover
- Disable admin updates to failover by setting up [instance]/config/connection_admin.config_ (optional)
Software updates on failover
- KBDB updates are unnecessary, because replication takes care of them
- Software updates must still be applied
- Use a special option on the rev-up process:
  /[sfxglb41_path]/admin/revision/rev-up --type=sw --type=kbsw --backup=no
- Apache restarts after update not a problem
Failover testing
- Steps to switch from production to failover
  1. Stop replication process on failover (DBA)
  2. Start reverse proxy Apache on failover (SFX sysadmin)
  3. Move the linkresolver interface from production to failover (Computer center sysadmin)
- Steps to switch back to production
  1. Move the linkresolver interface back to production (Computer center sysadmin)
  2. Stop reverse proxy Apache on failover (SFX sysadmin)
  3. Rebuild database on failover (DBA)
  4. Start replication process on failover (DBA)
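Because the two sequences above involve three different people and the order matters, one option is to keep them as data and print a numbered checklist on demand. The sketch below is only an illustration of that idea; the `Step` type and function names are hypothetical, and the step wording is taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    owner: str   # who performs the step
    action: str  # what they do


TO_FAILOVER = [
    Step("DBA", "Stop replication process on failover"),
    Step("SFX sysadmin", "Start reverse proxy Apache on failover"),
    Step("Computer center sysadmin",
         "Move the linkresolver interface from production to failover"),
]

BACK_TO_PRODUCTION = [
    Step("Computer center sysadmin",
         "Move the linkresolver interface back to production"),
    Step("SFX sysadmin", "Stop reverse proxy Apache on failover"),
    Step("DBA", "Rebuild database on failover"),
    Step("DBA", "Start replication process on failover"),
]


def checklist(steps):
    """Number the steps in order and tag each with its owner."""
    return [f"{i}. [{s.owner}] {s.action}" for i, s in enumerate(steps, 1)]


if __name__ == "__main__":
    print("\n".join(checklist(TO_FAILOVER)))
    print()
    print("\n".join(checklist(BACK_TO_PRODUCTION)))
```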
Implications of running on failover
- Switchover/switchback creates a synchronization issue between databases
  - Failover database logged new statistics
  - When production came back, it created stat requests with keys that duplicated those already in failover
  - Replication to failover couldn’t be restarted until all of the potential duplicates were deleted
- Decided to rebuild failover database after “use” and lose statistics
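The duplicate-key problem above can be reproduced in miniature. The sketch below uses two in-memory SQLite databases as stand-ins for the production and failover MySQL databases; the table name `stat_request` and its columns are hypothetical. It shows only the general mechanism: both copies hand out the same next key, so replaying production's new row on failover is rejected.

```python
import sqlite3


def fresh_copy():
    """One database as it looked at the moment of switchover."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stat_request (id INTEGER PRIMARY KEY, note TEXT)")
    db.executemany(
        "INSERT INTO stat_request (note) VALUES (?)",
        [("before outage",)] * 3,
    )
    db.commit()
    return db


production, failover = fresh_copy(), fresh_copy()

# While production is down, the failover serves users and logs a statistic.
failover.execute("INSERT INTO stat_request (note) VALUES ('logged on failover')")

# Production returns and assigns the same next key to its own new row.
cur = production.execute(
    "INSERT INTO stat_request (note) VALUES ('logged on production')"
)
new_id = cur.lastrowid


def replay_on_failover(row_id, note):
    """Apply a production row to failover, as replication would try to."""
    try:
        failover.execute(
            "INSERT INTO stat_request (id, note) VALUES (?, ?)", (row_id, note)
        )
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate key"


if __name__ == "__main__":
    print(new_id, replay_on_failover(new_id, "logged on production"))
```

Rebuilding the failover database from a fresh copy of production, as the slide describes, sidesteps the conflict at the cost of the statistics logged during the outage.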