High Availability in Hurricane Alley
  Multi-site multi-node CAS Deep in the Heart of Texas
  Srinivas Varadaraj & Bill Thompson
  Jasig Sakai Conference
Agenda
  1. Strategy
  2. Technical requirements
  3. Constraints
  4. Stuff at hand
  5. Architectural decisions
  6. Cluster & production architecture
  7. Challenges and solutions
  8. Multi-site routing
  9. Production experiences
  10. Questions & Comments
Strategic requirements
  Single Identity
  Single Sign On / Single Sign Off
  Maximize self service tools
  Improved user experience
Technical requirements
  Application Compatibility
  High Availability
  Rolling maintenance
  Transparency
  Scalability
  AD integration
  Customization (branding)
Constraints
  Limited budget, use existing resources.
    – Power in the datacenters
    – Single internet
    – High latency connectivity
  Limited in-house development & experience
    – Stay close to release code
  Aggressive timeframe
Stuff we had at hand
  SAN infrastructure with replication to DR
  VM clusters
  Site-to-site VPN based connectivity to DR
  F5 load balancers
  Dedicated firewalls
  Opportunity
Decisions! Decisions! Decisions!
  Virtual Machines
  SAN based storage
  The great ticket registry debate
    – To replicate tickets or NOT!
  Building by cloning
    – "Appliance" like
  SSL
    – Local vs. offloading
  Cluster vs. standalone application servers
  Timeout!
Cluster components
Final Architecture
“Holy troubles, Batman!”
  SSL offloading
    – Tomcat offloading workaround
  Authentication and validation persistence
    – User and application can go to either site.
    – Enter site identifiers
  Multi-site ticket replication
    – Latency in WAN
  Algorithm usage in phpCAS clients and Java CAS clients
  Slow performance of mod_auth_cas on VMs
Routing logic
  HTTP_REQUEST
  HTTP_REQUEST_DATA
  HTTP_RESPONSE
HTTP_REQUEST (request from the client)
  HTTP_REQUEST {
    1) Grab header length to determine payload size
    2) If both sites are down, redirect to a branded service-unavailable page
    3) If the URI has the siteID of the other site and the other site is up, route to the other site
    4) Otherwise, default route to the local site
  }
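The HTTP_REQUEST decision above can be sketched in Python to make the order of the checks explicit. This is only an illustration, not the production load-balancer rule: the site identifiers `siteA` and `siteB`, the `SiteStatus` type, and the `route_request` function are all hypothetical names, and matching the siteID as a plain substring of the URI is an assumption.

```python
from dataclasses import dataclass

# Hypothetical site identifiers; the real values are not given in the slides.
LOCAL_SITE_ID = "siteA"
REMOTE_SITE_ID = "siteB"


@dataclass(frozen=True)
class SiteStatus:
    """Health of each site as seen by the local load balancer."""
    local_up: bool
    remote_up: bool


def route_request(uri: str, status: SiteStatus) -> str:
    """Return 'unavailable', 'remote', or 'local' for an incoming request."""
    # Step 2: both sites down -> branded service-unavailable page.
    if not status.local_up and not status.remote_up:
        return "unavailable"
    # Step 3: URI carries the other site's ID and that site is up.
    # (Assumption: the siteID appears as a plain substring of the URI.)
    if REMOTE_SITE_ID in uri and status.remote_up:
        return "remote"
    # Step 4: everything else defaults to the local site.
    return "local"


if __name__ == "__main__":
    both_up = SiteStatus(local_up=True, remote_up=True)
    print(route_request("/cas/serviceValidate?ticket=ST-1-abc-siteB", both_up))
    print(route_request("/cas/login", both_up))
```

The sketch follows the slide literally: a request without a siteID defaults to the local site, and the slide does not say what happens when only the local site is down.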
HTTP_REQUEST_DATA (payload manipulation)
  HTTP_REQUEST_DATA {
    1) Grab the payload; it may contain a siteID
    2) If we have a siteID of the other side {
         If the siteID is load-balancer introduced {
           Blank the load-balancer extension
         }
         Route to the other side
       } else {
         If we have a siteID of the local side {
           If the siteID is load-balancer introduced {
             Blank the load-balancer extension
           }
           Route to the local side
         }
       }
  }
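A minimal Python sketch of the HTTP_REQUEST_DATA logic above. The slides do not show what a load-balancer-introduced siteID looks like, so the `-LB<siteID>` extension format, the site names, and the `route_payload` function are assumptions made purely for illustration.

```python
from typing import Optional, Tuple

# Hypothetical identifiers; the real values and formats are not in the slides.
LOCAL_SITE_ID = "siteA"
REMOTE_SITE_ID = "siteB"
LB_MARKER = "LB"  # assumed marker: an LB-introduced siteID looks like "-LBsiteB"


def route_payload(payload: str) -> Tuple[Optional[str], str]:
    """Return (route, payload) where route is 'remote', 'local', or None.

    None means the payload carried no siteID, so whatever was decided
    earlier in HTTP_REQUEST stands (an assumption; the slide is silent).
    """
    # The other side is checked first, then the local side, as on the slide.
    for site_id, route in ((REMOTE_SITE_ID, "remote"), (LOCAL_SITE_ID, "local")):
        lb_extension = f"-{LB_MARKER}{site_id}"
        if lb_extension in payload:
            # The load balancer added this siteID itself: blank it so the
            # server sees the value it originally issued.
            return route, payload.replace(lb_extension, "")
        if site_id in payload:
            # Server-issued siteID: route on it and leave the payload alone.
            return route, payload
    return None, payload


if __name__ == "__main__":
    print(route_payload("ticket=ST-1-abc-LBsiteB"))
    print(route_payload("ticket=ST-1-abc-siteA"))
    print(route_payload("username=jdoe"))
```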
HTTP_RESPONSE (response from the server)
  HTTP_RESPONSE {
    1) Grab the server's response headers
    2) If the siteID is not in the response header {
         Introduce a load-balancer siteID to compensate for the Java CAS client
       }
    Release the HTTP response to the client
  }
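The HTTP_RESPONSE step above can be sketched in Python as a function over a dictionary of response headers. The slides do not say which header carries the siteID or how the load balancer marks the one it introduces, so the carrier header `X-CAS-Token`, the `-LB<siteID>` extension, and the `tag_response` function are hypothetical.

```python
from typing import Dict

# Hypothetical identifiers and formats; none of these appear in the slides.
KNOWN_SITE_IDS = ("siteA", "siteB")
LB_MARKER = "LB"               # assumed marker for an LB-introduced siteID
CARRIER_HEADER = "X-CAS-Token"  # assumed header holding a bare token value


def tag_response(headers: Dict[str, str], local_site_id: str) -> Dict[str, str]:
    """Return a copy of the headers, adding an LB siteID if none is present."""
    tagged = dict(headers)
    # Step 2: look for any known siteID anywhere in the response headers.
    has_site_id = any(
        site_id in value
        for value in tagged.values()
        for site_id in KNOWN_SITE_IDS
    )
    if not has_site_id and CARRIER_HEADER in tagged:
        # No siteID from the server: append a load-balancer extension so a
        # later request can still be routed back to the site that answered.
        tagged[CARRIER_HEADER] = f"{tagged[CARRIER_HEADER]}-{LB_MARKER}{local_site_id}"
    # The caller then releases the (possibly tagged) response to the client.
    return tagged


if __name__ == "__main__":
    print(tag_response({"X-CAS-Token": "ST-1-abc"}, "siteA"))
    print(tag_response({"X-CAS-Token": "ST-1-abc-siteA"}, "siteA"))
```

Marking the introduced siteID distinctly is what lets the request side blank it again before the server sees the value.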
Experiences in Production
  Approx. 8 months in production
  7 applications in production, 10 in development
  Survived two power outages at DR
  Survived multiple internet outages
  Successful rolling upgrades to MySQL & CAS
  Flow based redesign
  LPPE
  Re-visit ticket registry
Questions/Comments
  Credits:
    – CAS developers and community
    – F5 & F5 devcentral
    – Unicon
    – LU & Txstate
  Thank you for your time!!
  Contacts:
    – Sri:
    – Bill: