High Availability 24 hours a day, 7 days a week, 365 days a year… Vik Nagjee Product Manager, Core Technologies InterSystems Corporation.

2 Topics
– What is High Availability (HA)?
– Current HA strategies
– What’s coming?
– Questions & Discussion

3 What is High Availability (HA)?
– Reliability
– Fault tolerance
– High uptime
– Operational continuity
– Redundancy
– Minimal disruption

Availability | Downtime per year | Downtime per month | Downtime per week
90%          | 36.5 days         | 72 hours           | 16.8 hours
95%          | 18.25 days        | 36 hours           | 8.4 hours
99%          | 3.65 days         | 7.20 hours         | 1.68 hours
99.9%        | 8.76 hours        | 43.2 minutes       | 10.1 minutes
99.99%       | 52.6 minutes      | 4.32 minutes       | 1.01 minutes
99.999%      | 5.26 minutes      | 25.9 seconds       | 6.05 seconds
99.9999%     | 31.5 seconds      | 2.59 seconds       | 0.605 seconds
(Figures assume a 365-day year, a 30-day month and a 7-day week.)
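The downtime figures above follow from one line of arithmetic: allowed downtime = (1 − availability) × length of the period. A minimal Python sketch, assuming a 365-day year, a 30-day month and a 7-day week (the conventions that reproduce the table's numbers); the function name is hypothetical:

```python
# Allowed downtime for a given availability level, in hours.
# Assumes a 365-day year, a 30-day month and a 7-day week.
PERIOD_HOURS = {
    "year": 365 * 24,
    "month": 30 * 24,
    "week": 7 * 24,
}


def downtime_hours(availability_pct: float, period: str) -> float:
    """Return the maximum downtime (in hours) permitted in `period`."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * PERIOD_HOURS[period]


if __name__ == "__main__":
    for pct in (90, 95, 99, 99.9, 99.99, 99.999, 99.9999):
        minutes_per_year = downtime_hours(pct, "year") * 60
        print(f"{pct}%: {minutes_per_year:.2f} minutes of downtime per year")
```

For 99.9% ("three nines") this gives 0.001 × 8,760 hours = 8.76 hours per year, matching the table.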

4 High Availability vs. Disaster Recovery
– High Availability = fault detection and correction procedures, often automated, that maximize the availability of critical services and applications.
– Disaster Recovery = the process of preparing to recover or continue the technology infrastructure critical to an organization after a natural or human-induced disaster.
– High Availability ≠ Disaster Recovery!

5 Current HA Strategies
– Failover = automatic switch to a redundant system
– Relies on some type of heartbeat software (e.g., IBM HACMP) to detect failures
– Current failover options:
  – Failover Clusters
  – Concurrent Clusters
  – ECP Clusters
    – with a Failover Cluster for the database
    – with a Concurrent Cluster for the database
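Heartbeat software works by having each node send periodic "I am alive" messages; when the active node's heartbeats stop arriving within a timeout, it is declared failed and the standby takes over. The Python sketch below shows only that detection-and-switch decision. The class and function names are hypothetical, the timeout is an arbitrary parameter, and real products such as HACMP add quorum, fencing and redundant heartbeat paths that are not modeled here.

```python
import time
from typing import Callable, Dict, Optional


class HeartbeatMonitor:
    """Hypothetical heartbeat-based failure detector (illustration only)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock                     # injectable so tests can fake time
        self.last_seen: Dict[str, float] = {}  # node name -> time of last heartbeat

    def heartbeat(self, node: str) -> None:
        """Record that `node` has just reported that it is alive."""
        self.last_seen[node] = self.clock()

    def is_alive(self, node: str) -> bool:
        """A node counts as alive if it reported within the timeout."""
        seen = self.last_seen.get(node)
        return seen is not None and self.clock() - seen <= self.timeout_s


def choose_active(monitor: HeartbeatMonitor, active: str,
                  standby: str) -> Optional[str]:
    """Keep the active node while it is healthy; otherwise fail over."""
    if monitor.is_alive(active):
        return active
    if monitor.is_alive(standby):
        return standby   # failover: the standby takes over the workload
    return None          # neither node is healthy
```

For example, with `timeout_s=3.0`, a PROD node whose last heartbeat arrived 5 seconds ago is treated as failed, and `choose_active(monitor, "PROD", "STDBY")` returns `"STDBY"` as long as the standby is still sending heartbeats.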

6 Failover Clusters
– One active system (PROD) and one standby system (STDBY), joined by a heartbeat connection
– Clustering software: Windows Cluster, IBM HACMP, Sun Cluster, HP Serviceguard, Red Hat Cluster Suite, Veritas Cluster Server…
– Requires shared disk for the install directory, the WIJ (write image journal), database files and journal files
– Users/applications connect to a DNS name that is mapped to PROD
– In the event of a failure, the 3rd-party cluster software fails Caché over to the STDBY node
– Caché performs recovery on the STDBY node before allowing connections: open transactions are rolled back, open locks are released, etc.
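The recovery step on STDBY can be illustrated with a simplified undo pass. The shared disk may hold updates from transactions that were still open when PROD failed, so recovery uses the before-images recorded in the journal to undo those updates (newest first) and then releases the locks the rolled-back transactions held. This is a conceptual Python sketch only: `JournalEntry`, `recover` and the record layout are hypothetical and do not describe Caché's actual journal format or recovery procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class JournalEntry:
    """Hypothetical journal record: one update made by one transaction."""
    tx_id: int
    key: str
    before: Optional[str]   # value before the update (None = key did not exist)
    after: str              # value written by the update


def recover(db: Dict[str, str], journal: List[JournalEntry],
            committed: Set[int], locks: Dict[str, int]) -> None:
    """Roll back uncommitted transactions and release their locks, in place."""
    open_txs = {e.tx_id for e in journal if e.tx_id not in committed}
    # Undo the open transactions' updates in reverse journal order.
    for entry in reversed(journal):
        if entry.tx_id in open_txs:
            if entry.before is None:
                db.pop(entry.key, None)      # the update created the key
            else:
                db[entry.key] = entry.before
    # Release every lock held by a rolled-back transaction.
    for resource in [r for r, tx in locks.items() if tx in open_txs]:
        del locks[resource]
```

Undoing in reverse order matters when one open transaction updated the same key more than once: the oldest before-image is the one that must survive.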

7 Concurrent Clusters
– AKA Caché Clusters
– Can be configured on OpenVMS and Tru64 UNIX
– Two or more servers, each running an instance of Caché and each with access to all disks, concurrently provide access to all data
– Users connect to any of the clustered nodes; Caché provides data and lock synchronization across nodes
– If one machine fails, users can immediately reconnect to any of the remaining cluster nodes
– Caché performs cluster-wide recovery during failover, so logical and physical data integrity is maintained
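Lock synchronization across nodes means every node consults one logical lock table, so two nodes cannot hold the same lock at once, and a failed node's locks must be freed before survivors can proceed. A conceptual Python sketch of that idea follows; `ClusterLockTable`, its methods and the resource names are hypothetical, and the sketch does not model how Caché or the underlying cluster software actually distribute lock state.

```python
from typing import Dict, Set


class ClusterLockTable:
    """Hypothetical cluster-wide lock table shared by all nodes."""

    def __init__(self, nodes: Set[str]) -> None:
        self.alive = set(nodes)
        self.owner: Dict[str, str] = {}   # resource -> node holding the lock

    def acquire(self, node: str, resource: str) -> bool:
        """Grant the lock unless a different node already holds it."""
        if node not in self.alive:
            raise ValueError(f"{node} is not a live cluster member")
        holder = self.owner.get(resource)
        if holder is None or holder == node:
            self.owner[resource] = node
            return True
        return False

    def release(self, node: str, resource: str) -> None:
        """Release the lock if `node` is its current holder."""
        if self.owner.get(resource) == node:
            del self.owner[resource]

    def node_failed(self, node: str) -> None:
        """Recovery step: drop the failed node and free every lock it held."""
        self.alive.discard(node)
        for resource in [r for r, n in self.owner.items() if n == node]:
            del self.owner[resource]
```

Once `node_failed("node1")` runs, a lock that node1 held on `"record-42"` becomes available to users who have reconnected to node2.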

8 ECP Clusters – with DB as Failover Cluster
– Enterprise Cache Protocol (ECP) provides a distributed, tiered system
– Typical configuration:
  – N+1 application servers
  – Users load-balanced across app servers
– If any app server goes down, users can be reconnected to the remaining app servers
– If the database goes down, users on the app servers experience a pause while DB failover completes (here the DB is configured as a failover cluster)
– Application servers reconnect after the database has performed recovery
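"N+1" means one more application server than the load strictly needs, so losing any single server still leaves enough capacity. The load-balancing and reconnect behavior can be sketched as a round-robin pool that skips servers marked as down. This is a hypothetical Python illustration (class and server names are invented); the slide does not say which component performs the balancing.

```python
import itertools
from typing import List, Set


class AppServerPool:
    """Hypothetical round-robin balancer over N+1 application servers."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.down: Set[str] = set()
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def next_server(self) -> str:
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.servers)):   # at most one full pass
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no application servers available")
```

With servers `app1`, `app2`, `app3`, marking `app2` down makes subsequent connections alternate between `app1` and `app3`, which is the "reconnect to the remaining app servers" case on the slide.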

9 ECP Clusters – with DB as Concurrent Cluster
– Similar to the previous example, except the DB server is configured as a concurrent cluster (OpenVMS or Tru64 UNIX)
– App servers can connect to any one of the nodes
– If any node fails, the app server(s) connected to that node reconnect to another surviving node after failover
– Caché performs cluster-wide recovery during failover, so logical and physical data integrity is maintained

10 High Availability: What’s Coming?
Database Mirroring:
– Delivers faster, automated failover
– Eliminates the requirement for shared-disk configurations
– Reduces dependency on 3rd-party clustering software
– Uses multiple redundant servers
– Integrated ECP recovery

11 Database Mirroring
– Multiple servers in a Mirror Set: one is the Primary, the others (one or more) are Backups
– TCP connections between mirror members
– The Primary PUSHES journal updates to the Backups, which acknowledge them and continuously de-journal (apply) them
– The Primary role can flip from one server to another within moments: automated failover
– All clients (except ECP) connect to a Mirror Virtual IP; the mirror redirects them to the current Primary
– ECP is “mirror aware”: app servers connect directly to the current Primary and fail over to the new Primary as appropriate. ECP performs recovery on reconnection.
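The push / acknowledge / de-journal cycle and the promotion of a Backup can be sketched in a few lines. Everything here is a hypothetical, single-process Python simplification: the class and method names are invented, "journal records" are plain (key, value) pairs, the TCP transport and the Mirror Virtual IP are not modeled, and the sketch does not reflect Caché's actual mirroring implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str]  # simplified journal update: (key, value)


@dataclass
class Member:
    """Hypothetical mirror member holding a local copy of the database."""
    name: str
    db: Dict[str, str] = field(default_factory=dict)
    alive: bool = True

    def receive(self, record: Record) -> bool:
        """De-journal (apply) one pushed update, then acknowledge it."""
        key, value = record
        self.db[key] = value
        return True  # acknowledgement sent back to the Primary


class MirrorSet:
    def __init__(self, primary: Member, backups: List[Member]) -> None:
        self.primary = primary
        self.backups = backups

    def write(self, key: str, value: str) -> int:
        """Apply on the Primary, push to live Backups; return the ack count."""
        record = (key, value)
        self.primary.db[key] = value
        acks = 0
        for backup in self.backups:
            if backup.alive and backup.receive(record):
                acks += 1
        return acks

    def fail_over(self) -> Optional[Member]:
        """Mark the Primary lost and promote the first live Backup."""
        self.primary.alive = False
        for i, backup in enumerate(self.backups):
            if backup.alive:
                self.primary = self.backups.pop(i)
                return self.primary
        return None  # no live Backup to promote
```

Because every acknowledged update has already been applied on the Backup, the promoted member can serve data immediately, which is what makes the "within moments" failover on the slide possible without shared disk.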

12 Wrap-up: Questions & Discussion