Disaster Tolerant Systems: The Ultimate Level Of Service
Paul Dolan, World-Wide Manager, Disaster Tolerance Services, Compaq Computer Limited
2 Effective Disaster Tolerance: Application Specific DT
Best use of inter-site bandwidth
Best possible recovery capability
Application specific
Must be written into the code
Expensive to engineer
[Diagram labels: Server; App]
3 Effective Disaster Tolerance: Server Based DT
Good recovery capability
Fully integrated replication – simple to manage and monitor
Specific to O/S & H/W platform
Less efficient use of bandwidth
Fewer application-specific recovery features
[Diagram labels: Server; App]
4 Effective Disaster Tolerance: SAN Based DT
Cheapest to deploy
General purpose
Multi-OS, Multi-HW
Hardest to monitor and control
No integration between the DT layers – must compensate
[Diagram labels: Server; App]
5 OpenVMS & Disaster Tolerance
OpenVMS Clusters have a unique enabling capability to build the most advanced disaster tolerant solutions in the world
Short recovery times following data center loss are obtainable and reproducible in production environments
6 OpenVMS Disaster Tolerant Cluster
[Diagram labels: Production Site A; Production Site B; Quorum site; Live Processors; Payroll; Funds Transfer; Data; System; Shadow set; High Speed Network]
7 Conventional OpenVMS Cluster Design
[Diagram labels: Host Based Volume Shadowing; Data Shadowing; FDDI; T3; ATM; Gigabit Ethernet; Controller Pair; Alpha; Site System Disk; Storage Interconnect]
8 Storage Area Networks and DT
High performance scalable storage
Large separation between storage and servers
Real-time data replication
Multi-platform capability
[Diagram labels: Higher Functionality Infrastructure; Storage Systems; Backup Application; File Server; Management Application; Other Application; Clustered Mail Server; Clustered Database Server; Web Server]
9 StorageWorks Features for DT SAN - Availability
High availability in every element
Dual Power
Dual Path
No single-point-of-failure designs
High Performance
Separation of servers and storage
[Diagram labels: Fiber Channel; RA8000 Storage; Switches; FC over IP; ATM]
10 StorageWorks Features for DT SAN - DRM
Storage Controller Driven Replication
Fully Synchronous
Peer to peer copy
Disks break together functionality
[Diagram labels: Remote Mirroring; Medium Storage; Large Storage; Switches; Application Server]
11 Windows Solution Architecture
[Diagram labels: Fiber Channel; ATM; Domain Controller; Clustered Application Server; Bridge/Router; RA8000 Storage; DRM; Switches; Management Station]
12 Windows Alt. Solution Architecture
[Diagram labels: Fiber Channel; ATM; Domain Controller; SAN Booting Application Server; Bridge/Router; RA8000 Storage; DRM; Switches; Management Station]
13 “Next Generation” DT - OpenVMS & FibreChannel
[Diagram labels: Host Based Volume Shadowing; Data Shadowing; Switch; Fiber; FDDI; T3; ATM; Gigabit Ethernet; Controller Pair; Alpha; Site System Disk]
14 Oracle – Specific Solutions
Adds disaster tolerance features to Certified Oracle9i Real Application Cluster solutions
–The primary site is a Certified Oracle 9i RAC solution, and the remote site is a Certified or validated Oracle configuration.
–If anything happens to the primary site (e.g. fire or flood), the remote site can be accessed and operations continue.
–The configurations are scalable and can be fortified (no single points of failure) based upon the customer’s needs and requirements.
–Different configurations are available for flexibility in cost, levels of protection, and distance requirements (Tru64 UNIX and OpenVMS).
–May be configured for either warm or hot backup:
  ATM, unlimited distance DRM configuration (warm)
  Campus-Wide Cluster configuration (hot)
15 Oracle 9i RAC - DRM Solution
Primary Site: Certified Oracle 9i RAC solution
Remote Site: Certified or validated Oracle 9i RAC solution – either single…
[Diagram labels: Host 1; Host 2; Host 1 BU; FC Switch; Raid Storage; ATM Brdg; ATM Network]
16 DRM and Cluster Solution
Primary Site: Certified Oracle 9i RAC solution
Remote Site: Certified or validated Oracle 9i RAC solution – … or clustered
[Diagram labels: Host 1; Host 2; Host 1 BU; Host 2 BU; FC Switch; Raid Storage; ATM Brdg; ATM Network]
17 Oracle 9i RAC Campus-Wide Cluster Solution
[Diagram labels: Primary storage sub-system; Primary cluster member; Secondary cluster member; Secondary storage sub-system; FC Switches; 2km (6 km with MC II); hubs; Single Mode Fiber, Max 10km; Memory Channel Interconnect]
18 Split-site Replication Comparison
19 Split-site Replication Comparison (Cont.)
20 Split-site Operation Comparison
21 Split-site Solution Comparison
22 Effective Management
Comprehensive
Integrated
Environment specific
DT Aware
DT Specific
Remote capable
[Diagram labels: Management Engine; Network Components; Switches; Storage; Application Servers; Remote Management Station; Events]
23 Design Considerations – Site Location
24 Design Considerations – Distance
Longer distance is perceived to offer better business protection
–Usually demanded by the business
But longer distance:
–Increases latency – degrades performance for TP applications
–Decreases bandwidth – degrades performance for mirror synchronization and high-bandwidth data
–Increases costs of inter-site links
Balance business risk against application performance
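The latency penalty of distance can be estimated from propagation delay alone. The sketch below assumes that a signal travels through optical fibre at roughly 200,000 km/s (about 5 microseconds per km); that rule of thumb, the function name, and the sample distances are illustrative assumptions, not figures from the slide. It ignores switch, bridge, and protocol delays, so real round-trip times will be higher.

```python
# Assumed rule of thumb (not from the slide): light in optical fibre
# covers roughly 200,000 km per second.
FIBRE_KM_PER_SECOND = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds, between two sites."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return 2.0 * distance_km / FIBRE_KM_PER_SECOND * 1000.0


if __name__ == "__main__":
    # Hypothetical inter-site distances, chosen only for illustration.
    for km in (10, 100, 500):
        print(f"{km:>4} km: at least {round_trip_ms(km):.2f} ms per round trip")
```

Any write that must be acknowledged by the remote site before it completes pays at least one such round trip, which is why transaction-processing workloads are the first to feel an increase in distance.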
25 Design Considerations – Communications
Beware of data resynchronization times
Best possible times to copy 1 TB of data:
–T3 / ATM, 44 Mb/s link: ~91 hours
–FDDI, 100 Mb/s link: ~31.6 hours
–OC3 / ATM, 155 Mb/s link: ~26 hours
–FC, 1000 Mb/s link (full duplex): ~3 hours 10 minutes
Assume a minimum of 2 full site consolidations (data copies) a year (observed)
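For comparison, the raw line-rate lower bound for each link can be computed directly. The sketch below assumes 1 TB means 10^12 bytes and that the full nominal bandwidth carries payload; both are simplifying assumptions, and the names are illustrative. The times quoted on the slide are longer than these bounds, presumably because they account for protocol overhead and achievable throughput, although the slide does not say so.

```python
BITS_PER_BYTE = 8
ONE_TB_BYTES = 1e12  # assumption: decimal terabyte

# Nominal link speeds in Mb/s, as listed on the slide.
LINKS_MBPS = {"T3 / ATM": 44, "FDDI": 100, "OC3 / ATM": 155, "FC": 1000}


def line_rate_hours(data_bytes: float, link_mbps: float) -> float:
    """Hours needed to move data_bytes at the full nominal link rate."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    seconds = data_bytes * BITS_PER_BYTE / (link_mbps * 1_000_000)
    return seconds / 3600.0


if __name__ == "__main__":
    for name, mbps in LINKS_MBPS.items():
        hours = line_rate_hours(ONE_TB_BYTES, mbps)
        print(f"{name:<10} {mbps:>5} Mb/s: at least {hours:.1f} hours")
```

Treating the result as a floor rather than an estimate is the safer planning habit: a full resynchronization will take at least this long, and the slide's observed figures suggest considerably longer.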
26 So You’ve Chosen Your Sites and Your Communications…
How are you going to proceed?
Do you know all of the issues with your design?
Do you understand how your procedures will need to be changed?
Can you ensure no disruption to your mission critical applications while you are making the changes?
27 Our Approach to Disaster Tolerance
Solution approach through Disaster-Tolerant Cluster and Disaster Tolerant SAN Services delivering:
–Proven hardware & software technologies
–Proven implementation methodology
–Attention to procedural as well as technology issues
–Detailed cluster & network design
–Delivery as a partnership with the customer
–Full documentation including recovery plan
28 The DTCS / DTSAN Package Compaq DTCS & DTSAN Build Services Management Station Installation & configuration DR tests Go-Live Support Disaster Recovery Documentation Review Training DTCS / DTSAN Management Software Consultancy
29 What are the DTCS / DTSAN Services?
–True Disaster Tolerant Solutions
–Built on any vendor-supported split-site technology that falls within the terms of the relevant SPD
–Using standard operating system features and components
  Correctly tuned and configured for split-site operation
–Together with state-of-the-art system management software
  From Compaq and Heroix Corporation
–Consultancy and training to configure the system, based upon years of experience of building Disaster Tolerant Systems
  Providing a system that is easy to operate in a way that meets the demands of disaster tolerance
30 The DTCS / DTSAN Services
Are customized to client specific requirements
–Monitoring and Management Software licenses, Media & Documentation
–Quorum adjustment or Recovery Manager site reconfiguration software
–Customized DT rules for problem detection and alerting
–Consulting & Training package
  Design and implementation planning
  Disaster Recovery Planning and testing
  Network reviews
31 Effective Management
Comprehensive
Integrated
Environment specific
DT Aware
DT Specific
Remote capable
[Diagram labels: Management Engine; Network Components; Switches; Storage; Terminal Server; Application Servers; Remote Management Station]
32 The Solution Components
[Diagram labels: Data Centre A; Data Centre B; RoboCentral; RoboMon; RoboEDA; Client Customizations; Standard DT Rules; Site Reconciliation S/W]
33 Recovery from Failure - OpenVMS
Manual Recovery Configuration
–Needs manual intervention by an operator or system manager to continue processing with the surviving data center.
–Usually used only in 24-hour manned operations.
Automatic Recovery Configuration
–No intervention is necessary to allow the OpenVMS Cluster to continue in the event of a data center failure.
–Gives maximum application up-time, with minimum recovery time.
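The difference between the two configurations comes down to cluster votes. The sketch below uses the quorum calculation documented for OpenVMS Cluster systems, (EXPECTED_VOTES + 2) / 2 rounded down; the site names and the one-vote-per-site assignment are illustrative assumptions, not a recommended configuration.

```python
from typing import Dict


def quorum(expected_votes: int) -> int:
    """Quorum as (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def survives_site_loss(votes_by_site: Dict[str, int], lost_site: str) -> bool:
    """True if the remaining sites still hold quorum without operator action."""
    expected = sum(votes_by_site.values())
    remaining = expected - votes_by_site[lost_site]
    return remaining >= quorum(expected)


if __name__ == "__main__":
    # Hypothetical vote assignments, one vote per site.
    two_site = {"site_a": 1, "site_b": 1}
    with_quorum_site = {"site_a": 1, "site_b": 1, "quorum_site": 1}

    print("Two sites, lose site_a:", survives_site_loss(two_site, "site_a"))
    print("With quorum site, lose site_a:",
          survives_site_loss(with_quorum_site, "site_a"))
```

With only two equally weighted sites, the survivor holds one vote out of a quorum of two and must wait for an operator to adjust quorum, which corresponds to the manual configuration. Adding a third voting site lets either production site carry on alone, which corresponds to the automatic configuration.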
34 Recovery from Failure – SAN Solutions
Manual Recovery Configuration
–Needs manual intervention by an operator or system manager to continue processing with the surviving data center
–Unmanned operation achieved using comprehensive alerting facilities
–Comprehensive recovery manager addresses all components of the solution
  SAN / Server / Application / Network…
–Gives maximum application up-time, with minimum recovery time
35 Example Scenario – Our Bristol Facility
Avon Computer Room (Master for applications, Initiator for DRM)
Somerset Computer Room (Fallback for applications, Target for DRM)
[Diagram labels: Fiber Channel; ATM; Domain Controller; Clustered Application Server; Bridge/Router; RA8000 Storage; DRM; Switches; Management Station]
41 Additional Optional Modules The DTCS / DTSAN Project Outline DTCS / DTSAN Standard Package Disaster Tolerant Design Services Disaster Tolerant Build Services Cluster Preparation and build Inter-site network build Phased cluster introduction Disaster Tolerant Implementation Disaster Tolerant Cluster / SAN Services Pre-Installation Visit Transition Plan Training DTCS / DTSAN Management Station - Install, Configure,Customize DT “Go-Live” DT Readiness Review Recovery Testing Documentation Post-Implementation Review
42 Harnessing the Power of the Technology
Industry leading DT solutions
Best in class technical expertise
Off-the-shelf technology components provide the backbone for the solution
Business needs satisfied by addressing all issues
Quality consulting with low risk implementation
Partnership delivery with knowledge transfer
Minimize Risk / Maximize Business Benefit