Ashish Prabhu, Douglas Utzig
High Availability Systems Group
Server Technologies
Oracle Corporation
Maximum Availability Architecture
Oracle's Recipe For Building An Unbreakable System
Agenda
  – Achieving High Availability
  – Maximum Availability Architecture (MAA) Overview
  – MAA Components
  – Performance Considerations
  – MAA Test Lab
  – Q & A
High Availability is …
Causes of Downtime
Scheduled Outages
  – Maintenance & Continuous Operations
Unscheduled Outages
  – Data Center Disasters
  – Human Error
  – System Faults and Crashes
  – Data and Media Failures
Inadequate System Design, Testing & Process
High Availability Goal
Design and validate the best, integrated High Availability solution
  – Unbreakable Architecture
    Handle all outages at all tiers
  – Best Practices
    Cookbook for prevention, avoidance, mitigation, and recovery
    Configuration, operational, outage solutions, restore fault tolerance
  – Complete out-of-the-box high availability
    Tested and validated solution
Unbreakable Architecture + Best Practices = Maximum Availability
Maximum Availability Architecture
Best Oracle High Availability Architecture
  – Blueprint for Database and Oracle9iAS
  – Guidelines for hardware and non-Oracle software, but independent of platform, OS, storage, network, …
  – Evolves with new Oracle versions and features
Best Practices
  – Configuration and operational
  – Outages and detailed solutions
  – Restoring fault tolerance after an outage
Maximum Availability Architecture
Figure: MAA overview (WAN Traffic Manager; Primary Site and Secondary Site, each with Oracle9iAS and RAC; Dedicated Network between sites; Data Guard)
Secondary Site
Secondary Site is a Mirror of the Primary Site
  – Resolve unscheduled outages quickly and easily
  – Allow site-wide scheduled outages
Same Service Levels
  – Predictable performance and response time
  – Site transparency
Consistent Procedures and Processes
  – Reduces administrative complexity
Highly Available Database: Real Application Clusters
Fast Failover
  – Protection from local site system failures
  – Faster than cold cluster failover solutions
  – Fast-Start Fault Recovery (bounded instance failure MTTR)
Availability and Accessibility
  – Allows for scheduled outages
    Add and remove nodes transparently
  – Transparent Application Failover (TAF) provides uninterrupted service
Highly Available Database: Real Application Clusters
Higher Scalability
  – All system resources from all nodes are leveraged
  – Cache Fusion eliminates the need to partition data or modify the application
    Fully application transparent
  – Connection load balancing distributes connection requests from the application tier
Manageability
  – Provides a single image of the database to manage
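Connection load balancing and connect-time failover across RAC instances can be illustrated with a toy Python model. It is a minimal sketch, not Oracle Net code: the `Instance` class, the `connect` function, and the instance names are hypothetical. It mimics the behavior of the Oracle Net `LOAD_BALANCE=on` and `FAILOVER=on` address-list settings, which try listener addresses in random order and move on to the next address when one cannot be reached. Transparent Application Failover, which re-establishes a session after an existing connection is lost, is not modeled.

```python
import random
from collections import Counter


class Instance:
    """Hypothetical stand-in for one RAC instance and its listener."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up


def connect(instances, rng: random.Random) -> str:
    """Pick an instance for a new connection.

    Addresses are tried in random order, which spreads connections across
    instances (load balancing); a down instance is skipped and the next
    address is tried (connect-time failover).
    """
    order = list(instances)
    rng.shuffle(order)
    for inst in order:
        if inst.up:
            return inst.name
    raise ConnectionError("no instance available")


if __name__ == "__main__":
    rac = [Instance("rac1"), Instance("rac2"), Instance("rac3", up=False)]
    rng = random.Random(42)
    counts = Counter(connect(rac, rng) for _ in range(1000))
    # Connections are split between rac1 and rac2; none reach rac3.
    print(counts)
```

Random ordering keeps the sketch stateless on the client side; because a down instance is simply skipped, its share of new connections is spread evenly over the survivors.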
Highly Available Database: Oracle Data Guard
Data Protection
  – Protection from site failures, data failures, human errors, and corruptions
    Protection modes balance availability with performance
    Apply delay prevents user error propagation
  – Greater protection, performance, and manageability compared to remote mirroring solutions
  – Offload processing (for example, backups and reporting) from the primary database system
Role Management
  – Switchover operation for scheduled outages
  – Failover operation for unscheduled outages
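The apply-delay point can be illustrated with a toy model of a standby that receives redo immediately but applies it only after a fixed lag, which leaves time to stop apply before a user error reaches the standby. In Oracle9i the lag is configured with the `DELAY` attribute of `LOG_ARCHIVE_DEST_n` (in minutes); the Python classes, timestamps, and change descriptions below are hypothetical and do not model how Data Guard actually applies redo.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RedoRecord:
    time_s: float  # commit time on the primary, in seconds
    change: str    # description of the change (hypothetical)


@dataclass
class DelayedStandby:
    """Toy standby: redo is received at once but applied only after delay_s."""

    delay_s: float
    pending: List[RedoRecord] = field(default_factory=list)
    applied: List[RedoRecord] = field(default_factory=list)

    def receive(self, record: RedoRecord) -> None:
        # Redo is shipped and stored on the standby right away.
        self.pending.append(record)

    def apply_up_to(self, now_s: float) -> None:
        # Apply only records that are at least delay_s seconds old.
        cutoff = now_s - self.delay_s
        self.applied += [r for r in self.pending if r.time_s <= cutoff]
        self.pending = [r for r in self.pending if r.time_s > cutoff]


if __name__ == "__main__":
    standby = DelayedStandby(delay_s=60)
    for rec in [RedoRecord(10, "insert order 1"),
                RedoRecord(50, "update price"),
                RedoRecord(100, "drop table orders")]:  # the user error
        standby.receive(rec)
    # The error is noticed at t=130; with a 60 s delay only t <= 70 is applied.
    standby.apply_up_to(130)
    print([r.change for r in standby.applied])  # ['insert order 1', 'update price']
    print([r.change for r in standby.pending])  # ['drop table orders']
```

In this toy run, stopping apply at t=130 leaves the dropped table intact on the standby, while the erroneous change is still held in its pending redo.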
Highly Available Application: Oracle9iAS
Availability
  – Oracle9iAS J2EE (OC4J) and Web Cache clustering for protection against system outages
  – Automatic monitoring and restart of failed processes
  – Application state preserved through failures
  – Add and remove nodes transparently
Scalability
  – Hardware network load balancer distributes client requests to Web Cache
  – Web Cache clustering for distributed caching and load balancing across multiple OC4J instances
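Automatic monitoring and restart of failed processes can be sketched as a supervisor pass over a set of managed processes. This is a hypothetical toy in Python, not the Oracle9iAS process manager: `ManagedProcess`, `monitor_once`, the process names, and the restart limit are illustrative assumptions, and crashes are simulated rather than detected from real operating-system processes.

```python
class ManagedProcess:
    """Hypothetical stand-in for a managed server process (e.g. an OC4J instance)."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.restarts = 0

    def crash(self) -> None:
        self.alive = False  # simulated failure

    def start(self) -> None:
        self.alive = True
        self.restarts += 1


def monitor_once(processes, max_restarts: int = 3):
    """One monitoring pass: restart each dead process that is under its restart limit.

    Returns the names of the processes restarted during this pass.
    """
    restarted = []
    for proc in processes:
        if not proc.alive and proc.restarts < max_restarts:
            proc.start()
            restarted.append(proc.name)
    return restarted


if __name__ == "__main__":
    procs = [ManagedProcess("oc4j_1"), ManagedProcess("oc4j_2")]
    procs[1].crash()
    print(monitor_once(procs))       # ['oc4j_2']
    print([p.alive for p in procs])  # [True, True]
```

The restart limit keeps a process that fails repeatedly from being restarted forever; a real supervisor would typically also notify an administrator at that point.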
Highly Available Application: Oracle9iAS
Figure: Clients, Load Balancer, and Application Server Tier (Web Cache, OC4J Clusters) in front of the Database Tier
Network Infrastructure
Wide Area Traffic Manager to direct client traffic to the proper site
Network load balancer to distribute incoming requests
Dedicated, fast link between sites
  – Influences production database performance
Redundant components and paths
  – Network paths to the site and within the site
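The two traffic-routing layers on this slide can be sketched in Python: a site-level choice, as a wide area traffic manager makes, and round-robin distribution within a site, as a network load balancer might do. This is a toy model; `choose_site`, `RoundRobinBalancer`, and the site and node names are hypothetical, health status is supplied by the caller rather than probed, and real products offer other policies besides preference order and round robin.

```python
from typing import Iterable, List, Set, Tuple


def choose_site(sites: Iterable[Tuple[str, bool]]) -> str:
    """Return the first healthy site, in order of preference."""
    for name, healthy in sites:
        if healthy:
            return name
    raise RuntimeError("no site available")


class RoundRobinBalancer:
    """Distribute requests across nodes in turn, skipping nodes marked down."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)
        self.next_index = 0
        self.down: Set[str] = set()

    def pick(self) -> str:
        # Try each node at most once per request.
        for _ in range(len(self.nodes)):
            node = self.nodes[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy node")


if __name__ == "__main__":
    print(choose_site([("primary", False), ("secondary", True)]))  # secondary
    lb = RoundRobinBalancer(["webcache1", "webcache2"])
    print([lb.pick() for _ in range(4)])  # ['webcache1', 'webcache2', 'webcache1', 'webcache2']
    lb.down.add("webcache1")
    print([lb.pick() for _ in range(3)])  # ['webcache2', 'webcache2', 'webcache2']
```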
Best Practices
Configuration
  – Detailed recommendations for Oracle software
    Features to use, parameters to set
  – Guidelines for hardware and other software
Operational
  – Technical, e.g. switchover and failover procedures
  – Logistical, e.g. change management considerations
  – Emphasis on outages
    Outages to monitor
    Detailed steps to resolve outages
    How to restore fault tolerance
Best Practices
Figure: best-practice cycle (Configuration, Monitor for Outage, Detect Outage, Resolve Outage, Restore Fault Tolerance) applied across Database, Oracle9iAS, OS, Storage, Network, and Operational areas
HA and Performance
Combining high availability and performance
  – Secondary site configured identically to the primary site
  – Network bandwidth and latency between sites
  – Data Guard protection mode
  – Instance recovery time
Network Bandwidth / Latency
Network bandwidth and latency between sites influence commit response time
Longer network latency increases response time
  – With synchronous redo transport, remote write = network round-trip time + local write I/O time at the secondary site
Network bandwidth should be greater than the maximum redo generation rate
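The two rules on this slide can be checked with back-of-the-envelope arithmetic. The Python sketch below is illustrative only: the function names and every number (round-trip time, standby write time, redo rate, link speeds) are hypothetical, and a real link also needs headroom for protocol overhead and redo bursts. Redo rates are usually quoted in megabytes per second and link speeds in megabits per second, so the check multiplies by 8.

```python
def remote_write_ms(rtt_ms: float, standby_io_ms: float) -> float:
    """Remote write time for synchronously shipped redo:
    one network round trip plus the local write I/O at the secondary site."""
    return rtt_ms + standby_io_ms


def link_is_sufficient(bandwidth_mbit_s: float, peak_redo_mb_s: float) -> bool:
    """True if the link bandwidth exceeds the peak redo generation rate.

    Bandwidth is in megabits per second and redo in megabytes per second,
    so redo is converted at 8 bits per byte before comparing.
    """
    return bandwidth_mbit_s > peak_redo_mb_s * 8


if __name__ == "__main__":
    # Hypothetical: 4 ms round trip and 2 ms standby log write.
    print(remote_write_ms(4.0, 2.0))       # 6.0
    # Hypothetical peak redo of 5 MB/s needs more than 40 Mbit/s.
    print(link_is_sufficient(100.0, 5.0))  # True
    print(link_is_sufficient(34.0, 5.0))   # False
```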
Database Protection Modes
Balance performance with level of protection from human error, data failures, and disasters
Maximum Protection and Maximum Availability modes
  – No-data-loss protection, but can have a performance impact on production service levels
Maximum Performance mode
  – Data loss possible, but less impact on production service levels
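The trade-off between the modes can be made concrete with a toy model of what a commit on the primary does. The Python enum and function are hypothetical and greatly simplified; they summarize documented Data Guard behavior: the two no-data-loss modes wait for the standby to acknowledge redo, Maximum Protection shuts the primary down rather than run when it cannot write redo to a synchronized standby, and Maximum Availability keeps the primary running and resynchronizes the standby later.

```python
from enum import Enum


class ProtectionMode(Enum):
    MAXIMUM_PROTECTION = "Maximum Protection"
    MAXIMUM_AVAILABILITY = "Maximum Availability"
    MAXIMUM_PERFORMANCE = "Maximum Performance"


def commit_outcome(mode: ProtectionMode, standby_reachable: bool) -> str:
    """Toy summary of how a primary commit behaves under each protection mode."""
    if mode is ProtectionMode.MAXIMUM_PERFORMANCE:
        # Commit does not wait for the standby; recent redo may be lost on failover.
        return "commit locally, ship redo asynchronously"
    if standby_reachable:
        # No-data-loss modes wait for the standby's acknowledgment before the commit completes.
        return "commit after standby acknowledges redo"
    if mode is ProtectionMode.MAXIMUM_PROTECTION:
        # The primary will not run without a synchronized standby.
        return "shut down primary"
    # Maximum Availability: stay up, unprotected, until the standby catches up.
    return "commit locally, resynchronize standby later"


if __name__ == "__main__":
    for mode in ProtectionMode:
        for reachable in (True, False):
            print(f"{mode.value}, standby reachable={reachable}: "
                  f"{commit_outcome(mode, reachable)}")
```

Waiting for the acknowledgment is where the performance impact of the no-data-loss modes comes from, which ties this slide back to the network latency discussion.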
Instance Recovery Time
Balance performance with level of protection from system faults and crashes
Short instance recovery times can be achieved with negligible impact on performance
  – Provided sufficient I/O capacity exists to handle the additional data block writes generated
Fast-start checkpointing makes instance recovery time bounded and predictable
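Fast-start checkpointing bounds recovery time by limiting how much redo must be re-applied after a crash; in Oracle9i the target is set in seconds with the `FAST_START_MTTR_TARGET` initialization parameter. The Python sketch below is a toy model, not Oracle's algorithm: it assumes recovery re-applies redo at a fixed, hypothetical rate and that each redo unit held back by the checkpoint costs one early data block write. It shows the trade-off on the slide: a shorter target bounds recovery time but forces more writes.

```python
from typing import Tuple


def simulate(target_s: float, recovery_rate_per_s: float,
             redo_per_tick: int, ticks: int) -> Tuple[float, int]:
    """Toy fast-start checkpointing model (not Oracle's algorithm).

    Returns (worst-case recovery time in seconds, early block writes).
    """
    # Redo that may accumulate past the checkpoint while meeting the target.
    max_pending = int(target_s * recovery_rate_per_s)
    pending = 0       # redo units since the last checkpoint
    early_writes = 0  # block writes forced to advance the checkpoint
    worst_recovery_s = 0.0
    for _ in range(ticks):
        pending += redo_per_tick
        if pending > max_pending:
            # Write out the oldest dirty blocks so the checkpoint catches up.
            early_writes += pending - max_pending
            pending = max_pending
        # Recovery after a crash now would re-apply all pending redo.
        worst_recovery_s = max(worst_recovery_s, pending / recovery_rate_per_s)
    return worst_recovery_s, early_writes


if __name__ == "__main__":
    # Hypothetical workload: 500 redo units per tick, recovery at 1000 units/s.
    print(simulate(10, 1000, 500, 100))  # (10.0, 40000)
    print(simulate(60, 1000, 500, 100))  # (50.0, 0)
```

With the 10-second target, recovery time never exceeds 10 seconds but 40,000 early writes are needed; with the 60-second target, no early writes occur and worst-case recovery grows to 50 seconds in this run.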
MAA Test Lab
Oracle, Sun Microsystems, Hewlett-Packard, EMC, F5 Networks
Figure: MAA test lab configuration (WAN Traffic Manager; Primary Site and Secondary Site, each with Oracle9iAS and RAC; Dedicated Network between sites; Data Guard)
Maximum Availability Architecture
Best Oracle High Availability Architecture
  – What to use
Best Practices
  – How to build it
  – How to manage it
  – How to fix it
MAA Information Sources
Oracle Technology Network
  – High Availability Collateral section
    Maximum Availability Architecture - Overview
    Maximum Availability Architecture - The Details
Oracle Consulting
  – Advanced Technologies Solutions (ATS) Group
Next Steps
Sessions by Oracle Database Development
Monday
  – RAC: The Present, The Future, but not Science Fiction (Mon, 1pm -- Moscone Room 103)
  – Running Your Applications on Oracle Real Application Clusters (Mon, 11am -- Moscone Room 134)
  – Real Customers, Real Application Clusters, Real Results (Mon, 4pm -- Moscone Room 134)
  – Deploying A Highly Manageable Oracle Real Application Clusters Database (Mon, 5:30pm -- Moscone Room 134)
Tuesday
  – Breaking All the Rules with The Unbreakable Database (Tue, 11am -- Moscone Room 103)
  – Oracle’s Recipe For Building An Unbreakable System (Tue, 1pm -- Moscone Room 134)
  – Bullet-Proof Data Protection with Oracle Data Guard (Tue, 4pm -- Moscone Room 134)
For More Info On Oracle HA Go To
Next Steps
Sessions by Oracle Database Development
Wednesday
  – Getting Under The Hood With Data Guard SQL Apply (Wed, 8:30am -- Moscone Room 134)
  – LogMiner, Flashback Query and Online Redefinition: Power Tools For DBAs (Wed, 11am -- Moscone Room 134)
  – Are You Using The Best To Protect Your Enterprise Data? (Wed, 4pm -- Moscone Room 252)
  – Oracle LogMiner - Not Just An Error Recovery Tool (Wed, 5:30pm -- Moscone Room 102)
Database HA Demos All Four Days In The Oracle Demo Campground
  – Real Application Clusters
  – Data Guard
  – Backup & Recovery with Recovery Manager
  – LogMiner, Flashback Query and Online Redefinition
For More Info On Oracle HA Go To
Next Steps
Sessions by Oracle Database Development
Showcase Presentation/Demo
Monday
  – 11:00 AM -- Database High Availability: Data Guard
  – 11:30 AM -- Database High Availability: Backup & Recovery and Recovery Manager
  – 12:00 PM -- Database High Availability: Online Reorg, Flashback Query and LogMiner
Tuesday
  – 11:00 AM -- Real Application Clusters: Scalability
  – 11:30 AM -- Real Application Clusters: High Availability
  – 12:00 PM -- Real Application Clusters: CFS on Linux
Wednesday
  – 11:00 AM -- Real Application Clusters: Scalability
  – 11:30 AM -- Real Application Clusters: High Availability
  – 12:30 PM -- Database High Availability: Data Guard
For More Info On Oracle HA Go To
Q & A
QUESTIONS
ANSWERS