Data Protection with Oracle Data Guard
Jacek Wojcieszuk, CERN/IT-DM
Distributed Database Operations Workshop, November 12th, 2008
CERN IT Department, CH-1211 Genève 23, Switzerland – www.cern.ch/it

Outline
Main challenges
Oracle Maximum Availability Architecture
Backup and recovery at CERN
Data Guard Physical Standby Database
–Disaster recovery for production DBs
–Standby technology for large-scale testing
Possible RMAN backup improvements

Databases for LHC experiments – challenges
Data volume
–Very quick, linear growth of databases (especially during LHC runs)
–Some databases may reach 10 TB as early as next year
Database availability
–On-line databases are highly critical for data taking: a few hours of downtime can already lead to data loss
–Off-line databases are critical for data distribution and analysis

More challenges
As data volume grows:
–the probability of physical data corruption increases
–the frequency of human errors increases
The traditional RMAN and tape-based approach does not fit well into this picture:
–it leads to backup and recovery times proportional to the data volume
–it makes certain recovery scenarios quite complex, e.g. recovery of a single database object

Maximum Availability Architecture (MAA)
Oracle's best-practices blueprint
Goal: achieve the optimal high-availability architecture at the lowest cost and complexity
Helps to minimize the impact of different types of unplanned and planned downtime
Based on Oracle products/features such as:
–RAC
–ASM
–RMAN
–Flashback
–Data Guard
Reference: Oracle MAA page (…/availability/htdocs/maa.htm)

Evolution towards MAA at CERN – DAS era
[Diagram: clients connecting over WAN/Intranet to single-instance databases (DB1, DB2, DB3) on direct-attached storage, backed up with RMAN]

Evolution towards MAA at CERN – RAC + ASM era
[Diagram: clients connecting over WAN/Intranet to a RAC database with ASM on SAN storage, backed up with RMAN]

Evolution towards MAA at CERN – RAC + ASM + on-disk copies
[Diagram: clients connecting over WAN/Intranet to a RAC database with ASM and an on-disk backup copy on SAN storage, backed up with RMAN]

Strong and weak points
Server failure:
–RAC keeps the database available
Failure at the storage level:
–ASM keeps data healthy
Small physical corruption:
–On-disk or on-tape backup can be used efficiently to resolve the problem
Logical corruption (human error):
–On-disk backup can be used to restore damaged data quite quickly
–However, the procedure is labour-intensive and error-prone
Wide-range physical data corruption:
–On-disk backup can be used, if not itself corrupted
–On-tape backup can be used, but time-to-restore is proportional to the database size
Disasters (flood, fire, overvoltage, etc.):
–On-tape backup can be used, but time-to-restore is proportional to the database size

Solution – Data Guard Physical Standby Database
Mature and stable technology:
–Introduced in Oracle 8i
–Relies on old and proven functionality: redo generation and media recovery
Flexible configuration:
–Synchronous or asynchronous propagation of data changes
–Immediate or delayed application of changes on the standby database
–Different protection levels: Maximum Performance, Maximum Availability, Maximum Protection (see the sketch below)
Small performance overhead on the primary system
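
For reference, a minimal sketch of how these protection levels are selected in SQL (standard Oracle syntax, not taken from the original slides; the stricter modes also require synchronous redo transport):

PRIMARY> -- Maximum Performance is the default; the stricter modes need SYNC redo transport
PRIMARY> -- and, in 10g, typically a mounted (not open) primary database
PRIMARY> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
PRIMARY> -- ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
PRIMARY> -- ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;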

Data Guard Physical Standby Database
[Diagram: a primary RAC database with ASM ships data changes over WAN/Intranet to a physical standby RAC database with ASM; RMAN backups are still taken]

Data Guard configuration details
Standby databases configured as RAC
–Servers sized to handle a moderate application load
–Enough disk space to fit all the data and a few days of archived redo logs
–No Flash Recovery Area
Identical OS and Oracle RDBMS versions on primary and standby
–The same patch level
Asynchronous redo transmission with the LGWR process (see the sketch below)
–Standby redo logs necessary
Shipped redo applied with a 24-hour delay
No fast-start failover
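
A minimal sketch of the corresponding redo-transport settings (the destination number, service name and log size are illustrative assumptions, not CERN's actual values):

PRIMARY> -- asynchronous shipping with the LGWR process; DELAY is in minutes (1440 = 24 h)
PRIMARY> -- and is honoured as long as managed recovery is not started with NODELAY
PRIMARY> ALTER SYSTEM SET log_archive_dest_2=
           'SERVICE=stdby_db LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
            DELAY=1440 DB_UNIQUE_NAME=stdby_db' SCOPE=BOTH SID='*';
PRIMARY> ALTER SYSTEM SET log_archive_dest_state_2=ENABLE SID='*';
STDBY>   -- standby redo logs are required for LGWR transport
STDBY>   ALTER DATABASE ADD STANDBY LOGFILE SIZE 512M;
STDBY>   ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;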

Handling human errors
If a logical data corruption is discovered within the assumed lag period (24 hours), the standby database can be used to recover the corrupted data
Procedure:
–Stop all standby RAC instances but one
–Disable managed recovery mode:
STDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
–If needed, roll the standby database forward to a desired point in time
–Create a guaranteed restore point on the standby database:
STDBY> CREATE RESTORE POINT bef_stop GUARANTEE FLASHBACK DATABASE;
–Switch logs and disable log transmission:
PRIMARY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_x=DEFER SID='*';
PRIMARY> ALTER SYSTEM ARCHIVE LOG CURRENT;
STDBY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_x=DEFER SID='*';

Handling human errors (2)
–Open the standby database for writing:
STDBY> ALTER DATABASE ACTIVATE STANDBY DATABASE;
STDBY> ALTER DATABASE OPEN;
–Copy over the corrupted data to the primary database
–Flashback the database and resume standby:
STDBY> STARTUP MOUNT FORCE;
STDBY> FLASHBACK DATABASE TO RESTORE POINT bef_stop;
STDBY> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
STDBY> STARTUP MOUNT FORCE;
STDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
–Cleanup:
STDBY> DROP RESTORE POINT bef_stop;
STDBY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_x=ENABLE SID='*';
PRIMARY> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_x=ENABLE SID='*';

Running tests
The standby database can be opened read-write to perform application tests
–Performance tests
–Change/migration tests
–Etc.
–Especially useful for applications with a lot of data
The procedure is basically identical to the one used for handling human errors
While the standby database is open read-write it cannot receive data changes from the primary system

Handling wide-range corruptions and disasters
If the primary system becomes unavailable, clients can be failed over to the standby database
A Data Guard environment can be configured for either automatic (fast-start failover) or manual failover
The trickiest part is redirecting clients to the standby database in an efficient way
Two approaches:
–DNS alias-based
–Oracle service-based
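
For reference, a minimal sketch of a manual failover to the standby with SQL*Plus (standard 10g commands, not part of the original slides; with the broker this would be a single DGMGRL FAILOVER command):

STDBY> -- apply all available redo, then convert the standby into the new primary
STDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE;
STDBY> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
STDBY> ALTER DATABASE OPEN;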

DNS alias based failover
A DNS alias is assigned to each VIP of the primary RAC
Clients specify those DNS aliases in their connection descriptors
During failover the aliases have to be re-pointed to the VIPs of the standby cluster
Pros:
–Simplicity
Cons:
–Moving the aliases takes noticeable time (~30 min at CERN)
–Problematic when the standby RAC has more nodes than the primary one
[Diagram: aliases alias1/alias2/alias3 pointing at the primary RAC database running Oracle service Service_A; the standby RAC database offers the same service after failover]
Example connection descriptor from the slide:
App_A =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=alias1)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=alias2)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=alias3)(PORT=1521))
    ...
    (SERVICE_NAME = Service_A)
    ...
  )

Oracle service based failover
Clients specify both primary and standby database VIPs in the connection descriptor
The Oracle service used by the clients is started only on the primary
After failover a trigger starts the service on the standby database (sketched below)
Clients use SQLNET.OUTBOUND_CONNECT_TIMEOUT to avoid hanging on unreachable hosts
Pros:
–Relatively short downtime associated with the failover
Cons:
–Quite complex setup
[Diagram: clients connecting through VIPs prim1/prim2/prim3 of the primary RAC database and stdby1/stdby2/stdby3 of the standby RAC database; service Service_A runs only on the current primary]
Example connection descriptor from the slide:
App_A =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=prim1)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=prim2)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=prim3)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=stdby1)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=stdby2)(PORT=1521))
    (ADDRESS = (PROTOCOL=TCP)(HOST=stdby3)(PORT=1521))
    ...
    (SERVICE_NAME = Service_A)
    ...
  )
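
A minimal sketch of the kind of role-aware trigger the slide refers to (object and service names are illustrative, not CERN's actual code); created on the primary, it also reaches the standby through redo apply and starts the service only where the database role is PRIMARY:

PRIMARY> EXEC DBMS_SERVICE.CREATE_SERVICE(service_name => 'Service_A', network_name => 'Service_A');
PRIMARY> CREATE OR REPLACE TRIGGER manage_service_a
         AFTER STARTUP ON DATABASE
         DECLARE
           v_role VARCHAR2(30);
         BEGIN
           SELECT database_role INTO v_role FROM v$database;
           IF v_role = 'PRIMARY' THEN
             DBMS_SERVICE.START_SERVICE('Service_A');  -- offer the service only on the current primary
           END IF;
         END;
         /

On the client side the timeout keeps connection attempts to unreachable hosts short (value illustrative):

# sqlnet.ora
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 3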

Possible improvements – Active Data Guard
New feature of Oracle 11gR1:
–Extra-cost option for Oracle RDBMS 11g Enterprise Edition
Allows clients to connect to and query a physical standby database while managed recovery is enabled:
–Better resource utilization
–Excellent fit for read-only workloads
–Many potential use-cases at CERN, e.g. the CMS on-line database
Extensive tests started
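
For context, a minimal sketch of putting a physical standby into Active Data Guard mode in 11gR1 (standard commands, not from the original slides):

STDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
STDBY> ALTER DATABASE OPEN READ ONLY;
STDBY> -- real-time query: redo apply continues while the standby is open read-only
STDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;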

Possible improvements – LAN-free tape backups
Traditionally tape backups are sent over a general-purpose network to a media management server:
–This may limit backup speed to ~80 MB/s
–At the same time tape drives can archive data at … MB/s (compressed)
Tivoli Storage Manager supports so-called LAN-free backup (see the RMAN sketch below):
–Backup data flows to the tape drives directly over the SAN
–The Media Management Server is used only to register backups
–Preliminary tests show that with 2 tape drives a database can be backed up at ~400 MB/s
[Diagram: RMAN backups flowing from the database directly to the tape drives over the SAN, with only metadata going to the Media Management Server]
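
A minimal sketch of pointing RMAN at the Tivoli media-management library (the library path, option file and parallelism are illustrative assumptions, not CERN's actual settings; with the LAN-free storage agent the data path goes over the SAN and only metadata crosses the LAN):

RMAN> CONFIGURE CHANNEL DEVICE TYPE sbt
        PARMS 'SBT_LIBRARY=/opt/tivoli/tsm/client/oracle/bin64/libobk.so,
               ENV=(TDPO_OPTFILE=/opt/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
RMAN> CONFIGURE DEVICE TYPE sbt PARALLELISM 2;       -- two drives, ~400 MB/s in the tests
RMAN> BACKUP DEVICE TYPE sbt DATABASE PLUS ARCHIVELOG;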

Possible improvements – Using a disk pool instead of tapes
Tape infrastructure is expensive and difficult to maintain:
–costly hardware and software
–noticeable maintenance effort
–tape media is quite unreliable and needs to be validated
At the same time disk space is getting cheaper and cheaper:
–1.5 TB SATA disks already available
A pool of disks can easily be configured as the destination for RMAN backups (see the sketch below):
–Can simplify the backup infrastructure
–Can improve backup performance
–Can increase backup reliability
Several configurations possible:
–NFS-mounted disks
–SAN-attached storage with CFS
–SAN-attached storage with ASM
Tests in progress
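
A minimal sketch of directing RMAN backups to such a disk pool (the mount point and parallelism are illustrative assumptions):

RMAN> CONFIGURE DEFAULT DEVICE TYPE TO DISK;
RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4;
RMAN> -- the format points at a directory on NFS/CFS; with ASM it would be a disk group, e.g. '+BACKUPDG'
RMAN> CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/backup_pool/%d/%U';
RMAN> BACKUP AS COMPRESSED BACKUPSET DATABASE PLUS ARCHIVELOG;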

Conclusions
Production databases are growing very large
–Recovery time in case of failure becomes critical
Procedures successfully tested to keep recovery times manageable:
–On-disk backup
–Physical Standby for disaster recovery
–Physical Standby for large-scale testing
–RMAN backup optimisations

Q&A
Thank you