Emil Pilecki
Credit: Luca Canali, Marcin Blaszczyk, Steffen Pade.

Agenda
- About CERN
- Oracle and Data Guard at CERN
- DG perks and benefits
- Zero data loss over long distances (far sync)
- Far sync testing results

3 About CERN
- European Organization for Nuclear Research
- Founded in …; … member states, 2 candidates, 6 observers + UNESCO and EU
- 60 non-member states collaborate with CERN
- 2500 staff members and scientists

4 LHC and Experiments
- Large Hadron Collider (LHC): particle accelerator that collides beams at very high energy
- 27 km long circular tunnel
- Located ~100 m underground
- Protons travel at …% of the speed of light
- Collisions are analysed using special detectors and software in the experiments dedicated to the LHC
- New particle discovered! Consistent with the Higgs boson; announced on July 4th 2012

5 Oracle at CERN
- Since 1982, version 2.3
- Oracle DBs play a key role in the LHC production chains
  - Accelerator logging and monitoring systems
  - Online acquisition, offline data (re)processing, data distribution, analysis
  - Grid infrastructure and operation services
    - Monitoring, dashboards, etc.
  - Data management services
    - File catalogues, file transfers, etc.
  - Metadata and transaction processing for the tape storage system
- Administrative services

6 CERN's Databases
- Over 100 Oracle databases, mostly RAC
- NAS storage plus some SAN with ASM
- ~400 TB of data files for production DBs
- Examples of CERN's critical DBs:
  - LHC logging database: ~170 TB, expected growth up to 70 TB/year
  - 13 production experiments' databases: ~140 TB in total
- 15 production systems protected with Data Guard
- Active Data Guard since 11g
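
The growth figures above can be turned into a rough capacity projection. The sketch below is a minimal Python illustration that assumes the upper bound of 70 TB/year is reached every year and that growth stays linear; neither assumption is stated on the slide, and `projected_size_tb` is a hypothetical helper.

```python
def projected_size_tb(current_tb: float, growth_tb_per_year: float, years: int) -> float:
    """Linear upper-bound projection: assumes the maximum yearly growth every year."""
    return current_tb + growth_tb_per_year * years


# LHC logging database: ~170 TB today, up to 70 TB/year (figures from the slide).
for year in range(4):
    print(f"year +{year}: ~{projected_size_tb(170, 70, year):.0f} TB")
```

Under these assumptions the logging database alone would reach about 380 TB after three years.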

7 Our Data Guard architecture
Two configurations (from the diagram):
1. Low-load ADG: primary database plus one Active Data Guard standby serving both read-only workloads and disaster recovery
2. Busy & critical ADG: primary database plus one Active Data Guard standby for disaster recovery and a second one for read-only workloads
Redo transport in Maximum Performance mode:
LOG_ARCHIVE_DEST_X='SERVICE= OPTIONAL ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME= '
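
The LOG_ARCHIVE_DEST_X value on the slide has its service and DB_UNIQUE_NAME elided. As a hedged illustration, the Python helper below assembles a value in the same format; `log_archive_dest`, `stby_tns`, `stby_db` and destination number 2 are hypothetical placeholders, and the generated `ALTER SYSTEM ... SCOPE=BOTH` statement should be checked against the Oracle reference for your version before use.

```python
def log_archive_dest(service: str, db_unique_name: str, sync: bool) -> str:
    """Build a LOG_ARCHIVE_DEST_n value in the format used on the slides.

    sync=False gives the Maximum Performance setting (ASYNC NOAFFIRM);
    sync=True gives the zero-data-loss setting (SYNC AFFIRM).
    """
    mode = "SYNC AFFIRM" if sync else "ASYNC NOAFFIRM"
    return (
        f"SERVICE={service} OPTIONAL {mode} "
        "VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) "
        f"DB_UNIQUE_NAME={db_unique_name}"
    )


# Hypothetical names: the real values are not given on the slide.
value = log_archive_dest("stby_tns", "stby_db", sync=False)
print(f"ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='{value}' SCOPE=BOTH;")
```

Keeping the transport mode as a single switch makes it explicit that the ASYNC and SYNC setups differ only in those two keywords.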

8 (Active) Data Guard benefits
Features and functionalities we profit from:
- Data protection for disaster recovery
- Replication and offloading of read-only workload
- Database backups from standby
- Safeguard against logical data corruptions with flashback
- Snapshot standby for testing
- Fast upgrades and hardware migrations
- Detection of lost writes
- Automatic block media recovery

9 Disaster recovery
- We have been using it for a few years
- Switchover/failover is our first line of defence
- It has already saved the day for production services
- Current disaster recovery site is 10 km away from our main datacentre
- Remote site in Hungary to be used soon
  - Over 1000 km distance
  - Network latency of 25 ms is a challenge
  - Plan to move most of the standby databases there within 1 year
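
Part of that latency is physics. A back-of-the-envelope Python sketch, assuming light travels through optical fibre at roughly 200 km per millisecond (about two thirds of c) and that the fibre follows the straight-line distance; real routes are longer and network equipment adds delay, which presumably accounts for much of the gap to 25 ms. The slide does not say whether 25 ms is one-way or round trip, and `min_round_trip_ms` is a hypothetical helper.

```python
FIBRE_KM_PER_MS = 200.0  # approx. speed of light in optical fibre (~2/3 of c)


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBRE_KM_PER_MS


print(f"10 km DR site:        >= {min_round_trip_ms(10):.2f} ms round trip")
print(f"1000 km Hungary site: >= {min_round_trip_ms(1000):.2f} ms round trip")
```

Even this lower bound is about 100 times higher at the new site than at the current 10 km site, which is why synchronous redo transport becomes expensive there.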

10 Offloading production databases
- Efficient replication of the whole database
- Workload distribution
  - Transactional workload runs on the primary
  - Read-only workload can be moved to ADG
  - Read-mostly workload: DMLs can be redirected to the primary with a dblink
- Database backups from standby
  - Significantly reduce load on the primary by removing the sequential I/O of full backups
  - ADG allows the use of block change tracking for fast incremental backups
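
To illustrate why block change tracking makes incremental backups fast, here is a toy Python model: the tracker remembers which blocks were written since the last backup, so the backup reads only those instead of scanning the whole datafile. This is a simplification; Oracle's change tracking file is managed internally and keeps more history than a single set, and `BlockChangeTracker` is a hypothetical name.

```python
class BlockChangeTracker:
    """Toy model of block change tracking (not Oracle's actual file format)."""

    def __init__(self) -> None:
        self.changed: set[int] = set()

    def record_write(self, block_id: int) -> None:
        # Called on every block write; only the block number is remembered.
        self.changed.add(block_id)

    def incremental_backup(self, datafile: dict[int, bytes]) -> dict[int, bytes]:
        # Read only the tracked blocks, then start tracking afresh.
        backup = {block: datafile[block] for block in sorted(self.changed)}
        self.changed.clear()
        return backup


datafile = {block: b"v0" for block in range(10_000)}
tracker = BlockChangeTracker()
for block in (42, 4242):
    datafile[block] = b"v1"
    tracker.record_write(block)

backup = tracker.incremental_backup(datafile)
print(f"blocks read: {len(backup)} of {len(datafile)}")
```

Without tracking, the incremental backup would have to read all 10,000 blocks to find the two that changed.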

11 Flashback and snapshot standby
- Flashback enabled on standby only
  - Recover from human errors and data corruptions
  - Avoid impacting the primary database with flashback log generation
- Snapshot standby
  - Test changes before implementing them on the primary
  - Safe: redo is still sent to the standby
  - Very easy to use:
    SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
    SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
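
A simplified Python model of the snapshot standby lifecycle described above: redo keeps arriving in both roles but is only applied while the database is a physical standby; converting back discards the test changes (Oracle does this by flashing back to a restore point) and then applies the queued redo. `StandbyModel` and its methods are hypothetical and ignore operational details such as stopping redo apply and restarting the instance.

```python
class StandbyModel:
    """Toy model of physical vs snapshot standby behaviour (simplified)."""

    def __init__(self) -> None:
        self.role = "PHYSICAL STANDBY"
        self.applied: list[str] = []        # redo applied to the standby datafiles
        self.pending: list[str] = []        # redo received but not yet applied
        self.local_changes: list[str] = []  # test changes made in snapshot mode

    def receive_redo(self, record: str) -> None:
        # Redo keeps arriving from the primary in both roles.
        if self.role == "PHYSICAL STANDBY":
            self.applied.append(record)
        else:
            self.pending.append(record)

    def convert_to_snapshot(self) -> None:
        self.role = "SNAPSHOT STANDBY"

    def local_write(self, change: str) -> None:
        if self.role != "SNAPSHOT STANDBY":
            raise RuntimeError("physical standby is read-only")
        self.local_changes.append(change)

    def convert_to_physical(self) -> None:
        # Discard the test changes, then catch up on the queued redo.
        self.local_changes.clear()
        self.applied.extend(self.pending)
        self.pending.clear()
        self.role = "PHYSICAL STANDBY"


standby = StandbyModel()
standby.receive_redo("r1")
standby.convert_to_snapshot()
standby.local_write("test DDL")
standby.receive_redo("r2")  # received, but not applied yet
standby.convert_to_physical()
print(standby.applied, standby.local_changes)
```

The model shows why testing on a snapshot standby is safe: no redo is lost while testing, only its application is deferred.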

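12 Fast upgrades and migrations
Diagram (steps):
1. Clusterware 11g + RDBMS 11g with RW access; redo transport to a new cluster running Clusterware 12c + RDBMS 11g
2. RW access moves to the new cluster (Clusterware 12c + RDBMS 11g)
3. RDBMS upgrade: database downtime
4. Clusterware 12c + RDBMS 12c: upgrade complete!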

13 Fast upgrades and migrations
- Risk mitigation
  - Fresh installation of the new clusterware
  - Old system stays untouched
  - Allows a full upgrade test
  - Allows stress testing of the new system
- Downtime reduction
  - ~1 h for the RDBMS upgrade
- Additional hardware required, unless a migration to new hardware is expected anyway

14 Lost write detection and ABMR
Alert log excerpt:
  Slave exiting with ORA-752 exception
  Errors in file /ORA/dbs0a/PDBR_RAC50/diag/rdbms/pdbr_rac50/PDBR1/trace/PDBR1_pr0l_92600.trc:
  ORA-00752: recovery detected a lost write of a data block
  ORA-10567: Redo is inconsistent with data block (file# 67, block# , file offset is bytes)
  ORA-10564: tablespace STRMMON
  ORA-01110: data file 67: '/ORA/dbs03/PDBR_RAC50/datafile/STRMMON_67.dbf'
  ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object#
  Mon Apr 14 06:52:
  Recovery Slave PR0L previously exited with exception 752
- Stops redo application when a lost write is detected
  - Previous consistent block version still on standby
  - Helps to diagnose and repair the error
- Automatic Block Media Recovery with ADG
  - Fixes physical block corruptions
  - Works both ways: primary <-> ADG
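
The detection relies on the primary recording in its redo the SCN of block versions it reads (lost write protection, controlled by the DB_LOST_WRITE_PROTECT parameter; the slide does not say which setting is used). A simplified Python sketch of the comparison the standby performs during redo apply; `check_block_read` is a hypothetical helper, not an Oracle API.

```python
def check_block_read(primary_read_scn: int, standby_block_scn: int) -> str:
    """Simplified lost-write check performed on the standby during redo apply.

    primary_read_scn: SCN of the block version the primary read, as logged in redo.
    standby_block_scn: SCN of the same block currently on the standby.
    """
    if primary_read_scn < standby_block_scn:
        # The primary read an older version than the standby holds:
        # a write on the primary was lost (reported as ORA-00752).
        return "lost write on primary"
    if primary_read_scn > standby_block_scn:
        # The standby holds an older version than the primary read:
        # a write on the standby was lost.
        return "lost write on standby"
    return "consistent"


for scns in ((100, 120), (120, 100), (100, 100)):
    print(scns, check_block_read(*scns))
```

Because the standby stops applying redo at this point, the last consistent version of the block is still available there, which is what makes diagnosis and repair possible.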

15 Zero data loss replication
- Use the synchronous redo transport method
- DML statements are impacted because each commit must be acknowledged by the standby
LOG_ARCHIVE_DEST_X='SERVICE= OPTIONAL SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME= '
Diagram: primary database -> redo transport -> Data Guard standby; commit acknowledgement returned to the primary
Network latency matters!!!
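
A rough Python model of why latency matters with SYNC AFFIRM. It assumes the local redo write and the shipment to the standby run in parallel, so a commit waits for the slower of the two; the write times are made-up values, and `commit_latency_ms` is a hypothetical helper, so treat the numbers as illustrative only.

```python
def commit_latency_ms(local_write_ms: float, remote_write_ms: float,
                      rtt_ms: float, sync: bool) -> float:
    """Approximate commit latency seen by a session.

    ASYNC: the commit waits only for the local redo write.
    SYNC AFFIRM: the commit also waits for the standby acknowledgement
    (network round trip + remote write), overlapped with the local write.
    """
    if not sync:
        return local_write_ms
    return max(local_write_ms, rtt_ms + remote_write_ms)


LOCAL_MS = 1.0   # hypothetical local redo write time
REMOTE_MS = 1.0  # hypothetical standby redo write time
for rtt in (0.5, 5.0, 25.0):
    latency = commit_latency_ms(LOCAL_MS, REMOTE_MS, rtt, sync=True)
    print(f"RTT {rtt:5.1f} ms -> SYNC commit ~{latency:.1f} ms, "
          f"at most ~{1000 / latency:.0f} serial commits/s per session")
```

Once the round trip dominates the local write, every extra millisecond of network latency lands directly on each commit.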

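16 Far Sync concepts
- Long distances = high network latency = slow commit acknowledgement with SYNC redo transport
Diagram: the primary ships redo SYNC to a nearby Far Sync instance, which forwards it ASYNC over the 25 ms link to the remote standby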
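
A minimal Python sketch comparing the two topologies under the same simplified commit model: SYNC directly to the remote standby versus SYNC to a Far Sync instance close to the primary, which then forwards redo asynchronously. It treats the 25 ms figure as a round trip; the write times and the Far Sync round trip are hypothetical values chosen for illustration.

```python
def sync_commit_ms(local_write_ms: float, remote_write_ms: float, rtt_ms: float) -> float:
    """SYNC AFFIRM commit latency: wait for the slower of local write and remote ack."""
    return max(local_write_ms, rtt_ms + remote_write_ms)


LOCAL_WRITE_MS = 1.0    # hypothetical local redo write time
REMOTE_WRITE_MS = 1.0   # hypothetical redo write time on the receiving side
REMOTE_RTT_MS = 25.0    # latency quoted on the slide, treated here as round trip
FAR_SYNC_RTT_MS = 0.5   # hypothetical: Far Sync instance close to the primary

direct = sync_commit_ms(LOCAL_WRITE_MS, REMOTE_WRITE_MS, REMOTE_RTT_MS)
via_far_sync = sync_commit_ms(LOCAL_WRITE_MS, REMOTE_WRITE_MS, FAR_SYNC_RTT_MS)
# The Far Sync instance forwards redo to the remote standby asynchronously,
# so the long-distance hop is no longer on the commit path.
print(f"SYNC directly to remote standby: ~{direct:.1f} ms per commit")
print(f"SYNC to Far Sync, ASYNC onward:  ~{via_far_sync:.1f} ms per commit")
```

Real workloads spend time on more than commit waits, so the end-to-end gain is smaller than this per-commit ratio.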

17 Far Sync testing at CERN
- Functional
  - Does it work? Are there any bugs?
- Performance
  - Simulated heavy DML workload with and without Far Sync
  - Oracle Real Application Testing: workload captured from production databases
Diagram: redo transport over a 25 ms link via Far Sync

18 Far Sync testing results
Functional tests
- It works well!!! but…
  - Bug …: FRA not cleaned up automatically on the FAR SYNC instance
  - Bug …: failover to an alternate destination does not work with FAR SYNC
  - Both bugs still present in production
- Some configuration issues with Data Guard Broker

19 Far Sync testing results
Performance tests with a simulated heavy DML workload:
- 256 parallel sessions inserting data in 500-row batches, 50 batches per session
- The target table is partitioned and indexed: 4 local b-tree indexes, 6 local bitmap indexes, and a global primary key index with reversed keys
- Each session inserts data into its own partition
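
The size of the synthetic test follows directly from the parameters above. A small Python sketch that computes it and generates each session's batch plan; the partition names (`P000`, `P001`, ...) are hypothetical, and the assumption of one commit per batch is ours, not stated on the slide.

```python
SESSIONS = 256
BATCHES_PER_SESSION = 50
ROWS_PER_BATCH = 500

total_rows = SESSIONS * BATCHES_PER_SESSION * ROWS_PER_BATCH
total_batches = SESSIONS * BATCHES_PER_SESSION


def session_plan(session_id: int):
    """Yield (partition, first_row, last_row) for every batch of one session.

    Each session writes to its own partition, as in the test description.
    """
    partition = f"P{session_id:03d}"  # hypothetical partition naming
    for batch in range(BATCHES_PER_SESSION):
        first = batch * ROWS_PER_BATCH
        yield partition, first, first + ROWS_PER_BATCH - 1


# If every batch ends with a commit, each session waits for 50 commit
# acknowledgements; at 25 ms each that is ~1.25 s of pure network wait.
print(total_rows, total_batches, BATCHES_PER_SESSION * 25 / 1000)
```

Giving each session its own partition keeps sessions from contending on the same blocks, so differences between runs mostly reflect redo transport.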

20 Far Sync testing results
Performance tests with the Oracle Real Application Testing framework:
- Real production workload captured per schema
- Workload replayed with and without Far Sync, 25 ms latency
- Replay parameters: connect_time_scale=0, think_time_scale=0
- CMSR: mostly DML workload
- LCGR: read-only workload
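
As we understand the Real Application Testing replay options, connect_time_scale and think_time_scale are percentages applied to the captured connection times and think times (100 keeps the captured timing). Setting both to 0, as in these tests, removes the gaps so sessions connect immediately and issue calls back to back, which makes commit latency the dominant factor. A hedged Python sketch of that scaling; `scaled_delay_s` is a hypothetical helper and the captured values are made up for illustration.

```python
def scaled_delay_s(captured_delay_s: float, scale_percent: float) -> float:
    """Replay delay after scaling: 100 keeps the captured timing, 0 removes it."""
    return captured_delay_s * scale_percent / 100


# Made-up example capture: connection offsets and think times in seconds.
captured_connect_offsets = [0.0, 30.0, 120.0]
captured_think_times = [2.0, 0.5, 1.5]

connect_time_scale = 0
think_time_scale = 0
print([scaled_delay_s(t, connect_time_scale) for t in captured_connect_offsets])
print([scaled_delay_s(t, think_time_scale) for t in captured_think_times])
```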

21 Far Sync summary
- Very promising for long-distance replication if data loss is not acceptable
- Up to 60% performance gain (DML-only workloads) with 25 ms network latency
- Lightweight and easy to deploy (virtual machine)
- If latency is below 5 ms, you most likely don't need Far Sync
- There are still bugs that need fixing

22 Discussion