Backup and restore of Oracle databases: introducing a disk layer

Presentation transcript:

Backup and restore of Oracle databases: introducing a disk layer by Ruben Gaspar IT-DB-DBB BR evolution: Backup to disk

Agenda
- CERN Oracle databases & Oracle backup basics
- Backup to disk implementation details
- Recovery platform
- Some bits of backup to disk backend
- Summary


Target Oracle databases for backup to disk
- ~70 Oracle databases, most of them running Oracle Clusterware (RAC)
  - 49 are backed up to disk and then to tape
  - 21 are backed up with snapshots only: test and development instances
- 15 Data Guard RAC clusters in production
  - Active Data Guard since the upgrade to 11g
  - They are backed up to tape only
- 10 Oracle single instances in DBaaS, also backed up using snapshots
[Diagram label: Redo Transport]

Oracle backup basics
- The Oracle clock: System Change Number (SCN)
  - It would take 544 years to run out of SCNs at 16K/s
  - smon_scn_time tracks time versus SCN
- Types of backups
  - Consistent: taken after the database has been cleanly shut down. All redo has been applied to the data files. Archive logs are not produced.
  - Inconsistent: taken while the database is running. The database must be in archivelog mode, which means archive logs are produced. Point-in-time recoveries (PITR) are possible.
    Drawback: clean-up of archive logs is critical to keep the database from blocking → TSM was playing a critical role here
- Backup sets: Oracle proprietary format for backups. Binary files.
  - Backup sets are containers for one or several backup pieces
  - Backup pieces contain blocks of one or several data files (multiplexing)
- RMAN channels: disk, tape or proxy. They read data files and write to the backup media.
  - We use SBT (System Backup to Tape) API, using IBM Tivoli Data Protection 6.3 (provided by TSM support)
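The 544-year figure on this slide can be checked with a few lines of arithmetic. The sketch below assumes a 48-bit SCN counter (not stated on the slide) and reads "16K/s" as 16 * 1024 SCNs consumed per second. Both are assumptions, and the function name is purely illustrative.

```python
# Rough check of the "544 years" SCN headroom figure.
# Assumptions (not stated on the slide): the SCN is a 48-bit counter,
# and "16K/s" means 16 * 1024 = 16384 SCNs consumed per second.

SCN_BITS = 48
SCN_RATE_PER_SECOND = 16 * 1024
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_until_scn_exhaustion(bits: int = SCN_BITS,
                               rate_per_second: int = SCN_RATE_PER_SECOND) -> float:
    """Return how many years a counter of `bits` bits lasts at the given rate."""
    total_scns = 2 ** bits
    seconds = total_scns / rate_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_until_scn_exhaustion():.1f} years")
```

Under these assumptions the result lands just under 545 years, which truncates to the 544 quoted on the slide.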

Oracle backup basics (II)
- Backup jobs based on templates. Recovery Manager API:
    --Full
    backup incremental level 0 database;
    --Cumulative
    backup incremental level 2 cumulative database;
    --Incremental
    backup incremental level 1 database;
    --Archivelogs
    backup tag 'BR_TAG' archivelog all delete all input;
- Retention policy from 60 to 90 days, depending on the DB
    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 90 DAYS;
  - e.g. LEMONRAC → [1xfull + 6xdifferential + archivelogs] * 13 weeks
  - e.g. LHCBSTG → [2xfull + 5xdifferential + 24x4 archivelogs] * 13 weeks = 934GB
- Controlfile backup, automatically taken by each backup
    CONFIGURE CONTROLFILE AUTOBACKUP ON;
[Diagram labels: Full, Cum., Inc, Arch, PITR]

DB         Fulls (GB)   Inc (GB)   Archived logs (GB)   Total (GB)
LEMONRAC   87902.42     857.52     13319.39             102079.32
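As a rough illustration of template-based backup jobs, the sketch below maps a backup type to one of the RMAN commands listed on this slide and wraps it in a run block. The dictionary keys and the build_rman_script helper are hypothetical names introduced here. CERN's actual framework is written in Perl and Bash and is not shown in the slides.

```python
# Minimal sketch of template-driven RMAN job generation.
# The template names and the build_rman_script helper are hypothetical;
# only the RMAN commands themselves are taken from the slide.

RMAN_TEMPLATES = {
    "full": "backup incremental level 0 database;",
    "cumulative": "backup incremental level 2 cumulative database;",
    "incremental": "backup incremental level 1 database;",
    "archivelogs": "backup tag '{tag}' archivelog all delete all input;",
}


def build_rman_script(backup_type: str, tag: str = "BR_TAG") -> str:
    """Wrap the command for `backup_type` in an RMAN run block."""
    try:
        command = RMAN_TEMPLATES[backup_type].format(tag=tag)
    except KeyError:
        raise ValueError(f"unknown backup type: {backup_type!r}") from None
    return "run {\n  " + command + "\n}\n"


if __name__ == "__main__":
    print(build_rman_script("archivelogs"))
```

Keeping the commands in one table means a new backup strategy is a new entry rather than new code, which is in the spirit of the "about 75 templates" mentioned later in the talk.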


What is there to be backed up?
Backup jobs using the RMAN API take care of:
- Database files: user and system files
- Control files: contain the structure and status of the data files. They also hold the complete backup history
- Archived logs: backups of the redo logs. Needed for inconsistent backup strategies. They must be backed up and removed from the active file system; otherwise, if it runs out of space, the database freezes/stops.
  - 5.1TB of redo logs produced per day
ALL THREE ARE CRITICAL FOR A BACKUP/RECOVERY strategy


Backup architecture
- Custom solution: about 15k lines of code, Perl + Bash
- Flexible: easy to adapt to a new Oracle release or backup media
- Based on Oracle Recovery Manager (RMAN) templates
- Central logging
- Easy to extend via Perl plug-ins: snapshot, exports, RO tablespaces,…
- We send compressed:
  - 1 out of 4 full backups
  - All archivelogs

Impact on TSM
Savings depend on database workload, e.g. backup sets on disk for three databases:

DB         Full (GB)   Inc (GB)    Archived logs (GB)   Savings
EDHP       29197.76    1216.697    2169.766             70%
CASTORNS   4944.839    213.256     336.2889             71%
ATLASSTG   1484.146    724.9567    3063.658             45%

- x 1/4 sent to tape
- Backup sets are also compressed (see later)
Source: TSM support
[Chart label: Savings ~71%]

Impact on TSM (II)
- 15 accounts: alicestg, atlasstg, cmsstg, castorns,.. → ~70% savings
- 29 accounts: pdb, wcernp, ITCORE, AISDBP,… → ~47% savings
Source: TSM support

Workflow for disk/tape backups
- Same workflow as for tape backups → to ease maintenance
- Disk and tape templates are almost identical, only the channel allocation differs
- Disk channel allocation is calculated on the fly, considering available space in the aggregate and file system, using the NetApp management API called ZAPI
- About 75 templates to adapt to all types of backup strategies
- Tape and disk backup strategies co-exist
- Reversible: changing from one to the other is a matter of changing templates

Typical DB architecture
[Diagram labels: LAN, public interfaces, RAC, 10GbE interconnect, C-mode cluster interconnect, private network, 1 GbE, 10GbE, 7-mode, 6Gb/s, media manager server, mgmt network, nodes 01 to 04, backup01, backup02, IBM TSM; contents: archivelogs, controlfile, datafiles]
At least 2 file systems for backup to disk: /backup/dbsXX/DBNAME

New C-mode features
- Transparent file system movements:
    cluster01::> volume move start -destination-aggregate aggr1_c01n02 -vserver vs1 -volume castorns03 -cutover-window 10
- DNS load balancing inside the cluster
- Automatic virtual IP rebalancing (based on failover groups)
- Access security via "export-policy": combines firewall + different authentication mechanisms: sys, krb5, ntlm
- Global namespace
- Compression and deduplication
  - We rely strongly on compression to satisfy 2.3PB of backup set storage needs using 1.1PB of disk

Notes: The DNS load balancing zone dynamically calculates the load on all LIFs. Every LIF is assigned a weight based on its port load and the CPU utilization of its home node. LIFs that are on less-loaded ports have a higher probability of being returned in a DNS query. Weights can also be assigned manually; the commands to do so are available only at the advanced privilege level.
* Automatic LIF rebalancing gets disabled if a LIF is enabled for automatically reverting to the home port (by enabling the auto-revert option in the network interface modify command).

Backup to disk configuration on database servers
- Global namespace in use: /backup/dbsXX
  - Eases management: the mount point is unchanged as data moves. It is a NetApp C-mode feature (see later)
  - 7-mode: mount -o … priv-controllerIP:/vol/castorns03 /ORA/dbs03/CASTOR
  - C-mode: mount -o … public-ip-cluster:/backup/dbs01/CASTORNS /backup/dbs01/CASTORNS
- /backup/dbs01/<DBNAME> → autobackup controlfile + backupsets
- /backup/dbsXX/<DBNAME> → backupsets
- RMAN configuration parameters: minimal change
    CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/dbs01/<DBNAME>/<DBNAME>_%F';

Particular cases
- Solution also operational in a Data Guard configuration: full and incremental backups are taken on the standby (more on this when talking about restores)
- Multiple channels: rman_channels_connect, in order to distribute the backup load
    username/password@rac-node1
    username/password@rac-node2
- Plug-in for RO tablespaces backup (ACCLOG: size about 170TB, growth 70TB/year)
  - Automatic clean-up in case of tablespace state change
  - One backup set per tablespace
- Extension to allow special mount points (ACCLOG): rman_mounts_readonly
[Diagram labels: Primary Database, Redo Transport, Active Data Guard for users' access and for disaster recovery; full + incremental + controlfile; archivelogs + controlfile]

Backup to disk performance
- Backups run ~50% faster than on tape
  - ACCLOG full backup, 5TB:
    - Tape: 34 hours, ~35 MB/s
    - Disk: 14 hours, ~100 MB/s
- Sending backup sets from disk to tape needs optimisation
  - Work in progress with TSM support

Backup to disk space consumption
- Channel order is important → storage management
- Space distribution should follow the planning to avoid imbalance: file systems should grow at the same pace
- The emptiest volume is always selected first
- Automatic size extension
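The "emptiest volume first" rule can be sketched as a small function that turns free-space figures into an ordered list of RMAN channel allocations. This is only an illustration. In the setup described in the talk the free space would come from the NetApp ZAPI, whereas here it is passed in as a plain dictionary, and the function name, channel names and the %U file format are assumptions.

```python
# Sketch: order disk channels so the emptiest file system comes first.
# Free-space figures would come from the storage API (ZAPI in the talk);
# here they are passed in as a plain dict. All names are hypothetical.
from typing import Dict, List


def allocate_disk_channels(free_gb_by_mount: Dict[str, float],
                           db_name: str,
                           channels: int) -> List[str]:
    """Return RMAN ALLOCATE CHANNEL lines, emptiest mount point first."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    if not free_gb_by_mount:
        raise ValueError("at least one mount point is required")
    # Most free space first; ties broken by mount name for stable output.
    ordered = sorted(free_gb_by_mount, key=lambda m: (-free_gb_by_mount[m], m))
    lines = []
    for i in range(channels):
        mount = ordered[i % len(ordered)]  # wrap around if channels > mounts
        lines.append(
            f"allocate channel d{i + 1} device type disk "
            f"format '{mount}/{db_name}/%U';"
        )
    return lines


if __name__ == "__main__":
    free = {"/backup/dbs01": 100.0, "/backup/dbs02": 250.0}
    for line in allocate_disk_channels(free, "CASTORNS", 3):
        print(line)
```

Recomputing the order on every run is what lets the file systems grow at roughly the same pace without manual rebalancing.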


Recovery platform
- Only reliable proof of truth: run a recovery
  - Any change introduced in the backup platform or backup strategy is always validated via test recoveries
- Isolation
  - Runs independently of the production database
  - Can't access any other system (database network links)
  - No user jobs must run
- Flexible and easy to customize
- Maximize the recovery server: several recoveries at the same time
- Exports taken after a successful recovery → help in support cases, mainly logical errors
- Open source: http://sourceforge.net/projects/recoveryplat/

Recovery platform (II)
- Introducing a disk buffer greatly improves our recovery testing
- Also tested with Data Guard configurations
  - Data Guard: Oracle support ID 1070039.1
      RMAN> set backup files for device type disk to accessible
- Restores from disk are usually 50% faster
- More recoveries can be run, nowadays about 40 recoveries per week
- No blocking of tape resources that could be used by backups


Backup to disk cluster
- 2x FAS6240 NetApp controllers
- 24x DS4243 disk shelves, 24x 3TB SATA disks each (576 disks)
- raid_dp (RAID 6) → 1.1 PB usable space split into 8 aggregates, ~135TB each
- 2x quad-core 64-bit Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
- 10 Gbps connectivity
- Multipath SAS loops, 3 Gbps
- Flash Cache: 512GB per node

How fast, how compressed
- Compression (datafiles): online compression of datafiles ~55% (saved by compression)
- Backup set compression of a 501 GB tablespace of random alphanumeric strings (dbms_random)*

Method                                  Size     Elapsed   Saved (%)
no-compressed (t)                       501GB
basic                                   83GB     6h21'     83%
low                                     116GB    49'       76.8%
medium                                  88GB     07h23'    82.4%
high                                    82GB     11h02'    83.6%
No-compressed-fs                        459GB    41'       8.3%
Cron-compression (NetApp 8.1.1)         188GB              62%
Inline-compression (NetApp 8.1.1)       188GB    46'       62%

*Ontap 8.1.1, FAS6240, 72x 3TB SATA disks.
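The "percentage saved" row follows from the backup sizes. The short sketch below recomputes it, assuming that each saving is measured against the 501 GB uncompressed tablespace. That reading is an assumption, although it reproduces the percentages quoted on the slide to within rounding. The dictionary labels are taken from the slide and the helper name is illustrative.

```python
# Recompute "percentage saved" for the compression test on the slide.
# Sizes (GB) are copied from the slide; the baseline of 501 GB is an
# assumption about how the slide's percentages were derived.

ORIGINAL_GB = 501

BACKUP_SIZE_GB = {
    "basic": 83,
    "low": 116,
    "medium": 88,
    "high": 82,
    "no-compressed-fs": 459,
    "inline-compression": 188,
}


def percent_saved(compressed_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Share of the original size that compression removed, in percent."""
    return (1 - compressed_gb / original_gb) * 100


if __name__ == "__main__":
    for method, size in BACKUP_SIZE_GB.items():
        print(f"{method:20s} {size:4d} GB  {percent_saved(size):5.1f}% saved")
```

Read together with the elapsed times, the figures suggest the trade-off behind the design: the low level and the storage-side inline compression finish in under an hour, while the medium and high levels gain only a few more points for many hours of extra run time.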

Compression: real values

DB           Used (GB)*   Saved (GB)   % saved by compression
AISDB_PROD   24719        25941        52
CASTORNS     3629         3448         49
CMSSTG       6510         6395         50
CSR          20636        32008        61
ITCORE       16387        23552        60
EDHP         9631         24913        66
LEMONRAC     47104        49152        51

*Space used on the controller side
Logical space used: Used + Saved

NAS controllers throughput
[Chart: net_data_recv, disk_data_written, compression ratio]

Deduplication
- When combined with compression, it doesn't provide good results
  - Due to the way compression works: the compression group is 32k, our Oracle block is 8k, the WAFL block is 4k
- Control files are a different story: block size of 16k
[Diagram labels: 4k, 4k, Checksum]

DB     Type          Location        Size (GB)
PAYP   archives      /backup/dbs01   0.91
PAYP   archives      /backup/dbs02   22.90
PAYP   controlfile   /backup/dbs01   456.92
PAYP   fullinc       /backup/dbs01   68.00
PAYP   fullinc       /backup/dbs02   81.10


Summary
- Backup and recovery testing is critical
- Tape copies are essential, but TSM became a critical point of failure for DB services
- Adding a disk buffer
  - Removes TSM criticality
  - Reduces DB volume in TSM
  - Speeds up backups and restores
  - Better response time
  - Better resource utilization
- Disk buffer plug-ins were easily integrated in our backup framework
- First system to exploit Ontap C-mode features
  - Valuable experience for the future

Questions?