Backup & Recovery of Physics Databases


Backup & Recovery of Physics Databases
Jacek Wojcieszuk
IT-DM Technical Meeting, December 16th, 2008
Notes: The title has to be worked on.

Outline
- Why back up?
- Current implementation: Maximum Availability Architecture
- Main concerns
- Possible improvements
IT/DM technical meeting - 2

Why back up?
- Backup is one of the main techniques for data protection.
- A properly planned and implemented backup and recovery strategy is critical for business continuity.
- Ideally, backups should allow recovery from any kind of failure without data loss.

Types of failures
- Oracle instance failure
  - Usually due to an Oracle process failure
- Media failure
  - Disk failure, controller failure, etc.
- Physical data corruption
- Human error
  - In most cases accidentally deleted/updated data
  - Caused by a database user or a DBA
- Disaster
  - Fire, flood, earthquake, plane crash, overvoltage, etc.

Available tools
- Oracle offers many tools that help to back up data and address failures:
  - Recovery Manager (RMAN)
  - Data Guard
  - Export/Import
  - Data Pump
  - Streams
- Oracle supports using OS and hardware features for taking backups:
  - snapshots
  - cp command
- None of the tools mentioned can, on its own, protect the data from all types of failures.

Oracle Maximum Availability Architecture (MAA)
- Oracle's best practices blueprint
- Goal: to achieve the optimal high availability architecture at the lowest cost and complexity
- Helps to minimize the impact of different types of planned and unplanned downtime
- Is based on Oracle products/features such as:
  - RAC
  - ASM
  - RMAN
  - Flashback
  - Data Guard
- http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
Notes: This slide is just to provide some information about MAA and at the same time to give some extra context for the rest of the presentation. On OTN one can also find tables showing the impact (expressed as length of database unavailability) of different planned and unplanned interventions when the system implements the MAA recommendations. I was thinking about adding an extra slide with these tables.

Data protection at CERN: RAC + ASM + on-disk copies
[Diagram labels: Clients, WAN/Intranet, SAN, RAC database with ASM, RMAN, TSM]

Data Guard Physical Standby Database
[Diagram labels: WAN/Intranet, RMAN, Primary RAC database with ASM, Physical Standby RAC database with ASM, Data changes]
Notes: Physical standby database added. Animation: the standby database and the data flow path show up after clicking.

Backup implementation - summary
- Tape-based incremental backup strategy
  - Bi-weekly full backups
  - Daily incremental backups
  - Hourly archivelog backups
- Disk-based incrementally updated image copies
  - The copy lags 2-3 days behind the database to facilitate handling of human errors
- Oracle Data Guard physical standby databases
  - Configured with a 1-day lag
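The lags listed above decide which copy can still be used after a human error: a copy that lags behind the database can only be rolled forward to a point before the error if the error is younger than the lag. The following is a minimal Python sketch of that reasoning, not the actual CERN tooling. The function name `usable_sources`, the default lags (2 days for the image copy, 1 day for the standby) and the assumption that a tape backup older than the error always exists are illustrative choices.

```python
from datetime import datetime, timedelta


def usable_sources(error_time, now,
                   copy_lag=timedelta(days=2),
                   standby_lag=timedelta(days=1)):
    """Return the copies that still predate a human error.

    A lagging copy reflects the database as it was at (now - lag).
    It is usable only if that moment is earlier than the error,
    i.e. if the error is younger than the lag.
    Assumption: a tape backup older than the error is always available.
    """
    error_age = now - error_time
    sources = []
    if error_age < standby_lag:
        sources.append("standby database")
    if error_age < copy_lag:
        sources.append("on-disk image copy")
    sources.append("tape backup")  # slowest option, assumed always present
    return sources


if __name__ == "__main__":
    now = datetime(2008, 12, 16, 12, 0)
    for hours in (6, 36, 120):
        print(hours, "h ago:", usable_sources(now - timedelta(hours=hours), now))
```

Under these assumptions an error noticed after the lag has passed can only be repaired from tape, which is where the long recovery times discussed later come from.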

Failure handling

Failure: Oracle instance failure
  Recovery: Not needed - RAC keeps the database available
  Downtime: -

Failure: Media failure
  Recovery: Not needed - ASM keeps data healthy
  Downtime: -

Failure: Small physical data corruption
  Recovery: RMAN block media recovery using on-disk or on-tape backup
  Downtime: Database: 0; affected application: few hours

Failure: Wide-range physical data corruption
  Recovery: Switchover to the standby database, or RMAN full database restore using on-disk backup
  Downtime: <1 hour with Data Guard; <1 hour with on-disk backup

Failure: Human error
  Recovery: RMAN + DataPump using on-disk backup, or Standby DB + DataPump, or RMAN + DataPump using on-tape backup
  Downtime: Affected application: few hours or even days (if on-tape backup needed)

Failure: Disaster
  Recovery: Switchover to the standby database (if available), or RMAN full database restore using on-tape backups
  Downtime: Hours or days in case of restore from tapes

Main concerns
- Data volume
  - Very quick, linear growth of databases (especially during the LHC run)
  - Some databases may reach 10 TB as early as next year
- Database availability
  - On-line databases are highly critical for data taking
    - A few hours of downtime can already lead to data loss
  - Off-line databases are critical for data distribution and analysis
Notes: This slide is quite clear, I think. I couldn't find any concrete numbers concerning maximum allowed unavailability. If you know some you can quote them.

... More concerns
- As data volume increases:
  - The probability of physical data corruption increases
  - The frequency of human errors increases
- The traditional RMAN and tape-based approach doesn't fit well into this picture:
  - It leads to backup and recovery times proportional to, or dependent on, data volume
  - Currently at CERN the speed of backup/recovery to/from tapes is limited by the speed of 1 Gb Ethernet
- The standby database is located in the same building as the primary
Notes: As databases grow, the number of hardware components used to support them grows too, which in turn increases the probability of physical data corruption. The number of human errors can also be higher for big databases, although the dependence is not so clear in this case. By 'recovery of a single database object' I mean tablespace point-in-time recovery.

Possible improvements
- LAN-free backups to tapes
- Disk pool instead of tapes
- Declaring data read-only
- Archiving old data to limit database growth

LAN-free tape backups
- Traditionally at CERN, tape backups are sent over a general-purpose network to a media management server:
  - This limits backup/recovery speed to ~80 MB/s
  - Backup/restore of a 10 TB database takes almost 40 hours!
- At the same time, tape drives can archive data at 160 MB/s (compressed)
[Diagram labels: Metadata, TSM Server, 1Gb, Database, RMAN backups, Tape drives]
Notes: Animation showing the difference between both types of backups.
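The relation between throughput and backup time quoted on this slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000,000 MB) and a constant sustained rate; the function name `transfer_hours` is made up for illustration. With these assumptions a 10 TB database at 80 MB/s needs roughly 35 hours, the same order as the "almost 40 hours" on the slide, which presumably includes overheads not modeled here.

```python
def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate_mb_s.

    Assumes decimal units (1 TB = 1,000,000 MB) and ignores any
    start-up, tape-mount or metadata overhead.
    """
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_s / 3600


if __name__ == "__main__":
    # Rates taken from the slide: ~80 MB/s over the general-purpose
    # network, 160 MB/s for a tape drive writing compressed data.
    for rate in (80, 160):
        print(f"10 TB at {rate} MB/s: {transfer_hours(10, rate):.1f} h")
```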

LAN-free tape backups (2)
- Tivoli Storage Manager supports so-called LAN-free backup
- When using a LAN-free configuration:
  - Backup data flows to tape drives directly over the SAN
  - The media management server is used only to register backups
  - Very good performance observed during tests (see next slide)
[Diagram labels: Metadata, TSM Server, 1Gb, Database, RMAN backups, Tape drives]
Notes: Animation showing the difference between both types of backups.

LAN-free tape backups - tests
- 1 TB test database with contents very similar to one of the production DBs
  - ~5% of empty blocks
- Different TSM configurations:
  - TCP and Shared Memory mode
- Backups taken using 1 or 2 streams

  Backup       TCP         Shared mem
  1 stream     198 MB/s    231 MB/s
  2 streams    361 MB/s    402 MB/s

- Restore tests done using 1 stream only
  - Performance of the test with 2 streams was affected by Oracle software issues (followed up with Oracle Support)

  Restore      TCP         Shared mem
  1 stream     150 MB/s    158 MB/s
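One way to read the backup figures above is to ask how well throughput scales when a second stream is added. The Python sketch below computes a simple scaling efficiency (measured rate divided by the stream count times the single-stream rate). The metric and the names `BACKUP_MB_S` and `scaling_efficiency` are illustrative and were not part of the original tests.

```python
# Backup throughput in MB/s from the test slide, keyed by TSM mode
# and then by number of RMAN streams.
BACKUP_MB_S = {
    "TCP": {1: 198, 2: 361},
    "Shared mem": {1: 231, 2: 402},
}


def scaling_efficiency(rates: dict, streams: int) -> float:
    """Measured rate relative to perfect linear scaling from 1 stream."""
    return rates[streams] / (streams * rates[1])


if __name__ == "__main__":
    for mode, rates in BACKUP_MB_S.items():
        eff = scaling_efficiency(rates, 2)
        print(f"{mode}: 2 streams reach {eff:.0%} of linear scaling")
```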

Using a disk pool instead of tapes
- Tape infrastructure is expensive and difficult to maintain:
  - costly hardware and software
  - noticeable maintenance effort
  - tape media is quite unreliable and needs to be validated
- At the same time disk space is getting cheaper and cheaper:
  - 1.5 TB SATA disks already available
- A pool of disks can easily be configured as a destination for RMAN backups:
  - Can simplify backup infrastructure
  - Can improve backup performance
  - Can increase backup reliability
Notes: I will extend this slide to provide more details about these improvements.

Using a disk pool instead of tapes (2)
- Several configurations possible:
  - NFS mounted disks
  - SAN-attached storage with a file system (tested)
  - SAN-attached storage with ASM (tested)
[Diagram labels: Database, RMAN backups, Storage Area Network, Remote storage]
Notes: I will extend this slide to provide more details about these improvements.

Using a disk pool instead of tapes - tests
- Tests performed as part of the backup & recovery implementation for the LHCb on-line database
- 1x16-bay disk array used as backup storage
  - 2x8-disk RAID 5 devices
- Backup storage configured either as an ASM diskgroup or as an ext3 file system
- Tests performed with 4 streams

               ext3        ASM
  4 streams    235 MB/s    369 MB/s

- Tests need to be repeated in the testbed used for testing LAN-free backups to tapes

Declaring data read-only
- Tablespaces containing static data can be declared read-only
- Read-only tablespaces do not need to be backed up as often as read-write ones
  - This may significantly reduce the amount of resources needed for backups
- In case of a restore, one can consider restoring read-only data after the read-write part of the database has been restored
- We are improving our backup scripts to properly handle read-only data
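The saving from read-only tablespaces can be illustrated with a small selection routine. The Python sketch below is hypothetical and is not the CERN backup script mentioned on the slide. The `Tablespace` record, the 90-day re-backup interval for read-only data and the example names and sizes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Tablespace:
    name: str
    size_gb: float
    read_only: bool
    last_backup: Optional[datetime]  # None if never backed up


def needs_backup(ts: Tablespace, now: datetime,
                 ro_interval: timedelta = timedelta(days=90)) -> bool:
    """Read-write data is always backed up; read-only data only when
    it has no backup yet or its backup is older than ro_interval
    (an assumed interval, not taken from the slides)."""
    if not ts.read_only or ts.last_backup is None:
        return True
    return now - ts.last_backup >= ro_interval


def plan_backup(tablespaces: List[Tablespace],
                now: datetime) -> Tuple[List[str], float]:
    """Return the names of tablespaces to back up and their total size."""
    selected = [ts for ts in tablespaces if needs_backup(ts, now)]
    return [ts.name for ts in selected], sum(ts.size_gb for ts in selected)


if __name__ == "__main__":
    now = datetime(2008, 12, 16)
    example = [
        Tablespace("DATA_2008", 500.0, False, now - timedelta(days=1)),
        Tablespace("DATA_2006", 2000.0, True, now - timedelta(days=10)),
    ]
    print(plan_backup(example, now))
```

In this made-up example only the read-write tablespace is selected, so the run touches 500 GB instead of 2500 GB.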

Archiving legacy data
- Data collected by some big applications is accessed for a very limited period of time only
  - Later on it is not needed but has to be kept 'just in case'
- Keeping such data on-line may badly affect database performance and time-to-recover
  - To avoid that, legacy data should be archived
- Archiving Oracle data is not an easy task and cannot be done transparently by DBAs
- Proper application design is vital:
  - Splitting data using Oracle partitioning or schemas
  - Ensuring self-containment of data from different periods
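If data is split by period as suggested above, choosing what to archive becomes a simple filter over partitions. The following Python sketch assumes monthly partitions and a 12-month on-line retention. Both assumptions, the partition names and the function `partitions_to_archive` are hypothetical and only illustrate the idea of self-contained periods.

```python
from datetime import date
from typing import Dict, List


def partitions_to_archive(partitions: Dict[str, date], today: date,
                          keep_months: int = 12) -> List[str]:
    """Return names of monthly partitions old enough to be archived.

    partitions maps a partition name to the first day of the month
    it covers. A partition is selected when its month lies more than
    keep_months months before the current month.
    """
    cutoff = today.year * 12 + today.month - keep_months
    return sorted(
        name for name, start in partitions.items()
        if start.year * 12 + start.month < cutoff
    )


if __name__ == "__main__":
    example = {
        "P2007_10": date(2007, 10, 1),
        "P2007_12": date(2007, 12, 1),
        "P2008_11": date(2008, 11, 1),
    }
    print(partitions_to_archive(example, date(2008, 12, 16)))
```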

Data Archiving - possible implementation
- Several implementations are possible; the simplest presumes creation of an archive DB

Conclusions
- Production databases are growing very large
  - Recovery time in case of failure becomes critical
- Certain types of failures require a database restore from a tape backup
  - Time-to-recover is proportional to the database size
  - Hours or days in the case of big databases
- LAN-free backups to tapes can significantly shorten backup and recovery time and lead to better resource utilization
- Replacing tapes with a disk pool can also significantly decrease backup and recovery time
- Declaring data read-only and archiving can further help to keep the restore time reasonable

Acknowledgements
- Many thanks to Dawid, Luca and Lukasz, who helped with the tests and Oracle tuning

Q&A Thank you