Backup and Recovery for Hadoop: Plans, Survey and User Inputs


Backup and Recovery for Hadoop: Plans, Survey and User Inputs
Prasanth Kothuri, CERN
Hadoop User Forum, 18th May 2016

Agenda
- Need for Backup
- Backup Design Philosophy
- Questionnaire & User Inputs
- Implementation Challenges
- Size and Complexity

What can go wrong?
- Hardware failures
  - Disk corruption
  - Node failure
  - Rack failure
- Human / application error
  - Logical corruption
  - Accidental deletion of data
- Site failure

What Needs Protection?
- Data sets
  - Data and metadata (NameNode, Hive, HBASE etc.)
- Applications
  - System and user applications
- Configuration
  - Configuration of the various Hadoop components

Spotlight is on... Data sets
- For many of the applications / analyses this is the only copy of the data, i.e. the data sets cannot be regenerated
- The applications and configuration are protected by the way we work (git, puppet and service documentation)

Why is it not already there?
- There is no out-of-the-box point-in-time recovery solution
- Backup is complex and not straightforward with several data stores (HDFS, HBASE)
- Hence we work with the users to understand their usage of Hadoop and come up with possible implementations

Backup Design Philosophy
- Determine whether the data stored in HDFS is mission critical
- Determine the datasets to be backed up
- Set the priority of the datasets to be backed up
- What is the best way to back up the data?

Backup Design Philosophy
- Changes since the last backup
- Tracking the changes and the count of files added/changed
- The time interval between backups (reducing the window of possible data loss)
- The rate of new data arrival
- Determine the size of new data
- Time factor involved between two backups (data loss)
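
As a rough illustration of how the backup interval bounds the window of possible data loss, the sketch below estimates how much newly arrived data would be unprotected just before the next backup runs. The function name and the constant-arrival-rate model are assumptions made for this example, not part of the talk; the 2048 GB/day figure is the IT Security growth rate quoted on the sizing slide.

```python
def data_at_risk_gb(daily_growth_gb: float, backup_interval_hours: float) -> float:
    """Worst-case volume (GB) written since the last backup.

    Assumes data arrives at a constant rate over the day (a simplification).
    """
    if daily_growth_gb < 0 or backup_interval_hours < 0:
        raise ValueError("inputs must be non-negative")
    return daily_growth_gb * backup_interval_hours / 24.0


if __name__ == "__main__":
    # Compare a daily, a six-hourly and an hourly backup schedule.
    for hours in (24, 6, 1):
        at_risk = data_at_risk_gb(2048, hours)
        print(f"{hours:>2} h interval: up to {at_risk:.1f} GB not yet backed up")
```

Shortening the interval shrinks the exposed volume proportionally, but it never reaches zero, which is worth keeping in mind when reading the users' 'no data loss' requirement.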

Questionnaire – Gathering User Inputs

Questionnaire – User Inputs
- How critical is the data, and do you have another copy outside of Hadoop?
  - Semi-critical to very critical
  - Only one user responded 'not critical'
  - For about 50% of the applications, the data in Hadoop is the only copy
- What type of data is stored in Hadoop?
  - Server logs, security logs, system logs, job logs and metrics
  - Event metadata (ATLAS)
  - Accelerator device logs

Questionnaire – User Inputs
- Data organization
  - Most of the users responded with a precise HDFS tree structure
  - Examples:
      /project/itmon/archive/{provider}/{hostgroup}/YYYY-MM
      /user/WDTMON/{provider}/YYYY/MM/DD
  - Data in HBASE
- How dynamic is the data?
  - Data once written is not changed for most applications
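
Because the directory trees are date-partitioned and data is rarely changed once written, one conceivable way to find what is new since the last backup is simply to enumerate the date directories. The sketch below expands the two layouts quoted above; the function names, the provider and hostgroup values, and the idea of backing up only closed (past) days are illustrative assumptions, not something the talk prescribes.

```python
from datetime import date, timedelta
from typing import List


def daily_partition(provider: str, day: date) -> str:
    """Expand /user/WDTMON/{provider}/YYYY/MM/DD for one day."""
    return f"/user/WDTMON/{provider}/{day:%Y/%m/%d}"


def monthly_partition(provider: str, hostgroup: str, day: date) -> str:
    """Expand /project/itmon/archive/{provider}/{hostgroup}/YYYY-MM."""
    return f"/project/itmon/archive/{provider}/{hostgroup}/{day:%Y-%m}"


def partitions_since(provider: str, last_backup: date, today: date) -> List[str]:
    """Daily directories after last_backup, up to and including yesterday.

    Today's directory is skipped because it may still be receiving data.
    """
    days = (today - last_backup).days
    return [
        daily_partition(provider, last_backup + timedelta(days=n))
        for n in range(1, days)
    ]


if __name__ == "__main__":
    # "example_provider" is a placeholder, not a real provider name.
    for path in partitions_since("example_provider", date(2016, 5, 15), date(2016, 5, 18)):
        print(path)
```

This only works for applications that never rewrite old partitions; the data kept in HBASE would need a different approach.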

Questionnaire – User Inputs
- Daily volume growth
  - 2 TB/day to 150 GB/day
- Acceptable amount of data loss
  - All the users replied 'NO DATA LOSS'
- How quickly do we need to recover?
  - All of them replied 'as soon as possible'

Questionnaire – User Inputs
- Could you envisage triggering the backup yourself?
  - All users but one want the central Hadoop service to run the backups
- What is the required backup retention?
  - It should match the retention of what is stored on HDFS
- Should the restore performance favour bulk or detail restore?
  - User responses are a mixture of bulk and detail restore

Backup Ideas
Possible generic Hadoop backup solutions: copying vs teeing
- Copying
  - Copying to another storage system like EOS/CASTOR
  - distcp is run to copy data from master storage to slave storage
- Teeing
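
To make the copying option concrete, the sketch below assembles a `hadoop distcp` invocation. `-update` (copy only files that are missing or differ on the target) and `-m` (maximum number of map tasks) are standard DistCp options; the NameNode host names, the target path and the default of 20 maps are placeholders invented for this example. The command is only printed here, and whether a given target such as EOS or CASTOR is reachable from DistCp is not something this sketch assumes.

```python
import shlex
from typing import List


def distcp_command(source: str, target: str, max_maps: int = 20) -> List[str]:
    """Build the argument list for an incremental DistCp copy.

    Limiting the number of maps is one way to cap the load the copy
    puts on the source cluster.
    """
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    return ["hadoop", "distcp", "-update", "-m", str(max_maps), source, target]


if __name__ == "__main__":
    cmd = distcp_command(
        "hdfs://source-nn.example:8020/project/itmon/archive",  # hypothetical host
        "hdfs://backup-nn.example:8020/backup/itmon/archive",   # hypothetical host and path
        max_maps=10,
    )
    # Print rather than execute; pass cmd to subprocess.run(cmd, check=True)
    # on a node with a configured Hadoop client to actually run the copy.
    print(shlex.join(cmd))
```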

Backup Ideas
HDFS Snapshots
- Snapshots of the entire file system or of a subtree of the file system can be taken for data backup, protection against user errors and disaster recovery
- The snapshot differential report can be used for incremental backups
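
The differential report comes from `hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>`, which lists each changed entry prefixed by `+` (created), `-` (deleted), `M` (modified) or `R` (renamed). The sketch below parses such a report into change lists that an incremental backup could act on. The sample report is hand-written for illustration: the snapshot names and file names are made up, and only the directory comes from the user examples earlier in the talk.

```python
from typing import Dict, List

# Hand-written sample in the documented report layout; not real output.
SAMPLE_REPORT = """\
Difference between snapshot s1 and snapshot s2 under directory /project/itmon/archive:
M\t.
+\t./2016-05/part-0001
-\t./2016-04/tmp-file
R\t./2016-05/part-0000.tmp -> ./2016-05/part-0000
"""

KINDS = {"+": "created", "-": "deleted", "M": "modified", "R": "renamed"}


def parse_snapshot_diff(report: str) -> Dict[str, List[str]]:
    """Group the entries of a snapshot diff report by change type."""
    changes: Dict[str, List[str]] = {kind: [] for kind in KINDS.values()}
    for line in report.splitlines():
        # Each entry is "<marker><whitespace><path>"; the header line and
        # blank lines do not start with a known marker and are skipped.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in KINDS:
            changes[KINDS[parts[0]]].append(parts[1].strip())
    return changes


if __name__ == "__main__":
    for kind, paths in parse_snapshot_diff(SAMPLE_REPORT).items():
        print(kind, paths)
```

A directory must first be made snapshottable (`hdfs dfsadmin -allowSnapshot <path>`) before snapshots can be created with `hdfs dfs -createSnapshot <path> [<name>]`.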

Backup Ideas Possible backup solutions for HBASE

Size and Complexity

Application           Current Size   Daily Growth
IT Monitoring         320.5 TB       140 GB
WLCG Monitoring        40.0 TB        60 GB
IT Security           156.8 TB       2048 GB
Accelerator Logging   500.0 TB       500 GB
ATLAS Rucio             5.9 TB
CMS WMArchive           1.0 TB        50 GB
AWG                    62.2 TB
CASTOR Logs           163.1 TB
WinCC OA               10.0 TB        25 GB
ATLAS EventIndex      220.0 TB       200 GB*
Total                   1.5 PB         4 TB
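
As a quick cross-check of the totals, the sketch below sums the per-application figures from the slide. The listed sizes come to about 1.48 PB (taking 1 PB = 1000 TB), consistent with the rounded 1.5 PB; the listed daily growth figures come to roughly 3 TB, so the quoted 4 TB total presumably also covers the three applications for which no growth figure is given. That last point is an inference, not something the slide states.

```python
# Figures copied from the sizing slide; applications without a daily
# growth figure on the slide are simply absent from GROWTH_GB.
SIZES_TB = {
    "IT Monitoring": 320.5,
    "WLCG Monitoring": 40.0,
    "IT Security": 156.8,
    "Accelerator Logging": 500.0,
    "ATLAS Rucio": 5.9,
    "CMS WMArchive": 1.0,
    "AWG": 62.2,
    "CASTOR Logs": 163.1,
    "WinCC OA": 10.0,
    "ATLAS EventIndex": 220.0,
}

GROWTH_GB = {
    "IT Monitoring": 140,
    "WLCG Monitoring": 60,
    "IT Security": 2048,
    "Accelerator Logging": 500,
    "CMS WMArchive": 50,
    "WinCC OA": 25,
    "ATLAS EventIndex": 200,
}

total_size_tb = sum(SIZES_TB.values())
listed_growth_gb = sum(GROWTH_GB.values())
missing_growth = sorted(set(SIZES_TB) - set(GROWTH_GB))

if __name__ == "__main__":
    print(f"Total size: {total_size_tb:.1f} TB ({total_size_tb / 1000:.2f} PB)")
    print(f"Listed daily growth: {listed_growth_gb} GB")
    print("No growth figure for:", ", ".join(missing_growth))
```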

Challenges and Complexity
- Performing the backups without impacting performance
- Ability to determine the changes for the incremental backup
- Catering to various Hadoop processing frameworks (HDFS, HBASE, Impala etc.)
- Consistency of the backups

A word on Recovery
- Multiple scenarios, from total site failure to lost files
- Recovery involves manual actions, can be time consuming and requires collaboration with the application users
- Some identified scenarios should be tested and rehearsed

Conclusion
- Thanks to the user responses to the questionnaire, we now have requirements for backup and recovery on Hadoop
- There are several possible implementations
- We will carry out a PoC to determine a solution that covers all or most of the user requirements

Discussion / Feedback
Q & A