Tape Operations Vladimír Bahyl on behalf of IT-DSS-TAB


Tape Operations, Vladimír Bahyl on behalf of IT-DSS-TAB
CASTOR Review, CERN, 22 September 2010

Focus:
- marketing: we are very good, more about how well we are doing things
- explain that we receive requests and deploy modification requests via change management
- infrastructure: media and drive problems; we act as a backend
- everything is controlled: we do have monitoring, we do have media verification
- the service is well understood and managed
- known issues: database lock contention; better decoupling, better interfaces, more modularity needed; better dependency between user access and tape activity; move user access out of CASTOR
- media verification of new and old media
- optimizing / tuning CASTOR to be an archive, not an HSM; reduce tape mounts: read efficiency, work with users, improve access patterns
- software improvements: write efficiency, buffered tape marks
(Table on slide: service class, number of repeated mounts, username:group (experiment))

Agenda
- Overview: hardware, software, people
- Success stories: change management; monitoring; data unavailability vs. proactive checking; documentation; problem resolution automation
- Challenges: media migration; recalls
- Outlook
- Conclusion

Overview
Hardware
- 4 x Oracle SL8500 libraries, 30,000 tapes, 70 x T10000B drives
- 3 x IBM TS3500 libraries (+ 2 for backup), 15,000 tapes, 60 x TS1130 drives
- ~150 tape servers (running SLC5, Quattor managed)
Software
- CASTOR 2.1.9-5
- ACSLS for the Oracle libraries; SCSI media changer for the IBM libraries
- TSM 5
People
- 3 FTE CASTOR Tape Operations service managers + 2 FTE for TSM
- 1 external tape operator
- Vendor engineers (2)
- The CASTOR tape service is shared with CASTOR Tape Development (Stager-Tape interface, Volume Database, Tape Request Queue manager, tape server daemons): 2 FTE, part of the same section, often spending up to 50% of their time on 3rd level support
Additional activities
- Capacity planning / procurement: new call for tender in 2011

Success Stories

Change management
- Tape infrastructure is autonomous from the stagers: keep it stable, only upgrade when there are tape-related changes (2.1.9-5 on tape vs. 2.1.9-8 on the stagers)
- Always test a new version/configuration: validate the new setup on a few servers in each tape library before wider deployment (see the canary-selection sketch below)
- Wider deployment is announced beforehand and transparent: never disable a whole library
- Changes tracked in Savannah: risk assessment, pre-approval, notifications
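
A minimal sketch of the "validate on a few servers in each library" step, assuming a simple host inventory; this is not CERN's actual tooling and the hostnames and library names are made up for illustration:

```python
# Pick a few "canary" tape servers in every library so a new CASTOR tape
# version can be validated everywhere before the wider deployment.
import random
from collections import defaultdict

def pick_canaries(servers, per_library=2, seed=42):
    """servers: iterable of (hostname, library) pairs."""
    by_library = defaultdict(list)
    for host, library in servers:
        by_library[library].append(host)
    rng = random.Random(seed)          # deterministic choice for repeatability
    return {lib: rng.sample(hosts, min(per_library, len(hosts)))
            for lib, hosts in by_library.items()}

if __name__ == "__main__":
    inventory = [(f"tpsrv{i:03d}", f"lib{i % 7}") for i in range(150)]
    for lib, canaries in sorted(pick_canaries(inventory).items()):
        print(lib, canaries)
```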

Monitoring & Visualisation
- LEMON: reactive monitoring, tape servers act on simple errors; the request queue is monitored
- TapeLog, our central log database: data collected from the various tape daemons; a correlation engine handles complex incidents (e.g. a tape showing errors on several tape drives, see the sketch below); additional actions recorded by humans, e.g. a record of what was done to tapes sent for recovery; an interface for experts for data mining
- SLS for users: a huge amount of detailed counters with plots, structured per stager/experiment
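
A minimal sketch of the kind of correlation rule mentioned above (a tape showing errors on several drives); this is not the actual TapeLog correlation engine, and the record fields and threshold are assumptions for illustration:

```python
# Flag tapes that report I/O errors on several distinct drives: errors on
# many drives usually point at faulty media rather than a faulty drive.
from collections import defaultdict

def suspect_tapes(error_records, min_distinct_drives=3):
    """error_records: iterable of dicts with 'tape' (VID) and 'drive' keys."""
    drives_per_tape = defaultdict(set)
    for rec in error_records:
        drives_per_tape[rec["tape"]].add(rec["drive"])
    return [vid for vid, drives in drives_per_tape.items()
            if len(drives) >= min_distinct_drives]

if __name__ == "__main__":
    errors = [
        {"tape": "T12345", "drive": "lib1drv07"},
        {"tape": "T12345", "drive": "lib1drv12"},
        {"tape": "T12345", "drive": "lib2drv03"},
        {"tape": "T67890", "drive": "lib3drv01"},
    ]
    print(suspect_tapes(errors))  # ['T12345'] -> candidate for recovery or disabling
```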

SLS examples [SLS status plots for CMS and ATLAS]

Data unavailability
- Tapes grow in capacity, currently at 1 TB
- More data on tape = more reasons to access it = a higher risk of hitting an error

Media health-check / tape scrubbing
- When users tell you about unavailable data, it is too late: we need to be proactive, not reactive
- An active archive manager needs to know the state of the data in the archive
- Perform periodic checks, i.e. read the data back: all tapes written FULL, and tapes not accessed for a very long time (see the selection sketch below)
- The process runs in the background, using resources if available without overloading the system: 10 drives in parallel at ~120 MB/s
- Daily, weekly, monthly reports
- Over time this reads all data in the archive: an easy way to detect failures early
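
A minimal sketch of the scrubbing candidate selection described above, under assumed tape metadata fields; the "very long time" threshold of one year is an assumption, only the drive count comes from the slide:

```python
# Verify tapes that are written FULL or that have not been accessed for a
# long time, never using more than a fixed number of drives at once.
from datetime import datetime, timedelta

MAX_PARALLEL_DRIVES = 10           # from the slide: 10 drives in parallel
STALE_AFTER = timedelta(days=365)  # assumption: "very long time" = 1 year

def scrub_candidates(tapes, now, limit=MAX_PARALLEL_DRIVES):
    eligible = [t for t in tapes
                if t["status"] == "FULL" or now - t["last_access"] > STALE_AFTER]
    # Check the least recently verified tapes first.
    eligible.sort(key=lambda t: t.get("last_verified", datetime.min))
    return eligible[:limit]

if __name__ == "__main__":
    tapes = [{"vid": "T00001", "status": "FULL",
              "last_access": datetime(2010, 1, 1),
              "last_verified": datetime(2009, 6, 1)},
             {"vid": "T00002", "status": "OPEN",
              "last_access": datetime(2010, 9, 1)}]
    print([t["vid"] for t in scrub_candidates(tapes, now=datetime(2010, 9, 22))])
    # -> ['T00001']
```

At ~120 MB/s a 1 TB tape takes roughly 2.5 hours to read back, so with 10 drives a complete pass over tens of thousands of tapes naturally takes many months, which is why the checks are spread out "over time".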

Documentation
Working Instructions
- Regular periodic tasks for tape operations, vendors and administrators
- How to: raise a vendor call, announce interventions
- Media management: physical insertion into and removal from the libraries; pool management (creation, deletion, etc.)
- Drives: installation, configuration, testing and operations
- Libraries and controllers: evaluation, installation, configuration and operations
Problem handling procedures
- Media, drive, library, tape server

Problem resolution automation
- Failures occur with drives, media and libraries
- Goal: no involvement from our side (ideal case)!
- Remedy Problem Management workflow: tape drive I/O error detected on a tape server -> vendor engineer fixes the tape drive -> tape operator tests the tape drive, puts the tape server back into production and closes the ticket (see the sketch below)
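
A minimal sketch of that workflow as a state machine; the state names and actors follow the slide, but the transition table itself is an illustration, not the actual Remedy configuration:

```python
# Each state maps to the actor that handles it and the state that follows.
WORKFLOW = {
    # state                (actor,             next state)
    "DRIVE_IO_ERROR":      ("tape server",     "TICKET_OPEN"),
    "TICKET_OPEN":         ("vendor engineer", "DRIVE_FIXED"),
    "DRIVE_FIXED":         ("tape operator",   "DRIVE_TESTED"),
    "DRIVE_TESTED":        ("tape operator",   "BACK_IN_PRODUCTION"),
    "BACK_IN_PRODUCTION":  ("tape operator",   "TICKET_CLOSED"),
}

def walk(state="DRIVE_IO_ERROR"):
    while state in WORKFLOW:
        actor, nxt = WORKFLOW[state]
        print(f"{state:>20} -> {nxt:<20} (handled by {actor})")
        state = nxt

if __name__ == "__main__":
    walk()
```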

Challenges Reducing HSM mode -> going to Archive mode … Paradigm shift?

Media migration
- Media migration or "repacking" is required for: recovery from faulty media; defragmentation (file deletions); migration to higher-density media generations and/or higher-density tape drives
- A costly but necessary operation in order to save library slots and new media spending
- Completed migration from 500 GB to 1 TB tapes: 45,000 tapes, took around a year using 1.5 FTE and up to 40 tape drives running at ~50 MB/s (see the back-of-envelope estimate below)
  ... but that was done during a period when the LHC was not running ...
  ... but that was ~25 PB, next time it will be ~50 PB ...
- Improvements needed: identify network/disk server contention bottlenecks; move to 10 Gb/s, at least for the repack infrastructure; remove the small file overhead
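
A back-of-envelope check of the repack numbers quoted above, assuming perfectly continuous streaming (an idealisation, not something the slides claim); it shows why the real campaign took about a year and why the next, larger pass needs the listed improvements:

```python
# 25 PB through 40 drives at ~50 MB/s, compared with the observed ~1 year.
PB = 1e15  # decimal petabytes, as tape capacities are usually quoted

def repack_days(volume_bytes, drives, drive_mb_per_s):
    throughput = drives * drive_mb_per_s * 1e6   # aggregate bytes/s
    return volume_bytes / throughput / 86400     # seconds -> days

ideal = repack_days(25 * PB, drives=40, drive_mb_per_s=50)
print(f"ideal streaming time : {ideal:.0f} days")                      # ~145 days
print(f"observed             : ~365 days -> ~{ideal/365:.0%} effective drive utilisation")
# Doubling the volume to ~50 PB at the same rates needs ~290 days of pure
# streaming, hence the push for 10 Gb/s and removing the small-file overhead.
print(f"next pass (~50 PB)   : {repack_days(50 * PB, 40, 50):.0f} days ideal")
```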

Small file write overhead
- Writing is well understood and managed: postpone writing until there is enough data to fill a (substantial fraction of a) tape; high aggregate transfer rates; the system is designed to split the write stream onto several tapes
- However, per-drive performance is low: disk cache I/O and network contention; the per-file based tape format has a high overhead for writing small files -> "shoe-shining"; not fast enough to migrate all data in the archive on a reasonable timescale
- Improvements: looking at bulk data transfers using "buffered" tape marks; "buffered" means no sync and no tape stop, giving increased throughput and reduced tape wear (see the throughput model below)
- Buffered tape marks have been part of the SCSI standard since SCSI-2 and are available on standard tape drives, but there is no support in the Linux kernel; we worked with the Linux tape driver maintainer and now have a test driver version; 1 synchronising TM is already available in the upcoming release
[Plot on slide: drive performance in MB/s vs. file size (MB), with the Ethernet speed (MB/s) shown for comparison]
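
A minimal model of why small files hurt tape write throughput; the drive speed and per-file sync penalty are illustrative assumptions, not measurements from the slides:

```python
# Every non-buffered tape mark forces a sync and a stop/reposition of the
# tape, which costs a few seconds regardless of file size ("shoe-shining").
# Buffered tape marks remove that per-file penalty.
DRIVE_MB_S = 120.0      # assumed native streaming speed of the drive
SYNC_PENALTY_S = 3.0    # assumed stop/sync/restart cost per file

def effective_rate(file_size_mb, buffered):
    stream_time = file_size_mb / DRIVE_MB_S
    overhead = 0.0 if buffered else SYNC_PENALTY_S
    return file_size_mb / (stream_time + overhead)

if __name__ == "__main__":
    for size in (1, 10, 100, 1000):
        print(f"{size:>5} MB file: "
              f"{effective_rate(size, buffered=False):6.1f} MB/s unbuffered, "
              f"{effective_rate(size, buffered=True):6.1f} MB/s buffered")
```

Even with generous assumptions, files of a few MB drag the unbuffered rate down to a small fraction of the drive's streaming speed, which is the effect the plot on the slide illustrates.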

Recalls at random – identification
- Random READ access on tape: the current system is file based; with experiment data sets containing 1000s of files, these are spread across many tapes; users asking for files not on disk cause (almost) random file recalls
- Many tapes get mounted, but the average number of files read per mount is very low: few files per mount -> drives busy for a short time -> up to 9K mounts/day; low effective transfer rates
- Until recently it was complicated to trace tape mounts to users; now a twice-daily report shows what is going on (see the aggregation sketch below): a short incident report to the service managers, and a long report archived for future comparison

STAGER  USER:GROUP    #MOUNTS  #TAPES  RATIO  Avg#FILES  AvgFILESIZE(MB)
PUBLIC  vkolosa:vy       2167      66   32.8        1.3             1047
ALICE   aliprod:z2        834      85    9.8        1.7             4305
LHCb    sagidova:z5       285       5   57          1.4              160
CMS     mplaner:zh        119       2   59.5        1                202

Notes:
- Make CASTOR into an efficient archive: move away from HSM mode to Archive mode; bring down the number of mounts; optimise read access using policies / restrictions; this will increase reliability
- Moving from HSM to archive mode assumes the disk buffer sizes are correct, i.e. live data is kept on disk; if not, the tape infrastructure takes the impact
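
A minimal sketch of how the mount report above can be derived from raw mount records; this is not the actual CASTOR reporting script, and the record fields are assumptions for illustration:

```python
# Aggregate one record per tape mount into per user:group statistics.
from collections import defaultdict

def mount_report(mounts):
    """mounts: iterable of dicts with 'stager', 'user_group', 'tape',
    'files_read' and 'bytes_read' keys, one dict per tape mount."""
    agg = defaultdict(lambda: {"mounts": 0, "tapes": set(), "files": 0, "bytes": 0})
    for m in mounts:
        a = agg[(m["stager"], m["user_group"])]
        a["mounts"] += 1
        a["tapes"].add(m["tape"])
        a["files"] += m["files_read"]
        a["bytes"] += m["bytes_read"]

    rows = []
    for (stager, user_group), a in agg.items():
        rows.append({
            "stager": stager,
            "user_group": user_group,
            "mounts": a["mounts"],
            "tapes": len(a["tapes"]),
            "ratio": round(a["mounts"] / len(a["tapes"]), 1),   # mounts per tape
            "avg_files": round(a["files"] / a["mounts"], 1),
            "avg_filesize_mb": round(a["bytes"] / a["files"] / 1e6) if a["files"] else 0,
        })
    # Worst offenders (most repeated mounts) first, as in the report above.
    return sorted(rows, key=lambda r: r["mounts"], reverse=True)
```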

Recalls at random – actions
- Follow up with VOs and users in case of irregular or inefficient tape usage: a potentially never-ending activity
- Re-initiated SW developments for better control of read efficiency via policies: grouping of requests, ceilings for concurrent tape usage (see the sketch below)
- Investigate larger disk caches to reduce the load on tape; move to tape for archive only, not per-user / per-file HSM
- A mismatch between disk pool size and actual activity can cause high load on the tape infrastructure, affecting everybody: review the disk pool setup periodically with the experiments
- Increase tape storage granularity from files to data sets, or use co-location
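
A minimal sketch of the two policy ideas above, request grouping and per-group ceilings, under an assumed data model; this is not CASTOR code and the ceiling value is purely illustrative:

```python
# Group pending recall requests by tape so one mount serves many files, and
# cap the number of tapes a single user:group may have mounted at once.
from collections import defaultdict

MAX_CONCURRENT_TAPES_PER_GROUP = 4   # illustrative ceiling, not a CASTOR default

def plan_recalls(requests, mounted_now):
    """requests: iterable of dicts with 'user_group', 'tape' and 'path' keys.
    mounted_now: dict user_group -> number of tapes currently mounted for it."""
    by_key = defaultdict(list)
    for req in requests:
        by_key[(req["user_group"], req["tape"])].append(req)

    plan, deferred = [], []
    busy = dict(mounted_now)
    # Serve the tapes that satisfy the most pending requests first, so one
    # mount delivers as many files as possible.
    for (group, tape), reqs in sorted(by_key.items(), key=lambda kv: -len(kv[1])):
        if busy.get(group, 0) >= MAX_CONCURRENT_TAPES_PER_GROUP:
            deferred.extend(reqs)        # group is over its ceiling: wait
            continue
        busy[group] = busy.get(group, 0) + 1
        plan.append((tape, group, [r["path"] for r in reqs]))
    return plan, deferred
```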

Outlook
- No replacement for tape on the horizon for large long-term archives, neither at CERN nor outside
- We expect to grow at ~20 PB/year with the LHC running, and by ~10 PB in 2012 when the LHC is stopped
- Need to shift from the HSM model towards an archive model: use tape for bulk transfers only, not for random file access
- "House keeping" traffic in an archive is proportional to its size and will exceed the LHC-generated traffic at some point
- Actively looking at potential alternatives, such as GPFS/TSM

Conclusion
- The Tape Service is well understood and managed, and is successfully coping with the LHC data
- Change management system: test changes before deploying widely
- Detailed monitoring: provides information from various viewpoints; good data simplifies incident follow-up
- Problem handling procedures: minimize the load on the service managers whenever possible
- Plans exist for upgrades and expansions