Update on tape system
Enrico Fattibene
CDG – CNAF, 18/09/2017

Tape status
- Used: 5354 tapes (44.3 PB)
- Free: 969 tapes (7.7 PB)
- Repackable: 1 PB (after the recent CMS deletion)
- Incoming: 270 tapes (2.3 PB)
- Usage rate: 50 tapes per week (rough capacity outlook sketched below)
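For context, a rough capacity outlook can be derived from these numbers. This is only a sketch: it assumes the used/free/incoming counts are cartridge counts and that the 50 tapes/week consumption stays flat.

```python
# Rough capacity outlook from the figures above. Assumptions: the used/free/
# incoming counts are numbers of cartridges and the 50 tapes/week rate stays flat.

used_tapes, used_pb = 5354, 44.3
free_tapes, incoming_tapes = 969, 270
tapes_per_week = 50

avg_tb_per_tape = used_pb * 1000 / used_tapes               # ~8.3 TB per cartridge
weeks_left = (free_tapes + incoming_tapes) / tapes_per_week

print(f"Average capacity per tape: {avg_tb_per_tape:.1f} TB")
print(f"Weeks until free + incoming tapes are filled: {weeks_left:.0f}")  # ~25
```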

HSM transfer rate
- Mean daily transfer rate for each server reaches 600 MB/s
- Each tape server can handle 800 MB/s of simultaneous inbound and outbound traffic
  - The theoretical limit is set by the FC8 connection to the Tape Area Network (see the check below)
- We do not reach a mean daily transfer rate of 800 MB/s because of some inefficiency:
  - For migrations it is caused by the time spent scanning the filesystem to select the candidate files
  - For recalls it depends on the position of the data on tape
- We plan to move the HSM services to new servers equipped with an FC16 connection to the Tape Area Network
  - This would double the transfer rate for each server
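A back-of-the-envelope check of the per-server figures quoted above; the 800 MB/s FC8 ceiling, the 600 MB/s observed mean, and the doubling with FC16 are the slide's own numbers, not independent measurements.

```python
# Back-of-the-envelope check of the per-server figures on this slide.

FC8_CEILING_MB_S = 800     # per-server limit on the FC8 link to the TAN
OBSERVED_MEAN_MB_S = 600   # mean daily transfer rate per server

efficiency = OBSERVED_MEAN_MB_S / FC8_CEILING_MB_S
print(f"Observed efficiency vs. FC8 ceiling: {efficiency:.0%}")   # 75%

FC16_CEILING_MB_S = 2 * FC8_CEILING_MB_S
print(f"Headroom with FC16 at the current load: "
      f"{FC16_CEILING_MB_S - OBSERVED_MEAN_MB_S} MB/s")           # 1000 MB/s
```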

Future data transfers
- The data transfer rate to CNAF tapes will increase over the next years
  - 160 PB expected by 2021 (rough rate estimate below)
- Experiments plan to use tape as nearline disk
  - Significant increase in recall activity expected
- Need to optimize drive usage to reduce costs
- Planning to buy a new library
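To put the 160 PB target in context, here is a rough estimate of the sustained write rate it implies. This is only a sketch: it assumes the 44.3 PB currently used (from the "Tape status" slide) and roughly linear growth over the ~4 years from 2017 to 2021.

```python
# Rough estimate of the sustained write rate implied by the 160 PB target.
# Assumptions: 44.3 PB on tape today and linear growth over ~4 years.

current_pb, target_pb, years = 44.3, 160, 4

pb_per_year = (target_pb - current_pb) / years
seconds_per_year = 365.25 * 24 * 3600
avg_write_mb_s = pb_per_year * 1e15 / seconds_per_year / 1e6

print(f"Average growth: {pb_per_year:.1f} PB/year")                # ~29 PB/year
print(f"Sustained average write rate: {avg_write_mb_s:.0f} MB/s")  # ~900 MB/s
```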

Drive usage configuration
- Tape drives are shared among the experiments
- Each experiment can use a maximum number of drives for recall or migration, statically defined in GEMSS
- In case of scheduled massive recall or migration activity, these parameters are changed manually by the administrators
- We noticed several cases of free drives that could have been used by pending recall threads
  - Taking into account the 8 Gbit/s limit of the FC connection of each HSM server
- We are working on a software solution to dynamically allocate additional drives to the experiments and manage concurrent recall access (sketched below)
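A minimal sketch of what such a dynamic allocation policy could look like. This is illustrative only: the names, the assumed 250 MB/s per-drive rate, and the prioritisation by recall backlog are assumptions, not the actual GEMSS implementation. The idea is simply to lend idle drives to experiments with pending recall threads without exceeding the per-HSM-server FC bandwidth.

```python
from dataclasses import dataclass

DRIVE_RATE_MB_S = 250        # assumed sustained throughput of one tape drive
SERVER_LIMIT_MB_S = 800      # FC8 limit per HSM server (8 Gbit/s, from the slides)

@dataclass
class Experiment:
    name: str
    drives_in_use: int
    pending_recalls: int     # recall threads waiting for a drive

def extra_drive_grants(free_drives: int, server_load_mb_s: float,
                       experiments: list[Experiment]) -> dict[str, int]:
    """Distribute idle drives to experiments with pending recalls, staying
    below the FC bandwidth ceiling of the HSM server."""
    headroom = max(0.0, SERVER_LIMIT_MB_S - server_load_mb_s)
    budget = min(free_drives, int(headroom // DRIVE_RATE_MB_S))
    grants: dict[str, int] = {}
    # Serve the experiments with the largest recall backlog first.
    for exp in sorted(experiments, key=lambda e: e.pending_recalls, reverse=True):
        if budget <= 0:
            break
        extra = min(exp.pending_recalls, budget)
        if extra > 0:
            grants[exp.name] = extra
            budget -= extra
    return grants

# Example: 4 idle drives, the HSM server already moving 300 MB/s.
exps = [Experiment("cms", drives_in_use=3, pending_recalls=5),
        Experiment("atlas", drives_in_use=1, pending_recalls=0)]
print(extra_drive_grants(free_drives=4, server_load_mb_s=300, experiments=exps))
# -> {'cms': 2}
```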

Drive usage efficiency
- Plot: total number of pending recall threads and free drives, June-July 2017
  - In several cases a subset of the free drives could have been used by recall threads
- Plot: total duration (in days) of recall and migration processes, June-July 2017
  - Stacked plot, aggregated by day
  - The total usage never exceeds 8 days per day, out of 17 drives in total (utilization check below)
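Read this way, the plot implies that the drive pool is well below saturation. A small check; interpreting the "8 days" as at most 8 drive-days of recall and migration work accumulated in a single calendar day is an assumption.

```python
# Small utilization check under the assumption above.

max_drive_days_per_day = 8
total_drives = 17

print(f"Peak aggregate drive utilization: "
      f"{max_drive_days_per_day / total_drives:.0%}")   # ~47%
```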

Recent intense recall activity by CMS
- At the end of July / beginning of August we noticed an increase in recall activity from CMS
- In an attempt to help CMS with this activity we doubled the number of tape drives allocated to it, from 3 (the default) to 6 and then to 7
- Since there had been no communication from the experiment about this activity, we asked the CMS contact for information
  - We suggested opening a ticket
- After some investigation it turned out that this activity was “not intended”

Recent intense recall activity by CMS (cont.)
- In 28 days (from 30/7 to 26/8; derived figures re-checked in the sketch below):
  - Bytes recalled: 1.11 PB (of 12 PB total), 9.2%
  - Tapes used for recall: 1103 (out of 1544 total), 72%
  - Files recalled: 400k
  - Number of mounts (total): 9858
  - Mounts/day (average): 352
  - Mounts/tape (average): 9, (max): 17
  - Number of tape drives used (average): 6, (max): 7
  - Waiting time: 5.6 days (average of the daily maximum)
- In the end, ~10% of all CMS data was re-read
- 1 read error, recovered after copying the tape to new media
- 1 robot arm (out of 4) broke
  - The repair took more than a week
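A quick consistency check of the derived figures; all inputs are the slide's own numbers, this only re-computes the percentages and averages.

```python
days = 28
recalled_pb, total_pb = 1.11, 12
tapes_used, tapes_total = 1103, 1544
mounts_total = 9858

print(f"Fraction of CMS data recalled:  {recalled_pb / total_pb:.1%}")    # 9.2%
print(f"Fraction of CMS tapes touched:  {tapes_used / tapes_total:.0%}")  # ~71% (quoted as 72%)
print(f"Mounts per day (average):       {mounts_total / days:.0f}")       # 352
print(f"Mounts per tape (average):      {mounts_total / tapes_used:.1f}") # ~8.9
```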

Lessons learned and questions
- We need to pay attention to unannounced but heavy activities
- Should we impose a protection mechanism against such events? (one possible scheme is sketched below)
  - Artificially delay recall processing?
  - Limit the number of tape drives?
- If we did so, we could reduce the number of mounts by increasing the recall time
- Experiments should consider that in this case the recall time may increase significantly
  - From a few minutes to 5-7 days
- Is it still convenient (using tape as a slow disk, as ATLAS asks)?
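If such a protection mechanism were adopted, it could look something like the following. This is purely illustrative and not an existing GEMSS or CNAF feature: the grace period, the drive cap, and all names are assumptions. The sketch holds recall requests for a grace period, groups them by tape, and releases at most a capped number of tapes (one mount each), trading recall latency for fewer mounts.

```python
import time
from collections import defaultdict

GRACE_PERIOD_S = 6 * 3600    # assumed artificial delay before recalls start
MAX_DRIVES = 3               # assumed drive cap for unannounced activity

class RecallThrottle:
    def __init__(self):
        self.pending = []    # list of (arrival_time, tape_id, file_path)

    def submit(self, tape_id: str, file_path: str) -> None:
        self.pending.append((time.time(), tape_id, file_path))

    def release_batches(self) -> dict:
        """Return per-tape batches whose oldest request has waited at least the
        grace period, limited to MAX_DRIVES tapes per call (one mount per tape)."""
        now = time.time()
        by_tape = defaultdict(list)
        for arrived, tape, path in self.pending:
            by_tape[tape].append((arrived, path))
        ready = [tape for tape, reqs in by_tape.items()
                 if now - min(a for a, _ in reqs) >= GRACE_PERIOD_S]
        selected = set(ready[:MAX_DRIVES])
        self.pending = [r for r in self.pending if r[1] not in selected]
        return {tape: [path for _, path in by_tape[tape]] for tape in selected}
```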