Tape Operations Update
Vladimír Bahyl, IT FIO-TSI, CERN
CERN - IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Agenda
– Progress on issues (since the last meeting)
– Current equipment and challenges
– Development changes
– Operational changes
– Conclusion

Progress on issues
NI_FAILURE
  – Problem still present
  – A simple procedure exists, so there is no need to reinstall
tplabel command
  – By default, existing labels are not overwritten
  – New -f option introduced to force relabelling
Cmonitd
  – No longer used at CERN
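A minimal sketch of the default-versus-forced labelling rule described above. The function `label_action` and its return values are hypothetical and do not reflect CASTOR's actual tplabel code; it only models the decision: a blank tape is labelled, an already-labelled tape is left alone unless -f is given.

```python
from typing import Optional


def label_action(existing_vid: Optional[str], force: bool = False) -> str:
    """Decide what a tplabel-like tool does with a cartridge (hypothetical model).

    existing_vid is the volume ID read from the current label,
    or None if the tape carries no label at all.
    """
    if existing_vid is None:
        return "write"      # blank tape: nothing to protect
    if force:
        return "overwrite"  # -f given: relabel despite the existing label
    return "refuse"         # default: keep the existing label intact


print(label_action(None), label_action("T07106"), label_action("T07106", force=True))
```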

Equipment today
– 25 PB total (around 50% free)
– IBM
  – 2 libraries
  – ~ slots; 700 GB each
  – 60 TS1120 drives
– Sun
  – 4 libraries
  – ~ slots; 500 GB each
  – 60 T10000A drives

Equipment near future
– Tape space sufficient for 2008
  – Unbalanced
– New drives
  – IBM TS1130: ~160 MB/s, 1 TB cartridges
  – Sun T10000B: ~130 MB/s, 1 TB cartridges
– IBM High density frame
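As a rough illustration of what the quoted rates mean in practice, the sketch below computes the minimum time needed to fill a 1 TB cartridge at the nominal speeds. It assumes decimal units (1 TB = 1,000,000 MB) and uninterrupted streaming with no mount, positioning or per-file overhead, so real fill times will be longer.

```python
# Nominal rates quoted on the slide, in MB/s.
DRIVES = {
    "IBM TS1130": 160,
    "Sun T10000B": 130,
}
CARTRIDGE_MB = 1_000_000  # 1 TB cartridge, decimal units assumed


def hours_to_fill(rate_mb_s: float, capacity_mb: float = CARTRIDGE_MB) -> float:
    """Lower bound on fill time: capacity streamed at a constant rate, in hours."""
    return capacity_mb / rate_mb_s / 3600


for name, rate in DRIVES.items():
    print(f"{name}: {hours_to_fill(rate):.2f} h")
```

At these rates a full cartridge takes roughly 1.7 to 2.1 hours of pure streaming.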

Challenges
– ATLAS: low write rate, partly caused by additional mounts due to a bug in a CASTOR policy
– ALICE: rate affected by small files from users writing to the default pool
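Why small files hurt the rate can be seen with a simple throughput model: every file pays a fixed cost on top of the time spent streaming its data. The sketch below assumes that model; the 100 MB/s drive rate and the 2 s per-file overhead are hypothetical figures chosen only for illustration, not measurements from CERN drives.

```python
def effective_rate(file_size_mb: float, drive_rate_mb_s: float,
                   per_file_overhead_s: float) -> float:
    """Effective throughput (MB/s) when each file adds a fixed overhead to its streaming time."""
    if file_size_mb <= 0 or drive_rate_mb_s <= 0:
        raise ValueError("file size and drive rate must be positive")
    streaming_s = file_size_mb / drive_rate_mb_s
    return file_size_mb / (streaming_s + per_file_overhead_s)


# Hypothetical figures: 100 MB/s native drive rate, 2 s fixed cost per file.
for size_mb in (10, 100, 2000):
    print(f"{size_mb:>5} MB files -> {effective_rate(size_mb, 100, 2.0):5.1f} MB/s")
```

The same reasoning applies to unnecessary mounts: each one adds time during which no data is moved.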

Development 1/3
Patch-free kernel version ( )
– Goal: by SLC5, use no CASTOR-specific kernel patches
– All necessary settings moved to the CASTOR tape layer
– New SCSI tape driver options introduced:
    TAPE ST_ASYNC_WRITES 0
    TAPE ST_BUFFER_WRITES 0
    TAPE ST_LONG_TIMEOUT 3600
    TAPE ST_READ_AHEAD 0
    TAPE ST_TIMEOUT 900
– Already being tested on a few machines running SLC4
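The options above follow a whitespace-separated "CATEGORY NAME VALUE" layout. Below is a minimal sketch of reading such lines into a lookup table, assuming that layout and assuming "#" starts a comment; `parse_castor_conf` is a hypothetical helper, not part of CASTOR.

```python
from typing import Dict, Tuple


def parse_castor_conf(text: str) -> Dict[Tuple[str, str], str]:
    """Parse 'CATEGORY NAME VALUE' lines into {(category, name): value}."""
    options: Dict[Tuple[str, str], str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        parts = line.split(None, 2)  # the value may itself contain spaces
        if len(parts) < 3:
            raise ValueError(f"malformed line: {raw!r}")
        category, name, value = parts
        options[(category, name)] = value
    return options


SLIDE_OPTIONS = """\
TAPE ST_ASYNC_WRITES  0
TAPE ST_BUFFER_WRITES 0
TAPE ST_LONG_TIMEOUT  3600
TAPE ST_READ_AHEAD    0
TAPE ST_TIMEOUT       900
"""

conf = parse_castor_conf(SLIDE_OPTIONS)
# Collect the SCSI tape driver settings as integers.
st_settings = {name: int(value) for (cat, name), value in conf.items()
               if cat == "TAPE" and name.startswith("ST_")}
print(st_settings)
```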

Development 2/3
Library failure handling ( )
– Short temporary failures of Sun libraries can now be overcome
– Options introduced:
    TAPE ACS_MOUNT_LIBRARY_FAILURE_HANDLING retry
    TAPE ACS_UNMOUNT_LIBRARY_FAILURE_HANDLING retry
Use non-labeled tapes ( )
– By default, we use AUL (American National Standard label and American National Standard user label) tape labels
– NL (non-labeled) tapes are now also supported
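The slide does not say how the "retry" setting works internally. The sketch below shows one generic way to ride out short library failures: retry the mount a bounded number of times with a growing pause between attempts. `TransientLibraryError`, `mount_with_retry`, the attempt count and the delay are all hypothetical.

```python
import time
from typing import Callable


class TransientLibraryError(Exception):
    """Hypothetical error raised for a short, temporary library failure."""


def mount_with_retry(mount: Callable[[], None], attempts: int = 5,
                     delay_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call mount() until it succeeds; return the number of attempts used."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            mount()
            return attempt
        except TransientLibraryError:
            if attempt == attempts:
                raise  # the failure was not short after all: give up
            sleep(delay_s * attempt)  # linear back-off before the next try
    raise RuntimeError("unreachable")


# Usage: a mount that fails twice, then succeeds.
failures = [TransientLibraryError("library busy"), TransientLibraryError("library busy")]
delays = []


def flaky_mount() -> None:
    if failures:
        raise failures.pop()


used = mount_with_retry(flaky_mount, sleep=delays.append)
print(used, delays)
```

Bounding the number of attempts keeps a genuinely broken library from holding a drive request forever.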

Development 3/3
Option to log to SysLog ( )
– See the talk by Giuseppe Lo Re
– Logging to DLF has been possible since the last meeting
– SysLog is now also supported, using the local0 and local1 facilities
– Options needed:
    TAPE TPLOGGER SYSLOG
  plus a syslog rule such as:
    local0.*;local1.* /var/log/castor-tape.log
– Log example:
    Jun 6 15:52:23 tpsrv623 rtcpd[16828]: "TYPE"="RT044 – Request statistics", "FUNC"="rtcpd_FreeResources", "MESSAGE"="Request statistics", "REQUESTTYPE"="READ", "VID"="T07106", "MOUNTTIME"="163", "SERVICETIME"="209", "WAITTIME"="164", "TRANSFERTIME"="7", "POSITIONTIME"="36", "DATAVOLUMEMB"=" ", "DATARATEMBS"=" ", "FILES"="1", "DGN"="T10KR1", "VOLREQID"="77219", "CLIENTNAME"="stage", "CLIENTUID"="14029", "CLIENTGID"="1474", "CLIENTHOST"="c2publicsrv102.cern.ch", "TPVID"="T07106", "REQUESTSTATE"="successful"
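Because each message is a list of "KEY"="VALUE" pairs, the syslog output is easy to post-process. A minimal sketch, assuming the pair format shown in the log example (keys of word characters, values without embedded double quotes); the helper name is hypothetical and the time fields are left in whatever unit rtcpd reports.

```python
import re
from typing import Dict

# Matches "KEY"="VALUE" pairs as they appear in the rtcpd log example.
PAIR = re.compile(r'"(\w+)"="([^"]*)"')


def parse_rtcpd_fields(line: str) -> Dict[str, str]:
    """Extract all "KEY"="VALUE" pairs from one syslog line."""
    return dict(PAIR.findall(line))


sample = ('Jun 6 15:52:23 tpsrv623 rtcpd[16828]: "TYPE"="RT044 - Request statistics", '
          '"REQUESTTYPE"="READ", "VID"="T07106", "MOUNTTIME"="163", '
          '"WAITTIME"="164", "TRANSFERTIME"="7", "POSITIONTIME"="36", '
          '"DATAVOLUMEMB"=" ", "REQUESTSTATE"="successful"')

fields = parse_rtcpd_fields(sample)
timings = {k: int(fields[k])
           for k in ("MOUNTTIME", "WAITTIME", "TRANSFERTIME", "POSITIONTIME")}
print(fields["VID"], timings)
```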

Operational changes 1/2
RTCPD self-monitor enabled
– The RTCP daemon sometimes gets stuck
– The self-monitor terminates the job and does a proper cleanup
    RTCOPYD SELF_MONITOR YES
    RTCOPYD MOUNT_TIME 900
SNMP trap handling
– IBM libraries send SNMP traps directly, e.g.:
    Volser CLN168JA, A Enterprise Tape cleaning cartridge has expired.
– ACSLS sends traps on behalf of the Sun libraries, e.g.:
    ACSLS info Lsm 0,7 number of drives changed from 6 to 7. Lsm will be updated.
– LEMON creates alarms from these traps
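The slide does not describe how the rtcpd self-monitor is implemented. As an illustration only, the sketch below assumes that MOUNT_TIME bounds how long a mount may take, in seconds, and that a mount exceeding it triggers a single cleanup action. `MountWatchdog` and its callback are hypothetical; the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class MountWatchdog:
    """Hypothetical self-monitor: declare a mount stuck once it exceeds mount_time_s."""

    def __init__(self, on_stuck: Callable[[], None], mount_time_s: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.on_stuck = on_stuck          # cleanup action (terminate job, release drive, ...)
        self.mount_time_s = mount_time_s  # mirrors RTCOPYD MOUNT_TIME 900, assumed seconds
        self.clock = clock
        self.started_at: Optional[float] = None
        self.fired = False

    def start(self) -> None:
        """Record the start of a mount."""
        self.started_at = self.clock()
        self.fired = False

    def check(self) -> bool:
        """Return True if the mount is stuck; run the cleanup at most once."""
        if self.started_at is None:
            return False
        stuck = self.clock() - self.started_at > self.mount_time_s
        if stuck and not self.fired:
            self.fired = True
            self.on_stuck()
        return stuck


# Usage with a fake clock.
now = [0.0]
events = []
wd = MountWatchdog(on_stuck=lambda: events.append("cleanup"), clock=lambda: now[0])
wd.start()
now[0] = 600.0
first = wd.check()   # still within the limit
now[0] = 901.0
second = wd.check()  # limit exceeded: cleanup runs
third = wd.check()   # still stuck, but cleanup is not repeated
print(first, second, third, events)
```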

Operational changes 2/2
TSMOD (Tape Service Manager on Duty)
– Receives a daily report, e.g.:
    TD01E | Drive Down Without Reason | DN 3592B2 DOWN (No_dedication) None
    TD03E | Job running for too long | DA 994BR0 RUNNING (No_dedication) P17080 P17080 R
    TQ01E | DGN Queue Wait Time Long | Average queue wait time in T10KR1 is seconds
    TQ02E | Queue Request Too Old | Q T10KR1 T13388 R
– Follows procedures according to the error code
– Handles most other common issues, e.g. contacting vendors about problems
– Weekly rotation
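Since each report line starts with an error code, the report lends itself to mechanical triage. A minimal sketch, assuming the "CODE | title | details" layout shown above; the procedure names in `PROCEDURES` are placeholders, because the actual TSMOD procedures are not given on the slide.

```python
from typing import Dict, List, NamedTuple, Tuple


class ReportEntry(NamedTuple):
    code: str
    title: str
    details: str


# Placeholder mapping from error code to a procedure name (hypothetical).
PROCEDURES: Dict[str, str] = {
    "TD01E": "drive-down procedure (placeholder)",
    "TD03E": "long-running-job procedure (placeholder)",
    "TQ01E": "queue-wait procedure (placeholder)",
    "TQ02E": "old-request procedure (placeholder)",
}


def parse_report(text: str) -> List[ReportEntry]:
    """Split 'CODE | title | details' lines into entries, skipping anything else."""
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[0]:
            entries.append(ReportEntry(*parts))
    return entries


def triage(entries: List[ReportEntry]) -> List[Tuple[str, str]]:
    """Pair each entry with its procedure; unknown codes are escalated."""
    return [(e.code, PROCEDURES.get(e.code, "escalate: no procedure for this code"))
            for e in entries]


REPORT = """\
TD01E | Drive Down Without Reason | DN 3592B2 DOWN (No_dedication) None
TD03E | Job running for too long | DA 994BR0 RUNNING (No_dedication) P17080 P17080 R
TQ01E | DGN Queue Wait Time Long | Average queue wait time in T10KR1 is seconds
TQ02E | Queue Request Too Old | Q T10KR1 T13388 R
"""

entries = parse_report(REPORT)
for code, procedure in triage(entries):
    print(code, "->", procedure)
```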

Conclusion
– Tape capacity is sufficient for 2008
– New tape-related CASTOR features are constantly being put into production
– We are trying to simplify our setup and automate problem handling