1 Tape Operations Update
Vladimír Bahyl, IT FIO-TSI, CERN
CERN - IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

2 Agenda
Progress on issues (since the last meeting)
Current equipment and challenges
Development changes
Operational changes
Conclusion

3 Progress on issues
NI_FAILURE
  – Problem still present
  – Simple procedure exists, so there is no need to reinstall
tplabel command
  – By default, existing labels are not overwritten
  – -f option introduced to force relabelling
Cmonitd
  – No longer used at CERN

4 Equipment today
25 PB total (around 50% free)
IBM
  – 2 libraries
  – ~12 000 slots; 700 GB each
  – 60 TS1120 drives
Sun
  – 4 libraries
  – ~36 000 slots; 500 GB each
  – 60 T10000A drives
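As a rough cross-check of the 25 PB figure, the nominal capacity can be recomputed from the slot counts and cartridge sizes above. This is only a sketch: it assumes every slot holds a cartridge and that decimal units are meant (1 PB = 1 000 000 GB); the slides do not say how the quoted total was derived.

```python
# Nominal tape capacity from the slot counts and cartridge sizes on the slide.
# Assumptions (not from the slides): all slots are populated, and
# 1 PB = 1_000_000 GB (decimal units).
LIBRARIES = {
    "IBM": {"slots": 12_000, "cartridge_gb": 700},
    "Sun": {"slots": 36_000, "cartridge_gb": 500},
}


def capacity_pb(slots: int, cartridge_gb: int) -> float:
    """Return the nominal capacity in petabytes for fully populated slots."""
    return slots * cartridge_gb / 1_000_000


per_vendor = {name: capacity_pb(**spec) for name, spec in LIBRARIES.items()}
total_pb = sum(per_vendor.values())

for name, pb in per_vendor.items():
    print(f"{name}: {pb:.1f} PB")
print(f"Total: {total_pb:.1f} PB")
```

Under these assumptions the estimate comes to about 26.4 PB (8.4 PB IBM plus 18.0 PB Sun), the same order as the quoted 25 PB given that the slot counts are approximate.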

5 Equipment near future
Tape space sufficient for 2008
  – Unbalanced
New drives
  – IBM TS1130: ~160 MB/s, 1 TB cartridges
  – Sun T10000B: ~130 MB/s, 1 TB cartridges
IBM High density frame

6 Challenges
ATLAS write rate is low, partially caused by additional mounts due to a CASTOR policy bug
ALICE rate affected by small files from users writing to the default pool

7 Development 1/3
Patch-free kernel version (2.1.6-8)
  – Goal: by SLC5, do not use any CASTOR-specific kernel patches
  – All necessary settings moved to the CASTOR tape layer
  – New SCSI tape driver options introduced:
      TAPE ST_ASYNC_WRITES 0
      TAPE ST_BUFFER_WRITES 0
      TAPE ST_LONG_TIMEOUT 3600
      TAPE ST_READ_AHEAD 0
      TAPE ST_TIMEOUT 900
  – Already testing on a few machines on SLC4
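The option lines above follow a "CATEGORY NAME VALUE" layout. A minimal sketch of reading such lines into a lookup table is given below; the function name `parse_options`, the handling of `#` comments, and the skipping of malformed lines are illustrative assumptions, not CASTOR's actual parser.

```python
from collections import defaultdict

# The five driver options exactly as listed on the slide.
SAMPLE = """\
TAPE ST_ASYNC_WRITES 0
TAPE ST_BUFFER_WRITES 0
TAPE ST_LONG_TIMEOUT 3600
TAPE ST_READ_AHEAD 0
TAPE ST_TIMEOUT 900
"""


def parse_options(text: str) -> dict[str, dict[str, str]]:
    """Parse 'CATEGORY NAME VALUE' lines into {category: {name: value}}.

    Hypothetical helper: blank lines, '#' comments and lines with fewer
    than three fields are skipped (an assumption, not documented behaviour).
    """
    options: dict[str, dict[str, str]] = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)  # value may itself contain spaces
        if len(parts) < 3:
            continue
        category, name, value = parts
        options[category][name] = value
    return dict(options)


tape = parse_options(SAMPLE)["TAPE"]
print(tape["ST_TIMEOUT"], tape["ST_LONG_TIMEOUT"])
```

Values are kept as strings because the same layout also carries multi-word values elsewhere in the deck (for example `retry 3 300`).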

8 Development 2/3
Library failure handling (2.1.7-3)
  – Now possible to overcome short temporary failures of Sun libraries
  – Options introduced:
      TAPE ACS_MOUNT_LIBRARY_FAILURE_HANDLING retry 3 300
      TAPE ACS_UNMOUNT_LIBRARY_FAILURE_HANDLING retry 3 300
Use non-labeled tapes (2.1.7-3)
  – By default, we use AUL (American National Standard label and American National Standard user label) tape labels
  – NL tapes are now also supported
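The slides do not spell out what `retry 3 300` means. Assuming it stands for "retry up to 3 times, waiting 300 seconds between attempts", the behaviour could be sketched as follows. Everything here (`parse_retry_policy`, `with_retries`, the simulated `flaky_mount`) is hypothetical and is not the CASTOR implementation.

```python
import time
from typing import Callable, Tuple


def parse_retry_policy(value: str) -> Tuple[int, int]:
    """Parse 'retry <count> <seconds>' into (retries, wait_seconds).

    The meaning of the two numbers is an assumption, not taken from the slides.
    """
    action, count, wait = value.split()
    if action != "retry":
        raise ValueError(f"unsupported action: {action}")
    return int(count), int(wait)


def with_retries(operation: Callable[[], str], retries: int, wait_seconds: int,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Run operation; on RuntimeError retry up to `retries` times with a pause."""
    attempt = 0
    while True:
        try:
            return operation()
        except RuntimeError:
            if attempt >= retries:
                raise  # retries exhausted: surface the failure
            attempt += 1
            sleep(wait_seconds)


# Simulated library request that fails twice before succeeding.
calls = {"n": 0}


def flaky_mount() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("library temporarily unavailable")
    return "mounted"


waits = []  # record the pauses instead of actually sleeping
retries, wait_seconds = parse_retry_policy("retry 3 300")
result = with_retries(flaky_mount, retries, wait_seconds, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; a real daemon would use the default `time.sleep`.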

9 Development 3/3
Option to log to SysLog (2.1.7-4)
  – See the talk of Giuseppe Lo Re
  – Can log to DLF since the last meeting
  – SysLog now also supported
      Uses local0 and local1 facilities
  – Options needed:
      TAPE TPLOGGER SYSLOG
      local0.info;local1.info @castortapelog
      local0.*;local1.* /var/log/castor-tape.log
  – Log example:
      Jun 6 15:52:23 tpsrv623 rtcpd[16828]: "TYPE"="RT044 – Request statistics", "FUNC"="rtcpd_FreeResources", "MESSAGE"="Request statistics", "REQUESTTYPE"="READ", "VID"="T07106", "MOUNTTIME"="163", "SERVICETIME"="209", "WAITTIME"="164", "TRANSFERTIME"="7", "POSITIONTIME"="36", "DATAVOLUMEMB"="115.570068", "DATARATEMBS"="16.510010", "FILES"="1", "DGN"="T10KR1", "VOLREQID"="77219", "CLIENTNAME"="stage", "CLIENTUID"="14029", "CLIENTGID"="1474", "CLIENTHOST"="c2publicsrv102.cern.ch", "TPVID"="T07106", "REQUESTSTATE"="successful"
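The `"KEY"="VALUE"` layout of the log example lends itself to simple parsing. The sketch below uses an abbreviated copy of the line above (straight quotes, a subset of fields); `parse_fields` is a hypothetical helper, and the regular expression assumes that values never contain a double quote.

```python
import re

# Abbreviated copy of the log example from the slide (subset of fields).
LINE = (
    'Jun 6 15:52:23 tpsrv623 rtcpd[16828]: '
    '"FUNC"="rtcpd_FreeResources", "REQUESTTYPE"="READ", "VID"="T07106", '
    '"TRANSFERTIME"="7", "DATAVOLUMEMB"="115.570068", '
    '"DATARATEMBS"="16.510010", "REQUESTSTATE"="successful"'
)

# One "KEY"="VALUE" pair; assumes values contain no double quotes.
PAIR = re.compile(r'"([A-Z]+)"="([^"]*)"')


def parse_fields(line: str) -> dict[str, str]:
    """Return all "KEY"="VALUE" pairs of a log line as a dictionary."""
    return dict(PAIR.findall(line))


fields = parse_fields(LINE)

# In this example the logged rate equals volume divided by transfer time.
rate = float(fields["DATAVOLUMEMB"]) / int(fields["TRANSFERTIME"])
print(fields["VID"], fields["REQUESTSTATE"], round(rate, 2))
```

For this particular line, DATAVOLUMEMB divided by TRANSFERTIME reproduces the logged DATARATEMBS (about 16.51 MB/s), which suggests the rate covers the transfer phase only rather than the full service time.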

10 Operational changes 1/2
RTCPD self monitor enabled
  – RTCP daemon sometimes gets stuck
  – Self monitor terminates the job and does proper cleanup
      RTCOPYD SELF_MONITOR YES
      RTCOPYD MOUNT_TIME 900
SNMP traps handling
  – IBM libraries send SNMP traps directly
      Volser CLN168JA, A Enterprise Tape cleaning cartridge has expired.
  – ACSLS sends traps on behalf of Sun libraries
      ACSLS info Lsm 0,7 number of drives changed from 6 to 7. Lsm will be updated.
  – LEMON creates alarms

11 Operational changes 2/2
TSMOD (Tape Service Manager on Duty)
  – Receives daily report
      TD01E | Drive Down Without Reason | DN 3592B2 35922005@tpsrv135 DOWN 20530 (No_dedication) None
      TD03E | Job running for too long | DA 994BR0 994B0618@tpsrv635 RUNNING 27769 (No_dedication) P17080 P17080 R 30726 (stage,st)@lxmrrk2707.cern.ch
      TQ01E | DGN Queue Wait Time Long | Average queue wait time in T10KR1 is 14729 seconds
      TQ02E | Queue Request Too Old | Q T10KR1 T13388 R 143229 (stage,st)@c2cmssrv102.cern.ch 37990
  – Follows procedures according to the error code
  – Handles most other common issues
      E.g. contacting vendors for problems
  – Weekly rotation
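Since procedures are selected by error code, the daily report lines can be split into code, title and detail before lookup. A minimal sketch follows, assuming the three-field "CODE | Title | Detail" layout shown above holds for every entry; `ReportEntry` and `parse_report` are hypothetical names, not part of the TSMOD tooling.

```python
from typing import List, NamedTuple

# Two entries copied from the daily report on the slide.
REPORT = """\
TD01E | Drive Down Without Reason | DN 3592B2 35922005@tpsrv135 DOWN 20530 (No_dedication) None
TQ01E | DGN Queue Wait Time Long | Average queue wait time in T10KR1 is 14729 seconds
"""


class ReportEntry(NamedTuple):
    code: str    # error code, e.g. TD01E
    title: str   # short description of the problem
    detail: str  # free-form details, left unparsed


def parse_report(text: str) -> List[ReportEntry]:
    """Split 'CODE | Title | Detail' lines; lines without three fields are skipped."""
    entries = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) == 3:
            entries.append(ReportEntry(*parts))
    return entries


entries = parse_report(REPORT)
by_code = {entry.code: entry for entry in entries}

for entry in entries:
    print(f"{entry.code}: {entry.title}")
```

Indexing by code (`by_code`) is what a procedure lookup would key on; the detail field is deliberately left as free text because its layout differs between entry types.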

12 Conclusion
Tape capacity sufficient for 2008
New tape-related CASTOR features are constantly being put into production
We are trying to simplify our setup and automate the problem handling

