Advances in Bit Preservation (since DPHEP’2015)
Germán Cancio, IT Storage Group, CERN
DPHEP / WLCG Workshop, Lisbon, 3/2/2016
Outline
- Advances at CERN since last DPHEP WS
  - Environmental sensor
  - Logical Block Protection
  - LEP data on EOS
- Outlook
Environmental sensor (aka “dust sensor”)
- Sensors in full production at CERN: dust, temperature, relative humidity
- Specs available via ohwr.org: HW design & Arduino board schematics, RPi software, Puppet templates
- Can be integrated in tape libraries or used stand-alone
- Presented at Oracle LTUG
- Interest from other sites and vendor
Logical Block Protection
- Available in latest CASTOR release, deployed at CERN
- Support for IBM and Oracle enterprise drives (using crc32c); small changes required to make it work on LTO as well
- Blocks checksummed and verified during both read and write operations
- Low overhead: max 5% for writing, zero for reading
- Next step: stand-alone tape verification without sending any data off the drive, working at full streaming speed
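As a rough illustration of the per-block checksumming behind Logical Block Protection, the sketch below computes a CRC32C (Castagnoli) checksum in pure Python, appends it to a block on the "write" side and re-checks it on the "read" side. It is not the CASTOR or drive implementation: the function names `protect_block` and `verify_block`, the 4-byte trailer layout and its little-endian byte order are assumptions made for the example only.

```python
import struct

# Reflected form of the Castagnoli polynomial used by CRC32C.
CRC32C_POLY = 0x82F63B78


def _make_table():
    """Build the 256-entry lookup table for byte-wise CRC32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32C checksum of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def protect_block(payload: bytes) -> bytes:
    """Hypothetical write side: append a 4-byte CRC32C trailer (assumed layout)."""
    return payload + struct.pack("<I", crc32c(payload))


def verify_block(block: bytes) -> bytes:
    """Hypothetical read side: check the trailer and return the payload."""
    if len(block) < 4:
        raise ValueError("block too short to carry a checksum")
    payload, trailer = block[:-4], block[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if crc32c(payload) != expected:
        raise ValueError("checksum mismatch: block corrupted")
    return payload
```

In this sketch a single flipped bit anywhere in the payload makes `verify_block` raise, which is the property that lets every hop between application and tape detect corruption independently.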
Other improvements
- Extended tape verification
  - Light mode: after every write mount, verify critical tape areas (BOT, EOT, random sample in the middle)
- Exploit low-level tape system information
  - Transient/internal drive read/write/mount stats at SCSI level; library low-level logs
  - Assess the state of the drive and forecast a potential failure before it actually happens
  - Differences between Oracle and IBM: needs homogenization
Example log entry:
  T05:46: :00 tpsrv220 tapeserverd[3335]: LVL=Info TID=3350 MSG="Logging volume statistics" firmwareVersion="460E" lifetimeBOTPasses="1486" lifetimeMOTPasses="1556" lifetimeVolumeMounts="202" lifetimeVolumeRecoveredReadErrors="167" lifetimeVolumeRecoveredWriteErrors="30" lifetimeVolumeUnrecoveredReadErrors="4" lifetimeVolumeUnrecoveredWriteErrors="2" validity="1" volumeManufacturingDate=" "
Slide annotations on the log entry: "Not good.." and "Really bad!"
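One way such volume statistics could be exploited is sketched below: the key="value" pairs of a tapeserverd log line are parsed and the error counters turned into a coarse health label. The field names come from the log entry shown on the slide (timestamp omitted). The functions `parse_volume_stats` and `assess_volume`, and the rule "any unrecovered error is critical, any recovered error is a warning", are hypothetical and chosen only for illustration.

```python
import re

# Matches key="value" pairs such as lifetimeVolumeMounts="202".
KV_PATTERN = re.compile(r'(\w+)="([^"]*)"')

SAMPLE = (
    'tapeserverd[3335]: LVL=Info TID=3350 MSG="Logging volume statistics" '
    'firmwareVersion="460E" lifetimeBOTPasses="1486" lifetimeMOTPasses="1556" '
    'lifetimeVolumeMounts="202" lifetimeVolumeRecoveredReadErrors="167" '
    'lifetimeVolumeRecoveredWriteErrors="30" '
    'lifetimeVolumeUnrecoveredReadErrors="4" '
    'lifetimeVolumeUnrecoveredWriteErrors="2" validity="1" '
    'volumeManufacturingDate=" "'
)


def parse_volume_stats(line: str) -> dict:
    """Extract all quoted key/value pairs from one log line."""
    return dict(KV_PATTERN.findall(line))


def assess_volume(stats: dict) -> str:
    """Return 'critical', 'warning' or 'ok' (hypothetical classification)."""

    def count(key: str) -> int:
        value = stats.get(key, "").strip()
        return int(value) if value.isdigit() else 0

    unrecovered = (count("lifetimeVolumeUnrecoveredReadErrors")
                   + count("lifetimeVolumeUnrecoveredWriteErrors"))
    recovered = (count("lifetimeVolumeRecoveredReadErrors")
                 + count("lifetimeVolumeRecoveredWriteErrors"))
    if unrecovered > 0:
        return "critical"
    if recovered > 0:
        return "warning"
    return "ok"


if __name__ == "__main__":
    print(assess_volume(parse_volume_stats(SAMPLE)))
```

A production version would more plausibly look at trends across mounts and normalise the counters between Oracle and IBM drives, which is the homogenization issue noted on the slide.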
LEP data on EOS
- Timeline to be defined in 2016
Outlook
- Expect +45PB/year
- LS2: ideal moment in time for next repack (media replacement)
  - ~260PB to repack, compared to 85PB in 2014/15
- However, new drive generation in ~2017 may allow media reuse … and significant $aving$
  - Another 160PB to move!
- After repack is before repack!
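To get a feel for the scale of these repack volumes, the back-of-the-envelope sketch below compares the ~260PB figure with the 85PB of 2014/15 and estimates a repack duration. The drive count and sustained per-drive rate are purely hypothetical inputs, not figures from the talk, and the estimate ignores mount times, verification and competing production traffic.

```python
SECONDS_PER_DAY = 86_400


def repack_days(volume_pb: float, drives: int, mb_per_s_per_drive: float) -> float:
    """Days needed to move volume_pb petabytes (decimal units) at full streaming speed."""
    total_bytes = volume_pb * 1e15
    bytes_per_day = drives * mb_per_s_per_drive * 1e6 * SECONDS_PER_DAY
    return total_bytes / bytes_per_day


if __name__ == "__main__":
    next_repack_pb = 260   # from the slide
    last_repack_pb = 85    # from the slide (2014/15)
    print(f"volume ratio: {next_repack_pb / last_repack_pb:.2f}x")

    # Hypothetical setup: 20 drives dedicated to repack at 250 MB/s each.
    print(f"estimated duration: {repack_days(next_repack_pb, 20, 250):.0f} days")
```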