1 CERN Site Report
Giuseppe Lo Presti, on behalf of the CASTOR Disk+Tape Ops team
CASTOR Face-to-face workshop, 2/11/2015

2 Outline
- CASTOR 2.1.15 deployment
- Main activities in 2015
  - TapeOps: Repack et al.
  - DiskOps: hardware migration (and puppet consolidation)
  - Monitoring {in|e}volution
- Start of LHC Run 2
- The ALICE use case

3 Staff
- Massimo (Section Leader)
- Giuseppe (Service Manager, support+dev)
- Jan ("Co-Service Manager")
- Sebastien (development, support)
- Luca, Xavi, Herve (2nd level support, procedures)
- Belinda, Alessandro, Jesus (2nd level support, procedures)

4 CASTOR 2.1.15
- First production version: January 2015
- Lots of bug fixing since then
  - Most critical fixes concerning xrootd & third-party copy
  - Details in Sebastien's talk
- Lots of internal releases, mostly to test the new tapeserverd daemon
  - Details in Eric's talk
- Today: not yet deployed everywhere
  - Typical practice: LHC instances are upgraded during the LHC Technical Stop week, the Public instance on the accelerators' MD day

5 Big Repack completed
- Challenge: ~85 PB of data
  - 2013: ~ tapes; 2015: ~ tapes
  - Verify all data after write
  - 3x (255 PB!) pumped through the infrastructure (read -> write -> read)
  - Liberate library slots for new cartridges
  - Decommission ~ obsolete tape cartridges
- Constraints:
  - Be transparent for user/experiment activities
  - Preserve temporal collocation
  - Finish before LHC Run 2 start

6 More Tape Operations
- Purchased, validated and deployed 46x TS1150 drives
  - New JD media (10 TB), JC media (7 TB)
  - But... still dealing with teething problems, ~25% of drives being replaced; close collaboration with IBM Tucson
- Introduced "light" tape verification (sketched below)
  - Executed every time a tape is mounted for write (first + last + random sample of segments); do not wait until the tape gets full
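A minimal sketch of what such a "light" verification selection could look like; the function, its parameters and the segment model are illustrative assumptions, not CASTOR's actual verification code:

```python
import random

def pick_segments_for_light_verification(num_segments, sample_size=5, seed=None):
    """Pick the first, the last, and a small random sample of segment
    indices to read back after a write, instead of verifying the full tape.

    num_segments: total number of segments currently on the tape
    sample_size:  how many extra random segments to check (assumed value)
    """
    if num_segments <= 0:
        return []
    rng = random.Random(seed)
    # Always verify the first and last segments written.
    chosen = {0, num_segments - 1}
    # Add a few randomly chosen segments in between.
    middle = range(1, num_segments - 1)
    chosen.update(rng.sample(middle, min(sample_size, len(middle))))
    return sorted(chosen)

# Example: a tape holding 1200 segments -> ~7 segments read back per mount.
print(pick_segments_for_light_verification(1200, sample_size=5, seed=42))
```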

7 Disk Ops: hardware migration
- Some 100s of disk servers to be retired across all disk pools
  - Affecting mainly PUBLIC and ALICE
- All VMs to be dismantled and recreated in the new OpenStack infrastructure
  - Affecting all the dev/certification instances
- All production head nodes to be replaced
=> Exercised migration procedures a lot
  - Not without pain...

8 Disk Ops: hardware migration
- On the 'brighter' side, the puppet manifests have evolved
  - Included all that was left behind on the way from Quattor
  - ...to the best of our knowledge...
- Once puppet works, it is ~OK
  - Minimal modifications needed for ordinary operations like upgrading instance X to version Y
- However, the toolchain overhead requires more work than Quattor did to really get a change through
  - Typical case: make sure a cluster of 10s-100s of nodes has all run puppet and applied a change (see the sketch below)
  - Concept of 'translucent' upgrades ...
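As an illustration of that typical case, a minimal, hypothetical sketch of checking puppet run status across a list of nodes over SSH. It assumes passwordless SSH, a hand-written node list, and the Puppet 3-era state file /var/lib/puppet/state/last_run_summary.yaml; it is not the tooling actually used at CERN:

```python
import subprocess
import yaml  # PyYAML

# Hypothetical node names; in practice these would come from the
# configuration database or the puppet master's inventory.
NODES = ["castorpublic-d01", "castorpublic-d02", "castoralice-d01"]

# Puppet 3-era location of the last run summary (an assumption).
SUMMARY = "/var/lib/puppet/state/last_run_summary.yaml"

def last_run_summary(node):
    """Fetch and parse last_run_summary.yaml from a node via SSH."""
    out = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, "cat", SUMMARY],
        capture_output=True, text=True, timeout=30,
    )
    if out.returncode != 0:
        return None
    return yaml.safe_load(out.stdout)

def check_cluster(expected_config_version):
    """Report nodes that have not yet applied the expected catalog version."""
    behind = []
    for node in NODES:
        summary = last_run_summary(node)
        if summary is None:
            behind.append((node, "unreachable or no summary"))
            continue
        version = str(summary.get("version", {}).get("config", ""))
        failures = summary.get("events", {}).get("failure", 0)
        if version != expected_config_version or failures > 0:
            behind.append((node, f"config={version}, failures={failures}"))
    return behind

if __name__ == "__main__":
    for node, reason in check_cluster("1446422400"):
        print(f"{node}: {reason}")
```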

9 More Disk Operations
- As part of the HW migration: software RAID60 widely adopted for the new disk servers
- New detector for nTOF, producing more data
  - New dedicated disk pool in the Public instance
  - But tons of tiny files...
- Retired all D1T0 pools from CASTOR
  - Last ones to go: LHCB/lhcbuser, ATLAS/atlcal
- LEP data being copied to EOS for filesystem-based access
  - After having broken part of the legacy libshift API used with Fortran

10 SRM
- SRM 2.11 (still) in production and puppetized
- SRM 2.14 (still) in pre-production
  - The SRM probe is actually able to crash the front-end daemon from time to time
  - Didn't invest as many resources as needed to get SRM 2.14 to production
  - Thanks to Shaun for his independent tests
- Where do we go now?

11 Monitoring {in|e}volution
- Lots of different tools/technologies... trying to evolve and consolidate
- SLS is gone as of today (!)
- Kibana for high-level I/O monitoring
  - Not suitable for custom plots, rather slow
- Grafana for all operations
  - Easy and fast for all kinds of plots
- LogViewer w/ HBase for daily CASTOR operations/analysis
- HDFS as a long-term log store
  - Data mining, analytics

12 Monitoring examples

13 LHC Restart
- First beam: June 2015
- Run-2 data influx increasing
  - Current total 121 PB
  - 30 PB of new data written since 1/ ; PB in the last 30 days

14 LHC Restart
- Experiments were given two options for the Tier0 data flows:
  - The 'traditional' model (ALICE, LHCb, non-LHC)
  - The 'EOS' model (ATLAS and CMS)

15 The ALICE use case
- ALICE chose the 'traditional' model
  - One pool in CASTOR for DAQ + first reprocessing + T1 export, one pool in EOS for all user analysis
- But the CASTOR disk pool was under-dimensioned for the activity
  - ALICE has ~25K slots in LxBatch, CASTOR had ~20 120TB disk servers (the rest of the capacity being in EOS); see the back-of-envelope sketch below
  - Xrootd stalling does not match CASTOR queuing
- First solution: go to infinite slots
  - Hoping that DAQ and Offline don't come at the same time and kill the disks
  - ...Did I say hoping?
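A rough back-of-envelope illustration of that imbalance; the ~25K batch slots and ~20 servers come from the slide, while the 10 Gbps NIC figure is an assumption borrowed from the DAQ-pool hardware mentioned on the next slide:

```python
# Back-of-envelope: how many concurrent client streams each CASTOR
# disk server would have to absorb if every ALICE batch slot hit the pool.
batch_slots = 25_000   # ~25K slots in LxBatch (from the slide)
disk_servers = 20      # ~20 disk servers of ~120 TB each (from the slide)
nic_gbps = 10          # assumed 10 Gbps NIC per new disk server

streams_per_server = batch_slots / disk_servers
bandwidth_per_stream_mbps = nic_gbps * 1000 / streams_per_server

print(f"{streams_per_server:.0f} potential streams per server")       # ~1250
print(f"~{bandwidth_per_stream_mbps:.1f} Mbit/s per stream at best")  # ~8 Mbit/s
```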

16 The ALICE use case cont'd
- Second solution: propose to move part of the activity to EOS
  - But: automatic copy from CASTOR to EOS missing
  - Risk of putting EOS in the DAQ critical path
  - EOSALICE has 200M+ files, hitting namespace operational limits
- Third solution: back to Run 1
  - Double pool in CASTOR with automatic replication (cf. ATLAS/CMS Run 1 model)
  - Put on hold some hardware retirements
  - DAQ pool with ~40 10Gbps disk servers
  - Default pool with ~120 1Gbps old disk servers

17 The ALICE use case cont'd
Lessons learnt, my take:
- Experiments need automatic data placement; only ATLAS and CMS can afford (and prefer) to place the data themselves
- We can afford to remove the 'scheduling' only when the load is spread widely enough across a large pool of disk servers; a bunch of fat disk servers must be managed
- RAID60 goes well with streaming, not with lots of small I/O activity

18 That's all. (More) Questions?

