CERN IT Department CH-1211 Genève 23 Switzerland
Tier0 Status
Tony Cass
LCG-LHCC Referees Meeting, 6th July 2009
Agenda
– Resources
– CASTOR status and performance
– Progress with new data centre project
Procurements: 2009 Status & 2010 Outlook
CPU & Disk
– ~60% of foreseen 2009 pledges available in April (additional ATLAS request not included).
– Balance to be operational in October: tight schedule, but agreed with Purchasing dept.
– Exploring options to purchase iSCSI disk storage: greater cost/TB, but avoids interruptions to the CASTOR service due to disk server failure (the #1 cause of incidents; individual disk failures are already handled transparently).
– 2010 procurement planning underway: tenders issued in June; adjudication in ~November.
Tape
– Expect ~20 PB spare capacity by October.
– Will purchase a “high density” IBM robot in autumn: 14,000 slots (14 PB).
– Can convert an existing IBM robot to the “high density” version in 2010 (with no service interruption) if additional capacity is required.
[Slide label: February Status] Updates: On schedule; Purchased.
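The robot figures are self-consistent if each slot holds one cartridge of about 1 TB. A quick check, assuming decimal units (1 PB = 10^15 B) and one cartridge per slot:

```latex
\frac{14\ \mathrm{PB}}{14\,000\ \text{slots}}
  = \frac{14 \times 10^{15}\ \mathrm{B}}{1.4 \times 10^{4}\ \text{slots}}
  = 1 \times 10^{12}\ \mathrm{B\ per\ slot}
  = 1\ \mathrm{TB\ per\ slot}
```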
Resource Usage Efficiency
– 167 boxes are now dedicated to running VO-specific functions.
– CPU utilisation is poor: a clear opportunity to reduce server count (and power consumption) through virtualisation.
Consolidation project underway
– Requires reliable storage for virtual machine images, as we need to be able to support virtual machine migration.
– Production service expected by end
– Scheduling of virtual machine images for batch demonstrated end-June.
– Expect autumn hardware to be installed with hypervisors.
– More work needed, however, to allow LCG-wide scheduling of virtual machines.
[Plot: VO Boxes]
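The consolidation argument can be illustrated with a minimal sketch. It assumes each VO box's load can be expressed as a fraction of one server's CPU and places VMs with first-fit decreasing, a standard bin-packing heuristic chosen here for illustration only, not the project's actual scheduler. The function name, the 80% per-host cap and the utilisation figures are all hypothetical.

```python
"""Sketch: first-fit-decreasing packing of lightly loaded VO boxes onto hypervisors.

All utilisation figures are hypothetical; they are NOT measurements
from the 167 CERN VO boxes.
"""
from typing import List


def pack_first_fit_decreasing(loads: List[float], host_capacity: float) -> List[List[float]]:
    """Assign each VM load (fraction of one physical box's CPU) to a host.

    Loads are sorted largest-first; each is placed on the first host with
    enough spare capacity, otherwise a new host is opened.
    """
    hosts: List[List[float]] = []
    for load in sorted(loads, reverse=True):
        if load > host_capacity:
            raise ValueError(f"load {load} exceeds host capacity {host_capacity}")
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No existing host has room: open a new hypervisor.
            hosts.append([load])
    return hosts


if __name__ == "__main__":
    # Hypothetical average CPU utilisation of ten VO boxes (fraction of one box).
    vo_box_loads = [0.05, 0.10, 0.02, 0.30, 0.08, 0.15, 0.04, 0.12, 0.06, 0.20]
    # Hypothetical cap of 80% per hypervisor, leaving headroom for peaks.
    hosts = pack_first_fit_decreasing(vo_box_loads, host_capacity=0.8)
    print(f"{len(vo_box_loads)} boxes -> {len(hosts)} hypervisors")
```

With these made-up loads the ten boxes fit on two hypervisors; a real placement would also have to respect memory and I/O, and leave room for live migration.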
SLC5 Migration
– Migration of batch resources underway:
  – all new capacity introduced will be SLC5-based;
  – existing capacity migrated progressively.
– Migration of the LXPLUS alias is an issue:
  – The principle is easy: switch when the majority of batch capacity is SLC5. But measured where? Counting CERN capacity, switch early; counting grid capacity, switch late.
  – No clear/obvious solution yet.
  – [Rapid migration of other grid sites would help, and is perhaps sensible before September anyway?]
[Slide label: February Status] Updates: Still an issue; to be discussed at GDB on Wednesday.
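The switching rule itself is simple; the difficulty is deciding which capacity to count. A minimal sketch, with hypothetical function names and made-up capacity figures (SLC4 being the release SLC5 replaces), shows the same majority rule giving opposite answers for CERN-only and grid-wide capacity:

```python
"""Sketch of the LXPLUS-alias switch rule: move the alias to SLC5 once SLC5
provides the majority of batch capacity.

Capacity figures are hypothetical and only illustrate why the answer
depends on whether capacity is counted at CERN or across the grid.
"""
from typing import Dict


def slc5_fraction(capacity: Dict[str, float]) -> float:
    """Fraction of total batch capacity running SLC5."""
    total = capacity.get("SLC4", 0.0) + capacity.get("SLC5", 0.0)
    if total == 0:
        raise ValueError("no batch capacity given")
    return capacity.get("SLC5", 0.0) / total


def should_switch_alias(capacity: Dict[str, float]) -> bool:
    """Switch when a strict majority of capacity is SLC5."""
    return slc5_fraction(capacity) > 0.5


if __name__ == "__main__":
    # Hypothetical capacities in arbitrary units.
    cern_batch = {"SLC4": 40.0, "SLC5": 60.0}
    grid_wide = {"SLC4": 700.0, "SLC5": 300.0}
    print("CERN:", should_switch_alias(cern_batch))  # True: switch early
    print("Grid:", should_switch_alias(grid_wide))   # False: switch late
```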
Agenda
– Resources
– CASTOR status and performance
  – CASTOR status & plans
  – Metrics
– Progress with new data centre project
CASTOR Status & Plans
Status
– Generally quiet/good... except for tape repack.
– BUT while we are reasonably confident about our ability to support production, user analysis is the concern, and there has been no major analysis load so far.
CASTOR 2.1.8, with integrated xrootd redirector, should deliver improvements for analysis:
– LSF bypass and reduced latency;
– improved scalability, as the xrootd daemon has a smaller footprint than rfio (to be deprecated?).
Also delivers:
– end-to-end checksumming for rfio;
– user space accounting (required for later deployment of quotas);
– operational improvements (notably automatic draining of disk servers);
– fixes to problems identified by repack (main reason for deployment delays).
Schedule: end-Feb release, in production on c2cernt3 end-March, deployment for experiment instances in April.
[Slide label: February Status] Updates:
– CASTOR deployment delayed, but in production for STEP.
– Improved xrootd implementation being tested by ALICE.
– Excellent performance for STEP.
– Repack much improved, even if some concerns remain.
– CASTOR readiness for Tier0/Tier1 production confirmed.
– Still lack experience supporting heavy analysis load; we need this experience to understand if/where improvements are needed.
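End-to-end checksumming means the checksum computed where the data is written is compared with one computed independently where it is read back, so corruption anywhere in between is caught. A minimal sketch, assuming Adler-32 as the checksum (widely used for file checksums in HEP storage) and using Python's `zlib.adler32`; the helper functions are hypothetical and are not part of the CASTOR or rfio interfaces:

```python
"""Sketch of end-to-end checksum verification.

Assumption: Adler-32 is the checksum type. The helpers are hypothetical,
not the CASTOR or rfio API.
"""
import zlib
from typing import Iterable


def adler32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Compute Adler-32 incrementally, as a client would while streaming data."""
    value = 1  # Adler-32 initial value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


def verify_transfer(sent: Iterable[bytes], received: Iterable[bytes]) -> bool:
    """Compare checksums computed independently at source and destination."""
    return adler32_of_chunks(sent) == adler32_of_chunks(received)


if __name__ == "__main__":
    data = [b"event-block-1", b"event-block-2"]
    corrupted = [b"event-block-1", b"event-block-3"]
    print(verify_transfer(data, data))       # True
    print(verify_transfer(data, corrupted))  # False
```

Computing the value chunk by chunk gives the same result as checksumming the whole file at once, which is what lets a streaming protocol checksum data in flight.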
Performance metrics
– Metrics have been implemented and deployed on the preproduction cluster:
  – data collected in Lemon;
  – RRD graphs not yet implemented.
– Production deployment delayed for several reasons:
  – new metrics imply several changes to exceptions/alarms and automated actions used in production;
  – an unexpected technical dependency on the late SRM 2.7 version.
– Ongoing work to back-port the implementation.
[Slide annotations: “All still true”; “November Status”]
– All but two of the agreed performance metrics are now available via Lemon. The exceptions are:
  – SRM time to TURL, which needs SRM 2.8 for reasonable timestamp granularity;
  – migration rate, which was available but is currently broken after a system update.
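The SRM time-to-TURL exception is a granularity problem: if request and TURL timestamps are recorded only at coarse resolution, sub-second latencies collapse to 0 or to a whole tick. A minimal sketch using integer milliseconds and hypothetical event times; the 1 ms and 1 s resolutions are illustrative and are not a statement about what SRM 2.7 or 2.8 actually log:

```python
"""Sketch: effect of timestamp resolution on a per-request latency metric
such as SRM time-to-TURL.

Event times are hypothetical integer milliseconds; the resolutions are
illustrative, not what any SRM version actually records.
"""
from typing import List, Tuple


def truncate_ms(t_ms: int, resolution_ms: int) -> int:
    """Round a timestamp down to the logger's resolution."""
    return (t_ms // resolution_ms) * resolution_ms


def recorded_latencies(events: List[Tuple[int, int]], resolution_ms: int) -> List[int]:
    """Latency (TURL returned minus request received) as seen in the logs."""
    return [truncate_ms(end, resolution_ms) - truncate_ms(start, resolution_ms)
            for start, end in events]


if __name__ == "__main__":
    # Hypothetical (request_received, turl_returned) pairs in milliseconds.
    events = [(10_200, 10_650), (20_900, 21_300), (30_100, 30_400)]
    print(recorded_latencies(events, resolution_ms=1))     # [450, 400, 300]
    print(recorded_latencies(events, resolution_ms=1000))  # [0, 1000, 0]
```

With whole-second stamps, latencies of a few hundred milliseconds are recorded as either 0 or 1000 ms, so the metric says little until finer timestamps are available.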
CASTORATLAS (courtesy Miguel Santos)
[Plots: STEP’09; Adaptive migrations]
– A migration-rate plot still needs to be added. It is currently missing; to be done by fixing iptables with the next sensor.
CASTORATLAS (courtesy Miguel Santos)
– Average read open time of 4 s (see disk cache read scores).
– Average write open time of 1.4 s.
– Peaks of 400 running transfers.
– Peaks of 20 pending transfers.
– Using ~22 tape drives.
– ~5% of available transfer slots used.
– Only the Tier0 function (t0atlas) exercised!
[Plot: SRM time to TURL]
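The slot figure gives a rough sense of the headroom. If the ~5% was measured at the peak of 400 running transfers (an assumption; the slide does not say when it was measured), the number of available transfer slots is of order:

```latex
N_{\text{slots}} \approx \frac{400}{0.05} = 8\,000
```

That is roughly twenty times the observed peak, consistent with the point that only the Tier0 function was exercised.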
Agenda
– Resources
– CASTOR status and performance
– Progress with new data centre project
New data centre project
Reminder: the selected strategy is to do a single tender for an overall solution.
A four-phase process was developed:
1. Request (many) conceptual designs.
2. Commission 3-4 of the companies that submitted conceptual designs to develop an outline design.
3. In-house, turn a selected outline design into plans and documents for the tender.
4. Single tender for overall construction.
Outline Design Phase
[Slide label: November Status]
– Deadline: 28th November.
  – Contacts with all 4 companies during design phase.
  – All 4 companies say the deadline will be met.
– Meetings to review proposed designs scheduled in the week of December 8th.
– Market survey in preparation as the first stage in selecting a company for detailed design & construction.
– Discussions in Oslo on 28th November to further investigate possible remote server installation in 2011 (and beyond).
  – RAL also has power available in 2011, but not as much and for a shorter period.
Current Status
[Slide label: February Status]
– Four designs reviewed: no clear winner, but consensus on a leading design.
– New Management supports the project. Good, but…
  – new requirements: “Green” & Prévessin heat recovery option;
  – the new organisation brings new players to brief.
– “Single contract for construction” agreed.
– Agreement to work with one company to deliver a fully acceptable design, with modifications for the new requirements.
  – Will lead to ~6 month delay.
  – [Personal view] The plan to continue with only one company should be agreed by the Directorate now to avoid potential hiccups later. Frédéric Hemmer discussing with Sergio Bertolucci.
– Will need to revisit the option to install equipment at the University of Oslo.
Current Status
– “New organisation brings new players to brief”:
  – Wolfgang von Rüden asked to review past assumptions (e.g. no power available on the Meyrin site); reported early June.
  – Meeting with Sergio Bertolucci organised for 20th July.
– New interest from Norway in providing a centre for CERN near Stavanger; design and costs to be available by end-August.
– Project delayed by 6-9 months. However, the latest power projections extend the usable lifetime of B513 by 1 year.
Questions? Comments?