Tier-1 Andrew Sansum Deployment Board 12 July 2007.


1 Tier-1 Andrew Sansum Deployment Board 12 July 2007

2 Agenda
Monitoring
Deployment tools
Other stuff

3 Staff Changes
Lex Holt (Fabric Team) left in June.

4 Network
CERN Lightpath
– 10Gb line to CERN working well but recently suffered a 2-day break.
SuperJanet 5
– 10Gb link to site
– 10Gb LAN on Tier-1
– Share of 2Gb through firewall
– Work underway for bypass for SRM/SE traffic

5 Hardware
Hardware operating well – very stable
EU Tenders for:
– >1PB disk
– >2MSI2K
– 1PB tape (framework purchasing)
– Tape drives (just starting)
Underway and expected to deliver in Q4.
Specification of technology required is very general and we are waiting to see solutions

6 RAL Site
[Network diagram. Labels: Tier-1 LAN; Router A; OPN Router; stacks of 5510 and 5530 units; CPUs + Disks; ADS Caches; Oracle systems; RAL Tier 2. Link labels: 10Gb/s to CERN; 1Gb/s to SJ4; N x 1Gb/s; 10Gb/s.]

7 CASTOR
2.1.2 and previous releases of CASTOR:
– Implemented as a shared single instance – very unreliable with missing functionality
– Unable to cope with various use cases
– Essentially unusable
How to make things better
– Improve relationship with CERN to get product improvements
– 1 extra contractor
2.1.3 release now deployed:
– Instance planned for ATLAS/CMS/LHCB/Others
– Stable
– Being load tested by CMS
– Promising

8 dCache
Still running version 1.7
– Reliability reasonable
Phase out had been planned for June/July but CASTOR not sufficiently advanced
– Now plan to continue running dCache at least until Christmas
– Will give six months' warning of closure

9 New Machine Room
Tender underway, planned completion: August 2008
800 m² can accommodate 300 racks + 5 robots
2.3MW Power/Cooling capacity (some UPS)
Office accommodation for all E-Science staff
Combined Heat and Power Generation (CHP) on site
Not all for GRIDPP (but you get most)!

10 Reliability (Recent issues)
RB
– Continue to see:
  Load related issues
  Database size issues (need frequent cleaning)
– Now running:
  rb01/rb02 as general RB service
  rb03 dedicated to Alice and LHCB
– Will add more if necessary but wish to minimise work on RB and wait for WMS
Top level BDII
– 3 servers (March) resolved timeouts for a while but they recurred recently
– Recent upgrade to indexing version appears to have helped
CE
– Experienced unidentified load problem at start of June, no recurrence

11 SL4
SL4 test service is available with a dedicated CE and a few worker nodes
Expect to run both SL3 and SL4 concurrently and gradually migrate from SL3 to SL4.
– Migration will take place as fast as experiments want
– Capacity will initially be moved at experiment’s request.
– Once ATLAS/LHCB and CMS are migrated we will announce a termination date of SL3 service

12 Grid Only
Long standing milestone that Tier-1 was to offer a “Grid Only” service by the end of August 2007.
Recent discussion within UB concluded that the absence of a reliable CASTOR prevented Tier-1 offering a Grid only service.
PMB has subsequently said that we should nevertheless move what we can to a Grid only service (grid only job submission, for example).
Position statement needs to be submitted to PMB outlining what can be achieved.

