UKI-SouthGrid Overview
Pete Gronbech, SouthGrid Technical Coordinator
GridPP 25 - Ambleside, 25th August 2010
Seven(-teen) Sisters
SouthGrid August UK Tier 2 reported CPU – Historical View to present
SouthGrid Sites Accounting as reported by APEL
Sites upgrading to SL5 and recalibration of published SI2K values
Site Resources
Columns: Site; HEPSPEC06; CPU (kSI2K), converted from HEPSPEC06 benchmarks; Storage (TB)
Rows: EFDA-JET, Birmingham, Bristol, Cambridge, Oxford, RALPPD, Totals
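The Site Resources slide quotes CPU in kSI2K converted from HEPSPEC06 benchmarks. A minimal sketch of such a conversion follows, assuming the commonly quoted factor of 4 HS06 per kSI2K; the slides do not state which factor was actually applied, and the function name is hypothetical. The 1772 HS06 input is the JET figure quoted elsewhere in this talk.

```python
# Assumption: 1 kSI2K = 4 HS06 (the commonly quoted conversion factor).
# The slides do not say which factor was used for the published table.
HS06_PER_KSI2K = 4.0


def hs06_to_ksi2k(hs06: float) -> float:
    """Convert a HEPSPEC06 score to kSI2K under the assumed factor."""
    return hs06 / HS06_PER_KSI2K


if __name__ == "__main__":
    # 1772 HS06 is the JET capacity quoted in this talk.
    print(f"{hs06_to_ksi2k(1772):.1f} kSI2K")
```

Keeping the factor in a single named constant makes it easy to recompute the column if a site published its values with a different calibration.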
Gridpp3 h/w generated MoU for 2010,11, TB2011 TB2012 TB bham bris cam ox RALPPD HS HS HS06 bham14502, bris6611, cam11481, ox20342, RALPPD
JET
Stable operation (SL5 WNs)
Could handle more opportunistic LHC work
1772 HS06, 1.5 TB
Birmingham
Just purchased 40TB storage
– Brings total storage to 10TB + 6*20 + 2*40 = 210 TB in a week or two
Two new 64-bit servers
– (SL5) Site BDII + monitoring VMs
– (SL5) DPM head node
Everything (except mon) is SL5
Both clusters have dual lcg-CE/CreamCE front ends
Sluggish response/instabilities with GPFS on Shared Cluster
– Installed 4TB NFS-mounted file server for experiment software/middleware/user areas
Taken on someone else's proprietary (non-SL5) smart phone. He couldn't get signal in there either.
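The Birmingham storage total (10TB + 6*20 + 2*40 = 210 TB) can be checked with a few lines of Python. This is only a sketch: reading each term as a (count, size in TB) pair is an assumption, since the slide gives just the sum, and the names `units` and `total_tb` are hypothetical.

```python
# Each tuple is (number of units, size of each unit in TB).
# Assumption: the terms of the slide's sum are read as count * size;
# the slide itself gives only the arithmetic, not what each unit is.
units = [(1, 10), (6, 20), (2, 40)]


def total_tb(storage_units):
    """Return the total capacity in TB for a list of (count, size_tb) pairs."""
    return sum(count * size_tb for count, size_tb in storage_units)


if __name__ == "__main__":
    print(f"Total storage: {total_tb(units)} TB")
```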
Bristol
LCG StoRM SE with GPFS, 102TB, 90% full of CMS data
StoRM developers are finishing testing on SL5 64-bit and plan to provide releases for both slc4 ia32 and sl5 x86_64 to Early Adopters this month (August).
Bristol is waiting for a stable, well-tested StoRM v1.5 SL5 64-bit release.
In the meantime Bristol's StoRM v1.3 (32-bit on SL4) is working very well!
On a 1Gbps network, getting good bandwidth utilization
Servers (StoRM & gridftp) very responsive despite load
HDFS with StoRM
Prior WN: Intel Xeon 2.0GHz; Dec 2009 new WN: AMD 2.4GHz
Each AMD WN = 2 x 1TB drives; part of 1 disk = WN space
Dr Metson is experimenting with HDFS using the rest of the first disk + the 2nd disk, working with INFN on the possibility of StoRM on top of HDFS
Also experimenting with using Hadoop to process CMS data
In Other News...
Swingeing IT staff cuts being planned at U Bristol (and downgrades for those few remaining)
Started planning for SouthGrid to take over Bristol LCG Site Admin from April 2011
Consolidate & reduce PP servers so Astro admin can inherit
PP staff will best-effort support Bristol AFS server (IS won't)
Bristol
Plan to try running the CEs and other control nodes on virtual machines, using a setup identical to Oxford's, to enable remote management.
The StoRM SE on GPFS will be run by Bob Cregan on site.
Cambridge
32 CPU cores installed April 2010: bought from GridPP3 tranche 2.
Server to host several virtual machines (BDII, Mon, etc.) just delivered.
Network upgraded last November to provide gigabit Ethernet to all Grid systems.
Storage is still 140TB; CPU will be increased by the purchase in the first point.
ATLAS production is the main VO running on this site.
Investigating current under-utilisation; possible accounting issues?
RALPP
We believe we are now through all the messing about with air conditioning, with our machine room now running on the refurbished/upgraded AC plant.
– Happy days, all except for the leaks shortly after they turned it on!
We've been running well below nominal capacity for most of this year, but are pretty much back now.
Joining with the Tier 1 for the tender process.
Testing Argus and glexec
RGMA and site BDII now moved to SL5 VMs
Working on setting up a test instance of dCache, working with the Tier 1, using Tier 2 hardware.
Oxford
Cluster running with very high utilisation over the last 6 months.
Completed the tender for new kit and placed orders in July. Unfortunately the orders had to be cancelled due to manufacturing delays on the particular motherboard we ordered and a pricing problem. Now re-evaluating all suppliers with updated quotes.
New Argus server installed. (Report by Kashif)
– Installing Argus was easy, and configuring it was also OK once I understood the basic concept of policies, but it took me considerable time because of a bug in Argus that is partly due to the old style of host certificate issued by the UK CA. The same issue was responsible for the gridpp voms server problem. I have reported this to the UK CA.
– Argus uses glexec on the WN; the glexec installed on t2wn41 is being tested.
– Details on the gridpp wiki
Oxford has become an early adopter for CREAM and ARGUS.
Oxford
Grid Cluster setup: CREAM CE & pilot setup
t2ce02: CREAM, gLite 3.2, SL5
t2wn41: glexec enabled
t2argus02
t2ce06: CREAM, gLite 3.2, SL5
T2wn
gridppnagios
Oxford runs the UKI Regional Nagios monitoring site. The Operations dashboard takes information from this.
Oxford Dashboard
Thanks to Glasgow for the idea / code
Oxford's ATLAS dashboard
Conclusions
SouthGrid sites' utilisation generally improving
Many have had recent hardware upgrades using the GridPP3 second tranche; others are putting out tenders; some delays following issues with the vendor at Oxford
RALPPD back to full strength following AC upgrade
Monitoring for production running improving
Concerns over reduced manpower at sites as we move into GridPP 4
Future Meetings
Look forward to GridPP 26 in Sheffield next April.
If you look in the right places, the views are as good as here in the Lakes.