UKI-SouthGrid Overview and Oxford Status Report
Pete Gronbech, SouthGrid Technical Coordinator
GridPP 24 - RHUL, 15th April 2010

UK Tier 2 reported CPU – historical view to present

SouthGrid Sites Accounting as reported by APEL
- Sites upgrading to SL5 and recalibration of published SI2K values
- RALPP seems low, even after my compensation for publishing 1000 instead of 2500

Site Resources – HEPSPEC06, CPU (kSI2K, converted from the HEPSPEC06 benchmarks) and Storage (TB) for each site: EDFA-JET, Birmingham, Bristol, Cambridge, Oxford and RALPPD, with totals
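The kSI2K column is derived from the HEPSPEC06 figures. As a quick illustration, assuming the usual conversion of 4 HS06 per kSI2K (the number used here is an example, not a value from the table):

    # Illustrative conversion only: kSI2K = HS06 / 4 (assumed WLCG factor)
    hs06=1772
    echo "${hs06} HS06 is roughly $((hs06 / 4)) kSI2K"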

GridPP3 h/w generated MoU for 2012 – TB (+JET), HS06 (+JET) and kSI2K (+JET) figures for Birmingham, Bristol, Cambridge, Oxford, RALPPD and EDFA-JET, with totals

JET
- Stable operation (SL5 WNs); quiet period over Christmas
- Could handle more opportunistic LHC work
- 1772 HS06, 1.5 TB

Birmingham
- WNs have been SL5 since Christmas
- Have an ARGUS server (no SCAS). glexec_wn is installed on the local worker nodes (not the shared cluster), so we *should* be able to run multi-user pilot jobs, but this has yet to be tested (see the sketch below)
- We also have a CREAM CE in production, but this does not use the ARGUS server yet
- Some problems on the HPC: GPFS has been unreliable
- Recent increases in CPU to 3344 HS06 and disk to 114 TB
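Not from the slides, but a minimal glexec smoke test, run on a worker node as the pilot account, might look like the sketch below; the proxy location and the glexec path are assumptions.

    # Hypothetical glexec smoke test (paths are illustrative only).
    # Point glexec at the payload user's proxy...
    export GLEXEC_CLIENT_CERT=/tmp/payload_proxy.pem
    export GLEXEC_SOURCE_PROXY=/tmp/payload_proxy.pem
    # ...then run id: if the mapping works, the output shows the payload
    # account rather than the pilot account.
    /opt/glite/sbin/glexec /usr/bin/id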

Bristol
- SL4 WNs and ce01 retired; VMs ce03 and ce04 brought online, giving 132 SL5 job slots
- Metson cluster also being used to evaluate Hadoop
- More than doubled in CPU size to 1836 HS06; disk at 110 TB
- StoRM will be upgraded to 1.5 once the SL5 version is stable

Cambridge
- 32 new CPU cores added to bring the total up to 1772 HS06
- 40 TB of disk storage recently added, bringing the total to 140 TB
- All WNs upgraded to SL5; now investigating glexec
- Atlas production making good use of this site

RALPP
- Largest SouthGrid site
- APEL accounting discrepancy now seems to be sorted: a very hot GGUS ticket resulted in a Savannah bug and some corrections being made, and the accounting should now have been corrected
- Air conditioning woes caused a load of emergency downtime. We're expecting some more downtimes (due to further a/c, power and BMS issues), but these will be more tightly planned and managed
- Currently running at <50% CPU due to a/c issues, some suspect disks on WNs, and some WNs awaiting rehoming in the other machine room
- Memory upgrades for 40 WNs, so we will have either 320 job slots at 4 GB/slot or a smaller number of slots with higher memory; these are primarily intended for Atlas simulation work
- A (fairly) modest amount of extra CPU and disk, purchased at the end of the year, coming online soonish

Oxford
- A/C failure on 23rd December, the worst possible time
  - System more robust now
  - Better monitoring
  - Auto shutdown scripts based on IPMI system temperature monitoring (sketch below)
- Following the SL5 upgrade in autumn 09 the cluster is running very well
- A faulty network switch had been causing various timeouts; now replaced
- DPM reorganisation completed quickly once the network was fixed
- Atlas Squid server installed
- Preparing a tender to purchase h/w with the 2nd tranche of GridPP3 money
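The auto shutdown scripts themselves are not in the slides; a minimal sketch of the idea, assuming ipmitool is used and with an example sensor name and threshold, would be something like:

    #!/bin/bash
    # Illustrative temperature check, intended to run regularly from cron on each node.
    THRESHOLD=40            # degrees C - example value only
    SENSOR="Ambient Temp"   # sensor names differ between vendors
    # ipmitool prints lines such as: Ambient Temp | 30h | ok | 7.1 | 24 degrees C
    temp=$(ipmitool sdr type Temperature | grep "$SENSOR" | head -1 |
           awk -F'|' '{print $5}' | awk '{print int($1)}')
    if [ -n "$temp" ] && [ "$temp" -ge "$THRESHOLD" ]; then
        logger "Auto-shutdown: machine room at ${temp}C (limit ${THRESHOLD}C), powering off"
        /sbin/shutdown -h now
    fi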

Oxford Grid Cluster setup – SL5 worker nodes: LCG-CEs t2ce04 and t2ce05 with t2torque02, serving worker nodes t2wn40 to t2wn85 (gLite 3.2, SL5)

Oxford Grid Cluster setup – CREAM CE and pilot setup: CREAM CEs t2ce02 and t2ce06 (gLite 3.2, SL5), a glexec-enabled worker node t2wn41, and the SCAS server t2scas01
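For illustration, a direct test submission to one of these CREAM CEs with the gLite CLI would look roughly as follows; the full hostname and queue name are assumptions.

    # Hypothetical direct CREAM submission (endpoint and queue are illustrative).
    glite-ce-job-submit -a -r t2ce02.physics.ox.ac.uk:8443/cream-pbs-shortqueue test.jdl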

Oxford Grid Cluster setup – NGS integration: ngsce-test.oerc.ox.ac.uk and ngs.oerc.ox.ac.uk in front of the same worker nodes (wn40 to wn8x).
ngsce-test is a virtual machine which has the gLite CE software installed. The gLite WN software is installed via a tar ball in an NFS shared area visible to all the WNs. PBSpro logs are rsync'ed to ngsce-test to allow the APEL accounting to match which PBS jobs were grid jobs (a sketch of this step follows below). Contributed 1.2% of Oxford's total work during Q1.
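As a rough illustration of that log-shipping step (paths and schedule are assumptions, not taken from the slides), a nightly cron job on the PBSpro server could run something like:

    #!/bin/bash
    # Hypothetical log-shipping script: mirror the PBSpro accounting records to the
    # gLite CE VM so APEL can work out which PBS jobs were grid jobs.
    rsync -a /var/spool/PBS/server_priv/accounting/ \
        ngsce-test.oerc.ox.ac.uk:/var/spool/PBS/server_priv/accounting/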

Operations: the dashboard used by the ROD is now Nagios based

gridppnagios
Oxford runs the UKI Regional Nagios monitoring site. The Operations dashboard will take information from this in due course.

Oxford Tier-2 Cluster – Jan 2009, located at Begbroke. Tendering for upgrade.
- Decommissioned January 2009, saving approx 6.6 kW (originally installed April)
- November 2008 upgrade
- 26 servers = 208 job slots, 60 TB disk
- 22 servers = 176 job slots, 100 TB disk storage

Oxford Grid Cluster network setup: two 3com 5500 switches joined with backplane stacking cables (96 Gbps full duplex), 20 TB disk pool nodes (t2se0n) on dual channel-bonded 1 Gbps links, and the worker nodes. 10 gigabit is too expensive, so we will maintain a 1 gigabit per ~10 TB ratio with channel bonding in the new tender (a configuration sketch follows below).
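Channel bonding of this sort is standard Linux bonding; a minimal SL5-style configuration sketch for a storage node follows, with device names, address and bonding mode as assumptions rather than the actual Oxford settings.

    # /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative values only)
    DEVICE=bond0
    IPADDR=10.1.0.21
    NETMASK=255.255.0.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=balance-alb miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none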

Production Mode
Sites have to be more vigilant than ever:
- Closer monitoring
- Faster response to problems
- Proactive attitude to fixing problems before GGUS tickets arrive
- Closer interaction with the main experimental users
- Use the monitoring tools available:

Atlas Monitoring

PBSWEBMON

Ganglia

Command Line
    showq | more       # Maui scheduler view of running and queued jobs
    pbsnodes -l        # list worker nodes that are down or offline
    qstat -an          # all jobs, with the nodes they are running on
    ont2wns df -hl     # presumably a site-local wrapper: run df -hl across the t2 WNs

Local Campus Network Monitoring

Gridmon

Patch levels – Pakiti v1 vs v2

Monitoring tools etc.
Site    Pakiti                   Ganglia   Pbswebmon   SCAS, glexec, ARGUS
JET     No                       Yes       No
Bham    No                       Yes       No          No, yes, yes
Brist   Yes, v1                  Yes       No
Cam     No                       Yes       No
Ox      v1 production, v2 test   Yes       Yes         Yes, yes, no
RALPP   v1                       Yes       No          No (but started on SCAS)

Conclusions
- SouthGrid sites' utilisation is improving
- Many have had recent upgrades; others are putting out tenders
- Will be purchasing new hardware in the GridPP3 second tranche
- Monitoring for production running is improving