UKI-SouthGrid Overview and Oxford Status Report. Pete Gronbech, SouthGrid Technical Coordinator. HEPSYSMAN, RAL, 30th June 2009.

Presentation transcript:

UKI-SouthGrid Overview and Oxford Status Report
Pete Gronbech, SouthGrid Technical Coordinator
HEPSYSMAN, RAL, 30th June 2009

SouthGrid Tier 2
- The UK is split into 4 geographically distributed Tier 2 centres
- SouthGrid comprises all the southern sites not in London
- New sites are likely to join

UK Tier 2 reported CPU – Historical View to present

SouthGrid Sites Accounting as reported by APEL

Site Upgrades in the last 6 months
- RALPPD: increase of 960 cores (1716 KSI2K) + 380 TB
- Cambridge: 32 cores (83 KSI2K) + 20 TB
- Birmingham: 64 cores on the PP cluster and 192 cores on the HPC cluster, adding ~535 KSI2K
- Bristol: original cluster replaced by new quad-core systems (24 cores) plus an increased share of the HPC cluster, 125 KSI2K + 44 TB
- Oxford: extra 208 cores, 540 KSI2K + 60 TB
- JET: extra 120 cores, 240 KSI2K
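The combined capacity added across SouthGrid in this period can be sanity-checked by summing the per-site figures above. A minimal Python sketch of that arithmetic (figures copied from this slide and treated as approximate; sites with no quoted disk increase are counted as 0 TB):

    # Sum the per-site additions quoted on this slide (approximate figures).
    upgrades = {
        "RALPPD":     {"cores": 960,      "ksi2k": 1716, "tb": 380},
        "Cambridge":  {"cores": 32,       "ksi2k": 83,   "tb": 20},
        "Birmingham": {"cores": 64 + 192, "ksi2k": 535,  "tb": 0},
        "Bristol":    {"cores": 24,       "ksi2k": 125,  "tb": 44},
        "Oxford":     {"cores": 208,      "ksi2k": 540,  "tb": 60},
        "JET":        {"cores": 120,      "ksi2k": 240,  "tb": 0},
    }

    total_cores = sum(site["cores"] for site in upgrades.values())
    total_ksi2k = sum(site["ksi2k"] for site in upgrades.values())
    total_tb    = sum(site["tb"]    for site in upgrades.values())

    print(f"Added: {total_cores} cores, ~{total_ksi2k} KSI2K, ~{total_tb} TB")
    # Added: 1600 cores, ~3239 KSI2K, ~504 TB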

New Totals Q2 09
Table columns: GridPP CPU (kSI2K), Storage (TB), % of MoU CPU, % of MoU Disk.
Table rows: EDFA-JET, Birmingham, Bristol, Cambridge, Oxford, RALPPD, and SouthGrid totals.
% of MoU figures: 105.88%, 177.42%, 136.36%, 213.33%, 207.54%, 185.09%.
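For reference, the "% of MoU" columns are simply delivered capacity divided by the MoU pledge. A minimal Python sketch of that calculation, using hypothetical pledge and delivered numbers rather than the figures from this table:

    # "% of MoU" = delivered / pledged * 100, for both CPU (kSI2K) and disk (TB).
    def percent_of_mou(delivered, pledged):
        return 100.0 * delivered / pledged

    # Illustrative numbers only, not the real Q2 09 figures.
    pledged_cpu, delivered_cpu = 500.0, 680.0      # kSI2K
    pledged_disk, delivered_disk = 100.0, 185.0    # TB

    print(f"CPU:  {percent_of_mou(delivered_cpu, pledged_cpu):.2f}% of MoU")    # 136.00%
    print(f"Disk: {percent_of_mou(delivered_disk, pledged_disk):.2f}% of MoU")  # 185.00%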

Site Setup Summary

Site         Cluster(s)                     Installation method                         Batch system
Birmingham   Dedicated & shared HPC         PXE, Kickstart, CFEngine; tarball for HPC   Torque
Bristol      Small dedicated & shared HPC   PXE, Kickstart, CFEngine; tarball for HPC   Torque
Cambridge    Dedicated                      PXE, Kickstart, custom scripts              Condor
JET          Dedicated                      Kickstart, custom scripts                   Torque
Oxford       Dedicated                      PXE, Kickstart, CFEngine                    Torque
RAL PPD      Dedicated                      PXE, Kickstart, CFEngine                    Torque

New Staff
- Jan 2009: Kashif Mohammad, Deputy Technical Coordinator, based at Oxford
- May 2009: Chris Curtis, SouthGrid hardware support, based at Birmingham
- June 2009: Bob Cregan, HPC support at Bristol

Oxford Site Report

Oxford Central Physics
- Centrally supported Windows XP desktops (~500)
- Physics-wide Exchange Server
  – BES to support Blackberries
- Network services for Mac OS X
  – Astro converted entirely to Central Physics IT services (120 OS X systems)
  – Started experimenting with Xgrid
- Media services
  – Photocopiers/printers replaced, giving much lower costs than other departmental printers
- Network
  – The network is too large; looking to divide it into smaller pieces for better management and easier scaling to higher performance
  – Wireless: introduced eduroam on all Physics WLAN base stations
  – Identified a problem with a 3Com 4200G switch which caused a few connections to run very slowly; now fixed
  – Improved the network core and computer room with redundant pairs of 3Com 5500 switches

Oxford Tier 2 Report – Major Upgrade 2007 (old news)
- Lack of a decent computer room with adequate power and A/C held back upgrading our 2004 kit until autumn 2007
- Twin systems: 22 servers, 44 CPUs, 176 cores; Intel 5345 Clovertown CPUs provide ~430 KSI2K, with 16 GB memory per server
- Each server has a 500 GB SATA HD and an IPMI remote KVM card
- 11 storage servers, each providing 9 TB usable after RAID 6, total ~99 TB, with 3ware ML controllers
- Two racks, 4 redundant management nodes, 4 APC 7953 PDUs, 4 UPSs

Oxford Physics now has two computer rooms
- Oxford's Grid cluster was initially housed in the departmental computer room in late 2007
- It was later moved to the new shared University room at Begbroke (5 miles up the road)

Oxford Upgrade 2008
- Twin systems: 26 servers, 52 CPUs, 208 cores; Intel 5420 Harpertown CPUs provide ~540 KSI2K, with 16 GB low-voltage FBDIMM memory per server
- Each server has a 500 GB SATA HD
- 3 storage servers, each providing 20 TB usable after RAID 6, total ~60 TB, with Areca controllers
- 3 3Com 5500 switches with backplane interconnects
- More of the same, but better!
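The 2007 and 2008 purchase generations can be compared per core from the figures quoted on these two slides; a quick sketch of that division, using the approximate totals given above:

    # KSI2K per core for the two Oxford purchase generations,
    # using the approximate totals quoted on the 2007 and 2008 upgrade slides.
    generations = {
        "2007 Intel 5345 (Clovertown)": (430, 176),   # (~KSI2K, cores)
        "2008 Intel 5420 (Harpertown)": (540, 208),
    }
    for name, (ksi2k, cores) in generations.items():
        print(f"{name}: ~{ksi2k / cores:.2f} KSI2K per core")
    # 2007 Intel 5345 (Clovertown): ~2.44 KSI2K per core
    # 2008 Intel 5420 (Harpertown): ~2.60 KSI2K per core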

Grid Cluster setup
SL5 test nodes available.
(Diagram of the cluster layout: CEs t2ce02, t2ce04 and t2ce05; Torque servers t2torque and t2torque02; worker nodes t2wn40–t2wn85 on gLite 3.1 / SL4; t2wn86 and t2wn87 on gLite 3.2 / SL5.)

Nov 2008 upgrade to the Oxford Grid cluster at Begbroke Science Park

Local PP Cluster (Tier 3)
- Nov 2008 upgrade: same hardware as the Grid cluster
  – 3 storage nodes
  – 8 twins
  – 3 3Com 5500 switches with backplane interconnects
  – 100 Mb/s switches used for the management cards (IPMI and RAID)
  – APC rack; very easy to mount the APC PDUs
- Still running SL4, but a test SL5 system is available for users to try; we are ready to switch over when we have to
- Lustre FS not yet implemented due to lack of time

Newer generation Intel quads take less power
Tested using one cpuburn process per core on both sides of a twin, killing a process every 5 minutes.
Electrical power consumption:
- Intel 5345 twin: busy 645 W, idle 410 W
- Intel 5420 twin: busy 490 W, idle 320 W
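Those wall-plug measurements translate directly into annual energy use. A rough Python sketch, assuming 24x7 operation and an illustrative electricity price (neither assumption is from the slide):

    # Rough annual electricity cost per twin from the measured busy power draw.
    # The price per kWh is an assumption for illustration only.
    HOURS_PER_YEAR = 24 * 365
    PRICE_PER_KWH = 0.10  # GBP, assumed

    def annual_cost(watts):
        kwh = watts / 1000.0 * HOURS_PER_YEAR
        return kwh * PRICE_PER_KWH

    for twin, busy_watts in [("Intel 5345 twin", 645), ("Intel 5420 twin", 490)]:
        print(f"{twin}: ~{annual_cost(busy_watts):.0f} GBP/year if busy 24x7")
    # Intel 5345 twin: ~565 GBP/year if busy 24x7
    # Intel 5420 twin: ~429 GBP/year if busy 24x7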

Electricity Costs*
We have to pay for the electricity used at the Begbroke computer room:
- The electricity cost of running the old (4-year-old) Dell nodes (~79 KSI2K) is ~£8600 per year
- Replacing them with new twins costs ~£6600, with an electricity cost of ~£1100 per year
- That is a saving of ~£900 in the first year and ~£7500 per year thereafter
- Conclusion: it is not economically viable to run kit older than 4 years
* Jan 2008 figures
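The saving argument on this slide can be written out explicitly; a small sketch reproducing the quoted figures:

    # Worked version of the saving calculation on this slide (Jan 2008 figures).
    old_electricity_per_year = 8600.0   # GBP/year, old Dell nodes (~79 KSI2K)
    new_hardware_cost        = 6600.0   # GBP, one-off cost of replacement twins
    new_electricity_per_year = 1100.0   # GBP/year

    saving_first_year = old_electricity_per_year - (new_hardware_cost + new_electricity_per_year)
    saving_thereafter = old_electricity_per_year - new_electricity_per_year

    print(f"First-year saving:      ~GBP {saving_first_year:.0f}")   # ~900
    print(f"Saving per year after:  ~GBP {saving_thereafter:.0f}")   # ~7500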

IT-related power saving
- Shutting down desktops when idle
  – Must be genuinely idle: logged off, no shared printers or disks, no remote access, etc.
  – 140 machines regularly shut down
  – Automatic power-up early in the morning (using Wake-on-LAN) to apply patches and be ready for users
- Old cluster nodes removed or replaced with more efficient servers
- Virtualisation reduces the number of servers and the power used
- Computer room temperatures raised (from 19°C to 23–25°C) to improve A/C efficiency
- Windows Server 2008 allows control of new power-saving options on more modern desktop systems
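The early-morning power-up mentioned above uses Wake-on-LAN, which just means broadcasting a standard "magic packet" to the sleeping machine. A minimal Python sketch (the MAC address below is hypothetical):

    import socket

    def send_wol(mac, broadcast="255.255.255.255", port=9):
        """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
        target MAC address repeated 16 times, as a UDP broadcast."""
        mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
        packet = b"\xff" * 6 + mac_bytes * 16
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            sock.sendto(packet, (broadcast, port))

    send_wol("00:11:22:33:44:55")  # hypothetical desktop MAC address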

CPU Benchmarking: HEPSPEC06

hostname   CPU type       memory   cores   HEPSPEC06   HEPSPEC06/core
node10     2.4 GHz Xeon   4 GB     2       7           3.5
node10     2.4 GHz Xeon   4 GB
t2wn61     E… GHz         16 GB
pplxwn16   E… GHz         16 GB
pplxint3   E… GHz         16 GB

These figures match closely with published values. A Nehalem Dell server has just arrived for testing.
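The per-core column is just the whole-box HEPSPEC06 score divided by the core count, as in the node10 row; a one-line sketch:

    # HEPSPEC06 per core = whole-box score / number of cores,
    # e.g. the node10 row above: 7 over 2 cores.
    def hepspec_per_core(total_hepspec06, cores):
        return total_hepspec06 / cores

    print(hepspec_per_core(7.0, 2))   # 3.5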

Cluster Usage at Oxford
- Roughly equal share between LHCb and ATLAS for CPU hours: ATLAS runs many short jobs, LHCb longer jobs
- Cluster occupancy is approximately 70%, so there is still room for more jobs
- (Chart: local contribution to ATLAS MC storage)

Oxford recently had its network link rate capped to 100 Mb/s. This was the result of continuous traffic caused by CMS commissioning stress testing. As it happens, the test completed at the same time as we were capped, so we passed the test, and normal use is not expected to be this high.
Oxford's JANET link is actually 2 x 1 Gbit links, which had become saturated.
The short-term solution is to rate-cap only the JANET traffic, to 200 Mb/s, which does not impact normal working (for now); all other on-site traffic remains at 1 Gb/s.
The long-term plan is to upgrade the JANET link to 10 Gb/s within the year.
(Traffic chart: in/out, 200 Mb/s cap.)
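To put those link rates in context, here is a small sketch of how long a bulk Grid transfer would take at each rate; the 1 TB dataset size is illustrative, not from the slide:

    # Transfer time for a 1 TB dataset at the link rates discussed on this slide.
    def transfer_hours(terabytes, megabits_per_second):
        bits = terabytes * 1e12 * 8
        return bits / (megabits_per_second * 1e6) / 3600.0

    for label, rate_mbps in [("100 Mb/s cap", 100), ("200 Mb/s cap", 200),
                             ("2 x 1 Gb/s JANET link", 2000), ("10 Gb/s upgrade", 10000)]:
        print(f"{label}: ~{transfer_hours(1, rate_mbps):.1f} hours per TB")
    # 100 Mb/s cap: ~22.2 hours per TB
    # 200 Mb/s cap: ~11.1 hours per TB
    # 2 x 1 Gb/s JANET link: ~1.1 hours per TB
    # 10 Gb/s upgrade: ~0.2 hours per TB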

gridppnagios
We have set up a Nagios monitoring site for the UK, which several other sites use to get advance warning of failures.
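Nagios checks like the ones behind gridppnagios are small programs that follow the standard plugin convention: exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN plus one line of output. A minimal sketch of a hypothetical check (the "queued jobs" metric and thresholds are made up for illustration):

    #!/usr/bin/env python
    # Minimal Nagios-style check plugin. Nagios reads the exit code
    # (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and the first line of output.
    import sys

    OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

    def check_queue(queued_jobs, warn=500, crit=1000):
        # Hypothetical metric: number of jobs waiting in the batch queue.
        if queued_jobs >= crit:
            return CRITICAL, f"CRITICAL - {queued_jobs} jobs queued"
        if queued_jobs >= warn:
            return WARNING, f"WARNING - {queued_jobs} jobs queued"
        return OK, f"OK - {queued_jobs} jobs queued"

    if __name__ == "__main__":
        status, message = check_queue(queued_jobs=42)
        print(message)
        sys.exit(status)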