UKI-SouthGrid Overview GridPP30 Pete Gronbech SouthGrid Technical Coordinator and GridPP Project Manager Glasgow - March 2012.



UK Tier 2 reported CPU – Historical View to present
Stats were last reported at CERN in Sept 2011, so the data shown covers the period since then.

SouthGrid Sites Accounting as reported by APEL

VO Usage
Usage is dominated by LHC VOs. 6% is non-LHC.

Non LHC VOs
A wide range of ‘Other VOs’

GridPP4 h/w generated MoU from Steve Lloyd
[Table: storage (TB) for 2012, 2013 and 2014, and CPU (HS06), per site: bham, bris, cam, ox, JET, RALPPD, Total; values not preserved]
SouthGrid December 2012 (Q412) Resources: total available to GridPP
[Table: HEPSPEC06 and Storage (TB) per site: EFDA JET, Birmingham, Bristol, Cambridge, Oxford, RALPP, Sussex, Totals; values not preserved]

Question of MoU’s
New experimental requirements in Sept 2012 generated new, increased shares.

Dave Britton generated MoU shown at GridPP
Site      Storage (TB)   CPU (HS06)
bham      269            2990
bris      68             1271
cam       214            528
ox        567            3685
RALPPD    890            13653
Total     [value not preserved]
SouthGrid December 2012 (Q412) Resources: total available to GridPP
[Table: HEPSPEC06 and Storage (TB) per site: EFDA JET, Birmingham, Bristol, Cambridge, Oxford, RALPP, Sussex, Totals; values not preserved]
Up 151TB
Up 2933 HS06
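Since the Total figures did not survive in the transcript, the per-site MoU figures above can be summed as a rough cross-check. This is a minimal sketch: the dictionary layout and the function names are illustrative, and the sums cover only the five sites listed, so they may not match the totals on the original slide.

```python
# MoU shares as listed above: site -> (storage in TB, CPU in HS06).
mou_shares = {
    "bham": (269, 2990),
    "bris": (68, 1271),
    "cam": (214, 528),
    "ox": (567, 3685),
    "RALPPD": (890, 13653),
}


def totals(shares):
    """Sum storage (TB) and CPU (HS06) over all listed sites."""
    total_tb = sum(tb for tb, _ in shares.values())
    total_hs06 = sum(hs06 for _, hs06 in shares.values())
    return total_tb, total_hs06


def cpu_fraction(shares, site):
    """Fraction of the summed HS06 figure attributed to one site."""
    _, total_hs06 = totals(shares)
    return shares[site][1] / total_hs06


total_tb, total_hs06 = totals(mou_shares)
print(f"Sum of listed shares: {total_tb} TB, {total_hs06} HS06")
print(f"RALPPD CPU fraction: {cpu_fraction(mou_shares, 'RALPPD'):.1%}")
```

The CPU fraction for RALPPD is consistent with its description as SouthGrid's largest site.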

JET
The site has been under-used for the last 6 months.
This is a non-particle-physics site, so all LHC work is a bonus.
Essentially a pure CPU site
–1772 HepSPEC06
–10.5 TB of storage
All service nodes have been upgraded to EMI2.
CVMFS has been set up and configured for LHCb and ATLAS.
Could be utilised much more!
Active non-LHC VOs: Biomed, esr, fusion and Pheno

Birmingham Tier 2 Site
Most active non-LHC VOs: Biomed and ILC.
MS has helped a local neuroscience group set up a VO; grid work to follow. (Some involvement with […])
Mark Slater is the LHCb UK Operations rep/shifter.
Major VOs are ATLAS, ALICE and LHCb.
Middleware: DPM, WN: EMI 2, everything else: UMD 1
Complete overhaul of aircon in the last 12 months. One unit (14kW) is left to be installed in the next couple of weeks (hopefully!).
CVMFS fully installed.
Now providing 110TB of space for ALICE in their own xrootd area.
No other xrootd/WebDAV updates.

Bristol Status
StoRM SE upgraded to SL6.
StoRM was problematic at first, but the StoRM developers helped modify the default config.
Helped debug why it was publishing 0 used; apparently a known bug.
Upgrades to EMI middleware have improved CMS site readiness.
CVMFS set up for CMS, ATLAS and LHCb.
Ongoing development of Hadoop SE: gridFTP + SRM server ready, set-up of PhEDEx (Debug) in progress.
Active non-LHC VOs: ILC
Landslides VO work currently on hold.
Working with CMS to plan the best way forward for Bristol.

Cambridge Status
–CPU: 140 job slots, 1657 HS06
–Storage: 277TB [si]
–Most active non-LHC VOs: Camont, but almost exclusively an ATLAS/LHCb site.
The Camtology/Imense work at Cambridge has essentially finished (we still host some of their kit).
We have an involvement in a couple of Computational Radiotherapy projects (VoxTox and AccelRT), where there may possibly be some interest in using the Grid.

RALPP
SouthGrid’s biggest site.
Major VOs: CMS, ATLAS and LHCb.
Non-LHC VOs: Biomed, ILC and esr.
Planned migration to a new computer room this year, with six water-cooled racks. Will try to minimise downtime.
Other racks will move to the Atlas building.
SE is dCache; planning to upgrade to 2.2 in the near future.
20Gbit link between the two computer rooms.
Rob Harper is on the security team.
New member of staff (Ian Loader) starting very soon.

Oxford
Oxford’s workload is dominated by ATLAS analysis and production.
Most active non-LHC VOs: esr, fusion, hone, pheno, t2k and zeus
Recent Upgrades
–We have a 10Gbit link to the JANET router, but it is currently rate-capped at 5Gbps.
–Oxford will get a second 10Gbit line enabled soon, and then the rate cap can be lifted.
SouthGrid Support
–Providing support for Bristol, Sussex and JET
–The Landslides VO is supported at Oxford and Bristol
–Helped bring Sussex onto the Grid as an ATLAS site
Oxford Particle Physics Masterclasses with Grid Computing talk.

Other Oxford Work
CMS Tier 3
–Supported by RALPPD’s PhEDEx server. Now configured to use CVMFS and xrootd as the local file access protocol
–Useful for CMS, and for us, keeping the site busy in quiet times
–However, it can block ATLAS jobs, so a max running jobs limit is applied during the accounting period
–Largest non CMS Tier-2 site in UK
ALICE Support
–The ALICE computational requirements are shared between Birmingham and Oxford.
UK Regional Monitoring
–Kashif runs the Nagios-based WLCG monitoring on the servers at Oxford
–These include the Nagios server itself, and support nodes for it: SE, MyProxy and WMS/LB
–KM also remotely manages the failover instance at Lancaster
–There are very regular software updates for the WLCG Nagios monitoring.
VOMS server replication at Oxford (and IC)
Early Adopters
–In the past we were official early adopters for testing of CREAM, ARGUS and torque_utils. Recently we have tested early in a less official way.

Multi VO Nagios Monitoring
Monitoring three VOs
–T2k.org
–Snoplus.snolab.ca
–Vo.southgrid.ac.uk
Customized to suit small VOs
–Only using direct job submission
–Test jobs submitted every 8 hours
–Test jobs can stay in the queue for 6 hours before being cancelled by Nagios

Multi VO Nagios Monitoring
Using VO-feed for topology information
–VO-feed is also hosted at Oxford
–Fine-grained control of what to monitor
–But it requires manual changes
It is probably the first production Multi-VO Nagios instance in EGI
–Found many bugs
–It worked very well with two VOs, but after adding a third VO it is showing some strange issues.
–Opened a GGUS ticket and working on it
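The test-job timing quoted for the multi-VO Nagios instance (submission every 8 hours, cancellation after 6 hours in the queue) can be sketched as below. This is only an illustration of the timing rule, not the Nagios probe code; the function names and the `started` flag are hypothetical.

```python
from datetime import datetime, timedelta

# Values taken from the slides.
SUBMIT_INTERVAL = timedelta(hours=8)  # test jobs are submitted every 8 hours
MAX_QUEUE_TIME = timedelta(hours=6)   # queued jobs are cancelled after 6 hours


def should_cancel(submitted_at, now, started=False):
    """Return True if a job that has not started has used up its queue time."""
    return (not started) and (now - submitted_at) >= MAX_QUEUE_TIME


def next_submission(last_submitted_at):
    """Return the time at which the next test job is due."""
    return last_submitted_at + SUBMIT_INTERVAL


submitted = datetime(2013, 3, 1, 0, 0)
print(should_cancel(submitted, datetime(2013, 3, 1, 5, 0)))  # still within the limit
print(should_cancel(submitted, datetime(2013, 3, 1, 6, 0)))  # limit reached
print(next_submission(submitted))
```

Because the queue limit is shorter than the submission interval, a stuck test job is cancelled before the next one for the same check is submitted.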

GridPP Cloud work at Oxford
Working to provide an OpenStack-based cloud infrastructure at Oxford
OpenStack Folsom release installed with the help of Cobbler and Puppet
–Using RHEL 6.4
–Most of the installation and configuration is automated
Started with three machines
–One controller node running OpenStack core services
–One compute node running Nova-compute and Nova-network
–One storage node to provide block storage and an NFS mount for glance images
Plan to add more compute nodes in future
We are open to providing our infrastructure for testing –

XrootD and WebDAV
Xrootd and WebDAV on DPM
Completely separate services, but similar minimum requirements, so if you can do XrootD, you can do WebDAV.
Local XrootD is pretty well organised; federations currently require name lookup libraries, which lack a good distribution mechanism.
Configuration is mostly boilerplate – copy ours:
# Federated xrootd
DPM_XROOTD_FEDREDIRS="atlas-xrd-uk.cern.ch:1094:1098,atlas,/atlas xrootd.ba.infn.it:1094:1213,cms,/store"
# Atlas federated xrootd
DPM_XROOTD_FED_ATLAS_NAMELIBPFX="/dpm/physics.ox.ac.uk/home/atlas"
DPM_XROOTD_FED_ATLAS_NAMELIB="XrdOucName2NameLFC.so root=/dpm/physics.ox.ac.uk/home/atlas match=t2se01.physics.ox.ac.uk"
DPM_XROOTD_FED_ATLAS_SETENV="LFC_HOST=prod-lfc-atlas-ro.cern.ch LFC_CONRETRY=0 GLOBUS_THREAD_MODEL=pthread CSEC_MECH=ID"
# CMS federated xrootd
DPM_XROOTD_FED_CMS_NAMELIBPFX="/dpm/physics.ox.ac.uk/home/cms"
DPM_XROOTD_FED_CMS_NAMELIB="libXrdCmsTfc.so file:/etc/xrootd/storage.xml?protocol=xroot"
# General local xrootd
DPM_XROOTD_SHAREDKEY="bIgl0ngstr1ng0FstuFFi5l0ng"
DPM_XROOTD_DISK_MISC="xrootd.monitor all rbuff 32k auth flush 30s window 5s dest files info user io redir atl-prod05.slac.stanford.edu:9930
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
fi"
DPM_XROOTD_REDIR_MISC="$DPM_XROOTD_DISK_MISC"
DPM_XROOTD_FED_ATLAS_MISC="$DPM_XROOTD_DISK_MISC"
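To make the structure of the DPM_XROOTD_FEDREDIRS value easier to read, the sketch below splits it into one record per federation redirector. The parser and its field names are illustrative only; in particular, labelling the two ports as the xrootd and cmsd ports is an assumption and not something stated in the slides.

```python
from typing import NamedTuple


class FedRedirector(NamedTuple):
    host: str
    xrootd_port: int  # assumed meaning of the first port
    cmsd_port: int    # assumed meaning of the second port
    federation: str
    path_prefix: str


def parse_fedredirs(value):
    """Split a space-separated list of 'host:port:port,name,path' entries."""
    redirectors = []
    for entry in value.split():
        endpoint, federation, path_prefix = entry.split(",")
        host, first_port, second_port = endpoint.split(":")
        redirectors.append(
            FedRedirector(host, int(first_port), int(second_port),
                          federation, path_prefix)
        )
    return redirectors


# Value copied from the Oxford configuration above.
FEDREDIRS = ("atlas-xrd-uk.cern.ch:1094:1098,atlas,/atlas "
             "xrootd.ba.infn.it:1094:1213,cms,/store")

for redirector in parse_fedredirs(FEDREDIRS):
    print(redirector.federation, redirector.host, redirector.path_prefix)
```

Each federation named here (atlas, cms) has a matching set of DPM_XROOTD_FED_<NAME>_* variables in the configuration.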

XrootD
The main source of documentation is here:
We’ve also switched to using xrootd for ATLAS and CMS local file access (which is a VO-side change, not a site one), but this isn’t making use of the federation yet.
ATLAS are currently using XrootD-based file stager copies, not XrootD direct IO. We do hope to try that too.
All the xrootd traffic currently gets reported to the ATLAS FAX monitoring, which made the graphs look a bit odd when we turned it on:

WebDAV
# DPM webdav
DPM_DAV="yes" # Enable DAV access
DPM_DAV_NS_FLAGS="Write" # Allow write access on the NS node
DPM_DAV_DISK_FLAGS="Write" # Allow write access on the disk nodes
DPM_DAV_SECURE_REDIRECT="On" # Enable redirection from head to disk using plain HTTP.
WebDAV is simpler: commodity clients work, but nothing supports everything you’d want.
More here: lcgdm/wiki/Dpm/WebDAV/ClientTutorial
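As an illustration of what a commodity WebDAV client does, the sketch below builds a standard PROPFIND directory-listing request with the Python standard library. The request is only constructed, not sent. The host name and path are placeholders rather than a real endpoint, and a real DPM head node would typically also need grid credentials, which are not shown.

```python
import urllib.request

# Minimal PROPFIND body asking for all properties of the resource.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)


def build_propfind(url, depth="1"):
    """Build (but do not send) a WebDAV PROPFIND request for a collection."""
    request = urllib.request.Request(url, data=PROPFIND_BODY, method="PROPFIND")
    request.add_header("Depth", depth)  # "1" lists the collection's direct children
    request.add_header("Content-Type", "application/xml")
    return request


# Placeholder URL: not a real storage element.
request = build_propfind("https://dpm-head.example.org/dpm/example.org/home/atlas/")
print(request.get_method(), request.full_url)
```

Sending the request would need `urllib.request.urlopen` with a suitable TLS context, which is left out because it depends on the site's authentication setup.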

Sussex
Sussex has a significant local ATLAS group; their system is designed for the high IO bandwidth patterns that ATLAS analysis can generate.
EMI2 middleware installed.
CVMFS installed and configured.
Set up as an ATLAS production site, running jobs in anger since February 2013.
New 64-core node arriving in the next week, with a further 128 cores and 120TB to be added in the summer.
SNO+ is expected to start using the site shortly.
JANET link scheduled to be upgraded from the current 2Gb (plus 1Gb resilient failover) to 10Gb in Autumn.

Conclusions
SouthGrid’s seven sites are well utilised, but some sites are small compared with others.
Birmingham: supporting ATLAS, ALICE and LHCb.
Bristol: have upgraded to the latest version of StoRM and to EMI middleware, and have been available to CMS since mid December. Hope to be better utilised by CMS now. Local funding will be used to enhance the CPU and storage.
Cambridge: size was reduced when the Condor part of the cluster was decommissioned; local funding to be used to boost capacity.
JET: continues to be available as a CPU site, but with very little storage. However, CVMFS is set up and all middleware is at EMI2. Should be a useful MC site for LHCb.
Oxford: wide involvement in many VOs, areas of development and GridPP infrastructure.
RALPPD: remains SouthGrid’s largest site and a major CMS contributor.
Sussex: successfully running as an ATLAS site, with some local upgrades planned for this year.