RAL Site Report
HEPiX Fall 2013, Ann Arbor, MI, 28 Oct – 1 Nov
Martin Bly, STFC-RAL


26/10/2013 HEPiX Fall RAL Site Report

Tier1 Hardware
CPU: ~97k HS06 (~10k cores)
Storage: ~8PB disk
Tape: 10k slot SL8500
FY13/14 procurement
  – ~7.0PB useable capacity disk storage
  – ~46k HS06 CPU
  – Tenders evaluated, in EU standstill period
  – Much better benchmarking by vendors this time!
  – Same as the FY12/13 model with bigger drives or faster CPUs
2007 kit mostly decommissioned
2008 generation being phased out
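As a rough cross-check of the hardware figures above, the following minimal Python sketch derives the average HS06 per core and an upper bound on capacity once the FY13/14 procurement is delivered. The constant and function names are illustrative and do not come from the slides. The simple sum ignores the 2007 and 2008 kit being retired, so the totals are upper bounds rather than expected values.

```python
# Approximate figures quoted on the slide (each is marked "~" there).
CURRENT_HS06 = 97_000      # installed CPU capacity, HS06
CURRENT_CORES = 10_000     # installed cores
CURRENT_DISK_PB = 8.0      # installed disk, PB
NEW_HS06 = 46_000          # FY13/14 CPU procurement, HS06
NEW_DISK_PB = 7.0          # FY13/14 disk procurement, PB


def hs06_per_core(hs06, cores):
    """Average benchmark score per core across the installed farm."""
    return hs06 / cores


def upper_bound_total(current, added):
    """Capacity if nothing were decommissioned (an assumption, not a forecast)."""
    return current + added


if __name__ == "__main__":
    print("HS06 per core:", hs06_per_core(CURRENT_HS06, CURRENT_CORES))
    print("CPU upper bound (HS06):", upper_bound_total(CURRENT_HS06, NEW_HS06))
    print("Disk upper bound (PB):", upper_bound_total(CURRENT_DISK_PB, NEW_DISK_PB))
```

On these numbers the new CPU purchase is a little under half of the currently installed HS06, and the new disk is close to the size of the existing disk estate.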

Networking
WAN
  – RAL site has migrated to new Janet6 (aka SuperJanet 6) backbone
  – Dual 30Gb/s active/passive failover link
  – Two routes on to site
Tier1 link to boundary now re-established at 20Gb/s
  – 10Gb/s to CERN, 10Gb/s to Janet6
LAN
  – Mix of Dell/Force10 S4810p & 60, Arista 7124, Nortel/Avaya 55xx & 56xx series. Mesh and routing with z9000 and Extreme x670
  – Tier1 migration to Mesh network
      Testing internally complete, testing routing connections

Batch system
HTCondor selected as replacement for Torque/Maui
  – Rollout in progress
Migrating from CREAM to ARC CEs
  – SL6
Current status
  – All batch resources on SL6
  – 50% CPU resources moved to HTCondor
  – Migration will be completed in early November
Testing opportunistic use of resources from private cloud in the batch system
See talk by Andrew Lahiff

Grid Services
SL6 migration mostly done
New FTS3 system
  – MySQL backend database, VMs
  – Advanced testing
  – Production transfers with Atlas
Quattor/Aquilon
  – 200 systems now managed with Aquilon
  – Research Infrastructure Group experimenting with it
Most services on VMs
  – Have or had issues with ganglia, BDIIs
CVMFS service working well
  – Talk on Stratum-0 by Ian Collier

CASTOR / Storage
Stats as of October 2013:
  – 64m files
  – Stored data capacities: 14PB on tape and 8PB on disk
Recent news:
  – Preparing for T10KD tape media for production next year
  – Developed a new WebDAV front end
      Not fully production ready yet.
Next generation (disk) storage project continues
  – Challenge: Try to find something simple and easy to run in conjunction with CASTOR for tape access
  – Review of options in May concluded that there was no compelling reason to move from CASTOR at this time
  – Testing of CEPH continues as option for storage in Cloud infrastructure
AFS: Terminating RAL cell on November 5th
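The October 2013 statistics above allow a back-of-envelope estimate of the mean stored file size. The sketch below is only indicative and rests on several assumptions not stated on the slide: that "64m" means 64 million files, that the file count covers both tape and disk, that no data is counted twice across the two tiers, and that PB means decimal petabytes. All names are illustrative.

```python
PB = 10**15   # assumption: decimal petabytes
MB = 10**6    # decimal megabytes

TAPE_PB = 14          # stored on tape, from the slide
DISK_PB = 8           # stored on disk, from the slide
FILES = 64_000_000    # assumption: "64m" read as 64 million


def mean_file_size_mb(total_pb, n_files):
    """Mean size per file in MB, given total volume in PB and a file count."""
    return total_pb * PB / n_files / MB


if __name__ == "__main__":
    total_pb = TAPE_PB + DISK_PB
    print("Total stored (PB):", total_pb)
    print("Mean file size (MB):", mean_file_size_mb(total_pb, FILES))
```

If some disk-resident files also have tape copies, the true mean would be lower than this estimate.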

UPS circuit uprating / testing
Uprating of ‘Essential Power Board’ capacity
  – 400A to 630A
  – Requires complete isolation from supply and UPS
  – No UPS supply to protect HA services etc
      Tier1, HPC and Corporate systems
  – Plan to have minimum service running
      Migrate various essential Corporate services up to Daresbury Lab
      Migrate Tier1 VMs to Hypervisor cluster in old computer centre
  – Shut down Castor, batch, all non-essential services
Mandatory testing requirement – regulatory requirement
  – Taking opportunity to test all UPS circuits from source
Half-day for Essential Board, Castor fast-restart thereafter
Batch resumes when all other services are back and stable

Other stuff
‘Facilities’ are using Castor, StorageD services
  – Run by SCD
  – Dedicated Castor instance with a set of associated services
  – Separate monitoring instance using "Icinga" rather than "Nagios"
Various UPS generator ‘failures’
  – Failed starts, failure to assume load etc
  – Due to latent faults and poor initial installation
  – More rigorous testing regime
      Full load tests conducted more often

Questions?