HEPiX Spring 2014, Annecy-le-Vieux, 19-23 May, Martin Bly, STFC-RAL

Presentation transcript:

RAL Site Report
HEPiX Spring 2014, Annecy-le-Vieux, 19-23 May
Martin Bly, STFC-RAL

HEPiX Spring 2014 - RAL Site Report, 20/05/2014

Tier1 Hardware
  CPU: ~127k HS06 (~13k cores)
  Storage: ~13PB disk
  Tape: 10k slot SL8500 (one of two in system)
  FY13/14 procurement
    CPU: 32 x Supermicro Twin², 2 x E5-2650v2, 128GB RAM, 2 x 2TB HDD
    Storage: 57 x 36-bay Supermicro chassis, ~120TB useable per system
      34 x 4TB WD SE HDD / LSI 9261-8i
      36 x 4TB WD RE HDD / LSI 9271-4i
  2008 generations being phased out
  2009 generations phase out started
  FY14/15 procurement
    Similar to last year, starting soon
  Usual litany of hardware updates
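As a rough cross-check of the hardware figures above, the short sketch below derives two numbers that the slide does not state directly: the average HS06 rating per core and the total useable capacity of the FY13/14 storage purchase. The inputs are the approximate values quoted on the slide; the function names are illustrative only, and 1 PB is assumed to mean 1000 TB.

```python
# Approximate figures as quoted on the Tier1 Hardware slide.
TOTAL_HS06 = 127_000          # "~127k HS06"
TOTAL_CORES = 13_000          # "~13k cores"
NEW_STORAGE_SYSTEMS = 57      # "57 x 36-bay Supermicro chassis"
USEABLE_TB_PER_SYSTEM = 120   # "~120TB useable per system"


def hs06_per_core(total_hs06: float, cores: int) -> float:
    """Average HS06 benchmark score per core across the farm."""
    return total_hs06 / cores


def capacity_pb(systems: int, tb_per_system: float) -> float:
    """Total useable capacity in PB, assuming 1 PB = 1000 TB."""
    return systems * tb_per_system / 1000


if __name__ == "__main__":
    print(f"HS06 per core: {hs06_per_core(TOTAL_HS06, TOTAL_CORES):.1f}")
    print(f"FY13/14 storage: {capacity_pb(NEW_STORAGE_SYSTEMS, USEABLE_TB_PER_SYSTEM):.2f} PB")
```

On these rounded inputs the farm averages a little under 10 HS06 per core, and the FY13/14 storage purchase amounts to just under 7 PB of useable space. Both results inherit the "~" uncertainty of the slide's own figures.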

Networking
  Tier1 LAN
    Mesh network enabled
      Two Dell Force10 Z9000 in active-active VLT pair
      2 or 4 x 40Gb/s LACP to/from each S4810P
      Some S4810P VLT pairs for resilience
      Services transferring to it
    Phase 1 of new Tier1 connectivity enabled
      Routing to RAL Site now via active/passive pair of Extreme x670V switches
      20Gb/s redundant link from each
    Phase 2: move the firewall bypass and OPN links to new router
      Will provide 40Gb/s pipe to border
    Phase 3: 40Gb/s redundant link to RAL Site
  RAL LAN
    Migration to new firewalls almost complete
    Migration to new core switching infrastructure almost complete
    IPv6 test network soon
  Site WAN
    Dual 30Gb/s active/passive failover link to Janet6

Processing
  Batch system
    Migration from Torque/Maui to HTCondor completed in November 2013
    Currently running HTCondor 8.0.6
    Very stable operation, no major problems
    Multicore jobs running successfully since November
  CEs
    ATLAS & CMS only using ARC CEs
    Gradually moving remaining VOs from CREAM to ARC
    Aim to phase out CREAM CEs
  Talk: ‘A year of Condor at RAL Tier 1’ – Ian Collier

Grid Services
  SL6 migration (still) mostly done
  Most services on VMs
  FTS3
    A primary test site, extensive testing
    Now a production instance
  Quattor/Aquilon
    Talk: ‘Quattor Update’ – Ian Collier

CernVM-FS
  Deployment at RAL supported by GridPP
  EGI Infrastructure
    Initially for UK VOs, extended to international small VOs and 2 NGIs
    Web interface for SGM to upload and unpack tarballs, and publish
    New GSI interface to transfer and process tarballs
    11 repositories published at RAL
    Separate Stratum-1 service for non-LHC VOs
    160GB published on Stratum-0 @ RAL
  EGI CVMFS task force
    KO meeting August 2013, regular meetings
    Promotes the use of CVMFS technology by user communities
    Network of sites providing Stratum-0, Stratum-1 and squids

Virtualisation
  Two production clusters with shared storage, several local storage hypervisors
    Windows Server 2008 + Hyper-V
  Issues with VMs
    Stability and migration problems
  Re-build the shared-storage clusters from scratch
    New configuration of networking and hardware
    Windows Server 2012 and Hyper-V
    Currently migrating most VMs to local storage systems
    Aim to have three ‘new’ clusters
    Include additional hardware with more RAM
  Talk: ‘RAL Tier 1 Cloud & Virtualisation’ – Ian Collier

CASTOR / Storage
  Castor
    June: Upgrade to new major version (2.1.14) with various improvements (disk rebalancing, xroot internal protocol)
    New logging system with ElasticSearch
  Ceph evaluations continue
    Talk: ‘Ceph at the UK Tier 1’ – George Ryall
  Storage woes
    1 of 2010 sets (18 x 36TB) to be decommissioned early
    4 catastrophic failures, including 2 with data loss, over 2 years (20%)
    4, 3, 7, 3 drives thrown nearly simultaneously
    SM chassis, Adaptec 5405, WD 2TB RE4 WD2003FYYS
    So far unable to isolate the cause: backplanes or disks
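The failure figures under "Storage woes" can be checked with a few lines of arithmetic. The sketch below assumes that the quoted 20% means servers with a catastrophic failure out of the 18 in the set, and that each of the four incidents hit a different server; the slide states neither explicitly. The function name is illustrative only.

```python
# Figures quoted on the slide for the 2010 storage set.
SERVERS_IN_SET = 18
CATASTROPHIC_FAILURES = 4
DRIVES_THROWN = [4, 3, 7, 3]  # drives lost in each of the four incidents


def failure_percentage(failures: int, servers: int) -> float:
    """Share of servers affected, assuming one failure per server."""
    return 100 * failures / servers


if __name__ == "__main__":
    pct = failure_percentage(CATASTROPHIC_FAILURES, SERVERS_IN_SET)
    print(f"Servers affected: {pct:.0f}% of the set")
    print(f"Drives thrown in total: {sum(DRIVES_THROWN)}")
```

Under those assumptions the share is about 22%, consistent with the rounded 20% on the slide, and the four incidents account for 17 drives in total.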

Other stuff
  UPS ‘shutdown’ for circuit testing
    Successfully completed in November
  UPS generator load tests
    No further failures, test schedule reverted
  Windows XP ‘banned’ from site networks
    Almost all desktops and laptops upgraded to Windows 7
  New telephone system rollout imminent
  Recruiting a grid-admin soon
  AFS: RAL cell was terminated on November 5th 2013

Questions?