RAL Site Report
HEPiX Spring 2015 – Oxford, 23-27 March 2015
Martin Bly, STFC-RAL

Outline
– Intro
– Hardware/Tapes
– Software/systems
– Networking
– JASMIN

Rutherford Appleton Lab
– 15 miles south of Oxford, on the Harwell Campus
– Run by STFC
– Multi-discipline centre supporting university and industrial research in big facilities: neutron science, lasers, space science, computing
– Hosts the UK LHC Tier1 facility

Tier1 Hardware
CPU: ~119k HS06 (~10.6k cores)
– FY 14/15: additional ~42k HS06 – delivered, in test
– E5-2640v3 and E5-2650v2 (Fujitsu, Supermicro, 4 sleds/2U)
– First WNs with 10GbE NICs
Storage: ~13PB disk
– FY 14/15: additional ~5.2PB – delivery and testing ongoing
– Standard 'Castor' spec (SATA-in-a-box) plus SSD for CEPH journals, a second CPU, and 2 x 10GbE NICs
Tape: 2 x 10k-slot SL8500 libraries, 80+ drives
– Migration from T10KA/B to T10KD tapes completed
– Dedicated migration system: 4 stagers, 6 T10KB drives, 2 or 3 T10KD drives; averaged 50 tapes copied per day
– 3000 T10KA (1.2PB data) -> 160 T10KD
– 3950 T10KB (CMS, 3.6PB data) -> 550 T10KD (3 months, no data loss)
(A rough consolidation check is sketched below.)
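The consolidation figures are easy to sanity-check. A back-of-envelope sketch in Python, assuming the nominal 8.5 TB native capacity of a T10KD cartridge and an assumed ~95% fill factor:

```python
import math

# Nominal T10KD native capacity in TB; the ~95% fill factor is an assumption.
CAP_T10KD = 8.5

def tapes_needed(data_tb, cap_tb=CAP_T10KD, fill=0.95):
    """Minimum number of output tapes for data_tb of data."""
    return math.ceil(data_tb / (cap_tb * fill))

print(tapes_needed(1200))    # ~149 -> consistent with the 160 T10KD reported
print(tapes_needed(3600))    # ~446 -> the 550 reported leaves headroom
print(3950 / 50, "days")     # ~79 days for the CMS tapes, i.e. ~3 months at 50 tapes/day
```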

CMS Migration (chart of the tape migration progress)

Tape Store
T1 Castor:  files = …   data = … TB      tapes (C/D) = 1932
Facilities: files = …   data = 8463 TB   tapes (C/D) = 1378
ADS:        files = …   data = 2426 TB   tapes (B)   = 2126
DMF:        files = …   data = 1572 TB   tapes (A/B) = …

Software and Systems I
Batch: HTCondor
– Very flexible, stable (a minimal monitoring sketch using the Python bindings follows this slide)
Storage: Castor
– Updating next week to v… – v… pending
Storage: CEPH
– Evaluations continue on CEPH as a replacement for the Castor disk-only service
– See the talk by Alastair Dewhurst
Databases:
– Oracle on RHEL5, …/4
– Database replication for ATLAS 3D now using Oracle GoldenGate
– Refresh of hosts and storage in planning: more IOPS, faster SAN, more TB, distribution of data
– MySQL, Postgres
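HTCondor's Python bindings make ad-hoc pool queries straightforward. A minimal sketch of counting claimed worker-node slots, assuming it runs on a host configured to see the pool's collector:

```python
# Minimal sketch: count claimed worker-node slots via the HTCondor
# Python bindings. Assumes the local configuration points at the pool.
import htcondor

coll = htcondor.Collector()  # the local pool's collector by default
slots = coll.query(htcondor.AdTypes.Startd,
                   projection=["Machine", "State", "Activity"])
busy = sum(1 for ad in slots if ad.get("State") == "Claimed")
print(f"{busy}/{len(slots)} slots claimed")
```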

Software and Systems II
~120 production VMs running Grid production services
– Hyper-V hypervisors, local storage
– Shared storage cluster work ongoing
Provisioning:
– Quattor Aquilon approved for production use
– Talk by James Adams
Logging:
– Moving to an Elasticsearch infrastructure (a minimal indexing sketch follows this slide)
Monitoring:
– Ganglia, Nagios, Cacti, Observium, …
Cloud:
– 28-system cloud resource available for department testing – no production services allowed!
– Talk by George Ryall
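For the logging move, events end up as JSON documents in Elasticsearch. A minimal sketch with the Python client; the endpoint, index name and hostnames here are all hypothetical:

```python
# Minimal sketch: index one log event into Elasticsearch.
# Endpoint, index name and hostnames are hypothetical.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://logs.example.ac.uk:9200"])
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "host": "lcg-wn0001.example.ac.uk",
    "service": "htcondor-startd",
    "message": "slot1 state change: Unclaimed -> Claimed",
}
es.index(index="tier1-logs", body=event)
```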

Networking
Tier1 LAN:
– Mesh network transfer completed
– Problem with X670v routers: the primary stalls routing when acting as master in a master/slave pair; no failover, since the management-layer interconnect itself has not failed; Extreme on the case
– Router issue delaying progress with developments:
  Phase 2: 40Gb/s redundant link from the Tier1 to the RAL site
  Phase 3: move the firewall bypass and OPN links to the X670v routers – will provide a 40Gb/s pipe to the border
– Small IPv6 network (a simple reachability check is sketched below)
RAL LAN:
– Additional 40GbE capacity for core switching
Site WAN:
– No changes
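For the small IPv6 network, per-host dual-stack checks are handy. A minimal sketch; the test hostname is hypothetical:

```python
# Minimal sketch: check whether a host resolves and accepts TCP
# connections over both IPv4 and IPv6. Hostname is hypothetical.
import socket

host, port = "ipv6-test.example.ac.uk", 80
for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        addr = infos[0][4]
        with socket.create_connection(addr[:2], timeout=5):
            print(f"{label}: reachable via {addr[0]}")
    except OSError as err:
        print(f"{label}: failed ({err})")
```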

JASMIN
JASMIN = Joint Analysis System Meeting Infrastructure Needs
– A "super data cluster", not a "supercomputer"
– Data movement and analysis
– Funded by NERC, for all of the NERC sciences
– Hosted at STFC RAL by the Research Infrastructure group of SCD
– Satellite data, weather data, and climate simulations from big HPC systems (ARCHER, the Met Office's "Monsoon", DKRZ)
– Holds >60% of the data used by the latest IPCC report on climate change
– Largest single dataset: 600TB

JASMIN Tech
16 PB usable (20PB raw)
– ~3,200,000 DVDs = a ~6km-high tower of DVDs, or >36,000 years of MP3 (sanity-checked in the sketch below)
– Two largest Panasas 'realms' in the world (109 and 125 shelves)
– Largest single-site Panasas customer in the world (251 shelves)
900TB usable (1.44PB raw) NetApp iSCSI/NFS for virtualisation, plus Dell EqualLogic PS6210XS for high-IOPS, low-latency iSCSI
4,000 CPU cores split dynamically between the batch cluster and cloud/virtualisation (VMware vCloud Director and vCenter/vSphere)
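The capacity equivalences are simple arithmetic. A quick sketch; the DVD capacity, disc thickness and MP3 bitrate are assumptions, so the results land in the same ballpark as the slide's figures rather than matching them exactly:

```python
# Sanity check of the "DVDs" and "years of MP3" equivalences.
# Assumptions: 4.7 GB single-layer DVDs, 1.2 mm per disc, 128 kbit/s MP3.
PB = 1e15
capacity = 16 * PB                        # usable bytes

dvds = capacity / 4.7e9
tower_km = dvds * 1.2e-3 / 1000           # disc thickness in metres -> km
mp3_years = capacity / (128e3 / 8) / (3600 * 24 * 365)

print(f"{dvds:,.0f} DVDs")                # ~3.4 million
print(f"{tower_km:.1f} km tower")         # ~4 km with these assumptions
print(f"{mp3_years:,.0f} years of MP3")   # ~32,000 years
```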

JASMIN Net
>3 Tb/s bandwidth (~3,500 DVDs per minute)
"Hyper-converged" network infrastructure:
– 10GbE, low-latency (~10µs) MPI, and iSCSI over the same network fabric
– No separate SAN or InfiniBand
Finalist for the BCS UK IT Industry Awards "Big Data Project of the Year", 2012 and 2014
Managed with 2 FTE; recruiting a 3rd team member

Other stuff
Electrical 'shutdown' for circuit testing
– Completed in January; phased circuit testing
– Tier1 continued to operate with minimal loss of batch capacity and no loss of storage access
– No major issues: a few wrongly identified circuits, one or two overload trips in batch racks
Disposals
– 2006/7/8 and older hardware being scrapped
– Information security requires secure disposal of data: disks 'blanked', serial numbers recorded, then scrapped; failed drives sent for secure disposal
New telephone handsets being tested – VoIP
– Call clarity said to be 'stunning'

Questions?
Used tapes, anyone?