RAL Site Report, HEPiX Fall 2014, Lincoln, Nebraska, 13-17 October 2014. Martin Bly, STFC-RAL.

Presentation transcript:

RAL Site Report HEPiX Fall 2014, Lincoln, Nebraska, October 2014 Martin Bly, STFC-RAL


Tier1 Hardware
CPU: ~127k HS06 (~13k cores)
Storage: ~13PB disk
Tape: 10k-slot SL8500 (one of two in the system)
FY14/15 procurement
–Tenders 'in flight', closing 17th October
–Expect to procure 6PB and 42k HS06
–Depends on price…
New this time:
–Storage capable of both Castor and Ceph, with extra SSDs for Ceph journals
–10GbE for WNs

Networking
Tier1 LAN
–Mesh network transfer progressing slowly
–Phase 1 of new Tier1 connectivity enabled
–Phase 2: move the firewall bypass and OPN links to the new router; will provide a 40Gb/s pipe to the border
–Phase 3: 40Gb/s redundant link from the Tier1 to the RAL site
RAL LAN
–Migration to new firewalls completed
–Migration to new core switching infrastructure almost complete
–Sandboxed IPv6 test network available
Site WAN
–No changes

Network Weathermap

Virtualisation
Issues with VMs
–Had two production clusters with shared storage, plus several local-storage hypervisors
–Windows Server Hyper-V
–Stability and migration problems on the shared-storage systems
–Migrated all services to local-storage clusters
New Hyper-V clusters
–New configuration of networking and hardware
–Windows Server 2012 and Hyper-V
–Three production clusters, including additional hardware with more RAM
–Tiered storage on the primary clusters

CASTOR / Storage
Castor
–June: upgraded to new major version (2.1.14) with various improvements (disk rebalancing, xroot internal protocol); upgrade complete
–New logging system based on ElasticSearch (see the query sketch after this slide)
–Draining disk servers is still slow: a major production problem
Ceph
–Evaluations continue on the small test cluster; SSDs for journals installed in the cluster nodes
–Testing shows mixed performance results; needs more study
–Large departmental resource: 30 servers, ~1PB total
–Dell R520: 8 x 4TB SATA HDD, 32GB RAM, 2 x E5-2403v2, 2 x 10GbE
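As an aside on the new logging, a log store like this is typically queried with the elasticsearch Python client. A minimal, hypothetical sketch follows; the endpoint, index name and field names are assumptions for illustration, not the actual RAL schema.

```python
# Hypothetical query against the new ElasticSearch-based Castor logging.
# The endpoint, index name and field names are assumptions, not the RAL schema.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint

# Count and show recent log messages that mention disk-server draining
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "draining"}},
                {"range": {"@timestamp": {"gte": "now-24h"}}},
            ]
        }
    }
}
result = es.search(index="castor-*", body=query, size=10)
print("hits:", result["hits"]["total"])
for hit in result["hits"]["hits"]:
    print(hit["_source"].get("message"))
```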

Storage failover
What if the local storage is unavailable? What if someone else's local storage is unavailable?
Xrootd allows remote access to data on demand if the data is not available locally
At RAL, bulk data traffic bypasses the firewall
–To/from the OPN and SJ6, for disk servers only
–NOT for WNs
What happens at the firewall?
–Concern for non-Tier1 traffic if we have a failover
Tested with assistance from CMS
–Firewall barely notices: very small setup cost, then the transfer is offloaded to ASICs
–Larger test to come
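The failover idea in miniature: if the local replica cannot be read, fetch the file from a remote xrootd endpoint over the WAN, and it is that remote traffic which would hit (or bypass) the site firewall. A minimal sketch using the standard xrdcp client follows; the host names and file path are placeholders, not real RAL or CMS endpoints.

```python
# Minimal sketch of the failover idea (illustrative only): try a local xrootd
# endpoint first, then fall back to a remote site over the WAN.
import subprocess

LOCAL = "root://castor.example.local//store/data/example.root"
REMOTE = "root://xrootd.remote-site.example//store/data/example.root"

def fetch(url, dest="/tmp/example.root"):
    """Copy a file with the standard xrdcp client; return True on success."""
    result = subprocess.run(["xrdcp", "-f", url, dest])
    return result.returncode == 0

if not fetch(LOCAL):
    # Local replica unavailable: read over the WAN instead. This remote
    # traffic is what would hit (or bypass) the site firewall.
    fetch(REMOTE)
```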

JASMIN/CEMS Hardware, Sept 2014
The JASMIN super-data-cluster serves:
–The UK and worldwide climate and weather modelling community
–Climate and Environmental Monitoring from Space (CEMS)
–…and, since JASMIN2, all of NERC environmental sciences, e.g. environmental genomics, mud slides etc
–Facilitating further comparison and evaluation of models with data
12PB Panasas storage at STFC
–Fast parallel IO to physical and VM servers
–Largest-capacity Panasas installation in the world (204 shelves)
–Arguably one of the top ten IO systems in the world (~250GByte/sec)
Virtualised and physical compute (~3500 cores)
–Physical batch compute: "LOTUS"
–User- and admin-provisioned cloud of virtual machines
Data transfer via private network links to UK and worldwide sites

JASMIN2
Expanded from 5.5PB to 12PB of high-performance disk and added ~3,000 CPU cores + ~5PB tape
Largest single-site Panasas deployment in the world
Benchmarks suggest this might be in the top ten IO systems in the world
Includes a large (100 servers + 400TB NetApp VMDK storage) VMware vCloud Director cloud deployment with a custom user portal
Non-blocking, zero-congestion Gb ethernet fabric: L3 ECMP/OSPF, low-latency (7.5 µs MPI) interconnect (see the ping-pong sketch after this slide)
–One converged network for everything
–Implementing VXLAN L2-over-L3 technology for the cloud
Same SuperMicro servers used for batch/MPI computing and cloud/hypervisor work
–Mellanox ConnectX-3 Pro NICs provide low latency for MPI and VXLAN offload for the cloud
–Servers are all 16-core Ivy Bridge with 128GByte RAM (some at 512GB); all 10Gb networking
JASMIN3 this year will add mostly 2TByte-RAM servers and several PBs of storage
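For context on the quoted MPI figure, point-to-point latency on a fabric like this is usually measured with a two-rank ping-pong test. Below is a minimal mpi4py sketch of such a probe; it is purely illustrative and not part of the JASMIN benchmarks.

```python
# Minimal MPI ping-pong latency probe (illustrative, not the JASMIN benchmark).
# Run on two nodes with e.g.: mpirun -np 2 python pingpong.py
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

N_ITER = 10000
msg = bytearray(1)  # 1-byte payload: measures latency rather than bandwidth
buf = bytearray(1)

comm.Barrier()
start = time.perf_counter()
for _ in range(N_ITER):
    if rank == 0:
        comm.Send(msg, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(msg, dest=0, tag=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # One-way latency is half the round-trip time per iteration
    print("approx one-way latency: %.2f us" % (elapsed / N_ITER / 2 * 1e6))
```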

Other stuff
Shellshock (see the check sketch after this slide)
–Patched exposed systems quickly
–Bulk done within days
–Long tail of systems to chase down
Electrical 'shutdown' for circuit testing
–Scheduled for January
–Phased circuit testing; the Tier1 will continue to operate, possibly with some reduced capacity
Windows XP (still) banned from site networks
New telephone system rollout complete
Recruited a grid admin, who starts soon
Recruiting a system admin and a hardware technician soon
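For reference, the widely circulated Shellshock (CVE-2014-6271) check sets a crafted function definition in the environment and watches whether bash executes the trailing command when it starts. A small Python wrapper of that check is sketched below, as one might script it across a fleet; it is illustrative only and not the procedure actually used at RAL.

```python
# Canonical Shellshock (CVE-2014-6271) check wrapped in Python so it can be
# scripted across many hosts. Illustrative only, not the procedure used at RAL.
import os
import subprocess

# A vulnerable bash executes the code after the function definition when it
# imports the crafted environment variable at startup.
env = dict(os.environ, x="() { :;}; echo VULNERABLE")
result = subprocess.run(
    ["/bin/bash", "-c", "echo shellshock probe"],
    env=env, capture_output=True, text=True,
)
if "VULNERABLE" in result.stdout:
    print("bash is vulnerable - patch required")
else:
    print("bash looks patched")
```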

Questions?