Small site approaches - Sussex
GridPP38 Collaboration Meeting, Sussex, 7th April 2017
Jeremy Maris / Leonardo Rojas, University of Sussex
HPC origins
- Research Computing within ITS established 2010
- Shared cluster with EPP: Lustre (27 TB / 54 TB EPP), 20 nodes (11 ITS / 8 EPP / 1 Informatics)
- Shared management between ITS and EPP
- Expanded to other schools / research groups
- Connectivity: 2Gb to JANET; multiple 1Gb to the cluster
HPC current
- Current cluster located in new (2011) DC
- QDR InfiniBand interconnect
- Nodes: 27 x 12-core Westmere, 2 x 48-core AMD, 16 x 8-core Nehalem, 32 x 64-core AMD, 32 x 16-core 2.6GHz (Sandy Bridge onwards), 4 GPU nodes (K20, K40), 3 x 20-core E5-2630 v4 2.2GHz
- 600TB Lustre: 8 OSS, 19 OSTs
- Univa Grid Engine
- Total physical cores: 3168
- Dedicated grid cores: 256 (2114 HS06), 80TB Lustre
- Connectivity: 20Gb to JANET; multiple 1Gb to the cluster
HPC DC
- 8 x 18kW racks
- Water-cooled doors
- 1MW generator
- Online UPS
HPC upgrade
- New cluster: upgrade for mid-scale MPI (Cosmology, CFD, Computational Chemistry, fMRI, EM)
- Update fabric with Omni-Path for new nodes; replace 6-year-old nodes
- LNET router to InfiniBand for Lustre (see the sketch after this list)
- Bright Cluster Manager, Univa Grid Engine
- Hadoop on Lustre, Spark
- HPC nodes CentOS 7.3, Grid nodes SL6.8
- Connectivity: 20Gb to JANET; multiple 10Gb to the cluster
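A minimal sketch of how an LNET router bridging the Omni-Path and InfiniBand fabrics might be configured, assuming both fabrics are driven through the o2ib LND; the network names, interfaces, and router addresses below are hypothetical, not the production values.

    # /etc/modprobe.d/lnet.conf on the LNET router (hypothetical interfaces)
    # o2ib0 = Omni-Path side (ib0), o2ib1 = InfiniBand side (ib1)
    options lnet networks="o2ib0(ib0),o2ib1(ib1)" forwarding="enabled"

    # On an Omni-Path compute node: reach the IB-attached Lustre servers
    # via the router's Omni-Path NID (example address)
    options lnet networks="o2ib0(ib0)" routes="o2ib1 10.1.0.1@o2ib0"

    # On the Lustre servers (InfiniBand side): route back to the new nodes
    # via the router's InfiniBand NID (example address)
    options lnet networks="o2ib1(ib1)" routes="o2ib0 10.2.0.1@o2ib1"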
HPC refreshed environment
- New cluster: CentOS 7.3, Omni-Path interconnect, 62 x E5-2640 v3 + 1 x E5-2690 v4 (1020 physical cores)
- From old cluster: 32 x 64-core AMD, 32 x 16-core 2.6GHz (Sandy Bridge onwards), 4 GPU nodes, 3 x 20-core E5-2630 v4 2.2GHz (2620 physical cores), 600TB Lustre
- Total cores: 3640
- Dedicated grid cores: 472 (4421 HS06) on SL6.8
HPC HS06 shares
- 54k HS06
HPC Architecture
GridPP
- Maintain Grid infrastructure: Lustre shared file system
- Move SRM (StoRM) from VM to new hardware (10GbE)
- CREAM, APEL, ARGUS, BDII …
- XRootD, WebDAV (see the sketch after this list)
- VOs: ATLAS production, SNO+
- Previous EPP sysadmins (Emyr, Matt) funded by MPS school
- Last year, no EPP sysadmin; Grid kept running by ITS Research Computing (3 FTE)
- Now have Leo! Plus another sysadmin in Astronomy/Theory
- More sysadmin resource, so better reliability and scope for development (monitoring, OpenStack, Ceph, etc.)
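A minimal sketch of how the Lustre-backed grid storage area might be exported over XRootD on a standalone server; the export path and port are hypothetical, and only the bare-minimum directives are shown.

    # xrootd configuration sketch (hypothetical standalone server)
    all.role server
    xrd.port 1094
    # Export the Lustre-backed grid storage area (hypothetical path)
    all.export /mnt/lustre/grid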
GridPP – backfill
- Opportunistic use of idle CPU time via Singularity: https://indico.cern.ch/event/612601/ (see the sketch after this list)
- Current utilisation ~75%
- Perhaps ~750 cores, 7000 HS06 of extra grid resource
- Pre-emptable jobs best…
- Under-used GPU nodes for CMS, LHCb?
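To illustrate the backfill idea, a minimal sketch of a Univa Grid Engine job that runs a containerised grid payload on otherwise idle nodes; the queue name, image path, bind mounts, and payload wrapper are hypothetical.

    #!/bin/bash
    # Hypothetical UGE job script for an opportunistic grid payload
    #$ -N grid-backfill
    #$ -q backfill.q          # hypothetical low-priority queue fed by idle cycles
    #$ -l h_rt=24:00:00
    #$ -pe smp 1

    # Run the payload inside Singularity so the SL6 grid user land is
    # independent of the CentOS 7 host image.
    singularity exec \
        -B /cvmfs:/cvmfs \
        -B /mnt/lustre:/mnt/lustre \
        /mnt/lustre/images/sl6-grid.img \
        /opt/grid/run_payload.sh   # hypothetical payload wrapper

Pre-emptable payloads suit this best: if the scheduler reclaims a node for local HPC work, the grid job can be killed and rescheduled with little lost effort.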
2018-2019 onwards
- Maths and Physics storage renewal: replace 400TB of obsolete Lustre storage
- Move to Omni-Path
- New MDS/MDT; at least 460TB of new OSTs, probably using ZFS (see the sketch below)
- Maths and Physics compute renewal: replace AMD 64-core nodes (9k HS06)
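For reference, a minimal sketch of creating a ZFS-backed Lustre OST with mkfs.lustre; the pool layout, device names, file system name, index, and MGS address are hypothetical, not a finalised design.

    # Build a ZFS-backed OST as a raidz2 vdev over example drives (hypothetical)
    mkfs.lustre --ost \
        --backfstype=zfs \
        --fsname=lustre \
        --index=0 \
        --mgsnode=10.2.0.10@o2ib1 \
        ostpool/ost0 raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi

    # Mount the OST so it registers with the MGS and joins the file system
    mkdir -p /lustre/ost0
    mount -t lustre ostpool/ost0 /lustre/ost0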